AUTHOR:      Yaakov HaCohen-Kerner
AFFILIATION: Jerusalem College of Technology
TITLE:       Research in Text Classification


• motivation
• definition of text classification (TC)
• history and relevant research domains
• content classification & stylistic classification
• kinds of text classification
• the main stages and components of automatic text classification

Preparation of suitable corpora

Features possibly relevant for classification tasks
• bag of words (BOW)
• stopwords
• n-grams
• more sophisticated feature sets
• a hierarchy of feature sets
• feature selection for TC
• main categories of feature selection methods
• text classification of imbalanced data sets
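The first three feature types listed above (bag of words, stopword removal, n-grams) can be illustrated with a minimal sketch. This is not taken from the talk: the tiny STOPWORDS set, the regex tokenizer and the function names tokenize and extract_features are hypothetical choices made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only);
# real TC systems use much larger, language-specific lists.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_features(text: str, n: int = 2, remove_stopwords: bool = True) -> Counter:
    """Count word n-grams of length 1..n; the unigrams alone form the bag of words."""
    tokens = tokenize(text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    features = Counter()
    for k in range(1, n + 1):
        # Slide a window of k tokens over the (filtered) token sequence.
        for i in range(len(tokens) - k + 1):
            features[" ".join(tokens[i:i + k])] += 1
    return features


if __name__ == "__main__":
    print(extract_features("The cat sat on the mat and the cat slept."))
```

In this sketch the n-grams are built after stopword removal, so a bigram such as "on mat" spans a removed "the". Building n-grams before filtering is an equally valid design; which works better is an empirical question for each TC task.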

Supervised Machine Learning Methods and their Application to TC
• popular ML methods
• their advantages and disadvantages
• comparison between popular ML methods
• data mining environments that enable application of ML methods to TC
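As one concrete example of a supervised ML method applied to TC, here is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing over bag-of-words tokens. The outline does not say which methods the talk covers. Naive Bayes is chosen here only because it is a common TC baseline, and the class and method names are hypothetical. In practice, a data mining environment or library implementation (for example scikit-learn's MultinomialNB) would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)        # class -> number of documents
        self.word_counts = defaultdict(Counter)    # class -> token -> count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of token occurrences per class (denominator of the likelihood).
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Work in log space: log prior plus the sum of log likelihoods.
            score = math.log(n_c / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen during training
                score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

Usage on a toy, made-up corpus: `MultinomialNB().fit([["goal", "match", "team"], ["election", "vote", "party"]], ["sport", "politics"]).predict(["team", "goal"])` returns "sport".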

Popular evaluation measures
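The standard TC evaluation measures (accuracy, and per-class precision, recall and F1) can be computed directly from the true and predicted labels, as in the minimal sketch below. The function name evaluate and its return layout are hypothetical. Macro-averaged F1 gives every class equal weight, so weak performance on a minority class stays visible. Accuracy can hide this on imbalanced data sets.

```python
def evaluate(y_true: list[str], y_pred: list[str]) -> dict:
    """Return accuracy, per-class (precision, recall, F1) and macro-averaged F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)   # correctly assigned to c
        fp = sum(t != c and p == c for t, p in pairs)   # wrongly assigned to c
        fn = sum(t == c and p != c for t, p in pairs)   # members of c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    # Macro average: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f for _, _, f in per_class.values()) / len(labels)
    return {"accuracy": accuracy, "per_class": per_class, "macro_f1": macro_f1}
```

For example, with true labels `["pos", "pos", "neg", "neg"]` and predictions `["pos", "neg", "neg", "neg"]`, accuracy is 0.75. The "pos" class gets precision 1.0 and recall 0.5.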


What can we do if our results are poor?

How to "generate" more than one paper from a single task

The future of TC (open questions in TC)