AUTHOR:      Yaakov HaCohen-Kerner
AFFILIATION: Jerusalem College of Technology
TITLE:       Research in Text Classification

TALK OUTLINE:

Introduction
• motivation
• definition of text classification (TC)
• history and relevant research domains
• content classification & stylistic classification
• kinds of text classification
• the main stages and components of automatic text classification

Preparation of suitable corpora

Features possibly relevant for classification tasks
• bag of words (BOW)
• stopwords
• n-grams
• more sophisticated feature sets
• a hierarchy of feature sets
• feature selection for TC
• main categories of feature selection methods
• text classification of imbalanced data sets

Supervised Machine Learning Methods and their Application to TC
• popular ML methods
• their advantages and disadvantages
• comparison between popular ML methods
• data mining environments that enable application of ML methods to TC

Popular evaluation measures

Cross-validation

What can we do if the results are poor?

How to "generate" more than one paper for one task

The future of TC (open questions)