AUTHOR: Yaakov HaCohen-Kerner
AFFILIATION: Jerusalem College of Technology
TITLE: Research in Text Classification
TALK OUTLINE:

Introduction
• motivation
• definition of text classification (TC)
• history
• relevant research domains
• content classification & stylistic classification
• kinds of text classification
• the main stages and components of automatic text classification

Preparation of suitable corpora

Features possibly relevant for classification tasks
• bag of words (BOW)
• stopwords
• n-grams
• more sophisticated feature sets
• a hierarchy of feature sets
• feature selection for TC
• main categories of feature selection methods
• text classification of imbalanced data sets

Supervised machine learning methods and their application to TC
• popular ML methods
• their advantages and disadvantages
• comparison between popular ML methods
• data mining environments that enable application of ML methods to TC

Popular evaluation measures

Cross-validation

What can we do if the results are poor?

How to "generate" more than one paper for one task

The future of TC (open questions in TC)