In Proceedings of the 2nd Language & Technology Conference, pp. 191–195, Poznań, Poland, 2005.
This article compares and evaluates statistics commonly used to filter subcategorization frame hypotheses in automatic valence extraction. The range of statistics compared is broader than is usual in the literature and includes the Binomial Miscue Probability, the Likelihood Ratio, the t test, and several simpler statistics. All experiments are performed on morphosyntactically annotated but very noisy Polish data. Despite a different experimental methodology, the results confirm Korhonen's findings that statistics based solely on the number of occurrences of a given verb and the number of cooccurrences of that verb with a given frame generally fare much better than statistics that compare this conditional frame frequency with the unconditional frame frequency.
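To make the distinction between the two families of statistics concrete, here is a minimal sketch. It is not the authors' implementation. The function names, the 0.05 thresholds, and the miscue probability `p_err` are illustrative assumptions. How `p_err` is estimated (for instance from a frame's unconditional relative frequency) differs between systems. The t test is not covered.

```python
import math

# Hypothetical helpers illustrating the two families of filtering statistics.
# All thresholds and p_err values are assumptions, not values from the paper.


def mle_accept(verb_count: int, cooc_count: int, threshold: float = 0.05) -> bool:
    """Relative-frequency filter: keep frame f for verb v if c(v, f) / c(v) >= threshold.

    Uses only the verb count and the verb-frame cooccurrence count.
    """
    if verb_count == 0:
        return False
    return cooc_count / verb_count >= threshold


def binomial_miscue_probability(n: int, m: int, p_err: float) -> float:
    """Probability of seeing the frame m or more times in n verb occurrences
    if each observation were a miscue occurring with probability p_err."""
    return sum(
        math.comb(n, k) * p_err ** k * (1.0 - p_err) ** (n - k)
        for k in range(m, n + 1)
    )


def bmp_accept(n: int, m: int, p_err: float, alpha: float = 0.05) -> bool:
    """Accept the frame when the miscue explanation is improbable (below alpha)."""
    return binomial_miscue_probability(n, m, p_err) < alpha


def log_likelihood_ratio(cooc: int, verb_count: int, frame_count: int, total: int) -> float:
    """G^2 statistic for the 2x2 table (verb v vs. other verbs) x (frame f vs. other frames)."""
    observed = [
        [cooc, verb_count - cooc],
        [frame_count - cooc, total - verb_count - frame_count + cooc],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 (limit of o * log(o / e) as o -> 0)
                e = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / e)
    return 2.0 * g2


def llr_accept(cooc: int, verb_count: int, frame_count: int, total: int,
               critical: float = 3.84) -> bool:
    """Accept when v and f are positively associated and G^2 exceeds the
    chi-square critical value for 1 degree of freedom at alpha = 0.05."""
    if verb_count == 0 or total == 0:
        return False
    # G^2 is two-sided, so first check that f is more frequent with v than overall.
    positive = cooc / verb_count > frame_count / total
    return positive and log_likelihood_ratio(cooc, verb_count, frame_count, total) > critical
```

For instance, `mle_accept(200, 12)` keeps the frame because 12/200 = 0.06 ≥ 0.05. The other two filters also need either the frame's corpus-wide counts or an error rate, and that is where the unconditional frame frequency enters.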
BibTeX entry:
@InProceedings{fas:prz:05,
  author    = "Jakub Fast and Adam Przepiórkowski",
  title     = "Automatic Extraction of {P}olish Verb Subcategorization: {A}n Evaluation of Common Statistics",
  booktitle = "Proceedings of the \emph{2nd Language \& Technology Conference}",
  pages     = "191--195",
  year      = 2005,
  editor    = "Zygmunt Vetulani",
  address   = "Poznań, Poland"
}