In the Proceedings of the 2nd Language & Technology Conference, pp. 191-195, Poznań, Poland, 2005.
This article compares and evaluates common statistics used to filter hypotheses in automatic valence extraction. It covers a broader range of statistics than is usually found in the literature, including Binomial Miscue Probability, Likelihood Ratio, t Test, and various simpler statistics. All experiments are performed on morphosyntactically annotated but very noisy Polish data. Despite a different experimental methodology, the results confirm Korhonen's findings: statistics based solely on the number of occurrences of a given verb and the number of cooccurrences of that verb with a given frame generally fare much better than statistics that compare this conditional frame frequency with the unconditional frame frequency.
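The two families of statistics contrasted in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the noise rate `p_error`, and the acceptance threshold are assumptions introduced here for illustration, and the formulas are the textbook binomial tail and a two-proportion t-style score, which may differ in detail from the variants the article evaluates. Here `m` is the number of cooccurrences of a verb with a frame and `n` is the number of occurrences of the verb.

```python
from math import comb, sqrt


def relative_frequency(m: int, n: int) -> float:
    """Conditional frame frequency: share of the verb's n occurrences seen with the frame."""
    return m / n


def binomial_miscue_probability(m: int, n: int, p_error: float) -> float:
    """Probability of at least m cooccurrences in n verb occurrences if each
    cooccurrence were pure noise arising with probability p_error (assumed value)."""
    return sum(
        comb(n, k) * p_error**k * (1 - p_error) ** (n - k)
        for k in range(m, n + 1)
    )


def accept_frame(m: int, n: int, p_error: float = 0.05, threshold: float = 0.05) -> bool:
    """Keep the (verb, frame) hypothesis when noise alone is an unlikely explanation."""
    return binomial_miscue_probability(m, n, p_error) <= threshold


def t_score(m: int, n: int, frame_total: int, corpus_total: int) -> float:
    """Second family: compare the conditional frame frequency m/n with the
    unconditional frame frequency frame_total/corpus_total (assumed formulation)."""
    p_cond = m / n
    p_uncond = frame_total / corpus_total
    variance = p_cond * (1 - p_cond) / n + p_uncond * (1 - p_uncond) / corpus_total
    return (p_cond - p_uncond) / sqrt(variance)
```

The first three functions depend only on `m` and `n` (plus a fixed noise rate), while `t_score` also needs corpus-wide frame counts. That is the distinction the abstract draws between the two groups of statistics.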
BibTeX entry:
@InProceedings{fas:prz:05,
  author    = "Jakub Fast and Adam Przepiórkowski",
  title     = "Automatic Extraction of {P}olish Verb
               Subcategorization: {A}n Evaluation of Common
               Statistics",
  booktitle = "Proceedings of the \emph{2nd Language \& Technology
               Conference}",
  pages     = "191--195",
  year      = 2005,
  editor    = "Zygmunt Vetulani",
  address   = "Poznań, Poland"}