# Automatic Extraction of Polish Verb Subcategorization: An Evaluation of Common Statistics

## Jakub Fast and Adam Przepiórkowski

In the Proceedings of the 2nd Language & Technology Conference, pp. 191-195, Poznań, Poland, 2005.

### Abstract:

This article compares and evaluates common statistics used to filter hypotheses in automatic valence extraction. It covers a broader range of statistics than is usually found in the literature, including Binomial Miscue Probability, Likelihood Ratio, the t test, and various simpler statistics. All experiments are performed on morphosyntactically annotated but very noisy Polish data. Despite a different experimental methodology, the results confirm Korhonen's findings: statistics based solely on the number of occurrences of a given verb and the number of cooccurrences of that verb with a given frame generally fare much better than statistics that compare this conditional frame frequency with the unconditional frame frequency.
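
To make the first family of statistics concrete, the following is a minimal sketch of two filters that use only the verb count and the verb-frame cooccurrence count: a relative-frequency cutoff and a binomial tail test. It is an illustration, not the paper's implementation. The function names, the thresholds, and the error rate `p_err` are assumptions, and the binomial test is written in its commonly cited form (the probability of seeing at least the observed number of cooccurrences if all of them were noise), which may differ in detail from the statistic evaluated in the paper.

```python
from math import comb


def relative_frequency(c_vf: int, c_v: int) -> float:
    """Share of the verb's occurrences that appear with the frame."""
    return c_vf / c_v


def binomial_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def accept_by_relative_frequency(c_vf: int, c_v: int, cutoff: float = 0.05) -> bool:
    """Keep the frame hypothesis if its relative frequency reaches the cutoff.

    The cutoff value is a hypothetical default, not taken from the paper.
    """
    return relative_frequency(c_vf, c_v) >= cutoff


def accept_by_binomial(c_vf: int, c_v: int, p_err: float, alpha: float = 0.05) -> bool:
    """Keep the frame hypothesis if noise alone is unlikely to explain it.

    p_err is an assumed probability that a single verb occurrence is wrongly
    paired with the frame; alpha is an assumed significance level.
    """
    return binomial_tail(c_vf, c_v, p_err) <= alpha


if __name__ == "__main__":
    # Hypothetical counts: a verb seen 100 times, 10 of them with the frame.
    print(accept_by_relative_frequency(10, 100))
    print(accept_by_binomial(10, 100, p_err=0.02))
```

Both filters depend only on the two counts for the verb in question (plus a fixed parameter), whereas statistics such as the likelihood ratio or the t test also bring in how often the frame occurs across all verbs.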

