# Automatic Extraction of Polish Verb Subcategorization: An Evaluation of Common Statistics

## Jakub Fast and Adam Przepiórkowski

In the Proceedings of the 2nd Language & Technology Conference, pp. 191-195, Poznań, Poland, 2005.

### Abstract:

This article compares and evaluates common statistics used in the process of filtering the hypotheses within the task of automatic valence extraction. A broader range of statistics is compared than the ones usually found in the literature, including Binomial Miscue Probability, Likelihood Ratio, t Test, and various simpler statistics. All experiments are performed on the basis of morphosyntactically annotated but very noisy Polish data. Despite a different experimental methodology, the results confirm Korhonen's findings that statistics based solely on the number of occurrences of a given verb and the number of cooccurrences of the verb and a given frame in general fare much better than statistics comparing such conditional frame frequency with the unconditional frame frequency.
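The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy counts, and the particular t-score formulation are assumptions made for illustration only. `relative_frequency` stands for the family of statistics that use only the verb count and the verb-frame cooccurrence count, while `t_score` stands for the family that compares the conditional frame frequency with the unconditional one.

```python
import math


def relative_frequency(verb_frame_count: int, verb_count: int) -> float:
    """Return c(verb, frame) / c(verb); only verb-specific counts are used."""
    if verb_count == 0:
        return 0.0
    return verb_frame_count / verb_count


def t_score(verb_frame_count: int, verb_count: int,
            frame_count: int, total_count: int) -> float:
    """Compare p(frame | verb) with the unconditional p(frame).

    Assumed formulation: a two-sample t-like score using binomial
    variance estimates for both proportions.
    """
    p_cond = verb_frame_count / verb_count
    p_uncond = frame_count / total_count
    variance = (p_cond * (1 - p_cond) / verb_count
                + p_uncond * (1 - p_uncond) / total_count)
    if variance == 0:
        return 0.0
    return (p_cond - p_uncond) / math.sqrt(variance)


# Hypothetical counts: the frame occurs with 15% of this verb's tokens
# and with 15% of all verb tokens in the corpus.
rf = relative_frequency(30, 200)
t = t_score(30, 200, 1500, 10000)
print(rf, t)
```

With these toy counts, a relative-frequency cutoff would accept the frame as long as the cutoff lies below 0.15, whereas the t-like score is zero because the verb uses the frame exactly as often as verbs do on average. Any threshold values are hypothetical and would have to be tuned on real data.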

BibTeX entry:

@InProceedings{fas:prz:05,
author =       "Jakub Fast and Adam Przepiórkowski",
title =        "Automatic Extraction of {P}olish Verb
Subcategorization: {A}n Evaluation of Common
Statistics",
booktitle =    "Proceedings of the \emph{2nd Language \& Technology
Conference}",
pages =        "191--195",
year =         2005,
editor =       "Zygmunt Vetulani",