Automatic Extraction of Polish Verb Subcategorization: An Evaluation of Common Statistics

Jakub Fast and Adam Przepiórkowski

In the Proceedings of the 2nd Language & Technology Conference, pp. 191-195, Poznań, Poland, 2005.

Abstract:

This article compares and evaluates common statistics used in the process of filtering the hypotheses within the task of automatic valence extraction. A broader range of statistics is compared than the ones usually found in the literature, including Binomial Miscue Probability, Likelihood Ratio, t Test, and various simpler statistics. All experiments are performed on the basis of morphosyntactically annotated but very noisy Polish data. Despite a different experimental methodology, the results confirm Korhonen's findings that statistics based solely on the number of occurrences of a given verb and the number of cooccurrences of the verb and a given frame in general fare much better than statistics comparing such conditional frame frequency with the unconditional frame frequency.
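To make the contrast in the abstract concrete, here is a minimal sketch of two filters that use only the verb count and the verb-frame cooccurrence count: a relative-frequency cutoff and a binomial tail test in the spirit of the Binomial Miscue Probability. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the error rate `p_error`, the significance level `alpha`, and the frequency `threshold` are all hypothetical values chosen for the example.

```python
from math import comb


def relative_frequency(cooc: int, verb_count: int) -> float:
    """Share of the verb's occurrences that were observed with the frame."""
    return cooc / verb_count


def binomial_tail(cooc: int, verb_count: int, p_error: float) -> float:
    """P(X >= cooc) for X ~ Binomial(verb_count, p_error).

    Read as: the chance of seeing at least `cooc` cooccurrences if every
    one of them were a miscue occurring with probability `p_error`.
    """
    return sum(
        comb(verb_count, k) * p_error**k * (1 - p_error) ** (verb_count - k)
        for k in range(cooc, verb_count + 1)
    )


def accept_by_frequency(cooc: int, verb_count: int, threshold: float = 0.05) -> bool:
    """Keep the frame if its relative frequency reaches the cutoff (assumed value)."""
    return relative_frequency(cooc, verb_count) >= threshold


def accept_by_binomial(
    cooc: int, verb_count: int, p_error: float = 0.02, alpha: float = 0.05
) -> bool:
    """Keep the frame if the observations are unlikely to be miscues alone.

    `p_error` and `alpha` are assumed values for illustration only.
    """
    return binomial_tail(cooc, verb_count, p_error) < alpha


if __name__ == "__main__":
    # Hypothetical counts: a verb seen 100 times, a frame seen with it 10 or 2 times.
    for cooc in (10, 2):
        print(cooc, accept_by_frequency(cooc, 100), accept_by_binomial(cooc, 100))
```

Both filters decide from the two counts alone. The other family of statistics named in the abstract, such as the Likelihood Ratio and the t Test, additionally compares this conditional frame frequency with the frame's unconditional frequency across all verbs.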

BibTeX entry:

@InProceedings{fas:prz:05,
  author =       "Jakub Fast and Adam Przepiórkowski",
  title =        "Automatic Extraction of {P}olish Verb
                  Subcategorization: {A}n Evaluation of Common
                  Statistics",
  booktitle =    "Proceedings of the \emph{2nd Language \& Technology
                  Conference}",
  pages =        "191--195",
  year =         2005,
  editor =       "Zygmunt Vetulani",
  address =      "Poznań, Poland"}

Creation Date: Monday, May 2, 2005
Last Modified: Tue Jun 7 22:24:13 CEST 2005