Baseline Experiments in the Extraction of Polish Valence Frames

Adam Przepiórkowski and Jakub Fast

In the proceedings of Intelligent Information Systems 2005 (New Trends in Intelligent Information Processing and Web Mining), pp. 511-520.


Abstract:

Initial experiments in learning valence (subcategorisation) frames of Polish verbs from a morphosyntactically annotated corpus are reported here. The learning algorithm consists of a linguistic module, responsible for very simple shallow parsing of the input text (nominal and prepositional phrase recognition) and for the identification of valence frame cues (hypotheses), and a statistical module which implements three well-known inferential statistics (likelihood ratio, t test, binomial miscue probability test). The results of the three statistics are evaluated and compared with a baseline approach of selecting frames on the basis of the relative frequencies of frame/verb co-occurrences. The results, while clearly reflecting the many deficiencies of the linguistic analysis and the inadequacy of the statistical measures employed here for a free word order language rich in ellipsis and morphosyntactic syncretisms, are nevertheless promising.
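As an illustration of the statistical module described above, the following minimal sketch contrasts the relative-frequency baseline with a binomial miscue probability test. It is not the authors' implementation: the function names, the frame labels, the miscue probability and both thresholds are assumptions chosen only for this example.

```python
from math import comb


def relative_frequency(cue_count: int, verb_count: int) -> float:
    """Baseline score: share of the verb's occurrences seen with the frame cue."""
    return cue_count / verb_count if verb_count else 0.0


def binomial_tail(m: int, n: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p).

    Here p is the assumed probability that a verb which does NOT take the
    frame still co-occurs with its cue (a "miscue").
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def select_frames(cue_counts, verb_count, miscue_prob=0.05, alpha=0.05,
                  min_rel_freq=0.1):
    """Return (baseline_frames, binomial_frames) for a single verb.

    cue_counts maps a frame label to the number of verb occurrences with
    that cue. All parameter defaults are illustrative assumptions.
    """
    baseline = {
        frame for frame, m in cue_counts.items()
        if relative_frequency(m, verb_count) >= min_rel_freq
    }
    # Accept a frame when so many cues are unlikely to be miscues alone.
    binomial = {
        frame for frame, m in cue_counts.items()
        if binomial_tail(m, verb_count, miscue_prob) < alpha
    }
    return baseline, binomial


# Hypothetical counts for one verb observed 100 times.
counts = {"np(acc)": 30, "pp(na,acc)": 2}
baseline_frames, binomial_frames = select_frames(counts, verb_count=100)
```

With these made-up counts both criteria keep the frequent cue and reject the rare one. The two criteria can diverge for low-frequency verbs, where a fixed relative-frequency cut-off does not take sample size into account.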



BibTeX entry:

@InCollection{prz:fas:05a,
  author =       "Adam Przepiórkowski and Jakub Fast",
  title =        "Baseline Experiments in the Extraction of {P}olish
                  Valence Frames",
  crossref =     "klo:etal:05:ed",
  pages =        "511--520"}
@string{sv =     "Springer-Verlag"}
@Book{klo:etal:05:ed,
  editor =       "Mieczysław A. Kłopotek and Sławomir T. Wierzchoń and
                  Krzysztof Trojanowski",
  title =        "Intelligent Information Processing and Web Mining",
  booktitle =    "Intelligent {I}nformation {P}rocessing and {W}eb {M}ining",
  publisher =    sv,
  year =         2005,
  series =       "Advances in Soft Computing",
  address =      "Berlin"}


Creation Date: Monday, May 2, 2005
Last Modified: Tue Jun 7 22:24:02 CEST 2005
AP