In the proceedings of Intelligent Information Systems 2005 (New Trends in Intelligent Information Processing and Web Mining), pp. 511-520.
Initial experiments in learning valence (subcategorisation) frames of Polish verbs from a morphosyntactically annotated corpus are reported here. The learning algorithm consists of two modules: a linguistic module, responsible for very simple shallow parsing of the input text (nominal and prepositional phrase recognition) and for the identification of valence frame cues (hypotheses), and a statistical module, which implements three well-known inferential statistics (likelihood ratio, t-test, binomial miscue probability test). The results of the three statistics are evaluated and compared with a baseline approach that selects frames on the basis of the relative frequencies of frame/verb co-occurrences. The results clearly reflect the many deficiencies of the linguistic analysis and the inadequacy of the statistical measures employed here for a free-word-order language rich in ellipsis and morphosyntactic syncretisms, but they are nevertheless promising.
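To make the filtering step more concrete, here is a minimal sketch of how a binomial miscue probability test and the relative-frequency baseline could be applied to verb/frame-cue counts. It is an illustration only, not the implementation used in the paper: the function names, the miscue rate, the significance level and the frequency cutoff are all assumptions.

```python
from math import comb


def binomial_tail(n: int, m: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def accept_by_binomial(n: int, m: int, miscue_rate: float = 0.05,
                       alpha: float = 0.05) -> bool:
    """Binomial miscue test (sketch).

    n: occurrences of the verb; m: occurrences of the verb with the frame cue.
    miscue_rate: assumed probability that the cue shows up with a verb that
    does NOT take the frame (hypothetical value).
    The frame is accepted if seeing m or more cues by error alone is unlikely.
    """
    return binomial_tail(n, m, miscue_rate) < alpha


def accept_by_relative_frequency(n: int, m: int, cutoff: float = 0.1) -> bool:
    """Baseline (sketch): accept if the cue accompanies the verb often enough.

    cutoff is a hypothetical threshold on the relative frequency m / n.
    """
    return m / n >= cutoff


if __name__ == "__main__":
    # Hypothetical counts: a verb seen 100 times, with a given cue 12 or 6 times.
    for cue_count in (12, 6):
        print(cue_count,
              accept_by_binomial(100, cue_count),
              accept_by_relative_frequency(100, cue_count))
```

The two filters differ in what they condition on: the baseline looks only at how often the cue accompanies the verb, while the binomial test asks whether that count could plausibly arise from parsing errors alone, so its verdict also depends on the total number of verb occurrences.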
BibTeX entry:
@InCollection{prz:fas:05a,
  author    = "Adam Przepiórkowski and Jakub Fast",
  title     = "Baseline Experiments in the Extraction of {P}olish Valence Frames",
  crossref  = "klo:etal:05:ed",
  pages     = "511--520"}

@string{sv = "Springer-Verlag"}

@Book{klo:etal:05:ed,
  editor    = "Mieczysław A. Kłopotek and Sławomir T. Wierzchoń and Krzysztof Trojanowski",
  title     = "Intelligent Information Processing and Web Mining",
  booktitle = "Intelligent {I}nformation {P}rocessing and {W}eb {M}ining",
  publisher = sv,
  year      = 2005,
  series    = "Advances in Soft Computing",
  address   = "Berlin"}