In the proceedings of Intelligent Information Systems 2005 (New Trends in Intelligent Information Processing and Web Mining), pp. 511-520.
Initial experiments in learning valence (subcategorisation) frames of Polish verbs from a morphosyntactically annotated corpus are reported here. The learning algorithm consists of a linguistic module, responsible for very simple shallow parsing of the input text (nominal and prepositional phrase recognition) and for the identification of valence frame cues (hypotheses), and a statistical module which implements three well-known inferential statistics (likelihood ratio, t test, binomial miscue probability test). The results of the three statistics are evaluated and compared with a baseline approach of selecting frames on the basis of the relative frequencies of frame/verb co-occurrences. The results, while clearly reflecting the many deficiencies of the linguistic analysis and the inadequacy of the statistical measures employed here for a free word order language rich in ellipsis and morphosyntactic syncretisms, are nevertheless promising.
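As an illustration of the kind of frame filtering the abstract describes, the following is a minimal sketch of a binomial miscue probability test alongside the relative-frequency baseline. It is not the authors' implementation. The function names, the miscue probability of 0.05, the significance level of 0.05 and the relative-frequency cutoff of 0.1 are all assumptions chosen for the example. Here `n` is the number of occurrences of a verb and `m` is the number of those occurrences that carry a cue for a given frame.

```python
from math import comb


def binomial_tail(n, m, p):
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def accept_by_binomial(n, m, miscue_prob=0.05, alpha=0.05):
    """Accept the frame if m or more cues are unlikely to be miscues alone.

    miscue_prob and alpha are illustrative values, not taken from the paper.
    """
    return binomial_tail(n, m, miscue_prob) < alpha


def accept_by_relative_frequency(n, m, cutoff=0.1):
    """Baseline: accept the frame if its relative frequency reaches a cutoff.

    The cutoff is an illustrative value, not taken from the paper.
    """
    return m / n >= cutoff


if __name__ == "__main__":
    # Hypothetical counts: a verb seen 100 times, 20 of them with the frame cue.
    print(accept_by_binomial(100, 20))
    print(accept_by_relative_frequency(100, 20))
```

The binomial test asks how surprising the observed cue count would be if every cue were an error, whereas the baseline only looks at the raw proportion, so the two can disagree for rare verbs.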
BibTeX entry:
@InCollection{prz:fas:05a,
  author = "Adam Przepiórkowski and Jakub Fast",
  title = "Baseline Experiments in the Extraction of {P}olish Valence Frames",
  crossref = "klo:etal:05:ed",
  pages = "511--520"}

@string{sv = "Springer-Verlag"}

@Book{klo:etal:05:ed,
  editor = "Mieczysław A. Kłopotek and Sławomir T. Wierzchoń and Krzysztof Trojanowski",
  title = "Intelligent Information Processing and Web Mining",
  booktitle = "Intelligent {I}nformation {P}rocessing and {W}eb {M}ining",
  publisher = sv,
  year = 2005,
  series = "Advances in Soft Computing",
  address = "Berlin"}