Information Extraction for Polish Using the SProUT Platform

Jakub Piskorski, Peter Homola, Małgorzata Marciniak, Agnieszka Mykowiecka, Adam Przepiórkowski and Marcin Woliński

In the proceedings of Intelligent Information Systems 2004 (New Trends in Intelligent Information Processing and Web Mining).


The aim of this article is to present the initial results of adapting SProUT, a multi-lingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by the integration of Morfeusz, an external morphological analyzer for Polish, and various solutions to the problem of the lack of extensive gazetteers for Polish. The main sections of the article report on some initial experiments in applying this adapted system to the Information Extraction task of identifying various classes of Named Entities in financial and medical texts, perhaps the first such Information Extraction effort for Polish.

BibTeX entry:

@string{sv =     "Springer-Verlag"}
  author =       "Jakub Piskorski and Peter Homola and Małgorzata
                  Marciniak and Agnieszka Mykowiecka and Adam
                  Przepiórkowski and Marcin Woliński",
  title =        "Information Extraction for {P}olish Using the
                  {SProUT} Platform",
  crossref =     "klo:etal:04:ed",
  pages =        "227--236"}

@Proceedings{klo:etal:04:ed,
  editor =       "Mieczysław A. Kłopotek and Sławomir T. Wierzchoń and
                  Krzysztof Trojanowski",
  title =        "Intelligent Information Processing and Web Mining",
  booktitle =    "Intelligent {I}nformation {P}rocessing and {W}eb {M}ining",
  publisher =    sv,
  year =         2004,
  series =       "Advances in Soft Computing",
  address =      "Berlin"}

Creation Date: Tuesday, December 23, 2003
Last Modified: Tue Jun 7 22:23:56 CEST 2005