AUTHORS:     Jakub Waszczuk, Katarzyna Głowińska, Agata Savary,
             Adam Przepiórkowski
AFFILIATION: Narodowy Korpus Języka Polskiego
TITLE:       Tools and Methodologies for Annotating Syntax and Named
             Entities in the National Corpus of Polish

The talk will be delivered in English.

ABSTRACT:

The ongoing project to create the National Corpus of Polish provides
for several levels of linguistic annotation. We present the
technical environment and methodological background developed for the
two upper annotation levels: the level of syntactic words and groups
and the level of named entities. We show how the knowledge-based
platforms Spejd and Sprout are used for automatic pre-annotation of
the corpus, and we discuss specific problems encountered while
developing the syntactic grammar, which contains over 800 rules and is
probably the first chunking grammar for Polish of this scale. We also
show how the tree editor TrEd has been customized for
manual post-editing of annotations and for the subsequent review of
discrepancies. Our XML format converters and customized archiving
repository ensure automatic data flow and efficient corpus file
management. We believe that this environment, or substantial parts of
it, can be reused in or adapted to other corpus annotation tasks.