10:00
Roland Meyer
"Argument Structure from a Diachronic Corpus-based Perspective"

Modern Slavic languages differ considerably in the syntactic
realisation of their external verbal arguments: for example, Polish and Czech
are typical "pro-drop" languages, while Russian uses an overt
(pronominal) subject in many contexts. Polish and Ukrainian allow for
a participial passive with accusative object while Russian and Czech
do not. Differences like these have a systematic source, but also a
historical one. With historical and diachronic electronic corpora
slowly becoming available, the development of syntactic and
morphological phenomena like these can be traced more easily and
precisely than has been possible in the philological tradition, given
appropriate technology. The talk aims at illustrating this new
approach and its specific problems on the basis of the emerging
Regensburg Diachronic Corpus of Russian and similar projects in other
languages. On a more technical note, it will present the current state
of this corpus (XML annotation, encoding in ACT - Ribarov et
al. 2004) and discuss its prospects for the research to be carried
out in the cooperation project.


11:00
Ruprecht von Waldenfels
"The Regensburg Parallel Corpus of Slavonic"

In my talk, I report on the development of a parallel corpus of
(mostly) Slavonic languages at our institute, at the moment including
Russian, Ukrainian, Polish, Slovak, Serbian, Croatian, English and German
belletristic texts. Our aim is to provide an environment for
qualitative linguistic research, where texts can be easily added to
the corpus, with automatic preprocessing and alignment to all other
languages in the corpus. Two aligners have been evaluated: BSA (Moore
2002), and Hunalign (Varga et al. 2005), and I will report on
experiments on how lemmatization of the input influences their
alignment quality.


12:00
Christian Wolff
"Collocations, Language Change, and Media Analysis"

Algorithms and metrics for the automatic analysis of collocations in
large digital corpora are well established and are a typical
part of corpus-oriented projects in various languages. What is less
clear is what possible applications and interpretations collocational
data might provide. In my talk, I will concentrate on current research
on collocation analysis for time-sliced corpora and its application
to investigating language change and language usage in the media. We
are currently working on analysis routines which operate on a series
of timestamped corpora, i.e. text corpora of comparable size and
content from a definite period of time (days, months, years) for which
a complete collocation analysis is available. The comparison of
collocation sets taken from different time slices should yield insight
into changes in lexical usage, and ultimately also into changes in
meaning. While it is still an open question which approach to
comparing collocation sets will be the most fruitful, preliminary
results are quite promising and shall be presented for further
discussion in the workshop.
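
To make the idea of comparing collocation sets across time slices concrete,
here is a minimal sketch in Python. It is not the method used in the project,
which the abstract leaves open. It assumes sentence-level co-occurrence,
pointwise mutual information (PMI) as the association measure, and Jaccard
overlap of the top-k collocates as the comparison. All function names and the
toy data are hypothetical.

```python
import math
from collections import Counter


def collocates(sentences, target, min_count=1):
    """Return {word: PMI} for words sharing a sentence with `target`.

    Assumption: co-occurrence is counted once per sentence (on word types).
    """
    word_freq = Counter()   # number of sentences containing each word
    pair_freq = Counter()   # number of sentences containing word and target
    n = len(sentences)
    for sent in sentences:
        types = set(sent)
        word_freq.update(types)
        if target in types:
            for w in types - {target}:
                pair_freq[w] += 1
    scores = {}
    for w, c in pair_freq.items():
        if c >= min_count:
            # PMI = log2( P(w, target) / (P(w) * P(target)) )
            scores[w] = math.log2(c * n / (word_freq[target] * word_freq[w]))
    return scores


def top_k(scores, k):
    """Set of the k highest-scoring collocates (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:k])


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_slices(slices, target, k=3):
    """Compare top-k collocate sets of `target` between adjacent time slices.

    `slices` maps a slice label to a list of tokenised sentences and is
    assumed to be in chronological (insertion) order.
    """
    labels = list(slices)
    tops = {lab: top_k(collocates(slices[lab], target), k) for lab in labels}
    return [(a, b, jaccard(tops[a], tops[b])) for a, b in zip(labels, labels[1:])]


# Toy time slices (invented data, for illustration only).
slices = {
    "2004": [["virus", "computer", "mail"], ["virus", "computer"], ["weather", "rain"]],
    "2005": [["virus", "bird", "flu"], ["virus", "flu"], ["weather", "rain"]],
}
result = compare_slices(slices, "virus", k=2)
print(result)
```

A low overlap between adjacent slices, as in the toy data, would flag the
target word as a candidate for a shift in usage. Other association measures
(for example log-likelihood) or rank-based comparisons could be substituted
without changing the overall structure.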


13:00 LUNCH BREAK


14:00-16:30
Shorter presentations of current NLP work done at ICS PAS:

Aleksander Buczyński
"Detection of Collocations by Statistical Association Measures"

Agnieszka Mykowiecka, Anna Kupść, Małgorzata Marciniak
"Rule-Based Medical Content Extraction and Classification"

Marcin Woliński
"Świgra: An Implementation of a Large Grammar of Polish"

Adam Przepiórkowski
"Argument Types for Automatic Valence Extraction"

Elżbieta Hajnicz
"Augmenting a Valence Dictionary of Polish Verbs with Semantic
Information"