10:00 Roland Meyer "Argument Structure from a Diachronic Corpus-based Perspective"
Modern Slavic languages differ considerably in the syntactic realisation of their external verbal arguments. For example, Polish and Czech are typical "pro-drop" languages, while Russian uses an overt (pronominal) subject in many contexts. Polish and Ukrainian allow for a participial passive with an accusative object, while Russian and Czech do not. Differences like these have a systematic, but also a historical source. With historical and diachronic electronic corpora slowly becoming available, the development of syntactic and morphological phenomena like these can, given appropriate technology, be traced more easily and precisely than has been possible in the philological tradition. The talk aims at illustrating this new approach and its specific problems on the basis of the emerging Regensburg Diachronic Corpus of Russian and similar projects in other languages. On a more technical note, it will present the current state of the mentioned corpus (XML annotation, encoding in ACT - Ribarov et al. 2004) and discuss its perspectives for the research to be carried out in the cooperation project.

11:00 Ruprecht von Waldenfels "The Regensburg Parallel Corpus of Slavonic"
In my talk, I report on the development of a parallel corpus of (mostly) Slavonic languages at our institute, at the moment including Russian, Ukrainian, Polish, Slovak, Serbian, Croatian, English and German belletristic texts. Our aim is to provide an environment for qualitative linguistic research in which texts can easily be added to the corpus, with automatic preprocessing and alignment to all other languages in the corpus. Two aligners have been evaluated, BSA (Moore 2002) and Hunalign (Varga et al. 2005), and I will report on experiments on how lemmatization of the input influences their alignment quality.

12:00 Christian Wolff "Collocations, Language Change, and Media Analysis"
Algorithms and metrics for the automatic analysis of collocations in large digital corpora have been well established and are a typical part of corpus-oriented projects in various languages. What is less clear is what possible applications and interpretations collocational data might provide. In my talk, I will concentrate on current research on collocation analysis for time-sliced corpora and its application to investigating language change and language usage in the media. We are currently working on analysis routines which operate on a series of timestamped corpora, i.e. text corpora of comparable size and content from a definite period of time (days, months, years) for which a complete collocation analysis is available. The comparison of collocation sets taken from different time slices should yield insight into changes in lexical usage and, ultimately, also into changes of meaning. While it is still an open question which approach to comparing collocation sets will be the most fruitful, preliminary results are quite promising and will be presented for further discussion in the workshop.

13:00 LUNCH BREAK

14:00-16:30 Shorter presentations of current NLP work done at ICS PAS:
Aleksander Buczyński "Detection of Collocations by Statistical Association Measures"
Agnieszka Mykowiecka, Anna Kupść, Małgorzata Marciniak "Rule-Based Medical Content Extraction and Classification"
Marcin Woliński "Świgra: An Implementation of a Large Grammar of Polish"
Adam Przepiórkowski "Argument Types for Automatic Valence Extraction"
Elżbieta Hajnicz "Augmenting a Valence Dictionary of Polish Verbs with Semantic Information"