Programme of the Regensburg-Warsaw Workshop, 10 December 2007
(Pałac Staszica, ul. Nowy Świat 72, room 304)
09:30-10:10
Sandra Birzer "Drawing the line between adverbs and prepositions"
10:10-10:50
Natalia Kotsyba, Olga Shypnivska, Magdalena Turska "Principles of
organizing a common morphological tagset and a search engine for
PolUKR (Polish-Ukrainian Parallel Corpus)"
10:50-11:10 BREAK
11:10-11:50
Roland Meyer "Annotation projection in (diachronic) parallel corpora"
11:50-12:30
Ruprecht von Waldenfels "The grammaticalization of Polish 'dać się'
from a diachronic and comparative point of view"
12:30-13:30 LUNCH BREAK
13:30-14:10
Adam Przepiórkowski "A Shallow Grammar of Polish for Automatic Valence
Acquisition"
14:10-14:50
Łukasz Dębowski "New methods for verb valence extraction from Polish
texts"
14:50-15:10 BREAK
15:10-15:50
Elżbieta Hajnicz "Towards extending syntactic valence dictionary of
Polish verbs with semantic categories"
15:50-16:30
Markus Heckner, Susanne Mühlbacher, Christian Wolff "User Keywords in
Social Software for Bibliography Management. An Analysis of Tagging
Practices based on a Linguistic Model of Tag Usage"
Abstracts
AUTHOR: Sandra Birzer
AFFILIATION: Universität Regensburg
TITLE: Drawing the line between adverbs and prepositions
ABSTRACT:
Russian has quite a few lexemes that may be used either as adverbs or
as prepositions. This raises the question of where to draw the line
between the two categories. I offer some syntactic tests that allow
this line to be drawn somewhat more clearly.
AUTHOR: Łukasz Dębowski
AFFILIATION: IPI PAN
TITLE: New methods for verb valence extraction from Polish texts
ABSTRACT:
We present a new method for extracting verb valence information
from raw texts. The method has been designed for a language for which
no verified treebank exists but for which a large phrase grammar and
several valence dictionaries have been compiled by linguists. The
extraction proceeds in two steps. First, a deterministic grammar parser
and a new instance of the expectation-maximization algorithm are used
to obtain an imperfect reduced treebank from unannotated texts (with
trees reduced to valence frames). Second, partially supervised learning
is applied to obtain verbal subcategorization frames from the treebank.
The resulting dictionary features higher precision thanks to adjusting
co-occurrence matrices for verb arguments, which is a novel idea.
AUTHOR: Elżbieta Hajnicz
AFFILIATION: IPI PAN
TITLE: Towards extending syntactic valence dictionary of Polish
verbs with semantic categories
ABSTRACT:
In this presentation I would like to present an introductory phase of
extending a syntactic valence dictionary of Polish verbs with semantic
categories. I start with a description of the data at my disposal.
Next, I list all the phases needed to obtain such an extension.
Finally, I focus on discussing the first step of the process: adding
semantic categories to phrases' (NPs' and PPs') semantic heads and
disambiguating them.
AUTHOR: Natalia Kotsyba, Olga Shypnivska, Magdalena Turska
AFFILIATION: IS PAN (and others)
TITLE: Principles of organizing a common morphological tagset and
a search engine for PolUKR (Polish-Ukrainian Parallel
Corpus)
ABSTRACT:
We will discuss different tagsets for Slavic languages used in
existing corpora (Russian National Corpus, Czech National Corpus,
Korpus IPI PAN) or described in the literature (for Ukrainian), their
advantages and disadvantages, and the possibility of creating a common
tagset for Polish and Ukrainian, which could later be extended to
other Slavic languages.
We will also compare our approach to creating a common tagset
with the principles used in the multilingual ACQUIS corpus, and
present a design for a user-friendly query engine for the parallel
corpus.
AUTHOR: Roland Meyer
AFFILIATION: Universität Regensburg
TITLE: Annotation projection in (diachronic) parallel corpora
ABSTRACT:
The task of automatically tagging historical texts is an instance of a
more general problem, namely, how to annotate languages for which few
electronic resources (lexica, morphological analyzers, taggers) are
available. The talk explores the possibility of using annotated
translations of those texts in languages for which resources exist, in
order to "project" the information back into the historical source. It
focuses on the details of the technical steps involved, i.e. word
alignment and projection rules, and discusses their general
implications for tagging in parallel corpora.
AUTHOR: Adam Przepiórkowski
AFFILIATION: IPI PAN
TITLE: A Shallow Grammar of Polish for Automatic Valence
Acquisition
ABSTRACT:
In this talk I will present the current state of the shallow grammar
of Polish developed within a project of syntactic valence acquisition.
The new formalism for shallow parsing will be introduced through
examples, and preliminary and partial quantitative coverage results of
the current grammar will be adduced.
AUTHOR: Ruprecht von Waldenfels
AFFILIATION: Universität Regensburg
TITLE: The grammaticalization of Polish 'dać się' from a
diachronic and comparative point of view
ABSTRACT:
In my talk, I present the grammaticalization of Polish 'dać się' as a
modal in sentences such as 'nie da się tego powiedzieć' both from a
diachronic and a comparative point of view. The source construction
involving OS dati used as a permissive formant with the meaning 'to
let' can be traced to Old Church Slavonic; an extension to inanimate
subjects in reflexive constructions is found in the earliest Polish
texts. A new construction involving object marking of the patient (see
above) becomes frequent only around the turn of the 19th to the 20th
century. This development can be captured as a case of
grammaticalization with gradual loss of valency, semantic specificity
and verbal categories such as person, number and aspect. The
development of the Czech cognate item 'dat se' is more advanced with
respect to these properties and not connected, as in Polish, with
accusative marking of the patient, which is also possible but much
more peripheral.
AUTHOR: Markus Heckner, Susanne Mühlbacher, Christian Wolff
AFFILIATION: Universität Regensburg
TITLE: User Keywords in Social Software for Bibliography
Management. An Analysis of Tagging Practices based on a
Linguistic Model of Tag Usage
ABSTRACT:
Tagging platforms like Flickr, Delicious or YouTube have become an
important means of information organization. While these systems
primarily address leisure-oriented activities like presenting photos
or describing film and video material, social software platforms for
the scientific work process have recently emerged as well. In these
systems, users share their bibliographies of scientific articles and
tag them with keywords of their own choice.
We report results from an empirical study of keyword usage in Connotea
(http://www.connotea.org). Starting from a model of morpho-syntactic
as well as pragmatic aspects of keyword usage, 500 documents tagged in
Connotea were matched against our tag category model. The results give
clear indications that user tags are different from author or expert
keywords. We believe that user tagging offers an additional dimension
of subject indexing that may complement traditional means like author
keywords, expert keywords and automatic (fulltext) indexing.
In the talk, our method, the tag category model as well as the main
empirical results from our study will be presented.