AUTHOR:      Ruprecht von Waldenfels
AFFILIATION: University of Bern and IPI PAN
TITLE:       A corpus-driven, usage-based approach to cross-Slavic
             variation using a parallel corpus: aims, tools and first


Differences in the use of similar linguistic variables across a set of
closely related languages, such as the Slavic languages, are difficult
to analyze because the many functions and factors involved are rarely
categorical or clear-cut. In the talk, I present an ongoing research
project, which I will continue at IPI PAN in the coming months, that
aims to investigate such cross-Slavic differences and groupings in a
bottom-up fashion using parallel texts.

Using a word-aligned, morphologically tagged and lemmatized parallel
corpus of prose in all major Slavic languages (ParaSol), I compare the
use of verbal aspect, middle marking, nominal suffixes and verbal
prefixes in translationally equivalent word forms across
languages. The data are evaluated quantitatively using clustering
algorithms and qualitatively using a web interface that visualizes the
contrasting variables in context. First results concern
generalizations with respect to aspect usage and reflexive marking, as
well as the different roles of contact and genetic influences in
patterns of use of nominal and verbal derivational affixes across
Slavic. In the talk, I will also present the tools used in the
research project in some detail, since I plan to publish some of them
for the research community.
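
The quantitative step described above can be illustrated with a
minimal sketch. It assumes that each language is represented by the
aspect tag found at each aligned verb position, that languages are
compared by their rate of disagreement, and that groups are merged by
average linkage. The data, the function names and the choice of
distance and linkage are all assumptions made for illustration; they
are not taken from ParaSol or from the project's tools.

```python
from itertools import combinations

# Hypothetical toy data: the aspect tag (pf/ipf) of the verb at each of
# six aligned positions. Invented for illustration; not ParaSol data.
ASPECT = {
    "ru": ["ipf", "pf", "ipf", "ipf", "pf", "ipf"],
    "pl": ["ipf", "pf", "ipf", "pf", "pf", "ipf"],
    "cs": ["pf", "pf", "ipf", "pf", "pf", "pf"],
    "sk": ["pf", "pf", "ipf", "pf", "pf", "pf"],
}


def mismatch_rate(a, b):
    """Share of aligned positions at which two languages disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(c1, c2, data):
    """Mean mismatch rate over all cross-cluster language pairs."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(mismatch_rate(data[x], data[y]) for x, y in pairs) / len(pairs)


def cluster(data):
    """Agglomerative clustering; returns the merges in the order made."""
    clusters = [(lang,) for lang in sorted(data)]
    merges = []
    while len(clusters) > 1:
        # Pick the two clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        merges.append((a, b, average_linkage(a, b, data)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


if __name__ == "__main__":
    for left, right, dist in cluster(ASPECT):
        print(f"{'+'.join(left)} | {'+'.join(right)}  distance={dist:.3f}")
```

The grouping this toy input produces follows from the invented tags
alone and says nothing about the actual findings of the project.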