Partial Parsing 2008

Call for papers

MOTIVATION

Despite clear efficiency improvements in recent years, full deep parsing is still error-prone and not sufficiently robust for many NLP applications, such as information retrieval, machine translation, or question answering. For this reason, partial parsing has become a standard means of integrating syntactic knowledge into such applications.

However, partial parsing is not a single concept but rather an area ranging from chunking to almost full parsing. Chunking is usually defined as marking tokens (words) as being at the beginning of a chunk (B), inside a chunk (I) or outside a chunk (O), according to the well-known IOB scheme. In this approach, chunking boils down to word-level tagging, and the usual part-of-speech tagger development and evaluation methodologies apply. On the other hand, full, or deep, parsing is based either on manually constructed grammars or on grammars learned (together with rule probabilities) from treebanks, with held-out treebank portions used for evaluation.
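
As an illustration of the IOB scheme described above, the following minimal sketch recovers chunk spans from a sequence of B/I/O tags. The function name iob_to_spans, the example sentence and its tags are assumptions made for this illustration only; they are not taken from any particular chunker or corpus. The lenient treatment of an I tag that does not follow a B or I tag (it opens a new chunk) is likewise an assumption.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert B/I/O tags into (start, end) chunk spans, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open chunk began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A B tag always opens a new chunk, closing any open one first.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Assumption: an I tag with no open chunk starts a chunk itself.
            if start is None:
                start = i
        else:
            # Any other tag (normally O) closes the open chunk.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical example sentence with hand-assigned tags.
tokens = ["The", "old", "parser", "failed", "on", "long", "sentences", "."]
tags = ["B", "I", "I", "O", "O", "B", "I", "O"]
chunks = [" ".join(tokens[s:e]) for s, e in iob_to_spans(tags)]
print(chunks)  # ['The old parser', 'long sentences']
```

Because every token receives exactly one tag, the chunks obtained this way are flat and non-overlapping, which is why tagger-style development and evaluation carry over directly.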

In comparison with chunking and deep parsing, partial parsing consists in finding structure that is richer than chunks but less exhaustive than full syntactico-semantic parses: partial parsing may involve constructing nested structures (unlike simple chunking) without creating the full parse of a sentence. Hard-to-tackle phenomena are either not handled (e.g., instead of computing all possible interpretations, only underspecified structures are computed) or they are dealt with in a very pragmatic way (e.g., by making use of heuristics).

While partial parsing has been approached from both deep parsing and tagging perspectives, neither is fully applicable: on the one hand, partial parsers are mostly developed for languages which have no treebank that could provide a training resource and an evaluation testbed, and on the other hand, the ad-hoc construction of partially parsed corpora for evaluation is often much more expensive than the manual annotation with simple chunks. Hence, there is a clear need for evaluation metrics specific to partial parsing.

Partial parsing also makes an important contribution to Computational Linguistics: full statistical parsing mostly concentrates on English. When these parsers are used for languages with a richer morphology and a freer word order, such as German or Czech, experience shows that performance suffers, and the statistical models must be adapted to the language in question. Partial parsing, in contrast, has been shown to work well for a variety of languages, including languages with non-trivial word order and nominal morphology such as Slavic, Baltic, Finno-Ugric and most Germanic languages.

SCOPE

The main areas of interest of the workshop include (but are not restricted to):

- linguistic richness of partial parsers for various applications: syntactic and semantic headedness, the degree of hierarchical structure, semantic information (anaphora, disambiguation);
- development methodologies for partial parsers: manual, machine learning, hybrid;
- the usability of language resources for the development of partial parsers;
- multi-lingual development of partial parsers;
- experience with the use of existing tools for building partial parsers for new languages;
- technical aspects of partial parsers:
  - robustness, scalability;
  - time and space complexity;
  - expressiveness of partial parsing formalisms (regular vs. context-free rules; unification; type hierarchies; etc.);
- applications of partial parsers: information extraction, question answering, machine translation, web text mining, acquisition of lexical information, etc.;
- evaluation methodologies for partial parsers: gold standards, application-specific evaluation, reusability of evaluation resources for different partial parsing tasks, etc.;
- ways of combining multiple partial parsers;
- comparison (classification) of partial parsers.

SUBMISSIONS

Authors are invited to submit original research papers. Papers should indicate the state of completion of the reported results. In particular, any overlap with previously published work should be clearly mentioned. Submissions will be judged on correctness, novelty, technical strength, clarity of presentation, and significance/relevance to the workshop.

Submissions should be no longer than 8 pages and they should follow the detailed guidelines at the LREC 2008 web page: http://www.lrec-conf.org/lrec2008/-Submissions-.html. They should not be anonymous.

Submissions should be uploaded using the workshop submission page.

The publication of selected papers in a special issue of a journal is planned.

IMPORTANT DATES

Extended submission deadline: 9 March 2008
Notification of acceptance: 26 March 2008
Camera-ready version due: 7 April 2008
Workshop: 1 June 2008

ORGANISERS

Sandra Kübler (Indiana University)
Jakub Piskorski (Joint Research Center)
Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences)

PROGRAMME COMMITTEE

Salah Aït-Mokhtar (Xerox Research Centre Europe, Grenoble)
Gosse Bouma (Rijksuniversiteit Groningen)
António Branco (University of Lisbon)
Erhard Hinrichs (University of Tübingen)
Hannah Kermes (University of Stuttgart)
Sandra Kübler (Indiana University)
Vladislav Kuboň (Charles University, Prague)
Petya Osenova (Bulgarian Academy of Sciences and Sofia University)
Jakub Piskorski (Joint Research Center)
Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences)
Ulrich Schäfer (DFKI GmbH, Saarbrücken)
Wojciech Skut (Google Inc., Mountain View)
Anssi Yli-Jyrä (CSC -- Scientific Computing Ltd., Espoo)

CONTACT

PaPa2008 _at_ bach.ipipan.waw.pl