AUTHOR: Benoît Sagot
AFFILIATION: INRIA Rocquencourt / Université Paris 7
TITLE: Parsing French: resources, formalisms, parsers
ABSTRACT: This talk will present three series of works that share a common goal: to develop large-coverage parsers that are both efficient and linguistically relevant, without the help of hand-crafted resources. I will focus on French, for which there is currently no resource comparable to the Penn TreeBank (in either size or richness), and for which syntactic lexicons are still an active area of research. Firstly, I will present and illustrate the general idea that statistical analysis of automatically generated data (raw corpora, tagged corpora, parsing results from fully symbolic parsers) can be the basis for the development of rich and large-coverage resources (lexicons, grammars, ...). To illustrate this point, I will briefly describe three of the techniques I developed to apply this idea: a method for automatically acquiring a morphological lexicon, a simple pattern-matching approach to induce specific syntactic properties, and an error-mining technique applied to parser output. Secondly, I will present SxLFG, an efficient and robust parser generator for LFG developed in collaboration with Pierre Boullier. I will illustrate this by presenting our parsing system for French, which relies on SxLFG, on a large-coverage LFG grammar for French that we developed, and on the pre-syntactic processing chain SxPipe. In particular, I will show that the efficiency of SxLFG allows the parsing of large corpora without any probabilistic information, thus creating annotated corpora on which probabilistic models can be bootstrapped (for parsers, taggers, ...).
I will give some insight into ongoing work on learning such bootstrapped models and using them in SxLFG, as well as on cascading modules that operate on the constituency forest (chunk-based filter, probabilistic disambiguation on the forest, f-structure computation, with possible backtracking in case of failure). Thirdly, I will briefly describe some limitations of standard two-stage formalisms such as LFG, TAG or others ("linear" syntactic backbone + unification-based decorations), and I will propose another possible approach, namely "non-linear" formalisms ("non-linear" meaning here "closed under intersection"). I will introduce Range Concatenation Grammars (RCGs), a non-linear formalism parsable in polynomial time, and explain why it is suitable for modeling natural languages. In particular, I will briefly describe the medium-coverage grammar I developed in a (still polynomial) syntactic extension of RCGs, and how standard analyses (constituency, dependency, topological boxes, predicate-argument semantics) can be obtained as partial projections of the full analysis. I will conclude with possible ways to merge the efficiency, robustness and large coverage of our SxLFG-based parser with the non-linearity and linguistic relevance of RCGs.