AUTHOR: Nathalie Friburger
AFFILIATION: Université François-Rabelais de Tours
TITLE: Finite-state cascade system CASSYS and its application to
rule-based proper name extraction
ABSTRACT:
Natural Language Processing uses statistical and/or rule-based
techniques to segment texts into syntactic groups (chunking), to extract
information, or for other purposes. Rules are a good way to describe
linguistic phenomena: the Finite State Transducer (FST) formalism is
particularly efficient for writing rules that describe patterns in
texts. An FST is a special automaton in which input tags can be
associated with outputs; those outputs can either replace the
recognized pattern or be merged with it. The purpose of this
presentation is to present our finite-state cascade system, CasSys,
whose very first version was developed in 2002 to extract named
entities from written texts. In recent years, CasSys has evolved to
become general enough to manipulate texts (even XML texts) in order to
perform chunking or to transform texts using transducers. CasSys
applies FSTs to texts in a special way, called a "cascade of
transducers", which extends the possibilities offered by a single FST.
First, we present the principles of a cascade of transducers and the
types of manipulations that can be performed on texts. Second, we
present the options offered by CasSys and the results it produces.
Finally, we present the example of named entity extraction.
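
To make the idea of a cascade concrete, here is a minimal sketch in Python. It is not CasSys itself and does not use its file formats or options: each transducer is approximated by a regular expression paired with an output template, and the rule set, the brace-based tag notation, and the name apply_cascade are all assumptions made for illustration. The sketch shows the two output modes mentioned above (replace and merge) and how a later step can rely on the marks left by an earlier one.

```python
import re

# Hypothetical cascade: each entry approximates one transducer as
# (regex pattern, output template). The tag notation is illustrative only.
CASCADE = [
    # Step 1 (replace mode): the output replaces the recognized pattern.
    (re.compile(r"\bDoctor\b"), "Dr."),
    # Step 2 (merge mode): the output is merged with the recognized
    # pattern by wrapping the whole match (\g<0>) in a tag.
    (re.compile(r"\b(?:Mr|Mrs|Dr)\. [A-Z][a-z]+"), r"{person \g<0>}"),
    # Step 3 (merge mode): this pattern depends on the tag added in step 2,
    # which is what a single pass could not express as simply.
    (re.compile(r"\{person [^}]+\} of [A-Z][a-z]+"), r"{affiliated \g<0>}"),
]


def apply_cascade(text, cascade=CASCADE):
    """Apply each transducer in order to the output of the previous one."""
    for pattern, template in cascade:
        text = pattern.sub(template, text)
    return text


if __name__ == "__main__":
    print(apply_cascade("Doctor Smith of Tours spoke."))
```

Order matters in such a cascade: applying the same three rules in reverse order would only perform the first replacement, because the later patterns would run before the text they depend on exists.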