NLP Seminar 2010–2011 @ ZIL IPI PAN - Text Segmentation Using Affinity Propagation

AUTHORS:     Anna Kazantseva, Stan Szpakowicz
AFFILIATION: University of Ottawa
TITLE:       Text Segmentation Using Affinity Propagation

ABSTRACT:

Text segmentation -- as the name aptly suggests -- is the task of
splitting a document into segments, each characterised by a relatively
stable topic. For example, given a transcript of a meeting, one may
want to split it into segments according to the points of the
agenda. A segmenter's output gives a simple picture of the structure
of a document. Text segmentation is therefore a useful intermediate
step in many higher-level language processing tasks, such as text
summarization, question answering and co-reference resolution.

This talk presents a new algorithm for linear text segmentation. It is
an adaptation of a state-of-the-art clustering algorithm, Affinity
Propagation [*]. The algorithm takes as input a (usually sparse)
matrix of pairwise similarities between sentences. It outputs segment
boundaries and also segment centres -- sentences which best capture
the content/topic of a segment.
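
To make the input concrete, here is a minimal sketch of how a sparse
matrix of pairwise sentence similarities might be built. It is not the
authors' implementation: the abstract does not name the similarity
metric, so cosine similarity over bag-of-words counts is an assumption,
as are the helper names and the `window` parameter, which keeps the
matrix sparse by scoring only sentences that lie close together.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lower-cased token counts; a deliberately crude tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sparse_similarities(sentences, window=2):
    """Return {(i, j): similarity} for sentence pairs at most `window` apart.

    Pairs farther apart are simply absent, which keeps the matrix sparse.
    The `window` cut-off is an illustrative assumption, not from the talk.
    """
    vectors = [bag_of_words(s) for s in sentences]
    sims = {}
    for i in range(len(vectors)):
        for j in range(i + 1, min(i + window + 1, len(vectors))):
            score = cosine(vectors[i], vectors[j])
            sims[(i, j)] = score
            sims[(j, i)] = score  # cosine similarity is symmetric
    return sims


# Hypothetical four-sentence "document" with a topic shift after sentence 1.
sentences = [
    "The budget was approved.",
    "The budget needs review.",
    "Lunch will be pizza.",
    "Pizza arrives at noon.",
]
similarities = sparse_similarities(sentences, window=1)
```

In this toy example the pair (0, 1) scores higher than the pair (1, 2),
which is the kind of signal a segmenter could use to place a boundary
between sentences 1 and 2.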

We tested the algorithm on several demanding benchmark data sets. Even
though it employs a very simple similarity metric, it performs on par
with or outperforms two state-of-the-art segmenters.

[*] Inmar E. Givoni and Brendan J. Frey. "A Binary Variable Model for
Affinity Propagation". Neural Computation 21(6), June 2009, 1589-1600.