Anaphora in a wider context: Tracking discourse referents

Christopher Kennedy and Branimir Boguraev

A number of linguistic and stylistic devices are employed in text-based discourse for the purposes of introducing, defining, refining, and re-introducing discourse entities. This paper looks at one of the most pervasive of these mechanisms, anaphora, and asks how well current computational approaches to anaphora scale up to building and maintaining a richer model of text structure, one which embodies the notion of a discourse referent's behaviour across the entire text. Given that syntactic parsers are not yet fully robust, we question the applicability of current anaphora resolution algorithms to open-ended text types, styles, and genres. We outline an algorithm for anaphora resolution which modifies and extends a configurationally based approach, while working from the output of a part-of-speech tagger, enriched only with annotations of grammatical function. Without compromising output quality, the algorithm compensates for the shallower level of analysis with mechanisms for identifying different text forms for each discourse referent, and for maintaining awareness of inter-sentential context. A salience measure, maintained for each discourse referent over the entire text, not only crucially drives the algorithm, but also effectively maintains a record of where and how discourse referents occur in the text. Anaphora resolution thus becomes an integral part of a deeper discourse analysis process, ultimately concerned with tracking discourse referents.
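
To make the idea of a text-wide salience record concrete, the following is a minimal sketch in Python. It is not the algorithm outlined in this paper. The weights, the decay factor, and the names `ReferentTracker`, `mention`, and `resolve` are all hypothetical, chosen only for illustration, and agreement checks and configurational constraints are left out entirely. It shows only how a single per-referent score can both select an antecedent and accumulate a record of where and in what form each referent appears.

```python
from dataclasses import dataclass, field

# Hypothetical weights for illustration only; not taken from the paper.
ROLE_WEIGHTS = {"subject": 80, "object": 50, "oblique": 40}
SENTENCE_RECENCY = 100   # bonus for being mentioned in the current sentence
DECAY = 0.5              # assumed: salience halves at each sentence boundary


@dataclass
class Referent:
    name: str
    salience: float = 0.0
    # Each mention is (sentence index, text form, grammatical function).
    mentions: list = field(default_factory=list)


class ReferentTracker:
    def __init__(self):
        self.referents = {}
        self.sentence = 0

    def next_sentence(self):
        """Advance to the next sentence and decay every referent's salience."""
        self.sentence += 1
        for referent in self.referents.values():
            referent.salience *= DECAY

    def mention(self, name, form, role):
        """Record a mention of a referent and raise its salience."""
        referent = self.referents.setdefault(name, Referent(name))
        referent.salience += SENTENCE_RECENCY + ROLE_WEIGHTS.get(role, 0)
        referent.mentions.append((self.sentence, form, role))
        return referent

    def resolve(self, pronoun, role):
        """Link a pronoun to the currently most salient referent, if any."""
        if not self.referents:
            return None
        best = max(self.referents.values(), key=lambda r: r.salience)
        self.mention(best.name, pronoun, role)
        return best.name


# Sentence 0: "The parser analyses the text."
tracker = ReferentTracker()
tracker.mention("parser", "the parser", "subject")
tracker.mention("text", "the text", "object")

# Sentence 1: "It fails on long sentences."
tracker.next_sentence()
antecedent = tracker.resolve("it", "subject")
```

After the second sentence, `tracker.referents["parser"].mentions` holds both the full noun phrase and the pronoun, so the same structure that drives resolution also serves as a record of the referent's different text forms and their positions.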