Research Database

Completed research project

Title
Resolving PP-Attachment Ambiguities in German by using co-occurrence values
Summary
Any computer system for natural language processing has to deal with ambiguity. If the system is meant to extract precise information from a text, these ambiguities must be resolved. One of the most frequent ambiguities arises from the attachment of prepositional phrases (PPs).

In this project we develop a method for resolving PP-attachment ambiguities. The idea is to derive co-occurrence values from text corpora. We measure the co-occurrence strength between German nouns (N) and prepositions (P) on the one hand, and between German verbs (V) and prepositions on the other. The competing N+P and V+P values are used to decide whether to attach a PP to the noun or to the verb. A language with variable word order, such as German, poses special problems for determining the co-occurrence strength between verb and preposition, since the verb may occur at different positions in a sentence. We tackle these problems with the help of a lemmatizer, a part-of-speech tagger, and a clause boundary detector.
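
The comparison of N+P and V+P values can be illustrated with a minimal sketch. The summary does not state how the co-occurrence value is computed, so the sketch assumes a simple relative frequency, freq(word, preposition) / freq(word). The function names, the tie-breaking rule, and all counts are hypothetical and serve only as an illustration. In practice the counts would come from a lemmatized, tagged corpus with detected clause boundaries.

```python
from collections import Counter


def cooccurrence_values(word_counts, pair_counts):
    """Return cooc(word, prep) = freq(word, prep) / freq(word).

    Assumption: relative frequency stands in for the project's
    co-occurrence measure, which the summary does not specify.
    """
    return {
        (word, prep): count / word_counts[word]
        for (word, prep), count in pair_counts.items()
        if word_counts.get(word, 0) > 0
    }


def decide_attachment(verb, noun, prep, verb_cooc, noun_cooc):
    """Attach the PP to whichever head has the higher co-occurrence value.

    Unseen pairs count as 0.0; ties go to the verb (an arbitrary choice
    made for this sketch).
    """
    v_value = verb_cooc.get((verb, prep), 0.0)
    n_value = noun_cooc.get((noun, prep), 0.0)
    return "noun" if n_value > v_value else "verb"


# Hypothetical lemma counts, invented for illustration only.
verb_counts = Counter({"warten": 100, "haben": 1000})
verb_prep_counts = Counter({("warten", "auf"): 60, ("haben", "auf"): 20})
noun_counts = Counter({"Bus": 200, "Hoffnung": 50})
noun_prep_counts = Counter({("Bus", "auf"): 2, ("Hoffnung", "auf"): 30})

verb_cooc = cooccurrence_values(verb_counts, verb_prep_counts)
noun_cooc = cooccurrence_values(noun_counts, noun_prep_counts)

# warten + auf (0.6) beats Bus + auf (0.01)
print(decide_attachment("warten", "Bus", "auf", verb_cooc, noun_cooc))      # verb
# Hoffnung + auf (0.6) beats haben + auf (0.02)
print(decide_attachment("haben", "Hoffnung", "auf", verb_cooc, noun_cooc))  # noun
```

With these invented counts, the PP is attached to the verb in the first case and to the noun in the second, because the decision depends only on which head co-occurs more strongly with the preposition.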

This method for determining the co-occurrence strength will be gradually refined by distinguishing different verb readings, idiomatic and non-idiomatic usage, and deverbal versus regular nouns. To evaluate the method, we use the German treebank (a collection of syntactically annotated sentences) developed at the University of Saarbrücken, and we will manually disambiguate 3000 corpus sentences ourselves. We will also vary the corpora in order to determine their influence on the co-occurrence values. The co-occurrence values will be integrated into a working parser for German.
Publications
Martin Volk: Exploiting the WWW as a corpus to resolve PP attachment ambiguities. In: Proc. of Corpus Linguistics 2001. Lancaster, 2001.

Martin Volk: Scaling up. Using the WWW to resolve PP attachment ambiguities. In: Proc. of Konvens-2000. Ilmenau, October 2000.

Keywords
Robust parsing, Ambiguity resolution, Natural Language Processing
Project leadership and contacts
Prof. Michael Hess (Project Leader) hess@cl.uzh.ch
Dr. Martin Volk volk@ifi.uzh.ch
Funding source(s)
SNF (individual and project funding)
Duration of project
Oct 1998 to Oct 2000