Özlem Çetinoğlu, 2001

Thesis Title

A Prolog Based Natural Language Processing Infrastructure for Turkish


Natural Language Processing (NLP) is a subfield of artificial intelligence whose ultimate aim is to enable computers to use natural languages at performance levels comparable to those of native speakers. This thesis reports the design and implementation of a software infrastructure intended to support the future testing of different linguistic approaches and the development of new NLP applications that involve understanding Turkish.

Our software platform processes natural language at the three basic levels of NLP: morphology, syntax and semantics. The morphological level deals with the construction of Turkish words from roots and suffixes; a bi-directional morphological parser both analyzes and generates words. The syntactic part of the thesis defines Turkish sentence structures in terms of phrase structure (PS) rules. The semantic representations of the syntactic constituents are defined as arguments of the Prolog predicates that implement the PS rules, so the overall semantic representation of a sentence is derived as the sentence is syntactically parsed. It is also possible to generate a sentence from its semantic representation. The semantic representation is based on first-order predicate calculus: the meaning of a sentence is represented as a simple or nested logic-like expression. These expressions are transformed into Prolog facts and rules before they are asserted into the knowledge base.
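The idea that a semantic representation falls out of syntactic parsing can be illustrated with a minimal sketch. It is written in Python rather than Prolog, and it is not the grammar or lexicon of the thesis: the four-entry lexicon, the three PS rules, the predicate names (`gor`, `uyu`) and all function names below are assumptions made purely for illustration. Each parsing function corresponds to one PS rule and returns the meaning of its constituent, so the meaning of the whole sentence is available as soon as the parse succeeds.

```python
# Toy lexicon (hypothetical entries, not taken from the thesis).
# Inflected forms are listed whole; no morphological analysis is done here.
NOMINATIVE = {"ali": "ali", "ayşe": "ayse"}
ACCUSATIVE = {"aliyi": "ali", "ayşeyi": "ayse"}
INTRANSITIVE = {"uyudu": "uyu"}  # "slept"
TRANSITIVE = {"gördü": "gor"}    # "saw"


def parse_np(tokens, pos, table):
    """NP -> proper noun. Returns (semantics, next position) or None."""
    if pos < len(tokens) and tokens[pos] in table:
        return table[tokens[pos]], pos + 1
    return None


def parse_vp(tokens, pos, subject):
    """VP -> V_intrans | NP_acc V_trans (verb-final order).

    The subject's semantics is passed in, much as an argument would be
    shared between predicates in a Prolog grammar rule.
    """
    if pos < len(tokens) and tokens[pos] in INTRANSITIVE:
        return (INTRANSITIVE[tokens[pos]], subject), pos + 1
    obj = parse_np(tokens, pos, ACCUSATIVE)
    if obj is not None:
        obj_sem, nxt = obj
        if nxt < len(tokens) and tokens[nxt] in TRANSITIVE:
            return (TRANSITIVE[tokens[nxt]], subject, obj_sem), nxt + 1
    return None


def parse_sentence(text):
    """S -> NP_nom VP. Returns a predicate tuple, or None if parsing fails."""
    tokens = text.lower().split()
    np = parse_np(tokens, 0, NOMINATIVE)
    if np is None:
        return None
    subject, pos = np
    vp = parse_vp(tokens, pos, subject)
    if vp is None or vp[1] != len(tokens):
        return None  # no VP found, or unparsed tokens remain
    return vp[0]


def to_prolog_fact(sem):
    """Render a predicate tuple as the text of a Prolog fact."""
    functor, *args = sem
    return f"{functor}({', '.join(args)})."


if __name__ == "__main__":
    sem = parse_sentence("Ali Ayşeyi gördü")
    print(sem)
    print(to_prolog_fact(sem))
```

In this sketch a parsed sentence yields a flat predicate tuple, and `to_prolog_fact` shows the kind of fact text that could then be asserted into a knowledge base. Nested expressions, rules, and generation from a semantic representation are left out.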

The main application we implemented is TOY, a man-machine communication program. The user can ask TOY questions and give it new information in Turkish sentences, and TOY answers queries using the knowledge it has already learned. The other applications are a morphological analyzer, a Turkish verb conjugation program, and a number transducer.
