Onur Güngör, 2009

Thesis Title

Morphological Annotation of a Corpus with a Collaborative Multiplayer Game


In most natural language processing tasks, state-of-the-art systems rely on machine
learning methods to build their mathematical models. Given that the majority of these
systems employ supervised learning strategies, a corpus annotated for the target problem
is essential. The current method for annotating a
corpus is to hire several experts and have them annotate it manually or, at best,
with the help of annotation software. However, this method is costly, time-consuming
and still not error-free. Our work proposes a method that aims to solve these
problems at once. By employing a collaborative multiplayer game that ordinary people
can play on the Internet, we believe it is possible to harness an otherwise untapped
labour force, so that people contribute simply by playing a fun game. Through a game site
that incorporates functionality borrowed from social networking sites, players are
motivated to contribute to the annotation process by answering questions about the
underlying morphological features of a target word. The results reported in
this thesis are compiled from the first eleven days of an experiment that is planned
to continue indefinitely. Evaluated over two phases, 63.5% of the question types
currently in use are found to be successful. The current 74 question types fully cover
58.3% of the corpus, and increasing this number to only 100 types raises the coverage
rate to 70.7%. Due to time constraints and relatively low traffic to
the site, we were not able to annotate the corpus completely. Nevertheless, we estimate
a hypothetical rate of successful morphological disambiguation of 37.0% of the whole
corpus, an annotation that we calculate could be completed in two and a half months if
the game were hosted on a major web site. This is a relatively short duration for
bootstrapping a corpus of this size compared with current methods.
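The 37.0% figure appears consistent with multiplying the share of successful question types (63.5%) by the share of the corpus fully covered by the current 74 types (58.3%). The short sketch below reproduces that arithmetic under this assumed reading; the function name and the independence assumption are illustrative and are not taken from the thesis.

```python
# Figures reported in the abstract, expressed as fractions.
success_rate = 0.635  # share of question types judged successful
coverage = 0.583      # share of the corpus fully covered by the 74 question types


def estimated_disambiguation_rate(success: float, covered: float) -> float:
    """Hypothetical model (assumption): a token counts as disambiguated only if
    it is covered by some question type AND that type is successful, with the
    two treated as independent fractions."""
    return success * covered


rate = estimated_disambiguation_rate(success_rate, coverage)
print(f"{rate:.1%}")  # prints 37.0%
```

Treating the two percentages as independent is the simplest reading that matches the reported number; weighting question types by how often they occur in the corpus would give a different estimate.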

