A Dependency-Based Analysis of Treebank Annotation Errors

Katri Haverinen, Filip Ginter, Veronika Laippala, Samuel Kohonen, Timo Viljanen, Jenna Nyblom, Tapio Salakoski, A Dependency-Based Analysis of Treebank Annotation Errors. In: Kim Gerdes, Eva Hajicova, Leo Wanner (Eds.), Proceedings of International Conference on Dependency Linguistics, 115-124, N/A, 2011.


In this paper, we investigate errors in syntax annotation with the Turku Dependency Treebank, a recently published treebank of Finnish, as study material. This treebank uses the Stanford Dependency scheme as its syntax representation, and its published data contains all data created in the full double annotation as well as timing information, both of which are necessary for this study.

First, we examine which syntactic structures are the most error-prone for human annotators, and compare these results to those of a baseline automatic parser. We find that annotation decisions involving highly semantic distinctions, as well as certain morphological ambiguities, are especially difficult for both human annotators and the parser. Second, we train an automatic system that offers for inspection sentences ordered by their likelihood of containing errors. We find that the system achieves a performance that is clearly superior to the random baseline: for instance, by inspecting 10% of all sentences ordered by our system, it is possible to weed out 25% of errors.
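
The evaluation figure quoted in the abstract (the share of errors caught when only the top-ranked fraction of sentences is inspected) can be illustrated with a minimal sketch. This is not the authors' system: the function name `errors_found_at`, the per-sentence `scores`, and the `error_counts` list are hypothetical, and the sketch assumes only that each sentence has some numeric suspicion score and a known number of annotation errors.

```python
from typing import Sequence


def errors_found_at(scores: Sequence[float],
                    error_counts: Sequence[int],
                    fraction: float) -> float:
    """Return the share of all errors found by inspecting the top
    `fraction` of sentences, ranked by descending score.

    Hypothetical helper for illustration; scores and error counts
    are assumed inputs, not data from the paper.
    """
    if len(scores) != len(error_counts):
        raise ValueError("scores and error_counts must have equal length")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    total_errors = sum(error_counts)
    if total_errors == 0:
        return 0.0

    # Rank sentence indices from most to least suspicious.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Number of sentences the annotator inspects at this cutoff.
    n_inspected = round(len(scores) * fraction)

    found = sum(error_counts[i] for i in order[:n_inspected])
    return found / total_errors
```

With a random ordering, inspecting 10% of sentences would be expected to find roughly 10% of errors (assuming errors are spread evenly), so a value well above the inspected fraction indicates that the ranking is informative.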

BibTeX entry:

  title = {A Dependency-Based Analysis of Treebank Annotation Errors},
  booktitle = {Proceedings of International Conference on Dependency Linguistics},
  author = {Haverinen, Katri and Ginter, Filip and Laippala, Veronika and Kohonen, Samuel and Viljanen, Timo and Nyblom, Jenna and Salakoski, Tapio},
  editor = {Gerdes, Kim and Hajicova, Eva and Wanner, Leo},
  publisher = {N/A},
  pages = {115-124},
  year = {2011},
  keywords = {treebank annotation},

Belongs to TUCS Research Unit(s): Turku BioNLP Group
