
A Dependency-Based Analysis of Treebank Annotation Errors

Katri Haverinen, Filip Ginter, Veronika Laippala, Samuel Kohonen, Timo Viljanen, Jenna Nyblom, Tapio Salakoski, A Dependency-Based Analysis of Treebank Annotation Errors. In: Kim Gerdes, Eva Hajičová, Leo Wanner (Eds.), Computational Dependency Theory, Frontiers in Artificial Intelligence and Applications 258, 47–61, IOS Press, 2013.

http://dx.doi.org/10.3233/978-1-61499-352-0-47

Abstract:

In this paper, we investigate errors in syntax annotation, using the Turku Dependency Treebank, a recently published treebank of Finnish, as study material. This treebank uses the Stanford Dependency scheme as its syntax representation, and its public release includes all data created during the full double annotation as well as timing information. Both are necessary for this study.

First, we examine which syntactic structures are the most error-prone for human annotators, and compare these results to those of two baseline parsers. We find that annotation decisions involving highly semantic distinctions, as well as certain morphological ambiguities, are especially difficult for both the human annotators and the parsers. Second, we train an automatic system that ranks sentences by their likelihood of containing errors, so that they can be inspected in that order. The system performs clearly better than the random baseline: for instance, inspecting the top 10% of sentences in the order proposed by the system uncovers 25% of all errors.
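The evaluation described above (the share of all errors found when a given fraction of sentences is inspected in ranked order) can be illustrated with a minimal sketch. The function names, the per-sentence scores and the per-sentence error counts are hypothetical; the paper's actual ranking model and features are not reproduced here. Inspecting a uniformly random subset of sentences finds, in expectation, the same share of errors as the share of sentences inspected, so the random baseline at 10% is 10%.

```python
from typing import Sequence


def errors_caught(scores: Sequence[float], error_counts: Sequence[int], fraction: float) -> float:
    """Hypothetical helper: share of all errors found by inspecting the top
    `fraction` of sentences, ranked by descending predicted error likelihood."""
    if len(scores) != len(error_counts):
        raise ValueError("scores and error_counts must have the same length")
    total = sum(error_counts)
    if total == 0:
        return 0.0
    # Number of sentences to inspect; round() avoids float artifacts such as 0.3 * 10 > 3
    k = round(fraction * len(scores))
    # Sentence indices sorted from most to least likely to contain an error
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = sum(error_counts[i] for i in ranked[:k])
    return found / total


def random_baseline(fraction: float) -> float:
    # Expected share of errors found when the inspected sentences are chosen uniformly at random
    return fraction
```

For example, with ten made-up sentences where the highest-scoring one holds 3 of 8 errors, inspecting 10% of them finds 37.5% of the errors, against an expected 10% for random inspection.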

BibTeX entry:

@INBOOK{cHaGiLaKoViNySa13a,
  title = {A Dependency-Based Analysis of Treebank Annotation Errors},
  booktitle = {Computational Dependency Theory},
  author = {Haverinen, Katri and Ginter, Filip and Laippala, Veronika and Kohonen, Samuel and Viljanen, Timo and Nyblom, Jenna and Salakoski, Tapio},
  volume = {258},
  series = {Frontiers in Artificial Intelligence and Applications},
  editor = {Gerdes, Kim and Hajičová, Eva and Wanner, Leo},
  publisher = {IOS Press},
  pages = {47--61},
  year = {2013},
}

Belongs to TUCS Research Unit(s): Turku BioNLP Group

Publication Forum rating of this publication: level 1
