Building the Essential Resources for Finnish: The Turku Dependency Treebank

Katri Haverinen, Jenna Nyblom, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Anna Missilä, Stina Ojala, Tapio Salakoski, Filip Ginter, Building the Essential Resources for Finnish: The Turku Dependency Treebank. Language Resources and Evaluation, 2013.



In this paper, we present the final version of a publicly available treebank of Finnish, the Turku Dependency Treebank. The treebank contains 204,399 tokens (15,126 sentences) from 10 different text sources and has been manually annotated in a Finnish-specific version of the well-known Stanford Dependency scheme. The morphological analyses of the treebank have been assigned using a novel machine learning method to disambiguate readings given by an existing tool. As the second main contribution, we present the first open source Finnish dependency parser, trained on the newly introduced treebank. The parser achieves a labeled attachment score of 81%. The treebank data as well as the parsing pipeline are available under an open license at http://bionlp.utu.fi/.



BibTeX entry:

  title = {Building the Essential Resources for Finnish: The Turku Dependency Treebank},
  author = {Haverinen, Katri and Nyblom, Jenna and Viljanen, Timo and Laippala, Veronika and Kohonen, Samuel and Missilä, Anna and Ojala, Stina and Salakoski, Tapio and Ginter, Filip},
  journal = {Language Resources and Evaluation},
  publisher = {Springer},
  year = {2013},
  keywords = {treebank, Finnish, parsing, morphology},

Belongs to TUCS Research Unit(s): Turku BioNLP Group

Publication Forum rating of this publication: level 2
