
Turku BioNLP Group

The Turku BioNLP Group is a group of researchers at the Department of Information Technology at the University of Turku and the Turku Centre for Computer Science (TUCS) graduate school. Our research focuses on various aspects of Natural Language Processing, ranging from corpus annotation to machine learning theory and applications. Our main application area is biological, biomedical, and clinical text.

Research Unit Web Page: http://bionlp.utu.fi/

Leader of the unit

Tapio Salakoski


Jorma Boberg, Filip Ginter, Tapio Pahikkala, Antti Airola, Veronika Laippala

Doctoral Students

Jari Björne, Katri Haverinen, Juho Heimonen, Timo Viljanen



BioInfer

We have created the BioInfer corpus to support the development of information extraction (IE) systems in the biomedical domain. The project has its own webpage, where you can find the corpus and the software related to it.

PPI Corpora

We have created and released software that converts five well-known protein-protein interaction corpora (AIMed, BioInfer, LLL, IEPA, and HPRD50) into a shared XML-based format. The project has its own webpage, where you can find the software as well as a pre-processed release of BioInfer.


IKITIK

The aim of IKITIK is to support the production and use of health information and communication by developing intelligent clinical information and language technology solutions. The solutions are based on end-user needs and will be tested using both statistical techniques and genuine end-user feedback. To ensure their quality, international applicability, practical relevance, and interoperability with existing electronic patient information systems, the solutions are developed in interdisciplinary and international collaboration among care providers, clinical documentation and decision-making experts, and information and communication technology developers and providers. The outcomes contribute to the clarity, understandability, and accessibility of patient narratives, which in turn benefits patient safety, care quality, and the efficiency and profitability of health care services. Improved patient narratives also emphasize customer orientation and individualized care. (Webpage)


RLScore

RLScore is a machine learning package based on regularized least-squares (RLS). It contains implementations of the RLS and RankRLS learners, which allow the optimization of performance measures for regression, ranking, and classification tasks. Efficient cross-validation algorithms are integrated into the package, together with functionality for fast parallel learning of multiple outputs. (Webpage)
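To illustrate why cross-validation can be made efficient for RLS, the following minimal NumPy sketch computes leave-one-out predictions from a single fit using the standard hat-matrix identity and checks them against the naive approach of retraining once per example. It does not use the RLScore API; the function names and the `regparam` parameter are hypothetical, and the model is assumed to be linear without a bias term.

```python
import numpy as np


def rls_fit(X, y, regparam=1.0):
    """Primal RLS (ridge regression) weights, without a bias term."""
    A = X.T @ X + regparam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def rls_loo_predictions(X, y, regparam=1.0):
    """Leave-one-out predictions obtained from one fit on all data.

    Uses the identity  y_i - f_{-i}(x_i) = (y_i - f(x_i)) / (1 - H_ii),
    where H = X (X^T X + regparam * I)^{-1} X^T is the hat matrix.
    """
    A = X.T @ X + regparam * np.eye(X.shape[1])
    H = X @ np.linalg.solve(A, X.T)
    y_hat = H @ y
    return y - (y - y_hat) / (1.0 - np.diag(H))


def naive_loo_predictions(X, y, regparam=1.0):
    """Reference implementation: retrain once per held-out example."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        w = rls_fit(X[keep], y[keep], regparam)
        preds[i] = X[i] @ w
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 4))
    y = rng.normal(size=30)
    fast = rls_loo_predictions(X, y, regparam=0.5)
    slow = naive_loo_predictions(X, y, regparam=0.5)
    print("shortcut matches retraining:", np.allclose(fast, slow))
```

The shortcut needs one linear solve instead of one per training example, which is the kind of saving that makes cross-validation cheap for least-squares learners.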

Turku Dependency Treebank

We are building a broad-coverage dependency-annotated treebank of general Finnish. The treebank is annotated in a minor revision of the Stanford dependency scheme (de Marneffe et al. [1,2]). The primary purpose of the treebank is to support Finnish NLP.

Turku Clinical Corpus

We have developed a dependency-annotated treebank of Finnish Intensive Care Nursing Narratives. The treebank is annotated in a minor revision of the Stanford dependency scheme (de Marneffe et al. [1,2]). A PropBank-style predicate argument annotation is built on top of the syntactic annotation, covering 90% of all verb occurrences in the corpus. The argument annotation is tightly bound to the syntax, requiring arguments to be governed by the verb.
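The constraint that arguments must be governed by the verb can be sketched as a simple consistency check over a dependency tree. The data layout below is hypothetical and is not the treebank's actual file format; "governed" is read here as "is a direct dependent of", which is an assumption, and the example sentence and role labels are purely illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    index: int   # 1-based position in the sentence
    form: str
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency type


@dataclass(frozen=True)
class Argument:
    predicate: int  # index of the verb token
    argument: int   # index of the argument's head token
    role: str       # PropBank-style label such as "Arg0"


def ungoverned_arguments(tokens: List[Token], arguments: List[Argument]) -> List[Argument]:
    """Return the arguments whose head token is not a direct dependent of the verb."""
    head_of = {t.index: t.head for t in tokens}
    return [a for a in arguments if head_of.get(a.argument) != a.predicate]


if __name__ == "__main__":
    # Illustrative English sentence: "The patient slept well"
    sentence = [
        Token(1, "The", 2, "det"),
        Token(2, "patient", 3, "nsubj"),
        Token(3, "slept", 0, "root"),
        Token(4, "well", 3, "advmod"),
    ]
    annotation = [
        Argument(3, 2, "Arg0"),      # "patient" depends on "slept": accepted
        Argument(3, 4, "ArgM-MNR"),  # "well" depends on "slept": accepted
        Argument(3, 1, "Arg1"),      # "The" depends on "patient": rejected
    ]
    for bad in ungoverned_arguments(sentence, annotation):
        print("not governed by the verb:", bad)
```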

Biological Event Extraction

This project concerns the extraction of biomolecular events from text. Such events are recursively nested, typed associations of arbitrarily many participants (genes or gene products) in specific roles.
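One way to picture such an event is as a small recursive data structure in which each participant is either an entity or another event. The sketch below is illustrative only: the class names, event types, role labels, and protein names are assumptions and do not reflect the group's actual representation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Entity:
    name: str  # a gene or gene product mentioned in the text


@dataclass(frozen=True)
class Event:
    event_type: str
    # Each participant is a (role, filler) pair; the filler may itself be an Event.
    participants: Tuple[Tuple[str, Union[Entity, "Event"]], ...]


def nesting_depth(event: Event) -> int:
    """Depth of event nesting; an event with only entity participants has depth 1."""
    inner = [nesting_depth(p) for _, p in event.participants if isinstance(p, Event)]
    return 1 + max(inner, default=0)


def entity_names(event: Event) -> List[str]:
    """All entity names reachable from the event, in depth-first order."""
    names: List[str] = []
    for _, filler in event.participants:
        if isinstance(filler, Entity):
            names.append(filler.name)
        else:
            names.extend(entity_names(filler))
    return names


if __name__ == "__main__":
    # Hypothetical reading of "ProteinA induces phosphorylation of ProteinB"
    inner = Event("Phosphorylation", (("Theme", Entity("ProteinB")),))
    outer = Event("Positive_regulation", (("Cause", Entity("ProteinA")), ("Theme", inner)))
    print(nesting_depth(outer), entity_names(outer))
```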


The full list of publications is available in the TUCS Publication Database.