Turku BioNLP Group
The Turku BioNLP Group is a group of researchers at the Department of Information Technology at the University of Turku and the Turku Centre for Computer Science (TUCS) graduate school. Our research focuses on various aspects of natural language processing, ranging from corpus annotation to machine learning theory and applications. Our main application area is biological, biomedical, and clinical text.
Research Unit Web Page: http://bionlp.utu.fi/
Leader of the unit
Tapio Salakoski
Researchers
Jorma Boberg, Filip Ginter, Tapio Pahikkala, Antti Airola, Veronika Laippala
Doctoral Students
Jari Björne, Katri Haverinen, Juho Heimonen, Timo Viljanen
Projects
BioInfer
We have created the BioInfer corpus to support the development of information extraction (IE) systems in the biomedical domain. The project has its own webpage, where you can find the corpus and its accompanying software.
PPI Corpora
We have created and released software that converts five well-known protein-protein interaction corpora (AIMed, BioInfer, LLL, IEPA, and HPRD50) into a shared XML-based format. This project has its own webpage, where you can find the software as well as a pre-processed release of BioInfer.
IKITIK
The aim of IKITIK is to support the production and use of health information and communication by developing clinical information and language technology solutions. The solutions are based on end-user needs and will be carefully tested using both statistical techniques and genuine end-user feedback. To assure their quality, international applicability, practical relevance, and interoperability with existing electronic patient information systems, the solutions are developed through interdisciplinary and international collaboration among care providers, clinical documentation and decision-making experts, and information and communication technology developers and providers. The outcomes contribute to the clarity, understandability, and accessibility of patient narratives. This has positive impacts on patient safety, care quality, and the efficiency and profitability of health care services. Further, improved patient narratives emphasize customer orientation and individualized care. (Webpage)
RLScore
RLScore is a machine learning package based on regularized least-squares (RLS). It contains implementations of the RLS and RankRLS learners, which allow the optimization of performance measures for regression, ranking, and classification tasks. Efficient cross-validation algorithms are integrated into the package, together with functionality for fast parallel learning of multiple outputs. (Webpage)
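As a rough illustration of why cross-validation can be made cheap for regularized least-squares, the sketch below fits a linear RLS (ridge) model in closed form and computes leave-one-out predictions from a single fit using a standard identity. It is written in plain Python and does not use the RLScore API; the function names (`solve`, `rls_fit`, `predict`, `loo_predictions`) and the toy data are assumptions made for this example, and RLScore's own algorithms may differ.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def rls_fit(X, y, lam):
    """Return weights w = (X^T X + lam*I)^-1 X^T y and the matrix G = X^T X + lam*I."""
    d = len(X[0])
    G = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [[sum(x[i] * t for x, t in zip(X, y))] for i in range(d)]
    w = [row[0] for row in solve(G, b)]
    return w, G


def predict(w, x):
    """Linear prediction for one example."""
    return sum(wi * xi for wi, xi in zip(w, x))


def loo_predictions(X, y, lam):
    """Leave-one-out predictions from one fit, without retraining.

    Uses the identity  f_loo(x_i) = y_i - (y_i - f(x_i)) / (1 - h_i),
    where h_i = x_i^T G^-1 x_i is the leverage of example i.
    """
    w, G = rls_fit(X, y, lam)
    out = []
    for x, t in zip(X, y):
        g = solve(G, [[v] for v in x])            # G^-1 x_i
        h = sum(xi * gi[0] for xi, gi in zip(x, g))
        out.append(t - (t - predict(w, x)) / (1.0 - h))
    return out


# Toy data (made up): a constant bias feature plus one input feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.1, 0.9, 2.2, 2.9]
fast_loo = loo_predictions(X, y, lam=0.5)
```

The shortcut gives the same predictions as retraining the model once per held-out example, but needs only one fit plus one small linear solve per example.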
Turku Dependency Treebank
We are building a broad-coverage dependency-annotated treebank of general Finnish. The treebank is annotated in a minor revision of the Stanford dependency scheme (de Marneffe et al. [1,2]). The primary purpose of the treebank is to support Finnish NLP.
Turku Clinical Corpus
We have developed a dependency-annotated treebank of Finnish intensive care nursing narratives. The treebank is annotated in a minor revision of the Stanford dependency scheme (de Marneffe et al. [1,2]). A PropBank-style predicate-argument annotation is built on top of the syntactic annotation, covering 90% of all verb occurrences in the corpus. The argument annotation is tightly bound to the syntax: arguments are required to be governed by the verb.
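The constraint that arguments must be governed by the verb can be sketched as a simple check over a dependency tree. The example below is an assumption-laden illustration, not the corpus tooling: it reads "governed by" as "is a direct dependent of", and the `Token` class, the `ungoverned_arguments` function, the English toy sentence, and the argument labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int   # 1-based position in the sentence
    form: str    # surface form
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency type (illustrative labels only)


def ungoverned_arguments(tokens, verb_index, arguments):
    """Return the labels of arguments that are not direct dependents of the verb.

    `arguments` maps an argument label to the index of the argument's head token.
    """
    head_of = {t.index: t.head for t in tokens}
    return [label for label, idx in arguments.items()
            if head_of[idx] != verb_index]


# Toy sentence (hypothetical): "Nurse gave the patient medication"
sentence = [
    Token(1, "Nurse", 2, "nsubj"),
    Token(2, "gave", 0, "root"),
    Token(3, "the", 4, "det"),
    Token(4, "patient", 2, "iobj"),
    Token(5, "medication", 2, "dobj"),
]

valid = ungoverned_arguments(sentence, 2, {"Arg0": 1, "Arg2": 4, "Arg1": 5})
invalid = ungoverned_arguments(sentence, 2, {"Arg0": 1, "Arg1": 3})
```

Here `valid` is empty because every argument depends directly on the verb, while `invalid` flags `Arg1`, whose token "the" is governed by "patient" rather than by "gave".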
Biological Event Extraction
This project concerns the extraction from text of biomolecular events, which are recursively nested, typed associations of arbitrarily many participants (genes / gene products) in specific roles
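To make the definition concrete, here is a minimal sketch of how recursively nested, typed events with role-labelled participants might be represented. The class names, the event types (`Phosphorylation`, `Positive_regulation`), the role names (`Theme`, `Cause`), and the placeholder protein names are illustrative assumptions and are not taken from the project's own data format.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Protein:
    """A gene or gene product mentioned in the text."""
    name: str


@dataclass(frozen=True)
class Event:
    """A typed association of any number of (role, participant) pairs.

    A participant is either a Protein or another Event, which is what
    makes the structure recursively nested.
    """
    type: str
    args: Tuple[Tuple[str, Union["Event", Protein]], ...]


def depth(node):
    """Nesting depth: 0 for a protein, 1 for an event over proteins only, and so on."""
    if isinstance(node, Protein):
        return 0
    return 1 + max((depth(arg) for _role, arg in node.args), default=0)


def participants(node):
    """Collect the names of all proteins reachable from a node, in argument order."""
    if isinstance(node, Protein):
        return [node.name]
    names = []
    for _role, arg in node.args:
        names.extend(participants(arg))
    return names


# Hypothetical example: ProteinB positively regulates the phosphorylation of ProteinA.
inner = Event("Phosphorylation", (("Theme", Protein("ProteinA")),))
outer = Event("Positive_regulation",
              (("Theme", inner), ("Cause", Protein("ProteinB"))))
```

In this sketch `outer` takes another event as its `Theme`, so an extraction system has to predict not only which events occur but also how they nest.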