
Suwisa Kaewphan

TUCS Department for the doctoral studies: University of Turku, Department of Information Technology

Admitted to TUCS GP on 1.9.2012

Graduate project title:

Text-Mining Resource with Biological Application: Challenges and Implementation

Graduate project abstract:

In biology, most discoveries and findings are communicated through scientific publications written in natural language. Researchers consult the literature at many stages of their work to extract relevant information. However, new literature is being published at such an increasing rate that it is difficult for researchers to comprehensively extract information from all of the relevant works.

The field of biomedical text mining aims to tackle this fundamental problem by automatically extracting biologically relevant information from text. The text-mining community has mainly focused on identifying biological entities, such as genes and proteins, and on detecting biological events, such as binding and regulation. Recognizing biological entities in text consists of two steps: first, identifying the entity mentions scattered throughout the text (named entity recognition, NER), and second, mapping each textual mention to an authoritative identifier (named entity normalization, NEN). On the one hand, the available NER systems for biological entities demonstrate state-of-the-art performance, which enables further applications such as extracting biological events. On the other hand, NEN systems perform considerably worse than NER systems because of the well-known ambiguity of gene and protein names: the same name can refer to several different genes, for example to the corresponding genes in different species.
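To make the two steps and the ambiguity problem concrete, here is a minimal sketch, assuming a hypothetical hand-written lexicon (`LEXICON`) with placeholder identifiers and simple dictionary lookup. Real NER and NEN systems typically rely on trained models and curated databases rather than a fixed word list, so this only illustrates the division of labour between the steps, not how any particular system works.

```python
import re

# Hypothetical toy lexicon: surface name -> candidate identifiers.
# The identifiers are placeholders, not real database entries.
LEXICON = {
    "p53": ["GENE:human:TP53", "GENE:mouse:Trp53"],
    "MDM2": ["GENE:human:MDM2"],
}


def recognize(text):
    """Step 1 (NER): find entity mentions as (start, end, surface) spans."""
    mentions = []
    for name in LEXICON:
        # Whole-word match so that e.g. "p53" is not found inside "p531".
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            mentions.append((m.start(), m.end(), m.group()))
    return sorted(mentions)


def normalize(mention):
    """Step 2 (NEN): map a mention to candidate identifiers.

    More than one candidate means the name is ambiguous and a real
    system would have to choose among them using context.
    """
    return LEXICON.get(mention[2], [])


text = "MDM2 binds p53 and promotes its degradation."
for mention in recognize(text):
    ids = normalize(mention)
    status = "ambiguous" if len(ids) > 1 else "unique"
    print(mention, ids, status)
```

Note that the first step succeeds for both mentions, while the second leaves "p53" with two candidates. This mirrors, in miniature, why normalization lags behind recognition.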

Identifying the biological events stated in text has attracted considerable interest, beginning with the extraction of protein-protein interactions. The work has since expanded to capture more types of biological events, such as phosphorylation and complex regulation. Extracting a biological event involves recognizing the biological entities and identifying the typed relationship between them. Each extracted event is represented by the participating biological entities, the type of the relationship, and the semantic roles of the entities.
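The event representation described above (entities, relationship type, semantic roles) can be sketched as a small data structure. This is an illustrative sketch only: the class names `Entity` and `Event`, the helper `describe`, and the example sentence "JAK2 induces phosphorylation of STAT3" are hypothetical, and the type and role names (`Phosphorylation`, `Positive_regulation`, `Theme`, `Cause`) are assumed here in the style of the BioNLP Shared Task. The assumption that an event argument may itself be an event is one way to express complex regulation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str                # surface mention, e.g. a gene/protein name
    etype: str = "Protein"   # entity type


@dataclass(frozen=True)
class Event:
    etype: str    # relationship type, e.g. "Binding" or "Phosphorylation"
    trigger: str  # word in the text that signals the event
    # (semantic role, argument) pairs; an argument may itself be an Event,
    # which is how a regulation of another event is represented here.
    args: tuple[tuple[str, Entity | Event], ...] = ()


def describe(event, depth=0):
    """Return an indented, human-readable outline of an event and its arguments."""
    pad = "  "
    lines = [pad * depth + f"{event.etype} (trigger: {event.trigger!r})"]
    for role, arg in event.args:
        if isinstance(arg, Event):
            lines.append(pad * (depth + 1) + f"{role}:")
            lines.extend(describe(arg, depth + 2))
        else:
            lines.append(pad * (depth + 1) + f"{role}: {arg.text} [{arg.etype}]")
    return lines


# Example sentence (hypothetical): "JAK2 induces phosphorylation of STAT3."
stat3 = Entity("STAT3")
jak2 = Entity("JAK2")
phos = Event("Phosphorylation", "phosphorylation", (("Theme", stat3),))
reg = Event("Positive_regulation", "induces", (("Theme", phos), ("Cause", jak2)))

print("\n".join(describe(reg)))
```

Making the classes frozen keeps extracted events immutable and hashable, which is convenient if events are later collected into sets or used as dictionary keys when aggregating results across many documents.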

Both directions of biomedical text mining have recently been united in a single complex system, demonstrating that text-mining resources are ready to perform the tasks required by the biological research community. Although the performance of each component of the system was measured in isolation for its given task, the performance of the integrated system has not previously been measured. The research questions therefore stem from the usability of this integrated system in biological applications. In particular, the research focuses on a text-mining resource called EVEX and evaluates its suitability for answering biological questions.

Supervisors:

Filip Ginter (University of Turku, Department of Information Technology)

Tapio Salakoski (University of Turku, Department of Information Technology)

