Knowledge-Lean Text Mining

Samuel Rönnqvist, Knowledge-Lean Text Mining. TUCS Dissertations 227. Åbo Akademi University, 2017.

Abstract:

This thesis explores the process of introducing text mining to new areas of application, which involves both defining appropriate types of analysis and often designing appropriate computational methods to support the analysis. Targeted toward a particular use, text mining resources tend to become highly specialized and require considerable efforts in development. The thesis addresses the question of what computational methods can serve practical text analysis needs, while avoiding costly and narrow development of linguistic resources.

Relying on machine learning and visualization, this knowledge-lean approach assumes minimal encoding of prior knowledge into resources, which is essential in entering uncharted text mining territory, that is, areas too new or too marginal to be well served by traditional text mining approaches. Knowledge-lean text mining is explored within the domain of systemic financial risk, where few text mining efforts have previously been pursued.

Without the support of existing linguistic resources for the task, unsupervised and data-driven methods play a key role in providing flexible means for text analysis. The central theme of representation learning is also studied in the context of fully knowledge-free, domain-independent topic modeling and linguistically resource-lean discourse structure parsing for the refinement of text mining results.

The research establishes the value of knowledge-lean text mining by exploring the use of text as a source of information for systemic risk analytics. Furthermore, the work on discourse parsing shows that competitive, and in some cases state-of-the-art, performance can be achieved without relying on explicit encoding of linguistic knowledge.

BibTeX entry:

@PHDTHESIS{phdRonnqvist_Samuel17a,
  title = {Knowledge-Lean Text Mining},
  author = {Rönnqvist, Samuel},
  number = {227},
  series = {TUCS Dissertations},
  school = {Åbo Akademi University},
  year = {2017},
  keywords = {text mining, natural language processing, machine learning, deep learning, visualization, artificial intelligence, human-computer cooperation, financial risk},
  ISBN = {978-952-12-3621-1},
  ISSN = {1239-1883},
}

Belongs to TUCS Research Unit(s): Data Mining and Knowledge Management Laboratory
