Contextual Weighting for Support Vector Machines in Literature Mining: An Application to Gene Versus Protein Name Disambiguation

Tapio Pahikkala, Filip Ginter, Jorma Boberg, Jouni Järvinen, Tapio Salakoski, Contextual Weighting for Support Vector Machines in Literature Mining: An Application to Gene Versus Protein Name Disambiguation. BMC Bioinformatics 6(157), 2005.

Abstract:

Background:

The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task.

Results:

We incorporated into the conventional SVM a weighting scheme based on distances of context words from the word to be disambiguated. This weighting scheme increased the performance of SVMs by five percentage points giving performance better than 85% as measured by the area under ROC curve and outperformed the Weighted Additive Classifier, which also incorporates the weighting, and the Naive Bayes classifier.

Conclusions:

We show that the performance of SVMs can be improved by the proposed weighting scheme. Furthermore, our results suggest that in this study the increase of the classification performance due to the weighting is greater than that obtained by selecting the underlying classifier or the kernel part of the SVM.

BibTeX entry:

@ARTICLE{jPaGiBoJaSa05a,
  title = {Contextual Weighting for Support Vector Machines in Literature Mining: An Application to Gene Versus Protein Name Disambiguation},
  author = {Pahikkala, Tapio and Ginter, Filip and Boberg, Jorma and Järvinen, Jouni and Salakoski, Tapio},
  journal = {BMC Bioinformatics},
  volume = {6},
  number = {157},
  year = {2005},
}

Belongs to TUCS Research Unit(s): Turku BioNLP Group

Publication Forum rating of this publication: level 2
