
Document Classification Using Semantic Networks with An Adaptive Similarity Measure

Filip Ginter, Sampo Pyysalo, Tapio Salakoski, Document Classification Using Semantic Networks with An Adaptive Similarity Measure. In: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, Nikolai Nikolov (Eds.), Proceedings of the International Conference on Recent Advances in Natural Language Processing RANLP 05, Borovets, Bulgaria, 204-211, Incoma, Bulgaria, 2005.

Abstract:

We consider supervised document classification where a semantic network is used to augment document features with their hypernyms. A novel document representation is introduced in which the contribution of the hypernyms to document similarity is determined by semantic network edge weights. We argue that the optimal edge weights are not a static property of the semantic network, but should rather be adapted to the given classification task. To determine the optimal weights, we introduce an efficient gradient descent method driven by the misclassifications of the k-nearest neighbor (kNN) classifier. The method iteratively adjusts the weights, increasing or decreasing the similarity of documents depending on their classes.
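The abstract does not specify the exact representation or update rule, so what follows is a minimal sketch built on our own assumptions. It is not the authors' method. The assumptions are:

- The semantic network is a tree, given as a hypothetical child-to-hypernym map `parent`. MeSH itself is not strictly a tree; this is a simplification.
- Each edge has a weight in [0, 1].
- A hypernym feature receives the term count multiplied by the product of the edge weights on the path up to it.
- Similarity is an unnormalized dot product.
- Training is a perceptron-style, leave-one-out 1-NN loop. When the nearest neighbor of a training document has the wrong class, one gradient step on the edge weights lowers the similarity to that neighbor and raises the similarity to the closest same-class document.

All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math

# Assumption (ours, not the paper's): the semantic network is a tree given as a
# child -> hypernym map `parent`; `w` maps each edge (child, hypernym) to [0, 1].

def augment(doc, parent, w):
    """Augmented vector of a bag of terms plus its Jacobian w.r.t. edge weights.

    A hypernym reached from a term over edges e1..ek receives
    count * w[e1] * ... * w[ek]; jac[f][e] is d feats[f] / d w[e].
    """
    feats = defaultdict(float)
    jac = defaultdict(lambda: defaultdict(float))
    for term, count in doc.items():
        feats[term] += count
        edges, node = [], term
        while node in parent:
            edges.append((node, parent[node]))
            node = parent[node]
            feats[node] += count * math.prod(w[e] for e in edges)
            for j, ej in enumerate(edges):
                # derivative of the path product w.r.t. the weight of edge ej
                jac[node][ej] += count * math.prod(
                    w[e] for i, e in enumerate(edges) if i != j)
    return feats, jac


def sim_and_grad(a, b, parent, w):
    """Unnormalized dot-product similarity and its gradient w.r.t. w."""
    fa, ja = augment(a, parent, w)
    fb, jb = augment(b, parent, w)
    sim = sum(v * fb.get(f, 0.0) for f, v in fa.items())
    grad = defaultdict(float)
    for f, row in ja.items():
        for e, d in row.items():
            grad[e] += d * fb.get(f, 0.0)
    for f, row in jb.items():
        for e, d in row.items():
            grad[e] += fa.get(f, 0.0) * d
    return sim, grad


def knn_predict(doc, docs, labels, parent, w, k=1):
    """Majority vote among the k most similar training documents."""
    scored = sorted(((sim_and_grad(doc, o, parent, w)[0], lab)
                     for o, lab in zip(docs, labels)),
                    key=lambda t: t[0], reverse=True)
    return Counter(lab for _, lab in scored[:k]).most_common(1)[0][0]


def train(docs, labels, parent, w, lr=0.05, epochs=5):
    """Hypothetical misclassification-driven update (leave-one-out 1-NN).

    Modifies `w` in place; returns the number of updates made in each epoch.
    Assumes at least two training documents.
    """
    history = []
    for _ in range(epochs):
        errors = 0
        for i, doc in enumerate(docs):
            scored = []
            for j, other in enumerate(docs):
                if j != i:
                    s, g = sim_and_grad(doc, other, parent, w)
                    scored.append((s, j, g))
            _, nn, g_nn = max(scored, key=lambda t: t[0])
            if labels[nn] == labels[i]:
                continue  # correctly classified: leave the weights alone
            errors += 1
            step = defaultdict(float)
            for e, v in g_nn.items():
                step[e] -= v  # lower similarity to the wrong-class neighbor
            same = [t for t in scored if labels[t[1]] == labels[i]]
            if same:
                _, _, g_same = max(same, key=lambda t: t[0])
                for e, v in g_same.items():
                    step[e] += v  # raise similarity to the closest same-class doc
            for e, v in step.items():
                w[e] = min(1.0, max(0.0, w[e] + lr * v))  # clip to [0, 1]
        history.append(errors)
    return history


if __name__ == "__main__":
    # Hypothetical toy hierarchy and documents, for illustration only.
    parent = {"insulin": "hormone", "glucagon": "hormone", "hormone": "protein",
              "actin": "cytoskeleton", "cytoskeleton": "protein"}
    w = {edge: 0.5 for edge in parent.items()}
    docs = [Counter({"insulin": 1}), Counter({"glucagon": 1}),
            Counter({"insulin": 1, "actin": 1}), Counter({"actin": 2})]
    labels = ["endo", "endo", "cyto", "cyto"]
    print(train(docs, labels, parent, w))
    print(knn_predict(Counter({"glucagon": 2}), docs, labels, parent, w, k=3))
```

Setting every weight to 0 reduces this sketch to plain bag-of-words. Setting every weight to 1 adds each hypernym at full count. Learning the weights lets the data choose a point between these two extremes separately for each edge.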

We thoroughly evaluate the method on the problem of classifying PubMed documents indexed with the MeSH biomedical ontology, using ten randomly chosen datasets and seven training set sizes. With the kNN classifier, the method outperforms, with statistical significance, both the commonly used bag-of-words representation and the more advanced hypernym density representation (Scott & Matwin 98).
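For comparison, the hypernym density baseline can be sketched as below. This is our simplified reading of the representation of Scott & Matwin, not their implementation. It makes three assumptions:

- Each term occurrence is counted for the term itself and for each of its hypernyms up to a height `h`.
- Every count is then divided by the number of words in the document.
- Sense disambiguation and feature selection are omitted.

The function name and the child-to-hypernym map `parent` are hypothetical.

```python
from collections import Counter, defaultdict


def hypernym_density(doc, parent, h):
    """Sketch of a hypernym density vector for a bag of terms.

    Each occurrence of a term counts for the term and for its hypernyms up to
    `h` levels above it (following the hypothetical `parent` map); counts are
    divided by the total number of words in the document.
    """
    n_words = sum(doc.values())
    counts = defaultdict(float)
    for term, c in doc.items():
        node, level = term, 0
        counts[node] += c
        while node in parent and level < h:
            node = parent[node]
            level += 1
            counts[node] += c
    return {f: v / n_words for f, v in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy hierarchy, for illustration only.
    parent = {"insulin": "hormone", "glucagon": "hormone", "hormone": "protein",
              "actin": "cytoskeleton", "cytoskeleton": "protein"}
    doc = Counter({"insulin": 2, "glucagon": 1, "actin": 1})
    print(hypernym_density(doc, parent, h=1))
```

In this sketch, every hypernym within `h` levels receives the full count, so the generalization depth is a single global setting. The representation described in the abstract instead makes the contribution of each edge a quantity learned for the classification task.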

BibTeX entry:

@INPROCEEDINGS{inpGiPySa05a,
  title = {Document Classification Using Semantic Networks with An Adaptive Similarity Measure},
  booktitle = {Proceedings of the International Conference on Recent Advances in Natural Language Processing RANLP 05, Borovets, Bulgaria},
  author = {Ginter, Filip and Pyysalo, Sampo and Salakoski, Tapio},
  editor = {Angelova, Galia and Bontcheva, Kalina and Mitkov, Ruslan and Nicolov, Nicolas and Nikolov, Nikolai},
  publisher = {Incoma, Bulgaria},
  pages = {204--211},
  year = {2005},
}

Belongs to TUCS Research Unit(s): Turku BioNLP Group
