Applying Permutation Tests for Assessing the Statistical Significance of Wrapper Based Feature Selection

Antti Airola, Tapio Pahikkala, Jorma Boberg, Tapio Salakoski, Applying Permutation Tests for Assessing the Statistical Significance of Wrapper Based Feature Selection. In: Sorin Draghici, Taghi M. Khoshgoftaar, Vasile Palade, Witold Pedrycz, M. Arif Wani, Xinquan (Hill) Zhu (Eds.), Proceedings of the Ninth International Conference on Machine Learning and Applications (ICMLA 2010), 989-994, IEEE Computer Society, 2010.

Abstract:

Feature selection is commonly used in bioinformatics applications, such as gene selection from DNA microarray data. Recently, wrapper methods have been proposed as an improvement over traditionally used filter based feature selection methods. In wrapper methods, the goodness of a feature set is often measured using the cross-validation performance of a machine learning method trained with the features. This can lead to overfitting, meaning that the cross-validation performance on the final selected feature set may be high even in cases when the selected features in fact are not informative. Evaluating the statistical significance of gained results is therefore of major concern.
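The overfitting effect described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple nearest-centroid classifier, leave-one-out cross-validation, and pure-noise data in which no feature carries information about the labels. The helper name `loo_accuracy_nearest_centroid` and all sizes are hypothetical choices made for the illustration.

```python
import numpy as np


def loo_accuracy_nearest_centroid(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (labels 0/1)."""
    n = X.shape[0]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # hold out sample i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / n


rng = np.random.default_rng(0)
n, d, k = 20, 200, 3  # few samples, many features, as in microarray data

# Pure noise: the labels are unrelated to every feature.
X = rng.standard_normal((n, d))
y = np.array([0, 1] * (n // 2))

# Greedy forward selection driven by cross-validation accuracy (a wrapper).
selected = []
cv_score = 0.0
for _ in range(k):
    candidates = [j for j in range(d) if j not in selected]
    scores = [
        loo_accuracy_nearest_centroid(X[:, selected + [j]], y)
        for j in candidates
    ]
    best = int(np.argmax(scores))
    selected.append(candidates[best])
    cv_score = scores[best]

# Evaluate the selected features on fresh noise data drawn the same way.
X_new = rng.standard_normal((1000, d))
y_new = rng.integers(0, 2, size=1000)
c0 = X[y == 0][:, selected].mean(axis=0)
c1 = X[y == 1][:, selected].mean(axis=0)
d0 = np.linalg.norm(X_new[:, selected] - c0, axis=1)
d1 = np.linalg.norm(X_new[:, selected] - c1, axis=1)
fresh_acc = np.mean((d1 < d0).astype(int) == y_new)

print(f"CV accuracy on selected features: {cv_score:.2f}")
print(f"Accuracy on fresh data:           {fresh_acc:.2f}")
```

Because the selection step picks whichever features happen to score best on the same cross-validation estimate that is later reported, the reported accuracy is optimistic, while accuracy on fresh data stays near chance level.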

Non-parametric permutation tests have been previously used as a univariate filter for selecting individual features. In contrast, we propose using such tests to measure the statistical significance of the whole selection process, which is carried out by a wrapper method. We achieve computational efficiency by using a regularized least-squares based wrapper method, which combines a state-of-the-art classifier with matrix calculus based computational shortcuts for greedy forward feature selection. Permutation tests prove to be a practical tool for estimating the significance of gained results, as shown in simulations and experiments on two DNA microarray data sets.
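The idea of testing the whole selection process can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses plain NumPy, a linear regularized least-squares model without a bias term, the standard closed-form leave-one-out shortcut for ridge regression, and a naive greedy search rather than the authors' matrix calculus shortcuts. The function names, the regularization value `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def loo_accuracy(X, y, lam=1.0):
    """Leave-one-out accuracy of regularized least-squares, labels in {-1, +1}."""
    n = X.shape[0]
    K = X @ X.T  # linear kernel
    H = K @ np.linalg.inv(K + lam * np.eye(n))  # hat matrix
    h = np.diag(H)
    # Closed-form LOO prediction: (fitted_i - H_ii * y_i) / (1 - H_ii)
    loo_pred = (H @ y - h * y) / (1.0 - h)
    return float(np.mean(np.sign(loo_pred) == y))


def greedy_forward_selection(X, y, k, lam=1.0):
    """Add, k times, the feature that maximizes LOO accuracy."""
    selected, best_score = [], 0.0
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [loo_accuracy(X[:, selected + [j]], y, lam) for j in candidates]
        best = int(np.argmax(scores))
        selected.append(candidates[best])
        best_score = scores[best]
    return selected, best_score


def permutation_test(X, y, k, n_permutations=50, lam=1.0, seed=0):
    """Permutation p-value for the CV score of the whole selection process."""
    rng = np.random.default_rng(seed)
    _, observed = greedy_forward_selection(X, y, k, lam)
    # Null distribution: rerun the ENTIRE selection on permuted labels.
    null_scores = np.array([
        greedy_forward_selection(X, rng.permutation(y), k, lam)[1]
        for _ in range(n_permutations)
    ])
    # The +1 terms count the observed labeling as one of the permutations.
    p_value = (1 + np.sum(null_scores >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Synthetic example: feature 0 is informative, the others are noise.
rng = np.random.default_rng(1)
n, d = 30, 15
y = np.array([-1.0, 1.0] * (n // 2))
X = rng.standard_normal((n, d))
X[:, 0] += 3.0 * y

observed, p_value = permutation_test(X, y, k=2)
print(f"observed LOO accuracy: {observed:.2f}, p-value: {p_value:.3f}")
```

The key design point is that each permutation repeats the full feature selection, so the null distribution reflects how high the cross-validation score can climb through selection alone when the labels carry no information.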

Files:

Full publication in PDF-format

BibTeX entry:

@INPROCEEDINGS{inpAiPaBoSa10a,
  title = {Applying Permutation Tests for Assessing the Statistical Significance of Wrapper Based Feature Selection},
  booktitle = {Proceedings of the Ninth International Conference on Machine Learning and Applications (ICMLA 2010)},
  author = {Airola, Antti and Pahikkala, Tapio and Boberg, Jorma and Salakoski, Tapio},
  editor = {Draghici, Sorin and Khoshgoftaar, Taghi M. and Palade, Vasile and Pedrycz, Witold and Wani, M. Arif and Zhu, Xinquan (Hill)},
  publisher = {IEEE Computer Society},
  pages = {989--994},
  year = {2010},
}

Belongs to TUCS Research Unit(s): Turku BioNLP Group
