Fast and Parallelized Greedy Forward Selection of Genetic Variants in Genome-Wide Association Studies

Sebastian Okser, Tapio Pahikkala, Antti Airola, Tero Aittokallio, Tapio Salakoski, Fast and Parallelized Greedy Forward Selection of Genetic Variants in Genome-Wide Association Studies. In: Yidong Chen, Yufei Huang, Edward Dougherty (Eds.), IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS'11), 214-217, IEEE Signal Processing Society, 2011.

Abstract:

We present the application of a regularized least-squares based algorithm, known as greedy RLS, to perform wrapper-based feature selection on an entire genome-wide association dataset. Wrapper methods were previously thought to be computationally infeasible for these types of studies. The running time of the method grows linearly in the number of training examples, the number of features in the original data set, and the number of selected features. Moreover, we show how it can be further accelerated using parallel computation on multi-core processors. We test the method on the Wellcome Trust Case Control Consortium's (WTCCC) Type 2 Diabetes - UK National Blood Service dataset consisting of 3,382 subjects and 404,569 single nucleotide polymorphisms (SNPs). Our method is capable of high-speed feature selection, selecting the top 100 predictive SNPs in under five minutes on a high-end desktop, and it outperforms typical filter approaches in terms of predictive performance.
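To make the idea of wrapper-based greedy forward selection concrete, the following is a minimal sketch, not the authors' greedy RLS implementation. It assumes a leave-one-out squared error as the selection criterion and refits a regularized least-squares (ridge) model from scratch for every candidate feature, so it is far slower than the linear-time method the abstract describes. The function names `loo_squared_error` and `greedy_forward_selection` and the parameter `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def loo_squared_error(X, y, lam):
    """Leave-one-out squared error of a ridge model fitted on the columns of X.

    Uses the standard hat-matrix identity, so no explicit refitting per
    held-out example is needed. Assumes lam > 0.
    """
    n, k = X.shape
    G = X.T @ X + lam * np.eye(k)
    H = X @ np.linalg.solve(G, X.T)          # n x n hat matrix
    residuals = y - H @ y
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.sum(loo_residuals ** 2))


def greedy_forward_selection(X, y, k, lam=1.0):
    """Naive wrapper: at each step add the feature that most reduces LOO error."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            err = loo_squared_error(X[:, selected + [j]], y, lam)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Each candidate evaluation here builds an n-by-n matrix, which is only workable on toy data; the point of the paper is that this per-candidate cost can be avoided, and that the inner loop over candidate features lends itself to parallel execution across cores.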

BibTeX entry:

@INPROCEEDINGS{inpOkPaAiSa11a,
  title = {Fast and Parallelized Greedy Forward Selection of Genetic Variants in Genome-Wide Association Studies},
  booktitle = {IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS'11)},
  author = {Okser, Sebastian and Pahikkala, Tapio and Airola, Antti and Aittokallio, Tero and Salakoski, Tapio},
  editor = {Chen, Yidong and Huang, Yufei and Dougherty, Edward},
  publisher = {IEEE Signal Processing Society},
  pages = {214--217},
  year = {2011},
  keywords = {Machine Learning, Regularized Least-Squares, Genome-Wide Association Study, GWAS, SNP, Feature Selection},
}

Belongs to TUCS Research Unit(s): Algorithmics and Computational Intelligence Group (ACI), Turku BioNLP Group