Filtering Bad Quality Tandem Mass Spectra Prior to Protein Identification
Jussi Salmi, Robert Moulder, Jan-Jonas Filén, Olli S. Nevalainen, Riitta Lahesmaa, Tuula A. Nyman, Tero Aittokallio, Filtering Bad Quality Tandem Mass Spectra Prior to Protein Identification. In: GPBM-2006, 2006.
Abstract:
The identification of proteins is an important task in proteomics. Protein or peptide identification from mass spectrometry data is usually accomplished with database search programs that match the experimental spectral sequences against theoretical sequences. As the search results frequently contain false matches, they need to be validated manually. Many spectra do not contain peptide sequences and could be discarded before the time-consuming search process. We present a filtering method that addresses these problems by classifying the spectra into two classes: (i) the spectra that are unlikely to produce valid matches and (ii) the presumably valid spectra. The filter is based on nine spectral features, which measure different characteristics of a spectrum. The discriminability of these features is investigated in conjunction with a machine learning algorithm using a training set of instances and tested on real-life data sets. The results show that when about half of the spectra were removed, the number of protein identifications dropped by 0-20% depending on the material, but the time spent on the identification process was sharply reduced. This article is a continuation of our previous article on filtering mass spectrometry data.
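The two-class filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the nine features nor the machine learning algorithm, so the three features below (`num_peaks`, `total_intensity`, `base_peak_fraction`), the linear score, and the hand-set weights are all hypothetical stand-ins. In practice the weights would be learned from a labeled training set.

```python
# Hypothetical sketch of a pre-search spectrum quality filter.
# Feature names, weights, and the linear scoring rule are assumptions,
# not the features or classifier used in the paper.

def spectrum_features(peaks):
    """Compute illustrative features from a list of (m/z, intensity) peaks."""
    intensities = [intensity for _, intensity in peaks]
    total = sum(intensities)
    top = max(intensities) if intensities else 0.0
    return {
        "num_peaks": len(peaks),
        "total_intensity": total,
        # Share of the signal carried by the single strongest peak.
        "base_peak_fraction": top / total if total > 0 else 0.0,
    }

def quality_score(features, weights, bias):
    """Linear score; higher means the spectrum is more likely worth searching."""
    return bias + sum(weights[name] * features[name] for name in weights)

def filter_spectra(spectra, weights, bias, threshold=0.0):
    """Split spectrum ids into (kept, discarded) by comparing score to threshold."""
    kept, discarded = [], []
    for spectrum_id, peaks in spectra.items():
        score = quality_score(spectrum_features(peaks), weights, bias)
        (kept if score >= threshold else discarded).append(spectrum_id)
    return kept, discarded

# Toy data: one spectrum with many peaks, one dominated by a single peak.
spectra = {
    "rich": [(100.0 + 10 * i, 50.0) for i in range(40)],
    "sparse": [(150.0, 900.0), (300.0, 20.0)],
}
# Hand-set weights for illustration only.
weights = {"num_peaks": 0.1, "total_intensity": 0.0, "base_peak_fraction": -2.0}
kept, discarded = filter_spectra(spectra, weights, bias=-1.0)
print("kept:", kept)
print("discarded:", discarded)
```

Only the spectra in `kept` would then be passed to the database search, which is where the time saving reported in the abstract would come from.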
BibTeX entry:
@INPROCEEDINGS{inpSaMoFiNeLaNyAi06a,
title = {Filtering Bad Quality Tandem Mass Spectra Prior to Protein Identification},
booktitle = {GPBM-2006},
author = {Salmi, Jussi and Moulder, Robert and Filén, Jan-Jonas and Nevalainen, Olli S. and Lahesmaa, Riitta and Nyman, Tuula A. and Aittokallio, Tero},
year = {2006},
keywords = {tandem mass spectrometry, filter, peptide identification, bioinformatics},
}
Belongs to TUCS Research Unit(s): Algorithmics and Computational Intelligence Group (ACI), Biomathematics Research Unit (BIOMATH)