Authors
Rabie Saidi, Mondher Maddouri, Engelbert Mephu Nguifo
Publication date
2010/12
Journal
BMC Bioinformatics
Volume
11
Pages
1-13
Publisher
BioMed Central
Description
Background
This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to address that task. It has been widely used to encode biological sequences into feature vectors, the format required by well-known machine-learning classifiers. However, designing a suitable feature space for a set of proteins is not a trivial task. For this purpose, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during the extraction step.
Results
To demonstrate the efficiency of this approach, we compare several encoding methods using several machine-learning classifiers. The experimental results showed that our encoding method outperforms the others in terms of classification accuracy and number of generated attributes. We …
Total citations
[Chart: citations per year, 2011 to 2025]