Volume 14, Issue 9
  • ISSN: 1570-1786
  • E-ISSN: 1875-6255

Abstract

Background: The flexibility of a protein structure is often related to the protein's function. Feature selection (FS) is critical for many machine learning applications that deal with small samples and high-dimensional data. To predict flexible regions from protein sequences, it is important to build a machine learning methodology based on an effective feature selection technique. Such a methodology may also provide new insight into the protein folding process.

Method: First, the frequencies of k-spaced amino acid pairs are taken as a representation of the local sequences. Second, these representations are processed by feature selection based on the increment of diversity (FSID) to reduce the dimensionality. Finally, logistic regression is applied to integrate the selected features into a scheme that discriminates flexible from rigid regions (referred to as FSID_FRP).

Results: Seventy-four features, comprising 26 flexible patterns and 48 rigid patterns, are selected from the set of 66 sequences. Most of the flexible patterns are associated with glycine or proline, and the rigid patterns are associated with leucine or valine. The FSID_FRP method, which applies logistic regression to the 74-feature representation, obtained 79.41% accuracy and an MCC of 0.51. These results are comparable to those of the FlexRP method, which uses 95 features.

Conclusion: The simple feature selection method FSID is shown to be very efficient in predicting the flexible and rigid regions of protein sequences. It is more appropriate for small-sample classification than the entropy-based feature selection method. The proposed FSID_FRP method achieved approximately 80% prediction accuracy and showed stronger generalization ability.
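The k-spaced amino acid pair representation mentioned in the Method can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `k_spaced_pair_frequencies`, the 20-letter alphabet constant, and the convention that k counts the residues lying between the two members of a pair (so k = 0 means adjacent residues) are assumptions made for illustration only.

```python
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_frequencies(segment, k):
    """Return the relative frequency of every ordered residue pair
    whose members are separated by exactly k residues in `segment`.

    Hypothetical helper: with k = 0 the pairs are adjacent residues,
    with k = 1 one residue lies between them, and so on.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(segment) - k - 1):
        pair = segment[i] + segment[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {p: 0.0 for p in pairs}
    return {p: c / total for p, c in counts.items()}


# Example on a made-up local segment.
freqs = k_spaced_pair_frequencies("GPGPG", 0)
print(freqs["GP"], freqs["PG"])
```

Each value of k yields a 400-dimensional vector, so combining several values of k quickly produces the kind of high-dimensional representation that a feature selection step is meant to reduce.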

/content/journals/loc/10.2174/1570178614666170221145333
2017-11-01
2025-09-11