Volume 20, Issue 1
  • ISSN: 0929-8665
  • E-ISSN: 1875-5305

Abstract

Obtaining soluble proteins in sufficient concentrations increases the overall success rate of many experimental studies. Protein solubility is an individual trait ultimately determined by the protein's primary sequence. Exploring the relationship between protein solubility and sequence composition is instrumental for prioritizing targets in large-scale proteomics projects. In this paper, the amino acid composition (20 dimensions) and the dipeptide composition (400 dimensions) were extracted to form the total candidate feature pool (420 dimensions). The features were sorted by the absolute value of the correlation coefficient and added to the feature vector one by one in that order. Using a Support Vector Machine (SVM) as the prediction engine, we evaluated and recorded the result for each of the 420 feature vectors obtained in this way. According to the SVM results, the first 208 of the 420 ranked features were chosen as the efficient ones. By analyzing the composition of these 208 features, we found that protein solubility was significantly influenced by the occurrence frequencies of the acidic amino acids, the basic amino acids, the non-polar hydrophobic amino acids and two polar neutral amino acids (C, Q) in the protein sequences. Additionally, we found that the dipeptides composed of the acidic amino acids (D, E) and the basic amino acids (K, R and H), especially the dipeptides composed of the acidic amino acids (D, E), were strongly associated with protein solubility.
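The feature extraction and ranking steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the correlation is assumed to be a Pearson coefficient between each feature and a binary solubility label (1 = soluble, 0 = insoluble), and the SVM evaluation of each incrementally grown feature vector is left out.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
FEATURE_NAMES = list(AMINO_ACIDS) + DIPEPTIDES  # 20 + 400 = 420 features


def composition_features(seq):
    """Return the 420-dimensional composition vector for one sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Amino acid composition: frequency of each of the 20 standard residues.
    # Non-standard symbols (e.g. X) are not counted but still add to the length.
    aa = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    # Dipeptide composition: frequency of each overlapping residue pair.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    total = max(len(pairs), 1)
    dp = [counts.get(d, 0) / total for d in DIPEPTIDES]
    return aa + dp


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(sequences, labels):
    """Sort the 420 features by |correlation with the label| (assumed criterion)."""
    rows = [composition_features(s) for s in sequences]
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(len(FEATURE_NAMES))
    ]
    order = sorted(range(len(FEATURE_NAMES)), key=lambda j: scores[j], reverse=True)
    return [(FEATURE_NAMES[j], scores[j]) for j in order]
```

With a ranking like this in hand, one would train and score a classifier on the top 1, top 2, ..., top 420 features in turn and keep the prefix that performs best; the abstract reports that the first 208 features were retained.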

/content/journals/ppl/10.2174/092986613804096801
2013-01-01
2025-12-09