Current Bioinformatics - Volume 12, Issue 3, 2017
-
-
Novel Elements of Bacterial Genomes - Promoter Islands: Intraspecies Polymorphism and Sequence Stability
Background: Discovered in bacterial genomes as regions with a high density of promoter-like sequences, promoter islands have attracted attention because of their unusual combination of functional features, specific structure, and genomic location. They interact with RNA polymerase to form transcriptionally competent complexes, which, however, are mostly blocked at the stage of abortive synthesis of short oligonucleotides. Objective: Arrested transcription from promoter islands has important biological significance, as most of the islands are associated with horizontally acquired genes that were subjected to specific silencing in the course of adaptive evolution. Despite strong evidence for a genomic response to the integration of alien genes, the molecular mechanisms underlying this adaptation remain unknown. Results: Here we described the features of promoter islands that make them transcriptionally inactive, compared their intraspecies polymorphism with that of other promoters and non-promoter genomic regions, and evaluated the frequency and character of spontaneous mutations in their nucleotide sequences detected in bacterial populations during long-term growth. This analysis revealed several new and unexpected features of the islands. The most extraordinary was the observation that the frequency and nature of spontaneous mutations in their sequences depended on the heterogeneity of the bacterial populations: the number of point mutations in a more diverse community formed during long common growth was lower than in a culture derived from a single cell. Conclusion: Homeostasis of the islands may depend on intercellular communication within the bacterial population.
-
-
-
A Review of Gene Selection Tools in Classifying Cancer Microarray Data
Background: DNA microarray technology makes it possible to measure the expression levels of many genes in a single experiment. However, many computational methods have difficulty selecting a small subset of genes because there are few samples relative to the huge number of genes, many of which are irrelevant or noisy. Objective: This paper presents a review of existing tools for gene selection, divided into four categories. Method: In addition, most studies focus on selecting a small subset without analysing the genes’ functional and biological characteristics. Many researchers are continuously seeking solutions to this problem. Microarray data analysis has been successfully applied to gene selection algorithms in different development environments. Results: Many different tools have been developed for gene selection in classifying microarray data. Conclusion: A suitable and user-friendly tool for biomedical researchers and other users should be developed to avoid selection biases and allow analysis of multiple solutions.
-
-
-
Co-Clustering Analysis of Protein Secondary Structures
Authors: Lichun Ma, Debby D. Wang, Xinyu Liu, Bin Zou and Hong Yan
Background: The protein secondary structure provides a crucial link between a protein sequence and its final 3D structure. Accurate prediction of protein secondary structure is therefore very important. Objective: In this study, we try to obtain a subset of highly regular features of protein secondary structures. These features can then be used to predict the secondary structures of other chains. Method: The experimental data were obtained from the Dictionary of Protein Secondary Structure (DSSP), in which eight types of secondary structure are defined. We carried out a statistical analysis of the amino acids for each type of secondary structure and then concentrated on the α-helix and β-strand, the two most common regular secondary structures. The features of amino acids, neighbors, and hydrogen bonds (α-helix) were extracted. A co-clustering based method was then applied to analyze the α-helix and β-strand chain-feature matrices, respectively. Results and Conclusion: By using the features obtained from the co-clustering process, we are able to predict the structures of other chains. The prediction performs well for β-strands and long α-helices but poorly for short α-helices. We therefore represented the features of each short α-helix by a vector and made the prediction by comparing the testing vector with the training vectors in the co-clusters. Results show that the testing accuracy for short α-helices can reach 96% when using amino acid features as a vector. Therefore, the secondary structure of a protein sequence can be predicted with high accuracy by using the co-clustering based method.
-
-
-
Fast and Practical Algorithms for Searching the Gapped Palindromes
Authors: Shivika Gupta, Rajesh Prasad and Sunita Yadav
Background: Gapped palindrome structures can have profound effects on chromosomes and are responsible for neurological diseases in humans. Gapped palindromes are palindromes that have a space (a set of characters) between the left and right palindromic arms of the string. They are divided into two classes: long armed and length constrained. Objective: In practical applications such as DNA sequence analysis, it is desirable to search for gapped palindromes efficiently. Method: This paper presents efficient O(n) algorithms for solving both types of gapped palindrome problem in biological sequences using an enhanced suffix array. Results: Experimental results show that our algorithms are space efficient, faster and easy to implement. We have also provided an open source standalone application called fapa-gp for searching different classes of gapped palindromes in genome sequences. It includes the source code of the proposed algorithms, the standalone application and other supplementary materials. Conclusion: The presented algorithms find long armed and length constrained versions of gapped palindromes in biological DNA sequences while verifying all the conditions. Our algorithms analyzed short DNA sequences easily.
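To make the definition of a length constrained gapped palindrome concrete, the following is a minimal brute-force sketch. It is not the O(n) enhanced suffix array algorithm described in the abstract, and it is not taken from fapa-gp; the function name `gapped_palindromes` and its parameters `min_arm`, `min_gap` and `max_gap` are assumptions introduced only for illustration. It treats a gapped palindrome as a string of the form u v reverse(u), where u is the arm and v is the gap.

```python
def gapped_palindromes(s, min_arm, min_gap, max_gap):
    """Brute-force search for gapped palindromes u v reverse(u).

    Illustrative only: quadratic-time, hypothetical helper, not the
    suffix-array method from the paper. Returns a list of
    (start, arm_len, gap_len) tuples, where start is the index of the
    first character of the left arm.
    """
    n = len(s)
    hits = []
    # i is the last index of the left arm, j the first index of the right arm.
    for i in range(n):
        for gap in range(min_gap, max_gap + 1):
            j = i + gap + 1
            if j >= n:
                break
            # Extend both arms outward from the gap while characters mirror.
            k = 0
            while i - k >= 0 and j + k < n and s[i - k] == s[j + k]:
                k += 1
            if k >= min_arm:
                hits.append((i - k + 1, k, gap))
    return hits


if __name__ == "__main__":
    # "ACG" + "TTT" + "GCA": arm length 3, gap length 3, starting at index 0.
    print(gapped_palindromes("ACGTTTGCA", min_arm=3, min_gap=2, max_gap=5))
```

Under this sketch, a long armed palindrome could be selected afterwards by keeping only hits whose arm length is at least the gap length; that filter is likewise an assumption about how the two classes relate, not a statement of the paper's definition.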
-
-
-
Deep Convolutional Neural Networks for Predicting Hydroxyproline in Proteins
Authors: HaiXia Long, Mi Wang and HaiYan Fu
Background: Protein hydroxyproline is one type of post-translational modification (PTM). Because a protein sequence contains many uncharacterized proline (P) residues, the question that needs to be answered is: which ones can be hydroxylated, and which ones cannot? The answer will not only give a deeper understanding of the hydroxylation mechanism but can also lead to drug development. The ever-growing demand for better handling of protein sequences in the post-genomic age presents new prediction challenges. Objective: To address these challenges, our objective is to develop computational methods that identify these sites quickly and accurately. Method: We propose a new approach for predicting hydroxyproline using the deep learning model known as the convolutional neural network (CNN). We employed pseudo amino acid composition (PseAAC) to identify these proteins and used the position-specific scoring matrix (PSSM) to represent samples as input to the CNN model. Results and Conclusion: In our experiment, K-fold cross-validation testing on benchmark datasets further demonstrated the potential of CNNs for identifying protein hydroxyproline as well as other PTM types.
-
-
-
Integrated Application of Enhanced Replacement Method and Ensemble Learning for the Prediction of BCRP/ABCG2 Substrates
Background: Breast Cancer Resistance Protein (BCRP or ABCG2) is a polyspecific efflux transporter that belongs to the ATP-binding cassette superfamily. Up-regulation of BCRP is associated with multi-drug resistance in a number of conditions, e.g. cancer and epilepsy. Recent proteomic studies show that high expression levels of BCRP are found in healthy human intestine and at the blood-brain barrier, limiting the absorption and brain distribution of its substrates. Therefore, the early recognition of BCRP substrates seems to be crucial in the early phase of drug discovery. Objective: To develop computational models that allow the early detection of BCRP substrates and non-substrates. Method: We have jointly applied the Enhanced Replacement Method and ensemble learning approaches to obtain combinations of 2D linear classifiers capable of discriminating between substrates and non-substrates of the wild type human BCRP. Results: The ensemble that combined the 10 best individual Enhanced Replacement Method models through the MAX operator displayed the best ability to discriminate between BCRP substrates and non-substrates across all the validation sets/libraries used. Conclusion: The best model ensemble obtained outperforms previously reported 2D linear classifiers, showing the ability of the Enhanced Replacement Method and ensemble learning schemes to optimize the performance of individual models. This is the first application of the Enhanced Replacement Method to solve classification problems.
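The MAX operator mentioned in the Results can be illustrated with a short sketch: each individual model assigns a score to every compound, and the ensemble score for a compound is the highest score any model gave it. The function names, the example scores and the 0.5 threshold below are assumptions made for illustration; they are not values or code from the study.

```python
def max_ensemble(scores_per_model):
    """Combine per-model scores with the MAX operator.

    scores_per_model is a list with one inner list per model; each inner
    list holds that model's score for every sample, in the same order.
    Returns one ensemble score per sample.
    """
    return [max(sample_scores) for sample_scores in zip(*scores_per_model)]


def classify(ensemble_scores, threshold=0.5):
    """Label a sample as a substrate when its ensemble score reaches the
    threshold (the threshold value here is a hypothetical choice)."""
    return [score >= threshold for score in ensemble_scores]


if __name__ == "__main__":
    # Three hypothetical models scoring four hypothetical compounds.
    scores = [
        [0.20, 0.90, 0.40, 0.10],
        [0.60, 0.10, 0.30, 0.20],
        [0.30, 0.70, 0.45, 0.05],
    ]
    combined = max_ensemble(scores)
    print(combined)
    print(classify(combined))
```

A MAX combination labels a compound as a substrate as soon as one model scores it highly, so it tends to favor sensitivity over specificity compared with averaging the scores.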
-
-
-
Using the Residue Interaction Network Improve the Classification of Thermophilic and Mesophilic Proteins
Authors: Xiaomei Gao and Yanrui Ding
Background: The residue interaction network contains a large amount of protein three-dimensional spatial information determined by sequence characteristics. It offers an effective way to study protein thermostability from a network perspective. Objective: We use residue interaction network information to improve the performance of machine learning methods trained to discriminate thermostable proteins from mesophilic proteins. Method: We compared Support Vector Machines (SVM), BayesNet, Artificial Neural Network (ANN) and Logistic Regression (LR) and selected the best machine learning method to distinguish thermostable proteins from mesophilic ones. Results: After combining the residue network topology parameters (average connection strength, average degree, characteristic path length, clustering coefficient, weighted clustering coefficient, closeness centrality, residue centrality) with sequence characteristics as feature vectors, we found that the SVM-based method gave better performance, and the average discrimination accuracy of five-fold cross validation of SVM increased to 87.5% compared with the result using sequence characteristics alone as feature vectors. 89.71% of mesophilic proteins and 85.29% of thermophilic proteins were classified correctly. Conclusion: We found that the characteristic path length and closeness centrality greatly improved the discrimination rate of thermophilic proteins. The main reason is that thermophilic proteins have more rigid structures and highly stable, strong interactions between residues, which causes them to have shorter characteristic path length and closeness centrality. Residue network characteristics offer an innovative and reliable method for identifying and analyzing the factors related to protein thermostability.
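Three of the topology parameters listed in the Results (average degree, clustering coefficient and characteristic path length) can be computed from an unweighted graph as in the sketch below. It uses the standard graph-theoretic definitions, assumes a connected network, and runs on a toy edge list; the helper names and the example graph are illustrative assumptions, and the study's own residue networks and tooling are not reproduced here.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_degree(graph):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def clustering_coefficient(graph):
    """Average local clustering coefficient; nodes with fewer than two
    neighbors contribute zero."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def characteristic_path_length(graph):
    """Mean shortest-path length over all reachable node pairs, using a
    breadth-first search from every node."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    # Toy network: a triangle (0, 1, 2) with one pendant node (3).
    g = build_graph([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(average_degree(g), clustering_coefficient(g), characteristic_path_length(g))
```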
-
-
-
The mAP-KL Algorithm Combined with Mutual Information Network Used to Screen Hub Genes in Osteosarcoma
Authors: Yan-Sheng Wu, Jian-Jun Liu and Chang-Jun Zheng
Objective: To identify potential biomarkers of osteosarcoma (OS) and further elaborate the molecular mechanisms underlying OS through the mAP-KL algorithm and a mutual information network. Methods: E-GEOD-33382 and E-GEOD-28974 were downloaded from the EMBL-EBI database and merged, yielding microarray data for 84 OS samples and 15 controls. Next, the affinity propagation clustering (APCluster) package was used to perform cluster analysis and identify a list of the most representative genes in each cluster, named exemplars. A support vector machine (SVM) with a linear kernel was employed to assess the classification performance of the mAP-KL method. Finally, hub genes were identified based on the mutual information network. Results: Based on the pre-defined gene numbers (gene counts ≤ 50), 10 clusters were identified among the top 200 genes, and 10 cluster genes were screened as exemplars. In particular, O-Fucosylpeptide 3-Beta-N-Acetylglucosaminyltransferase (MFNG, degree = 154), hepatitis A virus cellular receptor 2 (HAVCR2, degree = 138), and lymphocyte cytosolic protein 2 (SH2 domain containing leukocyte protein of 76kDa, LCP2, degree = 133) exhibited higher degrees of connectivity in the mutual information network of the top 200 genes. During the 5-CV evaluation, all samples were classified correctly. The mAP-KL method achieved the highest AUC score of 1.00, MCC score of 1.00, specificity of 1.00, and sensitivity of 1.00. Conclusion: Several pivotal genes identified in our study, such as MFNG, HAVCR2, and LCP2, might be potential biomarkers for predicting OS development and therapeutic targets for OS patients.
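The reported sensitivity, specificity and MCC follow directly from confusion-matrix counts. The sketch below uses the standard definitions and, purely as an illustration, plugs in the case where all 84 OS samples and 15 controls are classified correctly; treating the full merged data set as one confusion matrix is an assumption made here for simplicity and does not reproduce the study's fold-by-fold evaluation. The function name is hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts.

    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)
    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    MCC is reported as 0.0 when the denominator is zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    # Perfect separation of 84 positives and 15 negatives (illustrative).
    print(classification_metrics(tp=84, fp=0, tn=15, fn=0))
    # A hypothetical imperfect case for comparison.
    print(classification_metrics(tp=80, fp=3, tn=12, fn=4))
```

Because the two classes are unbalanced (84 versus 15), MCC is a more informative single summary than accuracy: it only reaches 1.00 when both classes are separated perfectly.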
-
-
-
Understanding the Evolutionary Relationship of M2 Channel Protein of Influenza A Virus and its Structural Variation and Drug Resistance
Authors: Krishnasamy Gopinath and Muthusamy Karthikeyan
Background: The M2 channel protein of influenza A virus is one of the specific targets of the anti-influenza drugs amantadine and rimantadine. These drugs have lost their efficacy because of mutations in their drug interaction sites. Large-scale analysis of these influenza surface proteins may help explain how the proteins evolve toward drug resistance. Objective: The current investigation aimed to understand the evolutionary lineage and to elucidate the mechanism of drug resistance in newly emerging strains. Method: Combined sequence, secondary structural, evolutionary conservation, and phylogenetic analyses were carried out with 2010 influenza A M2 channel protein sequences. Results: The structural information provides enough detail for understanding drug resistance in the target proteins. Secondary structural analysis of M2 sequences predicted variation only in the drug binding region. The rate of the S31N mutation is higher in swine/H3N2 than in human/H1N1, human/H3N2, swine/H1N1, and avian/H5N influenza A viruses. This confirms that antigenic drift does not affect the functional mechanism of the protein. The analysis also indicates that the avian influenza virus is the source of the M2 gene segment, which has been transferred from avian to human and swine viruses. Our findings show that the M2 gene segment has been exchanged between swine and human viruses. Conclusion: This study proves that rapid mutation and frequent reassortment play a major role in drug resistant strains. Phylogenetic and secondary structural analysis confirms the existence of a genetic lineage between avian, swine, and human influenza A viruses.
-
-
-
Enhancing Efficiency of Protein Functional Prediction Through Association Network Using Greedy Weighting Method
Authors: Atabak Kheirkhah, Salwani Mohd Daud and Kamilia Kamardin
Background: In spite of the significant amount of data surrounding complex gene networks, including gene function, the occurrence of huge redundancy affects efficiency. Objective: This work proposes a mining method to reduce the number of redundant nodes in a composite weighted network. Method: The idea is to eliminate redundant nodes via a hybrid approach, i.e. the integration of multiple functional association networks using a Greedy Algorithm. This is achieved by mining gene function from weighted gene co-expression networks based on neighbor similarity, as per the available datasets. Subsequently, Linear Regression and the Greedy Algorithm are applied simultaneously to exclude the redundant nodes. Assigning indexing rates to the remaining nodes in the dataset then further assists the process. Results and Conclusion: In comparison with other well-known algorithms, this method is 93% more efficient, as per three selected benchmarks.
