Current Bioinformatics - Volume 12, Issue 6, 2017
-
-
MetalExplorer, a Bioinformatics Tool for the Improved Prediction of Eight Types of Metal-Binding Sites Using a Random Forest Algorithm with Two-Step Feature Selection
Authors: Jiangning Song, Chen Li, Cheng Zheng, Jerico Revote, Ziding Zhang and Geoffrey I. Webb
Background: Metalloproteins are involved in many biological processes, including catalysis, recognition, transport, transcription, and signal transduction. The metal ions they bind usually play enzymatic or structural roles in mediating these diverse functions. Thus, the systematic analysis and prediction of metal-binding sites using sequence and/or structural information are crucial for understanding the sequence-structure-function relationships of metalloproteins. Objective: The objective of this work is to develop a new computational algorithm for improved prediction of the major types of metal-binding sites. Method: We propose MetalExplorer (http://metalexplorer.erc.monash.edu.au/), a new machine learning-based method for predicting eight different types of metal-binding sites (Ca, Co, Cu, Fe, Ni, Mg, Mn, and Zn) in proteins. Our approach combines heterogeneous sequence-, structure-, and residue contact network-based features in a random forest machine-learning framework. Results: The predictive performance of MetalExplorer was tested by cross-validation and independent tests using non-redundant datasets of known structures. The method applies a two-step feature selection approach, based on maximum relevance minimum redundancy (mRMR) and forward feature selection, to identify the features that contribute most to prediction performance. At a precision of 60%, MetalExplorer achieved recall values ranging from 59% to 88% for the eight metal ion types in fivefold cross-validation tests. Moreover, the common and type-specific features in the optimal feature subsets for all metal ions were characterized in terms of their contributions to overall performance. Conclusion: On both benchmark and independent datasets at the 60% precision control level, MetalExplorer compared favorably with an existing metalloprotein prediction tool, SitePredict. MetalExplorer is expected to be a powerful tool for the accurate prediction of potential metal-binding sites and should facilitate the functional analysis and rational design of novel metalloproteins.
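The two-step selection described above (an mRMR ranking followed by forward feature selection) can be sketched in plain Python. This is a minimal illustration, assuming discretised feature values and the MID form of mRMR (relevance minus mean redundancy); the `evaluate` callback is a hypothetical stand-in for whatever cross-validated random forest score the authors used, which the abstract does not specify. All function names are illustrative.

```python
from collections import Counter
from math import log2
from typing import Callable, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(features: dict[str, list], labels: list) -> list[str]:
    """Greedy mRMR ranking, MID form: relevance to the labels minus the
    mean redundancy with the features already ranked."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    ranked: list[str] = []
    remaining = set(features)

    def score(f: str) -> float:
        if not ranked:
            return relevance[f]
        redundancy = sum(
            mutual_information(features[f], features[s]) for s in ranked
        ) / len(ranked)
        return relevance[f] - redundancy

    while remaining:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked


def forward_select(ranked: list[str],
                   evaluate: Callable[[list[str]], float]) -> list[str]:
    """Walk down the mRMR ranking and keep a feature only if adding it
    improves the score returned by evaluate()."""
    selected: list[str] = []
    best_score = float("-inf")
    for f in ranked:
        candidate = selected + [f]
        s = evaluate(candidate)
        if s > best_score:
            selected, best_score = candidate, s
    return selected
```

In a real pipeline, `evaluate` might return, for example, cross-validated recall at a fixed precision for a classifier trained on the candidate subset; ranking first keeps the number of expensive evaluations linear in the number of features.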
-
-
-
Information Content Estimate of Model Proteomes: A Primary Structure Perspective
Background: The mathematical foundation of information theory in communication engineering was developed by Claude Shannon in 1948. Since then, information theory has been used to investigate various information-carrying systems, including biomolecules such as DNA and proteins. Objective: In this study, a measure for estimating the structural information content of proteomes is proposed. The primary structure feature considered is the sequence length organization of the proteins in a proteome, as opposed to the amino acid order in individual protein sequences. Method: We analyzed and compared the information content estimates of a representative set of ten proteomes for measured, model-predicted (linguistic distribution model) and simulated (random sequence length) cases. Results: Excellent agreement was observed between the measured and model-predicted information contents of the proteomes. The overall average information per proteomic protein was 8 bits for the measured/model-predicted data and 7 bits for the simulated proteomic collection data. Conclusion: The study indicates that biological interaction mechanisms may rely primarily on the number of amino acids rather than on the amino acid order of an interaction-initiating protein sequence. The approach presented here may serve as a practical tool for studying and comparing biological processes taking place in an organism or in a collection of organisms, and is expected to be useful for exploring proteomic information characteristics at other structural levels, such as secondary and tertiary structure.
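One simple way to express a proteome's length organization in bits is the Shannon entropy of its empirical length distribution, H = -sum_L p(L) log2 p(L). The sketch below assumes that this is the intended measure; the abstract does not give the paper's exact formula or its linguistic distribution model, and the protein lengths shown are made-up toy values.

```python
from collections import Counter
from math import log2


def length_entropy_bits(lengths: list[int]) -> float:
    """Shannon entropy (bits per protein) of the empirical distribution of
    sequence lengths; amino acid order is ignored entirely."""
    n = len(lengths)
    counts = Counter(lengths)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy "proteome": protein lengths in residues.
toy_lengths = [120, 120, 250, 250, 310, 480, 480, 480]
print(f"{length_entropy_bits(toy_lengths):.3f} bits per protein")
```

Applying the same function to lengths drawn at random would give a baseline comparable in spirit to the simulated case in the abstract.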
-
-
-
A Novel Method for Better Bacterial Genome Assembly from Illumina Data
Authors: Peixiang Ni, Wenkui Dai, Yongfeng Liu, Zhenyu Yang, Tao Zhou, Shuqing Liang, Tong Wang, Jing Xu and Yun Zhao
Background: With the rapid development of next-generation sequencing technology, a great many individual genomes have been generated. The genome sequence of a bacterium, as the foundation of microbiology research, is of great value. Owing to the diversity and complexity of bacteria, assembling bacterial genomes from short reads is still challenging. Objective: A new solution based on the SOAPdenovo assembler has been developed to improve the quality and accuracy of bacterial genome sequences. Method: The method consists of four main steps: preliminary genome assembly with SOAPdenovo, super-scaffold construction, gap closure and final sequence revision. Results: Seventeen high-quality genomes have been generated with this solution. In addition, 23 previously sequenced strains were chosen to evaluate the advantages of the method; for 16 of them, the new assemblies were better than the original ones in contiguity and accuracy. Conclusion: Based on the growing number of individual bacterial genomes generated with this method, we infer that this work provides a cost-effective and time-saving method for obtaining bacterial genomes.
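The abstract reports improved contiguity without naming the metric. N50 is a standard contiguity statistic for assemblies; the sketch below is only an illustration, assuming contig (or scaffold) lengths are available as a list, and is not necessarily the measure the authors used.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # half of the assembly is now covered
            return length
    raise AssertionError("unreachable: running sum always reaches total")
```

Comparing `n50` for an original and a reassembled strain gives a quick, if partial, view of whether the pipeline improved contiguity; accuracy would need separate checks against reads or a reference.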
-
-
-
Computational Modeling of Small Molecule Inhibitors of Mitochondrial Fusion
Authors: Sonam Arora, Salma Jamal, Sonam Gaba, Yasha Hasija and Vinod Scaria
Background: Mitochondria are membrane-bound organelles found in most eukaryotic cells. The most prominent functions of this essential organelle are the generation of ATP and the regulation of cell metabolism. Because mitochondria are vital to the cell and strongly influence cellular metabolism, mitochondrial dysfunction has been associated with many diseases. A range of disorders have been reported as a result of mitochondrial damage and dysfunction, including cancer, diabetes mellitus and neurodegenerative diseases, which affect millions of people worldwide. This has made mitochondrial processes an attractive and novel target for potential therapeutic intervention. Cheminformatics tools have made it possible to prioritize and understand in depth small molecules with mitochondrial phenotypes much faster and at lower cost than traditional high-throughput screening. Methods: We used a publicly available dataset of inhibitors of mitochondrial fusion to build accurate predictive cheminformatics models. We used machine learning-based classification algorithms and further enhanced this approach with a maximum common substructure (MCS) approach. Three classification algorithms were used in the present study: Naive Bayes, Random Forest and J48. Results: The Random Forest-based model was the most accurate, with an accuracy of about 80%. As a proof of application, the model was further used to prioritize a subset of drug-like molecules from a large chemical library, ZINC, and to annotate potential new mechanisms of action of molecules with anti-cancer activities. Conclusions: We show that machine learning approaches can be used effectively to build highly accurate classification models for high-throughput screening datasets.
We also show, as a proof of concept, that such models can be used to screen and prioritize large datasets in silico for further experimental validation, and to assign potential mechanisms of action to molecules.
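Of the three classifiers compared, Naive Bayes is the simplest to show from scratch. The sketch below is a minimal Bernoulli Naive Bayes, assuming each molecule is encoded as a binary substructure fingerprint (presence/absence bits) and labelled 1 for a fusion inhibitor and 0 otherwise; the descriptors actually used in the study are not given in the abstract, and `TinyBernoulliNB` is a hypothetical name, not scikit-learn's class.

```python
from math import log


class TinyBernoulliNB:
    """Bernoulli Naive Bayes over binary feature vectors, Laplace-smoothed."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, X: list[list[int]], y: list[int]) -> "TinyBernoulliNB":
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior: dict[int, float] = {}
        self.log_on: dict[int, list[float]] = {}   # log P(bit_j = 1 | class)
        self.log_off: dict[int, list[float]] = {}  # log P(bit_j = 0 | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = log(len(rows) / len(X))
            on, off = [], []
            for j in range(n_features):
                ones = sum(r[j] for r in rows)
                p = (ones + self.alpha) / (len(rows) + 2 * self.alpha)
                on.append(log(p))
                off.append(log(1 - p))
            self.log_on[c], self.log_off[c] = on, off
        return self

    def predict(self, x: list[int]) -> int:
        """Return the class with the highest posterior log-probability."""
        def score(c: int) -> float:
            return self.log_prior[c] + sum(
                lp if bit else lq
                for bit, lp, lq in zip(x, self.log_on[c], self.log_off[c])
            )
        return max(self.classes, key=score)
```

The independence assumption makes this model fast enough to score a large library such as ZINC, which is one reason it is often used as a baseline next to tree ensembles.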
-
-
-
Molecular Beacon Based Biosensing for Detection of Pathogenic Water Borne Multiple Fungal Strains: An In-Silico Approach
Authors: Sonali Mishra and Krishna Misra
Background: Waterborne pathogenic fungi have recently become a major threat and a leading cause of many hazardous infectious diseases in immunocompromised people. Objective: A universal, standardized method needs to be developed for the rapid, specific and easy detection and diagnosis of waterborne pathogenic fungi. Method: None of the methods known so far for detecting pathogenic microorganisms is as handy, economical, specific and sensitive as molecular beacons. A computational approach was employed in the present work to detect conserved and frequently repeated sequence patterns in the rRNA sequences of twenty-three waterborne pathogenic fungal species. These species were classified into three groups, and probe models containing secondary stem-loop structures were designed for each group. Finally, a common model capable of specifically detecting all species in a single test was designed. Molecular beacons have been proposed by attaching donor and acceptor dyes to the designed probes. Results and Conclusion: In the present work, a molecular beacon-based probe has been modelled that is capable of detecting 23 pathogenic, waterborne fungal strains. This approach of designing molecular beacons with specific rRNA sequences can prove to be a sensitive diagnostic technique for detecting waterborne pathogenic fungal strains in minuscule amounts.
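The design steps above (find a sequence shared by all target rRNAs, then wrap a complementary probe in a self-pairing stem) can be sketched as follows. This is a simplified illustration: exact shared k-mers stand in for the conservation analysis used in the paper, and the stem `GCGAGC` is a hypothetical placeholder, not the authors' design. Dye labelling is not modelled.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(seqs: list[str], k: int) -> set[str]:
    """k-mers present in every sequence (a crude proxy for a conserved region)."""
    shared = kmers(seqs[0].upper(), k)
    for s in seqs[1:]:
        shared &= kmers(s.upper(), k)
    return shared


# Complement table; U in an RNA target pairs with A in the DNA probe.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def molecular_beacon(target: str, stem: str = "GCGAGC") -> str:
    """Hypothetical beacon layout: 5' stem arm + loop complementary to the
    target + 3' arm complementary to the 5' arm, so the two ends can close
    into a hairpin when no target is bound."""
    return stem + reverse_complement(target) + reverse_complement(stem)
```

For example, `molecular_beacon(next(iter(conserved_kmers(seqs, 20))))` would build one candidate probe from a 20-mer shared by all sequences in `seqs`; real designs would also check melting temperatures and unwanted secondary structure.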
-
-
-
Identification of Drug-Drug Interactions Using Chemical Interactions
Authors: Lei Chen, Chen Chu, Yu-Hang Zhang, Mingyue Zheng, LiuCun Zhu, XiangYin Kong and Tao Huang
Background: One drug can affect the activity of another when they are administered together, which can cause adverse drug reactions or sometimes improve therapeutic effects. Therefore, correct identification of drug-drug interactions (DDIs) can help medical workers use drugs effectively, avoiding adverse effects and improving therapeutic outcomes. Methods: This study proposes a novel prediction model to identify DDIs. A new metric was constructed to evaluate the similarity of two drug pairs using chemical interaction information retrieved from STITCH. Validated DDIs retrieved from DrugBank were used as positive samples, and the other possible pairs of the drugs involved were constructed and treated as negative samples. The whole dataset was divided into a training dataset and a test dataset. To address the class imbalance, a dataset compilation strategy was adopted to construct nine training datasets from the original training dataset, reducing the imbalance between positive and negative samples. Nine predictors based on the nearest neighbor algorithm were built on these training datasets. The proposed model integrated the nine predictors by majority voting, and its performance was evaluated on the test dataset. Results: The prediction results indicate that the method is effective for identifying DDIs. Finally, we discuss the method's ability to identify novel DDIs by investigating the likelihood that some negative samples in the test dataset that were predicted as DDIs are in fact novel DDIs. Conclusion: The proposed method performs well in identifying potential DDIs.
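The ensemble described above (nine nearest-neighbour predictors, each trained on a rebalanced subset, combined by majority vote) can be sketched as below. Assumptions, all hypothetical: `sim` is any drug-drug similarity in [0, 1] (the paper's STITCH-based metric is not given in the abstract), two drug pairs are compared by the better of the two member alignments using a product of similarities, and negatives are split round-robin into nine chunks, each combined with all positives.

```python
from collections import Counter
from typing import Callable

Pair = tuple[str, str]
Similarity = Callable[[str, str], float]


def pair_similarity(p: Pair, q: Pair, sim: Similarity) -> float:
    """Assumed pair-to-pair similarity: match the drugs of p to those of q in
    whichever order fits better, multiplying member-wise similarities."""
    (a, b), (c, d) = p, q
    return max(sim(a, c) * sim(b, d), sim(a, d) * sim(b, c))


def nn_predict(query: Pair, train: list[tuple[Pair, int]], sim: Similarity) -> int:
    """1-nearest-neighbour: return the label of the most similar training pair."""
    _, label = max(train, key=lambda item: pair_similarity(query, item[0], sim))
    return label


def ensemble_predict(query: Pair, positives: list[Pair], negatives: list[Pair],
                     sim: Similarity, n_models: int = 9) -> int:
    """Split the (much larger) negative set into n_models chunks, build one 1-NN
    predictor per chunk together with all positives, and return the majority
    vote (1 = predicted DDI)."""
    if len(negatives) < n_models:
        raise ValueError("need at least one negative pair per predictor")
    votes = []
    for i in range(n_models):
        train = [(p, 1) for p in positives] + [(n, 0) for n in negatives[i::n_models]]
        votes.append(nn_predict(query, train, sim))
    return Counter(votes).most_common(1)[0][0]
```

Using an odd number of binary voters, as with nine predictors, guarantees that the majority vote never ties.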
-
-
-
Implication on the Function of Novel Xn-relE Toxin Structure of Xenorhabdus nematophila Using Homology Modeling
Authors: Lalit K. Gautam, Ragothaman M. Yennamalli and Jitendra S. Rathore
Background: Bacterial chromosomal toxin-antitoxin (TA) systems are involved in various cell functions such as stress response, promoting cell cycle arrest and bringing about the onset of programmed cell death. Unlike the RelBE TA module of Escherichia coli, the genome of Xenorhabdus nematophila has two separate TA modules for RelB and RelE: the RelE toxin has its own antitoxin, and the RelB antitoxin has its own separate toxin counterpart. More interestingly, these modules are located far apart in the genome. Objective: In this study, the model structure of the Xn-relE toxin from X. nematophila is explored for the first time. The toxic effect of Xn-relE has already been shown by endogenous killing in our earlier report. Methods: Since no crystallographic structure of the Xn-relE toxin is available to date, models of the X. nematophila Xn-relE toxin and its antitoxin Xn-relEAT were developed using the I-TASSER server and analyzed to define gene ontology. The models were validated using VERIFY-3D. Results: Homology models of the X. nematophila Xn-relE toxin and its antitoxin Xn-relEAT were obtained, and their interactions were established. The structural and functional annotation of this TA system designates it as a Type II TA module. Conclusion: The present study sheds light on the structure and function of the Xn-relE toxin of the Xn RelE TA module, whose applicability in agricultural sciences is pronounced.
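Verify3D results are commonly summarised by a pass/fail rule: a model passes when at least 80% of its residues have an averaged 3D-1D score of at least 0.2. A minimal check of that rule is sketched below, assuming the per-residue averaged scores have already been exported from the Verify3D output (parsing is not shown, and the abstract does not report the authors' scores). The thresholds are parameters because the exact cut-offs should be confirmed against the server version used.

```python
def verify3d_pass(scores: list[float], threshold: float = 0.2,
                  min_fraction: float = 0.8) -> bool:
    """True if at least min_fraction of residues have an averaged
    3D-1D score >= threshold."""
    if not scores:
        raise ValueError("no residue scores")
    good = sum(1 for s in scores if s >= threshold)
    return good / len(scores) >= min_fraction
```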
-
-
-
Improved Algorithm for the Detection of Cancerous Cells Using Discrete Wavelet Transformation of Genomic Sequences
Authors: Inbamalar T. Mariapushpam and Sivakumar Rajagopal
Background: Cancer is a leading cause of mortality worldwide. Cancer arises from anomalous mutations in a cell. Precise cancer diagnosis and a specific course of treatment are essential for saving human lives. Objective: The main aim is to use digital signal processing techniques to detect cancer cells. Method: A method to classify normal and cancerous cells using the discrete wavelet transform has been developed. Deoxyribonucleic acid (DNA) sequences are first converted into numeric sequences using electron-ion interaction potential (EIIP) values, and the wavelet transform of each numeric sequence is then computed. The cross-correlation values between the wavelet coefficients of normal and cancerous cells are calculated, and the maximum cross-correlation amplitude in the transformed domain is used to detect abnormalities in the nucleotides of the cells. Results: The test was conducted on 82 cancerous and 82 normal DNA sequences. Standard performance metrics were evaluated, giving a sensitivity of 98.78%, specificity of 100%, accuracy of 99.39%, positive precision of 98.78% and negative precision of 100%. Conclusion: Compared with methods in the literature, the wavelet transformation method performs better on these metrics. Hence, this approach can be considered an efficient solution for cancer detection, and it can aid early cancer detection and cancer therapeutics.
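The processing chain described above (EIIP mapping, wavelet transform, peak cross-correlation) can be sketched as below. The EIIP values are the ones standardly used for nucleotides; the single-level Haar wavelet and the unnormalised cross-correlation are assumptions for illustration, since the abstract names neither the wavelet family and decomposition level nor the decision threshold applied to the correlation peak.

```python
from math import sqrt

# Electron-ion interaction potential (EIIP) values commonly used for DNA bases.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_eiip(seq: str) -> list[float]:
    """Map a DNA string to its EIIP numeric signal."""
    return [EIIP[b] for b in seq.upper()]


def haar_dwt(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the Haar DWT: (approximation, detail) coefficients.
    A trailing odd sample is dropped for simplicity."""
    n = len(signal) - len(signal) % 2
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in range(0, n, 2)]
    return approx, detail


def max_cross_correlation(x: list[float], y: list[float]) -> float:
    """Largest absolute value of the full cross-correlation
    sum_i x[i] * y[i + lag], taken over every lag with overlap."""
    best = 0.0
    for lag in range(-(len(x) - 1), len(y)):
        total = sum(x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y))
        best = max(best, abs(total))
    return best
```

A test sequence would then be scored with `max_cross_correlation(haar_dwt(to_eiip(test))[0], haar_dwt(to_eiip(reference))[0])` against a normal reference, with the classification threshold chosen on labelled data.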
-
-
-
DeepInteract: Deep Neural Network Based Protein-Protein Interaction Prediction Tool
Authors: Sunil Patel, Rashmi Tripathi, Vandana Kumari and Pritish Varadwaj
Background: Proteins form specific molecular complexes, and the specificity of their interactions is essential for discovering and analyzing cellular mechanisms. Aim: Large-scale high-throughput experiments and in silico approaches have produced accurate data that have accelerated the discovery of novel protein-protein interactions (PPIs). Method: In this work we present ‘DeepInteract’, an integrative domain-based method for predicting PPIs using a deep neural network (DNN). The set of interacting protein pairs was extracted from the Database of Interacting Proteins (DIP) and the Kansas University Proteomics Service (KUPS). Results: When validated on an independent dataset of 34,100 PPIs from Saccharomyces cerevisiae, the proposed classifier achieved an accuracy, precision, sensitivity and specificity of 92.67%, 98.31%, 86.85% and 98.51%, respectively. Similar classifiers were implemented on protein complexes for Escherichia coli, Drosophila melanogaster, Homo sapiens and Caenorhabditis elegans, achieving prediction accuracies of 97.01%, 90.85%, 94.47% and 88.91%, respectively. Conclusion: The proposed method was found to perform better than existing domain-based machine learning PPI prediction approaches. Recommendation: The DeepInteract server interface, along with the train/test datasets, source code and supplementary files, is freely available at http://bioserver.iiita.ac.in/deepinteract.
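The abstract does not describe how a protein pair is presented to the network. One common domain-based encoding, shown here purely as an assumption, is a symmetric vector over a domain vocabulary (for example, Pfam-style identifiers; the ones below are hypothetical). The four reported metrics are then computed from a confusion matrix; the network itself is omitted.

```python
def pair_features(domains_a: set[str], domains_b: set[str],
                  vocabulary: list[str]) -> list[int]:
    """Order-independent vector for a protein pair: each position counts how
    many of the two proteins carry that domain (0, 1 or 2)."""
    return [(d in domains_a) + (d in domains_b) for d in vocabulary]


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, sensitivity (recall) and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Summing presence indicators makes `pair_features(a, b, v) == pair_features(b, a, v)`, so the classifier cannot learn a spurious dependence on which protein is listed first.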
-
-
-
Identification of Robust Clustering Methods in Gene Expression Data Analysis
Authors: Md. B. Hossen and Md. Siraj-Ud-Doulah
Background: Cluster analysis of gene expression microarray data is of increasing interest in current bioinformatics. One reason is the need for molecular-based refinement of broadly defined biological classes, with implications for cancer diagnosis, prognosis and treatment. Many algorithms have been developed for this problem. Objective: However, microarray data frequently include outliers, and the effect of these outliers on subsequent clustering analysis needs to be addressed. Method: In this paper, we present a large-scale analysis of seven agglomerative hierarchical clustering methods and five proximity measures on 33 cancer gene expression datasets. As a case study, we used two types of experimental datasets, Affymetrix and cDNA, to which different percentages of outliers were artificially added. Results: We found that Ward's method gives the highest corrected Rand index when combined with the Spearman proximity measure, for datasets both with and without outliers. Conclusion: This study indicates that Ward's method is more robust than the other clustering methods examined for gene expression data analysis.
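The best-performing combination reported above, Ward linkage with a Spearman-based proximity evaluated by the corrected Rand index, can be sketched in plain Python. Assumptions: the proximity is 1 minus Spearman's rho between expression profiles, Ward merging uses the Lance-Williams update applied directly to these dissimilarities (as some software does for non-Euclidean input), and the corrected Rand index is the Hubert-Arabie adjusted form. The study's exact implementation is not described in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import comb


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_distance(x: list[float], y: list[float]) -> float:
    """1 - Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return 1 - cov / (var_x * var_y) ** 0.5


def ward_clusters(profiles: list[list[float]], k: int) -> list[int]:
    """Agglomerate down to k clusters, merging the closest pair and updating
    distances with the Lance-Williams formula for Ward linkage."""
    clusters = {i: [i] for i in range(len(profiles))}
    dist = {}
    for i, j in combinations(range(len(profiles)), 2):
        dist[i, j] = spearman_distance(profiles[i], profiles[j])

    def d(a: int, b: int) -> float:
        return dist[min(a, b), max(a, b)]

    while len(clusters) > k:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: d(*p))
        na, nb = len(clusters[a]), len(clusters[b])
        for c in clusters:
            if c not in (a, b):
                nc = len(clusters[c])
                dist[min(a, c), max(a, c)] = (
                    (na + nc) * d(a, c) + (nb + nc) * d(b, c) - nc * d(a, b)
                ) / (na + nb + nc)
        clusters[a] += clusters.pop(b)  # merged cluster keeps id a

    labels = [0] * len(profiles)
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels


def corrected_rand(labels_a: list[int], labels_b: list[int]) -> float:
    """Hubert-Arabie corrected Rand index between two partitions
    (undefined when both partitions are trivial)."""
    n = len(labels_a)
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)
```

To mimic the robustness experiment, one could add artificial outliers to the profiles, recluster, and compare `corrected_rand` against the known class labels before and after.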
