Current Bioinformatics - Volume 16, Issue 3, 2021
-
Introduction of Advanced Methods for Structure-based Drug Discovery
Authors: Bilal Shaker, Kha M. Tran, Chanjin Jung and Dokyun Na
Structure-based drug discovery has become a promising and efficient approach for identifying novel and potent drug candidates with less time and cost than conventional drug discovery approaches. It has been widely used in the pharmaceutical industry because it uses the 3D structure of biological protein targets and thereby allows us to understand the molecular basis of diseases. Virtual identification of drug candidates based on structure requires several protein and compound preparation steps to obtain accurate results. In this review, the software and web tools for preparation and structure-based simulation are introduced. In addition, recent improvements in structure-based virtual screening, target library design for virtual screening, docking, scoring, and post-processing of top hits are also introduced.
-
Characterization and Prediction of Presynaptic and Postsynaptic Neurotoxins Based on Reduced Amino Acids and Biological Properties
Authors: Yiyin Cao, Chunlu Yu, Shenghui Huang, Shiyuan Wang, Yongchun Zuo and Lei Yang
Background: Presynaptic and postsynaptic neurotoxins are two important categories of neurotoxins. Because of their important role in pharmacology and neuroscience, identifying them has become biologically important. Methods: In this study, statistical tests and F-scores were used to quantify differences between the two classes in amino acids and biological properties. A support vector machine was used to predict presynaptic and postsynaptic neurotoxins using reduced amino acid alphabet types. Results: Using the reduced amino acid alphabet as the input to the support vector machine, the overall accuracy of our classifier increased to 91.07%, the highest overall accuracy observed in this study. Our classifier obtained better predictive results than other published methods. Conclusion: In summary, we analyzed the differences between the two neurotoxin classes with respect to amino acids and biological properties, and constructed a classifier that predicts them using the reduced amino acid alphabet.
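As an illustration of what a reduced amino acid alphabet does, the sketch below maps the 20 standard residues onto a handful of group symbols and computes the composition of the reduced sequence. The six-group scheme and the names `GROUPS`, `reduce_sequence` and `composition` are assumptions made for this example; the abstract does not state which reduction the authors used.

```python
from collections import Counter

# Hypothetical 6-group reduction by rough physicochemical class.
# This grouping is an assumption for illustration, not the paper's alphabet.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / sulfur-containing
    "R": "FWYH",    # aromatic (histidine placed here for simplicity)
    "P": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine / proline
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper())


def composition(seq):
    """Return the fraction of each group symbol, in GROUPS order."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced)
    return [counts[g] / len(reduced) for g in GROUPS]


if __name__ == "__main__":
    print(reduce_sequence("MKDG"))
    print(composition("AAKD"))
```

A vector like the one returned by `composition` could then serve as the input features of a classifier such as a support vector machine.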
-
Fusing Multiple Biological Networks to Effectively Predict miRNA-disease Associations
Authors: Qingqi Zhu, Yongxian Fan and Xiaoyong Pan
Background: MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs of about 22 nucleotides, and they play a significant role in a variety of complex biological processes. Many studies have shown that miRNAs are closely related to human diseases. Although biological experiments are reliable in identifying miRNA-disease associations, they are time-consuming and costly. Objective: Computational methods are therefore urgently needed to effectively predict miRNA-disease associations. Methods: In this paper, we propose a novel method, BIRWMDA, based on a bi-random walk model to predict miRNA-disease associations. Specifically, in BIRWMDA, the similarity network fusion algorithm is used to combine multiple similarity matrices into a miRNA-miRNA similarity matrix and a disease-disease similarity matrix; the miRNA-disease associations are then predicted by the bi-random walk model. Results: To evaluate the performance of BIRWMDA, we ran leave-one-out cross-validation and 5-fold cross-validation, and the corresponding AUCs were 0.9303 and 0.9223 ± 0.00067, respectively. To further demonstrate the effectiveness of BIRWMDA from the perspective of exploring disease-related miRNAs, we conducted three case studies of breast neoplasms, prostate neoplasms and gastric neoplasms, where 48, 50 and 50 of the top 50 predicted miRNAs were confirmed by the literature, respectively. From the perspective of exploring miRNA-related diseases, we conducted two case studies of hsa-mir-21 and hsa-mir-155, where 7 and 5 of the top 10 predicted diseases were confirmed by the literature, respectively. Conclusion: The fusion of multiple biological networks can effectively predict miRNA-disease associations. We expect BIRWMDA to serve as a biological tool for mining potential miRNA-disease associations.
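The bi-random walk step can be sketched as follows. This is a simplified variant written for illustration, assuming row-normalized similarity matrices, the same number of steps on both networks, and a restart weight `alpha`; none of these choices, nor the function names, are taken from the BIRWMDA paper, and the similarity network fusion step is not shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(m):
    """Scale each row to sum to 1 (rows summing to 0 are left as zeros)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def bi_random_walk(mirna_sim, disease_sim, assoc, alpha=0.8, steps=3):
    """Score miRNA-disease pairs by walking on both similarity networks.

    assoc is a 0/1 matrix (rows: miRNAs, columns: diseases) with at least
    one known association. Returns a matrix of the same shape.
    """
    m_norm = row_normalize(mirna_sim)
    d_norm = row_normalize(disease_sim)
    total = sum(map(sum, assoc))
    start = [[v / total for v in row] for row in assoc]
    scores = start
    for _ in range(steps):
        left = matmul(m_norm, scores)    # one step on the miRNA network
        right = matmul(scores, d_norm)   # one step on the disease network
        scores = [
            [
                0.5 * (alpha * l + (1 - alpha) * s)
                + 0.5 * (alpha * r + (1 - alpha) * s)
                for l, r, s in zip(l_row, r_row, s_row)
            ]
            for l_row, r_row, s_row in zip(left, right, start)
        ]
    return scores


if __name__ == "__main__":
    # Two fully similar miRNAs, one disease, one known association.
    print(bi_random_walk([[1, 1], [1, 1]], [[1]], [[1], [0]], steps=1))
```

In the toy run, the second miRNA has no known association but receives a non-zero score because it is similar to the first one, which is the behavior such a model relies on to propose candidates.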
-
Exploring miRNA Sponge Networks of Breast Cancer by Combining miRNA-disease-lncRNA and miRNA-target Networks
Authors: Lei Tian and Shu-Lin Wang
Background: Recently, ample research has shown that microRNAs (miRNAs) interact not only with coding genes but also with a pool of different RNAs. Those RNAs are called miRNA sponges and include long non-coding RNAs (lncRNAs), circular RNAs, pseudogenes and various messenger RNAs. Understanding the regulatory networks of miRNA sponges can help researchers study the mechanisms of breast cancer. Objective: We develop a new method to explore miRNA sponge networks of breast cancer by combining miRNA-disease-lncRNA and miRNA-target networks (MSNMDL). Methods: Firstly, MSNMDL infers miRNA-lncRNA functional similarity networks from miRNA-disease-lncRNA networks. Secondly, MSNMDL forms lncRNA-target networks by using each lncRNA to replace its matched miRNA in the miRNA-target networks, according to the lncRNA-miRNA pairs of the miRNA-lncRNA functional similarity networks. MSNMDL retains only the breast cancer genes in the lncRNA-target networks to construct candidate miRNA sponge networks. Thirdly, MSNMDL merges these candidate miRNA sponge networks with other miRNA sponge interactions and then selects the top-hub lncRNAs and their interactions to construct miRNA sponge networks. Result: MSNMDL is superior to other methods in terms of biological significance, and its identified modules might act as module signatures for the prognostication of breast cancer. Conclusion: The miRNA sponge networks identified by MSNMDL are biologically significant and closely associated with breast cancer, which makes MSNMDL a promising way for researchers to study the pathogenesis of breast cancer.
-
Extracting Gradual Rules to Reveal Regulation Between Genes
Authors: Manel Gouider, Ines Hamdi and Henda B. Ghezala
Background: Gene regulation is a very complex mechanism that the cell initiates to increase or decrease gene expression. This regulation of genes forms a Gene Regulatory Network (GRN) composed of a collection of interacting genes and gene products. The high-throughput technologies that generate a huge volume of gene expression data are useful for analyzing the GRN. Biologists are interested in the relevant genetic knowledge hidden in these data sources. However, the knowledge extracted by the different data mining approaches in the literature is insufficient for inferring the GRN topology or does not give a good representation of the real genetic regulation in the cell. Objective: In this work, we extracted genetic interactions from high-throughput technologies such as microarrays or DNA chips. Methods: In this paper, in order to extract expressive and explicit knowledge about the interactions between genes, we used gradual pattern and rule extraction applied to numerical data, which extracts the frequent co-variations between gene expression values. Furthermore, we chose to integrate experimental biological data and biological knowledge into the process of extracting genetic interactions. Results: The validation results on real gene expression data from the model plant Arabidopsis and from human lung cancer showed the performance of this approach. Conclusion: The extracted gradual rules express the genetic interactions that make up a GRN. These rules help to understand complex systems and cellular functions.
-
Gene Set Correlation Analysis and Visualization Using Gene Expression Data
Authors: Chen-An Tsai and James J. Chen
Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identify differentially expressed gene sets with prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms have focused on the identification of differentially expressed gene sets in a given phenotype. Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of genes enriched for differential expression and highly co-related pathways. Methods: We apply co-inertia analysis to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. Co-inertia analysis (CIA) is a multivariate method for identifying trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships between gene sets in all simulation settings when compared to correlation-based gene set methods. Result and Conclusion: We also combine between-gene-set CIA and GSEA to discover the relationships between gene sets significantly associated with phenotypes. In addition, we provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets and their interaction and network. We then demonstrate the integration of within- and between-gene-set variation using CIA and GSEA, applied to the p53 gene expression data using the c2 curated gene sets. Ultimately, the GSCoA approach provides an attractive tool for the identification and visualization of novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.
-
An Ensembled SVM Based Approach for Predicting Adverse Drug Reactions
Authors: Pratik Joshi, Masilamani Vedhanayagam and Raj Ramesh
Background: Preventing adverse drug reactions (ADRs) is imperative for public safety. Under-reporting of ADRs is prevalent across the world, which makes it difficult to develop unbiased prediction models. As a result, most models are skewed toward the negative samples, leading to high accuracy but poor performance on other metrics such as precision, recall, F1 score, and AUROC. Objective: In this work, we propose a novel way of predicting ADRs by balancing the dataset. Methods: The whole dataset is partitioned into smaller balanced datasets. An SVM with an optimal kernel is trained on each balanced dataset, and the prediction of a given ADR for a given drug is obtained by voting among the ensemble of trained SVMs. Results: The results are encouraging and comparable with competing methods in the literature, with an average sensitivity of 0.97 across all ADRs. The model is interpreted and explained using SHAP values and various plots. Conclusion: A novel way of predicting ADRs by balancing the dataset is proposed, thereby reducing the effect of unbalanced datasets.
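The partition-and-vote idea can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVMs described in the abstract; that substitution, the assumption that negatives are the majority class, and all function names are choices made for this illustration only.

```python
import random


def balanced_partitions(positives, negatives, seed=0):
    """Pair all positives with successive chunks of the shuffled negatives.

    Assumes negatives are the majority class. Each chunk has as many samples
    as there are positives; the last chunk may be smaller.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(positives)
    return [
        (positives, shuffled[i:i + size]) for i in range(0, len(shuffled), size)
    ]


def train_centroid(pos, neg):
    """Stand-in base learner: store the mean vector of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return mean(pos), mean(neg)


def predict_centroid(model, x):
    """Return 1 if x is closer to the positive centroid, else 0."""
    pos_c, neg_c = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return int(d_pos < d_neg)


def ensemble_predict(models, x):
    """Majority vote: 1 only if strictly more than half the models say 1."""
    votes = [predict_centroid(m, x) for m in models]
    return int(2 * sum(votes) > len(votes))


if __name__ == "__main__":
    pos = [[1.0, 1.0], [0.9, 1.1]]
    neg = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1], [0.0, 0.2], [0.1, 0.1], [-0.1, -0.1]]
    models = [train_centroid(p, n) for p, n in balanced_partitions(pos, neg)]
    print(len(models), ensemble_predict(models, [1.0, 0.9]))
```

Because every base model sees all positives but only a slice of the negatives, no single model is dominated by the majority class, which is the effect the balancing step aims for.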
-
A Computational Framework to Identify Cross Association Between Complex Disorders by Protein-protein Interaction Network Analysis
Authors: Nikhila T. Suresh, Vimina E. Ravindran and Ullattil Krishnakumar
Objective: Numerous complex disorders are known not to occur in isolation, which suggests a plausible set of causes shared among several diseases. Hence, analysis of comorbidity can be used to explore the association between several disorders. In this study, we propose a network-based computational approach in which genes are organized based on the topological characteristics of a constructed Protein-Protein Interaction Network (PPIN), followed by a network prioritization scheme, to identify distinctive key genes and biological pathways shared among diseases. Methods: The proposed approach starts from the PPIN constructed for the genes of any randomly chosen disease in order to infer its associations with other diseases in terms of shared pathways, co-expression, co-occurrence, etc. First, proteins associated with a randomly chosen disease are identified. Second, the PPIN is organized through topological analysis to define hub genes. Finally, a prioritization algorithm generates a ranked list of newly predicted multimorbidity-associated proteins. Using Gene Ontology (GO), the cellular pathways involving multimorbidity-associated proteins are mined. Result and Conclusion: The proposed methodology is tested on three disorders, namely diabetes, obesity and blood pressure, at an atomic level, and the results suggest the comorbidity of other complex diseases that are associated, through shared proteins and pathways, with the proteins of the diseases in the present study. For diabetes, we obtained key genes such as GAPDH, TNF, IL6, AKT1, ALB, TP53, IL10, MAPK3, TLR4 and EGF, with key pathways such as the P53 pathway, VEGF signaling pathway, Ras pathway, Interleukin signaling pathway, Endothelin signaling pathway, Huntington disease, etc. Studies on other disorders such as obesity and blood pressure also revealed promising results.
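One simple way to define hub genes topologically is to rank nodes by degree, as sketched below. Degree is only one possible centrality measure and is an assumption here; the abstract does not say which topological characteristics or prioritization algorithm the authors used. The node names are placeholders, not real interactions.

```python
from collections import defaultdict


def degree_table(edges):
    """Count distinct interaction partners per node, ignoring self-loops."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def top_hubs(edges, k):
    """Return the k highest-degree nodes; ties are broken alphabetically."""
    degrees = degree_table(edges)
    return sorted(degrees, key=lambda n: (-degrees[n], n))[:k]


if __name__ == "__main__":
    # Placeholder network: A, B and D each have three partners.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
                 ("B", "D"), ("C", "D"), ("E", "B")]
    print(top_hubs(toy_edges, 3))
```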
-
PoGB-pred: Prediction of Antifreeze Proteins Sequences Using Amino Acid Composition with Feature Selection Followed by a Sequential-based Ensemble Approach
Authors: Affan Alim, Abdul Rafay and Imran Naseem
Background: Proteins contribute significantly to every task of cellular life. Their functions encompass the building and repairing of tissues in human bodies and other organisms; hence, they are the building blocks of bones, muscles, cartilage, skin, and blood. Similarly, antifreeze proteins (AFPs) are of prime significance for organisms that live in very cold areas. With the help of these proteins, cold-water organisms can survive at sub-zero temperatures and resist the water crystallization process, which may rupture internal cells and tissues. AFPs have also attracted attention and interest in the food industry and in cryopreservation. Objective: With the increasing availability of protein sequence data, an automated and sophisticated tool for AFP recognition and identification is urgently needed. The sequences and structures of AFPs are highly distinct; therefore, most of the proposed methods fail to show promising results on different structures. A consolidated method is proposed to produce competitive performance on highly distinct AFP structures. Methods: In this study, machine learning algorithms, namely Principal Component Analysis (PCA) followed by Gradient Boosting (GB), are proposed for antifreeze protein identification. To analyze and validate the performance of the proposed model, various combinations of two-segment amino acid and dipeptide compositions are used. PCA is used for dimension reduction while retaining high variance in the data, and is followed by an ensemble method, gradient boosting, for modeling and classification. Results: The proposed method obtained superior performance on the PDB, Pfam, and UniProt datasets compared to the RAFP-Pred method. In experiment-3, using only 150 PCA components, a high accuracy of 89.63% was achieved, which is superior to the 87.41% obtained with 300 significant features reported for the RAFP-Pred method. Experiment-2 was conducted using two different datasets, with non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In experiment-2, the proposed method attained a high sensitivity of 79.16%, which is 12.50% better than the state-of-the-art RAFP-Pred method. Conclusion: AFPs have a common function but distinct structures. Therefore, a single model developed for different sequences often fails for AFPs. The proposed model showed robust results across diverse training and testing datasets and outperformed the previous AFP prediction method, RAFP-Pred. The proposed model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability, and high performance, this model can be easily extended to the analysis of proteomic and genomic datasets.
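The feature extraction stage might look like the sketch below, which computes amino acid and dipeptide compositions on two halves of a sequence and concatenates them. Splitting at the midpoint, the feature ordering and the function names are assumptions for this example; the PCA and gradient boosting stages that would consume these vectors are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 values)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each overlapping residue pair (400 values).

    Requires len(seq) >= 2; pairs with non-standard residues are skipped.
    """
    total = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def two_segment_features(seq):
    """Concatenate both compositions for each half of the sequence.

    Assumes len(seq) >= 4 so that each half has at least one dipeptide.
    Returns 2 * (20 + 400) = 840 values.
    """
    mid = len(seq) // 2
    features = []
    for segment in (seq[:mid], seq[mid:]):
        features += amino_acid_composition(segment)
        features += dipeptide_composition(segment)
    return features


if __name__ == "__main__":
    print(len(two_segment_features("ACDEFGHIKL")))
```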
-
Deep-BSC: Predicting Raw DNA Binding Pattern in Arabidopsis Thaliana
Authors: Syed A. S. Bukhari, Abdul Razzaq, Javeria Jabeen, Shaheer Khan and Zulqurnain Khan
Background: With the rapid development of sequencing methods in recent years, binding sites have been systematically identified in such projects as Nested-MICA and MEME. Predicting DNA motifs with higher accuracy and precision has been a very important task for bioinformaticians. Nevertheless, experimental approaches are still time-consuming for big data sets, making computational identification of binding sites indispensable. Objective: To facilitate the identification of binding sites, we propose a deep learning architecture, named Deep-BSC (Deep-Learning Binary Search Classification), to predict binding sites in a raw DNA sequence with more precision and accuracy. Methods: Our proposed architecture relies purely on the raw DNA sequence to predict protein binding sites using a convolutional neural network (CNN). We trained our deep learning model on binding sites at the nucleotide level. The DNA sequence of A. thaliana is used in this study because it is a model plant. Results: The results demonstrate the effectiveness and efficiency of our method in classifying binding sites against random sequences using deep learning. We construct a CNN with different layers and filters to show the usefulness of the max-pooling technique in the proposed method. To make our approach interpretable, we further visualized binding sites in a saliency map and successfully identified similar motifs in the raw sequence. The proposed computational framework is time and resource efficient. Conclusion: Deep-BSC enables the identification of binding sites in DNA sequences via a highly accurate CNN. The proposed computational framework can also be applied to problems such as operators, repeats in the genome, DNA markers, and recognition sites for enzymes, thereby promoting the use of the Deep-BSC method in the life sciences.
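A CNN that works on raw DNA typically receives the sequence as a one-hot matrix, and max-pooling then keeps the strongest filter response in each window. The sketch below shows both operations in plain Python. The A, C, G, T channel order, the all-zero encoding for ambiguous bases and the pooling layout are assumptions for illustration, not details of Deep-BSC.

```python
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Any other symbol, such as N, becomes an all-zero row.
    """
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in seq.upper()]


def max_pool_1d(values, size):
    """Non-overlapping max-pooling; a shorter final window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


if __name__ == "__main__":
    print(one_hot("ACGTN"))
    print(max_pool_1d([1, 3, 2, 5, 4], 2))
```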
-
Deep Learning Model for Pathogen Classification Using Feature Fusion and Data Augmentation
Authors: Fareed Ahmad, Amjad Farooq and Muhammad U. G. Khan
Background: Bacterial pathogens are deadly for animals and humans. The ease of their dissemination, coupled with their high capacity to cause illness and death in infected individuals, makes them a threat to society. Objective: Because of the high similarity among genera and species of pathogens, it is sometimes difficult for microbiologists to differentiate between them. Automatic classification using deep-learning models can help in obtaining reliable and accurate outcomes. Methods: Deep-learning models, namely AlexNet, GoogleNet, ResNet101, and InceptionV3, are used with numerous variations, including training the model from scratch, fine-tuning without pre-trained weights, fine-tuning while freezing the weights of the initial layers, fine-tuning while adjusting the weights of all layers, and augmenting the dataset by random translation and reflection. Moreover, as the dataset is small, fine-tuning and data augmentation strategies are applied to avoid overfitting and produce a generalized model. A merged feature vector is produced using the two best-performing models, and accuracy is calculated by the XGBoost algorithm on the feature vector using cross-validation. Results: Fine-tuned models with augmentation produce the best results. Of these, the two best-performing deep models (ResNet101 and InceptionV3), selected for feature fusion, produced a similar validation accuracy of 95.83 with losses of 0.0213 and 0.1066, and testing accuracies of 97.92 and 93.75, respectively. The proposed model used XGBoost to attain a classification accuracy of 98.17% using 35-fold cross-validation. Conclusion: Automatic classification using these models can help experts correctly identify pathogens. Consequently, it can help in controlling epidemics and thereby minimize the socio-economic impact on the community.
-
Incorporating K-mers Highly Correlated to Epigenetic Modifications for Bayesian Inference of Gene Interactions
Authors: Dariush Salimi and Ali Moeini
Objective: A gene interaction network, along with its related biological features, plays an important role in computational biology. A Bayesian network, as an efficient model based on probabilistic concepts, is able to exploit known and novel biological causal relationships between genes. The success of Bayesian networks in predicting these relationships depends greatly on the choice of priors. Methods: K-mers have been applied as prominent features to uncover the similarity between genes in a specific pathway, suggesting that this feature can be applied to study gene dependencies. In this study, we propose k-mers (4-, 5- and 6-mers) highly correlated with epigenetic modifications, including 17 modifications, as a new prior for Bayesian inference of the gene interaction network. Result: Employing this model on a network of 23 human genes and on a network of 27 yeast genes resulted in F-measure improvements in different biological networks. Conclusion: The improvements in the best case are 12%, 36%, and 10% in the pathway, co-expression, and physical interaction networks, respectively.
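Counting 4-, 5- and 6-mers and measuring how strongly a k-mer's frequency tracks a modification signal can be sketched as follows. Pearson correlation is used here as an assumed measure of "highly correlated"; the abstract does not specify the statistic, and the function names are illustrative only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_profile(seq, ks=(4, 5, 6)):
    """Return one Counter per k-mer length."""
    return {k: kmer_counts(seq, k) for k in ks}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(kmer_counts("ACGTACGT", 4))
    # Hypothetical per-gene k-mer counts against a modification signal.
    print(pearson([1, 2, 3], [2, 4, 6]))
```

K-mers whose per-gene counts correlate strongly with a modification level could then be kept as candidate features for building the prior.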
