Current Bioinformatics - Volume 16, Issue 10, 2021
-
-
Prediction of Drug-target Binding Affinity by An Ensemble Learning System with Network Fusion Information
Authors: Cheng L. Zhang, You Zhi Zhang, Bing Wang and Peng Chen
Background: Verifying interactions between drugs and targets is key to discovering new drugs. Many computational methods have been developed to predict drug-target interactions and have performed well, but challenges remain in the field. Objective: We attempted to develop a machine learning method to predict drug-target affinity, which measures the strength of the binding between a drug and its target. Methods: This paper proposes an integrated machine learning system for drug-target binding affinity prediction based on network fusion. First, multiple similarity networks representing drugs or targets are calculated. Second, the networks representing drugs and those representing targets are fused separately. Finally, the concatenated ("spliced") drug and target feature information is used for model construction and training. By integrating multiple similarity networks, the model exploits the complementarity of the network information, and the most complete feature information is obtained once redundancy is removed. Results: Experimental results showed that our model performs well in predicting drug-target interaction (DTI) binding affinity. Conclusion: Predicting drug-target affinity remains challenging. This paper addresses the issue with an integrated system that uses fused network information; the proposed method performs well and can provide a data basis for subsequent work.
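The abstract does not spell out the fusion rule. As a minimal sketch, assuming each similarity network is a square matrix over the same set of drugs (or targets), the snippet below row-normalizes the networks, averages them into one fused network, and concatenates a drug's fused profile with a target's fused profile to form one input vector for an affinity regressor. Simple averaging is a stand-in chosen for illustration (methods such as Similarity Network Fusion use iterative cross-diffusion instead), and the names `fuse_networks` and `pair_features` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(m: Matrix) -> Matrix:
    """Scale each row to sum to 1 so networks on different scales are comparable."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def fuse_networks(networks: List[Matrix]) -> Matrix:
    """Average row-normalized similarity matrices (a simplified fusion, not full SNF)."""
    normed = [row_normalize(net) for net in networks]
    n = len(normed[0])
    return [[sum(net[i][j] for net in normed) / len(normed) for j in range(n)]
            for i in range(n)]


def pair_features(drug_net: Matrix, target_net: Matrix, d: int, t: int) -> List[float]:
    """Concatenate ("splice") the fused profile of drug d with that of target t."""
    return drug_net[d] + target_net[t]
```

Each drug-target pair then becomes one fixed-length vector (number of drugs plus number of targets), which any regression model could consume.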
-
-
-
Identifying Functional Modules Using Energy Minimization with Graph Cuts
Authors: Yuanyuan Chen, Xiaodan Fan and Cong Pian
Aims: The aim of this article was to find functional (or disease-relevant) modules using gene expression data. Background: Biotechnological developments are leading to a rapid increase in the volume of transcriptome data and are thus driving the growth of interactome data. This has made it possible to perform transcriptomic analysis by integrating interactome data. Because genes neither exist nor operate in isolation but participate in biological networks, interactome data are as important as expression profiles. Objective: We constructed a network-based method that uses gene expression data to identify functional (or disease-relevant) modules. Methods: We applied energy minimization with graph cuts, integrating gene interaction networks under the ‘guilt by association’ principle. Results: Our method performs well in an independent simulation experiment and identifies strongly disease-relevant modules in real experiments. It finds important functional modules associated with two subtypes of lymphoma in a lymphoma microarray dataset. Moreover, it identifies the biological subnetworks and most of the genes associated with Duchenne muscular dystrophy. Conclusion: We successfully adapted the energy minimization with graph cuts method to identify functionally important genes from genomic data by integrating gene interaction networks. Other: This study can help identify disease-relevant modules that cannot be identified by differential expression analysis.
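Graph-cut energy minimization assigns each gene a binary label (in or out of the module) by minimizing a per-gene cost plus a penalty for interacting genes that receive different labels. For this kind of energy, with non-negative costs, the global minimum equals a minimum s-t cut, so it can be solved exactly with max-flow. The sketch below builds that graph and solves it with Edmonds-Karp. The unary costs, the smoothness weight `lam`, and the label convention (1 = in module) are illustrative assumptions, not the authors' exact energy.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Unary = List[Tuple[float, float]]      # (cost of label 0, cost of label 1) per gene
Edges = List[Tuple[int, int, float]]   # (gene i, gene j, interaction weight)


def energy(labels: List[int], unary: Unary, edges: Edges, lam: float) -> float:
    """E(L) = sum_i D_i(L_i) + lam * sum_(i,j) w_ij * [L_i != L_j]."""
    data = sum(unary[i][lab] for i, lab in enumerate(labels))
    smooth = sum(w for i, j, w in edges if labels[i] != labels[j])
    return data + lam * smooth


def graph_cut(unary: Unary, edges: Edges, lam: float) -> Tuple[List[int], float]:
    """Exact minimizer of `energy` via an s-t minimum cut (Edmonds-Karp max-flow)."""
    n = len(unary)
    s, t = n, n + 1
    cap: Dict[int, Dict[int, float]] = {v: {} for v in range(n + 2)}

    def add(u: int, v: int, c: float) -> None:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse edge for the residual graph

    for i, (c0, c1) in enumerate(unary):
        add(s, i, c1)  # cut when gene i ends up on the sink side (label 1)
        add(i, t, c0)  # cut when gene i stays on the source side (label 0)
    for i, j, w in edges:
        add(i, j, lam * w)  # cut when neighbouring genes get different labels
        add(j, i, lam * w)

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent: Dict[int, Optional[int]] = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Push the bottleneck capacity along the path.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

    # Genes still reachable from s in the residual graph lie on the source side.
    labels = [0 if i in parent else 1 for i in range(n)]
    return labels, flow
```

On real data, the unary costs might be derived from per-gene expression scores and the edges from a gene interaction network; both choices are assumptions in this sketch.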
-
-
-
MBMM: Moment Estimating Beta Mixture Model-based Clustering Algorithm for m6A Co-methylation Module Mining
Authors: Zhaoyang Liu, Hongsheng Yin, Shutao Chen, Hui Liu, Jia Meng, HongLei Wang and Lin Zhang
Background: m6A methylation is a ubiquitous post-transcriptional modification in mammals. MeRIP-seq technology makes it possible to acquire transcriptome-wide m6A data under different conditions. Enzyme-specific regulation gives rise to co-methylation modules in m6A methylation level data. Mining these co-methylation modules can therefore help reveal the mechanism of m6A methylation and its role in the onset and development of complex diseases such as cancer. Objective: To develop a clustering algorithm that can effectively mine m6A co-methylation modules. Methods: In this study, we propose MBMM, a novel beta mixture model-based clustering algorithm built on the EM framework, which uses method-of-moments estimation in the M-step to handle high-dimensional, small-sample m6A data. Simulation studies were used to evaluate the clustering performance of the proposed algorithm, which was then applied to real data to mine co-methylation modules. Correlation analysis of biological significance was used to assess whether the resulting clusters are co-methylation modules. Results and Conclusion: Simulation studies demonstrated that MBMM outperformed other clustering algorithms. In real data, MBMM found seven co-methylation modules. Pathway-specific analysis of six m6A-related pathways showed that six of the co-methylation modules were enriched in these pathways and differed from one another. Substrate-specific analysis of five enzymes revealed varying degrees of enrichment across the seven co-methylation modules. Gene Ontology enrichment analysis indicated that these modules may be regulated by enzymes while having potential functional specificity.
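A method-of-moments M-step replaces iterative maximum-likelihood updates of the Beta parameters with closed-form estimates: for a weighted mean m and variance v, α = m(m(1 - m)/v - 1) and β = (1 - m)(m(1 - m)/v - 1). The sketch below applies this inside EM for a one-dimensional Beta mixture. MBMM itself clusters high-dimensional methylation profiles, so treat this only as an illustration of the M-step idea; the function names and the sorted-split initialisation are assumptions.

```python
import math
from typing import List, Tuple


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density, computed in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_mom(mean: float, var: float) -> Tuple[float, float]:
    """Method-of-moments estimates (a, b) matching a given mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def fit_beta_mixture(x: List[float], k: int = 2, iters: int = 100):
    """EM for a 1-D Beta mixture with a method-of-moments M-step.

    Returns (mixing weights, [(a, b), ...]). Values are clipped into (0, 1)."""
    x = [min(max(v, 1e-6), 1 - 1e-6) for v in x]
    n = len(x)
    # Initialise with hard assignments: split the sorted data into k equal chunks.
    order = sorted(range(n), key=lambda i: x[i])
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0
    weights: List[float] = []
    params: List[Tuple[float, float]] = []
    for _ in range(iters):
        # M-step: weighted mean and variance per component -> Beta parameters.
        weights, params = [], []
        for j in range(k):
            nj = sum(r[j] for r in resp)
            m = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            v = sum(r[j] * (xi - m) ** 2 for r, xi in zip(resp, x)) / nj
            weights.append(nj / n)
            params.append(beta_mom(m, v))
        # E-step: posterior probability of each component for each value.
        resp = []
        for xi in x:
            dens = [w * beta_pdf(xi, a, b) for w, (a, b) in zip(weights, params)]
            total = sum(dens) + 1e-300
            resp.append([d / total for d in dens])
    return weights, params
```

Because the moment estimates are closed-form, each M-step costs one pass over the data, which is one plausible reason such an estimator suits small-sample settings.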
-
-
-
Early Prediction of Malignant Mesothelioma: An Approach Towards Non-invasive Method
Authors: Shakir Shabbir, Muhammad S. Asif, Talha Mahboob Alam and Zeeshan Ramzan
Background: Malignant Mesothelioma (MM) is a rare but aggressive tumor that most commonly arises in the pleura, the lining of the lungs. Costly imaging and laboratory resources, e.g., X-ray imaging, Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) scans, biopsies, and blood tests, are commonly used to diagnose MM. These diagnostic measures are expensive and unavailable in remote areas, and some of them, such as biopsy and cytology of pleural fluid, are also very painful for the patient. Objective: In this study, we propose a diagnosis model for the early identification of MM using machine learning techniques. We explored the health records of 324 Turkish patients showing symptoms related to MM. The patient data include socio-economic, geographical, and clinical features. Methods: Several feature selection methods were employed to select significant features. To overcome the data imbalance problem, various data-level resampling techniques were applied. The Gradient Boosted Decision Tree (GBDT) method was used to develop the diagnostic model, and its performance was compared with that of traditional machine learning algorithms. Results: Our model outperformed the other models on both balanced and imbalanced data. In terms of accuracy and Receiver Operating Characteristic (ROC) value, models trained on the imbalanced data without any resampling outperformed those trained with undersampling. Conversely, oversampling techniques outperformed both undersampling and the imbalanced data in terms of accuracy and ROC. All classifiers employed in this study achieved good results with the OneR, information gain, and Relief-F feature selection methods, but the results of the other two methods (gain ratio and correlation) were not entirely promising. Finally, the combination of the Synthetic Minority Oversampling Technique (SMOTE) and OneR applied with GBDT gave the most favorable results in terms of accuracy, F-measure, and ROC.
Conclusion: The diagnosis model has also been deployed to assist doctors, patients, medical practitioners, and other healthcare professionals in the early diagnosis and better treatment of MM.
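SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority neighbours, which balances the classes before the GBDT model is trained. A minimal standard-library sketch follows; the function name, parameter names, and defaults are assumptions, and in practice a library implementation such as imbalanced-learn's `SMOTE` would typically be used.

```python
import math
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Return n_new synthetic samples interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the chosen sample (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to nb
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling this way stays inside the region already occupied by the minority class.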
-
-
-
Identification of Disease-specific Single Amino Acid Polymorphisms Using a Simple Random Forest at Protein-level
Authors: Jian He, Rongao Yuan, Lei Xu, Yanzhi Guo and Menglong Li
Background: The number of human genetic variants deposited in publicly available databases has been increasing exponentially. Among these variants, non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single Amino Acid Polymorphisms (SAPs), have been shown to be strongly correlated with phenotypic variation in traits and diseases. Objective: However, the detailed mechanisms governing the disease association of SAPs remain unclear. Investigating new attributes and improving prediction is therefore increasingly urgent, since a large number of unknown disease-related SAPs remain to be investigated. Methods: Based on Random Forest (RF), we constructed a new prediction model, from protein sequences, for SAPs associated with a particular disease. Four common sequence feature extraction methods were applied separately to select the optimal features. The SAP peptide length was then optimized over the range 12 to 202. Results: The optimal models achieved accuracy higher than 90% and an Area Under the Curve (AUC) above 0.9 on all 11 external test datasets. Finally, an accuracy higher than 95% on an independent test set demonstrates the strength of our method. Conclusion: In this paper, we used RF to construct 11 disease-association prediction models for SAPs at the protein sequence level. All models yield prediction accuracy higher than 90% and AUC above 0.9. Because our method uses only protein sequence information, it is more broadly applicable than methods that depend on additional information or predictions about the proteins.
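The abstract does not name the four sequence feature extraction methods. As an illustrative assumption, the sketch below extracts a fixed-length peptide window centred on the substituted residue (the length the authors tuned between 12 and 202) and encodes it by amino acid composition, one common sequence feature; such vectors would then be fed to a Random Forest. The functions `sap_window` and `aac` and the 'X' padding convention are hypothetical.

```python
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sap_window(sequence: str, position: int, mutant: str, length: int) -> str:
    """Peptide of `length` residues around a 1-based SAP position.

    The wild-type residue is replaced by `mutant`, and positions that fall
    outside the protein are padded with the placeholder 'X'."""
    half = length // 2
    idx = position - 1
    mutated = sequence[:idx] + mutant + sequence[idx + 1:]
    padded = "X" * half + mutated + "X" * half
    # The SAP sits at padded[idx + half], so the window starts at idx.
    return padded[idx:idx + length]


def aac(peptide: str) -> Dict[str, float]:
    """Amino acid composition: fraction of each standard residue (ignores 'X')."""
    residues = [r for r in peptide if r in AMINO_ACIDS]
    n = len(residues) or 1
    return {aa: residues.count(aa) / n for aa in AMINO_ACIDS}
```

For example, the 7-residue window around position 5 of `MKTAYIAKQR` with a Y-to-C substitution is `KTACIAK`, with the mutant residue in the centre.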
-
-
-
Identifying Critical States of Hepatocellular Carcinoma Based on Single-Sample Dynamic Network Biomarkers Combined with Simulated Annealing Algorithm
Authors: Hongqian Zhao, Jie Gao, Yichen Sun, Yujie Wang, Tianhao Guan and Gang Zhou
Background: Hepatocellular Carcinoma (HCC) is one of the most common malignant tumors. Its onset is insidious and its prognosis poor, and most patients have already reached an advanced stage at the time of diagnosis. Objective: Studies have shown that Dynamic Network Biomarkers (DNBs) can effectively identify the critical state in the transition of complex diseases such as HCC from a normal state to a disease state. It is therefore important to detect DNBs efficiently and reliably. Methods: This paper uses a dataset covering eight HCC disease states. First, an individual-specific network is constructed for each sample and features are extracted. On these networks, a simulated annealing algorithm searches for candidate DNB modules, from which the evolution of HCC is determined. Results: During the Low-Grade Dysplasia (LGD) and High-Grade Dysplasia (HGD) stages, the DNB gives an early-warning signal, indicating that liver dysplasia is a critical state in the development of HCC. Compared with the landscape dynamic network biomarker (LDNB) method, our method not only describes the statistical characteristics of each disease state but also yields better results, including more DNBs enriched in HCC-related pathways. Conclusion: The results of this study may be of great significance for the prevention and early diagnosis of HCC.
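A common DNB criterion scores a candidate module by the composite index I = SD_in * PCC_in / PCC_out: the average standard deviation of the module genes, times their average absolute within-module Pearson correlation, divided by their average absolute correlation with genes outside the module. The sketch below searches for a high-scoring fixed-size module with simulated annealing. It works on a group of samples rather than on the single-sample networks the paper builds, and the swap move, cooling schedule, and parameter values are assumptions; it also assumes `size` is smaller than the number of genes.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def dnb_score(genes: List[List[float]], module: List[int]) -> float:
    """Composite index SD_in * PCC_in / PCC_out; genes[g] = profile of gene g."""
    inside = set(module)
    outside = [g for g in range(len(genes)) if g not in inside]
    if len(inside) < 2 or not outside:
        return 0.0
    sd_in = mean(pstdev(genes[g]) for g in module)
    pcc_in = mean(abs(pearson(genes[i], genes[j]))
                  for i in module for j in module if i < j)
    pcc_out = mean(abs(pearson(genes[i], genes[o])) for i in module for o in outside)
    return sd_in * pcc_in / (pcc_out + 1e-9)


def anneal_module(genes: List[List[float]], size: int, steps: int = 2000,
                  t0: float = 1.0, cooling: float = 0.995,
                  seed: int = 0) -> Tuple[List[int], float]:
    """Simulated annealing over fixed-size gene modules, maximising dnb_score."""
    rng = random.Random(seed)
    n = len(genes)
    current = rng.sample(range(n), size)
    cur_score = dnb_score(genes, current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour: swap one module gene for a gene outside the module.
        candidate = list(current)
        outside = [g for g in range(n) if g not in current]
        candidate[rng.randrange(size)] = rng.choice(outside)
        score = dnb_score(genes, candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / temp):
            current, cur_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
        temp *= cooling
    return sorted(best), best_score
```

Accepting occasional worse moves early on lets the search escape local optima; as the temperature decays, the walk settles into greedy improvement, and the best module seen is returned.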
-
-
-
New Method for Sequence Similarity Analysis Based on the Position and Frequency of Statistically Significant Repeats
Background: The analysis of DNA nucleotide sequence similarity among different species is crucial for identifying their functional, structural, or evolutionary relationships. The number of bioinformatics tools designed to analyze nucleotide sequence similarity has been growing rapidly. According to the current literature, alignment-free methods have not been applied to repetitive nucleotide sequences of different lengths. Objective: To develop a new algorithm for determining sequence characteristics and similarity based on statistically significant repetitive elements of different lengths located in the analyzed sequences. Methods: This paper presents the Repeats-Position/Frequency method (R-P/F method) for determining nucleotide sequence similarity, which takes into account statistically significant repetitive parts of the analyzed sequences. It is based on information theory and on the fact that neither the positions nor the frequencies of repeated sequences are expected to occur identically in a random sequence of the same length. Nucleotide sequences are represented in an m-dimensional vector space, and their hierarchy is constructed by applying a hierarchical clustering algorithm. Results: The R-P/F method has been validated on multiple nucleotide sequence datasets and compared with the results of the alignment-based algorithms BLAST and Clustal Omega and of multiple well-established alignment-free dissimilarity measures. The method provides results comparable to other commonly used methods for the same problem, while offering a novel view through its use of repetitive parts of sequences. Conclusion: The presented novel algorithm for calculating a sequence similarity measure is effective in discovering relationships among sequences and makes a powerful, complementary addition to existing sequence similarity methods.
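The exact R-P/F vector construction and significance test are not given in the abstract. The sketch below only illustrates the underlying idea under simplifying assumptions: for every k-mer that occurs more often than expected in a uniform random sequence of the same length, record its frequency and mean relative position, then compare sequences by Euclidean distance. The function names and the crude over-representation threshold are hypothetical, and the resulting pairwise distances could be passed to any hierarchical clustering routine.

```python
import math
from itertools import product
from typing import Dict, List


def repeat_profile(seq: str, k: int = 2) -> List[float]:
    """(frequency, mean relative position) for each k-mer over ACGT, zeroed unless
    the k-mer occurs more often than expected in a uniform random sequence."""
    n_windows = len(seq) - k + 1
    expected = n_windows / 4 ** k
    positions: Dict[str, List[int]] = {}
    for i in range(n_windows):
        positions.setdefault(seq[i:i + k], []).append(i)
    vector: List[float] = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        occ = positions.get(kmer, [])
        if len(occ) > expected:  # crude over-representation test (assumption)
            mean_pos = sum(occ) / len(occ) / max(n_windows - 1, 1)
            vector += [len(occ) / n_windows, mean_pos]
        else:
            vector += [0.0, 0.0]
    return vector


def dissimilarity(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the repeat profiles of two sequences."""
    return math.dist(repeat_profile(a, k), repeat_profile(b, k))
```

Each sequence maps to a vector of length 2 * 4^k, so sequences of different lengths become directly comparable without alignment.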
-
-
-
Predicting Chromosome Flexibility from the Genomic Sequence Based on Deep Learning Neural Networks
Authors: Jinghao Peng, Jiajie Peng, Haiyin Piao, Zhang Luo, Kelin Xia and Xuequn Shang
Background: Open, accessible regions of the chromosome are more likely to be bound by transcription factors, which are important for nuclear processes and biological functions. Studying changes in chromosome flexibility can help discover and analyze disease markers and improve the efficiency of clinical diagnosis. Current methods for characterizing chromosome flexibility from Hi-C data include the Flexibility-Rigidity Index (FRI) and the Gaussian Network Model (GNM). However, these methods require chromosome structure data from 3D biological experiments, which are time-consuming and expensive. Objective: The folding and coiling of the DNA double helix generally have a great impact on chromosome flexibility and function. Motivated by the success of genomic sequence analysis in biomolecular function analysis, we aim to predict chromosome flexibility from genomic sequence data alone. Methods: We propose a new method, named "DeepCFP", that uses deep learning models to predict chromosome flexibility based only on genomic sequence features. The model has been tested on the GM12878 cell line. Results: The maximum accuracy of our model reached 91%, and the performance of DeepCFP is close to that of FRI and GNM. Conclusion: DeepCFP can achieve high performance based only on genomic sequence.
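Sequence-only deep learning models commonly take DNA as a one-hot matrix. The abstract does not state DeepCFP's input encoding, so the following is an assumed, minimal example of the standard four-channel encoding that a convolutional network could consume; the function name is hypothetical.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as an L x 4 matrix with columns A, C, G, T.

    Ambiguous bases such as 'N' become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]
```

Encoding ambiguous bases as zero rows, rather than dropping them, keeps every row aligned with its genomic position.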
-
-
-
Identification of Potential Inhibitors Against SARS-CoV-2 Using Computational Drug Repurposing Study
Authors: Hasan Zulfiqar, Fu-Ying Dao, Hao Lv, Hui Yang, Peng Zhou, Wei Chen and Hao Lin
Background: SARS-CoV-2 is a newly emerged coronavirus that causes a severe type of pneumonia in the host organism, so there is an urgent need to find inhibitors against it. Drug repurposing is an effective strategy for finding inhibitors of SARS-CoV-2 proteins to treat this pneumonia. Methods: For this purpose, a library of 2500 verified drug compounds was generated, and the compounds were docked against the Nucleocapsid, Membrane, and Envelope protein structures of SARS-CoV-2 to determine their binding affinity for the target binding pockets. Moreover, the cheminformatics properties and ADMET profiles of these compounds were assessed to evaluate their drug-likeness. The compounds with the lowest S-score were identified as potential inhibitors. Results: Our findings showed that the compounds with IDs 1212, 1019, and 1992 could interact inside the active sites of the membrane, nucleocapsid, and envelope proteins. Conclusion: This in silico knowledge will be helpful for the design of novel, safe, and less expensive drugs against SARS-CoV-2.
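After docking, candidates are ranked by S-score, where lower scores indicate more favourable predicted binding. The sketch below combines that ranking with Lipinski's rule of five as one simple drug-likeness filter; the paper also assessed ADMET properties, which are not modelled here. The `Compound` fields and the one-violation tolerance are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    compound_id: int
    s_score: float      # docking score; lower means stronger predicted binding
    mol_weight: float   # molecular weight in Da
    logp: float         # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def rank_candidates(compounds: List[Compound], top: int = 3,
                    max_violations: int = 1) -> List[int]:
    """IDs of the drug-like compounds with the lowest S-scores."""
    drug_like = [c for c in compounds if lipinski_violations(c) <= max_violations]
    return [c.compound_id for c in sorted(drug_like, key=lambda c: c.s_score)[:top]]
```

Filtering before ranking means a compound with an excellent docking score but poor drug-likeness is never shortlisted.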
-
-
-
Prediction Model of Thermophilic Protein Based on Stacking Method
Authors: Xian-Fang Wang, Fan Lu, Zhi-Yong Du and Qi-Meng Li
Background: An in-depth study of the principles behind the heat resistance of thermophilic proteins is of great significance for understanding protein folding, structure, function, and evolution, and for the directed design and modification of protein molecules in protein processing. Objective: To address the low accuracy and low efficiency of thermophilic protein prediction, a thermophilic protein prediction model based on the Stacking method is proposed. Methods: Following the idea of Stacking, this paper uses five feature extraction methods, namely amino acid composition, g-gap dipeptide, encoding based on grouped weight, entropy density, and autocorrelation coefficient, to characterize the protein sequences of the selected benchmark dataset. An SVM with a Gaussian kernel is then used to build a classification model for each feature set; the prediction results of these five models serve as the input of a second layer, in which a logistic regression model integrates them into the final Stacking-based thermophilic protein prediction model. Results: The proposed method reached an accuracy of 93.75% under jackknife validation, several performance evaluation indexes were higher than those of other models, and the overall performance was better than that of most reported methods. Conclusion: The model presented in this paper shows strong robustness and can significantly improve the prediction performance for thermophilic proteins.
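The g-gap dipeptide composition counts residue pairs separated by g intervening residues, giving a 400-dimensional vector (g = 0 reduces to ordinary dipeptide composition). In the Stacking design described above, each such feature set would train its own Gaussian-kernel SVM. A minimal sketch follows; normalizing by the number of valid pairs, rather than by L - g - 1, is an assumption that only matters when non-standard residues are present.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide(seq: str, g: int) -> List[float]:
    """400-dim frequencies of residue pairs separated by g intervening residues."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    n_pairs = len(seq) - g - 1
    for i in range(max(n_pairs, 0)):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]
```

For `ACDA` with g = 1, the counted pairs are `AD` and `CA`, each with frequency 0.5.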
-
-
-
Construction and Comprehensive Analysis of a Special Competitive Endogenous RNAs Network to Reveal Potential Prognostic Biomarkers for Endometrial Carcinoma
Background: Endometrial carcinoma (EC) is one of the most common malignancies in women worldwide. Patients diagnosed at an early stage have a good prognosis, whereas patients with advanced EC (stage III-IV) have very poor prognoses. The competitive endogenous RNA (ceRNA) regulatory network in EC remains unclear, and the relationship between hub RNAs and important clinical traits (clinical stage) has not yet been rigorously studied. Objective: To study the development of endometrial carcinoma and to identify early diagnostic markers, we rigorously studied the relationship between hub RNAs and important clinical traits (clinical stage). Methods: The co-expression networks of mRNA, lncRNA, and miRNA were constructed by weighted gene co-expression network analysis. Gene Ontology (GO) biological process term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were carried out for the differentially expressed mRNAs (DEmRNAs). A ceRNA regulatory network was constructed based on miRcode, miRDB, TargetScan, and miRTarBase. Furthermore, survival analysis, regression analysis of mRNA-lncRNA pairs, and gene set enrichment analysis were carried out. Results: A ceRNA network containing 11 mRNAs, 4 miRNAs, and 18 lncRNAs was constructed based on aberrantly expressed RNAs in the co-expression modules. In this network, 7 mRNAs, 4 lncRNAs, and 1 miRNA were found to be closely related to the overall survival of EC patients. Positive correlations were obtained for 35 mRNA-lncRNA pairs in the ceRNA network. Notably, 5 mRNAs, 3 lncRNAs, and 1 miRNA were identified as potential prognostic biomarkers for EC. Single-gene gene set enrichment analysis (GSEA) revealed that signaling pathways related to the cell cycle and cancer were highly enriched.
Conclusion: The identification of five mRNAs (CBX6, PIM1, RIMS3, SOX11, and XKR7), three lncRNAs (WT1-AS, LINC00494, and LINC00501), and one miRNA (miR-195) as potential prognostic biomarkers for EC is helpful for the early diagnosis, prognosis, and development of new treatment strategies for EC patients.
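A ceRNA triplet links a lncRNA and an mRNA that are targeted by the same miRNA, with the lncRNA and mRNA positively co-expressed. The sketch below assembles such triplets from miRNA target sets (which, in the paper, came from miRcode, miRDB, TargetScan, and miRTarBase) and an expression table. The correlation threshold `min_r`, the function names, and all RNA identifiers used with it are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def cerna_triplets(mirna_to_mrna: Dict[str, Set[str]],
                   mirna_to_lncrna: Dict[str, Set[str]],
                   expr: Dict[str, Sequence[float]],
                   min_r: float = 0.3) -> List[Tuple[str, str, str]]:
    """(lncRNA, miRNA, mRNA) triplets sharing a miRNA and positively co-expressed."""
    triplets = []
    for mir, mrnas in mirna_to_mrna.items():
        for lnc in mirna_to_lncrna.get(mir, set()):
            for m in mrnas:
                if lnc in expr and m in expr and pearson(expr[lnc], expr[m]) >= min_r:
                    triplets.append((lnc, mir, m))
    return sorted(triplets)
```

Requiring positive lncRNA-mRNA correlation reflects the ceRNA hypothesis: if both compete for the same miRNA, a rise in one should free the miRNA-repressed partner, so their expression levels move together.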