Current Bioinformatics - Volume 17, Issue 9, 2022
Microarray Analysis Workflow Based on a Genetic Algorithm to Discover Potential Hub Genes
This paper presents a sequence of steps designed to extract biological knowledge from microarray gene expression data. The core of the pipeline is a canonical multi-objective Genetic Algorithm (GA) that takes a gene expression matrix and a factor as input. The factor groups samples according to a chosen criterion, e.g., healthy versus diseased tissue. Each run of the GA yields a gene set with good properties both at the individual level, in terms of differential expression, and at the aggregate level, in terms of correlation between expression profiles. Microarray experiment data are obtained from the Gene Expression Omnibus (GEO). In the pipeline, the results of independent GA runs are analyzed, the genes common to all runs are collected, and over-representation analysis is performed. The process yields a small number of genes of interest. The methodology is illustrated with a leukemia benchmark dataset, for which a group of genes of interest is obtained.
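The two GA objectives described above can be made concrete with a small sketch. It assumes a two-level factor coded 0/1, uses the absolute Welch t-statistic as the individual-level (differential expression) score, and uses the mean absolute Pearson correlation between gene profiles as the aggregate-level score. The paper's exact fitness functions, and the GA operators themselves (selection, crossover, Pareto ranking), are not shown and may differ; all function names are hypothetical.

```python
import itertools
import statistics


def welch_t(a, b):
    """Welch's t-statistic comparing two groups of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def objectives(gene_set, expr, factor):
    """Hypothetical fitness for one candidate gene set (both values to be maximized).

    expr maps gene -> expression values per sample; factor holds 0/1 group labels.
    Returns (mean |t| over genes, mean |r| over gene pairs).
    """
    t_scores = []
    for g in gene_set:
        group0 = [v for v, f in zip(expr[g], factor) if f == 0]
        group1 = [v for v, f in zip(expr[g], factor) if f == 1]
        t_scores.append(abs(welch_t(group0, group1)))
    pairs = itertools.combinations(gene_set, 2)
    corr = [abs(pearson(expr[g], expr[h])) for g, h in pairs]
    return statistics.mean(t_scores), (statistics.mean(corr) if corr else 0.0)


def genes_common_to_all_runs(runs):
    """Genes that appear in the gene set returned by every independent GA run."""
    return set.intersection(*(set(r) for r in runs))
```

The intersected genes would then be passed to an over-representation analysis tool of the reader's choice.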
-
-
-
Combining Network-based and Matrix Factorization to Predict Novel Drug-target Interactions: A Case Study Using the Brazilian Natural Chemical Database
Background: Chemogenomic techniques use mathematical models to predict new Drug-Target Interactions (DTIs) from the chemical and biological information of drugs and their pharmacological targets. They are faster and less expensive than structure-based computational methods. Network analysis and matrix factorization are two practical chemogenomic approaches for predicting DTIs across many drugs and targets. However, despite the extensive literature on chemogenomic techniques and methodologies, there is no consensus on how to predict interactions given a drug or a target, a set of drugs, and a dataset of known interactions. Methods: This study predicted novel DTIs for a limited collection of drugs using a heterogeneous ensemble of network-based and matrix factorization techniques. We examined three network-based approaches and two matrix factorization-based methods on benchmark datasets. We then applied one network approach and one matrix factorization technique to a small collection of Brazilian plant-derived pharmaceuticals. Results: We discovered two novel DTIs involving two drugs derived from Brazilian plants (Quercetin and Luteolin) and cross-referenced them with the Therapeutic Target Database to identify linked disorders, such as breast cancer, prostate cancer, and Cushing syndrome. Conclusion: The proposed approach allows methods to be assessed solely on their sensitivity, independently of unfavorable interactions. The findings suggest that integrating network and matrix factorization results may be a helpful technique in bioinformatics studies aimed at developing novel medicines from a limited range of drugs.
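For readers unfamiliar with network-based DTI prediction, the sketch below implements network-based inference (NBI), a classic two-step resource-diffusion method on the drug-target bipartite graph. The abstract does not name the three network-based approaches it examined, so NBI is an illustrative choice rather than the study's method; the function name is hypothetical.

```python
import numpy as np


def nbi_scores(A):
    """Network-based inference (two-step mass diffusion) on a drug x target graph.

    A: binary matrix of known interactions (rows = drugs, columns = targets).
    Returns a drug x target score matrix; high scores on unknown pairs suggest new DTIs.
    """
    A = np.asarray(A, dtype=float)
    k_drug = A.sum(axis=1)    # number of known targets per drug
    k_target = A.sum(axis=0)  # number of known drugs per target
    k_drug[k_drug == 0] = 1.0      # avoid division by zero for isolated nodes
    k_target[k_target == 0] = 1.0
    # Step 1: each target splits its resource equally among its drugs.
    # Step 2: each drug splits what it received equally among its targets.
    # W[l, j] is the fraction of target j's resource that ends up on target l.
    W = (A / k_drug[:, None]).T @ (A / k_target[None, :])
    # Start each drug's diffusion from its known targets.
    return A @ W.T
```

Resource is conserved, so each row of the result sums to that drug's number of known targets; candidate interactions are the pairs with `A == 0` ranked by score.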
-
-
-
Identification of Plasmodium Secreted Proteins Based on MonoDiKGap and Distance-Based Top-n-Gram Methods
Authors: Xinyi Liao, Xiaomei Gu and Dejun Peng
Background: Many malaria infections are caused by Plasmodium falciparum. Accurately classifying the proteins secreted by the malaria parasite, which are essential for the development of anti-malarial drugs, is therefore necessary. Objective: This study aimed to classify the proteins secreted by the malaria parasite accurately. Methods: To improve the accuracy of predicting Plasmodium secreted proteins, we established a classification model, MGAP-SGD. MonoDiKGap features (k = 7) were extracted from the protein sequences, and the optimal features were then selected with the AdaBoost method. Finally, based on the optimal feature set, the secreted proteins were predicted with the Stochastic Gradient Descent (SGD) algorithm. Results: The SGD classifier was validated with 10-fold cross-validation and on an independent test set, achieving accuracies of 98.5859% and 97.973%, respectively. Conclusion: This study confirms the effectiveness and robustness of the MGAP-SGD model, which can meet the requirements for predicting Plasmodium secreted proteins.
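The abstract does not define MonoDiKGap features. Under our reading of the name (an assumption, following the mono/di k-gap naming used by some sequence-feature toolkits), each feature counts a single residue X followed, after g intervening residues (0 ≤ g ≤ k), by a dipeptide YZ. The sketch below extracts such counts and normalizes them into a fixed-length vector over the 20 standard amino acids; the normalization and the function names are hypothetical. With k = 7 the vector has 8 × 20 × 400 = 64,000 entries, which is consistent with applying a feature-selection step such as AdaBoost afterward.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mono_di_k_gap_counts(seq, k):
    """Count (X, g, YZ) patterns: residue X, then g gap residues, then dipeptide YZ."""
    counts = Counter()
    n = len(seq)
    for g in range(k + 1):
        # X sits at i, Y at i + g + 1 and Z at i + g + 2, so i + g + 2 <= n - 1.
        for i in range(n - g - 2):
            counts[(seq[i], g, seq[i + g + 1 : i + g + 3])] += 1
    return counts


def mono_di_k_gap_vector(seq, k):
    """Normalized frequencies in a fixed order: (k + 1) * 20 * 400 features."""
    counts = mono_di_k_gap_counts(seq, k)
    total = sum(counts.values()) or 1
    return [
        counts[(x, g, y + z)] / total
        for g in range(k + 1)
        for x in AMINO_ACIDS
        for y in AMINO_ACIDS
        for z in AMINO_ACIDS
    ]
```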
-
-
-
iATC-NFMLP: Identifying Classes of Anatomical Therapeutic Chemicals Based on Drug Networks, Fingerprints, and Multilayer Perceptron
Authors: Shunrong Tang and Lei Chen
Background: The Anatomical Therapeutic Chemical (ATC) classification system is a widely accepted drug classification system. It classifies drugs according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties. Assigning drugs to the 14 classes of the first level of the system is an essential step toward understanding drug properties. Several multi-label classifiers have been proposed to identify drug classes. Although they performed well, most of them adopted only drug relationships, or features derived from these relationships, and did not directly employ the essential properties of drugs. Thus, these classifiers still have room for improvement. Objective: This study aimed to build a novel and powerful multi-label classifier that identifies the first-level ATC classes of given drugs. Methods: A multi-label classifier, iATC-NFMLP, was proposed. Two feature types were adopted to encode each drug: the first was derived from drug relationships via a network embedding algorithm, and the second represented drug fingerprints. A multilayer perceptron with the sigmoid activation function was trained on these features to construct the classifier. Results: 10-fold cross-validation indicated that combining the two feature types improved the performance of the classifier. On the benchmark dataset of 3883 drugs, the jackknife test yielded an accuracy of 82.76% and an absolute true of 79.27%. Conclusion: iATC-NFMLP outperformed all previous classifiers.
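The classification step and the two reported metrics can be sketched as follows. Each of the 14 sigmoid outputs is thresholded independently (a 0.5 threshold is assumed here), and the metrics follow the definitions commonly used for multi-label ATC prediction: accuracy averages |true ∩ predicted| / |true ∪ predicted| over drugs, and absolute true is the fraction of drugs whose predicted label set matches the true set exactly. The function names are hypothetical.

```python
def predict_labels(probabilities, threshold=0.5):
    """Map per-class sigmoid outputs (one per first-level ATC class) to a label set."""
    return {c for c, p in enumerate(probabilities) if p >= threshold}


def accuracy_and_absolute_true(true_sets, pred_sets):
    """Multi-label accuracy and absolute true over paired label sets."""
    n = len(true_sets)
    accuracy = sum(
        len(t & p) / len(t | p) if (t | p) else 1.0
        for t, p in zip(true_sets, pred_sets)
    ) / n
    absolute_true = sum(t == p for t, p in zip(true_sets, pred_sets)) / n
    return accuracy, absolute_true
```

Absolute true is the stricter of the two, which is why it is lower than accuracy in the reported jackknife results.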
-
-
-
A Tagging SNP Set Method Based on Network Community Partition of Linkage Disequilibrium and Node Centrality
Authors: Yulin Zhang, Qiang Wan, Xiaochun Cheng, Guangyang Lu, Shudong Wang and Sicheng He
Aims: To solve the tagSNP selection problem with a network method and to reconstruct the genotypes of unknown individuals from tagSNPs with a prediction method. Background: As genetic markers, SNPs have been used for linkage analysis of genetic diseases in genome-wide association studies. The genetic information carried by SNPs is redundant in regions of high linkage disequilibrium in the human genome. Therefore, a subset of informative SNPs (a tagSNP set) is sufficient to represent the remaining SNPs, greatly reducing genotyping cost and computational complexity. Methods: A novel tagSNP set selection method, NCCRT, is proposed; it combines community partition of the SNP network with node centrality ranking to select tagSNPs from genotype data. Results: The method was tested on three datasets of 176, 169, and 56 SNPs from the genes ASAH1, HTR2A, and OLFM4, respectively. The experimental results show that our method achieves the best prediction accuracy and stability for ASAH1 and HTR2A. Conclusion: Compared with random sampling, the greedy algorithm, and the TSMI algorithm, our method does not rely on causal SNP selection, yet it can quickly identify tagSNP nodes and improve prediction accuracy.
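The selection idea (partition the LD network into communities, then take the most central SNP of each) can be sketched in a few lines. This is a simplification, not NCCRT itself: connected components of a thresholded r² graph stand in for a proper community-detection algorithm, degree stands in for whatever centrality measure the authors rank by, and the 0.8 threshold and function name are assumptions.

```python
from collections import defaultdict


def select_tag_snps(r2, threshold=0.8):
    """Pick one representative SNP per LD community.

    r2: dict mapping (snp_a, snp_b) -> pairwise LD r^2.
    An edge joins two SNPs when r^2 >= threshold; each connected component is
    treated as a community (simplification), and its highest-degree SNP is kept.
    """
    adj = defaultdict(set)
    nodes = set()
    for (a, b), value in r2.items():
        nodes.update((a, b))
        if value >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    tags, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Collect one community by depth-first search.
        stack, community = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            community.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # Degree centrality: the SNP in high LD with the most others; ties go to the first name.
        tags.append(max(sorted(community), key=lambda s: len(adj[s])))
    return tags
```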
-
-
-
NeuMF: Predicting Anti-cancer Drug Response Through a Neural Matrix Factorization Model
Authors: Hui Liu, Jian Yu, Xiangzhi Chen and Lin Zhang
Background: Predicting anti-cancer drug response is urgently needed for individualized therapy. Measuring it with wet-lab experiments is costly and time-consuming. Artificial intelligence-based models for predicting drug response are available, but their prediction accuracy remains a challenge. Objective: To construct a model that predicts drug response values for unknown cell lines and to analyze potential associations between drugs in sparse data. Methods: We propose a Neural Matrix Factorization (NeuMF) framework to predict the unknown responses of cell lines to drugs. The model uses a deep neural network to learn the latent variables of drugs and cell lines. In NeuMF, the inputs and the parameters of the multi-layer neural network are optimized simultaneously by gradient descent to minimize the reconstruction error between the predicted and true values of the observed entries. The unknown entries can then be recovered by propagating the latent variables to the output layer. Results: Experiments on the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets compare NeuMF with three other state-of-the-art methods. NeuMF reduces the need to construct drug or cell line similarities and instead mines correlations from the response matrix itself, avoiding the inclusion of redundant noise. NeuMF obtained drug-averaged PCC_sr values of 0.83 and 0.84 on the two datasets, demonstrating a substantial improvement in prediction. Key parameters of NeuMF, such as the global effect removal strategy and the input layer scale, are also discussed. Finally, case studies show that NeuMF can better learn the latent characteristics of drugs; for example, Irinotecan and Topotecan are found to act on the same pathway, TOP1. These conclusions are consistent with existing biological findings.
Conclusion: NeuMF achieves better prediction accuracy than existing models, and its output is biologically interpretable. NeuMF also helps analyze the correlations between drugs.
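NeuMF learns latent variables with a multi-layer network; its simplest (linear) special case is ordinary matrix factorization, sketched below. As in the abstract, the latent factors are fitted by gradient descent on the observed entries only, and a global mean is subtracted first as one simple form of global effect removal. The rank, learning rate, epoch count, and function name are assumptions, and the neural layers of NeuMF are not reproduced.

```python
import numpy as np


def factorize(R, mask, rank=2, lr=0.05, epochs=2000, seed=0):
    """Fit R ~ mu + U @ V.T using observed entries only (mask == 1).

    R: cell line x drug response matrix (unobserved entries may hold any value).
    Returns the completed matrix, including predictions for unobserved entries.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_drugs = R.shape
    observed = mask == 1
    mu = R[observed].mean()                         # global effect, removed then added back
    U = 0.1 * rng.standard_normal((n_cells, rank))  # cell line latent variables
    V = 0.1 * rng.standard_normal((n_drugs, rank))  # drug latent variables
    for _ in range(epochs):
        E = np.where(observed, mu + U @ V.T - R, 0.0)  # residuals on observed entries
        # Gradients of 0.5 * sum(E**2), computed before either factor is updated.
        grad_U, grad_V = E @ V, E.T @ U
        U -= lr * grad_U
        V -= lr * grad_V
    return mu + U @ V.T
```

Entries where `mask == 0` in the returned matrix are the predicted responses for untested cell line and drug pairs.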
-
-
-
SIMEON: Prediction of Chemical-protein Interaction via Stacked Bi-GRU-normalization Network and External Biomedical Knowledge
Authors: Xiaolei Ma, Yang Lu, Yinan Lu and Mingyang Jiang
Background: Chemical compounds and proteins/genes are an important class of entities in biomedical research, and their interactions play a key role in precision medicine, drug discovery, basic clinical research, and knowledge base construction. Many computational methods have been proposed to identify chemical–protein interactions. However, most of these models cannot capture long-distance dependencies between chemicals and proteins, the neural networks they use suffer from vanishing or exploding gradients, and few take the chemical structure characteristics of the compound into account. Methods: To address these limitations, we propose a novel model, SIMEON, to identify chemical–protein interactions. First, the input sequence is represented with a pre-trained language model, and an attention mechanism is used to uncover the contribution of different words to entity relations and potential semantic information. Second, key features are extracted by a multi-layer stacked Bidirectional Gated Recurrent Unit (Bi-GRU)-normalization residual network module, which resolves higher-order dependencies while overcoming network degradation. Finally, the representation is enhanced with external knowledge about the chemical structure characteristics of the compound. Results: Experimental results show that our stacked integration model combines the complementary advantages of Bi-GRU, normalization methods, and external knowledge to improve model performance. Conclusion: Our proposed model performs well in chemical–protein interaction extraction and can serve as a useful complement to biological experiments in identifying chemical–protein interactions.
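The Bi-GRU-normalization residual module can be illustrated with a forward-pass sketch in NumPy. The abstract does not give the exact layer composition, so the following choices are assumptions: each direction has half the input width so that the concatenated Bi-GRU output can be added back to its input (the residual connection that counters network degradation), layer normalization is applied after the addition, biases are omitted, and the weights are random rather than trained. Stacking means feeding one block's output into the next; the attention and external-knowledge components are not shown, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_pass(X, params):
    """Run a single-direction GRU over X (seq_len x d_in); returns seq_len x d_hidden."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(Uz.shape[0])
    outputs = []
    for x in X:
        z = sigmoid(Wz @ x + Uz @ h)                  # update gate
        r = sigmoid(Wr @ x + Ur @ h)                  # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))      # candidate state
        h = (1 - z) * h + z * h_tilde
        outputs.append(h)
    return np.stack(outputs)


def layer_norm(X, eps=1e-5):
    """Normalize each position's feature vector to zero mean and unit variance."""
    mean = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mean) / np.sqrt(var + eps)


def bigru_residual_block(X, fwd_params, bwd_params):
    """Bi-GRU (2 * d_hidden == d_in) + residual connection + layer normalization."""
    forward = gru_pass(X, fwd_params)
    backward = gru_pass(X[::-1], bwd_params)[::-1]    # re-reverse to align positions
    return layer_norm(X + np.concatenate([forward, backward], axis=-1))


def init_params(d_in, d_hidden, rng):
    """Random (untrained) weights in the order Wz, Uz, Wr, Ur, Wh, Uh."""
    shapes = [(d_hidden, d_in), (d_hidden, d_hidden)] * 3
    return [0.1 * rng.standard_normal(s) for s in shapes]
```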
-
-
-
Identification of Key Prognosis-related microRNAs in Early- and Late- Stage Gynecological Cancers Based on TCGA Data
Authors: Venugopala R. Mekala, Chiang Hui-Shan, Chang Jan-Gowth and Ka-Lok Ng
Background: Gynecological cancers (GCs), which are mostly diagnosed at late stages of the disease, remain a leading cause of mortality in women worldwide. microRNAs (miRNAs) have been explored as diagnostic and prognostic cancer biomarkers. Evaluating miRNA signatures to develop prognostic models could help identify high-risk patients with GC. In particular, identifying miRNAs associated with different cancer stages can benefit patients diagnosed with cancer. Objective: This study aimed to identify potential miRNA signatures for constructing optimal prognostic models in three major GCs using The Cancer Genome Atlas (TCGA) database. Methods: Stage-specific Differentially Expressed microRNAs (DEmiRs) were identified and validated in public and in-house expression datasets. Various bioinformatics analyses were then used to identify potential DEmiRs associated with the disease. All DEmiRs were analyzed with three penalized Cox regression models: lasso, adaptive lasso, and elastic net. The combined outcomes were evaluated using Best Subset Regression (BSR). The prognostic DEmiR models were evaluated with Kaplan–Meier plots of patient risk scores. The biological pathways of the potential DEmiRs were identified by functional enrichment analysis. Results: A total of 65 DEmiRs were identified in the three cancer types; among them, 17 showed dysregulated expression in public cervical cancer datasets, the expression profiles of 9 DEmiRs were altered in CCLE-OV cells, and those of 10 were dysregulated in CCLE-UCEC cells. Additionally, ten miRNAs showed the same expression profiles as the DEmiRs in three OV cancer cell lines. Approximately 30 DEmiRs were experimentally validated in particular cancers. Furthermore, 23 DEmiRs were correlated with patients' overall survival.
The combined analysis of the three penalized Cox models and BSR identified eight potential DEmiRs. Five models based on five DEmiRs (hsa-mir-526b, hsa-mir-508, and hsa-mir-204 in CESC samples and hsa-mir-137 and hsa-mir-1251 in UCEC samples) successfully distinguished high-risk from low-risk patients. Functional enrichment analysis revealed that these DEmiRs play crucial roles in GCs. Conclusion: We report potential DEmiR-based prognostic models for identifying high-risk patients with GC and demonstrate the roles of miRNA signatures in early- and late-stage GCs.
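Once a penalized Cox model has produced coefficients, patients are typically scored, split into risk groups, and compared with Kaplan–Meier curves. The sketch below assumes the coefficients are already estimated (the lasso, adaptive lasso, elastic net, and BSR steps are not shown) and uses a median split as the high/low-risk cut-off; the function names, miRNA names, and coefficient values in the example are hypothetical, and this is not the authors' code.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear predictor per patient: sum of Cox coefficient * miRNA expression."""
    return [
        sum(beta * patient[mirna] for mirna, beta in coefficients.items())
        for patient in expression
    ]


def risk_groups(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times: follow-up times; events: 1 if death was observed, 0 if censored.
    Returns a list of (time, survival probability).
    """
    survival, curve = 1.0, []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Running `kaplan_meier` separately on the high- and low-risk groups gives the two curves shown in a Kaplan–Meier plot; a formal comparison would normally add a log-rank test, which is not shown.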
-
-
-
Prediction and Motif Analysis of 2’-O-methylation Using a Hybrid Deep Learning Model from RNA Primary Sequence and Nanopore Signals
Authors: Shiyang Pan, Yuxin Zhang, Zhen Wei, Jia Meng and Daiyun Huang
Background: 2’-O-Methylation (2’-O-Me) is a post-transcriptional RNA modification that occurs on the ribose sugar moiety of all four nucleotides and is abundant in both coding and non-coding RNAs. Accurate prediction of each 2’-O-Me subtype (Am, Cm, Gm, Um) helps in understanding their roles in RNA metabolism and function. Objective: This study aims to build models that predict each subtype of 2’-O-Me from RNA sequence and nanopore signals and to exploit model interpretability for sequence motif mining. Methods: We first propose a novel deep learning model, DeepNm, which uses a multi-scale framework to better capture the sequence features of each subtype. Building on DeepNm, we then propose HybridNm, which combines sequences and nanopore signals through a dual-path framework. The nanopore signal-derived features are first passed through a convolutional layer and then merged with sequence features extracted at different scales for final classification. Results: Five-fold cross-validation on Nm-seq data shows that DeepNm outperforms two state-of-the-art 2’-O-Me predictors. After incorporating nanopore signal-derived features, HybridNm achieved further significant improvements. Through model interpretation, we identified subtype-specific motifs and also revealed motifs shared between subtypes. In addition, Cm, Gm, and Um shared motifs with the well-studied m6A RNA methylation, suggesting potential interplay among different RNA modifications and the complex nature of epitranscriptome regulation. Conclusion: The proposed frameworks can be useful tools for accurately predicting 2’-O-Me subtypes and revealing specific sequence patterns.
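A dual-path feature extractor in the spirit of HybridNm can be sketched in NumPy: one-hot encoded RNA is passed through convolutions of several widths (the multi-scale sequence path), per-position nanopore features are passed through their own convolution, and the pooled outputs are concatenated for a downstream classifier. The kernel widths, filter counts, ReLU plus global max pooling, the choice of per-base signal features, and all function names are assumptions; the weights are random, and the model's actual architecture is not reproduced.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA window as a len(seq) x 4 matrix (columns A, C, G, U)."""
    X = np.zeros((len(seq), len(NUCLEOTIDES)))
    for i, base in enumerate(seq):
        X[i, NUCLEOTIDES.index(base)] = 1.0
    return X


def conv_relu_maxpool(X, kernel):
    """Valid 1D convolution over positions, ReLU, then global max pooling.

    X: positions x channels; kernel: width x channels x filters.
    Returns one pooled value per filter.
    """
    width = kernel.shape[0]
    windows = np.stack([X[i:i + width] for i in range(X.shape[0] - width + 1)])
    activations = np.maximum(np.einsum("pwc,wcf->pf", windows, kernel), 0.0)
    return activations.max(axis=0)


def hybrid_features(seq, signal, seq_kernels, signal_kernel):
    """Concatenate multi-scale sequence features with convolved nanopore features.

    signal: positions x n_signal_features (hypothetical per-base nanopore summaries).
    """
    X = one_hot(seq)
    sequence_path = [conv_relu_maxpool(X, k) for k in seq_kernels]
    signal_path = conv_relu_maxpool(signal, signal_kernel)
    return np.concatenate(sequence_path + [signal_path])
```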