Current Bioinformatics - Volume 18, Issue 3, 2023
Drug Design and Disease Diagnosis: The Potential of Deep Learning Models in Biology
Early prediction and detection of human diseases can reduce transmission and give healthcare professionals ample time to plan subsequent diagnoses and treatment strategies. This, in turn, helps save lives and lowers medical costs. There is also an urgent need to design small chemical molecules to treat fatal disorders, given the high death rate of these diseases worldwide. A recent analysis of the published literature suggested that deep learning (DL)-based models can apply more promising algorithms to hybrid databases of chemical data. With this in mind, this review first discusses the concept of DL architectures and their applications in drug development and diagnostics. Although DL-based approaches have applications in many fields, the following sections focus on recent developments of DL-based techniques in biology, notably structure prediction, cancer drug development, COVID-19 infection diagnostics, and drug repurposing strategies. Each section summarizes several cutting-edge, recently developed DL-based techniques. We also introduce the approaches developed in our group, whose prediction accuracy is comparable to that of current computational models. We conclude by discussing the benefits and drawbacks of DL techniques and outlining future directions for data collection and the development of efficient computational models.
Predicting COVID-19 Severity Integrating RNA-Seq Data Using Machine Learning Techniques
A fundamental challenge in the fight against COVID-19 is developing reliable and accurate tools to predict disease progression in a patient. This information can be extremely useful in distinguishing hospitalized patients at higher risk of needing intensive care (ICU) from patients with low severity. How SARS-CoV-2 infection will evolve is still unclear. Methods: A novel pipeline was developed that integrates RNA-Seq data from different databases to obtain a genetic biomarker-based COVID-19 severity index using an artificial intelligence algorithm. The pipeline ensures robustness through multiple cross-validation processes at different steps. Results: CD93, RPS24, PSCA, and CD300E were identified as a COVID-19 severity gene signature. Furthermore, using this gene signature, an effective multi-class classifier capable of discriminating between control, outpatient, inpatient, and ICU COVID-19 patients was optimized, achieving an accuracy of 97.5%. Conclusion: In this research, a new intelligent pipeline was implemented to develop a specific gene signature that can detect the severity of COVID-19 in patients. Our approach to clinical decision support achieved excellent results, even when processing unseen samples. The system can be of great clinical utility for planning, organizing, and managing human and material resources, as well as for automatically classifying the severity of patients affected by COVID-19.
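The abstract does not name the classifier or the cross-validation scheme behind the 97.5% figure. As a hedged illustration of how a multi-class accuracy can be estimated with cross-validation on a small gene signature, the Python sketch below pairs stratified k-fold cross-validation with a nearest-centroid classifier. Both techniques, the function names, and any input data are assumptions for illustration, not the authors' pipeline.

```python
import math
import random
from collections import defaultdict
from statistics import mean


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean expression profile of each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in rows_by_class.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(y)
```

For example, with a hypothetical expression matrix `X` (one row per sample, one column per signature gene such as CD93, RPS24, PSCA and CD300E) and labels drawn from control, outpatient, inpatient and ICU, `cross_val_accuracy(X, y, k=5)` returns the fraction of samples classified correctly when each is held out once. Stratification keeps the four severity classes in similar proportions in every fold, which matters when some classes are small.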
Identifying Diagnostic Biomarkers of Breast Cancer Based on Gene Expression Data and Ensemble Feature Selection
Authors: Lingyu Li, Yousif A. Algabri and Zhi-Ping Liu
Background: In recent years, the identification of biomarkers or signatures based on gene expression profiling data has attracted much attention in bioinformatics. The successful discovery of breast cancer (BRCA) biomarkers will benefit early detection and help reduce the risk of BRCA among patients. Methods: This paper proposes an Ensemble Feature Selection method (abbreviated as EFSmarker) to screen biomarkers for BRCA from publicly available gene expression data. First, we employ twelve filter feature selection methods, namely median, variance, Chi-square, Relief, Pearson correlation, Spearman correlation, mutual information, the minimal-redundancy-maximal-relevance criterion, ridge regression, decision tree, and random forest with the Gini index and with the accuracy index, to calculate the importance (weights or coefficients) of all features on the training dataset. Second, we apply a logistic regression classifier on the test dataset to calculate the classification AUC value of the feature subset selected by each of the twelve methods. Third, we build an ensemble feature selection method by aggregating feature importance with classification AUC values. In particular, we define a feature importance score (FIS) to evaluate the importance of each feature across all feature selection methods. Finally, the features with higher FIS are taken as the identified biomarkers. Results: Guided by the FIS computed with EFSmarker, 12 genes (COL10A1, COL11A1, MMP11, LOC728264, FIGF, GJB2, INHBA, CD300LG, IGFBP6, PAMR1, CXCL2 and FXYD1) are regarded as diagnostic biomarkers for BRCA. Notably, COL10A1, ranked first with an FIS value of 0.663, is identified as the most credible biomarker. Gene and protein expression validation, functional enrichment analysis, literature checking, and independent dataset validation verify the effectiveness and efficiency of these selected biomarkers. Conclusion: Our proposed biomarker discovery strategy considers both the feature contribution and the prediction accuracy, and may also serve as a model for identifying unknown biomarkers for other diseases from high-throughput gene expression data. The source code and data are available at https://github.com/zpliulab/EFSmarker.
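The abstract states that the FIS aggregates each method's feature importance with the AUC of the subset that method selected, but it does not give the formula. The Python sketch below assumes one plausible form: rescale each method's absolute importances to [0, 1] and take an AUC-weighted average across methods. The normalization, the weighting, and the function names are assumptions, not EFSmarker's definition; the released code in the EFSmarker repository is the authority.

```python
def normalize(scores):
    """Min-max rescale one method's absolute importances to [0, 1]."""
    magnitudes = {f: abs(s) for f, s in scores.items()}
    lo, hi = min(magnitudes.values()), max(magnitudes.values())
    span = hi - lo
    return {f: (m - lo) / span if span else 0.0 for f, m in magnitudes.items()}


def feature_importance_score(importances, aucs):
    """Assumed FIS: AUC-weighted average of normalized importances across methods.

    importances: {method: {feature: raw importance or coefficient}}
    aucs:        {method: AUC of the feature subset that method selected}
    Returns {feature: score}, sorted from highest to lowest score.
    """
    total_auc = sum(aucs.values())
    fis = {}
    for method, scores in importances.items():
        weight = aucs[method] / total_auc
        for feature, value in normalize(scores).items():
            fis[feature] = fis.get(feature, 0.0) + weight * value
    return dict(sorted(fis.items(), key=lambda kv: kv[1], reverse=True))
```

Taking absolute values lets signed coefficients (for example from ridge regression) contribute by magnitude, and weighting by AUC lets methods whose selected subsets classify better count for more.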
The Most Accurate Way of Predicting Birth Weight in China: Zhuo’s Formula
Authors: Wei Zhang, Hong Yang, Xiaoyi Guo, Yijie Ding, Jingbo Qiu and Xiaohua Wang
Background: Body mass index (BMI) during pregnancy influences fetal weight, yet no studies have compared the predictive accuracy of birth weight formulas while taking it into account. Objective: This study aimed to identify the most accurate formula for predicting birth weight, especially in pregnant women with different BMIs. Methods: This was a prospective observational study. Using convenience sampling, participants who met the inclusion criteria were recruited at a tertiary hospital from January to March 2019. BMI was calculated from the pregnant woman's weight and height at the first obstetric visit. Estimated birth weights were predicted by five formulas based on the participants' uterine height and abdominal circumference at the last obstetric examination. The actual birth weight was measured in the delivery room. The root mean square error (RMSE), empirical cumulative distribution plot (ECDP), and Bland–Altman plot were used to determine the accuracy of the formulas in predicting birth weight. Results: A total of 1197 pregnant women were recruited. For Zhuo's formula, the RMSE was the smallest (348.7), the median value was the closest to 0 (20.0 g), and the difference was the smallest (-0.141 ± 11.511 g). In subgroup analysis, the RMSE of Zhuo's formula was the smallest in the low and normal BMI groups, and its difference on the Bland–Altman plot was the smallest (only 0.729 ± 10.440 g) in the overweight and obese group. Conclusion: Zhuo's formula has the highest accuracy for predicting birth weight across BMI groups and is therefore worth recommending for clinical use.
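The accuracy measures named in the Methods can be computed directly from paired predicted and actual weights. The Python sketch below shows RMSE and the Bland–Altman statistics (mean difference, its standard deviation, and the conventional 95% limits of agreement, mean ± 1.96 SD). It assumes differences are taken as predicted minus actual; the five birth weight formulas themselves, including Zhuo's, are not reproduced because the abstract does not state them.

```python
import math
from statistics import mean, stdev


def rmse(predicted, actual):
    """Root mean square error between predicted and actual birth weights."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def bland_altman(predicted, actual):
    """Bland–Altman summary: mean difference (bias), its SD, and 95% limits of agreement.

    Differences are taken as predicted minus actual (an assumed convention).
    """
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation; needs at least two pairs
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)
```

With hypothetical predictions `[3100, 3400, 2900, 3600]` g against actual weights `[3000, 3500, 3000, 3500]` g, every error is ±100 g, so `rmse` returns 100.0 and the mean difference is 0.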
Refining Protein Interaction Network for Identifying Essential Proteins
Authors: Houwang Zhang, Zhenan Feng and Chong Wu
Aim: The study aimed to reconstruct the protein-protein interaction network for the identification of essential proteins. Background: Essential proteins play an indispensable role in the survival and development of living organisms. Hence, identifying essential proteins from the protein interaction network (PIN) has become an active topic in bioinformatics. However, the accuracy of existing methods remains limited because protein-protein interaction data contain false positives. Objective: The objective of the study was to propose an efficient algorithm for reconstructing the protein-protein interaction network. Methods: In this paper, a method for refining the PIN based on three kinds of biological data (subcellular localization data, protein complex data, and gene expression data) is proposed. By evaluating each interaction within the original PIN, a refined, cleaner PIN is obtained. To verify the effectiveness of the refined PIN for identifying essential proteins, we applied eight network-based essential protein discovery methods (DC, BC, CC, LC, HC, SC, LAC, and NC) to it. Results: The experimental results demonstrate that refining the original PIN with our method can greatly improve the precision of essential protein identification. Conclusion: Our method can effectively enhance the protein-protein interaction network and improve the accuracy of identifying essential proteins. In the future, we plan to integrate more biological information into the refinement method and apply it to more species and more PIN-based discovery tasks, such as the identification of protein complexes or functional modules.
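The abstract does not specify how each interaction is scored against the three data sources. The Python sketch below assumes a simple rule: keep an interaction when the two proteins share a subcellular location, belong to the same known complex, or have expression profiles whose Pearson correlation is at or above a hypothetical threshold of 0.5. It then computes degree centrality (DC), the first of the eight methods listed. The rule, the threshold, and the function names are illustrative assumptions, not the authors' refinement method.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def refine_pin(edges, locations, complexes, expression, min_corr=0.5):
    """Keep an interaction only if at least one data source supports it (assumed rule)."""
    refined = []
    for u, v in edges:
        shares_location = bool(locations.get(u, set()) & locations.get(v, set()))
        shares_complex = any(u in c and v in c for c in complexes)
        co_expressed = (u in expression and v in expression
                        and pearson(expression[u], expression[v]) >= min_corr)
        if shares_location or shares_complex or co_expressed:
            refined.append((u, v))
    return refined


def degree_centrality(edges):
    """DC: number of interactions each protein keeps in the (refined) network."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree
```

Ranking proteins by DC on the refined edge list, rather than on the raw one, is the step where removing unsupported (possibly false-positive) interactions can change which proteins look essential.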
Drug Repositioning Based on a Multiplex Network by Integrating Disease, Gene, and Drug Information
Background: Developing new drugs is very expensive and the development cycle is relatively long, so using validated drugs to treat other diseases has broad development prospects and good economic benefits. Objective: The purpose of drug repositioning is to identify new indications for existing drugs. Beyond disease and drug information, other biomolecular information can also be integrated for drug repositioning, and integrating multiple types of biomolecular data can improve the predictive performance of drug repositioning models. Methods: This paper proposes a drug repositioning algorithm based on a multiplex network (the DRMN algorithm) that integrates disease, gene, and drug information. DRMN uses known disease-gene and gene-drug associations to connect a disease phenotype similarity network, a gene expression similarity network, and a drug response similarity network. These are then combined into a multiplex network, and the importance score of each node is calculated with the PageRank (PR) algorithm. Finally, disease-drug association scores are sorted to achieve drug repositioning. Results: DRMN was applied to two sample datasets. In both datasets, disease-drug association scores were calculated separately from disease PR values and drug PR values. Among the top 50% of association scores, many of the predicted disease-drug associations have been verified by existing results. Compared with other algorithms, DRMN also shows better performance. Conclusion: DRMN can effectively integrate multi-omics data for drug repositioning and obtain better prediction results.
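DRMN scores nodes with PageRank. The Python sketch below is a standard power-iteration PageRank on a weighted, undirected graph. It assumes the multiplex network is flattened into a single graph whose edges come from the three similarity layers plus the disease-gene and gene-drug links; the damping factor 0.85, the convergence tolerance, and the helper names are assumptions, and the rule DRMN uses to turn PR values into disease-drug scores is not reproduced.

```python
def build_graph(weighted_edges):
    """Flatten edges from all layers into one undirected weighted adjacency map."""
    graph = {}
    for u, v, w in weighted_edges:
        for a, b in ((u, v), (v, u)):
            neighbors = graph.setdefault(a, {})
            neighbors[b] = neighbors.get(b, 0.0) + w
    return graph


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on {node: {neighbor: weight}}."""
    nodes = list(graph)
    n = len(nodes)
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without edges is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            if out_weight[v]:
                # Each node passes rank to neighbors in proportion to edge weight.
                for u, w in graph[v].items():
                    new[u] += damping * rank[v] * w / out_weight[v]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

Because every node redistributes all of its rank, the scores always sum to 1, so disease and drug PR values remain comparable within one network.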
LPLSG: Prediction of lncRNA-protein Interaction Based on Local Network Structure
Authors: Wei Wang, Yongqing Wang, Bin Sun, Shihao Liang, Dong Liu, Hongjun Zhang and Xianfang Wang
Background: The interaction between RNA and protein plays an important role in life activities. Long non-coding RNAs (lncRNAs) have received extensive attention in recent years. Because RNA-protein interactions are tissue-specific and condition-specific, identifying lncRNA-protein interactions through wet-lab experiments is time-consuming and expensive. Objective: This paper proposes a prediction method based on the local structural similarity of the lncRNA-protein interaction (LPI) network. Methods: The method computes the local structural similarity in network space, maps it to the LPI space, and uses an innovative algorithm that combines Resource Allocation with an improved Collaborative Filtering algorithm to calculate potential LPIs. Conclusion: The AUPR and AUC of LPLSG are significantly better than those of five popular baseline methods. In addition, a case study shows that some of LPLSG's predictions on a real dataset have been verified by the NPInter v4.0 database and the literature.
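LPLSG combines Resource Allocation with an improved Collaborative Filtering step. As a hedged illustration of the resource-allocation idea only, the Python sketch below runs the standard two-step resource diffusion on a bipartite lncRNA-protein network: proteins known to bind a target lncRNA spread resource to their lncRNAs, which pass it back to proteins. LPLSG's local structural similarity and collaborative filtering are not reproduced, and the function name and toy identifiers are hypothetical.

```python
from collections import defaultdict


def resource_allocation_scores(interactions, lncrna):
    """Two-step resource diffusion on a bipartite lncRNA-protein network.

    Each protein known to bind `lncrna` starts with one unit of resource.
    Step 1: every such protein splits its resource equally among its lncRNAs.
    Step 2: every lncRNA splits what it received equally among its proteins.
    Returns scores for proteins not yet linked to `lncrna`, highest first.
    """
    proteins_of = defaultdict(set)
    lncrnas_of = defaultdict(set)
    for l, p in interactions:
        proteins_of[l].add(p)
        lncrnas_of[p].add(l)

    # Step 1: protein -> lncRNA
    at_lncrna = defaultdict(float)
    for p in proteins_of[lncrna]:
        for l in lncrnas_of[p]:
            at_lncrna[l] += 1.0 / len(lncrnas_of[p])

    # Step 2: lncRNA -> protein
    at_protein = defaultdict(float)
    for l, r in at_lncrna.items():
        for p in proteins_of[l]:
            at_protein[p] += r / len(proteins_of[l])

    known = proteins_of[lncrna]
    candidates = {p: s for p, s in at_protein.items() if p not in known}
    return dict(sorted(candidates.items(), key=lambda kv: kv[1], reverse=True))
```

Dividing by degree at each step penalizes hub nodes, so a candidate protein reached through a low-degree lncRNA scores higher than one reached through a promiscuous lncRNA.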
