Volume 18, Issue 3

Current Bioinformatics - Volume 18, Issue 3, 2023

- Explainable Artificial Intelligence for Protein Function Prediction: A Perspective View
  
  By Nguyen Quoc Khanh Le
  
  https://doi.org/10.2174/1574893618666230220120449

- Drug Design and Disease Diagnosis: The Potential of Deep Learning Models in Biology
  
  Authors: Sarojini Sreeraman, Mayuri P. Kannan, Raja Babu Singh Kushwah, Vickram Sundaram, Alaguraj Veluchamy, Anand Thirunavukarasou and Konda M. Saravanan
  
  https://doi.org/10.2174/1574893618666230227105703
  
  Early prediction and detection reduce the transmission of human diseases and give healthcare professionals ample time to plan subsequent diagnosis and treatment strategies. This, in turn, helps save more lives and lowers medical costs. Designing small chemical molecules to treat fatal disorders is also urgently needed to address the high death rate of these diseases worldwide. A recent analysis of published literature suggested that deep learning (DL) based models apply more potential algorithms to hybrid databases of chemical data. Considering the above, we first discuss the concept of DL architectures and their applications in drug development and diagnostics in this review. Although DL-based approaches have applications in several fields, the following sections of the article focus on recent developments of DL-based techniques in biology, notably in structure prediction, cancer drug development, COVID infection diagnostics, and drug repurposing strategies. Each review section summarizes several cutting-edge, recently developed DL-based techniques. Additionally, we introduce the approaches developed in our group, whose prediction accuracy is comparable with that of current computational models. We conclude the review by discussing the benefits and drawbacks of DL techniques and outlining future paths for data collection and the development of efficient computational models.
  

- Predicting COVID-19 Severity Integrating RNA-Seq Data Using Machine Learning Techniques
  
  Authors: Javier Bajo-Morales, Daniel Castillo-Secilla, Luis J. Herrera, Octavio Caba, Jose Carlos Prados and Ignacio Rojas
  
  https://doi.org/10.2174/1574893617666220718110053
  
  A fundamental challenge in the fight against COVID-19 is the development of reliable and accurate tools to predict disease progression in a patient. This information can be extremely useful in distinguishing hospitalized patients at higher risk of needing ICU care from patients with low severity. How SARS-CoV-2 infection will evolve is still unclear. Methods: A novel pipeline was developed that can integrate RNA-Seq data from different databases to obtain a genetic biomarker COVID-19 severity index using an artificial intelligence algorithm. Our pipeline ensures robustness through multiple cross-validation processes in different steps. Results: CD93, RPS24, PSCA, and CD300E were identified as COVID-19 severity gene signatures. Furthermore, using the obtained gene signature, an effective multi-class classifier capable of discriminating between control, outpatient, inpatient, and ICU COVID-19 patients was optimized, achieving an accuracy of 97.5%. Conclusion: In summary, during this research, a new intelligent pipeline was implemented to develop a specific gene signature that can detect the severity of patients suffering from COVID-19. Our approach to clinical decision support systems achieved excellent results, even when processing unseen samples. Our system can be of great clinical utility for planning, organizing and managing human and material resources, as well as for automatically classifying the severity of patients affected by COVID-19.
  

- Identifying Diagnostic Biomarkers of Breast Cancer Based on Gene Expression Data and Ensemble Feature Selection
  
  Authors: Lingyu Li, Yousif A. Algabri and Zhi-Ping Liu
  
  https://doi.org/10.2174/1574893618666230111153243
  
  Background: In recent years, the identification of biomarkers or signatures based on gene expression profiling data has attracted much attention in bioinformatics. The successful discovery of breast cancer (BRCA) biomarkers will support early detection and thereby help reduce the risk of BRCA among patients. Methods: This paper proposes an Ensemble Feature Selection method to screen biomarkers (abbreviated as EFSmarker) for BRCA from publicly available gene expression data. Firstly, we employ twelve filter feature selection methods, namely median, variance, Chi-square, Relief, Pearson and Spearman correlation, mutual information, minimal-redundancy-maximal-relevance criterion, ridge regression, decision tree and random forest with Gini index and accuracy index, to calculate the importance (weights or coefficients) of all features on the training dataset. Secondly, we apply the logistic regression classifier on the test dataset to calculate the classification AUC value of each feature subset individually selected by the twelve methods. Thirdly, we provide an ensemble feature selection method by aggregating feature importance with classification AUC value. In particular, we establish a feature importance score (FIS) to evaluate the importance of each feature across all feature selection methods. Finally, the features with higher FIS are taken as identified biomarkers. Results: Guided by the FIS index produced by the EFSmarker method, 12 genes (COL10A1, COL11A1, MMP11, LOC728264, FIGF, GJB2, INHBA, CD300LG, IGFBP6, PAMR1, CXCL2 and FXYD1) are regarded as diagnostic biomarkers for BRCA. In particular, COL10A1, ranked first with an FIS value of 0.663, is identified as the most credible biomarker. Gene and protein expression validation, functional enrichment analysis, literature checking and independent dataset validation verify the effectiveness and efficiency of these selected biomarkers. Conclusion: Our proposed biomarker discovery strategy not only utilizes the feature contribution but also considers the prediction accuracy simultaneously, and it may also serve as a model for identifying unknown biomarkers for other diseases from high-throughput gene expression data. The source code and data are available at https://github.com/zpliulab/EFSmarker.
  

- The Most Accurate Way of Predicting Birth Weight in China: Zhuo’s Formula
  
  Authors: Wei Zhang, Hong Yang, Xiaoyi Guo, Yijie Ding, Jingbo Qiu and Xiaohua Wang
  
  https://doi.org/10.2174/1574893618666230126095738
  
  Background: Pregnancy body mass index (BMI) influences fetal weight, yet no studies have compared the predictive accuracy of formulas after taking it into account. Objective: This study aimed to find the most accurate formula for predicting birth weight, especially in pregnant women with different BMIs. Methods: This is a prospective observational study. Using convenience sampling, participants who met the inclusion criteria were recruited in a tertiary hospital from January to March 2019. BMI was calculated from the pregnant woman’s weight and height at the first obstetric visit. The estimated birth weights were predicted by five formulas based on participants’ uterine height and abdominal circumference at the last obstetric examination. The actual birth weight was measured in the delivery room. The root mean square error (RMSE), empirical cumulative distribution plot (ECDP) and Bland–Altman plot were used to determine the accuracy of the formulas in predicting birth weight. Results: A total of 1197 pregnant women were recruited. The RMSE, median value and difference of Zhuo’s formula in predicting the actual birth weight were the smallest (348.7), the closest to 0 (20.0) g, and the smallest (-0.141 ± 11.511) g, respectively. In subgroup analysis, the RMSE of Zhuo’s formula was the smallest in the low and normal BMI groups, and the difference of Zhuo’s formula by Bland–Altman plot was the smallest (only 0.729 ± 10.440) g in the overweight and obese group. Conclusion: Zhuo’s formula for predicting birth weight has the highest accuracy in different BMI groups. Thus, it is worth recommending for clinical use.
  

- Refining Protein Interaction Network for Identifying Essential Proteins
  
  Authors: Houwang Zhang, Zhenan Feng and Chong Wu
  
  https://doi.org/10.2174/1574893618666230217140446
  
  Aim: The study aimed to reconstruct the protein-protein interaction network for the identification of essential proteins. Background: In a living organism, essential proteins play an indispensable role in its survival and development. Hence, how to identify essential proteins from the protein interaction network (PIN) has become a hot topic in the field of bioinformatics. However, the accuracy of existing methods for identifying essential proteins is still limited due to false positives in the protein-protein interaction data. Objective: The objective of the study was to propose an efficient algorithm for the reconstruction of a protein-protein interaction network. Methods: In this paper, a method for the refinement of PIN based on three kinds of biological data (subcellular localization data, protein complex data, and gene expression data) is proposed. By evaluating each interaction within the original PIN, a refined, clean PIN can be obtained. To verify the effectiveness of the refined PIN for the identification of essential proteins, we applied eight network-based essential protein discovery methods (DC, BC, CC, LC, HC, SC, LAC, and NC) to it. Results: Based on the obtained experimental results, we demonstrated that the precision for identifying essential proteins could be greatly improved by refining the original PIN using our method. Conclusion: Our method could effectively enhance the protein-protein interaction network and improve the accuracy of identifying essential proteins. In the future, we plan to integrate more biological information to enhance our refinement method and apply it to more species and more PIN-based discovery tasks, like the identification of protein complexes or functional modules.
  

- Drug Repositioning Based on a Multiplex Network by Integrating Disease, Gene, and Drug Information
  
  Authors: Gang Zhou, Chenxu Xuan, Yan Wang, Bai Zhang, Hanwen Wu and Jie Gao
  
  https://doi.org/10.2174/1574893618666230223114427
  
  Background: Developing new drugs is very expensive and the cycle is relatively long, so using validated drugs to treat other diseases has broad development prospects and good economic benefits. Objective: The purpose of drug repositioning is to identify other indications for existing drugs. In addition to disease and drug information, other biomolecular information can also be integrated for drug repositioning. Integrating multiple biomolecular data of different types can improve the predictive performance of drug repositioning models. Methods: This paper proposes a drug repositioning algorithm based on a multiplex network (DRMN algorithm) by integrating disease, gene, and drug information. The DRMN algorithm uses known disease-gene and gene-drug associations to connect a disease phenotype similarity network, a gene expression similarity network, and a drug response similarity network. These are then combined into a multiplex network, and the importance score of each node is calculated by the PageRank (PR) algorithm. Finally, disease-drug association scores are sorted to achieve drug repositioning. Results: The DRMN algorithm is applied to two sets of sample data. Disease-drug association scores are calculated separately from disease PR values and drug PR values in both datasets. In the top 50% of association scores, many disease-drug association predictions have been verified by existing results. Compared with other algorithms, the DRMN algorithm also shows better performance. Conclusion: The DRMN algorithm can effectively integrate multi-omics data for drug repositioning and obtain better prediction results.
  

- LPLSG: Prediction of lncRNA-protein Interaction Based on Local Network Structure
  
  Authors: Wei Wang, Yongqing Wang, Bin Sun, Shihao Liang, Dong Liu, Hongjun Zhang and Xianfang Wang
  
  https://doi.org/10.2174/1574893618666230223143914
  
  Background: The interaction between RNA and protein plays an important role in life activities. Long ncRNAs (lncRNAs) are large non-coding RNAs that have received extensive attention in recent years. Because the interaction between RNA and protein is tissue-specific and condition-specific, it is time-consuming and expensive to predict the interaction between lncRNA and protein based on biological wet experiments. Objective: The contribution of this paper is to propose a prediction method based on the local structural similarity of the lncRNA-protein interaction (LPI) network. Methods: The method computes the local structure similarity of the network space, maps it to the LPI space, and uses an innovative algorithm that combines Resource Allocation with an improved Collaborative Filtering algorithm to calculate the potential LPI. Conclusion: AUPR and AUC are significantly better than those of the five popular baseline methods. In addition, the case study shows that some LPLSG predictions on the actual data set have been verified by the NPInterV4.0 database and the literature.
  
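The drug-repositioning abstract above scores network nodes with the PageRank (PR) algorithm. As a rough illustration only, the Python sketch below runs a standard power-iteration PageRank on a small, made-up single-layer network. The node names (`d1`, `g1`, `r1`, ...), the edge list, the helper names `pagerank` and `undirected`, and the damping factor 0.85 are all assumptions for this example. The abstract does not describe how DRMN builds its multiplex network or turns PR values into association scores, so none of that is reproduced here.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict mapping node -> list of out-neighbours."""
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            nbrs = graph.get(u, [])
            for v in nbrs:
                # Each node passes an equal share of its rank to every neighbour.
                new_rank[v] += damping * rank[u] / len(nbrs)
        converged = sum(abs(new_rank[u] - rank[u]) for u in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


def undirected(edges):
    """Turn a list of (a, b) pairs into a symmetric adjacency dict."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


# Hypothetical toy network: d* = diseases, g* = genes, r* = drugs.
edges = [
    ("d1", "d2"), ("d1", "g1"), ("d2", "g2"), ("g1", "g2"),
    ("g1", "r1"), ("g2", "r2"), ("r1", "r2"), ("g1", "r2"),
]
scores = pagerank(undirected(edges))
ranking = sorted(scores, key=scores.get, reverse=True)
for node in ranking:
    print(f"{node}: {scores[node]:.4f}")
```

The scores sum to 1, so they can be read as a relative importance ranking over all nodes in the toy network.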

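The birth-weight abstract above compares formulas by root mean square error (RMSE) and Bland–Altman analysis. The Python sketch below shows how these two summaries are conventionally computed. The weights are invented values in grams, the function names `rmse` and `bland_altman` are illustrative, and the 1.96 SD limits of agreement follow the usual Bland–Altman convention, which may differ from how the study reports its "difference". None of the five prediction formulas is reproduced here.

```python
import math
import statistics


def rmse(estimated, actual):
    """Root mean square error between paired estimates and measurements."""
    if len(estimated) != len(actual):
        raise ValueError("inputs must have the same length")
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))


def bland_altman(estimated, actual):
    """Return (bias, lower limit, upper limit) of agreement.

    Bias is the mean of (estimated - actual); the limits are bias +/- 1.96
    sample standard deviations of those differences.
    """
    if len(estimated) != len(actual):
        raise ValueError("inputs must have the same length")
    diffs = [e - a for e, a in zip(estimated, actual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical estimated and measured birth weights, in grams.
estimated = [3200.0, 3450.0, 2900.0, 3800.0, 3100.0]
actual = [3150.0, 3500.0, 3000.0, 3700.0, 3120.0]

error = rmse(estimated, actual)
bias, lower, upper = bland_altman(estimated, actual)
print(f"RMSE: {error:.1f} g")
print(f"Bias: {bias:.1f} g, limits of agreement: {lower:.1f} to {upper:.1f} g")
```

A formula with a small RMSE and a bias near zero with narrow limits of agreement would be judged more accurate under both criteria.
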
Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan