Volume 18, Issue 8

Current Bioinformatics - Volume 18, Issue 8, 2023

- An Overview of Protein Function Prediction Methods: A Deep Learning Perspective
  
  Authors: Emilio Ispano, Federico Bianca, Enrico Lavezzo and Stefano Toppo
  
  https://doi.org/10.2174/1574893618666230505103556
  
  Predicting the function of proteins is a major challenge in the scientific community, particularly in the post-genomic era. Traditional methods of determining protein functions, such as experiments, are accurate but can be resource-intensive and time-consuming. The development of Next Generation Sequencing (NGS) techniques has led to the production of a large number of new protein sequences, which has increased the gap between available raw sequences and verified annotated sequences. To address this gap, automated protein function prediction (AFP) techniques have been developed as a faster and more cost-effective alternative, aiming to maintain the same accuracy level. Several automatic computational methods for protein function prediction have recently been developed and proposed. This paper reviews the best-performing AFP methods presented in the last decade and analyzes their improvements over time to identify the most promising strategies for future methods. Identifying the most effective method for predicting protein function is still a challenge. The Critical Assessment of Functional Annotation (CAFA) has established an international standard for evaluating and comparing the performance of various protein function prediction methods. In this study, we analyze the best-performing methods identified in recent editions of CAFA. These methods are divided into five categories based on their principles of operation: sequence-based, structure-based, combined-based, ML-based and embeddings-based. After conducting a comprehensive analysis of the various protein function prediction methods, we observe that there has been a steady improvement in the accuracy of predictions over time, mainly due to the implementation of machine learning techniques. The present trend suggests that all the best-performing methods will use machine learning to improve their accuracy in the future. We highlight the positive impact that the use of machine learning (ML) has had on protein function prediction. Most recent methods developed in this area use ML, demonstrating its importance in analyzing biological information and making predictions. Despite these improvements in accuracy, there is still a significant gap compared with experimental evidence. The use of new approaches based on Deep Learning (DL) techniques will probably be necessary to close this gap, and while significant progress has been made in this area, there is still more work to be done to fully realize the potential of DL.
  

- A Comparison of Mutual Information, Linear Models and Deep Learning Networks for Protein Secondary Structure Prediction
  
  Authors: Saida S. M. Mahmoud, Beatrice Portelli, Giovanni D'Agostino, Gianluca Pollastri, Giuseppe Serra and Federico Fogolari
  
  https://doi.org/10.2174/1574893618666230417103346
  
  Background: Over the last several decades, predicting protein structures from amino acid sequences has been a core task in bioinformatics. Nowadays, the most successful methods employ multiple sequence alignments and can predict the structure with excellent performance. These predictions take advantage of all the amino acids at a given position and their frequencies. However, the effect of single amino acid substitutions in a specific protein tends to be hidden by the alignment profile. For this reason, single-sequence-based predictions attract interest even after accurate multiple-alignment methods have become available: the use of single sequences ensures that the effects of substitution are not confounded by homologous sequences. Objective: This work aims at understanding how the single-sequence secondary structure prediction of a residue is influenced by the surrounding ones. We aim at understanding how different prediction methods use single-sequence information to predict the structure. Methods: We compare mutual information, the coefficients of two linear models, and three deep learning networks. For the deep learning algorithms, we use the DeepLIFT analysis to assess the effect of each residue at each position in the prediction. Results: Mutual information and linear models quantify direct effects, whereas DeepLIFT applied on deep learning networks quantifies both direct and indirect effects. Conclusion: Our analysis shows how different network architectures use the information of single protein sequences and highlights their differences with respect to linear models. In particular, the deep learning implementations take into account context and single position information differently, with the best results obtained using the BERT architecture.
  

- Screening and Identification of Key Genes for Cervical Cancer, Ovarian Cancer and Endometrial Cancer by Combinational Bioinformatic Analysis
  
  Authors: Feng Pang, Dong Shi and Lin Yuan
  
  https://doi.org/10.2174/1574893618666230428095114
  
  Introduction: Cervical cancer, ovarian cancer and endometrial cancer are the top three cancers in women. With their rapid development, gene chip and high-throughput sequencing technologies have been widely used to study genomic functional omics data and to identify markers for disease diagnosis and treatment. At the same time, more and more public databases containing genetic data have appeared. Bioinformatic analysis of these data can offer new diagnostic perspectives on cell origin and differences. Methods: In this paper, three GEO datasets on cervical cancer, ovarian cancer and endometrial cancer were used to identify the DEGs (differentially expressed genes) common to the three cancers. The DEGs comprise 400 up-regulated genes and 157 down-regulated genes. Results: The results of GO (gene ontology) functional enrichment analysis show that the BP (biological process) changes of the DEGs are mainly in cell division, mitotic nuclear division, sister chromatid cohesion, and DNA replication. The CC (cell component) enrichments of the DEGs were mainly in the nucleoplasm, nucleus, condensed chromosome kinetochore, chromosome, centromeric region. The MF (molecular function) enrichments of the DEGs were mainly in protein binding. The results of the KEGG pathway analysis showed that the up-regulated DEGs were mainly enriched in retinoblastoma gene in the cell cycle, cellular senescence, oocyte meiosis, and pathways in cancer, while the down-regulated DEGs were enriched in thiamine metabolism and protein processing in the endoplasmic reticulum. Similarly, the function of the most significant module was enriched in cell division, condensed chromosome kinetochore, and microtubule motor activity. Conclusion: Four of the top 10 hub genes (CCNA2, CCNB1, CDC6 and CDK1) will help future biomedical experimental research.
  

- Non-small Cell Lung Cancer Survival Estimation Through Multi-omic Two-layer SVM: A Multi-omics and Multi-Sources Integrative Model
  
  Authors: Lorenzo Manganaro, Gianmarco Sabbatini, Selene Bianco, Paolo Bironzo, Claudio Borile, Davide Colombi, Paolo Falco, Luca Primo, Shaji Vattakunnel, Federico Bussolino and Giorgio Vittorio Scagliotti
  
  https://doi.org/10.2174/1574893618666230502102712
  
  Background: The new paradigm of precision medicine brought an increasing interest in survival prediction based on the integration of multi-omics and multi-sources data. Several models have been developed to address this task, but their performances are widely variable depending on the specific disease and are often poor on noisy datasets, such as in the case of non-small cell lung cancer (NSCLC). Objective: The aim of this work is to introduce a novel computational approach, named multi-omic two-layer SVM (mtSVM), and to exploit it to get a survival-based risk stratification of NSCLC patients from an ongoing observational prospective cohort clinical study named PROMOLE. Methods: The model implements a model-based integration by means of a two-layer feed-forward network of FastSurvivalSVMs, and it can be used to get individual survival estimates or survival-based risk stratification. Despite being designed for NSCLC, its range of applicability can potentially cover the full spectrum of survival analysis problems where integration of different data sources is needed, independently of the pathology considered. Results: The model is here applied to the case of NSCLC, and compared with other state-of-the-art methods, proving excellent performance. Notably, the model, trained on data from The Cancer Genome Atlas (TCGA), has been validated on an independent cohort (from the PROMOLE study), and the results were consistent. Gene-set enrichment analysis of the risk groups, as well as exome analysis, revealed well-defined molecular profiles, such as a prognostic mutational gene signature with potential implications in clinical practice.
  

- A Pan-cancer Analysis Reveals the Tissue Specificity and Prognostic Impact of Angiogenesis-associated Genes in Human Cancers
  
  Authors: Zhenshen Bao, Minzhen Liao, Wanqi Dong, Yanhao Huo, Xianbin Li, Peng Xu and Wenbin Liu
  
  https://doi.org/10.2174/1574893618666230518163353
  
  Introduction: Angiogenesis is one of the hallmarks of cancer and can impact the processes of cancer initiation, progression, and response to therapy. Background: Anti-angiogenic therapy is thus an encouraging therapeutic option to treat cancers, but the detailed angiogenic mechanisms and the association between angiogenesis and clinical outcome remain unknown in different cancers. Methods: Here, we systematically assess the impacts of 82 angiogenesis-associated genes (AAGs) in tumor tissue specificity and prognosis across 16 cancer types. Results: Results demonstrate that the expression patterns of the 82 AAGs can reflect the tumor tissue specificity, and high expressions of up-regulated AAGs are significantly associated with poor prognosis of cancer. We further define a prognostic score for predicting overall survival (OS) based on the expressions of up-regulated AAGs and confirm its reliable predictive ability. Results indicate that a low prognostic score demonstrates a superior OS and vice versa. Conclusion: The results of this study will contribute to the understanding of different tumor angiogenesis mechanisms in various tissues and cancer-personalized anti-angiogenic treatment. The code of our analysis can be accessed at https://github.com/ZhenshenBao/AAGs_analysis.git.
  

- Multi-channel Partial Graph Integration Learning of Partial Multi-omics Data for Cancer Subtyping
  
  Authors: Qing-Qing Cao, Jian-Ping Zhao and Chun-Hou Zheng
  
  https://doi.org/10.2174/1574893618666230519145545
  
  Background: The appearance of cancer subtypes with different clinical significance fully reflects the high heterogeneity of cancer. At present, the method of multi-omics integration has become more and more mature. However, in the practical application of the method, the omics of some samples are missing. Objective: The purpose of this study is to establish a deep model that can effectively integrate and express partial multi-omics data to accurately identify cancer subtypes. Methods: We proposed a novel partial multi-omics learning model for cancer subtypes, MPGIL (Multi-channel Partial Graph Integration Learning). MPGIL has two main components. Firstly, it obtains more lateral adjacency information between samples within the omics through the multi-channel graph autoencoders based on high-order proximity. To reduce the negative impact of missing samples, the weighted fusion layer is introduced to replace the concatenate layer to learn the consensus representation across multi-omics. Secondly, a classifier is introduced to ensure that the consensus representation is representative of clustering. Finally, subtypes were identified by K-means. Results: This study compared MPGIL with other multi-omics integration methods on 16 datasets. The clinical and survival results show that MPGIL can effectively identify subtypes. Three ablation experiments are designed to highlight the importance of each component in MPGIL. A case study of AML was conducted. The differentially expressed gene profiles among its subtypes fully reveal the high heterogeneity of cancer. Conclusion: MPGIL can effectively learn the consistent expression of partial multi-omics datasets and discover subtypes, and shows more significant performance than the state-of-the-art methods.
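
The MPGIL abstract above mentions replacing a concatenation layer with a weighted fusion layer so that samples with missing omics can still receive a consensus representation. The abstract does not give the formula, so the following is only a minimal sketch of one plausible reading: a weighted average over the channels that are present, renormalised per sample. The function name `fuse_sample`, the fixed `channel_weights`, and the toy embeddings are all hypothetical and are not taken from the paper, where such weights would presumably be learned.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def fuse_sample(
    embeddings: Sequence[Optional[Vector]],
    weights: Sequence[float],
) -> Vector:
    """Weighted average of the available per-omics embeddings for one sample.

    embeddings[k] is the embedding from omics channel k, or None when that
    omics layer is missing for this sample. weights[k] is a non-negative
    channel weight (an assumption: fixed here, learned in a real model).
    """
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per omics channel")

    # Keep only the channels that were actually measured for this sample.
    present = [(z, w) for z, w in zip(embeddings, weights) if z is not None]
    total = sum(w for _, w in present)
    if total <= 0:
        raise ValueError("sample has no usable omics channel")

    dim = len(present[0][0])
    if any(len(z) != dim for z, _ in present):
        raise ValueError("all channel embeddings must share one dimension")

    # Renormalise over the present channels, so a missing channel neither
    # shrinks the result nor has to be filled with zeros as concatenation would.
    return [sum(w * z[i] for z, w in present) / total for i in range(dim)]


# Toy data (hypothetical): 3 samples, 2 omics channels, 2-dimensional embeddings.
samples = [
    [[1.0, 0.0], [3.0, 2.0]],  # both channels present
    [[2.0, 2.0], None],        # second channel missing
    [None, [0.0, 4.0]],        # first channel missing
]
channel_weights = [1.0, 3.0]

consensus = [fuse_sample(s, channel_weights) for s in samples]
for row in consensus:
    print(row)
```

Unlike concatenation, the fused vector keeps the same dimension whichever channels are present, so a downstream clustering step such as K-means can treat complete and partial samples uniformly.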
  

