Volume 20, Issue 8

Current Bioinformatics - Volume 20, Issue 8, 2025

- A Review of Biosequences Alignment, Matching, and Mining Based on GPU
  
  Authors: Xianghua Kong, Cong Shen and Jijun Tang
  
  https://doi.org/10.2174/0115748936353476241230105816
  
  Sequence alignment, pattern matching, and mining are cornerstones of bioinformatics, supporting the identification of genome structure, protein function, and biological metabolic regulatory networks. As data volumes have grown, parallel sequential pattern recognition methods have gained attention because they speed up processing. This review summarizes GPU-based sequence alignment, pattern matching, and mining, together with the tools and their applications in bioinformatics. After an overview of the background, the review first introduces the concepts and databases of sequence alignment, pattern matching, and mining. Then, the basic architecture and parallel computing principles of GPUs are briefly described. Next, the design of GPU-based algorithms and optimization strategies for sequence alignment, pattern matching, and mining is described in detail. By comparing and analyzing existing research, the advantages and challenges of GPU applications in bioinformatics are summarized. Finally, future research directions are discussed, including the further development of algorithms combined with machine learning and deep learning.
  

- Exploiting Gene Expression Signatures in Breast Cancer Cell Lines to Unveil Novel Drug Candidates and Synergistic Combinations
  
  Authors: Hsueh-Chuan Liu, Chia-Wei Weng and Ka-Lok Ng
  
  https://doi.org/10.2174/0115748936341887241122090418
  
  Aim
  This study aimed to identify therapeutic drugs for breast cancer, the most common cancer affecting women worldwide, using one primary and two metastatic breast tumor cell lines.
  Background
  Investigating the changes in gene expression triggered by drugs offers a robust method for uncovering potential new treatments. By analyzing the impact of drugs on gene activity, scientists can unravel the molecular mechanisms within cells, understand the effects of drugs, identify opportunities for drug repositioning, and predict patient responses to treatments.
  Objective
  Our approach involved two main strategies: analyzing drug-perturbed gene expression profiles and leveraging drug-induced gene expression profiles. First, we assessed how drugs affect the expression of target genes in a dose-dependent manner, determining whether they inhibit or activate gene expression. This analysis could inform the identification of new potential drugs. Second, we grouped drugs based on their expression profiles to explore potential synergistic effects.
  Methods
  Our methodology has involved quantifying gene profile changes relative to drug dosage, categorizing effects as up-regulating or down-regulating, and employing functional enrichment with cancer hallmark annotations to predict drugs with potential for cancer treatment. Additionally, we have determined the optimal number of drug groups with similar effects on gene expression and explored their mechanisms of action through cancer hallmark annotations.
  Results
  By analyzing dose-dependent gene expression, we have found that seven, three, and five drugs may induce similar sets of up-regulated and down-regulated genes in Hs-578-T, MCF7, and MDA-MB-231 cell lines, respectively. Clustering and functional enrichment analyses have suggested a shared molecular mechanism of action among these drug candidates.
  Conclusion
  We have thus categorized drugs with opposing gene expression profiles and proposed new drug candidates for breast cancer treatment based on cancer hallmark annotations. Moreover, our study has uncovered synergistic drug combinations, including those utilizing FDA-approved drugs, for primary and metastatic breast cancer cell lines.
  

- PKE-Ubsite: A Ubiquitylation Site Predictor for Plants Based on Multiple Encoders and Ensemble Deep Learning Framework
  
  Authors: Xin Wang, Zi Meng Zhang and Chang Liu
  
  https://doi.org/10.2174/0115748936347236241119045342
  
  Introduction
  Ubiquitylation, a key post-translational modification (PTM), significantly influences the structures, activities, and functions of proteins and is linked to various diseases. Traditional experimental methods for identifying and characterizing ubiquitylation sites (Ubsites) are time-consuming, expensive, and labor-intensive when prior knowledge of ubiquitylation is absent. Most computational methods reported for predicting Ubsites, however, are based on traditional machine learning. Owing to the increased availability of genomic and proteomic samples, deep learning-based recognition methods for Ubsites are becoming increasingly popular.
  Methods
  In this study, we propose a new feature extraction method, pKcode, based on only seven physicochemical features of amino acids (AAs). The pKcode captures both the biochemical context and the precise sequence locations of AAs around the Ubsites, improving the predictive capability for ubiquitylation. We created the pKPAP encoding scheme by integrating the pKcode with PSDAAP, AAC, and PWAA, resulting in a comprehensive feature representation. Concurrently, we developed the PKE-Ubsite model.
  Results
  The PKE-Ubsite model, a new ensemble prediction framework, combines classifiers from five pipelines: three bidirectional long short-term memory (BiLSTM) networks, one convolutional neural network (CNN), and one random forest (RF) classifier. Each classifier uses an optimized combination of encoding features, and an integrated classification is achieved through a voting mechanism.
  Conclusion
  Compared with existing models on an independent test set, our model achieves an accuracy of 0.8368, an F1-score of 0.8430, a precision of 0.8124, a recall of 0.8760, and an AUC of 0.9103, all superior to those of the methods reported to date. Overall, PKE-Ubsite may facilitate a thorough understanding of ubiquitylation.
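
The PKE-Ubsite abstract says the five classifiers are combined "through a voting mechanism" but does not say which kind. The sketch below shows the two most common options, hard (majority) voting and soft (averaged-probability) voting, purely as an illustration. The function names, the 0.5 threshold, and the example outputs are assumptions and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Hard voting: return the label chosen by the most classifiers.

    `labels` holds one 0/1 prediction per classifier. With five voters
    and two classes, a tie cannot occur.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Soft voting: average the positive-class probabilities, then threshold.

    The 0.5 threshold is an illustrative default, not a value from the paper.
    """
    mean_probability = sum(probabilities) / len(probabilities)
    return int(mean_probability >= threshold)


# Hypothetical outputs for one candidate site from three BiLSTMs,
# one CNN, and one random forest (values invented for illustration).
hard_decision = majority_vote([1, 1, 0, 1, 0])
soft_decision = soft_vote([0.9, 0.6, 0.4, 0.7, 0.3])
print(hard_decision, soft_decision)
```

Hard voting only needs class labels, while soft voting lets a confident classifier outweigh several uncertain ones; which of the two (or another scheme) PKE-Ubsite uses would need to be checked in the full text.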
  

- scZIGVAE: A Variational Graph Attention Autoencoder Based on the Zero-Inflated Negative Binomial Distribution for Clustering Single-cell RNA-Seq Data
  
  Authors: Yutian Wang, Ke Gao, Zhaomei Li, Chuanxin Liu, Cunmei Ji, Lijuan Qiao and Chunhou Zheng
  
  https://doi.org/10.2174/0115748936348851241230113213
  
  Background
  Single-cell RNA sequencing (scRNA-seq) technology has opened new horizons in studying cellular diversity, helping researchers distinguish the gene expression patterns of each cell, identify rare cell types, and explore the dynamics of gene expression in specific cells under different environments. Clustering plays a central role in revealing unknown cell types and in the downstream analysis of scRNA-seq data. However, the high dimensionality, high noise, and frequent missing values in scRNA-seq data significantly limit clustering performance. Traditional embedding algorithms often ignore the characteristics of the underlying distribution when dealing with scRNA-seq data.
  Aims
  In this study, we aim to achieve clustering analysis of single-cell RNA sequencing (scRNA-seq) data by developing and applying a variational graph attention autoencoder model based on the zero-inflated negative binomial (ZINB) distribution.
  Methods
  We propose scZIGVAE, a clustering analysis method for scRNA-seq data that integrates the zero-inflated negative binomial (ZINB) model with a variational graph attention autoencoder. It enhances the learning of complex topological structures between cells while modeling missing events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss to estimate missing data, scZIGVAE generates cell representations that are more suitable for clustering. Furthermore, through self-optimizing embedded clustering, the cluster centers are iteratively updated to further fine-tune the clustering results.
  Results
  Extensive testing on twelve datasets from different single-cell RNA sequencing platforms has demonstrated that the scZIGVAE method outperforms current state-of-the-art clustering techniques.
  Conclusion
  In summary, our research findings demonstrate that by incorporating the Zero-Inflated Negative Binomial (ZINB) distribution strategy into the Variational Graph Autoencoder (VGAE) architecture, we are able to achieve better estimation of missing values during decoding. Furthermore, the utilization of multiple loss constraints on the generated latent representations renders them more conducive to downstream analyses.
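
For readers unfamiliar with the ZINB distribution named in the scZIGVAE abstract, its standard textbook form is given below. The symbols (dropout probability \pi, mean \mu, dispersion \theta, and count x_{ij} for cell i and gene j) are the usual notation and are assumptions here; the abstract does not give the exact parameterization or loss weighting used in scZIGVAE.

```latex
% Negative binomial with mean \mu and dispersion \theta
\mathrm{NB}(x \mid \mu, \theta)
  = \frac{\Gamma(x + \theta)}{x!\,\Gamma(\theta)}
    \left(\frac{\theta}{\theta + \mu}\right)^{\theta}
    \left(\frac{\mu}{\theta + \mu}\right)^{x}

% Zero inflation: with probability \pi the count is a structural zero
\mathrm{ZINB}(x \mid \pi, \mu, \theta)
  = \pi\,\delta_{0}(x) + (1 - \pi)\,\mathrm{NB}(x \mid \mu, \theta)

% A ZINB loss is typically the negative log-likelihood over all counts
\mathcal{L}_{\mathrm{ZINB}}
  = -\sum_{i,j} \log \mathrm{ZINB}(x_{ij} \mid \pi_{ij}, \mu_{ij}, \theta_{ij})
```

Here \delta_{0}(x) equals 1 when x = 0 and 0 otherwise, so the extra mass \pi accounts for zeros beyond those the negative binomial alone would produce.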
  

- InConTPSS: Multi-scale Module Based Temporal Convolutional Networks for Accurate Protein Secondary Prediction
  
  Authors: Xun Wang, Yuan Gao, Haonan Song, Zhiyi Pan and Xianjin Xie
  
  https://doi.org/10.2174/0115748936330905241220203450
  
  Background
  Protein secondary structure prediction is an important task in bioinformatics and structural biology. A protein's structure is the basis of its function. Experimental methods for determining the tertiary structure of proteins are both costly and time-consuming. Because tertiary structure is built up from secondary structure, efficient computational prediction of protein secondary structure is important. Both local and global interactions between amino acids affect the prediction results.
  Objective
  We propose a module aimed at processing sequence profile features for deep feature extraction and constructing a lightweight network to extract fused features.
  Methods
  To enhance the network’s ability to capture both local and global interactions, we propose an efficient method, InConTPSS, which integrates convolution operations with different receptive fields and temporal convolutional networks in an inception architecture. InConTPSS also accounts for the imbalanced distribution of secondary structure states and improves predictive performance on scarce categories.
  Results
  Experimental results on six benchmark datasets (CASP12, CASP13, CASP14, CB513, TEST2016, and TEST2018) demonstrate that our method achieves state-of-the-art performance with a simpler model on both 3-state and 8-state secondary structure prediction.
  Conclusion
  By combining convolutional layers with a temporal convolutional network, the inception network structure can effectively process the fused features and improve the prediction results. InConTPSS achieves state-of-the-art performance in protein secondary structure prediction, and the use of a label-distribution-aware margin loss effectively improves the prediction accuracy for scarce secondary structures.
  

- PredART: Uncertainty-quantified Machine Learning Prediction of Androgen Receptor Agonists Overcoming Imbalanced Dataset
  
  Authors: Jidon Jang, Dokyun Na and Kwang-Seok Oh
  
  https://doi.org/10.2174/0115748936355551241220190451
  
  Aim
  This study aims to develop and validate a machine learning-based model for the accurate prediction of Androgen Receptor (AR) agonistic toxicity, addressing the challenges posed by data imbalance in existing predictive models.
  Background
  Anomalous agonistic activity of the androgen receptor is a known major indicator of reproductive toxicity, which can lead to prostate cancer. Machine learning-based models have been developed for the rapid prediction of such agonists. However, the existing models have exhibited biased learning outcomes and low sensitivity due to the imbalance in the available training data. In the early screening process of drug discovery, low sensitivity caused by data imbalance can hinder the detection of potentially toxic compounds.
  Objective
  The objective of this study is to develop a machine learning prediction model that classifies whether a drug candidate is an androgen receptor agonist or not with highly balanced performance compared to existing models.
  Methods
  PredART is a bootstrap aggregated k-nearest neighbor model for the balanced prediction of androgen receptor agonistic toxicity, built from 381 active and 8,089 inactive data entries represented by their structural features.
  Results
  In this work, we propose an advanced model that combines the bootstrap aggregating algorithm with machine learning binary classifiers to identify androgen receptor-based reproductive toxicity while avoiding biased prediction results. The optimal model using k-nearest neighbor classifiers achieved an accuracy of 0.831, a Positive Predictive Value (PPV) of 0.882, a sensitivity of 0.625, a specificity of 0.951, and a Matthews Correlation Coefficient (MCC) of 0.633 on external test data, demonstrating a significant improvement in sensitivity compared to the previous study and achieving balanced learning. Furthermore, by calculating the standard deviation among the outputs of the classifiers and using this prediction uncertainty as a screening metric to select reliable predictions, the model's performance could be further enhanced.
  Conclusion
  Based on the bootstrap aggregating algorithm, our prediction model effectively addressed data imbalance, and the performance of various machine learning and deep learning classifiers was evaluated as a benchmark. Additionally, by quantifying uncertainty, our model provided an intuitive assessment of prediction reliability during large-scale screening processes.
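
The idea described in the PredART abstract, bagging k-nearest neighbor classifiers and using the standard deviation of their outputs as an uncertainty score, can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the class-balanced resampling, the Euclidean distance, the values of `k` and `n_models`, and the toy 2-D data are all assumptions made for the example.

```python
import math
import random
import statistics


def knn_predict(train, query, k=3):
    """Classify `query` by majority label among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def balanced_bootstrap(actives, inactives, rng):
    """Resample with replacement, drawing equally many points from each class.

    Balancing each bag is an assumption of this sketch; the abstract only
    says that bootstrap aggregating was used to counter class imbalance.
    """
    n = len(actives)
    return ([rng.choice(actives) for _ in range(n)]
            + [rng.choice(inactives) for _ in range(n)])


def bagged_knn(actives, inactives, query, n_models=25, k=3, seed=0):
    """Return (label, mean output, standard deviation of outputs)."""
    rng = random.Random(seed)
    outputs = [
        knn_predict(balanced_bootstrap(actives, inactives, rng), query, k)
        for _ in range(n_models)
    ]
    mean_output = statistics.fmean(outputs)
    uncertainty = statistics.pstdev(outputs)  # 0.0 when all models agree
    return (1 if mean_output >= 0.5 else 0), mean_output, uncertainty


# Toy imbalanced data: 3 "active" points near (1, 1), 25 "inactive" near (0, 0).
actives = [((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]
inactives = [((x / 10, y / 10), 0) for x in range(-2, 3) for y in range(-2, 3)]

for point in [(1.0, 1.0), (0.0, 0.0), (0.6, 0.6)]:
    print(point, bagged_knn(actives, inactives, point))
```

Predictions whose uncertainty exceeds a chosen cutoff can then be set aside, which corresponds to the screening step described in the Results paragraph above; the cutoff itself would have to be tuned on validation data.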
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan