Current Bioinformatics - Volume 20, Issue 8, 2025
-
-
A Review of Biosequences Alignment, Matching, and Mining Based on GPU
Authors: Xianghua Kong, Cong Shen and Jijun Tang
Sequence alignment, pattern matching, and mining are important cornerstones of bioinformatics, with applications that include identifying genome structure, protein function, and biological metabolic regulatory networks. As data volumes have grown, parallel sequential pattern recognition methods have gained attention because they speed up processing. This review summarizes GPU-based sequence alignment, pattern matching, and mining tools and their applications in bioinformatics. After an overview of the background, the review first introduces the concepts and databases of sequence alignment, pattern matching, and mining. It then briefly describes the basic architecture and parallel computing principles of GPUs. Next, it details the design of GPU-based algorithms and optimization strategies for sequence alignment, pattern matching, and mining. By comparing and analyzing existing research, it summarizes the advantages and challenges of applying GPUs in bioinformatics. Finally, it outlines future research directions, including the further development of algorithms combined with machine learning and deep learning.
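GPU aligners commonly exploit the fact that, in the Smith-Waterman dynamic-programming matrix, all cells on one anti-diagonal are independent of each other. The sketch below is a sequential Python illustration of that wavefront ordering with a linear gap penalty; the scoring values are arbitrary assumptions, and it is not taken from any tool covered by the review. On a GPU, the inner loop over one diagonal is what would be spread across threads.

```python
def smith_waterman_wavefront(a: str, b: str, match: int = 2,
                             mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score (linear gap penalty), filled by anti-diagonals.

    Every cell (i, j) with i + j == d depends only on diagonals d - 1 and d - 2,
    so all cells of one diagonal could be computed in parallel, e.g. one GPU
    thread per cell. Here the inner loop stands in for that parallel step.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best = 0
    for d in range(2, n + m + 1):
        # Cells on this anti-diagonal are mutually independent.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman_wavefront("ACGT", "ACGT"))  # → 8
```

Traversing by diagonal gives the same scores as row-by-row filling; only the order of independent updates changes.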
-
-
-
Exploiting Gene Expression Signatures in Breast Cancer Cell Lines to Unveil Novel Drug Candidates and Synergistic Combinations
Authors: Hsueh-Chuan Liu, Chia-Wei Weng and Ka-Lok Ng
Aim: This study aimed to identify therapeutic drugs for breast cancer, the most common cancer affecting women worldwide, using one primary and two metastatic breast tumor cell lines.
Background: Investigating drug-induced changes in gene expression offers a robust way to uncover potential new treatments. By analyzing the effects of drugs on gene activity, researchers can unravel molecular mechanisms within cells, understand drug effects, identify opportunities for drug repositioning, and predict patient responses to treatment.
Objective: Our approach involved two main strategies: analyzing drug-perturbed gene expression profiles and leveraging drug-induced gene expression profiles. First, we assessed how drugs affect the expression of target genes in a dose-dependent manner, determining whether they inhibit or activate gene expression. This analysis can inform the identification of new potential drugs. Second, we grouped drugs by their expression profiles to explore potential synergistic effects.
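The abstract does not say how dose dependence is quantified. A minimal sketch, assuming a least-squares slope of log2 fold change against log10 dose and a hypothetical threshold of 0.5, shows how each gene could be labelled as activated or inhibited; `dose_response_slope`, `classify_genes`, the doses, and the gene names are all hypothetical.

```python
import math
from typing import Dict, List


def dose_response_slope(doses: List[float], log2fc: List[float]) -> float:
    """Least-squares slope of log2 fold change against log10(dose)."""
    xs = [math.log10(d) for d in doses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(log2fc) / len(log2fc)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct doses")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log2fc))
    return sxy / sxx


def classify_genes(doses: List[float], profiles: Dict[str, List[float]],
                   threshold: float = 0.5) -> Dict[str, str]:
    """Label each gene 'up', 'down', or 'unchanged' from the sign and size of its slope."""
    labels = {}
    for gene, fc in profiles.items():
        slope = dose_response_slope(doses, fc)
        if slope > threshold:
            labels[gene] = "up"
        elif slope < -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


doses = [0.1, 1.0, 10.0]  # hypothetical concentrations
profiles = {"gene_a": [0.0, 1.0, 2.0], "gene_b": [1.0, 0.0, -1.0], "gene_c": [0.1, 0.0, 0.1]}
print(classify_genes(doses, profiles))
# → {'gene_a': 'up', 'gene_b': 'down', 'gene_c': 'unchanged'}
```

A slope on log dose rewards consistent trends across concentrations rather than a single large change at one dose.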
Methods: We quantified changes in gene expression profiles relative to drug dosage, categorized the effects as up-regulating or down-regulating, and applied functional enrichment analysis with cancer hallmark annotations to predict drugs with potential for cancer treatment. We also determined the optimal number of drug groups with similar effects on gene expression and explored their mechanisms of action through cancer hallmark annotations.
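The abstract does not name the clustering algorithm or the criterion used to pick the number of drug groups. As a sketch under that uncertainty, the code below assumes each drug is a vector of per-gene scores, clusters with k-means (deterministic farthest-first initialisation), and picks the k with the highest mean silhouette coefficient; this is a common choice, not necessarily the authors' method, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def choose_k(points: List[Point], k_values=range(2, 6)) -> Tuple[int, Dict[int, float]]:
    """Return the k with the best silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


drugs = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5),      # hypothetical 2-D drug profiles
         (10, 0), (10.5, 0), (10, 0.5), (10.5, 0.5),
         (0, 10), (0.5, 10), (0, 10.5), (0.5, 10.5)]
best, scores = choose_k(drugs)
print(best)  # → 3
```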
Results: By analyzing dose-dependent gene expression, we found that seven, three, and five drugs may induce similar sets of up-regulated and down-regulated genes in the Hs-578-T, MCF7, and MDA-MB-231 cell lines, respectively. Clustering and functional enrichment analyses suggested a shared molecular mechanism of action among these drug candidates.
Conclusion: We categorized drugs with opposing gene expression profiles and proposed new drug candidates for breast cancer treatment based on cancer hallmark annotations. Moreover, our study uncovered synergistic drug combinations, including some involving FDA-approved drugs, for primary and metastatic breast cancer cell lines.
-
-
-
PKE-Ubsite: A Ubiquitylation Site Predictor for Plants Based on Multiple Encoders and Ensemble Deep Learning Framework
Authors: Xin Wang, Zi Meng Zhang and Chang Liu
Introduction: Ubiquitylation, a key post-translational modification (PTM), significantly influences the structures, activities, and functions of proteins and is linked to various diseases. Traditional experimental methods for identifying and characterizing ubiquitylation sites (Ubsites) are time-consuming, expensive, and labor-intensive when prior knowledge of ubiquitylation is absent. Most reported Ubsite prediction methods are based on traditional machine learning. With the growing availability of genomic and proteomic samples, deep learning-based Ubsite recognition methods are becoming increasingly popular.
Methods: In this study, we propose a new feature extraction method, pKcode, based on only seven physicochemical features of amino acids (AAs). pKcode captures both the biochemical context and the precise sequence positions of the AAs surrounding Ubsites, improving the predictive capability for ubiquitylation. We created the pKPAP encoding scheme by integrating pKcode with PSDAAP, AAC, and PWAA, yielding a comprehensive feature representation. Building on these encodings, we developed the PKE-Ubsite model.
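Of the encodings combined into pKPAP, amino acid composition (AAC) is the simplest: the fraction of each of the 20 standard residues in the sequence window around a candidate lysine. The sketch below shows only AAC and does not attempt pKcode, PSDAAP, or PWAA; the window half-width, the 'X' padding at sequence ends, and the helper names `lysine_window` and `aac` are assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def lysine_window(seq: str, pos: int, flank: int = 10, pad: str = "X") -> str:
    """Return the (2*flank + 1)-residue window centred on lysine seq[pos], padded at the ends."""
    if seq[pos] != "K":
        raise ValueError("candidate ubiquitylation sites are lysine (K) residues")
    padded = pad * flank + seq + pad * flank
    return padded[pos:pos + 2 * flank + 1]


def aac(window: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue (padding is ignored)."""
    residues = [r for r in window if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


w = lysine_window("MAKTLLK", pos=2, flank=3)
print(w)  # → XMAKTLL
features = aac(w)  # 20-dimensional vector; L has weight 2/6
```

AAC discards residue order, which is why position-aware encodings are combined with it.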
Results: PKE-Ubsite, a new ensemble prediction framework, combines classifiers from five pipelines: three bidirectional long short-term memory (BiLSTM) networks, one convolutional neural network (CNN), and one random forest (RF) classifier. Each classifier uses an optimized combination of encoding features, and the final classification is obtained through a voting mechanism.
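The abstract says only that the five pipelines are combined by voting. A minimal sketch, assuming hard majority voting over 0/1 predictions, looks like this; the pipeline keys and example predictions are hypothetical.

```python
from typing import Dict, List


def majority_vote(member_preds: Dict[str, List[int]]) -> List[int]:
    """Hard majority vote over 0/1 predictions; ties (possible only with an even
    number of members) resolve to 0."""
    columns = zip(*member_preds.values())  # one tuple of votes per sample
    n_members = len(member_preds)
    return [1 if 2 * sum(votes) > n_members else 0 for votes in columns]


preds = {  # hypothetical per-pipeline outputs for three candidate sites
    "bilstm_1": [1, 0, 1],
    "bilstm_2": [1, 0, 0],
    "bilstm_3": [0, 0, 1],
    "cnn": [1, 1, 0],
    "rf": [0, 1, 1],
}
print(majority_vote(preds))  # → [1, 0, 1]
```

With five members no tie can occur; soft voting (averaging predicted probabilities) is a common alternative if the members expose scores.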
Conclusion: Compared with existing models on an independent test set, our model achieved an accuracy of 0.8368, an F1-score of 0.8430, a precision of 0.8124, a recall of 0.8760, and an AUC of 0.9103, which are superior to those of all methods reported to date. Overall, PKE-Ubsite may facilitate a deeper understanding of ubiquitylation.
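Because F1 is the harmonic mean of precision and recall, the reported precision (0.8124) and recall (0.8760) can be checked against the reported F1-score with one line of arithmetic:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(f"{f1_score(0.8124, 0.8760):.4f}")  # → 0.8430
```

The result matches the reported 0.8430, so the three figures are internally consistent.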
-
-
-
scZIGVAE: A Variational Graph Attention Autoencoder Based on the Zero-Inflated Negative Binomial Distribution for Clustering Single-cell RNA-Seq Data
Authors: Yutian Wang, Ke Gao, Zhaomei Li, Chuanxin Liu, Cunmei Ji, Lijuan Qiao and Chunhou Zheng
Background: Single-cell RNA sequencing (scRNA-seq) technology has opened new horizons in studying cellular diversity, helping researchers distinguish the gene expression patterns of individual cells, identify rare cell types, and explore the dynamics of gene expression in specific cells under different conditions. Clustering plays a central role in revealing unknown cell types and in downstream scRNA-seq analysis. However, the high dimensionality, high noise, and frequent missing values in scRNA-seq data significantly limit clustering performance. Traditional embedding algorithms often ignore the characteristics of the underlying data distribution when processing scRNA-seq data.
Aims: In this study, we aim to cluster single-cell RNA sequencing (scRNA-seq) data by developing and applying a variational graph attention autoencoder model based on the zero-inflated negative binomial (ZINB) distribution.
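The ZINB distribution models a count either as a dropout zero (probability π) or as a draw from a negative binomial with mean μ and inverse dispersion θ. The sketch below computes the per-count negative log-likelihood under this common parameterization; the abstract does not give scZIGVAE's exact form, so the parameter names and the example decoder outputs are assumptions.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """log NB(x; mu, theta) with mean mu > 0 and inverse dispersion theta > 0."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_nll(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of count x under ZINB with dropout probability 0 <= pi < 1."""
    if x == 0:
        # A zero is either a dropout (prob. pi) or a genuine NB zero.
        return -math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return -(math.log(1.0 - pi) + nb_log_pmf(x, mu, theta))


# Mean loss for one gene across a few cells (hypothetical decoder outputs).
counts = [0, 3, 0, 7]
loss = sum(zinb_nll(x, mu=4.0, theta=2.0, pi=0.2) for x in counts) / len(counts)
```

In an autoencoder, μ, θ, and π would typically be produced per cell and gene by the decoder, and this loss would be averaged over the count matrix.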
Methods: We propose scZIGVAE, a scRNA-seq data clustering method that integrates the ZINB model with a variational graph attention autoencoder. It enhances the learning of complex topological structures between cells while modeling missing events. By jointly optimizing the ZINB loss and the cell graph reconstruction loss to estimate missing data, scZIGVAE generates cell representations that are better suited to clustering. Furthermore, a self-optimizing embedded clustering step iteratively updates the cluster centers to further refine the model's clustering.
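Assuming the self-optimizing embedded clustering step follows the widely used Deep Embedded Clustering (DEC) formulation (the abstract does not confirm this), each cell embedding is softly assigned to cluster centers with a Student's t kernel, a sharpened target distribution is derived from those assignments, and the KL divergence between them is minimized. The sketch shows only the forward computations; in a trained model the gradient of this loss would update both embeddings and centers. All names and values are hypothetical.

```python
import math
from typing import List, Sequence


def soft_assign(z: List[Sequence[float]], centers: List[Sequence[float]],
                alpha: float = 1.0) -> List[List[float]]:
    """Student's t soft assignment q_ij of embedding i to cluster j."""
    q = []
    for zi in z:
        w = [(1.0 + math.dist(zi, c) ** 2 / alpha) ** (-(alpha + 1.0) / 2.0) for c in centers]
        s = sum(w)
        q.append([wj / s for wj in w])
    return q


def target_distribution(q: List[List[float]]) -> List[List[float]]:
    """Sharpened target p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(w)
        p.append([wj / s for wj in w])
    return p


def kl_divergence(p: List[List[float]], q: List[List[float]]) -> float:
    """KL(P || Q), the clustering loss."""
    return sum(pij * math.log(pij / qij)
               for pr, qr in zip(p, q) for pij, qij in zip(pr, qr) if pij > 0)


z = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (4.8, 5.0)]  # hypothetical embeddings
centers = [(0.0, 0.0), (5.0, 5.0)]
q = soft_assign(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Squaring q before renormalizing pushes confident assignments closer to 1, which is what lets the model refine its own clusters iteratively.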
Results: Extensive testing on twelve datasets from different scRNA-seq platforms demonstrated that scZIGVAE outperforms current state-of-the-art clustering techniques.
Conclusion: In summary, our findings demonstrate that incorporating the ZINB distribution into the variational graph autoencoder (VGAE) architecture enables better estimation of missing values during decoding. Furthermore, applying multiple loss constraints to the generated latent representations makes them more conducive to downstream analyses.
-
-
-
InConTPSS: Multi-scale Module Based Temporal Convolutional Networks for Accurate Protein Secondary Prediction
Authors: Xun Wang, Yuan Gao, Haonan Song, Zhiyi Pan and Xianjin Xie
Background: Protein secondary structure prediction is an important task in bioinformatics and structural biology. A protein’s structure is the basis of its function, and experimental methods for determining protein tertiary structure are both costly and time-consuming. Because tertiary structure is built up from secondary structure, efficient computational prediction of protein secondary structure is important. Both local and global interactions between amino acids affect prediction results.
Objective: We propose a module that processes sequence profile features for deep feature extraction, together with a lightweight network that extracts fused features.
Methods: To enhance the network’s ability to capture both local and global interactions, we propose InConTPSS, an efficient method that integrates convolutions with different receptive fields and temporal convolutional networks within an inception architecture. InConTPSS also accounts for the imbalanced distribution of secondary structure states, improving predictive performance on scarce categories.
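Temporal convolutional networks widen their receptive field by stacking dilated convolutions, which is how they reach distant residues with few layers. The sketch below assumes one causal convolution per layer, a kernel size of 3, and doubling dilations (the abstract specifies none of these, and a secondary-structure model may well use centred rather than causal padding); it checks the receptive-field formula 1 + (k − 1)·Σd by pushing an impulse through the stack.

```python
from typing import List


def dilated_causal_conv(x: List[float], w: List[float], dilation: int) -> List[float]:
    """y[t] = sum_j w[j] * x[t - j*dilation], treating positions before the start as 0."""
    return [
        sum(w[j] * x[t - j * dilation] for j in range(len(w)) if t - j * dilation >= 0)
        for t in range(len(x))
    ]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of a stack with one dilated convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Empirical check: push an impulse through the stack and count the outputs it reaches.
kernel, dilations = 3, [1, 2, 4, 8]
signal = [1.0] + [0.0] * 63
for d in dilations:
    signal = dilated_causal_conv(signal, [1.0] * kernel, d)
reached = sum(1 for v in signal if v != 0.0)
print(reached, receptive_field(kernel, dilations))  # → 31 31
```

Four layers with kernel 3 already cover 31 positions, whereas undilated layers would cover only 9.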
Results: Experimental results on six benchmark datasets (CASP12, CASP13, CASP14, CB513, TEST2016, and TEST2018) demonstrate that our method achieves state-of-the-art performance with a simpler model for both 3-state and 8-state secondary structure prediction.
Conclusion: By combining convolutional layers with temporal convolutional networks, the inception structure effectively processes the fused features and improves prediction results. InConTPSS achieves state-of-the-art performance in protein secondary structure prediction, and its use of label-distribution-aware margin loss effectively improves prediction accuracy for scarce secondary structure classes.
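Label-distribution-aware margin (LDAM) loss enlarges the decision margin for rare classes: each class j gets a margin proportional to n_j^(-1/4), which is subtracted from the true class's logit before cross-entropy. The sketch below follows that general form; the maximum margin of 0.5, the scale factor s, and the class counts are assumptions, not values from InConTPSS.

```python
import math
from typing import List


def ldam_margins(class_counts: List[int], max_margin: float = 0.5) -> List[float]:
    """Per-class margins proportional to n_j^(-1/4), rescaled so the largest equals max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]


def ldam_loss(logits: List[float], label: int, margins: List[float], s: float = 1.0) -> float:
    """Cross-entropy after subtracting the true class's margin from its logit."""
    adjusted = [z - margins[label] if j == label else z for j, z in enumerate(logits)]
    adjusted = [s * z for z in adjusted]
    m = max(adjusted)  # log-sum-exp with max subtraction for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in adjusted))
    return log_sum - adjusted[label]


margins = ldam_margins([5000, 50])  # hypothetical counts: frequent vs scarce state
print(margins[1] > margins[0])  # → True
```

With all margins set to zero the function reduces to ordinary softmax cross-entropy, so the margin is the only change.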
-
-
-
PredART: Uncertainty-quantified Machine Learning Prediction of Androgen Receptor Agonists Overcoming Imbalanced Dataset
Authors: Jidon Jang, Dokyun Na and Kwang-Seok Oh
Aim: This study aims to develop and validate a machine learning-based model for the accurate prediction of androgen receptor (AR) agonistic toxicity, addressing the challenges posed by data imbalance in existing predictive models.
Background: Anomalous agonistic activity of the androgen receptor is a known major indicator of reproductive toxicity, which can lead to prostate cancer. Machine learning-based models have been developed for the rapid prediction of such agonists. However, existing models have exhibited biased learning outcomes and low sensitivity because of imbalance in the available training data. In early-stage drug discovery screening, low sensitivity caused by data imbalance can hinder the detection of potentially toxic compounds.
Objective: The objective of this study is to develop a machine learning model that classifies whether a drug candidate is an androgen receptor agonist, with more balanced performance than existing models.
Methods: PredART is a bootstrap-aggregated k-nearest neighbor model for balanced prediction of androgen receptor agonistic toxicity, built on 381 active and 8,089 inactive compounds represented by their structural features.
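A minimal sketch of bootstrap-aggregated k-nearest neighbors with an uncertainty estimate is shown below. Several details are assumptions not stated in the abstract: each bag pairs a bootstrap sample of the actives with an equal-sized bootstrap sample of the inactives (one common way bagging counters class imbalance), compounds are plain numeric descriptor vectors compared by Euclidean distance, and `n_bags=25` and `k=3` are arbitrary. The standard deviation of the per-bag scores plays the role of the prediction uncertainty described in the Results.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_score(train: List[Tuple[Point, int]], x: Point, k: int) -> float:
    """Fraction of the k nearest training compounds (Euclidean) that are active."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def balanced_bags(actives: List[Point], inactives: List[Point], n_bags: int,
                  rng: random.Random) -> List[List[Tuple[Point, int]]]:
    """Each bag: a bootstrap of the actives plus an equal-sized bootstrap of the inactives."""
    size = len(actives)
    return [[(x, 1) for x in rng.choices(actives, k=size)]
            + [(x, 0) for x in rng.choices(inactives, k=size)]
            for _ in range(n_bags)]


def bagged_knn(actives: List[Point], inactives: List[Point], x: Point,
               n_bags: int = 25, k: int = 3, seed: int = 0) -> Tuple[float, float]:
    """Return (mean score, standard deviation across bags) for query compound x."""
    rng = random.Random(seed)
    scores = [knn_score(bag, x, k) for bag in balanced_bags(actives, inactives, n_bags, rng)]
    return statistics.fmean(scores), statistics.pstdev(scores)


actives = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]   # hypothetical descriptors
inactives = [(0.01 * i, 0.0) for i in range(30)]
mean, std = bagged_knn(actives, inactives, (1.0, 1.0))
```

During screening, predictions whose standard deviation exceeds a chosen cutoff could be set aside as unreliable.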
Results: In this work, we propose an advanced model that combines the bootstrap aggregating algorithm with machine learning binary classifiers to identify androgen receptor-based reproductive toxicity while avoiding biased predictions. The optimal model, using k-nearest neighbor classifiers, achieved an accuracy of 0.831, a positive predictive value (PPV) of 0.882, a sensitivity of 0.625, a specificity of 0.951, and a Matthews correlation coefficient (MCC) of 0.633 on external test data, a significant improvement in sensitivity over the previous study and evidence of balanced learning. Furthermore, using the standard deviation among the classifiers' outputs as a prediction-uncertainty metric to select reliable predictions further enhanced the model's performance.
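The metrics reported above follow the standard confusion-matrix definitions. Because accuracy depends on the ratio of actives to inactives in the test set, sensitivity, specificity, and MCC are more informative for imbalanced data. The helper below computes all five; the counts in the usage line are made-up illustrations, not PredART data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary classifier."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),          # precision
        "sensitivity": tp / (tp + fn),  # recall, true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


print(binary_metrics(tp=5, fp=1, tn=9, fn=5))  # hypothetical counts
```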
Conclusion: Based on the bootstrap aggregating algorithm, our prediction model effectively addressed data imbalance, and we benchmarked the performance of various machine learning and deep learning classifiers. Additionally, by quantifying uncertainty, our model provides an intuitive assessment of prediction reliability during large-scale screening.