Current Bioinformatics - Volume 20, Issue 6, 2025
Recent Progress of Deep Learning Methods for RBP Binding Sites Prediction on circRNA
Authors: Zhengfeng Wang, Xiujuan Lei, Yuchen Zhang, Fang-Xiang Wu and Yi Pan
The interaction between circular RNA (circRNA) and RNA-binding proteins (RBPs) plays an important biological role in the occurrence and development of various diseases. High-throughput experimental methods such as CLIP-seq can effectively analyze this interaction, but they are inefficient and expensive, and each experiment captures only the binding sites of a specific RBP on circRNA in a selected cell environment. These experiments also rely on downstream data analysis to understand the mechanisms behind many biological structures and physiological processes. However, the rapid growth in the dimensionality and production speed of experimental data poses challenges to traditional analysis methods. In recent years, deep learning has made great progress in genomics and transcriptomics, and deep learning algorithms for predicting RBP binding sites on circRNA have emerged. In this paper, we briefly introduce the biological background of circRNA-RBP interaction; present the relevant deep learning techniques, including the problem formulation, data sources, sequence encoding, deep learning models, and overall process of predicting RBP binding sites on circRNA; and analyze current deep learning methods in depth. Finally, we discuss open problems in current research and directions for future work. We hope this review helps researchers without a background in deep learning or biology quickly understand RBP binding site prediction on circRNA.
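The abstract lists sequence encoding as one step of the prediction pipeline. As a minimal sketch of what such a step can look like, the snippet below one-hot encodes an RNA fragment. One-hot encoding is a common choice for sequence models, but it is an assumption here and not necessarily the encoding used by any method the review covers; the helper name `one_hot_encode` is hypothetical.

```python
# Hypothetical helper for illustration; not taken from the reviewed paper.
ALPHABET = "ACGU"


def one_hot_encode(seq):
    """Return one 4-element row per nucleotide, in A, C, G, U order.

    Symbols outside the alphabet (for example N) map to an all-zero row.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


encoded = one_hot_encode("ACGUN")
```

The resulting matrix has one row per position, which is the shape that convolutional or recurrent layers typically consume.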
Multinomial Logistic Regression with Adaptive Regularization for Cancer Subtype Classification via Multi-omics Data
Authors: Yingdi Wu, Fuzhen Cao and Juntao Li
Background: Integrating multi-omics data for cancer classification brings complementary biological insights but also raises challenges in data integration, gene grouping, and adaptive weight construction.
Objective: This paper aims to address the challenges of cancer subtype classification and gene screening based on multi-omics data.
Methods: Multinomial logistic regression with adaptive regularization (MLRAR) was proposed by integrating DNA methylation, gene mutation, and RNA-seq information. A data preprocessing strategy that effectively utilizes multi-omics information was presented, and the local maximum quasi-clique merging (lmQCM) algorithm was used to group genes. Biological pathway information was used to evaluate the significance of gene groups, while the significance of each gene within a group was evaluated by integrating mutation information, information theory, and methylation information.
Results: Compared to MRlasso, MRGL, MSGL, MROGL, AMRSOGL, and AGLRMR, the proposed method improved breast cancer subtype classification accuracy by 2.6%, 2.9%, 3.5%, 2.3%, 2.0%, and 1.8%, respectively. MLRAR also improved ovarian cancer subtype classification accuracy by 8.2%, 5.0%, 6.8%, 5.2%, 12.7%, and 6.3%, respectively.
Conclusion: The proposed method can effectively deal with data integration, gene grouping, and adaptive weight construction.
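The abstract does not give the MLRAR objective, so the following is only a rough sketch of the general idea: a multinomial (softmax) classifier combined with a penalty that weights individual genes and gene groups differently. The sparse-group-lasso-style form of the penalty, the mixing parameter `alpha`, and all function names are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x, W, b):
    """Class probabilities for one sample x; W holds one weight vector per class."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + bk for w, bk in zip(W, b)]
    return softmax(scores)


def adaptive_group_penalty(W, groups, group_weights, gene_weights, alpha=0.5):
    """Assumed penalty: weighted L1 on genes plus weighted L2 norm on gene groups."""
    l1 = sum(gene_weights[j] * abs(w[j]) for w in W for j in range(len(w)))
    l2 = 0.0
    for g, idx in enumerate(groups):
        # Norm of all coefficients (across classes) belonging to gene group g.
        norm = math.sqrt(sum(w[j] ** 2 for w in W for j in idx))
        l2 += group_weights[g] * norm
    return alpha * l1 + (1 - alpha) * l2


W = [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]  # two classes, three genes
probs = predict_proba([1.0, 0.0, 0.0], W, [0.0, 0.0])
penalty = adaptive_group_penalty(W, [[0, 1], [2]], [1.0, 2.0], [1.0, 1.0, 1.0])
```

In a scheme like this, smaller weights on genes or groups judged significant would shrink them less, which is one way adaptive weights can steer gene selection.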
GenRepAI: Utilizing Artificial Intelligence to Identify Repeats in Genomic Suffix Trees
Background: The human genome is densely populated with repetitive DNA sequences that play crucial roles in genomic functions and structures but are also implicated in over 40 human diseases. The computational challenge of identifying and characterizing these repeats is significant because the complexity and size of the genome overwhelm traditional algorithms.
Methods: To address these challenges, we propose GenRepAI, a deep learning framework to navigate and analyze genomic suffix trees. GenRepAI employs supervised machine learning classifiers trained on labeled datasets of repeat annotations and unsupervised anomaly detection to identify novel repeat sequences. The models are trained using convolutional neural networks (CNNs), long short-term memory networks (LSTMs), and vision transformers to classify and annotate repeats within the human genome.
Results: GenRepAI is designed to comprehensively profile repeats that underlie various neurological diseases, allowing researchers to identify pathogenic expansions. The framework will integrate into existing genomic analysis pipelines, with the capability to screen patient genomes and highlight potential causal variants for further validation.
Conclusion: GenRepAI is set to become a foundational tool in genomics, leveraging artificial intelligence to enhance the characterization of repetitive sequences. It promises significant advancements in the molecular diagnosis of repeat expansion disorders and contributes to a deeper understanding of genomic structure and function, with broad applications in personalized medicine.
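For readers unfamiliar with why suffix structures expose repeats: any substring that occurs at least twice is a shared prefix of two suffixes. The sketch below finds the longest repeated substring by sorting suffixes and comparing neighbours. This is a deliberately naive stand-in for a suffix tree, not GenRepAI's method, and it is practical only for short sequences.

```python
def longest_repeated_substring(seq):
    """Return the longest substring occurring at least twice (occurrences may overlap).

    Naive approach: sort all suffixes, then measure the common prefix of each
    adjacent pair. A real suffix tree or suffix array avoids copying suffixes.
    """
    suffixes = sorted(seq[i:] for i in range(len(seq)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        if n > len(best):
            best = a[:n]
    return best


repeat = longest_repeated_substring("ACGTACGTTT")
```

Here `repeat` is `"ACGT"`, which occurs at positions 0 and 4 of the input.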
CNRBind: Small Molecule-RNA Binding Sites Recognition via Site Significant from Nucleotide and Complex Network Information
Authors: Lichao Zhang, Kang Xiao, Xueting Wang and Liang Kong
Background: Small molecule-RNA binding sites play a significant role in developing drugs for disease treatment. However, developing accurate computational tools for identifying these binding sites remains a challenge.
Methods: In this study, a prediction model named CNRBind was constructed by extracting site significant information from nucleotides and complex networks. We designed complex networks and calculated three topological structural parameters according to RNA tertiary structure. To account for nucleotide interdependence, a sliding window was used to integrate the influence of adjacent sites. Finally, the model was constructed using a random forest classifier.
Results: Compared to other computational tools, CNRBind was competitive and had excellent discriminative ability for metal ion-binding site prediction. Furthermore, statistical analysis revealed significant differences between CNRBind and existing methods. Additionally, CNRBind is a promising predictor in cases where an experimental tertiary structure is unavailable.
Conclusion: These results show that CNRBind is effective because of the proposed site significant information encoding strategy. The approach provides a reasonable supplement for biological research. The dataset and source code can be accessed at: https://github.com/Kangxiaoneuq/CNRBind.
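The sliding-window step mentioned in the Methods can be illustrated as follows. This is a sketch under stated assumptions, not the CNRBind code: the window half-width, the zero padding at sequence ends, and the function name `window_features` are all hypothetical choices.

```python
def window_features(per_site, half_width=2, pad=0.0):
    """Concatenate each site's feature vector with those of its neighbours.

    per_site: list of equal-length feature vectors, one per nucleotide.
    Positions beyond either end of the sequence are filled with `pad`.
    """
    n_feat = len(per_site[0]) if per_site else 0
    padding = [pad] * n_feat
    rows = []
    for i in range(len(per_site)):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(per_site[j] if 0 <= j < len(per_site) else padding)
        rows.append(row)
    return rows


features = window_features([[1.0], [2.0], [3.0]], half_width=1)
```

Each output row describes one candidate site together with its neighbourhood, so the rows could then be passed to a classifier such as a random forest.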
Prediction of miRNA-disease Associations by Deep Matrix Decomposition Method based on Fused Similarity Information
Authors: Xia Chen, Qiang Qu, Xiang Zhang, Hao Nie, Xiuxiu Chao, Weihao Ou, Haowen Chen and Xiangzheng Fu
Aim: MicroRNAs (miRNAs), pivotal regulators in various biological processes, are closely linked to human diseases. This study aims to propose a computational model, SIDMF, for predicting miRNA-disease associations.
Background: Computational methods have proven efficient in predicting miRNA-disease associations, leveraging functional similarity and network-based inference. Machine learning techniques, including support vector machines, semi-supervised algorithms, and deep learning models, have gained prominence in this domain.
Objective: To develop a computational model that integrates disease semantic similarity and miRNA functional similarity within a deep matrix factorization framework to accurately predict potential associations between miRNAs and diseases.
Methods: SIDMF, introduced in this study, integrates disease semantic similarity and miRNA functional similarity within a deep matrix factorization framework. By reconstructing the miRNA-disease association matrix, SIDMF predicts potential associations between miRNAs and diseases.
Results: The performance of SIDMF was evaluated using global Leave-One-Out Cross-Validation (LOOCV) and local LOOCV, achieving Area Under the Curve (AUC) values of 0.9536 and 0.9404, respectively. Comparative analysis against other methods demonstrated the superior performance of SIDMF. Case studies on breast cancer, esophageal cancer, and prostate cancer further validated SIDMF's predictive accuracy, with a substantial percentage of the top 50 predicted miRNAs confirmed in relevant databases.
Conclusion: SIDMF emerges as a promising computational model for predicting potential associations between miRNAs and diseases. Its robust performance in global and local evaluations, along with successful case studies, underscores its potential contributions to disease prevention, diagnosis, and treatment.
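The AUC values reported above summarize how well predicted scores rank true associations ahead of non-associations. As a small illustration of the metric itself (not of SIDMF or its cross-validation protocol), the sketch below computes AUC as the fraction of positive-negative pairs that are ranked correctly, counting ties as half. The function name and the toy scores are made up for the example.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

With these toy values, three of the four positive-negative pairs are ordered correctly, so `toy_auc` is 0.75. The pairwise loop is quadratic and is meant for clarity; rank-based formulas give the same value faster on large score lists.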
TCM@MPXV: A Resource for Treating Monkeypox Patients in Traditional Chinese Medicine
Authors: Xin Zhang, Feiran Zhou, Pinglu Zhang, Quan Zou and Ying Zhang
Introduction: Traditional Chinese Medicine (TCM) has been extensively employed in the treatment of Monkeypox Virus (MPXV) infections, and it has historically played a significant role in combating contagious pox-like viral diseases in China.
Methods: Various TCM therapies have been recommended for patients infected with MPXV. However, to our knowledge, there is no comprehensive database dedicated to preserving and organizing TCM remedies for combating MPXV. To address this gap, we introduce TCM@MPXV, a carefully curated repository of research materials focusing on formulations with anti-MPXV properties. Importantly, TCM@MPXV extends its scope beyond herbal remedies to include mineral-based medicines.
Results: The current version of TCM@MPXV (1) documents over 42 types of TCM herbs, with more than 27 unique herbs; (2) records over 285 bioactive compounds within these herbs; (3) offers a user-friendly web server for the docking, analysis, and visualization of 2D or 3D molecular structures; and (4) provides 3D structures of druggable MPXV proteins.
Conclusion: TCM@MPXV is a user-friendly and effective platform for recording, querying, and viewing anti-MPXV TCM resources. It will contribute to the development and explanation of novel anti-MPXV mechanisms of action and aid the ongoing battle against monkeypox. TCM@MPXV is accessible for academic use at http://101.34.238.132:5000/.
A Parallel Implementation for Large-Scale TSR-based 3D Structural Comparisons of Protein and Amino Acid
Authors: Feng Chen, Tarikul I. Milon, Poorya Khajouie, Antoinette Myers and Wu Xu
Background: Proteins play a vital role in sustaining life and must form specific 3D structures to carry out their essential biological functions. Structure comparison techniques benefit from the ever-expanding repository of the Protein Data Bank. Computational tools for 3D structural comparison of proteins and amino acids play an important role in understanding protein functions. The Triangular Spatial Relationship (TSR)-based method was developed for this purpose.
Methods: In the TSR-based method, 3D structures of proteins and amino acids are represented by integer vectors. We developed a parallelization strategy for large-scale TSR-based 3D structural comparisons of proteins and amino acids and implemented it on high-performance clusters using distributed and shared memory programming models, together with multi-core CPU and many-core GPU accelerators. The strategy can also be adapted to other applications that use a vector-type data structure.
Results: Because the TSR-based method represents protein and amino acid structures as vectors, the comparison algorithm is well-suited for parallelization on large-scale supercomputers. Performance studies on representative datasets were conducted to demonstrate the efficiency of the parallelization strategy. It allows comparisons of large 3D protein or amino acid structure datasets to finish within a reasonable amount of time.
Conclusion: Case studies that take advantage of this parallelized code demonstrate that applying either mirror image or feature selection in the TSR-based algorithms improves the classification of protein and amino acid 3D structures. The TSR keys have the advantage of enabling structure-based BLAST searches. The parallelized code could be used as a reference for similar future studies.
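To show why vector representations parallelize well, the sketch below compares structures that are each given as a list of integer keys. Every pair can be scored independently, so the pairs can be distributed across workers. The similarity measure (generalized Jaccard on key counts), the toy keys, and the use of a thread pool are assumptions for illustration; the paper's implementation targets clusters and GPUs, and CPython threads mainly demonstrate the decomposition and do not by themselves speed up CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def generalized_jaccard(a, b):
    """Similarity of two key-count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 1.0


def pairwise_similarity(structures, workers=4):
    """Score every pair of structures; each pair is an independent task."""
    counts = {name: Counter(keys) for name, keys in structures.items()}
    pairs = list(combinations(sorted(counts), 2))

    def score(pair):
        return generalized_jaccard(counts[pair[0]], counts[pair[1]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(pairs, pool.map(score, pairs)))


# Toy integer keys, invented for the example.
sims = pairwise_similarity({"p1": [1, 1, 2], "p2": [1, 2, 3], "p3": [7]})
```

Because no pair depends on another, the same decomposition carries over to process pools, MPI ranks, or GPU threads.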
