Current Bioinformatics

Current Bioinformatics - Online First


30 results

- scHDR: A Heterogeneous Network Transfer Learning Model for Predicting Single-Cell Drug Responses
  
  Authors: Guanpeng Qi, Liugen Wang, Xiang Chen, Yongle Shi, Yibing Ma, Qing Ren, Yuhan Fu, Mengdi Nan and Jie Gao
  
  https://doi.org/10.2174/0115748936401291251010055212
  
  Available online: 08 January 2026
  
  Introduction
  Single-cell RNA sequencing (scRNA-seq) generates expression data from individual cells, and drug response prediction based on these data can aid in drug therapy at the cell level. Existing methods for predicting single-cell drug responses primarily focus on gene expression, neglecting the complex interactions when drugs act on cells and inadequately integrating cross-domain information. This study proposes scHDR, which integrates multiple types of information based on heterogeneous networks and uses transfer learning to achieve cross-domain prediction of cell drug responses.
  
  Methods
  By integrating drug, cell, and gene information from both bulk and single-cell levels into heterogeneous networks, and employing message passing and structure-preserving transfer learning, scHDR predicts single-cell drug responses while maintaining high performance in both domains, with labels in the target domain by default completely unknown during training.
  
  Results
  Comparison experiments across six datasets demonstrate that scHDR outperforms other representative models at both the individual cell and cell cluster levels. Ablation and interpretability experiments confirm the critical role of the message passing and transfer learning components, while domain difference analysis and sensitivity experiments examine the effects of domain discrepancy and network size on model performance, respectively. Additionally, scHDR successfully screens drugs for gastric cancer cells, stratifies drug responses in breast cancer cells over time, and captures the overall response of patient cells, identifying corresponding drug response biomarkers and cell response biomarkers. Key chemical structures of drugs and important genes in cells are also calculated based on gradients.
  
  Discussion
  This model effectively leverages the strengths of heterogeneous networks and transfer learning to improve the accuracy of single-cell drug response prediction. Its components are well coordinated, enabling cross-domain information transfer. Case study results align with existing evidence, demonstrating excellent performance across multiple tasks.
  
  Conclusion
  scHDR provides a novel method for applying complex network modeling to single-cell drug research. Not only does it improve prediction accuracy, but it also offers valuable insights for drug research and precision therapy.
- RF-SCGFS: A Feature Selection Method Based on Secuer and Random Forest Model for Single-cell RNA-Seq Data
  
  Authors: Yutian Wang, Chuanxin Liu, Cunmei Ji, Zongpei Ma, Zongqiang Liu, Lijuan Qiao and Chunhou Zheng
  
  https://doi.org/10.2174/0115748936401797251030055501
  
  Available online: 07 January 2026
  
  Introduction
  Single-cell RNA sequencing (scRNA-seq) is crucial for unraveling gene expression complexity. However, existing feature selection methods often overlook the biological significance of co-expressed gene regions, leading to the omission of potential biomarkers.
  
  Methods
  We propose RF-SCGFS, a co-expressed gene region and gene joint selection method based on random forests. The method identifies co-expressed gene regions within homologous cell populations and builds a random forest model using cell type labels generated by the Scalable and Efficient speCtral clUstERing algorithm (Secuer). Feature importance evaluation is applied to select key co-expressed gene regions and genes.
  
  Results
  Experiments on 13 public scRNA-seq datasets demonstrate that RF-SCGFS outperforms traditional methods with average improvements of 0.15 and 0.19 in normalized mutual information (NMI) and adjusted Rand index (ARI), respectively. When combined with mainstream unsupervised algorithms, RF-SCGFS achieves excellent performance (NMI > 0.91 on Yan and Biase datasets). In the PBMC-ctrl dataset, the method successfully identifies genes associated with immune system processes (GO:0006955, p = 2.02E-37).
  
  Discussion
  RF-SCGFS addresses key challenges in single-cell analysis by reducing computational burden through efficient feature selection while maintaining biological relevance through unsupervised clustering-guided selection.
  
  Conclusion
  RF-SCGFS provides an interpretable framework for feature selection in single-cell data, successfully identifying relevant disease genes and revealing the potential value of co-expressed gene regions in analyzing cellular heterogeneity.
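
The NMI and ARI figures reported for RF-SCGFS compare predicted cluster labels against reference cell types. As a minimal sketch of what the adjusted Rand index measures (this is not code from RF-SCGFS; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions), the standard pair-counting definition can be written with the Python standard library alone:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the labelings trivially agree

    # Contingency counts: how many cells share each (true, predicted) label pair.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_joint - expected) / (maximum - expected)


# Toy example: cluster names differ, but the partition is identical.
print(adjusted_rand_index(["T", "T", "B", "B"], [1, 1, 0, 0]))
```

Because ARI is corrected for chance, a partition that is unrelated to the reference scores near zero and can go negative, which is why it is a stricter summary than raw label agreement.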
- A Hybrid Quantum-Enhanced Sandwich Convolutional Neural Network for Medical Image Classification
  
  Authors: Changzhou Long, Shuaiyu Li, Meng Huang, Xiucai Ye and Tetsuya Sakurai
  
  https://doi.org/10.2174/0115748936418250251104114256
  
  Available online: 05 January 2026
  
  Introduction
  Medical image classification is a crucial task in cancer diagnosis, relying on the accurate analysis of high-dimensional imaging data. While Convolutional Neural Networks (CNNs) have shown great success in this domain, their performance is often limited by shallow feature expressiveness and overfitting, particularly on small or heterogeneous datasets.
  
  Methods
  Quantum machine learning offers new opportunities through high-dimensional representations and nonlinear transformations. In this work, we propose a Quantum-Enhanced Sandwich Convolutional Neural Network (QSCNN), a layered hybrid architecture that integrates quantum and classical modules. The model employs a quanvolutional layer for localized quantum feature extraction, followed by conventional convolution and pooling for hierarchical representation learning, and a variational quantum classifier for nonlinear decision-making.
  
  Results
  QSCNN achieved higher accuracy and training stability than classical CNNs and QCCNN baselines across three medical imaging tasks: brain tumor MRI, skin cancer dermoscopy, and lung cancer CT. Circuit depth analysis revealed a trade-off between expressiveness and robustness, and additional experiments with depolarizing noise confirmed the model’s resilience under realistic quantum error conditions.
  
  Discussion
  This suggests that circuit design choices influence hybrid model behavior and generalization, supporting the feasibility of quantum-enhanced methods for small-sample medical imaging. However, the current evaluation is limited to relatively small benchmark datasets, and broader validation on large-scale data will be essential to confirm clinical applicability.
  
  Conclusion
  In summary, QSCNN presents a feasible hybrid framework for enhancing medical image classification with quantum features. While preliminary, our results suggest potential advantages in accuracy and stability under NISQ conditions.
- Prediction of Homologous Protein Thermostability at the Single-Cell Level by Incorporating Explicit and Implicit Sequence Features
  
  Authors: Shiming Zhao, Yanbin Gu, Lingzhi Liu and Yanrui Ding
  
  https://doi.org/10.2174/0115748936394443250911224648
  
  Available online: 28 October 2025
  
  Introduction
  Considering the heterogeneity of proteins across diverse cell types and states, studying protein thermostability at the single-cell level enables a more profound comprehension of cellular function and the mechanisms underlying disease progression.
  
  Methods
  In this study, we constructed classification and regression models to predict the thermostability difference of homologous protein pairs by integrating implicit features extracted from protein sequences using eight language models, including ProtBERT, AminoBERT, and ProtT5-XL, with explicit sequence features that are manually computed.
  
  Results
  Our results demonstrate that the fusion of explicit and implicit features significantly enhances prediction performance. In classification tasks, the combination of implicit features extracted by AminoBERT and the optimal explicit feature set achieves an accuracy of 87.1%. In regression tasks, the combination of implicit features extracted by Word2vec and the optimal explicit feature set yields a PCC of 0.864 and an R² of 0.742, both better than previously reported results.
  
  Discussion
  This study reveals the complementary strengths of language models and handcrafted features in predicting protein thermostability. Combining both types of features significantly improves the performance of classification and regression models and helps identify key factors affecting protein stability. However, the study is limited by its reliance on existing datasets, which may reduce its ability to generalize to novel or rare protein families.
  
  Conclusion
  The integration of implicit and explicit sequence features enables a more comprehensive representation of protein sequences and facilitates the identification of factors influencing the thermostability of orthologous proteins.
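
The regression results for the thermostability models are summarized by a Pearson correlation coefficient (PCC) and R². A minimal sketch of how the two metrics differ is given below; it assumes R² means the coefficient of determination (the abstract does not say which definition was used), and the function names and toy values are illustrative only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Predictions that are perfectly correlated but shifted by a constant offset:
# PCC ignores the offset, whereas R² penalizes it.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(pearson_r(observed, predicted), r_squared(observed, predicted))
```

Reporting both values, as the abstract does, guards against a model that tracks the trend in stability differences but is systematically biased in magnitude.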
- A Diagnostic Aid Platform to Detect the Transition of Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD)
  
  Authors: Jing Li, Siwen Li, Yat-fung Shea, Ming Yue, Pengfei Zhu, Quan Zou, Shuofeng Yuan, Leung-Wing Chu and You-Qiang Song
  
  https://doi.org/10.2174/0115748936367942250706092947
  
  Available online: 21 October 2025
  
  Introduction
  Alzheimer's disease (AD), a leading cause of dementia, affects millions globally. By 2050, it is expected to impact over 100 million people. Mild cognitive impairment (MCI) is often considered a precursor to AD, but not all MCI patients progress to AD. Therefore, accurately predicting the risk of MCI patients converting to AD is essential.
  
  Methods
  This cross-sectional study analyzed routine blood test data collected from AD and MCI patients in Hong Kong between 2000 and 2019. To reduce gender and age bias, subjects were divided into four groups by gender and age band. Models were trained using machine learning and routine blood markers.
  
  Results
  On the independent test set, the model for females aged 65–74 performed best with an AUC of 0.76. For other age groups, the AUCs were as follows: 0.65 for males aged 65–74, 0.66 for females aged 75–89, and 0.67 for males aged 75–89. Based on this, we developed a platform named MAP (http://lab.malab.cn/~lijing/MAP.html) to predict the risk of MCI converting to AD, assisting clinicians and MCI patients in early diagnosis and prevention.
  
  Discussion
  Routine blood markers combined with machine learning offer a practical, non-invasive approach for predicting the risk of MCI-to-AD conversion. Predictive performance varies by age and gender.
  
  Conclusion
  This study supports the use of blood-based machine learning models as cost-effective tools for early AD risk screening in MCI patients.
- A Hybrid Deep Neural Network Utilizing Graph Convolution for the Prediction of CircRNA-RBP Interaction
  
  Authors: Guangyi Tang, Jiayang Li, Dengju Yao, Xiaojuan Zhan and Xiangkui Li
  
  https://doi.org/10.2174/0115748936393849250830080429
  
  Available online: 21 October 2025
  
  Introduction
  CircRNA, with its covalently closed circular structure, plays key roles in biological functions and diseases by interacting with RNA-binding proteins (RBPs) and microRNAs (miRNAs). However, existing computational methods struggle to capture secondary structure features.
  
  Methods
  We introduce CSGN, a graph neural network model that predicts circRNA-RBP interactions using secondary structure information. CSGN enhances physicochemical feature encodings by incorporating pseudo-secondary structures from thermodynamic models and utilizes graph convolutional networks (GCNs) for feature extraction. It also integrates Doc2Vec embeddings and employs CNNs, BiGRUs, and MLPs for efficient feature representation.
  
  Results
  CSGN outperforms existing models across 16 datasets. Ablation studies confirm the significance of RNA secondary structure and GCNs in improving prediction accuracy. Principal component analysis further highlights CSGN's strength in feature extraction.
  
  Discussion
  CSGN advances circRNA-RBP prediction by integrating GCNs and Doc2Vec, though global structural constraints remain. Future work should address longer-sequence modeling and experimental validation.
  
  Conclusion
  CSGN effectively improves circRNA-RBP interaction prediction, demonstrating superior performance through the integration of RNA secondary structure and GCNs.
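
CSGN is described as using graph convolutional networks (GCNs) for feature extraction. The sketch below shows one GCN layer in the widely used symmetric-normalization form, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an assumption that CSGN uses this particular propagation rule; the function names, the two-node graph, and the weights are purely illustrative.

```python
from math import sqrt


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, feats, weights):
    """One graph-convolution layer with self-loops and symmetric normalization."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weights)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


# Two paired nucleotides (one edge), one feature per node, identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```

In a secondary-structure graph, edges would typically connect backbone neighbors and predicted base pairs, so each layer mixes a nucleotide's features with those of its structural partners.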
- Computational Tools for Identifying Cancer Driver Genes and Mutations: A Comprehensive Review
  
  By Ali F. Alsulami
  
  https://doi.org/10.2174/0115748936405959250911064633
  
  Available online: 24 September 2025
  
  Understanding the genetic basis of cancer requires the accurate identification of driver genes and driver mutations, those alterations that promote tumorigenesis, while distinguishing them from neutral, or passenger, mutations. This review provides a comprehensive overview of computational strategies developed to detect and prioritise cancer drivers at both the gene and mutation levels. The review systematically classifies and compares more than 20 widely used tools, highlighting differences in their conceptual foundations, including sequence-based, structure-based, statistical, machine learning, and network/pathway-based methods. These tools leverage diverse types of data, including mutation frequency and evolutionary conservation, as well as gene expression profiles and interaction networks, to assess the functional relevance of somatic alterations. By integrating complementary approaches, researchers can enhance the sensitivity and specificity of driver prediction, particularly in cases involving rare or heterogeneous mutations. This review aims to serve as a practical guide for researchers and clinicians seeking to apply or evaluate current methods for cancer driver identification.
- GAALSMDA: A Graph Attention-based Fusion Network Integrating Dual Attention and BiLSTM for Microbe-Drug Association Prediction
  
  Authors: Chunling Xiang, Gaoning Shen, Lei Wang and Xianyou Zhu
  
  https://doi.org/10.2174/0115748936395098250827094922
  
  Available online: 18 September 2025
  
  Introduction
  Microbes have increasingly become critical new drug targets in human health. However, the paucity of known microbe-drug association data hinders drug discovery. Predicting potential microbe-drug associations can complement traditional experiments and accelerate drug development, making it crucial to develop efficient computational methods.
  
  Methods
  We proposed GAALSMDA, a graph attention-based fusion network. First, a microbe-drug heterogeneous network and feature matrix were constructed by integrating multiple similarities of microbes and drugs. A Graph Attention Network (GAT) was used to mine low-dimensional features of microbes and drugs. Then, a dual attention mechanism (CBAM) and a Bidirectional Long Short-Term Memory (BiLSTM) network were applied to fuse local and global features. Finally, a classifier output the likelihood scores of associations.
  
  Results
  The experimental results indicated that the AUC and AUPR evaluation indices of the model reached 0.9900±0.0011, 0.9958±0.0015 and 0.9492±0.0051, 0.9668±0.0042 in MDAD and aBiofilm datasets, respectively, and the prediction performance was significantly superior to that of existing prediction methods.
  
  Discussion
  The outstanding performance highlights GAALSMDA's ability to process sparse data and integrate multi-source information, addressing the limitations of previous models in terms of insufficient feature fusion. However, the similarity calculations of GIP and HIP may introduce parameter uncertainty, which still needs further optimization.
  
  Conclusion
  Our model demonstrates effectiveness and reliability in accurately inferring potential microbe-drug associations.
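
The GAALSMDA discussion notes that the GIP similarity calculation introduces parameter uncertainty. The sketch below shows the commonly used Gaussian interaction profile (GIP) kernel, in which the bandwidth is normalized by the mean squared norm of the interaction profiles. Whether GAALSMDA uses exactly this normalization is an assumption, and the function name, the `bandwidth` parameter, and the toy matrix are illustrative.

```python
from math import exp


def gip_similarity(assoc, bandwidth=1.0):
    """GIP kernel similarity between the rows of a binary association matrix.

    assoc[i] is the interaction profile of entity i (for example, one drug's
    known associations with every microbe).
    """
    n = len(assoc)
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = bandwidth / mean_sq_norm  # normalized kernel bandwidth
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = exp(-gamma * sq_dist)
    return sim


# Three drugs x three microbes; drugs 0 and 1 share the same profile.
profiles = [[1, 0, 1],
            [1, 0, 1],
            [0, 1, 0]]
for row in gip_similarity(profiles):
    print([round(v, 4) for v in row])
```

The `bandwidth` value is the free parameter: larger values make similarity decay faster with profile distance, which is one place where the uncertainty mentioned in the discussion can enter.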
- CLsquared: A Cleaning and Clustering Tool for Viral Genomic Data
  
  Authors: Giorgia Mazzotti, Martina Bado, Enrico Lavezzo and Stefano Toppo
  
  https://doi.org/10.2174/0115748936416627250905170048
  
  Available online: 18 September 2025
  
  Introduction
  During the COVID-19 pandemic, millions of viral genomic sequences were produced and deposited in public databanks. This unprecedented volume of data introduced inaccuracies and errors requiring effective management to ensure reliable scientific outcomes. Despite this, no bioinformatics tools have been developed specifically to comprehensively filter viral genomic datasets.
  
  Methods
  To address this need, we developed CLsquared, a tool suite implemented in Python3 and Bash for the selection of high-quality viral sequences. CLsquared flags sequences exhibiting unverified mutation patterns or metadata. It offers fully customizable filtering parameters and is adaptable to both public and private datasets. The tool supports multiprocessing, significantly reducing runtime on multi-core systems.
  
  Results
  CLsquared detects ambiguous, biologically implausible, and underrepresented mutation sets. Its modular architecture ensures efficient processing of large-scale datasets, optimizing both speed and memory usage.
  
  Discussion
  By systematically addressing sequencing and annotation errors, CLsquared fills a critical gap in current viral bioinformatics workflows. Its flexible and scalable design supports diverse research applications, improving data quality and reproducibility.
  
  Conclusion
  CLsquared is a robust resource for researchers working with large volumes of viral sequence data. It is freely available on GitHub (https://github.com/giorgia-m-95/CLsquared-multiprocessing and https://github.com/giorgia-m-95/CLsquared-base) and Docker Hub (giorgiam95/clsquared_parallel and giorgiam95/clsquared_base).
- Computational Approaches in Multi-Omics for Crop Improvement
  
  Authors: Jeshwanth Kannan, Tamilarasi Palani, Divya Selvakumar, Vijayalakshmi Dhashnamurthi, Varanavasiappan Shanmugam, Kavithamani Duraisamy and Jayakanthan Mannu
  
  https://doi.org/10.2174/0115748936402493250905065506
  
  Available online: 16 September 2025
  
  The incorporation of multi-omics strategies, namely genomics, transcriptomics, proteomics, metabolomics, and epigenomics, has been instrumental for promoting crop improvement by providing comprehensive views of the molecular processes driving complex agricultural traits, including enhanced stress tolerance, yield, and nutritional quality. This review presents an overview of the computational methods and tools currently used to analyze and integrate multi-omics data in crops. We then systematically classify them according to integrative strategies (early, intermediate, and late), and analytical methodologies (statistical, machine learning, network-based). Recent advancements in deep learning and explainable AI for predictive trait modeling are highlighted. It also discusses key knowledge gaps, including the under-representation of minor and climate-resilient crops, as well as challenges posed by data heterogeneity, scalability, and field-level validation. Through a newly proposed classification and evaluation framework, the aim of this review is to provide guidelines for researchers to choose computational pipelines and pave the way for future research on data-driven crop improvement and sustainable agriculture.
- An End-to-End 3D Graph Neural Network for Predicting Drug-Target-Disease Associations
  
  Authors: Lei Chen, Wenzhuo Zhu and Daozheng Chen
  
  https://doi.org/10.2174/0115748936387701250811024506
  
  Available online: 11 September 2025
  
  Introduction
  In medicine, uncovering disease mechanisms is a key research field that helps in discovering and designing effective treatments. Drugs are regarded as one of the most efficient ways to treat various diseases, so it is equally essential to understand their mechanisms of action. Investigating drug-target, drug-disease, and target-disease associations can advance research on both problems. However, most studies, including computational prediction models, have investigated these three kinds of association individually. Drugs, targets, and diseases also have high-order (triple) associations. Investigating such associations can provide a new, high-level perspective for understanding the mechanisms of action of drugs and uncovering the mechanisms of diseases. However, computational approaches for predicting such associations are quite limited, and existing approaches cannot make full use of the relationships among drugs, targets, and diseases, which limits their performance.
  
  Methods
  This study designed an efficient computational model for the prediction of drug-target-disease triple associations. The proposed model first constructed a three-dimensional adjacency matrix to represent known drug-target-disease associations. Raw drug, target, and disease features were derived from this matrix and were further processed by the linear transformation projection, which contained the external associations among different entity types. At the same time, one similarity network was constructed for each entity type (drug, target, or disease), employing the internal relationships in one entity type. The similarity networks and features were fed into a graph convolutional network to extract high-order drug, target, and disease features. Finally, a tensor operation was designed to evaluate the strength of each drug-target-disease association.
  
  Results
  Under the five-fold cross-validation, the model achieved AUROC and AUPR of 0.9530 and 0.9577, respectively. The proposed model outperformed some existing models for the same task.
  
  Discussion
  The ablation test supported the soundness of the model's structure. Two latent drug-target-disease associations discovered by the model were analyzed, suggesting that the model generalizes.
  
  Conclusion
  The proposed model was efficient in predicting drug-target-disease associations. It can be a useful tool for discovering higher-order associations among drugs, targets, and diseases.
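
The drug-target-disease model starts from a three-dimensional adjacency matrix of known triples and ends with a tensor operation that scores each candidate triple. The abstract does not specify how raw features are read from the matrix or which tensor operation is used, so the sketch below makes two labeled assumptions: raw drug features are taken as the flattened target-by-disease slice, and the score is a simple trilinear product of three embedding vectors. All names and values are hypothetical.

```python
def build_adjacency_tensor(triples, n_drugs, n_targets, n_diseases):
    """Binary 3D tensor with tensor[d][t][s] = 1 for each known triple."""
    tensor = [[[0] * n_diseases for _ in range(n_targets)]
              for _ in range(n_drugs)]
    for drug, target, disease in triples:
        tensor[drug][target][disease] = 1
    return tensor


def drug_raw_features(tensor, drug):
    """Assumption: a drug's raw feature vector is its flattened 2D slice."""
    return [value for row in tensor[drug] for value in row]


def trilinear_score(drug_vec, target_vec, disease_vec):
    """Assumption: score a triple by summing element-wise triple products."""
    return sum(a * b * c for a, b, c in zip(drug_vec, target_vec, disease_vec))


known = [(0, 1, 0), (1, 0, 1)]  # (drug, target, disease) index triples
tensor = build_adjacency_tensor(known, n_drugs=2, n_targets=2, n_diseases=2)
print(drug_raw_features(tensor, 0))
print(trilinear_score([1, 2], [3, 4], [5, 6]))
```

In the full model the three vectors passed to the scoring step would be the high-order embeddings produced by the graph convolutional network rather than hand-written numbers.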
- Attention Graphical Neural Networks-based Single-cell Multi-omics Fusion Analysis of Chromatin Accessibility and Transcriptome Characterization in Alzheimer's Disease
  
  Authors: Pujing Ye, Wei Kong and Shuaiqun Wang
  
  https://doi.org/10.2174/0115748936367139250801163310
  
  Available online: 28 August 2025
  
  Introduction
  Single-cell multi-omics technologies provide a comprehensive view of cellular states and transcriptional regulatory mechanisms by integrating diverse omics data. However, their complexity and heterogeneity present significant analytical challenges, particularly in understanding neurodegenerative disorders such as Alzheimer's disease (AD), an irreversible and progressive condition.
  
  Methods
  This study introduces the Multi-Omics Attention Graphical Convolutional Networks (MOAGCN), a novel multilayer deep learning model that addresses the heterogeneity in single-cell multi-omics data to enhance the analysis of multi-omics datasets and uncover potential mechanisms underlying AD. MOAGCN combines Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs) to simultaneously capture local cellular connectivity and dynamically weight cell-to-cell interactions. The model was applied to AD-related single-cell RNA-seq and ATAC-seq datasets to identify significant gene expression and epigenetic alterations. It was further validated on datasets including DNA methylation, mRNA, and miRNA from other diseases. The model's performance was compared with conventional methods using metrics such as AUC, accuracy, and F1 scores.
  
  Results
  MOAGCN effectively revealed key gene regulatory and protein interaction networks associated with AD, identifying significant changes in gene expression and epigenetic markers. In comparative validation across multiple datasets, MOAGCN outperformed traditional approaches in feature extraction and classification, achieving higher AUC, accuracy, and F1 scores. These results demonstrate its robustness in minimizing false positives and negatives while accurately identifying relevant features.
  
  Discussion
  In the classification of cell types and disease samples, MOAGCN outperformed eight leading algorithms for multi-omics data classification. Further analysis reported a 95% confidence interval for MOAGCN's accuracy, reinforcing the model's robustness and stability across different datasets.
  
  Conclusion
  MOAGCN presents a robust and adaptable framework for integrating single-cell multi-omics data, addressing the challenges of complexity and heterogeneity. Its application to AD datasets highlights its potential to uncover regulatory mechanisms and bio-signals, advancing our understanding of complex diseases. This innovative approach holds promise for broad applications in multi-omics data analysis, particularly in elucidating mechanisms underlying neurodegenerative disorders.
- Prediction of Interleukin-binding Sites Combining Multi-Source Features with Integrated Algorithm
  
  Authors: Xiaoyu Niu and YongE Feng
  
  https://doi.org/10.2174/0115748936372515250801173324
  
  Available online: 28 August 2025
  
  Introduction
  Interleukins (ILs) are important immune cytokines involved in immune regulation, inflammatory responses, and metabolic control. They are closely associated with various diseases, such as rheumatoid arthritis, atherosclerosis, diabetes, and asthma. However, the specific binding mechanisms of interleukins remain unclear. Studying the binding mechanisms between proteins and interleukins can help to understand the functions of interleukins, disease pathogenesis, and the development of new drugs. This study aims to systematically analyze the characteristics of interleukin family binding sites, uncover their shared features and specific mechanisms, provide new perspectives for understanding their functional roles in ligand-receptor interactions, and elucidate the potential impact of binding sites on signal transduction and immune responses.
  
  Methods
  We constructed a dataset containing both binding and non-binding sites and extracted eight features based on the sequence, structure, and functional information of the proteins. Six machine learning algorithms, along with an integrated algorithm, were used to predict binding sites from these features.
  
  Results
  We found that among the machine learning algorithms, the prediction performance using energy features was the best, achieving the highest accuracy (ACC) and area under the ROC curve (AUC). Further feature fusion and ensemble algorithm models significantly improved the predictive performance, with a maximum accuracy (ACC) of 98.4% and an ensemble algorithm accuracy of up to 99.2%.
  
  Discussion
  This study outperforms existing methods, achieving an MCC score of 0.984 with the Gradient Boosting algorithm. However, the limitations of a small sample size and dataset imbalance highlight the need for future research to collect larger and more diverse datasets to improve the model's generalization ability and predictive accuracy. Future studies will aim to verify our method's applicability and develop an online prediction tool to assist in studying small molecule drugs, antibodies, and interleukin binding sites, supporting targeted drug design and treatment of immune-related diseases.
  
  Conclusion
  This study demonstrates that the developed predictive model for interleukin binding sites effectively utilizes geometric and biochemical features, validating the SMOTETomek sampling method in enhancing model performance and providing a basis for targeted drug design and understanding immune response mechanisms.
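
The interleukin binding-site study reports both accuracy and a Matthews correlation coefficient (MCC) while noting dataset imbalance. A minimal sketch of why the two can diverge on imbalanced data follows; the confusion-matrix counts are invented for illustration and are not taken from the study.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all sites classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 2 binding sites among 100 residues,
# one of which is missed by the classifier.
counts = dict(tp=1, tn=98, fp=0, fn=1)
print(accuracy(**counts), mcc(**counts))
```

Here accuracy stays at 0.99 even though half of the binding sites are missed, while MCC drops noticeably, which is the motivation for reporting MCC alongside accuracy and for resampling methods such as SMOTETomek.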
- Graph Convolution and Attention-Combined Learning for Multi-Type Prediction of miRNA-Disease Associations
  
  Authors: Ya-Fei Liu, Li Zeng, Zu-Guo Yu, Xuan Lin and Jinyan Li
  
  https://doi.org/10.2174/0115748936379275250812105519
  
  Available online: 22 August 2025
  
  Introduction
  Associations of abnormally expressed miRNAs with disease development have long been investigated in the biomedical field. The association types are diverse and complex, including circulation type, epigenetic type, target type, and genetic type, as well as various unknown associations and possibly novel association types. However, most current studies focus on the yes/no binary prediction of miRNA–disease associations. Algorithms for multi-type prediction or novel-type discovery of these associations are less developed.
  
  Methods
  Graph convolution and attention mechanisms, integrated within a deep learning framework, form the basis of DeepMDpred. In the first step, DeepMDpred employs the ViennaRNA tool to derive sequence and functional features of miRNAs by calculating base pairs, minimum free energy, and other relevant properties. In the second step, disease features are extracted using a Graph Convolutional Network (GCN) combined with attention learning, enabling the adaptive capture of the importance of different node features. Finally, a nonlinear fully connected layer (NFCL) is applied to generate the embedding vectors for both diseases and miRNAs.
  
  Results
  In five-fold cross-validation, the model achieved high predictive performance for multi-type miRNA–disease associations. For task 1, the average AUC across the four predicted types exceeded 85%, with the genetics type achieving an accuracy of 0.919. For tasks 2 and 3, the average AUC exceeded 80%, and for the un-association type, the AUC reached 0.894. Validation using the HMDD v2.0 and HMDD v3.2 databases confirmed the robustness of the model, while additional case studies with the HMDD v3.2 and HMDD v4.0 databases demonstrated its applicability. Furthermore, investigations in breast and liver cancers supported the method’s capability to identify novel miRNA–disease associations.
  
  Discussion
  The findings of this study demonstrate the potential of DeepMDpred as a novel and effective approach for predicting multi-type associations between miRNAs and diseases. Validation across multiple databases, along with successful application in case studies on breast and liver cancers, underscores the generalizability and practical utility of this approach. The framework also offers a pathway for identifying novel associations, which may accelerate the discovery of biomarkers and therapeutic targets in complex diseases such as cancer. Nonetheless, certain limitations remain. Although the model achieves strong performance on curated datasets, its robustness in real-world noisy datasets and its applicability to rare diseases require further investigation. Future research should also consider integrating additional data modalities, including epigenetic modifications and clinical phenotypes, to improve predictive accuracy further and broaden the scope of application.
  
  Conclusion
  DeepMDpred is an effective method that combines graph convolution and attention learning for the multi-type prediction of miRNA-disease associations. It provides a better ability to identify new association types between diseases and miRNAs, as well as broader applicability for unveiling miRNAs associated with new diseases.
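
The first step of the miRNA-disease method derives miRNA features such as base pairs from ViennaRNA output. ViennaRNA reports predicted secondary structures in dot-bracket notation, and the sketch below shows how base pairs can be read from such a string. It does not call ViennaRNA itself; the function name and the example structure are illustrative, and how the pairs are turned into model features is not specified in the abstract.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs from a dot-bracket structure string."""
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), index))  # pair with the latest opener
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


# A toy hairpin: two stacked base pairs closing a two-nucleotide loop.
hairpin = "((..))"
print(base_pairs(hairpin), len(base_pairs(hairpin)))
```

Counts such as the number of pairs, or the fraction of paired nucleotides, are the kind of simple structural descriptors that can be fed to a downstream network alongside the minimum free energy.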
- A Multipurpose Machine Learning Application in Microbiological Data
  
  Authors: Saurav Kumar Mishra, Jeba Praba J., Kusum Gurung, Akansha Subba, Tabsum Chhetri and John J. Georrge
  
  https://doi.org/10.2174/0115748936414246250808022341
  
  Available online: 21 August 2025
  
  Microorganisms are widespread and essential to the transformation of substances and organic matter. Researchers have studied microorganisms using various approaches, including machine learning (ML), to overcome multiple obstacles. This review highlights the involvement of ML in various aspects of microbiology, along with the challenges to further advancement.
  
  The relevant literature on microbiology, the integration of ML, and the associated applications was carefully reviewed to collect and discuss meaningful information on the involvement of ML in different fields of microbiology.
  
  Due to the complexity of microbiological data, researchers combine multiple processing stages and diverse ML applications to handle and organize the data systematically, so as to obtain accurate results and sound hypotheses. Consequently, navigating these microbiological data requires an extensive feature-based model for appropriate validation and accurate results.
  
  This study mainly summarizes the various applications and development of ML models used in many aspects, especially the fundamentals of ML in microbiological data, clinical applications, microbial ecology, and the surrounding environment. At present, ML's involvement in microbial aspects is widely utilized; however, bulk data and proper information are needed for accurate and informative outcomes. This review sheds light on ML's involvement in microbiological aspects, and briefly discusses the different aspects. The advanced approaches followed by different tools and databases can be a potential lead toward significant research and promising findings.
- LO-HDL: A New Method for Prediction of Local Genetic Correlation Based on Maximum Likelihood Estimation
  
  Authors: Ya-Ping Wen and Zu-Guo Yu
  
  https://doi.org/10.2174/0115748936383234250804032135
  
  Available online: 19 August 2025
  
  Introduction
  Genetic correlation plays a pivotal role in elucidating the shared genetic architecture underlying complex traits and diseases. Local genetic correlation can efficiently pinpoint specific genomic regions, thereby enhancing the precision of gene correlation analysis. However, accurate estimation of local genetic correlations remains challenging owing to linkage disequilibrium in local genomic regions and overlapping study samples.
  
  Methods
  In this work, we propose a novel method called LO-HDL that is based on high-dimensional maximum likelihood estimation. LO-HDL constructs marginal statistics using the summary statistics of GWAS and combines the 1000 Genomes Project Phase 3 data as a reference panel.
  
  Results
  To assess the statistical power of LO-HDL, we compared it with other local genetic correlation estimation methods on simulations with three different degrees of sample overlap. In the absence of sample overlap, LO-HDL improves the statistical power for cases with high local genetic covariance. With partial or complete sample overlap, LO-HDL demonstrates an overall improvement in statistical power. As an application, we used LO-HDL to estimate local genetic associations among four autoimmune disorders and found that it could identify 31 regions with significant associations.
  
  Discussion
  The LO-HDL method can identify genes or genomic regions that jointly influence multiple complex traits, thereby revealing the shared genetic architecture among traits. This approach elucidates the genetic relationships between traits and provides a basis for interpreting their associations. In simulated data, when the local genetic covariance ranges from 0.002 to 0.004, the statistical power of LO-HDL is slightly lower than that of previous methods. However, LO-HDL demonstrates superior performance in study scenarios with partial or complete sample overlap, as well as in real GWAS data analyses. Through LO-HDL, researchers can more accurately pinpoint genetically correlated regions among diseases. For instance, the TRIM27 gene on chromosome 6 exhibits significant associations with four diseases and may serve as a potential therapeutic target in future treatments.
  
  Conclusion
  LO-HDL is a novel method for estimating local genetic correlations, which is based on high-dimensional maximum likelihood estimation. Through its application in simulated datasets and four autoimmune diseases, LO-HDL improves the accuracy of estimating local genetic correlations, which has applicability for revealing relationships between genetic variants and specific traits or diseases.
- A Comprehensive Database of Human Transmembrane Protein Mutations
  
  Authors: Jiayi Zhang, Yibo Liu, Li Guo and Fang Ge
  
  https://doi.org/10.2174/0115748936369628250714051437
  
  Available online: 04 August 2025
  
  Introduction
  Transmembrane proteins are essential for elucidating human disease mechanisms. This study establishes a comprehensive, current database of transmembrane protein mutations to advance research into disease processes and therapeutic innovation.
  
  Methods
  The study constructed a robust database of transmembrane protein mutations by integrating data from Swiss-Prot, Humsavar, COSMIC, and ClinVar. The Variant Effect Predictor (VEP) was employed to predict the functional consequences of mutations, and mutation sequence generation scripts were developed to generate and annotate mutation sequences. Stringent filtering criteria were applied to ensure data quality, and a thorough analysis of mutation types, distribution, and impact levels was conducted.
  
  Results
  The resulting dataset encompasses 138,235 entries across 202 annotation fields, incorporating standard identifiers (e.g., gene names, Ensembl IDs, genomic positions), as well as additional functional effects fields generated by different methods. The dataset is publicly accessible at http://tmliang.cn/memPmut/.
  
  Discussion
  The database highlights the functional significance of missense mutations and the prevalence of subtle effects from moderate-impact variants. Nucleotide transition biases suggest potential hotspots, while the web server facilitates research into disease mechanisms and therapeutic targets.
  
  Conclusion
  This study provides a cohesive, high-quality database that aids the research on transmembrane protein mutation by consolidating diverse data sources and hundreds of mutation function effects.
- Heart Sound Classification Using Kernel Partial Least Squares with Easy MKL-derived Kernels
  
  Authors: Wenjie Zhang and Zhen Tian
  
  https://doi.org/10.2174/0115748936366098250718052357
  
  Available online: 30 July 2025
  
  Introduction
  The automatic classification of heart sound signals offers an economical and convenient approach for early diagnosis of cardiac diseases. By leveraging technological advancements, this method facilitates early detection and management of heart conditions, which is critical for improving patient outcomes.
  
  Methods
  To address the challenges in analyzing complex heart sound signals, we introduce a novel method utilizing EasyMKL-enhanced kernel partial least squares (KPLS). This approach begins with transforming segmented cardiac cycles into the time-frequency domain using the short-time Fourier transform (STFT). The STFT representations are then mapped into a high-dimensional feature space using multiple kernel functions derived from EasyMKL, designed to capture and enhance the discriminative nonlinear relationships among various heart sound categories. The extracted features are classified using a Support Vector Machine (SVM) for datasets with balanced samples and an XGBoost classifier for those with imbalanced samples.
  
  Results
  The proposed method was evaluated on two publicly available heart sound datasets, the PhysioNet/CinC Challenge 2016 and the Yaseen dataset. On the PhysioNet/CinC Challenge 2016 dataset, our method achieved a sensitivity of 0.9217, a specificity of 0.8950, and an overall score of 0.9084. On the Yaseen dataset, our method achieved an average recall of 0.9933, precision of 0.9930, and F1-score of 0.9930, demonstrating high classification accuracy across different heart sound categories. These results confirm the effectiveness of our approach in extracting discriminative features and improving classification performance.
  
  Discussion
  The high performance across two diverse datasets confirms the generalizability and robustness of the proposed method. Notably, the EasyMKL-enhanced KPLS framework captures complex nonlinear patterns while maintaining interpretability—an essential attribute for clinical applications. Compared to traditional approaches, our method significantly improves feature discriminability, as evidenced by ablation studies. While minor misclassifications persist in acoustically similar classes, the model consistently outperforms baselines, highlighting its strong potential for deployment in real-world intelligent auscultation systems.
  
  Conclusion
  The experimental results confirm the superiority of our proposed method, demonstrating its potential as a powerful tool for the automatic classification of heart sound signals. This approach not only enhances the accuracy of cardiac disease diagnostics but also offers a robust framework for handling complex and nonlinear characteristics of heart sound data, promising significant contributions to clinical practices and research in cardiology.
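
As a small arithmetic check on the PhysioNet/CinC Challenge 2016 figures reported in the Results above, the following sketch assumes that the overall score is the arithmetic mean of sensitivity and specificity. The abstract does not state the formula, so this is an assumption that the reported numbers happen to be consistent with; the function name `mean_accuracy` is hypothetical.

```python
def mean_accuracy(sensitivity: float, specificity: float) -> float:
    """Arithmetic mean of sensitivity and specificity.

    Assumption: this is how the 'overall score' in the abstract is
    derived; the abstract itself does not give the formula.
    """
    return (sensitivity + specificity) / 2


# Values reported in the abstract for PhysioNet/CinC Challenge 2016.
overall = mean_accuracy(0.9217, 0.8950)
print(f"overall score: {overall:.5f}")
```

Under this assumption the mean is about 0.90835, which agrees with the reported overall score of 0.9084 to within rounding.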
- Integrated Metabolic-related Transcription Factor Protein Activity for Stratification of Breast Cancer with Distinct Clinical Outcomes
  
  Authors: Yuqiang Xiong, Shaokang Li, Zhengchun Huo, Min Zou, Dongqing Su, Honghao Li, Shiyuan Wang and Lei Yang
  
  https://doi.org/10.2174/0115748936380191250710051557
  
  Available online: 23 July 2025
  
  Introduction
  Breast carcinoma continues to be a predominant factor contributing to cancer-associated mortality in women across the globe. Despite the significant advancements in medical technology today, there remain challenges in precisely stratifying patients based on their risk profiles and identifying the most effective treatment strategies for breast cancer. The regulation of metabolism and transcription factors is considered to have a close association with cancer progression.
  
  Methods
  In this study, a co-expression network was used to identify transcription factors associated with metabolic molecular subtypes, and a risk scoring model was ultimately constructed. WGCNA was employed to explore related transcription factor modules, and the VIPER method was used to infer the activity state of transcription factors. A machine learning method, SVM, was employed to model patient survival outcomes.
  
  Results
  We found that patients with lower risk scores exhibited longer survival and a more favorable chemotherapy response than their high-risk counterparts. Meanwhile, high-risk patients exhibited higher levels of chromosomal instability and tumor immunogenicity than low-risk patients. Additionally, we constructed a ceRNA network and identified 39 master regulators associated with survival outcomes.
  
  Discussion
  This study provided a method for using the protein activity of transcription factors for subtyping breast cancer patients.
  
  Conclusion
  We achieved risk stratification of breast cancer patients and accurately predicted their prognosis. The result also highlighted various contributors impacting the clinical prognosis of breast cancer patients.
- Ensemble Regression-Based Identification of Signatures for Cancer Prognosis in RNA Expression Profiles
  
  Authors: Yajun Zhang and Xudong Zhao
  
  https://doi.org/10.2174/0115748936374758250702145027
  
  Available online: 15 July 2025
  
  Introduction
  Previous studies have extensively reported various feature selection methods for identifying cancer signatures using RNA expression profiles. However, these methods often produce unreliable signatures due to four key factors. First, classifiers other than regression models are always inappropriately applied in prognostic survival analysis. Second, the unknown distribution of samples can lead to the ineffective selection of regression models. Third, high-dimensional expression profiles with small sample sizes typically result in poor predictive performance of the selected regression model. Fourth, variable control is usually overlooked.
  
  Methods
  To solve these problems, we have proposed a novel feature selection framework using ensemble regression to identify cancer prognostic signatures. This framework utilizes ensemble regression to overcome the limitations of classification models, as classification models reduce survival time to categorical labels, losing the original continuous information. At the same time, it incorporates up-sampling techniques to increase sample size and uses a bagging strategy to randomly select samples and features, addressing the challenges posed by high-dimensional data and small sample sizes. Additionally, the framework controls for clinical variables to ensure stable feature selection and reliable prediction results.
  
  Results
  Experimental results demonstrate the effectiveness of this method in addressing the issues mentioned, providing reliable prognostic signatures. The ensemble regression method significantly improves predictive performance, with robust adaptability to unknown sample distributions.
  
  Discussion
  The proposed ensemble regression model outperforms classification and single regressors in prognostic survival analysis by preserving continuous survival information, adapting to sample distribution, and benefiting from controlled variables. Using TCGA-GBM data, six prognostic miRNAs were validated as reliable biomarkers, whereas mRNA-based models showed limited robustness due to high dimensionality and small sample size.
  
  Conclusion
  The proposed feature selection framework offers a robust approach to improving the identification of cancer prognostic signatures, enhancing predictive accuracy in prognostic survival analysis.
- MK-NMF: A Novel Multiple Kernel-based Non-negative Matrix Factorization Model to Mine Synergistic Drug Combinations in Cell Lines
  
  Authors: Tianyi Li, Huirui Han, Jiaqi Chen, Dehua Feng, Zhengxin Chen, Xuefeng Wang, Xinying Liu, Ruijie Zhang, Qibin Wang, Lei Yu, Xia Li, Bing Li, Limei Wang and Jin Li
  
  https://doi.org/10.2174/0115748936364740250614061809
  
  Available online: 14 July 2025
  
  Introduction
  Drug synergism may occur when two or more drugs are used in combination. Synergistic drug pairs can enhance efficacy and reduce drug dosage and side effects. Therefore, employing computational methodologies to identify specific synergistic drug combinations for clinical application is of significant importance.
  
  Methods
  We proposed a multiple kernel-based non-negative matrix factorization, MK-NMF, specifically for mining specific synergistic drug pairs in cell lines. In this method, we treated the features of drug pair space and cell line space in the form of two kernel matrices. We incorporated feature kernel matrices into the matrix factorization process.
  
  Results
  MK-NMF achieved an area under the curve (AUC) of 0.884 and an area under the precision versus recall curve (AUPR) of 0.537 on the NCI ALMANAC dataset. Both measures were more than a 5% improvement over the previous matrix factorization model. MK-NMF had good robustness with the missing input data. Its performance was stable when the amount of matrix data input was at least 40%. Literature and experimental verification confirmed some of our predictions.
  
  Discussion
  The increase in data volume and the introduction of more high-quality features will further enhance the performance of MK-NMF. Single-drug response data will help address the challenge of predicting synergistic combinations of new drugs.
  
  Conclusion
  MK-NMF could assist medical professionals in rapidly screening synergistic drug combinations against specific cancer cell lines. The source code of MK-NMF is freely available at https://github.com/XDRFDH/MK-NMF.
- Genome-wide Analysis of Ovarian Cancer-specific circRNAs in Alternative Splicing Regulation
  
  Authors: Minhui Zhuang, Meng Zhang, Yulan Wang, Lingxiao Zou, Shan He, Jingjing Liu, Jian Zhao, Ping Han, Xiaofeng Song and Jing Wu
  
  https://doi.org/10.2174/0115748936358930250518030128
  
  Available online: 26 May 2025
  
  Introduction
  Ovarian cancer (OC) is a fatal female reproductive system cancer with a high mortality rate and is hard to detect at an early stage. Recent studies have indicated that alternative splicing plays an important role in OC progression by activating genes and pathways involved in tumorigenesis. Circular RNAs (circRNAs) have also been found to play a regulating role in tumor progression and present their potential ability in alternative splicing regulation. However, the underlying mechanism by which circRNAs regulate alternative splicing events (ASEs) in OC remains unclear.
  
  Methods
  In this study, we performed a comprehensive transcriptomic study on the RNA-seq data of our collected tumor and normal samples from OC patients, aiming to investigate the regulatory roles of OC-specific circRNAs in aberrant splicing events and their underlying pathways in tumorigenesis.
  
  Results
  We constructed a genome-wide regulatory network with strong correlations from 300 differentially expressed (DE) circRNAs and 1,150 aberrant ASEs, mediated by 31 DE splicing factors (SFs). Analyses of this network revealed that dysregulation of circRNAs may lead to aberrant ASEs that are closely involved in ovarian tumorigenesis. In addition, two crucial circRNAs, circ_AKT3 (hsa_circ_0000199) and circ_GSK3B (hsa_circ_0008797), were identified due to their significant roles in the network and associations with multiple tumor-related functional pathways.
  
  Discussion
  These findings suggest that OC-specific circRNAs may participate in tumor progression by indirectly regulating groups of ASEs through multiple SFs, rather than through direct interaction. Subnetwork analyses centered on the two hub circRNAs revealed that their associated ASEs are functionally clustered and involved in coordinated biological processes relevant to tumor biology.
  
  Conclusion
  This study provides novel insights into the regulatory pathways by which circRNAs are involved in OC progression, offering clues for discovering diagnostic biomarkers and therapeutic targets.
- A Survey of Trends in Biomolecule Recognition for Sensing and Machine Learning Combined with Heterogeneous Information
  
  Authors: Huiyu Ren, Cong Shen, Lingzhu Hu, Jijun Tang, Zhijun Liao and Wenyan Tian
  
  https://doi.org/10.2174/0115748936359132250421053523
  
  Available online: 14 May 2025
  
  Biomolecule sensing for recognition is the fundamental upstream step in target identification during the metabolism of an organism. Nevertheless, determining the interaction, affinity, structure, activity, and toxicity of target biomolecules remains complicated work that relies on both in vitro and in vivo experiments. At the same time, biological investigation with intelligent computing has extended to bio-sequence analysis and biomedical image processing, especially multi-view and multi-modal biomolecule identification. This review presents a panorama of contemporary developments in biomolecular omics and computational biological sensing, machine learning scenarios, and heterogeneous information comprising multi-view, multi-modal, structured, and unstructured text and biomedical images. After the background, the concepts and databases of biomolecule interaction, affinity, and structure are introduced. Then, the machine learning paradigms in bioinformatics and biomedical engineering are presented, organized around epigenetics and pharmacogenomics. Next, multi-view and multi-modal learning algorithms and optimization strategies for structured and unstructured data formats, including texts and biomedical images, are described in detail. By comparing and analyzing state-of-the-art works, this review summarizes the advantages of existing methods for target biomolecule identification and the remaining challenges. Finally, future developments are discussed, including research trends in robustness, data augmentation, model generalization, and acceleration.
- Analysis of Alternative Splicing Heterogeneity during Early Stages of Mouse Embryonic Development
  
  Authors: Hongxia Chi, Yu Zhang, Anqi Li, Pengwei Hu, Wuritu Yang and Yongqiang Xing
  
  https://doi.org/10.2174/0115748936376211250429102627
  
  Available online: 14 May 2025
  
  Introduction
  Pre-mRNA alternative splicing (AS) is a prevalent phenomenon in mammals, playing a crucial role in various biological processes such as embryonic development, tissue differentiation, and disease pathogenesis. Despite the advancements in single-cell RNA sequencing (scRNA-seq) technology, the extent of AS heterogeneity at the transcript level during early mouse embryonic development remains largely unexplored.
  
  Methods
  BRIE2 and Expedition were employed to identify and quantify splicing events. Cell clustering was performed with Scanpy based on Percent Spliced In (PSI) values and gene expression levels. Then, marker AS events and differential AS events were detected by the Wilcoxon rank-sum test and BRIE2's Mode-2 quantification mode. GO and KEGG enrichment analyses were conducted with clusterProfiler.
  
  Results
  The results suggested substantial heterogeneity in AS events and elucidated PSI values as a critical index of cell heterogeneity during early mouse embryonic development, shedding light on the regulatory mechanisms underlying these processes. By examining marker and differential AS events, the study provided a comprehensive understanding of the dynamic changes in splicing patterns throughout early mouse embryonic development.
  
  Discussion
  This study revealed the heterogeneity of AS and elucidated its implications during early mouse embryonic development by analyzing AS at the single-cell level. However, the results are theoretical and lack experimental validation.
  
  Conclusion
  The findings offer critical insights into studying mouse embryonic development from the perspective of RNA cellular heterogeneity, emphasizing the importance of AS in shaping cellular diversity and developmental processes.
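
To illustrate the Wilcoxon rank-sum test named in the Methods above, here is a minimal pure-Python sketch using the normal approximation. It is not the authors' pipeline: the function names, the toy PSI values, and the omission of a tie correction in the variance are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test with a normal approximation.

    Returns (W, z, p): the rank sum of x, its z-score, and a two-sided
    p-value. No tie correction is applied to the variance (a simplification).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, z, p


# Hypothetical PSI values of one splicing event in two cell clusters.
cluster_a = [0.1, 0.2, 0.3]
cluster_b = [0.7, 0.8, 0.9]
w, z, p = rank_sum_test(cluster_a, cluster_b)
print(w, round(z, 3), round(p, 3))
```

With samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch short.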
- Investigating the Unique Transcriptional miRNA-mRNA Regulatory Network of ALK-positive Lung Adenocarcinoma Using Machine Learning Methods
  
  Authors: Xiandong Lin, YuSheng Bao, Shaoli Wang, Hongyu Yu, Wei Guo, KaiYan Feng, Tao Huang and Yu-Dong Cai
  
  https://doi.org/10.2174/0115748936362646250501185409
  
  Available online: 08 May 2025
  
  Introduction
  Non-small Cell Lung Cancer (NSCLC) is characterized by key gene mutations, such as EGFR, KRAS, and ALK. ALK rearrangement occurs in 3–5% of patients with non-small cell lung adenocarcinoma and is related to different clinical characteristics. Although ALK tyrosine kinase inhibitors have shown efficacy, drug resistance remains a challenge. This current study aims to determine the unique molecular characteristics of ALK-positive lung adenocarcinoma to improve detection and prognosis.
  
  Methods
  GSE128311 integrates expression profiling data by array from GSE128309 and noncoding RNA profiling data by array from GSE128310, including 42 patients with ALK-positive lung adenocarcinoma and 35 patients with ALK-negative lung adenocarcinoma. This data was analyzed by eight feature ranking algorithms, yielding eight feature lists. These lists were fed into incremental feature selection to extract essential features.
  
  Results
  Key differentially expressed genes and miRNAs were identified, and functional enrichment analysis was carried out.
  
  Discussion
  The dysregulation of the cell cycle pathway, the FOXM1 transcription factor network, and immune response processes in ALK-positive tumors was highlighted. Notably, CX3CL1, MMS22L, DSG3, RUFY1, miR-652-5p, and miR-1288 are potentially important markers. Gene set enrichment analysis revealed low expression of the cell cycle pathway in ALK-positive samples.
  
  Conclusion
  This comprehensive computational analysis provides new insights into the molecular basis of ALK-positive lung adenocarcinoma and determines promising biomarkers for further research.
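
The incremental feature selection step mentioned in the Methods above can be sketched as follows: given one ranked feature list, score the top-k features for k = 1, 2, ... and keep the best-scoring prefix. The abstract does not specify the classifier or the evaluation scheme, so the leave-one-out 1-nearest-neighbour scorer, the function names, and the toy data below are all assumptions.

```python
def incremental_feature_selection(ranked, score_subset):
    """Score every prefix of a ranked feature list and keep the best one.

    ranked: feature indices ordered from most to least important.
    score_subset: callable mapping a list of feature indices to a score.
    Returns (selected_features, score_curve).
    """
    best_k, best_score = 0, float("-inf")
    curve = []
    for k in range(1, len(ranked) + 1):
        score = score_subset(ranked[:k])
        curve.append(score)
        if score > best_score:  # strict: ties keep the smaller subset
            best_k, best_score = k, score
    return ranked[:best_k], curve


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.0, 5.0, 1.0], [0.1, -5.0, 0.0], [1.0, 4.9, 0.0], [1.1, -4.9, 1.0]]
y = [0, 0, 1, 1]
ranked = [0, 1, 2]  # hypothetical output of one feature ranking algorithm

selected, curve = incremental_feature_selection(
    ranked, lambda feats: loo_1nn_accuracy(X, y, feats)
)
print(selected, curve)
```

In the study's setting this loop would be repeated for each of the eight ranked lists; how the results are then combined is not described in the abstract.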
- Single-Cell RNA Sequencing to Identify Natural Killer Cell-Linked Genetic Markers and Regulatory Biomolecules in Coronary Heart Disease
  
  Authors: Prosenjit Saha Apu, Md. Arju Hossain, Md. Shakil, Md. Selim Reza, Siddique Akber Ansari, Irfan Aamer Ansari, Mahammad Humayoo and Md Habibur Rahman
  
  https://doi.org/10.2174/0115748936362436250417105150
  
  Available online: 25 April 2025
  
  Introduction
  Bacterial and viral infections have been linked to an increased risk of coronary heart disease (CHD), potentially through natural killer (NK) cell-mediated innate immune mechanisms. This study aimed to integrate single-cell RNA sequencing (scRNA-seq) and bulk transcriptomics data to identify NK cell-associated genetic biomarkers that could aid in the diagnosis and assessment of CHD.
  
  Methods
  Publicly available single-cell and bulk RNA-seq datasets were analyzed to identify differentially expressed genes (DEGs). Functional enrichment analysis, protein-protein interaction (PPI) network construction, and biomarker validation were performed using standard bioinformatics pipelines.
  
  Results
  A total of 106 shared DEGs were identified through integrated cross-comparative analysis. Enrichment analysis revealed involvement in immune activation, signal transduction, T-cell receptor signaling, and TYROBP signaling pathways. PPI network analysis identified key hub proteins, including CDK1 and PTPRC, as potential biomarkers. Regulatory analysis revealed transcription factors (TP53, YY1, and RELA) and post-transcriptional miRNAs (hsa-miR-195-5p, hsa-miR-34a-5p, and hsa-miR-16-5p) that may influence CHD-associated gene expression. Several small molecules were also predicted to interact with these targets, suggesting potential therapeutic applications.
  
  Discussion
  The findings underscore the role of NK cell-mediated immune pathways in CHD pathogenesis. Hub genes such as CDK1 (involved in cell cycle regulation) and PTPRC (an immune signaling regulator) show promise as diagnostic biomarkers. The discovery of regulatory factors and druggable targets supports a complex, multi-level mechanism involving transcriptional and immune modulation.
  
  Conclusion
  This integrative study identifies novel NK cell-related molecular signatures and therapeutic targets, offering promising avenues for CHD diagnosis and the development of personalized treatment strategies.
- Innovative Insights into Liver Cancer: Multi-Omics Reveals Critical Subtypes and Hub Genes
  
  Authors: Jin-Yuan Cheng, Zi Liu, Xin Liu, Muhammad Kabir and Wang-Ren Qiu
  
  https://doi.org/10.2174/0115748936365348250331112230
  
  Available online: 24 April 2025
  
  Introduction/Objective
  Hepatocellular carcinoma (HCC) is a highly heterogeneous malignant tumor, characterized by elevated mortality rates and poor diagnostic outcomes. Accurate identification of cancer subtypes is crucial for guiding personalized treatment and improving patient prognosis.
  
  Methods
  A method for precisely identifying HCC subtypes by integrating multi-omics data was presented. This approach combines the GRACES dimensionality reduction technique with the hMKL subtype identification model to analyze data from 266 HCC patients.
  
  Results
  We identified two subtypes more accurately, both significantly associated with overall survival. Their respective three-year mortality rates were 55.9% and 27.9%. Additionally, we observed significant differences in the activity of five pathways between these two subtypes, along with notable variations in the abundance and status of seven types of immune cells. Through further determination of the PPI network and centrality indicators, 13 up-regulated hub genes and 14 down-regulated hub genes were identified.
  
  Discussion
  Based on the above results, we compared and discussed the hub genes with the textual data, examined differences in gene upregulation and downregulation, and evaluated findings from other bioinformatics analyses to identify potential biomarkers.
  
  Conclusion
  Limited research on ENPP3 and C3 in HCC suggests their potential as biomarkers. Additionally, low expression levels of PIK3R1, KDR, and CYP3A5, along with high expression levels of EGLN3 and EPO, may indicate a higher risk of liver cancer in patients. Single-gene survival analysis highlighted the significant impact of highly expressed genes on HCC prognosis, with PKM, RRM2, and EPO playing crucial roles in the risk scores.
- Multiple Approaches to Identifying Key Genes Linked to the Anti-inflammatory Effects of Ginsenosides
  
  Authors: Gui-Fang Xiang, Fei-Ran Zhou, Chun-Yan Cui, Qing Liu, An-Qiong Mao and Ying Zhang
  
  https://doi.org/10.2174/0115748936348266250225070200
  
  Available online: 10 March 2025
  
  Ginsenoside is a naturally occurring active ingredient in ginseng that mainly consists of four components, Rb1, Rb2, Rc, and Rd, which are considered an important part of ginseng's medicinal effects. Ginsenosides can enhance the anti-fatigue ability of the body, regulate immune function, improve cardiovascular function, and have anti-aging, antioxidant, and neuroprotective effects. In recent years, many studies have found that ginsenosides have anti-inflammatory properties and are used in the treatment of many inflammatory diseases, such as endodontitis and bronchitis. Ginsenosides reduce inflammation by suppressing the release of inflammatory mediators, modulating inflammatory signaling pathways, scavenging free radicals, and modulating the immune system in a variety of ways. However, existing studies have not investigated the specific genes underlying the inflammation-reducing properties of ginsenosides. In this study, we analyzed two publicly accessible datasets from the GEO database (GSE255672 and GSE173990) to investigate the molecular basis of the anti-inflammatory effects of ginsenosides. This study aims to advance our understanding of how ginsenosides exert their anti-inflammatory properties, providing preliminary findings for identifying the gene targets of these effects and new therapeutic pathways in the management of inflammation. It paves the way for further research on ginsenosides and their therapeutic application in inflammation-related diseases.
- An Analysis of the Interactions between the 5' UTR and Introns in Mitochondrial Ribosomal Protein Genes
  
  Authors: Junchao Deng, Ruifang Li, Xinwei Song, Shan Gao, Shiya Peng and Xu Tian
  
  https://doi.org/10.2174/0115748936357583250207100102
  
  Available online: 10 February 2025
  
  Background
  The 5' UTR plays a crucial role in gene regulation, possibly through its interaction with introns. Hence, there is a need to further study this interaction.
  
  Objective
  This study aimed to investigate the interactions between 5' UTR and introns and their correlation with species evolution.
  
  Methods
  The optimally matched segments between 5' UTR and introns were identified using Smith-Waterman local similarity matching, and biostatistical methods were applied to compare the optimally matched segments between different species.
  
  Results
  The interactions between the 5' UTR and introns were found to be primarily mediated by weak bonds and showed a directional change with species evolution. Additionally, a large proportion of the optimally matched segments closely resembled miRNA and siRNA in length and matching rate.
  
  Conclusion
  The weak bonds in the interactions between the 5' UTR and the introns could enhance the flexibility of expression regulation, and an important correlation was found between the characteristic distributions of the optimally matched segments and species evolution. Additionally, the length and matching rate of a large proportion of the optimally matched segments were very similar to those of miRNA and siRNA. It is therefore highly probable that many of the optimally matched segments are functional non-coding RNAs of some kind.
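
The Smith-Waterman local matching named in the Methods of the entry above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `smith_waterman` and `matching_rate`, the scoring values (match +2, mismatch -1, linear gap -2), and the definition of matching rate as identical positions divided by alignment length are all assumptions made for illustration, since the abstract does not state them.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not taken from the study.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def matching_rate(aligned_a, aligned_b):
    """Fraction of aligned columns with identical, non-gap symbols (assumed definition)."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a)


if __name__ == "__main__":
    # Toy sequences, not real 5' UTR or intron data.
    score, seg_a, seg_b = smith_waterman("TTTACGTAAA", "GGGACGTCCC")
    print(score, seg_a, seg_b, matching_rate(seg_a, seg_b))
```

This sketch scores direct sequence similarity. If the segments of interest are instead base-paired (complementary) regions, one sequence would first need to be reverse-complemented, which is a further assumption the abstract does not settle.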
- PDTDAHN: Predicting Drug-Target-Disease Associations using a Heterogeneous Network
  
  Authors: Lei Chen and Jingdong Li
  
  https://doi.org/10.2174/0115748936359702250120114240
  
  Available online: 10 February 2025
  
  Background
  Disease is a major threat to life, and extensive efforts have been made over the past centuries to develop effective treatments. Identifying drug-disease and disease-target associations is crucial for therapeutic advancements, whereas drug-target associations facilitate the design of more effective treatment strategies. However, traditional experimental approaches for identifying these associations are costly and time-consuming. Numerous computational models have been developed to predict drug-target, drug-disease, and disease-target associations. However, these models are designed individually and cannot directly predict drug-target-disease associations, which involve interconnections among drugs, targets, and diseases. Such triple associations provide deeper insights into disease mechanisms and therapeutic interventions by capturing high-order associations.
  
  Objective
  This study proposes a computational model named PDTDAHN to predict drug-target-disease triple associations.
  
  Method
  Six association types retrieved from public databases are used to construct a heterogeneous network comprising drugs, targets, and diseases. The network embedding algorithm Mashup is applied to extract features for drugs, targets, and diseases, which are then combined to represent each drug-target-disease association. The classification model is trained using LightGBM.
  
  Results
  Cross-validation on eight datasets demonstrates the high performance of PDTDAHN, with AUROC and AUPR exceeding 0.9. This model outperforms previous models based on pairwise association predictions.
  
  Conclusion
  The proposed model effectively predicts drug-target-disease triple associations.

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan