Volume 20, Issue 5

Current Bioinformatics - Volume 20, Issue 5, 2025


- EPI-HAN: Identification of Enhancer Promoter Interaction Using Hierarchical Attention Network
  
  Authors: Fatma S. Ahmed, Saleh Aly and Xiangrong Liu
  
  https://doi.org/10.2174/0115748936294743240524113731
  
  Background
  Enhancer-Promoter Interaction (EPI) recognition is crucial for understanding human development and transcriptional regulation. EPI in the genome plays a significant role in regulating gene expression. In Genome-Wide Association Studies (GWAS), EPIs help to improve the mechanistic understanding of disease- or trait-associated genetic variants.
  Methods
  Experimental methods for classifying EPIs are time-consuming and expensive. Consequently, research has increasingly focused on computational approaches that leverage deep learning and other machine learning techniques. One of the main challenges in EPI prediction is the length of enhancer and promoter sequences, which most existing computational approaches struggle to handle. This paper proposes a new deep learning model based on the Hierarchical Attention Network (HAN) for EPI detection. The proposed EPI-HAN model has two distinguishing features: (i) a hybrid embedding strategy and (ii) a hierarchical structure comprising two attention layers, one operating on individual tokens and one on shorter sub-sequences.
  Results
  In benchmark comparisons, the EPI-HAN model demonstrates superior performance over state-of-the-art methods, as evidenced by AUROC and AUPR metrics for specific cell lines. Specifically, for the cell lines HeLa-S3, HUVEC, and NHEK, the AUROC values are 0.962, 0.946, and 0.987, respectively, and the AUPR values are 0.842, 0.724, and 0.926, respectively.
  Conclusion
  The comparative results indicate that our model surpasses other state-of-the-art models in three out of six cell lines. The superior performance in recognizing EPIs is attributed to the hierarchical structure of the attention mechanism.
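
The two-level attention idea named in the Methods above can be illustrated with a minimal sketch. It assumes a simplified form of attention pooling (softmax over dot products with a context vector) and omits the embeddings, encoders, and learned projections a trained model would use; the function names, context vectors, and toy numbers are hypothetical and are not taken from EPI-HAN.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)) and sum them."""
    scores = [sum(v * c for v, c in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def hierarchical_attention(segments, token_context, segment_context):
    """Pool tokens into segment vectors, then segments into one sequence vector."""
    segment_vectors = [attention_pool(tokens, token_context)[0] for tokens in segments]
    return attention_pool(segment_vectors, segment_context)


# Hypothetical toy input: a sequence split into two segments of 2-d token vectors.
segments = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.0, 2.0]],
]
token_context = [1.0, 0.0]    # would be learned in a real model
segment_context = [0.0, 1.0]  # would be learned in a real model
sequence_vector, segment_weights = hierarchical_attention(
    segments, token_context, segment_context
)
```

Splitting a long sequence into segments keeps each attention step short: the first level only compares tokens within a segment, and the second level only compares segment summaries.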
  

- Detection of DNA N6-Methyladenine Modification through SMRT-seq Features and Machine Learning Model
  
  Authors: Yichu Guo, Yixuan Zhang, Xiaoqing Liu, Pingan He, Yuni Zeng and Qi Dai
  
  https://doi.org/10.2174/0115748936300671240523044154
  
  Introduction
  N6-methyldeoxyadenine (6mA) is the most prevalent DNA modification in both prokaryotes and eukaryotes. While single-molecule real-time sequencing (SMRT-seq) can detect 6mA events at the individual nucleotide level, its practical application is hindered by a high rate of false positives.
  
  Methods
  We propose a computational model for identifying DNA 6mA that incorporates comprehensive site features from SMRT-seq and employs machine learning classifiers.
  
  Results
  The results demonstrate that 99.54% and 96.55% of the identified DNA 6mA instances in C. reinhardtii correspond with motifs and peak regions identified by methylated DNA immunoprecipitation sequencing (MeDIP-seq), respectively. Compared to SMRT-seq alone, the proportion of predicted DNA 6mA instances within MeDIP-seq peak regions increases by 2% to 70% across six bacterial strains.
  
  Conclusion
  Our proposed method effectively reduces the false-positive rate in DNA 6mA prediction.
  

- Rank Matrix Approach for Endometriosis: Integrating Data and Constructing Diagnostic Models
  
  Authors: Ranze Xie, Deqing Hong, Jiaqi Yuan, Peng Xu, Wenbin Liu and Zheng Ye
  
  https://doi.org/10.2174/0115748936296151240605053713
  
  Background
  Endometriosis is a debilitating gynecological disorder characterized by chronic pain, infertility, and the growth of endometrial tissue outside the uterus. Accurate and early detection of this condition is crucial for effective management and treatment.
  Methods
  We developed a gene rank matrix-based model to integrate endometriosis cohorts across multiple platforms. After removing batch effects, we identified 83 genes associated with endometriosis and further refined a diagnostic model using 11 of these genes. The model was trained on two platforms and validated on two others using SVM, Random Forest, Logistic Regression, and gradient-boosting machine learning algorithms.
  Results
  The integration via the gene rank matrix effectively mitigated batch effects. Utilizing a gradient boosting classifier with a subset of 11 genes, the model demonstrated commendable diagnostic efficacy, achieving an Area Under the Curve (AUC) of 0.77, an accuracy of 0.72, and an F1 score of 0.72 for the training dataset. When subjected to validation, the model maintained its performance, yielding an AUC of 0.769, an accuracy of 0.719, and an F1 score of 0.732. These 11 genes were found to be associated with immunosuppression.
  Conclusion
  Our approach to integrating gene rank matrices effectively consolidates endometriosis data across diverse platforms. The diagnostic model, harnessing the predictive power of 11 specific genes, surpasses alternative models, thereby offering promising prospects for aiding clinical diagnosis of endometriosis. Further validation is imperative to elucidate the functional significance of these 11 genes. Our study underscores the potential of data integration coupled with machine learning techniques in advancing the diagnosis of intricate diseases, such as endometriosis.
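
The rank-based integration described in the Methods above can be sketched as follows, assuming that each sample's expression values are replaced by their within-sample ranks so that cohorts measured on different platforms share a common scale. The abstract does not spell out the exact ranking procedure, so the function names, the tie handling, and the toy values here are hypothetical.

```python
def rank_transform(sample):
    """Replace each gene's expression value with its within-sample rank.

    Ranks start at 1 for the lowest value; tied values share the average
    of the ranks they span.
    """
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over any run of tied values.
        while end + 1 < len(order) and sample[order[end + 1]] == sample[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def rank_matrix(expression):
    """Rank-transform every sample (row) of a samples-by-genes matrix."""
    return [rank_transform(row) for row in expression]


# Hypothetical toy data: the same three genes measured on two platforms
# whose raw intensity scales differ by orders of magnitude.
platform_a = [[5.1, 9.8, 7.3]]
platform_b = [[510.0, 2200.0, 730.0]]
ranks_a = rank_matrix(platform_a)
ranks_b = rank_matrix(platform_b)
```

Because ranks depend only on the ordering of genes within a sample, the two toy samples map to the same rank vector even though their raw scales differ, which is the property that makes rank matrices attractive for cross-platform integration.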
  

- Insights into Co-Expression Network Analysis of MicroProteins and their Target Transcription Factors in Plant Embryo Development
  
  Authors: Khadijeh Shokri, Naser Farrokhi, Asadollah Ahmadikhah, Mehdi Safaeizadeh and Amir Mousavi
  
  https://doi.org/10.2174/0115748936304167240530091051
  
  Background
  Gene expression is regulated in a spatiotemporal manner, and the roles of microProteins (MiPs) in this regulation have started to become clear in plants.
  Methods
  Here, a microarray data analysis was carried out to decipher the spatiotemporal role of MiPs in embryo development. The guilt-by-association method was used to determine the corresponding regulatory factors.
  Results
  Module network analyses and protein-protein interaction (PPI) assays suggested 13 modules for embryo development in the Arabidopsis model plant. Various biological processes such as metabolite biosynthesis, hormone transition and regulation, fatty acid and storage protein biosynthesis, and photosynthesis-related processes were prevalent. Different transcription factors (TFs) at different stages of embryo development were found and reviewed. Furthermore, 106 putative MiPs were identified that might be involved in the regulation of embryo development. Fifteen candidate hub MiPs across embryo developmental stages were identified by PPI network analysis, and their putative regulatory roles are discussed. Previously reported MiPs, AT1G14760 (KNOX), AT5G39860 (PRE1), and AT2G46410 (CPC), were noted to be present in modules M3 and M8.
  Conclusion
  Molecular comprehension of regulatory factors including MiPs and TFs during embryo development allows targeted breeding of the corresponding traits and genome-based engineering of value-added new varieties.
  

- Screening Analysis of Predictive Markers for Cytokine Release Syndrome Risk in CAR-T Cell Therapy
  
  Authors: Jiayu Xu, Chengkui Zhao, Zhenyu Wei, Weixin Xie, Qi Cheng, Min Zhang, Shuangze Han, Liqing Kang, Nan Xu, Lei Yu and Weixing Feng
  
  https://doi.org/10.2174/0115748936295986240619162816
  
  Background
  Chimeric Antigen Receptor (CAR)-T cell therapy has emerged as a highly effective treatment for hematological tumors. However, the associated adverse reaction, Cytokine Release Syndrome (CRS), poses a significant challenge. While numerous studies have investigated CRS biomarkers during CAR-T cell therapy, the ability to predict CRS risk prior to treatment initiation remains a crucial yet underexplored aspect.
  Objective
  This study aimed to address the issue of limited data by exploring an alternative approach that uses public data to identify predictive markers for CRS risk assessment from pre-treatment patient RNA-Seq data, and to understand the mechanisms that induce CRS.
  Methods
  We integrated information from two public databases: the FDA Adverse Event Reporting System (FAERS) for adverse reaction reports of CAR-T cell therapy, and The Cancer Genome Atlas (TCGA) for RNA-Seq data on the corresponding hematological tumors. Candidate genes were screened by correlation analysis between Reporting Odds Ratio (ROR) values and RNA-Seq gene expression levels, and core factors were then identified through stepwise analysis of pathway enrichment, cluster analysis, and protein interactions.
  Results
  Our analysis highlighted the correlation between CRS risk and pre-treatment T cell activation/proliferation, identifying key genes (IFN-γ, IL1β, IL2, IL6, and IL10) as significant CRS indicators.
  Conclusion
  This study offers a unique perspective on predicting CRS risk before CAR-T cell therapy, circumventing the challenges of scarce clinical data by leveraging analysis of public databases. It elucidates the crucial role of T cell activation/proliferation dynamics in CRS. The analytical methods and identified markers provide a reference for the research and clinical application of CAR-T cell therapy.
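
The screening step in the Methods above rests on the reporting odds ratio, a standard disproportionality measure computed from a 2x2 table of adverse event reports. A minimal sketch follows, assuming the usual definition ROR = (a/b)/(c/d) with an approximate 95% interval, and assuming Pearson correlation for the ROR-versus-expression step (the abstract says only "correlation analysis"). All counts, ROR values, and expression values are hypothetical toy numbers.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """Return (ROR, lower, upper) for a 2x2 table of adverse event reports.

    a: reports of the event of interest for the therapy of interest
    b: reports of all other events for the therapy of interest
    c: reports of the event of interest for all other therapies
    d: reports of all other events for all other therapies
    """
    ror = (a / b) / (c / d)
    # Standard error of ln(ROR), used for an approximate 95% interval.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se)
    upper = math.exp(math.log(ror) + 1.96 * se)
    return ror, lower, upper


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical report counts for one tumor type.
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=10, d=90)

# Hypothetical per-tumor-type ROR values and mean expression of one gene.
ror_by_tumor = [1.2, 2.3, 3.1, 4.8]
expression_by_tumor = [5.0, 6.1, 7.4, 9.0]
r = pearson(ror_by_tumor, expression_by_tumor)
```

A gene whose expression tracks the ROR across tumor types (high `r`) would be a screening candidate under this assumption; the downstream enrichment and clustering steps are not sketched here.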
  

- Automatic Detection of Standard Planes in Fetal Ultrasound Images based on Convolutional Neural Networks and Ensemble Learning
  
  Authors: Baoping Zhu, Fan Yang, Hongliang Duan and Zhipeng Gao
  
  https://doi.org/10.2174/0115748936295679240620094626
  
  Introduction
  The wide application of artificial intelligence in various fields has shown its potential to aid medical diagnosis. Ultrasound is an important tool used to evaluate fetal development and diagnose fetal diseases.
  Methods
  However, traditional diagnostic methods are time-consuming and laborious. Therefore, we constructed an end-to-end automatic diagnosis system based on convolutional neural networks using ensemble learning to improve the robustness and accuracy of the system.
  Results
  The system classifies the ultrasound image dataset into six categories, namely, abdomen, brain, femur, thorax, maternal cervix, and other planes.
  Conclusion
  Experimental results showed that the proposed end-to-end system can considerably improve the detection accuracy of standard planes.
  

- DNA Binding Protein Prediction based on Multi-feature Deep Meta-transfer Learning
  
  Authors: Chunliang Wang, Fanfan Kong, Yu Wang, Hongjie Wu and Jun Yan
  
  https://doi.org/10.2174/0115748936290782240624114950
  
  Background
  In recent years, the rapid development of deep learning technology has had a significant impact on the prediction of DNA-binding proteins. Deep neural networks can automatically learn complex features in protein and DNA sequences, improving prediction accuracy and generalization capabilities.
  Objective
  This article establishes a meta-transfer learning model and combines it with a deep learning model to predict DNA-binding proteins.
  Methods
  This study introduces a meta-learning algorithm based on transfer learning, which helps achieve rapid learning and adaptation to new tasks. Normalized Moreau-Broto autocorrelation attributes (NMBAC), position-specific scoring matrix-discrete cosine transform (PSSM-DCT), and position-specific scoring matrix-discrete wavelet transform (PSSM-DWT) are used for feature extraction. Finally, DNA-binding proteins (DBPs) are predicted by a deep neural network model based on the attention mechanism.
  Results
  This paper first establishes the basis of deep meta-transfer learning. Using the PDB186 dataset as the benchmark, features were extracted with NMBAC, PSSM-DCT, and PSSM-DWT, compared after pairwise fusion, and finally combined into a single fused feature set. With the deep learning model, the fused features gave the best prediction performance. Compared with currently popular models, the method shows clear improvements in accuracy (ACC), Matthews correlation coefficient (MCC), sensitivity (SN), and specificity (Spec).
  Conclusion
  The proposed method can effectively predict DNA-binding proteins and performs better than the compared models.
  

- Biopanning Data Bank 2023: Updating and New Findings
  
  Authors: Hamza B. Abagna, Bowen Li, Chunchao Pu, Yuqing Jiang, Yuwei Zhou, Bifang He and Jian Huang
  
  https://doi.org/10.2174/0115748936329911241015123132
  
  Background
  Biopanning, or phage display technology, has gained considerable research attention for discovering peptides and antibodies and for understanding protein interactions, which are crucial for developing targeted therapeutics. The Biopanning Data Bank (BDB, http://i.uestc.edu.cn/bdb) serves as a repository for peptide biopanning results. However, its last significant update was in 2018, highlighting a research gap that needs urgent attention.
  Objectives
  This study aims to update BDB with the most recent data and enhance the identification of Target-Unrelated Peptides (TUPs).
  Methods
  A search of PubMed was conducted for recent articles related to “phage display” published between January 2018 and May 2023. Relevant data were manually curated and added to BDB. Each peptide’s target was identified using MimoSearch, while TUPScan was used to detect new TUPs.
  Results
  As of October 2023, BDB contains 3,682 biopanning datasets from 1,771 papers. These comprise 124 next-generation phage display (NGPD) datasets and 3,558 conventional biopanning datasets, featuring 34,078 peptide sequences, 593 templates, 2,231 targets, 524 peptide libraries, and 324 crystal structures. Our analysis identified 1,110 possible TUPs and 60 highly reliable TUPs, including 26 novel discoveries.
  Conclusion
  This update addresses critical research gaps by incorporating recent peptide data and introducing novel TUPs. BDB remains the most comprehensive resource for biopanning, playing a crucial role in peptide library research and supporting the development of new TUP predictors and mimotope decoding tools.
  

- Predicting Distant Metastatic Sites in Cancer Using miRNA and mRNA Expression Data
  
  Authors: Dostonjon Mamatkarimov, Jiahui Kang and Kyungsook Han
  
  https://doi.org/10.2174/0115748936338628241104110200
  
  Background
  Cancer patients with metastasis face a much lower survival rate and a higher risk of recurrence than those without metastasis. So far, several learning methods have been proposed to predict cancer metastasis, but most of these methods are intended to predict lymph node metastasis rather than distant metastasis. Distant metastasis is more difficult to predict than lymph node metastasis because distant metastasis is detected after a comprehensive examination of the entire body, and there are not enough publicly available tumor samples with distant metastasis that can be used for training learning methods. Predicting distant metastatic sites is even more challenging than predicting whether distant metastasis will occur or not.
  Methods
  The problem of predicting distant metastatic sites is a multi‐class and multi‐label classification problem; there are more than two classes for distant metastatic sites (bone, brain, liver, lung, and other organs), and a single sample can have multiple labels for multiple metastatic sites. We transformed the multi‐label and multi‐class problem into multiple single‐label binary problems. For each metastatic site, we built a random forest model that deals with binary classification and linked the models along a chain.
  Results
  Testing the model on miRNA and mRNA expression datasets of several cancer types showed high performance on all measures, and our method outperformed the other methods compared.
  Conclusion
  We developed a new method for predicting multiple metastatic sites using miRNA and mRNA expression data. The technique will be useful in predicting distant metastatic sites before distant metastasis occurs, which in turn will help clinicians determine treatment options for cancer patients.
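
The transformation described in the Methods above, one binary model per metastatic site with the models linked along a chain, matches the general classifier-chain pattern. A minimal sketch of that pattern follows. It uses a 1-nearest-neighbour stand-in as the base learner purely to keep the example self-contained (the paper uses random forests), and the class names, site order, features, and labels are all hypothetical.

```python
class NearestNeighbour:
    """Stand-in binary classifier (1-nearest neighbour); NOT a random forest."""

    def fit(self, X, y):
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict(self, X):
        def closest(row):
            dists = [sum((p - q) ** 2 for p, q in zip(row, ref)) for ref in self.X]
            return self.y[dists.index(min(dists))]

        return [closest(row) for row in X]


class ClassifierChain:
    """One binary model per label; model k also sees labels 0..k-1 as features."""

    def __init__(self, make_model, n_labels):
        self.models = [make_model() for _ in range(n_labels)]

    def fit(self, X, Y):
        for k, model in enumerate(self.models):
            # Training uses the true values of the earlier labels.
            augmented = [list(x) + list(y[:k]) for x, y in zip(X, Y)]
            model.fit(augmented, [y[k] for y in Y])
        return self

    def predict(self, X):
        predictions = [[] for _ in X]
        for model in self.models:
            # Prediction uses the chain's own earlier predictions.
            augmented = [list(x) + p for x, p in zip(X, predictions)]
            for p, label in zip(predictions, model.predict(augmented)):
                p.append(label)
        return predictions


# Hypothetical toy data: two expression features, four site labels per sample.
sites = ["bone", "brain", "liver", "lung"]
X = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.8], [0.2, 0.1]]
Y = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
chain = ClassifierChain(NearestNeighbour, n_labels=len(sites)).fit(X, Y)
predicted = chain.predict(X)
```

Feeding earlier predictions into later models lets the chain capture dependencies between sites, which independent per-site classifiers would ignore; the order of sites in the chain is a design choice that the abstract does not specify.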
  

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan