Current Bioinformatics - Volume 20, Issue 5, 2025
EPI-HAN: Identification of Enhancer Promoter Interaction Using Hierarchical Attention Network
Authors: Fatma S. Ahmed, Saleh Aly and Xiangrong Liu
Background: Enhancer-Promoter Interaction (EPI) recognition is crucial for understanding human development and transcriptional regulation. EPI in the genome plays a significant role in regulating gene expression. In Genome-Wide Association Studies (GWAS), EPIs help to improve the mechanistic understanding of disease- or trait-associated genetic variants.
Methods: Experimental methods for classifying EPIs are time-consuming and expensive. Consequently, research has increasingly focused on computational approaches that leverage deep learning and other machine learning techniques. One of the main challenges in EPI prediction is the length of enhancer and promoter sequences, which most existing computational approaches struggle with. This paper proposes a new deep learning model based on the Hierarchical Attention Network (HAN) for EPI detection. The proposed EPI-HAN model has two unique features: (i) a hybrid embedding strategy and (ii) a hierarchical structure comprising two attention layers, one operating on individual tokens and one on smaller sub-sequences.
Results: In benchmark comparisons, the EPI-HAN model demonstrates superior performance over state-of-the-art methods, as evidenced by AUROC and AUPR metrics for specific cell lines. Specifically, for the cell lines HeLa-S3, HUVEC, and NHEK, the AUROC values are 0.962, 0.946, and 0.987, respectively, and the AUPR values are 0.842, 0.724, and 0.926, respectively.
Conclusion: The comparative results indicate that our model surpasses other state-of-the-art models in three out of six cell lines. The superior performance in recognizing EPIs is attributed to the hierarchical structure of the attention mechanism.
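The two-level attention described in the Methods above can be illustrated with a minimal sketch. It is not the EPI-HAN implementation: the scoring function (a dot product between a context vector and `tanh` of each input), the fixed chunk size, the toy vectors, and every function name here are assumptions made for illustration, and the hybrid embedding and any sequence encoders are left out entirely.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    """Attention-weighted average of `vectors`.

    Weights are softmax(context . tanh(v)), a simplified stand-in for a
    learned attention layer (assumption, not the paper's formulation).
    """
    scores = [sum(c * math.tanh(x) for c, x in zip(context, v)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def split_into_chunks(items, chunk_size):
    """Cut a long sequence into consecutive sub-sequences."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def hierarchical_attention(token_vectors, chunk_size, token_context, chunk_context):
    """Level 1 pools tokens inside each chunk; level 2 pools the chunk vectors."""
    chunks = split_into_chunks(token_vectors, chunk_size)
    chunk_vectors = [attention_pool(chunk, token_context)[0] for chunk in chunks]
    sequence_vector, chunk_weights = attention_pool(chunk_vectors, chunk_context)
    return sequence_vector, chunk_weights

# Toy input: 10 token embeddings of dimension 2 (made-up values).
tokens = [[0.1 * i, 1.0 - 0.1 * i] for i in range(10)]
seq_vec, chunk_w = hierarchical_attention(tokens, 4, [1.0, 0.0], [0.0, 1.0])
```

Pooling in two stages means no single attention layer has to score every token of a long enhancer or promoter at once, which is one plausible reading of why a hierarchy helps with long sequences.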
-
-
-
Detection of DNA N6-Methyladenine Modification through SMRT-seq Features and Machine Learning Model
Authors: Yichu Guo, Yixuan Zhang, Xiaoqing Liu, Pingan He, Yuni Zeng and Qi Dai
Introduction: N6-methyldeoxyadenine (6mA) is the most prevalent DNA modification in both prokaryotes and eukaryotes. While single-molecule real-time sequencing (SMRT-seq) can detect 6mA events at the individual nucleotide level, its practical application is hindered by a high rate of false positives.
Methods: We propose a computational model for identifying DNA 6mA that incorporates comprehensive site features from SMRT-seq and employs machine learning classifiers.
Results: The results demonstrate that 99.54% and 96.55% of the identified DNA 6mA instances in C. reinhardtii correspond with motifs and peak regions identified by methylated DNA immunoprecipitation sequencing (MeDIP-seq), respectively. Compared to SMRT-seq, the proportion of predicted DNA 6mA instances within MeDIP-seq peak regions increases by 2% to 70% across the six bacterial strains.
Conclusion: Our proposed method effectively reduces the false-positive rate in DNA 6mA prediction.
-
-
-
Rank Matrix Approach for Endometriosis: Integrating Data and Constructing Diagnostic Models
Authors: Ranze Xie, Deqing Hong, Jiaqi Yuan, Peng Xu, Wenbin Liu and Zheng Ye
Background: Endometriosis is a debilitating gynecological disorder characterized by chronic pain, infertility, and the growth of endometrial tissue outside the uterus. Accurate and early detection of this condition is crucial for effective management and treatment.
Methods: We developed a gene rank matrix-based model to integrate endometriosis cohorts across multiple platforms. After removing batch effects, we identified 83 genes associated with endometriosis and further refined a diagnostic model using 11 of these genes. The model was trained on two platforms and validated on two others using SVM, Random Forest, Logistic Regression, and gradient-boosting machine learning algorithms.
Results: The integration via the gene rank matrix effectively mitigated batch effects. Utilizing a gradient boosting classifier with a subset of 11 genes, the model demonstrated commendable diagnostic efficacy, achieving an Area Under the Curve (AUC) of 0.77, an accuracy of 0.72, and an F1 score of 0.72 for the training dataset. When subjected to validation, the model maintained its performance, yielding an AUC of 0.769, an accuracy of 0.719, and an F1 score of 0.732. These 11 genes were found to be associated with immunosuppression.
Conclusion: Our approach to integrating gene rank matrices effectively consolidates endometriosis data across diverse platforms. The diagnostic model, harnessing the predictive power of 11 specific genes, surpasses alternative models, thereby offering promising prospects for aiding clinical diagnosis of endometriosis. Further validation is imperative to elucidate the functional significance of these 11 genes. Our study underscores the potential of data integration coupled with machine learning techniques in advancing the diagnosis of intricate diseases, such as endometriosis.
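One common way to build a gene rank matrix, sketched below, is to replace each sample's expression values with their within-sample ranks, so that samples measured on different scales become comparable. The abstract does not give the authors' exact procedure; the tie handling (average ranks), the function names, and the toy values are assumptions for illustration.

```python
def rank_values(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks

def rank_matrix(samples):
    """Rank genes within each sample (one row per sample, same gene order)."""
    return [rank_values(row) for row in samples]

# Toy data (made up): the same three genes measured on two platforms
# with very different scales.
platform_a_sample = [7.1, 9.3, 5.2]      # e.g. log-scale intensities
platform_b_sample = [120.0, 900.0, 15.0]  # e.g. raw counts
ranked = rank_matrix([platform_a_sample, platform_b_sample])
```

In this toy case both rows map to the same ranks even though the raw scales differ, which is the property that makes rank-based integration attractive for reducing platform-driven batch effects.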
-
-
-
Insights into Co-Expression Network Analysis of MicroProteins and their Target Transcription Factors in Plant Embryo Development
Authors: Khadijeh Shokri, Naser Farrokhi, Asadollah Ahmadikhah, Mehdi Safaeizadeh and Amir Mousavi
Background: Gene expression is regulated in a spatiotemporal manner, and the roles of microProteins (MiPs) in this context have started to become clear in plants.
Methods: Here, a microarray data analysis was carried out to decipher the spatiotemporal role of MiPs in embryo development. The guilt-by-association method was used to determine the corresponding regulatory factors.
Results: Module network analyses and protein-protein interaction (PPI) assays suggested 13 modules for embryo development in the Arabidopsis model plant. Various biological processes such as metabolite biosynthesis, hormone transition and regulation, fatty acid and storage protein biosynthesis, and photosynthesis-related processes were prevalent. Different transcription factors (TFs) at different stages of embryo development were found and reviewed. Furthermore, 106 putative MiPs were identified that might be involved in the regulation of embryo development. Fifteen candidate hub MiPs at embryo developmental stages were identified by PPI network analysis, and their putative regulatory roles were discussed. Previously reported MiPs, AT1G14760 (KNOX), AT5G39860 (PRE1), and AT2G46410 (CPC), were noted to be present in modules M3 and M8.
Conclusion: Molecular comprehension of regulatory factors including MiPs and TFs during embryo development allows targeted breeding of the corresponding traits and genome-based engineering of value-added new varieties.
-
-
-
Screening Analysis of Predictive Markers for Cytokine Release Syndrome Risk in CAR-T Cell Therapy
Authors: Jiayu Xu, Chengkui Zhao, Zhenyu Wei, Weixin Xie, Qi Cheng, Min Zhang, Shuangze Han, Liqing Kang, Nan Xu, Lei Yu and Weixing Feng
Background: Chimeric Antigen Receptor (CAR)-T cell therapy has emerged as a highly effective treatment for hematological tumors. However, the associated adverse reaction, Cytokine Release Syndrome (CRS), poses a significant challenge. While numerous studies have investigated CRS biomarkers during CAR-T cell therapy, the ability to predict CRS risk prior to treatment initiation remains a crucial yet underexplored aspect.
Objective: This study aimed to address the issue of limited data by exploring an alternative approach that uses public data to identify predictive markers for CRS risk assessment from pre-treatment patient RNA-Seq data, and to understand the mechanisms that induce CRS.
Methods: We integrated information from two public databases, the FDA Adverse Event Reporting System (FAERS) for adverse reaction reports of CAR-T cell therapy and The Cancer Genome Atlas (TCGA) for RNA-Seq data on corresponding hematological tumors. Candidate genes were screened by correlation analysis between Reporting Odds Ratio (ROR) values and RNA-Seq gene expression levels, and then core factors were identified through stepwise analysis of pathway enrichment, cluster analysis, and protein interactions.
Results: Our analysis highlighted the correlation between CRS risk and pre-treatment T cell activation/proliferation, identifying key genes (IFN-γ, IL1β, IL2, IL6, and IL10) as significant CRS indicators.
Conclusion: This study offers a unique perspective on predicting CRS risk before CAR-T cell therapy, circumventing the challenges of scarce clinical data by leveraging analysis of public databases. It elucidates the crucial role of T cell activation/proliferation dynamics in CRS. The analytical methods and identified markers provide a reference for the research and clinical application of CAR-T cell therapy.
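The ROR screening step in the Methods above can be sketched as follows. The ROR formula and its 95% confidence interval are the standard 2x2 disproportionality calculation; the use of Pearson correlation, the function names, and all numbers are assumptions for illustration, since the abstract does not state which correlation measure or counts were used.

```python
import math

def reporting_odds_ratio(a, b, c, d):
    """ROR with 95% CI from a 2x2 table of adverse event reports.

    a: target therapy, event of interest    b: target therapy, other events
    c: other therapies, event of interest   d: other therapies, other events
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    upper = math.exp(math.log(ror) + 1.96 * se_log)
    return ror, lower, upper

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up report counts for one tumor type (not from FAERS).
ror, ci_low, ci_high = reporting_odds_ratio(20, 80, 10, 90)

# Made-up per-tumor-type ROR values and mean expression of one hypothetical gene.
ror_by_tumor = [2.25, 1.10, 3.40, 0.80]
expression_by_tumor = [5.1, 3.9, 6.2, 3.5]
r = pearson(ror_by_tumor, expression_by_tumor)
```

Under this reading, a gene whose expression tracks the ROR across tumor types (a large `r`) would be kept as a candidate for the later enrichment and interaction analyses.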
-
-
-
Automatic Detection of Standard Planes in Fetal Ultrasound Images based on Convolutional Neural Networks and Ensemble Learning
Authors: Baoping Zhu, Fan Yang, Hongliang Duan and Zhipeng Gao
Introduction: The wide application of artificial intelligence in various fields has shown its potential to aid medical diagnosis. Ultrasound is an important tool used to evaluate fetal development and diagnose fetal diseases.
Methods: However, traditional diagnostic methods are time-consuming and laborious. Therefore, we constructed an end-to-end automatic diagnosis system based on convolutional neural networks using ensemble learning to improve the robustness and accuracy of the system.
Results: The system classifies the ultrasound image dataset into six categories, namely, abdomen, brain, femur, thorax, maternal cervix, and other planes.
Conclusion: Experimental results showed that the proposed end-to-end system can considerably improve the detection accuracy of the standard plane.
-
-
-
DNA Binding Protein Prediction based on Multi-feature Deep Meta-transfer Learning
Authors: Chunliang Wang, Fanfan Kong, Yu Wang, Hongjie Wu and Jun Yan
Background: In recent years, the rapid development of deep learning technology has had a significant impact on the prediction of DNA-binding proteins. Deep neural networks can automatically learn complex features in protein and DNA sequences, improving prediction accuracy and generalization capabilities.
Objective: This article establishes a meta-transfer learning model and combines it with a deep learning model to predict DNA-binding proteins (DBPs).
Methods: This study introduces a meta-learning algorithm based on transfer learning, which helps achieve rapid learning and adaptation to new tasks. In addition, normalized Moreau-Broto autocorrelation attributes (NMBAC), position-specific scoring matrix-discrete cosine transform (PSSM-DCT), and position-specific scoring matrix-discrete wavelet transform (PSSM-DWT) are used for feature extraction. Finally, DBPs are predicted by a deep neural network model based on the attention mechanism.
Results: This paper first establishes the basis of deep meta-transfer learning and, using the PDB186 dataset as the benchmark, extracts features with NMBAC, PSSM-DCT, and PSSM-DWT separately, compares pairwise fusions of these features, and finally obtains the fused feature set. After deep learning processing, the fused features give the best prediction performance. Compared with currently popular models, the method also shows clear improvements in the ACC, MCC, SN, and Spec evaluation metrics.
Conclusion: The method used in this article can effectively predict DNA-binding proteins and shows improved performance.
-
-
-
Biopanning Data Bank 2023: Updating and New Findings
Authors: Hamza B. Abagna, Bowen Li, Chunchao Pu, Yuqing Jiang, Yuwei Zhou, Bifang He and Jian Huang
Background: Biopanning, or phage display technology, has gained considerable research attention for discovering peptides and antibodies and for understanding protein interactions, which are crucial for developing targeted therapeutics. The Biopanning Data Bank (BDB, http://i.uestc.edu.cn/bdb) serves as a repository for peptide biopanning results. However, its last significant update was in 2018, highlighting a research gap that needs urgent attention.
Objectives: This study aims to update BDB with the most recent data and enhance the identification of Target-Unrelated Peptides (TUPs).
Methods: A search of PubMed was conducted for recent articles related to “phage display” published between January 2018 and May 2023. Relevant data were manually curated and added to BDB. Each peptide’s target was identified using MimoSearch, while TUPScan was used to detect new TUPs.
Results: As of October 2023, BDB contains 3,682 biopanning datasets from 1,771 papers. These datasets include 124 NGPD datasets and 3,558 conventional biopanning datasets, featuring 34,078 peptide sequences, 593 templates, 2,231 targets, 524 peptide libraries, and 324 crystal structures. Our analysis identified 1,110 possible TUPs and 60 highly reliable TUPs, including 26 novel discoveries.
Conclusion: This update addresses critical research gaps by incorporating recent peptide data and introducing novel TUPs. BDB remains the most comprehensive resource for biopanning, playing a crucial role in peptide library research and supporting the development of new TUP predictors and mimotope decoding tools.
-
-
-
Predicting Distant Metastatic Sites in Cancer Using miRNA and mRNA Expression Data
Authors: Dostonjon Mamatkarimov, Jiahui Kang and Kyungsook Han
Background: Cancer patients with metastasis face a much lower survival rate and a higher risk of recurrence than those without metastasis. So far, several learning methods have been proposed to predict cancer metastasis, but most of these methods are intended to predict lymph node metastasis rather than distant metastasis. Distant metastasis is more difficult to predict than lymph node metastasis because distant metastasis is detected after a comprehensive examination of the entire body, and there are not enough publicly available tumor samples with distant metastasis that can be used for training learning methods. Predicting distant metastatic sites is even more challenging than predicting whether distant metastasis will occur or not.
Methods: The problem of predicting distant metastatic sites is a multi-class and multi-label classification problem; there are more than two classes for distant metastatic sites (bone, brain, liver, lung, and other organs), and a single sample can have multiple labels for multiple metastatic sites. We transformed the multi-label and multi-class problem into multiple single-label binary problems. For each metastatic site, we built a random forest model that deals with binary classification and linked the models along a chain.
Results: Testing the model on miRNA and mRNA expression datasets of several cancer types showed high performance on all performance measures. Our method also outperformed the other methods it was compared with.
Conclusion: We developed a new method for predicting multiple metastatic sites using miRNA and mRNA expression data. The technique will be useful in predicting distant metastatic sites before distant metastasis occurs, which in turn will help clinicians determine treatment options for cancer patients.
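The chaining idea in the Methods above can be sketched in a few lines. This is an illustration, not the authors' model: a tiny nearest-centroid classifier stands in for the random forest so the example stays dependency-free, and the site order, class names, and toy data are all assumptions. Each binary model in the chain sees the original features plus the labels of the sites earlier in the chain.

```python
class NearestCentroid:
    """Stand-in binary learner (NOT a random forest): predicts the class
    whose mean feature vector is closest to the sample."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class ClassifierChain:
    """One binary model per site; model j also receives labels of sites 0..j-1."""

    def __init__(self, sites, make_model):
        self.sites = sites
        self.make_model = make_model
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for j in range(len(self.sites)):
            # Training uses the true labels of earlier sites as extra features.
            X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            y_j = [y[j] for y in Y]
            self.models.append(self.make_model().fit(X_aug, y_j))
        return self

    def predict_one(self, x):
        preds = []
        for model in self.models:
            # Prediction feeds earlier predictions down the chain.
            preds.append(model.predict_one(list(x) + preds))
        return dict(zip(self.sites, preds))


# Toy data (made up): two expression features, two metastatic sites.
X = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.8], [0.2, 0.1]]
Y = [[1, 0], [1, 1], [0, 1], [0, 0]]  # columns: bone, lung
chain = ClassifierChain(["bone", "lung"], NearestCentroid).fit(X, Y)
both_sites = chain.predict_one([0.85, 0.85])
no_sites = chain.predict_one([0.1, 0.1])
```

To get closer to the setup the abstract describes, the stand-in learner could be swapped for a random forest; scikit-learn, for example, provides `RandomForestClassifier` and a `ClassifierChain` wrapper in `sklearn.multioutput`.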
