Current Bioinformatics - Volume 20, Issue 9, 2025
A Novel Multitask Association Analysis Model with Deep Self-reconstruction for Diagnosis of Alzheimer’s Disease
Authors: Tian-Ru Wu, Ying-Lian Gao, Qian-Qian Ren, Xinchun Cui, Sha-Sha Yuan and Jin-Xing Liu
Background: With the development of brain imaging and genotyping technologies, brain imaging genetics has become a powerful means of investigating the pathogenesis of Alzheimer's disease (AD). However, AD is generally progressive, multifaceted, and intricate, and different diagnostic groups may carry different biomarkers. In addition, traditional models often ignore the nonlinear relationships and inherent topological characteristics of brain imaging genetic data.
Objective: Developing a more reliable method to identify diagnosis-specific genotypes and phenotypes is therefore indispensable for exploring the pathogenesis of AD. In this paper, a novel deep self-reconstruction multitask association analysis (DS-MTAA) method is proposed for AD-related biomarker extraction and AD classification.
Methods: First, a deep neural network is designed to learn the nonlinear relationships between samples. A self-expression approach based on hypergraph regularization is then used to perform subspace clustering on the output of the network. Finally, a multitask model consisting of sparse canonical correlation analysis and regular logistic regression is constructed, in which each task is responsible for learning a diagnosis-specific genotype-phenotype pattern.
Results: The RobustBoost classifier is employed to perform the classification experiments under 5-fold cross-validation. The experimental results show that DS-MTAA achieves better classification performance than other advanced comparison methods and identifies more effective brain biomarkers and genetic markers that are strongly associated with the disease.
Conclusion: The proposed multitask association analysis model with deep self-reconstruction for the diagnosis of Alzheimer's disease can therefore help to further the understanding of the pathogenesis of AD.
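The Results above mention 5-fold cross-validation. As a rough illustration of that protocol only, the following sketch splits sample indices into five folds. The function name `k_fold_indices`, the shuffling seed, and the unstratified split are assumptions made for this example; the abstract does not say how the folds were formed, and the RobustBoost classifier itself is not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible (assumed choice).
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


# Toy usage: 23 hypothetical samples split into 5 folds.
for train_idx, test_idx in k_fold_indices(n_samples=23, k=5):
    print(len(train_idx), len(test_idx))
```

Each sample lands in exactly one test fold, so every sample is used once for evaluation and k - 1 times for training.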
Predicting Drug-target Binding Affinity Based on Graph Isomorphism Network and iTransformer
Authors: Weirong Cui, Jing Qian, Xiaojun Yao, Guang Hu and Henry H.Y. Tong
Background: Deep learning models have gained significant traction in predicting drug-target binding affinity, primarily focusing on deciphering intricate drug-target relationships. However, these models often overlook intermediate representations, thus failing to capture the holistic characteristics of proteins crucial for discerning drug-target interactions.
Methods: This study proposes a novel deep-learning model that captures comprehensive and long-range dependencies within protein sequences. Leveraging deep feature engineering and an inverted Transformer module, it integrates multi-scale chemical information of drug molecules using graph neural networks and hierarchical attention mechanisms.
Results: The proposed model achieves state-of-the-art performance across multiple drug-target interaction datasets. It obtains MSE losses of 0.229 and 0.162 on the Davis and KIBA datasets, respectively, and AUC scores of 0.982 and 0.985 on the Human and C. elegans datasets.
Conclusion: These results demonstrate the model's superior efficacy in predicting drug-target affinity and interactions, showcasing its potential to expedite drug discovery processes.
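The Results above report two kinds of metric: MSE for the affinity regression and AUC for the interaction classification. The following minimal sketch shows how such metrics are commonly computed. The function names and the toy numbers are illustrative assumptions and are not taken from the paper; the AUC is computed by its pairwise-ranking definition, which may differ in implementation from whatever tooling the authors used.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy values, invented for illustration only.
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Lower MSE is better for the regression datasets, while an AUC closer to 1 indicates that interacting pairs are ranked above non-interacting ones.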
Robust Somatic Copy Number Estimation using Coarse-to-fine Segmentation
Introduction: Cancers routinely exhibit chromosomal instability that results in copy number variants (CNVs), namely changes in the abundance of genomic material. Unfortunately, the detection of these variants in cancer genomes is difficult.
Methods: We present Ploidetect, a software package that effectively identifies CNVs within whole-genome sequenced tumors. Ploidetect utilizes a coarse-to-fine segmentation approach that yields highly contiguous segments while allowing focal CNVs to be detected with high sensitivity.
Results: We benchmark Ploidetect against popular CNV tools using synthetic data, cell line data, and real-world metastatic tumor data and demonstrate strong performance in all tests. We show that high-quality CNVs from Ploidetect enable the identification of recurrent homozygous deletions and genes associated with chromosomal instability in a multi-cancer cohort of 687 patients. Using the highly contiguous CNV calls afforded by Ploidetect, we also demonstrate the use of segment N50 as a novel metric for the measurement of chromosomal instability within tumor biopsies.
Conclusion: We propose that the increasingly accurate determination of CNVs is critical for their productive study in cancer, and our work demonstrates advances made possible by progress in this regard.
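The abstract above names segment N50 as a metric but does not define it. The sketch below assumes the definition used for genome assemblies: the length L such that segments of length at least L together cover at least half of the total segmented length. The function name `segment_n50` and the toy lengths are hypothetical, and Ploidetect's own computation may differ.

```python
def segment_n50(segment_lengths):
    """Length L such that segments of length >= L cover at least half the total."""
    if not segment_lengths or any(length <= 0 for length in segment_lengths):
        raise ValueError("segment lengths must be a non-empty list of positive numbers")
    total = sum(segment_lengths)
    running = 0
    # Walk from the longest segment down until half of the total is covered.
    for length in sorted(segment_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy segment lengths in arbitrary units, invented for illustration.
print(segment_n50([50, 30, 20]))  # a few long segments
print(segment_n50([10] * 10))     # same total length, broken into short segments
```

Under this definition, a genome broken into many short copy number segments yields a lower N50 than one with a few long segments of the same total length.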
A Low Transformed Tubal Rank Tensor Model Using a Spatial-Tubal Constraint for Sample Clustering with Cancer Multi-omics Data
Authors: Sheng-Nan Zhang, Ying-Lian Gao, Yu-Lin Zhang, Junliang Shang, Chun-Hou Zheng and Jin-Xing Liu
Background: Since each dimension of a tensor can store a different type of genomics data, tensor structures can, compared to matrix methods, provide a deeper understanding of multi-dimensional data while also facilitating the discovery of more useful information related to cancer. In practice, however, there are issues such as insufficient utilization of prior knowledge in multi-omics data and limitations in the recovery of low-tubal-rank tensors. The method proposed in this article was developed to address these issues.
Objective: In this paper, we propose a low transformed tubal rank tensor model (LTTRT) using a spatial-tubal constraint to accurately partition different types of cancer samples and provide reliable theoretical support for the identification, diagnosis, and treatment of cancer.
Methods: In the LTTRT method, the transformed tensor nuclear norm, based on the transformed tensor singular value decomposition, is used to characterize the low-rank tensor. It can explore the global low-rank property of the tensor, resolving the problem that tensor nuclear norm-based methods do not achieve the lowest tubal rank. Additionally, the introduction of weighted total variation regularization is conducive to extracting more information from sequencing data in both the spatial and tubal dimensions, exploring cross-correlation features of multiple genomic data, and addressing the problem of overlooking prior knowledge from various perspectives. In addition, the L1-norm is used to improve sparsity. A symmetric Gauss-Seidel-based alternating direction method of multipliers (sGS-ADMM) is used to update the LTTRT model iteratively.
Results: Sample clustering experiments on multiple integrated cancer multi-omics datasets show that the proposed LTTRT method outperforms existing methods. The experimental results validate the effectiveness of LTTRT in accurately partitioning different types of cancer samples.
Conclusion: The LTTRT method achieves precise partitioning of different types of cancer samples.
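The Methods above state that an L1-norm term is used to improve sparsity and that the model is solved with an ADMM variant. In ADMM-style solvers, an L1 term is typically handled through its proximal operator, which is elementwise soft-thresholding. The sketch below shows only that standard operator; the function name and the threshold value are assumptions, and the abstract does not describe how, or whether, LTTRT's sGS-ADMM updates use it in this form.

```python
def soft_threshold(values, tau):
    """Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    shrunk = []
    for v in values:
        magnitude = abs(v) - tau
        if magnitude <= 0:
            shrunk.append(0.0)          # small entries are set exactly to zero
        elif v > 0:
            shrunk.append(magnitude)    # positive entries move down by tau
        else:
            shrunk.append(-magnitude)   # negative entries move up by tau
    return shrunk


# Toy vector, invented for illustration.
print(soft_threshold([3.0, -1.0, 0.5, -2.5], tau=1.0))
```

Entries whose magnitude is at most the threshold become exactly zero, which is how the L1 penalty produces sparse solutions.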
PredPVP: A Stacking Model for Predicting Phage Virion Proteins Based on Feature Selection Methods
Authors: Qian Cao, Xufeng Xiao, Yannan Bin, Jianping Zhao and Chunhou Zheng
Background: Phage therapy has broad application prospects as a novel therapeutic method. Phage virion proteins (PVPs) can recognize the host and bind to surface receptors, which is of great significance for the development of antimicrobial drugs for treating infectious diseases caused by bacteria. In recent years, several machine learning-based PVP predictors have been developed, which usually use a single feature to train the learner. In contrast, higher-dimensional feature representations tend to contain more potential sequence information.
Methods: In this work, we construct a stacking model, PredPVP, for PVP prediction by combining multiple features and using feature selection methods. Specifically, the sequence is first encoded using seven features. For this high-dimensional feature representation, three feature selection methods are used to remove redundant features, and the selected features are then combined with eight machine learning algorithms. Finally, the probability features and class features (PCFs) generated by the 24 base models are fed into logistic regression (LR) to train the model.
Results: The results on the independent test set indicate that PredPVP outperforms existing predictors, with an AUC of 93.4%.
Conclusion: We expect PredPVP to be used as a tool for large-scale PVP recognition, providing a new way for the development of novel antimicrobials and accelerating its application in actual treatment. The datasets and source codes used in this study are available at https://github.com/caoqian23/PredPVP.
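The stacking step described in the Methods can be sketched as follows, assuming that the 24 base models arise from pairing each of the three feature selection methods with each of the eight learning algorithms, and that each base model contributes one probability feature and one class feature. The placeholder model names, the 0.5 class threshold, and the meta-learner weights are all hypothetical; the actual PredPVP code is in the repository linked above and may be organized differently.

```python
import math
from itertools import product

# Placeholder names: the abstract gives only the counts, not the methods.
SELECTORS = ["selector_1", "selector_2", "selector_3"]
LEARNERS = [f"learner_{i}" for i in range(1, 9)]
BASE_MODELS = list(product(SELECTORS, LEARNERS))  # 3 x 8 = 24 combinations


def build_pcf_vector(base_probabilities, threshold=0.5):
    """Concatenate each base model's probability with its thresholded class label."""
    if len(base_probabilities) != len(BASE_MODELS):
        raise ValueError("expected one probability per base model")
    features = []
    for p in base_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        features.append(p)                                # probability feature
        features.append(1.0 if p >= threshold else 0.0)   # class feature (assumed threshold)
    return features


def meta_predict(pcf_vector, weights, bias=0.0):
    """Logistic regression meta-learner: sigmoid of a weighted sum of the PCFs."""
    if len(weights) != len(pcf_vector):
        raise ValueError("weights and features must have equal length")
    z = bias + sum(w * x for w, x in zip(weights, pcf_vector))
    return 1.0 / (1.0 + math.exp(-z))


# Toy base-model outputs for one sequence, invented for illustration.
pcf = build_pcf_vector([0.9] * 12 + [0.2] * 12)
print(len(BASE_MODELS), len(pcf))
print(meta_predict(pcf, weights=[0.0] * len(pcf)))  # untrained weights
```

In a real stacking setup the base-model outputs used to train the meta-learner would normally come from held-out folds, so that the logistic regression does not learn from overfitted base predictions.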
Graph-Root: Prediction of Root-Associated Proteins in Maize, Sorghum, And Soybean Based on Graph Convolutional Network and Network Embedding Method
Authors: Bo Zhou, Siyang Liu, Lei Chen and Qi Dai
Background: The root system plays an irreplaceable role in plant growth, and its improvement can increase crop productivity. However, the root system remains poorly understood, and its underlying mechanisms have not been fully uncovered. Investigating proteins related to the root system is an important means of uncovering them. Previously, the lack of known root-related proteins made it impossible to use machine learning methods to design efficient models for discovering novel root-related proteins. Recently, a public database of root-related proteins was set up, so machine learning methods can now be applied in this field.
Objective: The purpose of this study was to design an efficient computational method to predict root-associated proteins in three plants: maize, sorghum, and soybean.
Methods: In this study, we propose a machine learning-based model, named Graph-Root, for the identification of root-related proteins in maize, sorghum, and soybean. Features were derived from protein sequences, functional domains, and one network. The first type of features was processed by a graph convolutional neural network and multi-head attention, the second type reflected the essential functions of proteins, and the third type abstracted the linkage between proteins. These features were fed into a fully connected layer to make predictions.
Results: The 5-fold cross-validation and independent tests suggested that Graph-Root has acceptable performance. It also outperformed the only previous model, SVM-Root. Furthermore, the importance of each feature type and component in the proposed model was investigated.
Conclusion: Graph-Root performed well and can be a useful tool for identifying novel root-related proteins. BLOSUM62 features were found to be important in determining root-related proteins.
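The Methods above mention a graph convolutional neural network applied to sequence-derived features. As a generic illustration, the sketch below implements one graph-convolution step in the widely used symmetric-normalization form, ReLU(D^-1/2 (A + I) D^-1/2 H W). That this is the formulation used in Graph-Root is an assumption, as are the function name and the toy graph; the abstract does not say how the graph over a protein is built.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
                 for j in range(len(b[0]))] for i in range(len(a))]

    hidden = matmul(matmul(norm, features), weights)
    return [[max(0.0, x) for x in row] for row in hidden]  # ReLU activation


# Toy graph: two connected nodes with one feature each (invented values).
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
print(gcn_layer(adjacency, features, weights=[[1.0]]))
```

Each node's new representation is a degree-normalized average over itself and its neighbours, followed by a learned linear map and a nonlinearity.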
A Graphlet-based Explanation Generator for Graph Neural Networks Over Biological Datasets
Authors: Selinay Cetin and Emre Sefer
Introduction/Objective: The explainability of graph neural networks (GNNs), especially the explanation of edges and interactions among vertices, is challenging, mainly owing to the dynamics and groupings between vertices. Existing graph explainability methods ignore the analysis of the following task's weights over subgraphs and instead analyze only sample-level explainability. Such sample-level explainability decreases their generalizability, since it searches for the explaining behaviour directly in the input dataset. In this study, we propose a novel Orbit-based GNN explainer (OExplainer), which integrates both sample-level and method-level approaches over a predetermined set of subgraphs. Through this analysis of subgraphs, our goal is to interpret graphs more comprehensively and intelligibly while providing each vertex's explainability score for a particular graph instance.
Methods: OExplainer decomposes the following graph neural network weights into explaining subgraph bases while identifying and characterizing particular predictions. Through this characterization, we can carefully and accurately interpret the predetermined graph orbit's role in determining vertex representations, and we can also clarify the method's general behaviour over the whole input dataset. Moreover, we propose novel vertex-specific scores in our subgraph-based approach over nonisomorphic graphlets. These vertex-specific scores encourage sample-level vertex improvement, which is related to the graph neural network's vertex classification task.
Results: Our experiments on simulated datasets confirm the importance of method weights, and of method weight decomposition, in explaining vertex classification. Our detailed experiments on multiple real protein-protein interaction datasets and metabolic interaction networks also show enhanced performance in vertex classification.
Conclusion: On both simulated and biological protein-protein interaction datasets, our approach outperforms the competing explanation approaches.
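The abstract above refers to graph orbits and nonisomorphic graphlets without listing which ones are used. As background, the sketch below counts, for each vertex, the four orbits of the graphlets with two and three nodes under the usual numbering: orbit 0 (endpoint of an edge, i.e. degree), orbit 1 (end of an induced 3-node path), orbit 2 (middle of an induced 3-node path), and orbit 3 (member of a triangle). The function name and the toy graph are hypothetical, and this is not OExplainer itself.

```python
from itertools import combinations


def small_orbit_counts(edges):
    """Map each vertex to its (orbit 0, orbit 1, orbit 2, orbit 3) counts."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {}
    for v, nbrs in adj.items():
        # Triangles through v: pairs of neighbours that are themselves adjacent.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # v is the middle of an induced path for every non-adjacent neighbour pair.
        path_middle = len(nbrs) * (len(nbrs) - 1) // 2 - triangles
        # v is the end of an induced path v-u-w when w is not v and not adjacent to v.
        path_end = sum(len(adj[u] - nbrs - {v}) for u in nbrs)
        counts[v] = (len(nbrs), path_end, path_middle, triangles)
    return counts


# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
for vertex, orbit_vector in sorted(small_orbit_counts([(0, 1), (1, 2), (2, 0), (2, 3)]).items()):
    print(vertex, orbit_vector)
```

Such per-vertex orbit vectors summarize the local wiring around a vertex, which is the kind of predetermined subgraph information an orbit-based explainer can attach scores to.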