Current Bioinformatics - Volume 16, Issue 4, 2021
The Prospect of Bioactive Peptide Research: A Review on Databases and Tools
Authors: FanYi Dong, GuiLing Zhao, Huige Tong, Zhenyuan Zhang, Xingzhen Lao and Heng Zheng
Bioactive peptides (BPs) are peptides with hormonal or pharmacological properties. They play a key role in growth, metabolism, disease, aging and death by affecting the digestive, endocrine, cardiovascular, immune and nervous systems. They show potential therapeutic effects, including blood pressure-lowering (ACE inhibitory), anticancer, antithrombotic, antibacterial, anti-inflammatory, antioxidant, antiobesity, anti-genotoxic and immunomodulatory activities. With the rapid development and wide application of DNA sequencing methods, a wealth of bioactive peptide sequences has accumulated through empirical approaches, bioinformatics approaches, or a combination of both. To store these data and facilitate their use, a series of databases focusing on different aspects of BPs has been established. These databases hold a variety of information, including sequence, source, biological activity, toxicity, physicochemical properties and structure. This review summarizes the latest developments in BP databases and briefly introduces the characteristics of each, to help readers retrieve the information they need more easily. It also covers sequence analysis, structural simulation and activity prediction tools, which may be helpful for the design and discovery of new bioactive peptides.
-
-
-
Scaling Method for Batch Effect Correction of Gene Expression Data Based on Spectral Clustering
Authors: Momo Matsuda, Xiucai Ye and Tetsuya Sakurai
Background: Batch effects are commonly introduced into gene expression data and can dramatically reduce the accuracy of statistical inference in genomic analysis, because samples from different batches are not directly comparable. Objective: To measure biological variability accurately and obtain correct statistical inference, we aimed to correct/remove batch effects so that samples from different batches can be merged into a comparable dataset for high-throughput genomic analysis. Methods: The existing location/scale (L/S) model uses empirical Bayes methods to find additive and multiplicative constants for each gene. Unlike the L/S model, our approach is based on dimensionality reduction. We proposed an effective scaling method that scales each gene by a multiplicative constant, formulated as an optimization problem based on spectral clustering. With this batch effect correction, data samples from different batches can be merged into a comparable dataset. We further proposed an approximate solution to the optimization problem for the scaling adjustment values. Results: We evaluated the proposed method on both artificial and gene expression datasets, comparing it with existing well-established batch effect correction methods. Numerical experiments show that the proposed method projects data samples from different batches so that they resemble each other, and that it outperforms the other methods on both microarray and single-cell RNA-seq datasets. Conclusion: The gene-wise scaling adjustment combined with dimensionality reduction improved accuracy and removed batch effects, making the proposed method more robust to interfering genes.
-
-
-
Potentiality of Risk SNPs Identification Based on GSP Theory
Authors: Hengyi Zhang and Qinli Zhang
Background: A large number of studies have shown that susceptibility to diseases may be related to certain Single Nucleotide Polymorphisms (SNPs). Therefore, locating disease-associated SNPs in genes can help us understand the genetic mechanisms of disease, intervene on risk SNPs and prevent some genetic diseases. Methods: In this paper, a novel method based on Graph Signal Processing (GSP) theory is proposed to locate risk SNPs. The method first builds a graph signal model of all SNP loci, and then locates abnormal SNPs (risk SNPs) through joint analysis of the vertex domain and the frequency domain of the graph. Results: Experimental results on synthetic datasets show that our method outperforms many existing methods in terms of power, including BOOST, SNPHarvester, SNPRule, Random Forest (RF), the Chi-square test and LASSO regression. On two real Genome-Wide Association Studies (GWAS) datasets, Age-related Macular Degeneration (AMD) and Genetic Disease A (GDA), our method not only finds the risk SNPs identified by several state-of-the-art methods, including RF, the Chi-square test and LASSO regression, but also discovers three potential risk SNPs. Conclusion: Our method is suitable and effective for the identification of risk SNPs in GWAS.
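As a rough illustration of the vertex-domain side of graph signal processing (not the authors' method, whose graph construction is not described here), the sketch below treats per-SNP values as a signal x on a weighted graph with symmetric adjacency W and Laplacian L = D - W. The quadratic form x^T L x = 1/2 * sum_ij w_ij (x_i - x_j)^2 measures how smooth the signal is; in the graph-frequency domain it equals sum_k lambda_k * xhat_k^2, so energy at high graph frequencies appears as large local variation. Ranking vertices by |(Lx)_i| is one hypothetical way to flag loci that deviate from their neighbours; the function names and the toy graph are assumptions.

```python
from typing import List

Matrix = List[List[float]]

def laplacian_apply(W: Matrix, x: List[float]) -> List[float]:
    """Return L x with L = D - W, for a symmetric, non-negative adjacency matrix W."""
    n = len(W)
    degree = [sum(W[i]) for i in range(n)]
    return [degree[i] * x[i] - sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

def total_variation(W: Matrix, x: List[float]) -> float:
    """Laplacian quadratic form x^T L x: small for signals that are smooth on the graph."""
    return sum(xi * lxi for xi, lxi in zip(x, laplacian_apply(W, x)))

def rank_vertices_by_local_variation(W: Matrix, x: List[float]) -> List[int]:
    """Hypothetical anomaly ranking: vertex indices sorted by |(L x)_i|, largest first."""
    lx = laplacian_apply(W, x)
    return sorted(range(len(x)), key=lambda i: abs(lx[i]), reverse=True)

if __name__ == "__main__":
    W = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]   # toy 3-vertex graph (assumed)
    x = [1.0, 3.0, 0.0]                      # toy per-locus signal (assumed)
    print(total_variation(W, x))             # 22.0
    print(rank_vertices_by_local_variation(W, x))  # [1, 2, 0]
```

The pairwise form gives the same value here: 1 * (1 - 3)^2 + 2 * (3 - 0)^2 = 22.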
-
-
-
A Constrained Probabilistic Matrix Decomposition Method for Predicting miRNA-disease Associations
Authors: Xinguo Lu, Yan Gao, Zhenghao Zhu, Li Ding, Xinyu Wang, Fang Liu and Jinxin Li
Background: MicroRNA (miRNA) is a type of non-coding RNA molecule about 22 nucleotides long. Growing evidence shows that microRNAs play critical regulatory roles in the development of complex diseases, such as cancers and cardiovascular diseases. Predicting potential microRNA-disease associations can provide a new perspective for improving disease diagnosis and prognosis. However, it is challenging to predict potentially essential microRNAs that have only a few known associations. Objective: In this paper, we propose CPMDA, a novel constrained method for predicting microRNA-disease associations, which can predict potentially essential microRNAs with only a few known associations. Methods: We first construct a disease similarity network and a microRNA similarity network to preprocess microRNAs with no available associations. Then, we apply probabilistic matrix factorization to obtain two feature matrices, one for microRNAs and one for diseases, using a similarity feature matrix as a constraint in the factorization process. Finally, we use the obtained feature matrices to identify potential associations for all diseases. Results and Conclusion: The results indicate that CPMDA is superior to other methods in predicting potential microRNA-disease associations. Moreover, the evaluation shows that CPMDA performs strongly on microRNAs with few known associations. In case studies, CPMDA also demonstrated its effectiveness in inferring unknown microRNA-disease associations for novel diseases and microRNAs.
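The factorization step can be sketched as plain probabilistic matrix factorization: the MAP estimate under zero-mean Gaussian priors, i.e. squared reconstruction error plus L2 penalties, fitted by stochastic gradient descent on a binary miRNA-by-disease matrix in which unknown pairs are coded 0. This is an illustrative assumption, not CPMDA itself: the similarity constraint and preprocessing are not reproduced, and the rank k, learning rate and regularization weight are arbitrary choices.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def pmf(R: Matrix, k: int = 2, lam: float = 0.01, lr: float = 0.05,
        epochs: int = 500, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Factor R (n_mirna x n_disease) into U (n x k) and V (m x k) so that R ~ U V^T."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    # Small Gaussian initialisation, consistent with zero-mean Gaussian priors on U and V.
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    cells = [(i, j) for i in range(n) for j in range(m)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            err = R[i][j] - sum(U[i][f] * V[j][f] for f in range(k))
            for f in range(k):
                u, v = U[i][f], V[j][f]
                # Gradient step on squared error with L2 (Gaussian-prior) penalties.
                U[i][f] += lr * (err * v - lam * u)
                V[j][f] += lr * (err * u - lam * v)
    return U, V

def score(U: Matrix, V: Matrix, i: int, j: int) -> float:
    """Predicted association strength between miRNA i and disease j."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def loss(R: Matrix, U: Matrix, V: Matrix) -> float:
    """Sum of squared reconstruction errors over all cells (penalty terms excluded)."""
    return sum((R[i][j] - score(U, V, i, j)) ** 2
               for i in range(len(R)) for j in range(len(R[0])))
```

After fitting, the pairs coded 0 with the highest score(U, V, i, j) would be the candidate associations to rank.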
-
-
-
HAMP: A Knowledge-base of Antimicrobial Peptides from Human Microbiome
Authors: Viswajit Mulpuru, Rahul Semwal, Pritish K. Varadwaj and Nidhi Mishra
Background: Antimicrobial peptides (AMPs) defend their hosts against various pathogens and are found in almost every life form, from microorganisms to humans. As the rapid increase in drug-resistant strains in recent years presents a serious challenge to healthcare, AMPs could revolutionize antimicrobial development against drug-resistant microbes. Objective: The objective was to encourage study of the human microbiome for the inhibition of drug-resistant bacteria by developing a database of antimicrobial peptides from the human microbiome. Methods: The database is the outcome of an extended analysis of the human metagenome, involving prediction of coding regions, extraction of peptides, prediction of antimicrobial peptides, and modeling of their structures using various in silico tools. Furthermore, an intelligent hash function-based query engine was designed to check whether a candidate peptide is novel with respect to the peptides already in the Knowledge-base. Results and Discussion: The Knowledge-base currently focuses on AMP sequences predicted from the human microbiome, along with their 3D structures modeled using various modeling and molecular dynamics approaches. It includes a total of 1087 unique AMPs from various body sites: 454 from the oral cavity, 180 from the gastrointestinal tract, 42 from the skin, 12 from the airway, 6 from the urogenital tract and 393 from undefined body locations. A scoring matrix has been generated from the similarity scores of the sequences incorporated into the Knowledge-base. A Jmol applet is also included in the website to help users visualize the 3D structures.
Conclusion: The information and functions of the Knowledge-base can offer great help in finding novel antimicrobial drugs, especially towards finding inhibitors for drug-resistant bacteria. The HAMP is freely available at https://bioserver.iiita.ac.in/amp/index.html.
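A hash-based novelty check of the kind described can be sketched with the Python standard library. This is a hypothetical illustration, not the HAMP query engine: `PeptideIndex` and its methods are invented names, sequences are normalized by removing whitespace and upper-casing, exact duplicates are caught through a SHA-256 key held in a dictionary, and near-duplicates are scored by k-mer Jaccard similarity with an assumed k of 3.

```python
import hashlib
from typing import Dict, Iterable, Set

def normalize(seq: str) -> str:
    """Remove whitespace and upper-case a peptide sequence."""
    return "".join(seq.split()).upper()

def seq_key(seq: str) -> str:
    """Stable hash key for a peptide: SHA-256 of the normalized sequence."""
    return hashlib.sha256(normalize(seq).encode("utf-8")).hexdigest()

def kmers(seq: str, k: int) -> Set[str]:
    """Set of overlapping k-mers in the normalized sequence."""
    s = normalize(seq)
    return {s[i:i + k] for i in range(len(s) - k + 1)}

class PeptideIndex:
    """Hypothetical in-memory index of known peptides for novelty queries."""

    def __init__(self, sequences: Iterable[str], k: int = 3) -> None:
        self.k = k
        # Dictionary lookups are hash-based, so exact-match queries are O(1) on average.
        self._by_key: Dict[str, str] = {seq_key(s): normalize(s) for s in sequences}

    def is_novel(self, query: str) -> bool:
        """True if no stored peptide has exactly the same normalized sequence."""
        return seq_key(query) not in self._by_key

    def best_kmer_similarity(self, query: str) -> float:
        """Highest k-mer Jaccard similarity between the query and any stored peptide."""
        q = kmers(query, self.k)
        best = 0.0
        for stored in self._by_key.values():
            s = kmers(stored, self.k)
            union = q | s
            if union:
                best = max(best, len(q & s) / len(union))
        return best
```

For example, with the two made-up entries `GLFDIVKKVV` and `KWKLFKKIEK`, the query `glfdivkkvv` is reported as known, while `GLFDIVKKVA` is novel but shares 7 of 9 distinct 3-mers with the first entry.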
-
-
-
Delineating Characteristic Sequence and Structural Features of Precursor and Mature Piwi-interacting RNAs of Epithelial Ovarian Cancer
Authors: Garima Singh, Arpit C. Swain and Bibekanand Mallick
Background: Piwi-interacting RNAs (piRNAs) are a remarkable class of small noncoding RNAs (sncRNAs) known for their promising roles in germline and somatic cells. Myriad functional studies have been performed to unveil the true potential of this class of ncRNAs; however, the global features encoded in their sequences and structures have not been explored. Objectives: We aim to identify characteristic sequence- and structure-level features of the piRNAs, and of their precursors, that we reported earlier in normal ovary (NO) and in two subtypes of epithelial ovarian cancer (EOCa): endometrioid (ENOCa) and serous ovarian cancer (SOCa). Methods: Using in silico approaches, we performed sequence analysis of mature piRNAs and their upstream/downstream regions, as well as structural analysis of precursor piRNAs (pre-piRNAs) by examining their minimal folding energy (MFE), adjusted minimal folding energy (AMFE), minimal folding energy index (MFEI) and related measures. Results: We observed enrichment of U at the first position and of G at several other positions of mature piRNAs, which might be associated, respectively, with the processing of mature piRNAs (similar to what is seen for miRNAs) and with strong target binding. In addition, we found AU richness in and around the 20 nt regions upstream and downstream of pre-piRNAs. This characteristic feature of pre-piRNAs possibly contributes to their lower MFE compared with random sequences and makes their secondary structure less stable, which determines the biogenesis of piRNAs. We also noticed that the MFE, AMFE and MFEI of pre-piRNAs are comparatively lower than those of pre-miRNAs of metazoans, plants and viruses reported in other studies, which clearly discriminates pre-piRNAs from other RNA sequences, including pre-miRNAs of other organisms.
Conclusion: In summary, the present study reveals key characteristic features encoded within and around mature piRNAs and pre-piRNAs of NO and EOCa samples that distinguish piRNAs from miRNAs and other random RNA sequences. These findings might act as a cornerstone for a better understanding of the biogenesis and function of piRNAs, and will aid in identifying new piRNAs from uncharacterized stretches of sequence using these characteristic features.
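The folding-energy descriptors named in this abstract are commonly defined as AMFE = (MFE / sequence length) x 100 and MFEI = AMFE / (G+C)%, with (G+C)% expressed on a 0 to 100 scale. The sketch below computes them from an MFE value obtained elsewhere (for example from an RNA secondary-structure folding program; folding itself is out of scope). It uses the magnitude of the MFE so that AMFE and MFEI come out positive; this is an assumed convention, since sign conventions differ between studies.

```python
def gc_percent(seq: str) -> float:
    """Percentage (0 to 100) of G and C bases in an RNA or DNA sequence."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return 100.0 * sum(1 for base in s if base in "GC") / len(s)

def amfe(mfe: float, length: int) -> float:
    """Adjusted MFE: folding energy magnitude per 100 nucleotides."""
    if length <= 0:
        raise ValueError("length must be positive")
    return abs(mfe) / length * 100.0

def mfei(mfe: float, seq: str) -> float:
    """Minimal folding energy index: AMFE divided by (G+C)%."""
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    return amfe(mfe, len(seq)) / gc

if __name__ == "__main__":
    seq = "GGGAAACCCU"   # toy 10-nt sequence, 60% GC (assumed example)
    mfe = -3.0           # toy MFE in kcal/mol (assumed, not computed here)
    print(amfe(mfe, len(seq)), mfei(mfe, seq))
```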
-
-
-
Expression Profiles and Bioinformatics Analysis of Full-length circRNA Isoforms in Gliomas
Authors: Jing Wu and Xiaofeng Song
Background: Circular RNAs (circRNAs) are a newly discovered type of non-coding RNA that have been shown to act as microRNA (miRNA) “sponges” that modulate gene expression. Emerging evidence has confirmed that circRNAs take part in many biological processes in a variety of malignant tumors, including gliomas, suggesting that they could serve as biomarkers or therapeutic targets. The purpose of this study was to explore the roles of circRNAs in gliomas and to provide valuable clues for clinical diagnosis and treatment. Methods: RNA-seq data from poly(A)-/RNase R-treated samples were used to investigate the expression profiles of circRNAs in tumor and paracancerous tissues from glioma patients. CircAST was used for full-length circRNA reconstruction and quantification. Bioinformatics analyses, including GO enrichment and KEGG pathway analyses, were performed to reveal the potential biological processes and pathways of their host genes. A circRNA-miRNA interaction network was constructed to depict the interactions between the dysregulated circRNA transcripts and miRNAs. Results: A total of 20,474 circular transcripts originating from 16,022 circRNAs were successfully reconstructed in the samples. We detected 646 upregulated and 112 downregulated circular transcripts in tumor tissues compared with paracancerous tissues. GO analysis revealed that their host genes might be related to positive regulation of GTPase activity, regulation of synaptic transmission, and glutamatergic and dendrite morphogenesis in the cytoplasm and cytosol. KEGG pathway analysis showed that the glutamatergic synapse, neurotrophin signaling pathway and ErbB signaling pathway might be linked to the occurrence and development of gliomas.
Conclusion: Our study revealed a comprehensive profile of differentially expressed circRNA transcripts in gliomas, indicating that aberrantly expressed circRNAs might play important roles in the occurrence and development of human gliomas.
-
-
-
A Cross-entropy-based Method for Essential Protein Identification in Yeast Protein-protein Interaction Network
Authors: Weimiao Sun, Lei Wang, Jiaxin Peng, Zhen Zhang, Tingrui Pei, Yihong Tan, Xueyong Li and Zhiping Chen
Background: Research has shown that essential proteins play important roles in the development and survival of organisms. Because traditional biological experiments are costly, several computational methods based on known protein-protein interactions (PPIs) have recently been proposed to detect essential proteins. Objective: Here, a novel prediction model called IoMCD is proposed to identify essential proteins by combining known PPIs with several kinds of biological information about proteins, including gene expression data and homology information. Methods: Unlike traditional state-of-the-art prediction models, IoMCD uses two kinds of weights, obtained respectively by extracting topological features of proteins from the original PPI networks and by calculating Pearson correlation coefficients (PCCs) between the gene expression profiles of proteins. Based on these two kinds of weights, a cross-entropy method assigns a unique weight to each protein. Subsequently, the homology information of proteins is used to calculate an initial score for each protein. Finally, based on the unique weights and initial scores, an iterative method is designed to measure the essentiality of proteins. Results: Intensive experiments were performed, and simulation results showed that the prediction accuracy of IoMCD in the top 1% of predicted essential proteins was 92.16% and 89.71% on the datasets downloaded from the DIP and Gavin databases, respectively. Conclusion: Both sets of simulation results demonstrate that IoMCD achieves excellent prediction accuracy and could be an effective method for essential protein prediction.
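The expression-based edge weight mentioned in the Methods is the Pearson correlation coefficient between the expression profiles of two interacting proteins. A minimal standard-library sketch follows. The edge-list and dictionary layout, and returning 0.0 for a constant profile, are assumptions; the topological weight and the cross-entropy combination used by IoMCD are not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumed convention: a flat profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)

def pcc_edge_weights(edges: List[Tuple[str, str]],
                     expression: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Weight each PPI edge (u, v) by the PCC of the two proteins' expression profiles."""
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}
```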
-
-
-
PsePSSM-based Prediction for the Protein-ATP Binding Sites
Authors: Li Qian, Yu Jiang, Yan Y. Xuan, Chen Yuan and Tan SiQiao
Background: Predicting protein-ATP binding sites is a highly unbalanced binary classification problem, and higher-precision prediction through machine learning methods is of great significance to research on protein function and to drug design. Objective: Most existing studies select 17 aa as the window length based on experience, extract features with the Position-Specific Scoring Matrix (PSSM), and then build prediction models with SVC. However, the independent test values reported in these studies are either inflated (ACC) or low (MCC), so there is considerable room to improve prediction precision. Methods: This paper uses mutual information (I) to set the window length to 15 aa, extracts features with the Pseudo Position-Specific Scoring Matrix (PsePSSM), which is more fault-tolerant, trains multiple 1:1 SVC classifiers, and finally combines them by simple voting. Results: The prediction results on two protein-ATP binding site datasets, ATP168 and ATP227, are superior to the independent test results obtained with the reference feature extraction approach. With our approach, the MCC values improve from the ranges of 0.3110 ∼ 0.5360 and 0.3060 ∼ 0.553 to 0.7512 and 0.7106, respectively. Conclusion: We further explain why the PsePSSM approach is more fault-tolerant. This approach has promising applications in feature extraction from protein sequences.
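PsePSSM, as commonly defined following Chou and Shen, summarizes an L x 20 PSSM as its 20 column means plus, for each lag g = 1 ... G, 20 correlation factors theta_j(g) = 1/(L - g) * sum over i = 1 ... L - g of (P[i][j] - P[i+g][j])^2, computed after normalizing each row. The sketch below applies this to the PSSM rows of a single window (assumed here to be the 15-residue window described above). The row-wise z-score normalization and the maximum lag are assumptions, since the abstract does not state the paper's settings, and the SVC voting ensemble is not shown.

```python
import math
from typing import List, Sequence

def normalize_rows(pssm: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row-wise z-score of each PSSM row (assumed normalization; constant rows become 0)."""
    out = []
    for row in pssm:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return out

def psepssm(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """PsePSSM vector: column means, then one correlation factor per column and lag."""
    L = len(pssm)
    if not 1 <= max_lag < L:
        raise ValueError("max_lag must satisfy 1 <= max_lag < number of rows")
    P = normalize_rows(pssm)
    ncol = len(P[0])
    # Part 1: average normalized score of each amino-acid column over the window.
    features = [sum(P[i][j] for i in range(L)) / L for j in range(ncol)]
    # Part 2: sequence-order information as mean squared differences between rows g apart.
    for lag in range(1, max_lag + 1):
        for j in range(ncol):
            features.append(
                sum((P[i][j] - P[i + lag][j]) ** 2 for i in range(L - lag)) / (L - lag))
    return features
```

With a 20-column PSSM the vector has 20 * (1 + max_lag) entries, which stays fixed regardless of window length.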
-
-
-
Colorectal Cancer Classification and Survival Analysis Based on an Integrated RNA and DNA Molecular Signature
Authors: Mohanad Mohammed, Henry Mwambi and Bernard Omolo
Background: Colorectal cancer (CRC) is the third most common cancer among women and men in the USA, and recent studies have shown an increasing incidence in less developed regions, including Sub-Saharan Africa (SSA). We developed a hybrid (DNA mutation and RNA expression) signature and assessed its predictive properties for the mutation status and survival of CRC patients. Methods: Publicly available microarray and RNA-Seq data from 54 matched formalin-fixed paraffin-embedded (FFPE) samples, profiled on the Affymetrix GeneChip and RNA-Seq platforms, were used to obtain differentially expressed genes between mutant and wild-type samples. We applied support vector machines (SVM), artificial neural networks, random forests, k-nearest neighbor, naïve Bayes, negative binomial linear discriminant analysis (NBLDA) and Poisson linear discriminant analysis algorithms for classification. A Cox proportional hazards model was used for survival analysis. Results: Compared with the gene list from each individual platform, the hybrid gene list had the highest accuracy, sensitivity, specificity and AUC for mutation status across all classifiers, and was prognostic for survival in patients with CRC. NBLDA was the best performer on the RNA-Seq data, while SVM was the most suitable classifier for CRC across the two data types. Nine genes were found to be predictive of survival. Conclusion: This signature could be useful in clinical practice, especially for colorectal cancer diagnosis and therapy. Future studies should determine the effectiveness of integration in cancer survival analysis and its application to unbalanced data, where classes are of different sizes, and to data with multiple classes.
-
-
-
An Automated Model for Target Protein Prediction in PPI
Authors: G. N. Sundar and D. Narmadha
Background: Essential proteins play a crucial role in most living organisms. Computationally predicting essential proteins is important for target protein identification, disease treatment and suitable drug development. Objective: Traditionally, researchers have proposed many experimental methods and centrality measures to predict protein essentiality. Methods: The prediction accuracy, sensitivity and specificity achieved by traditional methods are very low. Results and Discussion: In this research work, a novel computational approach, the NCKNN model, is proposed to identify essential proteins. The proposed work combines a network topology measure with a machine learning model to predict essential proteins. Conclusion: The proposed work shows a remarkable improvement over seven traditional centrality-based measures, namely DC, BC, CC, EC, NC, ECC and SC, in terms of accuracy (A1), precision (P1), recall (R1), sensitivity (SE) and specificity (SP).
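Two of the baseline measures listed, degree centrality (DC) and closeness centrality (CC), can be computed on an unweighted PPI graph with breadth-first search, as sketched below. The adjacency-dictionary representation (every protein must appear as a key, with symmetric neighbour sets) and the normalization of closeness as (number of reachable proteins) / (sum of shortest-path distances) are assumed conventions; NCKNN itself is not reproduced.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]

def degree_centrality(g: Graph) -> Dict[Hashable, int]:
    """DC: number of direct interaction partners of each protein."""
    return {v: len(nbrs) for v, nbrs in g.items()}

def bfs_distances(g: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Shortest-path hop counts from source to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness_centrality(g: Graph) -> Dict[Hashable, float]:
    """CC: reachable proteins divided by the sum of distances to them (0.0 if isolated)."""
    result = {}
    for v in g:
        dist = bfs_distances(g, v)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[v] = reachable / total if total > 0 else 0.0
    return result
```

For a connected network this reduces to the usual (n - 1) / (sum of distances); on a three-protein chain A-B-C, the middle protein B gets DC 2 and CC 1.0.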
-
-
-
Artificial Neural Network Models for Coronary Artery Disease
Background: Coronary artery disease (CAD) is an important cause of mortality and morbidity globally. Objective: Early prediction of CAD would be valuable for identifying individuals at risk and for focusing resources on its prevention. In this paper, we aimed to establish a diagnostic model to predict CAD using three ANN approaches (pattern recognition-ANN, LVQ-ANN and competitive ANN). Methods: Machine learning is one promising method for early prediction of disease based on risk factors. Among machine learning algorithms, artificial neural networks (ANNs) have been applied widely in medicine and in a variety of real-world classification tasks. An ANN is a non-linear computational model, inspired by the human brain, for analyzing and processing complex datasets. Results: The ANN methods investigated in this paper indicate that, in both the pattern recognition-ANN and LVQ-ANN methods, predictions for the Angiography+ class have high accuracy. Moreover, in the competitive ANN (CNN), individuals in cluster “c” are very strongly correlated with the Angiography+ class. This accuracy indicates a significant difference in some of the input features between the Angiography+ class and the other two output classes. A comparison of the weights chosen by these three methods for separating the control class from Angiography+ shows that hs-CRP, FSG and WBC are the most substantial excitatory weights for recognizing Angiography+ individuals, whereas HDL-C and MCH are identified as inhibitory weights. Furthermore, the effects of decomposing the multi-class problem into a set of binary problems, and of random sampling, on the accuracy of the diagnostic model are investigated. Conclusion: This study confirms that pattern recognition-ANN had the highest accuracy among the ANN methods studied. This is due to the back-propagation procedure, in which the network classifies input variables based on labeled classes.
The results of binarization show that decomposing the multi-class set into binary sets could achieve higher accuracy.
-
-
-
Continual Cough: Experience and Lessons from a Case of Bronchial Adenoid Cystic Carcinoma
Authors: Lin Sheng, Junwei Tu, Yijun Sheng, Jingqian Zhu, Huijun Chen, Jianghua Tian, Lixia Wang and Chuli Pan
Background: Primary tracheal adenoid cystic carcinoma is a rare, slow-growing pulmonary malignancy. Because of its low incidence, clinicians are often unable to diagnose and treat this disease, which makes it prone to misdiagnosis or missed diagnosis and consequently to delayed treatment. Case Presentation: Here, we report the case of a 72-year-old woman who was diagnosed with primary bronchial adenoid cystic carcinoma only after three years. At the time of final diagnosis, the lesion involved the entire bronchus, and radical treatment was not possible. Conclusions: Endoscopic bronchoscopy and palliative radiotherapy can relieve the patient's symptoms and allow the patient to survive with the tumor for a long time.
