Current Bioinformatics - Volume 15, Issue 5, 2020

- A Review of Pathway Databases and Related Methods Analysis
  
  Authors: Ali Ghulam, Xiujuan Lei, Min Guo and Chen Bian
  
  https://doi.org/10.2174/1574893614666191018162505
  
  Pathway analysis integrates most of the computational tools used to investigate high-level, complex human diseases. In bioinformatics research, biological pathway analysis is an important part of systems biology. The molecular complexity of the biological pathways involved in human diseases is difficult to understand, and pathway analysis offers a way to explore it. In this review, we describe essential information on pathway databases and their mechanisms, algorithms and methods. For pathway database analysis, we briefly introduce how to gain knowledge from fundamental pathway data on specific human pathways and how to use pathway databases and pathway analysis to predict diseases during an experiment. We also provide detailed information on the computational tools used in complex pathway data analysis, their roles in the bioinformatics field, and how pathway data are stored. We illustrate various methodological difficulties faced during pathway analysis and present the main ideas and techniques of pathway-based analysis approaches. We provide a list of pathway databases and analytical tools. This review will serve as a helpful manual for pathway analysis databases.
  
- Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC
  
  Authors: Saba Amanat, Adeel Ashraf, Waqar Hussain, Nouman Rasool and Yaser D. Khan
  
  https://doi.org/10.2174/1574893614666190723114923
  
  Background: Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Of these three, the covalent attachment of a carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. Studying its biological functions requires correctly identifying the lysine sites susceptible to carboxylation. Objective: Herein, we present a machine learning-based computational model for predicting carboxylysine sites. Methods: Various position- and composition-relative features were incorporated into PseAAC to construct feature vectors, and a neural network was employed as the classifier. The model was validated by jackknife, cross-validation, self-consistency, and independent testing. Results: The self-consistency test showed that the model achieved 99.76% Acc, 99.76% Sn, 99.76% Sp, and 0.99 MCC. Jackknife validation gave 97.07% Acc, and 10-fold cross-validation gave 95.16% Acc. Conclusion: Independent dataset testing gave 94.3%, which shows that the proposed model performs better than the existing model PreLysCar. However, accuracy may be improved further in the future as the number of known carboxylysine sites in proteins increases.
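
The Acc, Sn, Sp and MCC figures quoted above are standard confusion-matrix metrics. As a minimal sketch (not the authors' code), they can be computed from true/false positive and negative counts as follows. The function name and the example counts are hypothetical.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: share of true sites found
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: share of non-sites rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 when MCC is undefined
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts for illustration only (not the paper's data).
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
# Acc = 0.875, Sn = 0.9, Sp = 0.85, MCC is approximately 0.751
```

MCC is reported alongside accuracy because it stays informative when positive and negative sites are imbalanced.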
  
- APP Medical Diagnostic Check-up Consultation System Based on Speech Recognition
  
  Authors: Zhi Li, Yusen Wang, Shiwen Tai, Jingquan Wang, Yusong Huang, Wu Jiang and Hao Zhang
  
  https://doi.org/10.2174/1574893614666191105161335
  
  Background: Medical test reports show patients' physiological functions as measured by medical means. Medical staff determine a patient's condition from these reports and then complete the treatment. However, for most patients and their families, the test lists contain so much terminology that they are difficult to understand and query, which can affect patients' understanding and the treatment outcome. It is therefore especially necessary to develop a consultation system that can provide relevant analysis once medical test data are obtained. Objective: Starting from information acquisition and speech recognition, this paper proposes a deep learning-based model for acquiring and analyzing information in natural scenes. The model focuses on improving the recognition rate of routine test lists and on targeted smart search, so that users receive more accurate, personalized health advice. Methods: Based on medical characteristics and patients' needs, this paper constructs an app-based routine medical test consultation system. The system uses artificial intelligence and speech recognition to collect user input and analyzes user needs with the help of a routine medical information knowledge database. Results: The model combines speech recognition and data mining to obtain routine test list data and is suitable for accurate analysis of problems in routine check-up procedures. The app provides effective explanations and guidance for patients' treatment and rehabilitation. Conclusion: The system links the Internet with personalized medicine, which can help spread medical knowledge and provide a reference for delivering medical services over the Internet. The app can also contribute to improving medical standards and provide new models for modern medical management.
  
- ZFARED: A Database of the Antioxidant Response Elements in Zebrafish
  
  Authors: Azhwar Raghunath, Raju Nagarajan and Ekambaram Perumal
  
  https://doi.org/10.2174/1574893614666191018172213
  
  Background: Antioxidant Response Elements (AREs) play a key role in the expression of Nrf2 target genes by regulating the Keap1-Nrf2-ARE pathway, which offers protection against toxic agents and oxidative stress-induced diseases. Objective: To develop a database of putative AREs for all genes in the zebrafish genome that will help researchers investigate Nrf2 regulatory mechanisms in detail. Methods: To help researchers functionally characterize zebrafish AREs, we developed the Zebrafish Antioxidant Response Element Database (ZFARED), covering all protein-coding genes in the zebrafish genome, including antioxidant and mitochondrial genes. The front end was developed using HTML, JavaScript, and CSS and tested in different browsers. The back end was developed using Perl scripts and the Perl-CGI and Perl-DBI modules. Results: ZFARED is the first database of AREs in zebrafish and supports fast and efficient searching of AREs, which were identified using in-house Perl algorithms. Researchers can retrieve AREs by chromosome number (1 to 25, and M for mitochondria), strand (positive or negative), ARE pattern and keywords. Users can also specify the size of the upstream/promoter region (5 to 30 kb) from the transcription start site to retrieve the AREs located in that region. Conclusion: ZFARED will be useful for investigating the Keap1-Nrf2-ARE pathway and its gene regulation. ZFARED is freely available at http://zfared.buc.edu.in/.
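
As a rough illustration of motif-based ARE identification, the sketch below scans a sequence on both strands for the core consensus 5'-TGACnnnGC-3'. This consensus, the regular expressions and the function name are assumptions made for illustration. The abstract does not state which ARE patterns ZFARED's in-house Perl algorithms use, and the sketch is written in Python rather than Perl.

```python
import re
from typing import List, Tuple

# Assumed core ARE consensus 5'-TGACnnnGC-3' on the plus strand (illustrative;
# ZFARED's actual search patterns are not specified in the abstract).
ARE_PLUS = "TGAC[ACGT]{3}GC"
# Reverse complement of the same core, i.e. an ARE lying on the minus strand.
ARE_MINUS = "GC[ACGT]{3}GTCA"


def find_are_sites(seq: str) -> List[Tuple[int, str, str]]:
    """Return (0-based plus-strand start, strand, matched plus-strand text) per hit."""
    seq = seq.upper()
    hits = []
    for pattern, strand in ((ARE_PLUS, "+"), (ARE_MINUS, "-")):
        # A zero-width lookahead lets overlapping occurrences all be reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


print(find_are_sites("AAATGACTCAGCAAA"))  # [(3, '+', 'TGACTCAGC')]
```

To mimic a promoter-window search, one would first slice the chosen upstream region (for example 5 kb) from the transcription start site, taking the gene's strand into account, and pass only that slice to the scanner.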
  
- Hardware Performance Evaluation of De novo Transcriptome Assembly Software in Amazon Elastic Compute Cloud
  
  Authors: Fernando Mora-Márquez, José L. Vázquez-Poletti, Víctor Chano, Carmen Collada, Álvaro Soto and Unai López de Heredia
  
  https://doi.org/10.2174/1574893615666191219095817
  
  Background: Bioinformatics software for RNA-seq analysis has high computational requirements in terms of CPU count, RAM size, and processor characteristics. In particular, de novo transcriptome assembly demands a large computational infrastructure because of the massive data size and the complexity of the algorithms employed. Comparative studies on the quality of transcriptomes produced by de novo assemblers have been published previously. However, they lack a hardware efficiency-oriented approach to help select an assembly hardware platform cost-efficiently. Objective: We tested the cost-efficiency and output quality of two popular de novo transcriptome assemblers, Trinity and SOAPdenovo-Trans (SDNT), to assess their limitations. We also provide troubleshooting tips and guidelines for running transcriptome assemblies efficiently. Methods: We built virtual machines with different hardware characteristics (CPU number, RAM size) in the Amazon Elastic Compute Cloud of Amazon Web Services. Using simulated and real data sets, we measured the elapsed time, cost, CPU percentage and output size of small and large data set assemblies. Results: For small data sets, SDNT outperformed Trinity by an order of magnitude, significantly reducing assembly time and cost. For large data sets, Trinity performed better than SDNT. Both assemblers produced good-quality transcriptomes. Conclusion: The choice of the optimal transcriptome assembler and of computational resources depends on the combined effect of the size and complexity of the RNA-seq experiment.
  
- Genetic Algorithm-based Feature Selection Approach for Enhancing the Effectiveness of Similarity Searching in Ligand-based Virtual Screening
  
  Authors: Fouaz Berrhail and Hacene Belhadef
  
  https://doi.org/10.2174/1574893614666191119123935
  
  Background: In recent years, similarity searching has become a widely used method for Ligand-Based Virtual Screening (LBVS). This screening technique compares the features of the target compound with those of each compound in a compound database. It is well known that no single similarity measure performs best for every active compound structure across all types of activity classes. Several techniques and strategies have been proposed in the literature to improve the overall effectiveness of LBVS approaches. Objective: In this work, our main objective is to propose a feature selection approach based on a genetic algorithm (FSGASS) to improve similarity searching in LBVS. Methods: Our approach identifies the most important and relevant characteristics of chemical compounds and minimizes their number in the compound representations. This reduces the feature space, eliminates redundancy, shortens training time, and increases the performance of the screening process. Results: The results show better performance than those obtained with the Tanimoto coefficient, which is considered the most widely used coefficient for quantifying similarity in LBVS. Conclusion: Our results show that significant improvements can be obtained with molecular similarity search methods based on feature selection.
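
Similarity searching with the Tanimoto coefficient can be sketched as follows, assuming binary fingerprints represented as sets of "on" bit indices. The optional `selected` argument is a hypothetical stand-in for a feature subset, such as one a genetic algorithm might return. The genetic search itself and FSGASS's actual fitness function are not shown.

```python
from typing import Dict, List, Optional, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_search(
    query: Set[int],
    database: Dict[str, Set[int]],
    selected: Optional[Set[int]] = None,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank database compounds by Tanimoto similarity to the query.

    If `selected` is given, only those bit positions are compared, which is how a
    feature-selection step would restrict the compound representation.
    """
    if selected is not None:
        query = query & selected
    scored = []
    for name, fp in database.items():
        fp_used = fp & selected if selected is not None else fp
        scored.append((name, tanimoto(query, fp_used)))
    # Highest similarity first; ties broken by name for reproducible output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


database = {"cpd_A": {1, 2, 3, 4}, "cpd_B": {3, 4, 5}, "cpd_C": {7, 8}}
print(similarity_search({1, 2, 3, 4}, database))
# [('cpd_A', 1.0), ('cpd_B', 0.4), ('cpd_C', 0.0)]
```

Restricting both fingerprints to the same subset before scoring means that a feature selector can change the ranking without changing the similarity measure itself.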
  
- A Sequence-segment Neighbor Encoding Schema for Protein Hotspot Residue Prediction
  
  Authors: Peng Chen, Tong Shen, Youzhi Zhang and Bing Wang
  
  https://doi.org/10.2174/1574893615666200106115421
  
  Background: Hotspots are the residues that contribute most of the binding free energy in protein-protein interactions, and protein functions frequently depend on them. At present, hotspot residues are usually identified by alanine scanning mutagenesis, which is costly, time-consuming and laborious. Objective: More accurate and efficient methods are therefore needed to identify protein hotspot residues. Methods: This paper proposes a novel sequence-segment neighbor encoding schema and constructs a random forest-based model to identify hotspots in protein interaction interfaces. First, 10 amino acid physicochemical properties, 16 features related to PI and DI, and 25 features related to ASA were extracted. Unlike previous residue encoding schemas, such as the autocorrelation descriptor or triplet combination information, this paper exploits the influence of a hotspot's neighboring amino acids and of amino acids located at a certain sequence distance from the hotspot. Results: The proposed model was compared with other hotspot prediction methods, including APIS, Robetta, FOLDEF, KFC and MINERVA. Conclusion: The experimental results showed that the proposed model improves the prediction of protein hotspot residues on the same test set.
  
- Whole Aegilops tauschii Transcriptome Investigation Revealed Nine Novel miRNAs Involved in Stress Response
  
  Authors: Behnam Bakhshi and Ehsan M. Fard
  
  https://doi.org/10.2174/1574893614666191017151708
  
  Background: Aegilops tauschii is a wild relative of bread wheat and has been reported as the donor of the bread wheat D genome. Several reports have also highlighted the importance of Ae. tauschii in biotic and abiotic stress tolerance. On the other hand, miRNAs have been reported as essential regulatory elements in stress response. Objective: It is therefore important to discover novel miRNAs involved in stress tolerance in this species. The aim of the current study was to predict novel miRNAs in Ae. tauschii and to uncover their potential role in stress response. Methods: ESTs, TSAs, and the miRBase database were obtained and used to predict new miRNAs. Results: We discovered nine novel stem-loop miRNAs. These predicted miRNAs could be introduced as new members of previously identified miRNA families in Ae. tauschii, including miR156, miR168, miR169, and miR319. The results also indicated that miR397 and miR530 are novel families in this species. Furthermore, several novel stem-loop miRNAs predicted for T. aestivum showed remarkable similarity to the novel Ae. tauschii stem-loops. Conclusion: Our results demonstrated that the predicted novel miRNAs could play a significant role in stress response.
  
- White Blood Cell Image Segmentation Based on Color Component Combination and Contour Fitting
  
  Authors: Chuansheng Wang, Hong Zhang, Zuoyong Li, Xiaogen Zhou, Yong Cheng and Rongyan Chen
  
  https://doi.org/10.2174/1574893614666191017102310
  
  Background: White Blood Cell (WBC) image segmentation plays a key role in cell morphology analysis. However, WBC segmentation remains challenging because WBCs vary widely in appearance under different staining conditions. Objective: In this paper, we propose a novel WBC segmentation method based on color component combination and contour fitting to segment WBC images accurately. Methods: The proposed method first uses color component combination and image thresholding to segment the nucleus. It then uses a color prior to remove the image background, extracts the initial WBC contour via Canny edge detection, and finally detects and closes any unclosed WBC contour by contour fitting. Cytoplasm segmentation is then achieved by subtracting the nucleus region from the WBC region. Results: Experiments on 100 WBC images under rapid staining conditions and 50 WBC images under standard staining conditions showed that the proposed method improved the segmentation accuracy of white blood cells under both staining conditions. Conclusion: The proposed method based on color component combination and contour fitting is effective for WBC segmentation.
  
- The Human OncoBiome Database: A Database of Cancer Microbiome Datasets
  
  Authors: Nadia and Jayashree Ramana
  
  https://doi.org/10.2174/1574893614666190902152727
  
  Background: The microbiome plays a very important role in many physiological processes, including metabolism, inflammation, homeostasis and many biological pathways. Dysbiosis of the microbiome therefore disrupts these pathways in different ways that may contribute to cancer. The connection between the microbiome and cancer is complex. The human body is continuously exposed to microbial cells, both resident and transient, as well as to their byproducts, including toxic metabolites. Objective: To develop a manually curated, searchable metagenomic resource that facilitates the investigation of the human cancer microbiota and to make it publicly accessible through a web interface to support further metagenomic studies. Methods: In the Human OncoBiome Database (HOBD), information on different cancers (oral, breast, liver, and colorectal cancer) has been compiled. The main purpose of creating HOBD was to provide the scientific community with comprehensive information on the species that play a crucial role in various human cancers. Result: Over time, this resource will grow into a unique community resource on human cancer bacteria, providing an extra level of annotation for the analysis of metagenomic datasets. Conclusion: The HOBD site offers easy-to-use tools for viewing all publicly available human cancer microbiota data. The freely accessible website is available at http://www.juit.ac.in/hcmd/home.
  
- Screening and Analysis of Hypolipidemic Components from Shuangdan Capsule Based on Pancreatic Lipase
  
  Authors: Y.J. Qi, H.N. Lu, Y.M. Zhao, Z. Wang, Y.J. Ji, N.Z. Jin and Z.R. Ma
  
  https://doi.org/10.2174/1574893615666200106113910
  
  Background: Several natural pancreatic lipase inhibitors with fewer side effects have been proposed. Shuangdan Capsule (SDC), a traditional Chinese medicine mainly composed of Radix Salviae and Peony skin, has been used to treat elevated blood lipids. Objective: This work aims to investigate the molecular mechanism by which SDC constituents act against metabolic disorders, as well as the molecular flexibility and intermolecular interaction characteristics of these components in the active sites. Methods: Small molecules were obtained from the Traditional Chinese Medicine (TCM) Database, and the TCMSP server, a systems pharmacology database for traditional Chinese medicine, was used to calculate ADME-related properties. AutoDock Vina was used to perform virtual screening of the selected molecules and to return energy values for several ligand conformations. Network parameters were calculated using the NetworkAnalyzer plug-in in Cytoscape. Results: The six most active molecules are all enclosed by the amino acids ASP79, TYR114, GLU175, PRO180, PHE215, GLY216 and LEU264. Hydrophobic interactions, hydrogen bonds and repulsive forces play extremely important roles in their binding. Notably, most of the local minima of the molecular electrostatic potential on the van der Waals (vdW) surface increase while the negative maxima decrease simultaneously, implying that the electrostatic potential tends to stabilize. Topological analysis of the Protein-Protein Interaction (PPI) network showed that PNLIP-related genes, such as LPL, AGK, MGLL, LIPE, LIPF and PNPLA2, are also pivotal targets for hyperlipidemia. Further GO analysis indicated that lipophilic terpenoid compounds may reduce blood lipids by participating in the lipid catabolic process (biological process), the extracellular space and extracellular region part (cellular components), and triacylglycerol lipase activity (molecular function). Conclusion: This study provides useful information for the development and application of natural hypolipidemic medicines. Further pharmacological studies are still needed both in vivo and in vitro.
  
- Predicting Thermophilic Proteins by Machine Learning
  
  Authors: Xian-Fang Wang, Peng Gao, Yi-Feng Liu, Hong-Fei Li and Fan Lu
  
  https://doi.org/10.2174/1574893615666200207094357
  
  Background: Thermophilic proteins maintain good activity at high temperatures, so studying them is important for understanding protein thermal stability. Objective: To address the low precision and low efficiency of thermophilic protein prediction, this paper proposes a prediction method based on feature fusion and machine learning. Methods: For the selected thermophilic data sets, protein sequences were first characterized by fusing g-gap dipeptide, entropy density and autocorrelation coefficient features. Kernel Principal Component Analysis (KPCA) was then used to reduce the dimension of these sequence features, shortening training time and improving efficiency. Finally, classification models were built with several classification algorithms. Results: A variety of classification algorithms were trained and tested on the selected thermophilic dataset. Under the jackknife test, the Support Vector Machine (SVM) achieved an accuracy of over 92%, and the other evaluation indicators also showed that the SVM performed best. Conclusion: Because it combines an effective feature representation method with a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.
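
One of the fused feature types, g-gap dipeptide composition, can be sketched as below. The sketch assumes that g counts the residues skipped between the two members of a pair, so g = 0 gives ordinary adjacent dipeptides. It also normalizes counts to frequencies over the 20 standard amino acids. Both are assumptions, since the abstract does not give the paper's exact convention or normalization.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def g_gap_dipeptide(seq: str, g: int) -> List[float]:
    """Frequency vector of residue pairs (seq[i], seq[i + g + 1]).

    g is the number of residues skipped between the two members of a pair,
    so g = 0 reduces to ordinary adjacent-dipeptide composition.
    """
    if g < 0:
        raise ValueError("g must be non-negative")
    seq = seq.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in INDEX:  # pairs with non-standard residues (e.g. 'X') are skipped
            counts[INDEX[pair]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Concatenating vectors for several gap values is one simple way to build a
# longer feature vector (illustrative only, not the paper's fusion scheme).
features = g_gap_dipeptide("MKTAYIAKQR", 0) + g_gap_dipeptide("MKTAYIAKQR", 1)
print(len(features))  # 800
```

Vectors like these are high-dimensional and sparse for short sequences, which is one motivation for a dimensionality-reduction step such as the KPCA mentioned above.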
  
- Bioinformatics Analysis of The Rhizosphere Microbiota of Dangshan Su Pear in Different Soil Types
  
  Authors: Xiaojing Ma, Sambhaji B. Thakar, Huimin Zhang, Zequan Yu, Li Meng and Junyang Yue
  
  https://doi.org/10.2174/1574893615666200129104523
  
  Background: Rhizosphere microbiota are vitally important for plant growth and health in terrestrial ecosystems. Extensive studies have aimed to identify microbial communities and their relationships with host plants in different soil types. Objective: In the present study, we employed high-throughput sequencing to investigate the composition and structure of the rhizosphere microbiota thriving at the roots of Dangshan Su pear growing in sandy and clay soils. Methods: A high-throughput amplicon sequencing survey of the bacterial 16S rRNA genes and fungal ITS regions of the rhizosphere microbiota was performed first. Using a series of bioinformatics and statistics tools, several common bacterial and fungal communities were then found to be essential to Dangshan Su pear. Finally, the soil-preferred microbiota were identified through variance analysis and further characterized at the genus level. Result: Dangshan Su pears host rich and diverse microbial communities in the thin layer of soil adhering to their roots. The composition of dominant microbial phyla is similar across soil types, but the abundance of each microbial community varies significantly. Specifically, the relative abundance of Firmicutes increases from 9.69% to 61.66% as the soil changes from clay to sandy. This shift may not only aid the degradation of complex plant materials but also help eliminate pathogens. Conclusion: Our results are significant for understanding the potential effects of rhizosphere microbiota on soil bioavailability and plant health. By selecting soil types and altering microbial structures, the fruit quality of Dangshan Su pear is expected to improve.
  
Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan