Current Proteomics - Volume 15, Issue 5, 2018
-
-
A Parallel and Distributed Computing System for Protein-Protein Interaction Literature Mining
Authors: Hsi-Chieh Lee and Szu-Wei Huang
A parallel and distributed computing mining system is proposed for finding protein-protein interaction literature in databases on the Internet. The proposed system uses text mining to identify words that discriminate protein-protein interaction literature. A threshold called matching-degree is also evaluated to check whether a given article might be related to protein-protein interactions. Furthermore, a keypage-based search mechanism is adopted to find papers related to protein-protein interactions from a given document. The system is designed with a web-based graphical user interface and a parallel and distributed job-dispatching kernel. The experimental results indicate that the proposed system helps researchers find protein-protein interaction literature within an overwhelming amount of information. Moreover, the parallel and distributed architecture makes the system scalable, and its speedup and efficiency are promising. With two servers, the speedup is 1.95, and with three servers the speedup is 3.97, which give efficiencies of 0.975 and 0.9925, respectively.
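The efficiencies quoted above follow from the standard definition of parallel efficiency, E = S / p, where S is the speedup and p is the number of servers. The sketch below is a minimal illustration of that arithmetic; the function name is hypothetical and not taken from the paper. Note that 0.9925 equals 3.97 / 4, so the second reported figure is consistent with four servers rather than three.

```python
def efficiency(speedup: float, servers: int) -> float:
    """Parallel efficiency E = S / p for speedup S measured on p servers."""
    if servers <= 0:
        raise ValueError("servers must be a positive integer")
    return speedup / servers


# Speedups quoted in the abstract.
print(round(efficiency(1.95, 2), 4))  # 0.975
print(round(efficiency(3.97, 4), 4))  # 0.9925
print(round(efficiency(3.97, 3), 4))  # 1.3233 (would exceed 1, i.e. superlinear)
```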
-
-
-
Feature Selection Using Information Distance Measure for Gene Expression Data
Authors: Jie Cai, Cheng Liang and Jiawei Luo
Background: The accurate classification of microarray data has been a great challenge in machine learning due to its high dimensionality and small number of samples. Feature selection is an effective way to deal with such data. Objective: A feature subset that maximizes feature-feature diversity as well as feature-class relevance is selected to improve predictive efficiency and reduce the cost of feature acquisition. Moreover, the selection of features with high entropy but low classification performance is restricted. Method: We first present a feature selection criterion based on an information distance measure by introducing the self-redundancy factor into the maximum relevance and maximum redundancy criterion, where the self-redundancy factor serves as a penalty for features with high entropy. An incremental-search-based feature selection method using this criterion, called MFFID, is then proposed to maximize the information distance between features. Results: Compared with four representative feature selection methods on twelve high-dimensional microarray datasets, the proposed method MFFID achieves better classification accuracy than the other methods. Conclusion: In this study, a novel feature selection method, MFFID, is proposed, which is expressed in the form of an information distance measure by introducing the self-redundancy factor into CMRMR. The experimental results clearly demonstrate that MFFID is an effective and stable feature selection method for the classification of tumor datasets.
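The abstract does not state the exact MFFID criterion, so the following is only a sketch of one widely used information distance between two discretized features, the variation of information d(X, Y) = H(X|Y) + H(Y|X) = 2H(X, Y) - H(X) - H(Y). The function names and the toy feature vectors are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_distance(x, y):
    """Variation of information: 2*H(X,Y) - H(X) - H(Y)."""
    joint = entropy(list(zip(x, y)))
    return 2 * joint - entropy(x) - entropy(y)


# Toy discretized features (hypothetical data).
f1 = [0, 0, 1, 1]
f2 = [0, 0, 1, 1]  # identical to f1, so fully redundant
f3 = [0, 1, 0, 1]  # statistically independent of f1

print(information_distance(f1, f2))  # 0.0
print(information_distance(f1, f3))  # 2.0
```

A larger distance means the two features share less information, which is the sense in which a selection method can favor feature-feature diversity.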
-
-
-
Identification of DNA-Binding Proteins via a Voting Strategy
Background: DNA-binding proteins are vital cellular components, and their identification is important for the understanding of biological processes. Traditional methods for the prediction of protein function are both time-consuming and expensive. With the development of bioinformatics, a large amount of protein sequence information is available to researchers, necessitating the development of an efficient predictor for identifying DNA-binding proteins from protein sequence information. Objective: To better utilize protein sequence information and further improve the accuracy of DNA-binding protein recognition, we designed a new predictor for identifying DNA-binding proteins based on a voting strategy. Method: We employed two feature extraction methods for DNA-binding protein identification: Physicochemical Distance Transformation (PDT) and PDT-Profile. Two predictors (iDNA-Prot-PDT and iDNA-Prot-PDT-Profile) were then established on the basis of these two feature extraction methods. To further improve the quality of prediction, a voting strategy (iDNA-Prot-Vote) was adopted. Results: The experimental results on a benchmark dataset and an independent dataset showed that our methods outperformed other state-of-the-art methods. Conclusion: These results indicate that the proposed methods are useful for DNA-binding protein identification, which would promote the development of protein sequence analysis.
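The abstract does not describe how the two predictors' outputs are combined. One common way to combine a small number of classifiers is soft voting, which averages their predicted probabilities and applies a threshold. The sketch below illustrates that general technique only; the function name, the 0.5 threshold, and the example probabilities are assumptions and are not taken from iDNA-Prot-Vote.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the positive-class probabilities and compare with a threshold.

    Returns True when the averaged probability predicts the positive class
    (here, hypothetically, "DNA-binding").
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    average = sum(probabilities) / len(probabilities)
    return average >= threshold


# Hypothetical outputs of two predictors for two proteins.
print(soft_vote([0.9, 0.4]))  # True  (average 0.65)
print(soft_vote([0.3, 0.6]))  # False (average 0.45)
```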
-
-
-
Discrimination of Thermophilic and Mesophilic Proteins Using Support Vector Machine and Decision Tree
Authors: Haixin Ai, Li Zhang, Jikuan Zhang, Tong Cui, Alan K. Chang and Hongsheng Liu
Background: Enhancing the stability of proteins is vital to protein engineering and design. The manipulation of protein stability is also important for understanding the principles that govern protein thermostability, both in basic research and in industrial application. Objective: To build models that can discriminate thermophilic and mesophilic proteins and to understand the factors influencing protein thermostability using machine learning methods. Method: A total of 613 protein features were calculated, and various feature selection algorithms were used to build feature subsets. Support vector machine and decision tree methods were applied to predict the thermostability of the proteins, and the problems caused by unbalanced data were resolved by using a grid search method to find the best weights of error costs for the different classes. Results: According to the results, the influence of primary structure on the thermostability of a protein was more important than the influence of secondary structure. The best classification model was obtained when the support vector machine was run on the subset of amino acid composition plus amino acid class composition, which yielded a prediction accuracy of 84.07%. At the primary structure level, Gln, Glu, and Ser were the features that contributed most to protein thermostability. At the secondary structure level, Q_coil and Helix_E were the most important features affecting protein thermostability. Conclusion: These results suggest that the thermostability of a protein is mainly associated with its primary structural features.
-
-
-
Kernelized Convex Hull based Collaborative Representation for Tumor Classification
Authors: Xia Chen, Haowen Chen, Dan Cao and Bo Li
Background: Reliable and precise classification methods for tumor types have started to see wide deployment, in particular in cancer diagnosis and personalized cancer drug design. The traditional Sparse Representation-based Classification (SRC) method can achieve high accuracy for tumor classification but suffers from inefficiency when handling noisy datasets. To overcome this disadvantage, some researchers proposed the Collaborative Representation-based Classification (CRC) method, which is more efficient and less complex. Method: In this paper, we design a novel Kernelized Convex Hull Collaborative Representation and Classification (KCHCRC) approach to further improve it. By modeling the testing sample as a special convex hull with a single element, the convex hull can be collaboratively represented over all of the training samples. Once the representation coefficients are fixed, we can calculate, for each category, the distance between the testing sample and the training samples of that category. To demonstrate the performance of our approach, we compare it with prior state-of-the-art tumor classification methods on 11 tumor gene expression datasets. Result: The experimental results show that our approach is effective and efficient.
-
-
-
Construction Model of Cardiac Purkinje System
Authors: Li Jie, Lu Weigang, Jing Jun and Wang Huaibao
Background: The Cardiac Purkinje System (CPS), located mainly on the endocardial surface, is very important to the physiopathology of the ventricle, where diseases are the most frequent in the whole heart. The location and width of its branches are essential to the paths of electrical signal activation on the CPS. However, current ventricle models tend either to ignore its real geometry or to approximate it with simple geometry. Objective: To address these problems, this paper presents a methodology for the semi-automatic construction of a computational geometry model of the Cardiac Purkinje System, based on canine anatomic data. Method: The methodology includes linear extraction by the fast marching method, and mapping and inverse mapping from the 3D surface model of the endocardial layer to a 2D image. The methodology is implemented in a semi-automated way, which makes it faster and more accurate. Results: Compared to existing construction methods, the result obtained from anatomical data is realistic and effective, more accurate, and less time-consuming. Conclusion: This paper presents a construction model of the CPS that uses anatomic specimen data, so it is very close to the real CPS.
-
-
-
Molecular Tags for Proteins and Their Biological Applications
Authors: Ajoy Basak and Sarmistha Basak
Molecular tags are becoming increasingly popular and useful for the study of biological macromolecules like proteins. Tags have thus been used to investigate structural and functional properties of proteins, including their cellular/tissue distributions, localizations, imaging, interacting partners, trafficking routes, isolations, purifications and characterizations. Today, in-depth biochemical and functional research on proteins is possible due to the availability of specific molecular tags and their antibodies or ligands. As a result, chemical modification of proteins with appropriate tags has become an integral part of protein research. In most cases, these tags are attached via the carboxyl (C-) or amino (N-) terminal end of the protein. Alternatively, reactive side chain functional groups, such as the SH, OH, NH2 or COOH groups of specific amino acid residues within the protein chain, can also be employed as connecting points for labeling agents. Fluorescent, radioactive or photo-labile tags in particular have been extensively used to study protein biosynthesis, localization and pathway analysis, as well as a protein's transient and final residence, in real time in both cellular and animal systems. Attention has now been devoted to the development of tags that are selective for specific classes of proteins, such as phospho-, lipo- or glycoproteins, in physiological systems. Moreover, proteins of all categories and residence types, such as secreted soluble, membrane-bound, cell surface, nuclear and mitochondrial proteins, have been labeled with tags for various study purposes. This review provides a summarized report on the latest developments in this field while presenting up-to-date information on tags that are widely in use in today's protein research.