Current Protein and Peptide Science - Volume 8, Issue 2, 2007
-
-
Editorial [Hot Topic: Workshop on the Definition of Protein Domains and their Likelihood of Crystallization (Guest Editors: Oliviero Carugo, Kristina Djinovic, Sasha Gorbalenya and Paul Tucker)]
This issue of Current Protein and Peptide Science is devoted to the emerging field of predicting the likelihood of protein crystallization. It draws on the seminars and lectures presented at the Workshop on the definition of protein domains and their likelihood of crystallization, held in Vienna at the end of June 2006 (http://www.emblhamburg.de/workshops/2006/domains/), where a number of scientists addressed these questions by presenting and debating both experimental and computational approaches. The likelihood of crystallization must be predicted computationally and/or determined experimentally in order to avoid time-consuming experiments on samples whose three-dimensional structure cannot be determined because of a series of possible obstacles. For example, if a protein is natively disordered, in the sense that it is not characterized by a unique, well-defined conformation, its three-dimensional structure cannot be determined experimentally, since it does not exist. Moreover, a sequence construct that does not correspond to a protein domain might be difficult to express because of misfolding or reduced solubility. This is particularly important in the structural genomics era, in which high-throughput approaches are applied to the determination of three-dimensional structures of proteins whose biochemical, biophysical, and biological features have not previously been studied. However, the preliminary analysis and estimation of the likelihood of crystallization is not relevant to proteomics studies only. It is also important for traditional hypothesis-driven projects, in which optimization of the protein sample is equally important: it allows one to generate samples suitable for structural studies and/or to improve the diffraction quality of crystals and, as a consequence, to obtain more reliable final results.
The first review, written by Dmitrij Frishman and co-workers (Technische Universitat Munchen, Germany), deals with the general problem of predicting, with computational and bioinformatics methods, experimental success in cloning, expression, soluble expression, purification and crystallization of proteins. On the basis of publicly available resources, sophisticated machine learning algorithms allow one to make reasonable predictions. For example, solubility predictions are reaching accuracies of over 70%. The next four reviews are devoted to the prediction, determination, and analysis of conformational disorder. Sonia Longhi and co-workers (CNRS and Universites Aix-Marseille I et II, France) present an overview of several methods currently employed for predicting protein conformational disorder, together with some practical examples of how they can be combined in order to achieve more reliable predictions. Anne Poupon and co-workers (Universite Paris-Sud, France) report the high-throughput application of disorder predictions in a structural genomics project on soluble yeast proteins and focus on strategies for tailoring proteins into crystallizable domains. Predictions of conformational disorder are also analyzed by Zsuzsanna Dosztanyi and co-workers (Hungarian Academy of Sciences, Hungary), though from a different perspective: the primary focus of their review is the systematic interpretation of the scores of different predictors. Experimental approaches for the detection of protein disorder are reviewed by Peter Tompa and co-workers (Hungarian Academy of Sciences, Hungary), with special emphasis on proteomic-scale methods, such as heat or acid treatment followed by two-dimensional electrophoresis/mass spectrometry characterization. The problem of defining domain boundaries on the basis of amino acid sequences is analyzed in the next two reviews.
David Jones and co-workers (University College London, United Kingdom) compare completely automatic and computer-assisted methods and discuss the problem of benchmarking different predictors. Furthermore, the DomPred server, which includes predictors based on sequence comparisons and on secondary structure predictors, is critically analyzed in order to allow its optimal use...
-
-
-
Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques
Authors: Pawel Smialowski, Antonio J. Martin-Galiano, Jurgen Cox and Dmitrij Frishman
Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is a high demand for sequence-based methods capable of predicting experimentally tractable proteins and filtering out potentially difficult targets at different stages of the structural genomics pipeline. Simple empirical rules based on anecdotal evidence are being increasingly superseded by rigorous machine-learning algorithms. Although the simplicity of less advanced methods makes them easier for humans to understand, more sophisticated formalized algorithms possess superior classification power. The quickly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining using machine learning techniques and results in increased quality of classifiers. For example, current solubility prediction methods are reaching accuracies of over 70%. Furthermore, automated feature selection leads to better insight into the nature of the correlation between amino acid sequence and experimental outcome. In this review we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins, with a special focus on publicly available resources. We also describe experimental data repositories and machine learning techniques used for classification and feature selection.
-
-
-
Predicting Protein Disorder and Induced Folding: From Theoretical Principles to Practical Applications
Authors: Jean M. Bourhis, Bruno Canard and Sonia Longhi
In recent years there has been an increasing amount of experimental evidence that a large number of proteins are either fully or partially disordered (unstructured). Intrinsically disordered proteins are ubiquitous proteins that fulfil essential biological functions while lacking highly populated and uniform secondary and tertiary structure under physiological conditions. Despite the abundance of disorder, disordered regions are still poorly detected. Recognition of disordered regions in a protein is instrumental for reducing spurious sequence similarity between disordered regions and ordered ones, and for delineating boundaries of protein domains amenable to crystallization. As none of the currently available automated methods for prediction of protein disorder can be taken as fully reliable on its own, we present a brief overview of the methods currently employed, highlighting their philosophy. We show a few practical examples of how they can be combined to avoid pitfalls and to achieve more reliable predictions. We also describe the currently available methods for the identification of regions involved in induced folding and provide a few practical examples in which the accuracy of predictions was experimentally confirmed.
-
-
-
Production and Crystallization of Protein Domains: How Useful are Disorder Predictions?
Authors: S. Quevillon-Cheruel, Nicolas Leulliot, Lucie Gentils, Herman van Tilbeurgh and Anne Poupon
The failure to produce and/or crystallize proteins is often due to their modular structure. There is therefore considerable interest in developing strategies for tailoring proteins into crystallizable domains. In the framework of a Structural Genomics Project on soluble yeast proteins, we have tested the expression of numerous genetic constructs of our targets in order to produce and crystallize proteins and protein domains and solve their three-dimensional structure. In some cases, the choice of the domain boundaries was guided by prediction from sequence using various software packages, including Prelink, a home-made prediction method for detecting unfolded regions. In other cases, large numbers of constructs were generated using molecular biology or biochemical methods. In this paper, we analyze the results of the over-expression in E. coli and crystallization of these constructs, and compare these with the predictions that can be obtained from our software and from others.
-
-
-
Prediction of Protein Disorder at the Domain Level
Authors: Zsuzsanna Dosztanyi, Mark Sandor, Peter Tompa and Istvan Simon
Intrinsically disordered/unstructured proteins exist in a highly flexible conformational state largely devoid of secondary structural elements and tertiary contacts. Despite their lack of a well-defined structure, these proteins often fulfill essential regulatory functions. The intrinsic lack of structure confers functional advantages on these proteins, allowing them to adopt multiple conformations and to bind to different binding partners. The structural flexibility of disordered regions hampers efforts to solve structures at high resolution by X-ray crystallography and/or NMR. Removing such proteins/regions from high-throughput structural genomics pipelines would be of significant benefit in terms of cost and success rate. In this paper we outline the theoretical background of structural disorder, and review bioinformatic predictors that can be used to delineate regions most likely to be amenable to structure determination. The primary focus of our review is the interpretation of prediction results in a way that enables segmentation of proteins to separate ordered domains from disordered regions.
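The segmentation idea described in this abstract can be illustrated with a minimal sketch. It assumes that some predictor has already produced one disorder score per residue in the range 0 to 1; the function name `segment_by_disorder`, the 0.5 threshold and the minimum segment length are hypothetical choices for illustration only and are not taken from the review or from any particular predictor.

```python
from typing import List, Tuple


def segment_by_disorder(
    scores: List[float],
    threshold: float = 0.5,   # assumed cutoff, not from the review
    min_len: int = 3,         # assumed minimum segment length
) -> List[Tuple[int, int, str]]:
    """Split per-residue disorder scores into (start, end, label) segments.

    Indices are 0-based and `end` is exclusive. Runs shorter than
    `min_len` are absorbed into the preceding segment so that isolated
    noisy residues do not fragment a domain. A short run at the very
    start of the sequence is kept as-is, since there is nothing before it.
    """
    if not scores:
        return []

    labels = ["disordered" if s >= threshold else "ordered" for s in scores]

    # Run-length encode the per-residue labels.
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append([start, i, labels[start]])
            start = i

    # Absorb short runs, and fuse neighbours that end up with the same label.
    merged: List[list] = []
    for run in runs:
        too_short = run[1] - run[0] < min_len
        if merged and (too_short or run[2] == merged[-1][2]):
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    return [(s, e, label) for s, e, label in merged]


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.9, 0.1, 0.2, 0.1, 0.1,
                  0.7, 0.2, 0.1, 0.8, 0.9, 0.9, 0.95]
    for seg in segment_by_disorder(toy_scores):
        print(seg)
```

In practice the threshold and the smoothing rule would need to be tuned per predictor, since different predictors put their scores on different scales.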
-
-
-
Towards Proteomic Approaches for the Identification of Structural Disorder
Authors: Veronika Csizmok, Zsuzsanna Dosztanyi, Istvan Simon and Peter Tompa
Intrinsically unstructured/disordered proteins (IUPs) and protein domains lack a well-defined three-dimensional structure under physiological conditions. Structural disorder imparts advantages in many non-conventional functions, which poses a significant challenge to our understanding of the structure-function relationship of proteins. The general appreciation of this fact, however, is hampered by the large gap in our knowledge of IUPs, as we have biophysical data on fewer than 500 of them, whereas bioinformatic predictions suggest at least several thousand such proteins in the human proteome alone. Thus, proteomic-scale identification and characterization of IUPs will need to be implemented to fill this gap and advance our knowledge in this important field. In this review we give an insight into the various rationales of proteomic efforts to identify IUPs, and survey the handful of attempts that combined enrichment of extracts for IUPs by heat or acid treatment with subsequent two-dimensional electrophoresis/mass spectrometry identification. Advantages and drawbacks of the various approaches are outlined in anticipation of future inventions in the field that will hopefully elevate IUP research to the truly proteomic level.
-
-
-
Computer-Assisted Protein Domain Boundary Prediction Using the Dom-Pred Server
Authors: Kevin Bryson, Domenico Cozzetto and David T. Jones
Domain prediction from sequence is a particularly challenging task, and a large variety of different methodologies are currently employed to tackle it. Here we try to classify these diverse approaches into a number of broad categories. Completely automatic domain prediction from sequence alone is currently fraught with problems, but this should not be surprising, since human experts disagree significantly on domain assignment even when given the structures. It can be argued that we should only test domain prediction methods on benchmark data that human experts agree upon, and this is the approach we take in this paper. Even for the data sets on which human experts agree, automatic structure-based domain assignment methods do not always agree, so it is unlikely that domain prediction methods will reliably obtain correct results completely automatically. We make the argument that computer-assisted domain prediction is a more achievable goal. With this aim in mind, we present the DomPred server. This server provides the user with the results from two completely different categories of method (DPS and DomSSEA). In this paper, each method is individually benchmarked against one of the latest domain prediction benchmarks to provide information about their respective reliabilities. A variety of different benchmark scores are employed, since the accuracy of a domain prediction method depends critically on what types of results one wishes to obtain (single/multi-domain classification, domain number, residue linker positions, etc.). Both of these methods, as implemented within the DomPred server, can also suggest alternative domain predictions, allowing the user to make the final decision based on these results and on their own background knowledge of the problem. The DomPred server is available from the URL: http://bioinf.cs.ucl.ac.uk/software.html.
-
-
-
Prediction of Number and Position of Domain Boundaries in Multi-Domain Proteins by Use of Amino Acid Sequence Alone
Authors: Nikita V. Dovidchenko, Michail Yu Lobanov and Oxana V. Galzitskaya
Prediction of protein domain boundaries is an important step in the prediction of three-dimensional structure. The simple method PDP has been developed for prediction of the number and position of domain boundaries in multi-domain proteins by use of amino acid sequence alone. The method uses an optimized scale based on the statistics of appearance of amino acid residues at domain boundaries. Our method demonstrates promising results in comparison to other methods that do not use homologous sequences. For the database of proteins that were targets in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction), our program correctly assigned the number of domains for ∼80% of one-domain proteins and ∼50% of two-domain proteins. Our method offers three main advantages: it is very simple, it is fast, and it uses a minimal number of parameters in comparison with other methods.
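The general idea of scoring a sequence with a per-residue boundary propensity scale can be sketched as follows. This is not the PDP implementation: the scale values in `TOY_SCALE`, the window size, the cutoff and the minimum separation between boundaries are all made-up assumptions chosen only to make the example runnable, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy scale: higher values mean "more often seen at boundaries".
# These numbers are invented for illustration and are NOT the published scale.
TOY_SCALE: Dict[str, float] = {"G": 1.0, "P": 1.0, "S": 0.5}


def window_scores(seq: str, scale: Dict[str, float], window: int = 5) -> List[float]:
    """Average the scale over a window centred on each residue.

    Residues missing from the scale count as 0.0; windows are truncated
    at the sequence ends.
    """
    half = window // 2
    scores = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        values = [scale.get(aa, 0.0) for aa in seq[lo:hi]]
        scores.append(sum(values) / len(values))
    return scores


def predict_boundaries(
    seq: str,
    scale: Dict[str, float] = TOY_SCALE,
    window: int = 5,
    cutoff: float = 0.6,        # assumed threshold
    min_separation: int = 10,   # assumed minimum distance between boundaries
) -> List[int]:
    """Return 0-based positions predicted as domain boundaries.

    Candidates above the cutoff are taken greedily from the highest score
    down, skipping any that lie too close to an already chosen boundary.
    """
    scores = window_scores(seq, scale, window)
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= cutoff),
        key=lambda i: (-scores[i], i),
    )
    chosen: List[int] = []
    for i in candidates:
        if all(abs(i - j) >= min_separation for j in chosen):
            chosen.append(i)
    return sorted(chosen)


if __name__ == "__main__":
    toy_seq = "A" * 20 + "GPGSG" + "A" * 20
    boundaries = predict_boundaries(toy_seq)
    print("boundaries:", boundaries)
    print("predicted number of domains:", len(boundaries) + 1)
```

Under this sketch the predicted number of domains is simply the number of boundaries plus one, which is how a sequence-only method can report both the count and the positions from a single score profile.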
-
-
-
Posttranslational Modifications and Subcellular Localization Signals: Indicators of Sequence Regions without Inherent 3D Structure?
Authors: Birgit Eisenhaber and Frank Eisenhaber
Given the huge number of otherwise uncharacterized protein sequences, computer-aided prediction of posttranslational modifications (PTMs) and translocation signals from amino acid sequence becomes a necessity. We have contributed to this multi-faceted, worldwide effort with the development of predictors for GPI lipid anchor sites, for N-terminal N-myristoylation sites, for farnesyl and geranylgeranyl anchor attachment, and for the PTS1 peroxisomal signal. Although the substrate protein sequence signals for various PTMs or translocation systems vary dramatically, we found that their principal architecture is similar in all the cases studied. Typically, a small stretch of amino acid residues is buried in the catalytic cleft of the protein-modifying enzyme (or the binding site of the transporter). This stretch interacts most intensely with the enzyme, and its sequence variability is most restricted. It is surrounded by linker segments that connect the part bound by the enzyme with the rest of the substrate protein. The linker residues tend to be small and polar, with a flexible backbone. Due to the mechanistic requirements of binding to the enzyme, we suggest that most PTM sites are necessarily embedded in intrinsically disordered regions (except for cases of autocatalytic PTMs, PTMs executed in the unfolded state, or non-enzymatic PTMs), and this issue requires consideration in structural studies of proteins with complex architecture. Surprisingly, some proteins carry sequence signals for posttranslational modification or translocation that remain hidden in the normal biological context but can become fully functional under certain conditions.
-
-
-
Pipelines, Robots, Crystals and Biology: What Use High Throughput Solving Structures of Challenging Targets?
With recent advances in the technology and software underlying crystallographic structure solution, demands on both the output and the functional significance of X-ray structures are soaring. To achieve the required speed and quality even with ever larger and more difficult targets, it is almost a must to combine HTP screening methods (robotics based or not) adopted from structural genomics initiatives with thorough expertise and dedicated characterization effort for each individual target. I present concepts, practical considerations, and experiences on implementing an HTP technology platform for structural and functional studies on complexes, membrane proteins and other challenging targets. Emphasis lies on the environment of small academic groups engaged exclusively in hypothesis-driven projects focused on specific biological systems. Suitability of given HTP protocols for particular target classes, benchmarking and quality control for procedures, and project management issues at the interface between extensive, broad parameter screening and the intensive individual target work required by non-SG-amenable targets are discussed.
