Combinatorial Chemistry & High Throughput Screening - Volume 12, Issue 5, 2009
Editorial [Hot Topic: Machine Learning for Virtual Screening (Part 2) (Guest Editor: Ovidiu Ivanciuc)]
Computer-assisted drug design is used to increase the chances of finding valuable drug candidates by applying a wide range of computational methods, such as machine learning, structure-activity relationships, quantitative structure-activity relationships, molecular mechanics, quantum mechanics, molecular dynamics, and drug-protein docking. Machine learning is an important field of artificial intelligence, and includes a diversity of methods and algorithms that extract rules and functions from large datasets. The most important algorithms are linear discriminant analysis, artificial neural networks, decision trees, lazy learning, k-nearest neighbors, Bayesian methods, Gaussian processes, support vector machines, and kernel algorithms. This special issue presents a representative selection of machine learning applications for the virtual screening of chemical libraries.
Machine learning is a rich and dynamic field, with new methods proposed constantly, which makes it difficult to estimate the quality of predictions expected from a particular algorithm. Schwaighofer et al. explore the theoretical and practical aspects of estimating the confidence (error bars) of predictions obtained with quantitative structure-activity relationships based on three prevalent nonlinear regression methods, namely support vector regression, Gaussian processes, and decision trees. This practical aspect of estimating biological activities is currently overlooked in many structure-activity models, but the algorithms presented in this paper demonstrate an efficient approach to computing confidence levels for activity predictions.
Naive Bayesian classifiers are robust and efficient algorithms for the rapid virtual screening of large compound libraries. Klon presents a substantial and comprehensive review of Bayesian classifiers that are currently used in drug design and discovery. Bayesian models have consistently been shown to be tolerant of noisy training data, often outperforming more elaborate machine learning algorithms, and may provide reliable predictions even when trained with limited amounts of experimental data. Alternatively, Bayesian classifiers have been used as an effective post-processing technique to integrate sets of predictions obtained with other machine learning methods.
Ligand-protein docking is an effective approach for selecting promising inhibitors, but its main drawback is the large computation time necessary to screen large chemical libraries. Plewczynski et al. propose a hybrid method in which a fast machine learning algorithm, random forest, is coupled with ligand-protein docking to obtain a virtual screening procedure that delivers both speed and reliable predictions in practical applications. The random forest model is trained with predictions obtained from ligand-protein docking and scoring, and thus the virtual screening procedure may be applied even when only a limited amount of experimental data is available.
-
-
-
How Wrong Can We Get? A Review of Machine Learning Approaches and Error Bars
Authors: Anton Schwaighofer, Timon Schroeter, Sebastian Mika and Gilles Blanchard
A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we will also discuss how confidence estimates (error bars) can be obtained from these methods. We continue with important aspects for model building and evaluation, such as methodologies for model selection, evaluation, performance criteria, and how the quality of error bar estimates can be verified. Besides an introduction to the respective methods, we will also point to available implementations, and discuss important issues for the practical application.
-
-
-
Bayesian Modeling in Virtual High Throughput Screening
Naive Bayesian classifiers are a relatively recent addition to the arsenal of tools available to computational chemists. These classifiers fall into a class of algorithms referred to broadly as machine learning algorithms. Bayesian classifiers may be used in conjunction with classical modeling techniques to assist in the rapid virtual screening of large compound libraries in a systematic manner with a minimum of human intervention. This approach allows computational scientists to concentrate their efforts on their core strengths of model building. Bayesian classifiers have an added advantage of being able to handle a variety of numerical or binary data such as physicochemical properties or molecular fingerprints, making the addition of new parameters to existing models a relatively straightforward process. As a result, during a drug discovery project these classifiers can better evolve with the needs of the projects from general models in the lead finding stages to increasingly precise models in the lead optimization stages that are of particular interest to a specific medicinal chemistry team. Although other machine learning algorithms abound, Bayesian classifiers have been shown to compare favorably under most working conditions and have been shown to be tolerant of noisy experimental data.
-
-
-
Virtual High Throughput Screening Using Combined Random Forest and Flexible Docking
We present here the random forest supervised machine learning algorithm applied to flexible docking results from five typical virtual high throughput screening (HTS) studies. Our approach is aimed at: i) reducing the number of compounds to be tested experimentally against the given protein target and ii) extending results of flexible docking experiments performed only on a subset of a chemical library in order to select promising inhibitors from the whole dataset. The random forest (RF) method is applied and tested here on compounds from the MDL drug data report (MDDR). The recall values for the five selected diverse protein targets are over 90% and the performance reaches 100%. This machine learning method combined with flexible docking is capable of finding 60% of the active compounds for most protein targets by docking only 10% of screened ligands. Therefore, our in silico approach is able to scan very large databases rapidly in order to predict biological activity of small molecule inhibitors and provides an effective alternative to more computationally demanding methods in virtual HTS.
-
-
-
The Applications of Machine Learning Algorithms in the Modeling of Estrogen-Like Chemicals
Authors: Huanxiang Liu, Xiaojun Yao and Paola Gramatica
Increasing concern is being shown by the scientific community, government regulators, and the public about endocrine-disrupting chemicals that, in the environment, are adversely affecting human and wildlife health through a variety of mechanisms, mainly estrogen receptor-mediated mechanisms of toxicity. Because of the large number of such chemicals in the environment, there is a great need for an effective means of rapidly assessing endocrine-disrupting activity in the toxicology assessment process. When faced with the challenging task of screening large libraries of molecules for biological activity, the benefits of computational predictive models based on quantitative structure-activity relationships to identify possible estrogens become immediately obvious. Recently, in order to improve the accuracy of prediction, some machine learning techniques were introduced to build more effective predictive models. In this review we will focus our attention on some recent advances in the use of these methods in modeling estrogen-like chemicals. The advantages and disadvantages of the machine learning algorithms used in solving this problem, the importance of the validation and performance assessment of the built models as well as their applicability domains will be discussed.
-
-
-
Recent Developments of In Silico Predictions of Intestinal Absorption and Oral Bioavailability
Authors: Tingjun Hou, Youyong Li, Wei Zhang and Junmei Wang
Among the absorption, distribution, metabolism, elimination, and toxicity properties (ADMET), unfavorable oral bioavailability is indeed an important reason for stopping further development of drug candidates. Thus, predictions of oral bioavailability and bioavailability-related properties, especially intestinal absorption, are areas in need of progress to aid pharmaceutical drug development. In this article, we review recent developments in the prediction of passive intestinal absorption and oral bioavailability. The advances in the datasets used for model building, the molecular descriptors, the prediction models, and the statistical modeling techniques are summarized. Furthermore, we compared the performance of one machine learning method, support vector machines (SVM), and one traditional classification method, recursive partitioning (RP), on the predictions of passive absorption. Our comparisons demonstrate that the complex machine learning method could give better predictions than the traditional approach. Finally, we discuss the current challenges that remain to be addressed.
-
-
-
Feature Selection and Classification Employing Hybrid Ant Colony Optimization/Random Forest Methodology
Authors: Diwakar Patil, Rahul Raj, Prashant Shingade, Bhaskar Kulkarni and Valadi K. Jayaraman
Accurate classification of instances depends on identification and removal of redundant features. Classification of data having high dimensionality is usually performed in conjunction with an appropriate feature selection method. Feature selection enables identification of the most informative feature subset from the enormously vast search space that can accurately classify the given data. We propose an ant colony optimization (ACO)/random forest based hybrid filter/wrapper search technique, which traverses the search space and selects a feature subset with high classifying ability. We evaluate the performance of our algorithm on four widely studied CoEPrA (Comparative Evaluation of Prediction Algorithms, http://coepra.org) datasets. The performance of the software-ant-mediated hybrid filter/wrapper approach compares well with the available competition results. Thus, the proposed ant colony optimization based technique can effectively find small feature subsets capable of classifying with very good accuracy and can be employed for feature subset selection with a high level of confidence.
-
-
-
Controlling Feature Selection in Random Forests of Decision Trees Using a Genetic Algorithm: Classification of Class I MHC Peptides
Authors: Loren Hansen, Ernestine A. Lee, Kevin Hestir, Lewis T. Williams and David Farrelly
Feature selection is an important challenge in many classification problems, especially if the number of features greatly exceeds the number of examples available. We have developed a procedure, GenForest, which controls feature selection in random forests of decision trees by using a genetic algorithm. This approach was tested through our entry into the Comparative Evaluation of Prediction Algorithms 2006 (CoEPrA) competition (accessible online at: http://www.coepra.org). CoEPrA was a modeling competition organized to provide objective testing of various classification and regression algorithms via the process of blind prediction. In the competition GenForest ranked 10/23, 5/16 and 9/16 on CoEPrA classification problems 1, 3 and 4, respectively, which involved the classification of type I MHC nonapeptides, i.e., peptides containing nine amino acids. These problems each involved the classification of different sets of nonapeptides. Associated with each amino acid was a set of 643 features for a total of 5787 features per peptide. The method, its application to the CoEPrA datasets, and its performance in the competition are described.
-
-
-
Profiling Human Saliva Endogenous Peptidome via a High Throughput MALDI-TOF-TOF Mass Spectrometry
Authors: Chun-Ming Huang and Wenhong Zhu
Establishment of a saliva protein/peptide signature will provide important information for clinical diagnostics and prognosis of human disease. We digested human whole saliva with trypsin to create a tryptic digest salivary peptidome. Proteins/peptides were subsequently identified by high throughput tandem mass spectrometry in conjunction with database searching. Sixty-three saliva peptides corresponding to twenty-two saliva proteins were identified. Thirty of the sixty-three saliva peptides with non-specific tryptic cleavage sites were derived from proline-rich proteins, mucin 7, statherin and collagen. Several peptides derived from proline-rich proteins exhibit proline (Pro)-glutamine (Gln) C-termini (-PQ C-termini). Seven peptides with -PQ C-termini were identified in undigested whole saliva, suggesting that peptides with -PQ C-termini indigenously exist in human saliva. Peptides with -PQ C-termini are known to bind oral bacteria and exhibit properties characteristic of innate-immunity peptides. Thus, a saliva peptidome containing peptides with -PQ C-termini, as presented here, may reinforce the development of innate-immunity-related disease monitoring using noninvasive saliva samples and mass spectrometry-based techniques.
-
-
-
High Throughput Heme Assay by Detection of Chemiluminescence of Reconstituted Horseradish Peroxidase
Authors: Shigekazu Takahashi and Tatsuru Masuda
In living organisms, heme is an essential molecule for various biological functions. Recent studies also suggest that heme functions as an organelle-derived signal that regulates fundamental cell processes. Furthermore, estimation of heme is widely used for studying various blood disorders. In this regard, development of a rapid, sensitive, and high throughput heme assay has been sought. The most frequently used method of measuring heme, by pyridine hemochrome, is time-, labor-, and material-intensive, and therefore of limited utility for large-scale, high throughput analysis. Recently, we reported an alternative method that is sensitive and specific to heme, which is based on the ability of horseradish peroxidase (HRP) apo-enzyme to reconstitute with heme to form an active holo-enzyme. Here, we developed a high throughput heme assay by performing reactions on a multi-well plate with highly sensitive chemiluminescence detection reagents. Detection of chemiluminescence in a charge-coupled device (CCD)-based gel doc apparatus enables simultaneous measurement of multiple samples. Furthermore, the high sensitivity of this assay allowed a direct measurement of heme in solvent extracts after dilution. This assay is sensitive, quick, provides a large dynamic range, and is well suited for large-scale analysis of heme extracted from minute amounts of sample.
-
-
-
Multicomponent One-Pot Reactions: Synthesis of Some New 6-Oxopyrano[2,3-c]Isochromenes by Condensation of Homophthalic Anhydride, Dialkyl Acetylenedicarboxylate, and Isocyanides
Authors: Ali A. Mohammadi, Roya Akbarzadeh and Hamed Rouhi
A novel three-component, one-pot condensation of the zwitterion generated from dialkyl acetylenedicarboxylate and isocyanides with homophthalic anhydride is described. The reaction affords new 6-oxopyrano[2,3-c]isochromenes in good yield. Isochromenes have been reported to possess diverse biological activities such as antibacterial, antifungal, anti-inflammatory, and antiangiogenic effects. Moreover, these important compounds are found in various natural products.