Volume 12, Issue 4

Combinatorial Chemistry & High Throughput Screening - Volume 12, Issue 4, 2009

- Editorial [Hot Topic: Machine Learning for Virtual Screening (Part 1) (Guest Editor: Ovidiu Ivanciuc)]
  
  By Ovidiu Ivanciuc
  
  https://doi.org/10.2174/138620709788167999
  
  Computer-assisted drug design is used to increase the chances of finding valuable drug candidates by applying a wide range of computational methods, such as machine learning, structure-activity relationships, quantitative structure-activity relationships, molecular mechanics, quantum mechanics, molecular dynamics, and drug-protein docking. Machine learning is an important field of artificial intelligence, and includes a diversity of methods and algorithms that extract rules and functions from large datasets. The most important algorithms are linear discriminant analysis, artificial neural networks, decision trees, lazy learning, k-nearest neighbors, Bayesian methods, Gaussian processes, support vector machines, and kernel algorithms. This special issue presents a representative selection of machine learning applications for the virtual screening of chemical libraries. In the opening paper, Melville, Burke and Hirst review recent applications of machine learning techniques in ranking chemical libraries based on their biological activity against a particular protein target. Applications of ligand-based similarity searching and structure-based docking are critically evaluated, with an emphasis on the major algorithms, such as decision trees, naïve Bayesian classifiers, artificial neural networks, and support vector machines. Chen et al. examine the technical aspects of ligand-based virtual screening, such as available software, molecular descriptors, and performance measures. The procedures reviewed include binary kernel discrimination, k-nearest neighbors, linear discriminant analysis, logistic regression, and probabilistic neural networks. The detailed comparison of various studies is especially valuable in providing an estimate of the level of success that may be expected in virtual screening. The comparison of various machine learning techniques is further explored by Plewczynski, Spieser and Koch in a large-scale evaluation of screening success. Based on the biological targets explored in the literature, it was found that no machine learning approach consistently provides the best results. Through careful tuning of parameters, most chemical libraries may be modeled with existing algorithms. The study found that a promising class of methods is represented by fusion (or ensemble) classifiers, which combine predictions from several models and are thus able to outperform single classifiers.
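
The idea of a fusion classifier mentioned in the editorial can be illustrated with a minimal sketch. It assumes three hypothetical models that each assign an activity score between 0 and 1 to the same compounds, and it fuses them by simple averaging. The scores, the function name `fuse_scores`, and the choice of mean fusion are illustrative assumptions, not the procedure used in any of the reviewed studies.

```python
from statistics import mean


def fuse_scores(model_scores):
    """Average per-compound scores across models (simple mean fusion).

    model_scores: list of score lists, one list per model, all in the
    same compound order. Returns one fused score per compound.
    """
    return [mean(column) for column in zip(*model_scores)]


# Hypothetical activity scores from three models for three compounds.
scores = [
    [0.9, 0.2, 0.6],  # model A
    [0.7, 0.4, 0.3],  # model B
    [0.8, 0.1, 0.5],  # model C
]

fused = fuse_scores(scores)

# Rank compound indices from most to least likely active.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Other fusion rules (majority vote, maximum score, rank averaging) fit the same shape; only the function applied to each column changes.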
  

- Machine Learning in Virtual Screening
  
  Authors: James L. Melville, Edmund K. Burke and Jonathan D. Hirst
  
  https://doi.org/10.2174/138620709788167980
  
  In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.
  

- Comparative Analysis of Machine Learning Methods in Ligand-Based Virtual Screening of Large Compound Libraries
  
  Authors: Xiao H. Ma, Jia Jia, Feng Zhu, Ying Xue, Ze R. Li and Yu Z. Chen
  
  https://doi.org/10.2174/138620709788167944
  
  Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds with specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability to predict compounds of diverse structures and complex structure-activity relationships without requiring knowledge of the target 3D structure. This article reviews current progress in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performance of machine learning tools with that of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility of improving the performance of machine learning methods in screening large libraries is discussed.
  

- Performance of Machine Learning Methods for Ligand-Based Virtual Screening
  
  Authors: Dariusz Plewczynski, Stephane A.H. Spieser and Uwe Koch
  
  https://doi.org/10.2174/138620709788167962
  
  Computational screening of compound databases has become increasingly popular in pharmaceutical research. This review focuses on the evaluation of ligand-based virtual screening using active compounds as templates in the context of drug discovery. Ligand-based screening techniques are based on comparative molecular similarity analysis of compounds with known and unknown activity. We provide an overview of publications that have evaluated different machine learning methods, such as support vector machines, decision trees, ensemble methods such as boosting, bagging and random forests, clustering methods, neural networks, naïve Bayesian classifiers, data fusion methods and others.
  

- Virtual Screening for Cytochromes P450: Successes of Machine Learning Filters
  
  Authors: Julien Burton, Ismail Ijjaali, Francois Petitet, Andre Michel and Daniel P. Vercauteren
  
  https://doi.org/10.2174/138620709788167935
  
  Cytochromes P450 (CYPs) are crucial targets when predicting the ADME properties (absorption, distribution, metabolism, and excretion) of drugs in development. In particular, CYP-mediated drug-drug interactions are responsible for major failures in the drug design process. Accurate and robust screening filters are thus needed to predict interactions of potent compounds with CYPs as early as possible in the process. In recent years, more and more 3D structures of various CYP isoforms have been solved, opening the way to accurate structure-based studies of interactions. Nevertheless, the ligand-based approach remains popular. This success can be explained by the growing amount of available data and the satisfactory performance of existing machine learning (ML) methods. The aim of this contribution is to give an overview of the recent achievements in ML applications to CYP datasets. In particular, popular methods such as support vector machines, decision trees, artificial neural networks, k-nearest neighbors, and partial least squares will be compared, as well as the quality of the datasets and the descriptors used. Consensus of different methods will also be discussed. The models, which often reach 90% accuracy, will be analyzed to highlight the key descriptors that permit good prediction of CYP binding.
  

- Scaffold-Hopping Potential of Fragment-Based De Novo Design: The Chances and Limits of Variation
  
  Authors: Bjoern A. Krueger, Axel Dietrich, Karl-Heinz Baringhaus and Gisbert Schneider
  
  https://doi.org/10.2174/138620709788167971
  
  The identification of new lead structures is a pivotal task in early drug discovery. Molecular de novo design of ligand structures has been successfully applied in various drug discovery projects. Still, the question of the scaffold hopping potential of drug design by adaptive evolutionary optimization has been left unanswered. It was unclear whether de novo design is actually able to leap away from given chemotypes (“activity islands”), allowing for rescaffolding of compounds. We have addressed these questions by scrutinizing different scoring functions of our de novo design software Flux for their ability to enable scaffold-hops for various target classes. We evaluated both the potential bioactivity and the scaffold diversity of de novo generated structures. For several target classes, known lead structures were reconstructed by the de novo algorithm (“lead-hopping”). We demonstrate that for one or multiple templates of a given chemotype, other chemotypes are reached during de novo compound generation, thus indicating successful scaffold-hops.
  

- Structure-Based Drug Screening and Ligand-Based Drug Screening with Machine Learning
  
  By Yoshifumi Fukunishi
  
  https://doi.org/10.2174/138620709788167890
  
  The initial stage of drug development is the hit (active) compound search from a pool of millions of compounds; for this process, in silico (virtual) screening has been successfully applied. One of the problems of in silico screening, however, is the low hit ratio in relation to the high computational cost and the long CPU time. This problem becomes serious in structure-based in silico screening. The major reason is the low accuracy of the estimation of protein-compound binding free energy. The problem of ligand-based in silico screening is that the conventional quantitative structure-activity relationship (QSAR) approach is not effective at predicting new hit compounds with new scaffolds. Recently, machine-learning approaches have been applied to in silico drug screening to overcome the above problems. We review here machine-learning approaches for both structure-based and ligand-based drug screening. Machine learning is used to improve database enrichment in two ways, namely by improving the docking score calculated by the protein-compound docking program and by calculating the optimal distance between the feature vectors of active and inactive compounds. Both approaches require compounds that are known to be active with respect to the target protein. In structure-based screening, the former approach is mainly used with a protein-compound affinity matrix. In ligand-based screening, both the former and latter approaches are used, and the latter approach can be applied to various kinds of descriptors, such as 1D/2D descriptors/fingerprints and the affinity fingerprint given by the protein-compound affinity matrix.
  

- Virtual Screening with Support Vector Machines and Structure Kernels
  
  Authors: Pierre Mahe and Jean-Philippe Vert
  
  https://doi.org/10.2174/138620709788167926
  
  Support vector machines and kernel methods have recently gained considerable attention in chemoinformatics. They offer generally good performance for problems of supervised classification or regression, and provide a flexible and computationally efficient framework to include relevant information and prior knowledge about the data and problems to be handled. In particular, with kernel methods, molecules do not need to be represented and stored explicitly as vectors or fingerprints, but only need to be compared to each other through a comparison function technically called a kernel. While classical kernels can be used to compare vector or fingerprint representations of molecules, completely new kernels have been developed in recent years to directly compare the 2D or 3D structures of molecules, without the need for an explicit vectorization step through the extraction of molecular descriptors. While still in their infancy, these approaches have already demonstrated their relevance on several toxicity prediction and structure-activity relationship problems.
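
To make the notion of a kernel as a pairwise comparison function concrete, here is a minimal sketch of one classical kernel on fingerprints, the Tanimoto kernel, which scores two molecules by the overlap of their set bits. It is an illustration only: the fingerprints are made-up sets of bit indices, the function names are hypothetical, and the structure kernels discussed in the paper, which compare 2D or 3D structures directly, are not shown.

```python
from typing import FrozenSet, List


def tanimoto_kernel(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        # Convention chosen here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def gram_matrix(fps: List[FrozenSet[int]]) -> List[List[float]]:
    """Pairwise kernel values; a kernel method only needs this matrix."""
    return [[tanimoto_kernel(x, y) for y in fps] for x in fps]


# Hypothetical fingerprints: each set lists the indices of bits that are set.
fingerprints = [
    frozenset({1, 4, 7, 9}),
    frozenset({1, 4, 8}),
    frozenset({2, 3}),
]

K = gram_matrix(fingerprints)
```

A support vector machine that accepts a precomputed kernel can be trained on `K` directly, so the learner never needs the molecules in explicit vector form.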
  

- Reverse Fingerprinting and Mutual Information-Based Activity Labeling and Scoring (MIBALS)
  
  Authors: Chris Williams and Suzanne K. Schreyer
  
  https://doi.org/10.2174/138620709788167953
  
  A mutual information-based activity labeling and scoring (MIBALS) approach to reverse fingerprint analysis is presented. Whole molecule scores produced by the method are shown to be capable of ranking compounds in virtual high-throughput screening (vHTS) experiments, while fragment scores produced by the method are able to identify pharmacophore moieties important for biological activity. The performance of MIBALS in vHTS experiments is assessed using reference ligands active against 40 different biological targets, and MIBALS retrieval rates are compared with those obtained using more traditional group fusion similarity search methods. The use of MIBALS to identify important pharmacophore fragments is demonstrated by comparing ligand fragment scores with known pharmacophores and known ligand/protein contacts. The ability of MIBALS to highlight beneficial and detrimental groups in a congeneric series is examined by comparing MIBALS fragment scores with features in known structure-activity relationships.
  

- Review on Lazy Learning Regressors and their Applications in QSAR
  
  Authors: Abhijit J. Kulkarni, Valadi K. Jayaraman and Bhaskar D. Kulkarni
  
  https://doi.org/10.2174/138620709788167908
  
  Building accurate quantitative structure-activity relationships (QSAR) is important in drug design, environmental modeling, toxicology, and chemical property prediction. QSAR methods can be used to solve two main types of problems: pattern recognition (or classification), where the output is discrete (i.e., class information, e.g., active or non-active molecule, binding or non-binding molecule), and function approximation (i.e., regression), where the output is continuous (e.g., actual activity prediction). The present review deals with the second type of problem (regression), with specific attention to one of the most effective machine learning procedures, namely lazy learning. The methodologies of the algorithm along with the relevant technical information are discussed in detail. We also present three real-life case studies to briefly outline the typical characteristics of the modeling formalism.
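
As a rough illustration of lazy learning for regression, the sketch below defers all work to prediction time: it stores the training compounds and, for each query, averages the activities of the k nearest neighbors, weighted by inverse distance. The descriptor vectors, activity values, and the function name `knn_predict` are invented for illustration, and this is one simple lazy learner rather than the specific algorithms covered in the review.

```python
import math


def knn_predict(train, query, k=3):
    """Distance-weighted k-nearest-neighbor regression.

    train: list of (descriptor_vector, activity) pairs.
    query: descriptor vector of the compound to predict.
    No model is fitted in advance; the local estimate is built per query.
    """
    # Select the k training compounds closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Closer neighbors get larger weights; the small constant avoids division by zero.
    weights = [1.0 / (math.dist(x, query) + 1e-9) for x, _ in neighbors]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Hypothetical two-descriptor training set with measured activities.
training_set = [
    ((0.0, 0.0), 1.0),
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), 3.0),
    ((5.0, 5.0), 10.0),
]

prediction = knn_predict(training_set, (0.5, 0.5), k=3)
```

The query here is equidistant from its three nearest neighbors, so the prediction reduces to their plain average; the distant fourth compound has no influence, which is the local behavior that characterizes lazy learners.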
  
