PyComp: A Versatile Tool for Efficient Data Extraction, Conversion, and Management in High-throughput Virtual Drug Screening

Mohsen Sisakht; Mohammad Keyvanloo Shahrestanaki; Jafar Fallahi; Vahid Razban

doi:10.2174/0115734099274495231218150611

ISSN: 1573-4099
E-ISSN: 1875-6697

Authors: Mohsen Sisakht¹, Mohammad Keyvanloo Shahrestanaki², Jafar Fallahi¹ and Vahid Razban¹

¹ Department of Molecular Medicine, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran ; ² Department of Nutrition and Biochemistry, School of Medicine, Sabzevar University of Medical Sciences, Sabzevar, Iran
Source: Current Computer-Aided Drug Design, Volume 21, Issue 4, Jun 2025, p. 479-486
DOI: https://doi.org/10.2174/0115734099274495231218150611
- Received: 14 Sep 2023
- Accepted: 02 Dec 2023
- Available online: 08 Jan 2024

Abstract

Background

Virtual screening (VS) is essential for evaluating potential drug candidates in drug discovery. It often requires converting large volumes of compound data into formats suitable for computational analysis. Managing and processing this information, especially when compounds are supplied in different forms such as names, identifiers, or SMILES strings, presents significant logistical and technical challenges.

Methods

To streamline this process, we developed PyComp, a software tool built with Python's PyQt5 library and compiled into an executable with PyInstaller. PyComp gives users a systematic way to retrieve a list of compounds specified by name, ID (including ID ranges), or SMILES string and convert them into the desired 3D format.
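The input handling described above can be illustrated with a minimal sketch. It is not PyComp's actual code: the function names `expand_ids` and `sdf_3d_url` are hypothetical, and the choice of PubChem's PUG REST service as the data source is an assumption based on the article's keywords. The sketch only builds request URLs; it does not download anything. Passing SMILES as a URL argument is also an assumption about how such strings are best submitted.

```python
from urllib.parse import quote, urlencode

# Assumption: compounds are retrieved from PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"


def expand_ids(spec):
    """Expand a spec such as '2244, 3000-3002' into a list of integer IDs."""
    cids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like '3000-3002' is inclusive at both ends.
            start, end = (int(x) for x in part.split("-", 1))
            cids.extend(range(start, end + 1))
        else:
            cids.append(int(part))
    return cids


def sdf_3d_url(kind, value):
    """Build a URL requesting a 3D SDF record for one compound (hypothetical helper)."""
    if kind == "cid":
        return f"{PUG_REST}/cid/{int(value)}/SDF?record_type=3d"
    if kind == "name":
        # Percent-encode the name so spaces and slashes are safe in the path.
        return f"{PUG_REST}/name/{quote(value.strip(), safe='')}/SDF?record_type=3d"
    if kind == "smiles":
        # Assumption: SMILES is sent as a URL argument because it may
        # contain characters such as '/' or '#' that break a URL path.
        query = urlencode({"smiles": value.strip(), "record_type": "3d"})
        return f"{PUG_REST}/smiles/SDF?{query}"
    raise ValueError(f"unknown input kind: {kind!r}")


urls = [sdf_3d_url("cid", cid) for cid in expand_ids("2244, 3000-3002")]
```

Each URL could then be fetched and the resulting SDF file passed to a converter such as Open Babel if another 3D format is needed; that step is left out here because the abstract does not say which converter PyComp uses.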

Results

PyComp greatly enhances the efficiency of the data extraction, conversion, and storage steps involved in VS. Its ability to search for similar compounds and to handle misidentified compounds gives users an easy-to-use, customizable tool for managing large-scale compound data. By streamlining these operations, PyComp saves researchers significant time and effort, thus accelerating drug discovery research.
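The abstract does not say how PyComp handles misidentified compounds. One plausible approach, sketched below under that caveat, is to process the list in a single pass and collect entries that cannot be resolved instead of aborting. The function `resolve_batch` and the `lookup` callable are hypothetical; `lookup` stands in for whatever database query the tool performs.

```python
def resolve_batch(names, lookup):
    """Resolve compound names to IDs, collecting failures separately.

    `lookup` is a caller-supplied function (hypothetical) that returns an
    ID for a name, or returns None / raises LookupError when no match exists.
    """
    resolved = {}
    unresolved = []
    for name in names:
        key = name.strip()
        if not key:
            continue  # skip blank lines in the input list
        try:
            cid = lookup(key)
        except LookupError:
            cid = None
        if cid is None:
            unresolved.append(key)  # keep for later review or a similarity search
        else:
            resolved[key] = cid
    return resolved, unresolved


# Usage with a stub lookup table in place of a real database query.
known = {"aspirin": 2244, "caffeine": 2519}
resolved, unresolved = resolve_batch(
    ["aspirin", " caffeine ", "", "asprin"], known.get
)
```

Returning the unresolved names as a separate list lets the rest of a large batch proceed while the user corrects or re-queries only the problem entries.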

Conclusion

PyComp addresses a pressing challenge in high-throughput VS: the efficient management and conversion of large volumes of compound data. As a user-friendly, customizable software tool, PyComp can improve the efficiency of large-scale drug screening efforts and support faster discovery of potential therapeutic compounds.




Article Type: Research Article

Keyword(s): high-throughput screening; misidentified compounds; pharmaceutical compounds; PubChem; PyComp; SMILES strings; Virtual screening
