Volume 20, Issue 6
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background

Proteins play a vital role in sustaining life, and they must fold into specific 3D structures to carry out their essential biological functions. Structure comparison techniques benefit from the ever-expanding repository of the Protein Data Bank. Computational tools for comparing the 3D structures of proteins and amino acids play an important role in understanding protein function. The Triangular Spatial Relationship (TSR)-based method was developed for this purpose.

Methods

In the TSR-based method, the 3D structures of proteins and amino acids are represented by integer vectors. In this study, a parallelization strategy for large-scale 3D structural comparisons of proteins and amino acids with the TSR-based method was designed and implemented on high-performance clusters, using the distributed and shared memory programming models together with multi-core CPUs and many-core GPU accelerators. The strategy can also be adapted to other applications that use a vector type of data structure.
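As an illustration of why an integer-vector representation splits cleanly across workers, the following is a minimal sketch and not the authors' implementation. It assumes each structure is stored as counts of integer keys and that pairs are scored with a generalized Jaccard similarity; the abstract does not name the similarity measure, so that choice, the key values, and the function names `generalized_jaccard`, `pairs_for_rank`, and `compare_all` are all hypothetical. The loop over `rank` is a serial stand-in for what separate processes or threads would each do on their own slice of the pair list.

```python
from collections import Counter
from itertools import combinations


def generalized_jaccard(a: Counter, b: Counter) -> float:
    """Generalized Jaccard similarity between two key-count vectors (assumed measure)."""
    keys = a.keys() | b.keys()
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 1.0


def pairs_for_rank(n: int, rank: int, size: int) -> list:
    """Round-robin slice of the n*(n-1)/2 unordered pairs assigned to one worker."""
    all_pairs = combinations(range(n), 2)
    return [pair for idx, pair in enumerate(all_pairs) if idx % size == rank]


def compare_all(structures: list, size: int) -> dict:
    """Score every pair once; each 'rank' handles only its own slice."""
    results = {}
    for rank in range(size):  # serial stand-in for independent parallel workers
        for i, j in pairs_for_rank(len(structures), rank, size):
            results[(i, j)] = generalized_jaccard(structures[i], structures[j])
    return results


# Hypothetical integer keys for three small structures.
structures = [
    Counter([101, 101, 205]),
    Counter([101, 205, 309]),
    Counter([400]),
]
results = compare_all(structures, size=2)
```

Because each pair depends only on its two input vectors, the slices share no state, so in this sketch the workers would only need to combine their partial result tables at the end.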

Results

Because the TSR-based method represents protein and amino acid structures as vectors, the comparison algorithm is well suited for parallelization on large-scale supercomputers. Performance studies on representative datasets were conducted to demonstrate the efficiency of the parallelization strategy, which allows comparisons of large 3D protein or amino acid structure datasets to finish within a reasonable amount of time.

Conclusion

Case studies that took advantage of this parallelized code demonstrate that applying either mirror image or feature selection in the TSR-based algorithms improves the classification of protein and amino acid 3D structures. The TSR keys have the advantage of supporting structure-based BLAST searches. The parallelization code could serve as a reference for similar future studies.

/content/journals/cbio/10.2174/0115748936306625240724102438
2024-08-01
2025-09-10
Supplements

Supplementary material is available on the publisher’s website along with the published article.

Declaration of Generative AI and AI-assisted Technologies in the Writing Process

The authors did not use any AI-assisted technologies during the preparation of this manuscript.