Predicting Polymerase Chain Reaction Success: Integrating the K-Word Order Model, Physicochemical Properties Modeling of Double Bases, and Support Vector Machine

Long Yan; Yong Liu; Yan Yang

doi:10.2174/0113862073351071250102100221

Authors: Long Yan¹, Yong Liu² and Yan Yang¹

¹ The College of Health Humanities, Jinzhou Medical University, Jinzhou 121001, China; ² Department of Orthopedics, Meihe Hospital of the First Hospital of Jilin University, Changchun 130000, China
Source: Combinatorial Chemistry & High Throughput Screening
Available online: 23 January 2025
DOI: https://doi.org/10.2174/0113862073351071250102100221
- Received: 02 Sep 2024
- Accepted: 26 Nov 2024

Abstract

Introduction

Polymerase Chain Reaction (PCR) has been a pivotal scientific technique since the twentieth century, and it is widely applied across various domains. Despite its ubiquity, challenges persist in efficiently amplifying specific DNA templates.

Method

While PCR experimental procedures have received considerable attention, analysis of the DNA template, the focal point of the experiment, has been largely overlooked. This study addresses the uncertainty over whether a given DNA fragment can be amplified with conventional Taq DNA polymerase-based PCR protocols. It underscores the need to characterize DNA templates and to devise a reliable method for predicting PCR success.

Result

In this study, we construct a 72-dimensional feature vector that represents a DNA template, using k-word order and a model of the physicochemical properties of double bases. A Support Vector Machine (SVM) model is then used to predict PCR results.
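The abstract does not say how the 72 dimensions are composed, so the following is only a minimal sketch of one common way to turn a DNA template into k-word features: the relative frequencies of overlapping k-words. The function name `kword_frequencies`, the choice k = 2, and the example sequence are assumptions made for illustration, not the authors' method.

```python
from itertools import product


def kword_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative frequencies of overlapping k-words, in lexicographic order.

    Hypothetical helper for illustration; the paper's actual feature
    construction is not described in the abstract.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


# Example: a 16-dimensional dinucleotide ("double base") frequency vector.
features = kword_frequencies("ACGTACGT", k=2)
```

A vector like this could be concatenated with physicochemical descriptors of the dinucleotides before being passed to a classifier; which descriptors the authors use is not stated here.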

Conclusion

A jackknife cross-validation test is used to evaluate the predicted success rates, yielding an overall accuracy of 95.77%. Sensitivity, specificity, and the Matthews Correlation Coefficient (MCC) are 95.75%, 95.79%, and 0.915, respectively.
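For readers unfamiliar with these measures, the sketch below shows how a jackknife (leave-one-out) test and the four reported metrics are typically computed for a binary classifier. It is an illustration under assumptions: `jackknife_predictions`, `binary_metrics`, and the toy nearest-neighbour classifier are hypothetical names, and the nearest-neighbour rule merely stands in for the SVM used in the study.

```python
from math import sqrt


def jackknife_predictions(samples, labels, fit_predict):
    """Hold out each sample once, train on the rest, predict the held-out one."""
    preds = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(fit_predict(train_x, train_y, samples[i]))
    return preds


def binary_metrics(labels, preds):
    """Accuracy, sensitivity, specificity and MCC from 0/1 labels."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def nearest_neighbour(train_x, train_y, x):
    """Toy stand-in classifier (NOT an SVM): label of the closest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]


# Toy data: 1 = amplification succeeded, 0 = failed (made-up values).
samples = [[0.0], [0.1], [0.9], [1.0]]
labels = [0, 0, 1, 1]
preds = jackknife_predictions(samples, labels, nearest_neighbour)
scores = binary_metrics(labels, preds)
```

Because every sample is held out exactly once, the jackknife gives a single deterministic estimate for a given data set, unlike k-fold splits that depend on random partitioning.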



Publisher: Bentham Science Publishers


Article Type: Research Article

Keywords: double bases; support vector machine; DNA template; jackknife cross-validation; K-word; polymerase chain reactions
