Volume 19, Issue 9
  • ISSN: 1872-2121
  • E-ISSN: 2212-4047

Abstract

Introduction

To assess how well conventional and hybrid pitch detection techniques perform in speech processing applications, a comparative analysis of the two classes of methods is conducted.

Methods

A proposed hybrid approach, Proposed PEF+CEP, is examined alongside five traditional algorithms: Normalized Correlation Function (NCF), Pitch Estimation Filter (PEF), Log-Harmonic Summation (LHS), Summation of Residual Harmonics (SRH), and Cepstrum Pitch Determination (CEP). Effectiveness is evaluated using four performance metrics: accuracy, specificity, sensitivity, and Gross Pitch Error (GPE).
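To make the class of time-domain methods compared here concrete, below is a minimal pure-Python sketch of a Normalized Correlation Function (NCF) pitch estimator. The paper's actual implementation is not reproduced here; the search range, the 0.995 peak threshold, and the shortest-lag octave-error guard are all illustrative assumptions.

```python
import math

def ncf_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch (Hz) via the Normalized Correlation Function.

    NCF(t) = sum x[n]x[n+t] / sqrt(sum x[n]^2 * sum x[n+t]^2).
    Among lags scoring near the maximum, the shortest lag is kept as a
    simple guard against octave (subharmonic) errors. Parameters here
    are illustrative defaults, not the paper's settings.
    """
    n = len(signal)
    lag_min = int(fs / fmax)              # shortest lag <-> highest pitch
    lag_max = min(int(fs / fmin), n - 1)  # longest lag <-> lowest pitch
    scores = {}
    for lag in range(lag_min, lag_max + 1):
        a, b = signal[: n - lag], signal[lag:]
        num = sum(x * y for x, y in zip(a, b))
        ea = sum(x * x for x in a)
        eb = sum(y * y for y in b)
        if ea > 0 and eb > 0:
            scores[lag] = num / math.sqrt(ea * eb)
    if not scores:
        return 0.0                        # silence: no pitch
    best = max(scores.values())
    lag = min(l for l, s in scores.items() if s >= 0.995 * best)
    return fs / lag

# A 200 Hz sinusoid at fs = 8 kHz: the NCF peaks at lag 40 samples.
fs = 8000
sig = [math.sin(2 * math.pi * 200 * t / fs) for t in range(800)]
print(round(ncf_pitch(sig, fs)))  # -> 200
```

A real detector would apply this per frame with windowing and a voicing decision; the sketch shows only the core correlation search.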

Results and Discussion

Our findings show that while the traditional methods achieve strong accuracy and specificity, the proposed hybrid method surpasses them, reaching 98.8% accuracy and 99.2% sensitivity.
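The reported metrics can be pinned down with a short sketch. The definitions below follow common convention in pitch-tracking evaluation and are assumptions, not the paper's exact protocol: sensitivity and specificity are computed from the voiced/unvoiced confusion counts, and GPE is the fraction of mutually voiced frames whose estimate deviates from the reference by more than 20%.

```python
def voicing_metrics(ref_voiced, est_voiced):
    """Accuracy, sensitivity, specificity of voiced/unvoiced decisions.

    ref_voiced / est_voiced: per-frame flags (1 = voiced, 0 = unvoiced).
    """
    tp = sum(r and e for r, e in zip(ref_voiced, est_voiced))
    tn = sum((not r) and (not e) for r, e in zip(ref_voiced, est_voiced))
    p = sum(ref_voiced)                 # voiced frames in the reference
    n = len(ref_voiced) - p             # unvoiced frames in the reference
    return {
        "sensitivity": tp / p,          # voiced frames caught
        "specificity": tn / n,          # unvoiced frames caught
        "accuracy": (tp + tn) / len(ref_voiced),
    }

def gross_pitch_error(ref_f0, est_f0, tol=0.2):
    """GPE: among frames voiced in both tracks (f0 > 0), the fraction
    whose relative pitch error exceeds `tol` (20% by convention)."""
    both = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    gross = sum(abs(e - r) / r > tol for r, e in both)
    return gross / len(both)

# Toy example: 5 frames of voicing flags, 4 frames of f0 tracks (Hz).
m = voicing_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
gpe = gross_pitch_error([200, 210, 0, 190], [202, 100, 180, 195])
print(round(m["accuracy"], 3), round(gpe, 3))  # -> 0.6 0.333
```

In the toy run, one of three mutually voiced frames is off by more than 20% (an octave-type error at frame 2), giving GPE = 1/3.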

Conclusion

The Proposed PEF+CEP method strikes a strong balance between computational efficiency and robustness, making it a promising solution for accurate and dependable pitch detection in speech processing applications. These results demonstrate the potential of hybrid approaches and open new avenues for research in speech processing.

DOI: 10.2174/0118722121312618240612093010
Published: 2024-06-26
