Volume 19, Issue 9
  • ISSN: 1872-2121
  • E-ISSN: 2212-4047

Abstract

Introduction

To assess how well conventional and hybrid pitch detection techniques perform in speech processing applications, a comparative analysis of the two classes of methods is conducted.

Methods

A proposed hybrid approach, Proposed PEF+CEP, is examined alongside five traditional algorithms: Normalized Correlation Function (NCF), Pitch Estimation Filter (PEF), Log-Harmonic Summation (LHS), Summation of Residual Harmonics (SRH), and Cepstrum Pitch Determination (CEP). Effectiveness is evaluated using four performance metrics: accuracy, specificity, sensitivity, and Gross Pitch Error (GPE).
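To make the class of time-domain methods compared here concrete, below is a minimal pure-Python sketch of a Normalized Correlation Function (NCF) pitch estimator. The paper's actual implementation is not reproduced here; the search range, the 0.995 peak threshold, and the shortest-lag octave-error guard are all illustrative assumptions.

```python
import math

def ncf_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch (Hz) via the Normalized Correlation Function.

    NCF(t) = sum x[n]x[n+t] / sqrt(sum x[n]^2 * sum x[n+t]^2).
    Among lags scoring near the maximum, the shortest lag is kept as a
    simple guard against octave (subharmonic) errors. Parameters here
    are illustrative defaults, not the paper's settings.
    """
    n = len(signal)
    lag_min = int(fs / fmax)              # shortest lag <-> highest pitch
    lag_max = min(int(fs / fmin), n - 1)  # longest lag <-> lowest pitch
    scores = {}
    for lag in range(lag_min, lag_max + 1):
        a, b = signal[: n - lag], signal[lag:]
        num = sum(x * y for x, y in zip(a, b))
        ea = sum(x * x for x in a)
        eb = sum(y * y for y in b)
        if ea > 0 and eb > 0:
            scores[lag] = num / math.sqrt(ea * eb)
    if not scores:
        return 0.0                        # silence: no pitch
    best = max(scores.values())
    lag = min(l for l, s in scores.items() if s >= 0.995 * best)
    return fs / lag

# A 200 Hz sinusoid at fs = 8 kHz: the NCF peaks at lag 40 samples.
fs = 8000
sig = [math.sin(2 * math.pi * 200 * t / fs) for t in range(800)]
print(round(ncf_pitch(sig, fs)))  # -> 200
```

A real detector would apply this per frame with windowing and a voicing decision; the sketch shows only the core correlation search.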

Results and Discussion

Our findings show that while the traditional methods achieve strong accuracy and specificity, the proposed hybrid method surpasses them, reaching 98.8% accuracy and 99.2% sensitivity.
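The reported metrics can be pinned down with a short sketch. The definitions below follow common convention in pitch-tracking evaluation and are assumptions, not the paper's exact protocol: sensitivity and specificity are computed from the voiced/unvoiced confusion counts, and GPE is the fraction of mutually voiced frames whose estimate deviates from the reference by more than 20%.

```python
def voicing_metrics(ref_voiced, est_voiced):
    """Accuracy, sensitivity, specificity of voiced/unvoiced decisions.

    ref_voiced / est_voiced: per-frame flags (1 = voiced, 0 = unvoiced).
    """
    tp = sum(r and e for r, e in zip(ref_voiced, est_voiced))
    tn = sum((not r) and (not e) for r, e in zip(ref_voiced, est_voiced))
    p = sum(ref_voiced)                 # voiced frames in the reference
    n = len(ref_voiced) - p             # unvoiced frames in the reference
    return {
        "sensitivity": tp / p,          # voiced frames caught
        "specificity": tn / n,          # unvoiced frames caught
        "accuracy": (tp + tn) / len(ref_voiced),
    }

def gross_pitch_error(ref_f0, est_f0, tol=0.2):
    """GPE: among frames voiced in both tracks (f0 > 0), the fraction
    whose relative pitch error exceeds `tol` (20% by convention)."""
    both = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    gross = sum(abs(e - r) / r > tol for r, e in both)
    return gross / len(both)

# Toy example: 5 frames of voicing flags, 4 frames of f0 tracks (Hz).
m = voicing_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
gpe = gross_pitch_error([200, 210, 0, 190], [202, 100, 180, 195])
print(round(m["accuracy"], 3), round(gpe, 3))  # -> 0.6 0.333
```

In the toy run, one of three mutually voiced frames is off by more than 20% (an octave-type error at frame 2), giving GPE = 1/3.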

Conclusion

The Proposed PEF+CEP method strikes a strong balance between computational efficiency and robustness, making it a promising solution for accurate and dependable pitch detection in speech processing applications. These results demonstrate the potential of hybrid approaches and open new avenues for research in speech processing.

DOI: 10.2174/0118722121312618240612093010
Published: 2024-06-26
