Multilingual Speaker Recognition using Mel-frequency Cepstral Coefficients and Gaussian Mixture Model

Mayur Rahul; Sonu Kumar Jha; Ayushi Prakash; Sarvachan Verma; Vikash Yadav

doi:10.2174/0123520965280852231212041006

ISSN: 2352-0965
E-ISSN: 2352-0973

Authors: Mayur Rahul¹, Sonu Kumar Jha², Ayushi Prakash³, Sarvachan Verma³ and Vikash Yadav⁴

¹ Department of Computer Application, CSJM University Kanpur, Uttar Pradesh, India; ² Galgotias University, Greater Noida, Uttar Pradesh, India; ³ Ajay Kumar Garg Engineering College, Ghaziabad, Uttar Pradesh, India; ⁴ Government Polytechnic Bighapur Unnao, Department of Technical Education, Uttar Pradesh, India
Source: Recent Advances in Electrical & Electronic Engineering, Volume 18, Issue 5, Jun 2025, pp. 637-643
- Received: 05 Sep 2023
- Accepted: 10 Nov 2023
- Available online: 01 Jun 2025

Abstract

Introduction

People can recognize a speaker by their voice, even when that voice reaches them through a mobile phone or another digital device.

Methods

To replicate this innate human ability, authentication techniques based on speaker biometrics, such as automated speaker recognition (ASR), have been proposed. An ASR system identifies speakers by analyzing their speech signals and extracting salient features from their voices.
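The feature extraction step the paper relies on is MFCC. Since the abstract gives no implementation details, the following is a minimal pure-Python sketch of a standard MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT). The parameter values (25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) and all function names are common defaults or hypothetical choices made for illustration, not values reported by the authors.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(signal, frame_len, hop):
    # Overlapping frames; an incomplete tail frame is dropped.
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def power_spectrum(frame, n_fft):
    # Hamming window, zero-pad to n_fft, naive DFT (O(n_fft^2), fine for a sketch).
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    padded = windowed + [0.0] * (n_fft - n)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(padded[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(padded[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters equally spaced on the mel scale from 0 Hz to Nyquist.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct_ii(x, n_coeffs):
    # Unnormalised type-II DCT; keeps the first n_coeffs coefficients.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_fft=512, n_filters=26, n_coeffs=13, pre_emphasis=0.97):
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if frame_len > n_fft:
        raise ValueError("frame length must not exceed n_fft")
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = [signal[0]] + [signal[i] - pre_emphasis * signal[i - 1] for i in range(1, len(signal))]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for frame in frame_signal(emphasized, frame_len, hop):
        spectrum = power_spectrum(frame, n_fft)
        # Floor the filter energies so log() never sees zero.
        energies = [max(sum(w * p for w, p in zip(row, spectrum)), 1e-10) for row in bank]
        features.append(dct_ii([math.log(e) for e in energies], n_coeffs))
    return features
```

Each returned row is one frame's 13-dimensional feature vector. A library routine such as `librosa.feature.mfcc` would normally replace this sketch, using an FFT instead of the naive DFT.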

Results

Speaker recognition is becoming an important topic in recent voice biometrics research. This paper proposes a multilingual speaker recognition system that uses MFCCs for feature extraction and a GMM for classification, evaluated on various available datasets such as TIMIT and LibriSpeech.
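The GMM classification step can be sketched as follows: one GMM per enrolled speaker, trained by expectation-maximization (EM) on that speaker's feature frames, with a test utterance assigned to the speaker whose model gives the highest average per-frame log-likelihood. This is a minimal pure-Python sketch under stated assumptions (diagonal covariances, means initialized from random frames, a fixed 20 EM iterations, a variance floor of 1e-3); the names `DiagonalGMM` and `identify` are hypothetical, and the abstract does not state the authors' number of mixture components or training settings.

```python
import math
import random


def log_gaussian_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagonalGMM:
    """Diagonal-covariance GMM trained with a fixed number of EM iterations."""

    def __init__(self, n_components=4, n_iter=20, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor
        self.rng = random.Random(seed)

    def _component_logs(self, x):
        # log(weight_j) + log N(x | mean_j, var_j) for every component j.
        return [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, data):
        n, dim = len(data), len(data[0])
        # Means from randomly chosen frames; every variance starts at the global variance.
        self.means = [list(x) for x in self.rng.sample(data, self.k)]
        mu = [sum(col) / n for col in zip(*data)]
        var = [max(sum((c - m) ** 2 for c in col) / n, self.var_floor)
               for col, m in zip(zip(*data), mu)]
        self.vars = [list(var) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                total = logsumexp(logs)
                resp.append([math.exp(lp - total) for lp in logs])
            # M-step: re-estimate weights, means and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10  # tiny smoothing avoids division by zero
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (x[d] - self.means[j][d]) ** 2
                                        for r, x in zip(resp, data)) / nj, self.var_floor)
                                for d in range(dim)]
        return self

    def avg_log_likelihood(self, data):
        return sum(logsumexp(self._component_logs(x)) for x in data) / len(data)


def identify(models, features):
    # Closed-set identification: the speaker whose GMM best explains the frames wins.
    return max(models, key=lambda name: models[name].avg_log_likelihood(features))
```

In a full system, `fit` would receive each speaker's MFCC frames and `identify` the frames of an unknown utterance. scikit-learn's `GaussianMixture(covariance_type="diag")`, whose `score()` method returns the mean per-sample log-likelihood, is the usual off-the-shelf substitute for this hand-written EM loop.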

Conclusion

The results on the given datasets show a recognition rate of 70.98% with MFCC features.
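The abstract does not define how the recognition rate is computed. Assuming the usual closed-set identification metric, it is the percentage of test utterances attributed to the correct speaker:

$$\text{Recognition rate} = \frac{N_{\text{correct}}}{N_{\text{test}}} \times 100\%$$

Under that assumption, 70.98% means that roughly 71 of every 100 test utterances were assigned to the right speaker.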

Article Type: Research Article

Keyword(s): ASR model; biometrics; GMM; MFCC; Multilingual speaker recognition; TIMIT
