Volume 18, Issue 5
  • ISSN: 2352-0965
  • E-ISSN: 2352-0973

Abstract

Introduction

People can recognize a speaker by their voice; the goal here is to give mobile and other digital devices the same ability.

Methods

To replicate this innate human ability, authentication techniques based on speaker biometrics, such as automatic speaker recognition (ASR), have been proposed. An ASR system identifies speakers by analyzing speech signals and extracting salient features from their voices.
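The "speech signal analysis" step in a typical ASR front end splits the waveform into short, overlapping, windowed frames and estimates each frame's power spectrum. The sketch below illustrates this with NumPy only. The function name `power_spectrum_frames` is hypothetical, and the parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, pre-emphasis coefficient 0.97, 512-point FFT) are common defaults assumed for illustration, not settings reported in this paper.

```python
import numpy as np

def power_spectrum_frames(signal, sr=16000, frame_ms=25.0, hop_ms=10.0,
                          pre_emph=0.97, n_fft=512):
    """Hypothetical helper: split a mono signal into overlapping Hamming-windowed
    frames and return the per-frame power spectrum, shape (n_frames, n_fft // 2 + 1).
    All default parameter values are assumed, not taken from the paper."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a * x[n - 1]
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    frame_len = int(round(sr * frame_ms / 1000))
    hop = int(round(sr * hop_ms / 1000))
    # Number of frames needed to cover the whole signal (last frame zero-padded)
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - frame_len) / hop)))
    pad = (n_frames - 1) * hop + frame_len - len(emphasized)
    emphasized = np.pad(emphasized, (0, max(0, pad)))
    # Row i holds sample indices i*hop ... i*hop + frame_len - 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Periodogram estimate of each frame's power spectrum
    spectrum = np.fft.rfft(frames, n=n_fft)
    return (np.abs(spectrum) ** 2) / n_fft
```

For one second of 16 kHz audio these defaults give 99 frames of 257 frequency bins each; the later feature-extraction stage works on this frame-by-bin matrix.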

Results

This has become an important area of recent research in voice biometrics. This paper proposes a multilingual speaker recognition system that uses mel-frequency cepstral coefficients (MFCC) for feature extraction and Gaussian mixture models (GMM) for classification, evaluated on available datasets such as TIMIT, LibriSpeech,
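MFCC feature extraction, in its textbook form, maps each frame's power spectrum through a bank of triangular filters spaced on the mel scale, takes the logarithm of the filter energies, and decorrelates them with a type-II discrete cosine transform (DCT). The NumPy sketch below shows that standard recipe; the function names are hypothetical, and the choice of 26 mel filters, 13 cepstral coefficients, and the 2595·log10(1 + f/700) mel formula are common conventions assumed here, not details confirmed by the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (assumed convention)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    mel_points = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge, peak of 1 at the center bin
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] *= np.sqrt(1.0 / n_in)
    mat[1:] *= np.sqrt(2.0 / n_in)
    return mat

def mfcc(power_frames, sr=16000, n_mels=26, n_ceps=13):
    """MFCCs from a (n_frames, n_fft // 2 + 1) power spectrum; returns (n_frames, n_ceps)."""
    n_fft = 2 * (power_frames.shape[1] - 1)
    mel_energies = power_frames @ mel_filterbank(n_mels, n_fft, sr).T
    # Floor the energies so the log is defined for silent frames
    log_mel = np.log(np.maximum(mel_energies, 1e-10))
    return log_mel @ dct_ii_matrix(n_ceps, n_mels).T
```

Keeping only the first 13 DCT coefficients retains the smooth spectral envelope, which carries most speaker-dependent vocal-tract information, while discarding fine pitch detail.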

Conclusion

On the datasets used, the MFCC-based system achieved a recognition rate of 70.98%.
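The abstract does not define the recognition rate. For closed-set speaker identification it is commonly the percentage of test utterances assigned to the correct speaker, where each speaker has one GMM and an utterance goes to the model with the highest average log-likelihood over its feature frames. The sketch below assumes that setup and trains diagonal-covariance GMMs with plain expectation-maximization (EM) in NumPy. The class and function names, the 8-component default, the iteration count, and the variance floor are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture model trained by EM."""

    def __init__(self, n_components=8, n_iter=50, reg=1e-3, seed=0):
        self.k, self.n_iter, self.reg = n_components, n_iter, reg
        self.rng = np.random.default_rng(seed)

    def _log_weighted_densities(self, X):
        # log w_j + log N(x | mu_j, diag(var_j)) for every frame and component: (n, k)
        diff = X[:, None, :] - self.means[None, :, :]
        log_det = np.sum(np.log(self.vars), axis=1)
        mahal = np.sum(diff ** 2 / self.vars[None, :, :], axis=2)
        d = X.shape[1]
        return np.log(self.weights) - 0.5 * (d * np.log(2 * np.pi) + log_det + mahal)

    def fit(self, X):
        n, _ = X.shape
        # Initialise means from random frames, variances from the global variance
        self.means = X[self.rng.choice(n, self.k, replace=False)]
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: component responsibilities, normalised in the log domain
            log_p = self._log_weighted_densities(X)
            log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
            resp = np.exp(log_p - log_norm)
            # M-step: re-estimate weights, means and floored diagonal variances
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            second_moment = (resp.T @ X ** 2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means ** 2, 0.0) + self.reg
        return self

    def score(self, X):
        """Average per-frame log-likelihood of the frames X under this model."""
        return float(np.mean(np.logaddexp.reduce(self._log_weighted_densities(X), axis=1)))

def identify(models, X):
    """Return the speaker label whose GMM scores the utterance X highest."""
    return max(models, key=lambda spk: models[spk].score(X))

def recognition_rate(models, test_set):
    """Percentage of (label, features) test utterances identified correctly."""
    correct = sum(identify(models, X) == label for label, X in test_set)
    return 100.0 * correct / len(test_set)
```

In use, one model is fitted per enrolled speaker on that speaker's MFCC frames, for example `models = {spk: DiagGMM().fit(feats) for spk, feats in train.items()}`, and `recognition_rate(models, test_set)` is then evaluated on held-out utterances.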


/content/journals/raeeng/10.2174/0123520965280852231212041006
2025-06-01
2025-11-05


  • Article Type:
    Research Article
Keyword(s): ASR model; biometrics; GMM; MFCC; Multilingual speaker recognition; TIMIT