Document Type: Original Article

Author

Assistant Professor of Linguistics University of Isfahan, Isfahan, Iran.

Abstract

This study acoustically examines voice quality parameters in two groups of Persian speakers, men and women. It aims to assess how well these parameters differentiate Persian speakers and to what extent they capture speaker-specific information. It also seeks to extend existing knowledge of voice quality and to address the limited scope of previous studies on Persian. Acoustic data were collected from 20 female and 20 male speakers in a laboratory setting. Multivariate analysis of variance (MANOVA) was used to analyze inter-speaker differences, and the random forest algorithm was used to assess feature importance. Six voice quality parameters were selected for analysis: jitter (frequency perturbation), shimmer (amplitude perturbation), harmonics-to-noise ratio (HNR), the difference in amplitude between the first and second harmonics (H1-H2), cepstral peak prominence (CPP), and fundamental frequency (F0). The results showed significant acoustic differences among Persian speakers in voice quality features, although the features did not discriminate equally well. For male speakers, CPP, HNR, and H1-H2 were the most discriminative features, in that order. For female speakers, F0, CPP, and HNR emerged as the key features for speaker identification. These findings highlight the role of voice quality parameters in identifying Persian speakers. Achieving higher accuracy in speaker recognition systems, however, requires accounting for gender differences and for the relative importance of the individual parameters. Because the small number of participants may limit the generalizability of the results, future studies should include larger and more diverse speaker samples.
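To make four of the parameters named above concrete, the following is a minimal sketch of their common textbook definitions: local jitter, local shimmer, HNR derived from a normalized autocorrelation peak, and H1-H2 in decibels. It is an illustration only. The abstract does not state the extraction settings used in the study, and all function names and the sample values here are hypothetical.

```python
import math


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods_s):
    """Cycle-to-cycle variation in period length (frequency perturbation)."""
    return local_perturbation(periods_s)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation in peak amplitude (amplitude perturbation)."""
    return local_perturbation(amplitudes)


def hnr_db(r):
    """Harmonics-to-noise ratio in dB from a normalized autocorrelation peak r.

    Assumes 0 < r < 1, where r is read as the periodic share of signal energy.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


def h1_h2_db(h1_amplitude, h2_amplitude):
    """Difference in dB between the first and second harmonic amplitudes."""
    return 20.0 * math.log10(h1_amplitude / h2_amplitude)


# Hypothetical measurements for four consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]        # arbitrary linear units

print(f"jitter  = {local_jitter(periods):.4f}")
print(f"shimmer = {local_shimmer(amplitudes):.4f}")
print(f"HNR     = {hnr_db(0.99):.2f} dB")
print(f"H1-H2   = {h1_h2_db(2.0, 1.0):.2f} dB")
```

Jitter and shimmer share one helper because both are the same relative perturbation measure applied to different per-cycle quantities. CPP and F0 are omitted because they require a full signal-processing pipeline rather than a closed-form expression.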

Keywords

Main Subjects

The authors retain the copyright and full publishing rights. This is an open access article distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
