Analysis of Acoustic Voice Quality Parameters for Identifying Persian Speakers

Asadi, Homa

doi:10.22067/jlkd.2025.91521.1295


Article type: Research article

Author

Homa Asadi

Assistant Professor of Linguistics, University of Isfahan, Isfahan, Iran.


Abstract

This study acoustically examines voice quality parameters in two groups of Persian-speaking women and men. It aims to assess how well voice quality parameters distinguish Persian speakers and how much these parameters contribute to identifying speaker-specific characteristics. It also seeks to extend existing knowledge of voice quality and to fill the gap left by the few previous studies on Persian. Speech data were recorded from 20 female and 20 male speakers in a laboratory setting. Multivariate analysis of variance (MANOVA) was used to analyze differences between speakers, and a random forest algorithm was used to assess feature importance. Six voice quality parameters were selected: jitter (frequency perturbation), shimmer (amplitude perturbation), harmonics-to-noise ratio (HNR), the ratio of the amplitudes of the first and second harmonics (H1-H2), cepstral peak prominence (CPP), and fundamental frequency (F0). The results showed significant acoustic differences among Persian speakers in these voice quality features, although the features did not discriminate between speakers equally well. For male speakers, CPP, HNR, and H1-H2, in that order, were the most discriminative features. For female speakers, F0, CPP, and HNR were identified as the most important features for speaker identification. The findings indicate that voice quality parameters play a considerable role in identifying Persian speakers. However, achieving higher accuracy in speaker identification systems requires attention to gender differences and to the relative importance of the individual variables. Moreover, the small number of participants may limit the generalizability of the results; future studies should therefore use larger and more diverse speaker samples.
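Several of the parameters named above have standard textbook definitions. The following is a minimal sketch, assuming NumPy and assuming that glottal period durations and per-period peak amplitudes have already been extracted (the difficult step, which dedicated tools such as Praat perform with careful period detection). It computes local jitter, local shimmer, uncorrected H1-H2, and a simplified autocorrelation-based HNR. The function names are hypothetical, and this is an illustration of the measures, not the author's measurement pipeline. CPP, MANOVA, and the random forest analysis are not shown.

```python
import numpy as np


def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive glottal
    periods divided by the mean period (a fraction; x100 for percent)."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)


def shimmer_local(amplitudes):
    """Local shimmer: mean absolute difference between the peak amplitudes
    of consecutive periods divided by the mean amplitude (a fraction)."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)


def h1_h2_db(signal, sr, f0):
    """Uncorrected H1-H2 in dB: level of the strongest spectral peak near f0
    minus the level near 2*f0 (no formant correction as in H1*-H2*)."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)

    def peak(target):
        # strongest magnitude within +/-10% of the target frequency
        band = (freqs > 0.9 * target) & (freqs < 1.1 * target)
        return spectrum[band].max()

    return 20 * np.log10(peak(f0) / peak(2 * f0))


def hnr_db(signal, sr, f0_min=75.0, f0_max=500.0):
    """Simplified HNR: 10*log10(r / (1 - r)), where r is the highest
    normalized correlation between the signal and a copy shifted by a lag
    inside the plausible pitch-period range."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    best = 0.0
    for lag in range(int(sr / f0_max), int(sr / f0_min) + 1):
        a, b = x[:-lag], x[lag:]
        r = np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))
        best = max(best, r)
    # keep r strictly inside (0, 1) so the logarithm stays finite
    best = float(np.clip(best, 1e-12, 1.0 - 1e-12))
    return 10 * np.log10(best / (1.0 - best))
```

For example, periods of 5.0 and 5.1 ms alternating give a local jitter of about 0.0198 (1.98%), and a signal whose second harmonic has half the amplitude of the first gives H1-H2 of about 6 dB. The HNR here correlates shifted signal segments directly; Praat's autocorrelation method additionally corrects for the analysis window, so absolute values will differ.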


Subjects

Phonetics



Keywords

Acoustic Phonetics
Speaker-Specific Information
Voice Quality
Persian Speech

Authors retain the copyright and full publishing rights. This is an open access article distributed under Creative Commons Attribution 4.0 International License (CC BY 4.0).



Volume 17, Issue 1 (Serial No. 38)
Ordibehesht 1404 (April-May 2025)
Pages 1-21

