Voice-Based Depression Pattern Recognition Using Mel-Frequency Cepstral Coefficients Feature Extraction

Wahju Tjahjo Saputro; Abdul Fadlil; Murinto Murinto

doi:10.33998/processor.2025.20.2.2513

Authors

Wahju Tjahjo Saputro, Universitas Muhammadiyah Purworejo
Abdul Fadlil, Universitas Ahmad Dahlan
Murinto Murinto, Universitas Ahmad Dahlan

Keywords:

Pattern recognition, Voice, Depression, Healthy, MFCC

Abstract

Identifying depression patterns in the human voice is important because depression can interfere with daily activities, reduce interest in learning, and hinder socialisation. Depression is a significant problem today: the number of people suffering from it is increasing globally. Its contributing factors are numerous and complex, and it can affect all age groups, from children to the elderly. This study aims to identify depression patterns through voice feature extraction using Mel-Frequency Cepstral Coefficients (MFCC), a method whose features approximate the way the human auditory system perceives sound. The dataset is the EATD-Corpus, which contains recordings from 162 students at Tongji University in China. The results show that depressed and healthy patterns can be distinguished using the following MFCC parameters: 25 measurements per frame, 10 frame intervals, an alpha value of 0.97 as the pre-emphasis coefficient, a maximum of 40 Mel filterbank coefficients, and 12 cepstral coefficients. Using the Self-Rating Depression Scale (SDS), a single threshold separates the two classes: healthy for scores < 53.00 and depressed for scores ≥ 53.00.
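The parameters listed in the abstract can be illustrated with a minimal MFCC sketch. This is not the authors' implementation. It assumes a 16 kHz sampling rate, reads the frame settings as a 25 ms frame length with a 10 ms shift, and uses a Hamming window, a 512-point FFT, and cepstral coefficients 1 to 12 (dropping the zeroth); none of these choices are stated in the abstract. The function names `mfcc`, `mel_filterbank`, and `sds_label` are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_ms=25, shift_ms=10,
         alpha=0.97, n_filters=40, n_ceps=12, n_fft=512):
    """Return an (n_frames, n_ceps) array of MFCCs. Sample rate, ms units and n_fft are assumptions."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    frame_step = int(round(sample_rate * shift_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Slice into overlapping frames and apply a Hamming window (assumed window type)
    n_frames = 1 + (len(emphasized) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # Power spectrum, then mel filterbank energies on a log scale
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))

    # DCT-II over the filterbank axis, keeping coefficients 1..n_ceps (c0 dropped by assumption)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), 2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

def sds_label(sds_score, threshold=53.0):
    """Two-class label from a Self-Rating Depression Scale score, using the abstract's threshold."""
    return "depressed" if sds_score >= threshold else "healthy"

if __name__ == "__main__":
    # Synthetic one-second tone as a stand-in for a recording
    t = np.arange(16000) / 16000.0
    features = mfcc(np.sin(2 * np.pi * 220.0 * t))
    print(features.shape, sds_label(47.5), sds_label(53.0))
```

With these assumed settings, each frame yields a 12-dimensional feature vector, and the SDS score supplies the reference label against which voice patterns are compared.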
