An Investigation of Data Requirements for the Detection of Depression from Social Media Posts

Sumit Dalal; Sarika Jain; Mayank Dave

doi:10.2174/1872212117666220812110956

ISSN: 1872-2121
E-ISSN: 2212-4047

Source: Recent Patents on Engineering, Volume 17, Issue 3, May 2023, p. 89 - 101
DOI: https://doi.org/10.2174/1872212117666220812110956
Available online: 01 May 2023

Abstract

Background: Only a fraction of the social media data produced is usable for mental health assessment, which raises the problem of obtaining sufficient training data for deep learning approaches. Data sufficiency can be expressed in terms of the number of users or the number of posts per user.

Objective: We examine the data requirements of machine learning and deep learning models for a practical system, so that researchers can choose the best-fitting model for the type of dataset available to them. We perform distinct experiments to determine how these factors affect depression classification by various approaches.

Methods: We explored various machine learning and deep learning techniques on several dataset versions, taken from Twitter and Reddit, with varying numbers of users and posts per user. Diagnosed and control users were taken in different ratios to assess the impact of an imbalanced dataset.

Results: SVM achieved 68% accuracy in depression classification with 70 users each from the diagnosed and control groups, whereas Naive Bayes achieved 64% on the same dataset fragment (1). Accuracy decreases with 150 users from each group, but recovers with 350 and 550 users from each group. Among the deep learning algorithms, HAN and BiLSTM perform better than the others as the imbalance ratio increases.

Conclusion: We found, mainly, that classification accuracy increases with the number of users, the number of posts per user, and the imbalance between the numbers of diagnosed and control users. We also found that Reddit posts yield better accuracy than tweets.
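The dataset versions described in the Methods vary three things: the number of users, the number of posts per user, and the ratio of control to diagnosed users. The abstract does not say how the authors built these versions, so the following is only a minimal sketch of one way to do it. The function name `make_version`, its parameters, and the toy data are all hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: user id -> list of that user's posts.
Posts = Dict[str, List[str]]


def make_version(
    diagnosed: Posts,
    control: Posts,
    n_diagnosed: int,
    control_ratio: float,
    posts_per_user: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Build one dataset version as (text, label) rows.

    label 1 = diagnosed user, label 0 = control user.
    control_ratio = number of control users per diagnosed user
    (1.0 gives a balanced version, larger values an imbalanced one).
    """
    rng = random.Random(seed)
    n_control = int(round(n_diagnosed * control_ratio))
    if n_diagnosed > len(diagnosed) or n_control > len(control):
        raise ValueError("not enough users for the requested version")

    # Sample users without replacement; sorting keeps runs reproducible.
    picked_diagnosed = rng.sample(sorted(diagnosed), n_diagnosed)
    picked_control = rng.sample(sorted(control), n_control)

    rows: List[Tuple[str, int]] = []
    for label, pool, users in (
        (1, diagnosed, picked_diagnosed),
        (0, control, picked_control),
    ):
        for user in users:
            # Keep only the first posts_per_user posts of each user.
            text = " ".join(pool[user][:posts_per_user])
            rows.append((text, label))

    rng.shuffle(rows)
    return rows


# Toy data standing in for real Twitter or Reddit posts.
toy_diagnosed = {f"d{i}": [f"post {j}" for j in range(5)] for i in range(10)}
toy_control = {f"c{i}": [f"post {j}" for j in range(5)] for i in range(30)}

# 4 diagnosed users, 2 control users per diagnosed user, 3 posts each.
version = make_version(toy_diagnosed, toy_control, 4, 2.0, 3)
print(len(version), sum(label for _, label in version))
```

Each row produced this way could then be passed to whichever classifier is under study, with one call per combination of user count, post count, and imbalance ratio.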


Article Type: Research Article

Keyword(s): depression; machine learning; mental health; neural network; psycholinguistic; word embedding
