Classification & Clustering of Text Based on Doc2Vec & K-means Clustering based Similarity Measurements

- By Prakriti Kapoor1
- Affiliation: 1 Centre for Interdisciplinary Research in Business and Technology, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
- Source: Demystifying Emerging Trends in Machine Learning, pp. 249-260
- Publication Date: February 2025
- Language: English


Determining how similar two documents are to one another is a crucial task in text processing. A significant difficulty in document clustering is finding a similarity metric for text that yields coherent groupings, and this study proposes a novel similarity metric for that purpose. We use TF-IDF to build a vector space and then cluster the documents with Ward's method and the K-means algorithm. WordNet is additionally employed for semantic document clustering. The findings are presented through visualisations and an interactive website showing the connections between all clusters. The traditional bag-of-words model takes into account only the presence (and count) of words in a text, so texts with the same meaning but different vocabulary may be placed in different groups. The results obtained with the proposed approach are evaluated for accuracy using the F-measure, and are compared against the sentence-vector model (Doc2Vec) and the bag-of-words model to confirm the advantage of the proposed strategy. The proposed methodology may be used to analyse web chat logs and customer feedback posted online. We evaluate our method on a variety of real-world data sets covering text classification and clustering problems. The findings show that the proposed measure outperforms competing strategies.
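The baseline pipeline described in the abstract (a TF-IDF vector space followed by K-means) and the bag-of-words limitation it mentions can be illustrated with a minimal sketch. This is not the chapter's implementation or its proposed similarity metric. The toy sentences, the whitespace tokenizer, the smoothed IDF formula, the fixed initial centroids, and all function names are assumptions chosen for illustration. Ward's method, WordNet, and Doc2Vec are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (raw term count x smoothed IDF) for a list of strings."""
    tokenized = [d.lower().split()          # assumption: naive whitespace tokenizer
                 for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return [[Counter(toks)[t] * idf[t] for t in vocab] for toks in tokenized]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def normalize(v):
    """Scale a vector to unit length so Euclidean distance tracks cosine similarity."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm else list(v)

def kmeans(points, init_indices, max_iter=20):
    """Plain K-means with squared Euclidean distance and fixed initial centroids."""
    centroids = [list(points[i]) for i in init_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:            # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):     # move each centroid to its cluster mean
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy corpus; the last sentence paraphrases the first with no shared words.
docs = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "stock prices fell sharply today",
    "investors sold stock as prices dropped",
    "one feline rested upon its rug",
]
vectors = tfidf_vectors(docs)

# Cluster the first four documents into two groups, seeding with documents 0 and 2.
labels = kmeans([normalize(v) for v in vectors[:4]], init_indices=[0, 2])
print(labels)                               # [0, 0, 1, 1]

# Bag-of-words limitation: a paraphrase with different vocabulary scores zero.
print(cosine(vectors[0], vectors[4]))       # 0.0
```

The two cat sentences and the two stock sentences end up in separate clusters because they share terms, while the paraphrase pair has zero cosine similarity despite describing the same scene. That gap is the motivation the abstract gives for adding semantic information to the similarity measure.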
