
Classification & Clustering of Text Based on Doc2Vec & K-means Clustering based Similarity Measurements


Determining how similar two documents are to one another is a crucial task in text processing, and this study proposes a new similarity measure for it. A major difficulty in document clustering is finding a similarity measure for written material that yields coherent groups. We use TF-IDF to build a vector space and then cluster the documents with Ward's method and the K-means algorithm. WordNet is additionally used for semantic document clustering. The results are presented through visualisations and an interactive website that shows the connections between all clusters. The traditional bag-of-words model takes into account only the presence (and count) of words in a text, so texts with identical meanings but different vocabulary may be placed in different groups. The accuracy of the results obtained with the proposed approach is assessed using the F-measure. The proposed approach is compared with the sentence-vector model (Doc2vec) and the bag-of-words model to confirm its advantage. The methodology may be applied to analysing web chat logs and customer feedback posted online. We evaluate our method on a variety of real-world data sets, including text classification and clustering problems. The results show that the proposed measure outperforms the competing approaches.
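As a rough illustration of the TF-IDF and K-means baseline mentioned in the abstract, and of the bag-of-words limitation it describes, the following is a minimal standard-library sketch. It is not the chapter's proposed measure, which the abstract does not specify. The function names (`tfidf_vectors`, `cosine`, `kmeans`), the smoothed IDF weighting, the toy sentences, and the choice of seeding centroids with the first k documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors for a list of tokenised documents."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for d in docs for w in set(d))
    # Smoothed IDF (an assumed weighting; other variants exist).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def cosine(a, b):
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's K-means, seeded with the first k vectors (toy choice)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iters):
        # Assign each vector to the nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus; the last text paraphrases the first
# without sharing a single word with it.
texts = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat lay on a mat",
    "stock markets fell today",
    "feline rested upon rug",
]
vectors = tfidf_vectors([t.split() for t in texts])
labels = kmeans(vectors[:4], k=2)

print(labels)
print(cosine(vectors[0], vectors[2]))  # the two cat sentences share words
print(cosine(vectors[0], vectors[4]))  # paraphrase with no shared words
```

In this toy run the two cat sentences fall into one cluster and the two stock sentences into the other, while the paraphrase has zero cosine similarity to the first sentence because the two share no vocabulary. That is the gap that semantic resources such as WordNet or learned embeddings such as Doc2vec are meant to close.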
