Identification of Websites Using an Efficient Method Employing Text Mining Methods

Madhur Taneja

By Madhur Taneja¹

¹ Centre for Interdisciplinary Research in Business and Technology, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India
Source: Demystifying Emerging Trends in Machine Learning, pp. 127-138
Publication Date: February 2025
Language: English

We introduce a method for website classification that uses deep neural networks and mixed feature extractors. The model is trained iteratively under supervision, using gradient descent to model website categorization. It comprises a webpage encoder, a convolutional neural network (CNN) feature extractor, a bidirectional long short-term memory (BiLSTM) feature extractor, and a fully connected classifier. The model can extract website features at several granularities. By concatenating the features obtained from the mixed feature extractors, it can quickly select a suitable website class. We conduct in-depth experiments on a real-world website dataset compiled from domains taken from a telecom operator's DNS records. In these experiments, the proposed model outperforms state-of-the-art models and a range of popular machine learning algorithms in terms of accuracy, precision, recall, and F1. The approach may also benefit other web applications, such as detecting fake websites and advertisements.
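The four evaluation measures named in the abstract can be illustrated with a minimal sketch for a multi-class classifier. The abstract does not say how the per-class scores are averaged, so macro-averaging is an assumption here. The function name `macro_scores` and the category labels in the usage note are hypothetical and do not come from the chapter.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro-averaging (unweighted mean over classes). The
    chapter abstract does not state which averaging scheme it uses.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        # A class with no predictions (or no true samples) scores 0.0.
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, calling `macro_scores(["news", "shop", "news", "video"], ["news", "news", "news", "video"])` scores a classifier that never predicts the hypothetical "shop" category. That class contributes zero precision, recall, and F1 to the macro averages, even though three of the four predictions are correct.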

Hardbound ISBN: 9789815305401

Ebook ISBN: 9789815305395

Book DOI: https://doi.org/10.2174/97898153053951250201
