Automating Duplicate Detection for Lexical Heterogeneous Web Databases

Anil Ahlawat; Kalpna Sagar

doi:10.2174/2666255813999200904170035

ISSN: 2666-2558
E-ISSN: 2666-2566

Source: Recent Advances in Computer Science and Communications, Volume 15, Issue 4, May 2022, p. 540 - 549
DOI: https://doi.org/10.2174/2666255813999200904170035
Available online: 01 May 2022

Abstract

Introduction: The ever-increasing pace of technological advancement and the rapidly growing demand for data on the web have made efficient search engines a necessity.

Method: Automated duplicate detection over query results identifies the records from multiple web databases that refer to the same real-world entity, so that only non-matching records are returned to end users. The proposed algorithm takes an unsupervised approach, using classifiers over heterogeneous web databases, to return more accurate results with high precision, recall, and F-measure. Several assessments were carried out to analyze the efficacy of the proposed algorithm in identifying duplicates.

Result: The proposed algorithm achieves greater precision and F-measure than standard UDD, with the same recall.

Discussion: This paper introduces an algorithm that automates duplicate detection for lexical heterogeneous web databases.

Conclusion: The proposed algorithm outperforms standard UDD.
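
The comparison reported in the abstract (higher precision and F-measure at equal recall) can be illustrated with a minimal sketch of how these three metrics are commonly computed over detected duplicate pairs. This is not the authors' algorithm or data: the function name `pair_metrics`, the record identifiers, and both detector outputs are hypothetical and chosen only to show why precision can rise while recall stays unchanged.

```python
def pair_metrics(predicted, actual):
    """Return (precision, recall, F-measure) for detected duplicate pairs.

    Each pair is treated as unordered, so ("a1", "b1") equals ("b1", "a1").
    """
    predicted = {frozenset(p) for p in predicted}
    actual = {frozenset(p) for p in actual}
    true_pos = len(predicted & actual)

    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical ground truth: four true duplicate pairs across two databases.
actual = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]

# Hypothetical outputs: both detectors find the same three true pairs,
# but the baseline also reports three false pairs and the candidate one.
baseline = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"),
            ("a5", "b9"), ("a6", "b7"), ("a7", "b8")]
candidate = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b9")]

for name, pairs in [("baseline", baseline), ("candidate", candidate)]:
    p, r, f = pair_metrics(pairs, actual)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this toy setting both detectors recover the same true pairs, so recall is identical. The candidate reports fewer false pairs, which raises its precision and therefore its F-measure.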

Article Type: Research Article

Keyword(s): data mining; duplicate detection; record linkage; web browser; web databases; weighted component similarity summing
