Volume 15, Issue 4
  • ISSN: 2666-2558
  • E-ISSN: 2666-2566

Abstract

Introduction: Ever-increasing technological advancement and rapidly growing demand for data on the web have created a need for efficient search engines. Method: Automating duplicate detection over query results means identifying the records from multiple web databases that refer to the same real-world entity and returning only the non-matching records to end users. The algorithm proposed in this paper takes an unsupervised approach, using classifiers over heterogeneous web databases, to return more accurate results with high precision, recall, and F-measure. Several assessments were carried out to analyze how effectively the proposed algorithm identifies duplicates. Result: The proposed algorithm achieves higher precision and F-measure than standard UDD, with the same recall. Discussion: This paper introduces an algorithm that automates duplicate detection for lexically heterogeneous web databases. Conclusion: The proposed algorithm outperforms standard UDD.
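
The abstract compares the two methods on precision, recall, and F-measure. As a rough illustration of how these metrics are commonly computed for duplicate detection, the sketch below scores a set of predicted duplicate pairs against a set of known duplicate pairs. It is not the paper's evaluation code: the function name `pair_metrics`, the pair-based formulation, and the record IDs are all assumptions made for this example.

```python
def pair_metrics(predicted_pairs, true_pairs):
    """Return (precision, recall, f_measure) over duplicate record pairs.

    Each pair is a 2-tuple of record IDs (hypothetical identifiers).
    Pairs are treated as unordered, so (1, 2) and (2, 1) are the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    actual = {frozenset(p) for p in true_pairs}

    true_positives = len(predicted & actual)

    # Precision: fraction of reported duplicate pairs that are real duplicates.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real duplicate pairs that were reported.
    recall = true_positives / len(actual) if actual else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up example: three pairs reported, two of them are true duplicates.
precision, recall, f_measure = pair_metrics(
    predicted_pairs=[(1, 2), (3, 4), (5, 6)],
    true_pairs=[(2, 1), (3, 4)],
)
print(precision, recall, f_measure)
```

Under this formulation, a method that reports fewer false duplicate pairs while finding the same true pairs gains precision and F-measure at unchanged recall, which is the pattern the Result statement describes.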

/content/journals/rascs/10.2174/2666255813999200904170035
2022-05-01
2025-08-21