Increasing Performance of Boolean Retrieval Model by Data Parallelism Technique
- Authors: Mukesh Rawat, Preksha Pratap, Manan Gupta, Hardik Sharma
- Affiliation (all authors): Department of Computer Science and Engineering, Meerut Institute of Engineering & Technology, Meerut, U.P., India
- Source: Recent Developments in Artificial Intelligence & Communication Technologies, pp. 185-206
- Publication Date: October 2022
- Language: English
Information retrieval (IR) is the task of finding documents of non-uniform structure that satisfy an information need within a large collection stored on computer systems. Several models have been defined for retrieving information. Examples include the Boolean model, statistical models (which cover vector-space and probabilistic retrieval), and linguistic and knowledge-based retrieval models. The Boolean model is known as the “perfect match” model: a document is retrieved only if it satisfies the query exactly, so an imprecise query can retrieve irrelevant documents. Such documents lower precision (p), the proportion of retrieved documents that are relevant. The Boolean method offers good techniques for broadening or narrowing a query. It also works well for searching because the relationships between the query concepts are stated explicitly. The Boolean retrieval model processes queries written as Boolean expressions, that is, queries whose terms are combined with the AND (&&), OR (||), and NOT (!) operators. The model represents the document collection as an inverted index. The key idea of an inverted index is to maintain a dictionary of terms and, for each term, the list of documents in which that term occurs. This list is called the term's postings list (or inverted list); each entry in it is a posting, and all the postings lists together are called the postings. However, as the number of documents increases, the postings lists grow as well, and processing them becomes time-consuming. To address this problem, a multithreaded model is proposed in which each postings list is broken into chunks that are processed in parallel. As a result, the Boolean operations between postings lists required by a query run faster. This data parallelism technique improves the performance of the Boolean retrieval model.
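The inverted index and the Boolean operations over postings lists described above can be sketched as follows. This is a minimal sequential illustration, not the chapter's implementation. The function names, the regular-expression tokenizer, and the three sample documents are hypothetical. NOT is shown in its usual binary form, AND NOT, because a standalone NOT would need the complement against every docID in the collection.

```python
import re
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to its postings list: the sorted docIDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """AND: walk both sorted lists once, keeping docIDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def union(p1: list[int], p2: list[int]) -> list[int]:
    """OR: merge both sorted lists, emitting shared docIDs once."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            out.append(p1[i])
            i += 1
        else:
            out.append(p2[j])
            j += 1
    out.extend(p1[i:])  # at most one of these tails is non-empty
    out.extend(p2[j:])
    return out


def and_not(p1: list[int], p2: list[int]) -> list[int]:
    """p1 AND NOT p2: keep docIDs of p1 that do not appear in p2."""
    i = j = 0
    out = []
    while i < len(p1):
        if j < len(p2) and p2[j] < p1[i]:
            j += 1
        elif j < len(p2) and p2[j] == p1[i]:
            i += 1
            j += 1
        else:
            out.append(p1[i])
            i += 1
    return out


docs = {
    1: "Boolean retrieval uses an inverted index",
    2: "Parallel threads speed up retrieval",
    3: "An inverted index maps terms to postings",
}
index = build_inverted_index(docs)
print(intersect(index["inverted"], index["retrieval"]))  # inverted && retrieval -> [1]
print(union(index["inverted"], index["parallel"]))       # inverted || parallel -> [1, 2, 3]
print(and_not(index["inverted"], index["retrieval"]))    # inverted && !retrieval -> [3]
```

Each operation makes one linear pass over both postings lists. That is why its cost grows with the size of the postings, the bottleneck the chapter sets out to address.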
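The idea of breaking a postings list into chunks and processing them in parallel can be sketched for AND as follows. The chapter's exact chunking scheme is not given here, so this is an assumed design. The first list is cut into roughly equal chunks. For each chunk, binary search finds the slice of the second list covering the same docID range, and the chunk pairs are intersected on a thread pool. The names `parallel_intersect` and `num_chunks` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Sequential AND of two sorted postings lists (linear merge)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def parallel_intersect(p1: list[int], p2: list[int], num_chunks: int = 4) -> list[int]:
    """AND of two sorted postings lists, split into docID-range chunks
    that are intersected concurrently on a thread pool (hypothetical helper)."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if not p1 or not p2:
        return []
    size = -(-len(p1) // num_chunks)  # ceiling division
    tasks = []
    for start in range(0, len(p1), size):
        chunk = p1[start:start + size]
        # Only docIDs of p2 inside [chunk[0], chunk[-1]] can match this chunk.
        lo = bisect_left(p2, chunk[0])
        hi = bisect_right(p2, chunk[-1])
        tasks.append((chunk, p2[lo:hi]))
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        parts = list(pool.map(lambda pair: intersect(*pair), tasks))
    # Chunks cover disjoint, increasing docID ranges, so concatenating the
    # partial results in order yields a sorted postings list.
    return [doc_id for part in parts for doc_id in part]


evens = list(range(0, 100, 2))
multiples_of_3 = list(range(0, 100, 3))
print(parallel_intersect(evens, multiples_of_3, num_chunks=4))  # multiples of 6: [0, 6, ..., 96]
```

The same range-based split also works for AND NOT, because only entries of the second list that fall inside a chunk's docID range can remove entries from that chunk. In CPython, the global interpreter lock prevents threads running this pure-Python merge from executing truly in parallel. The sketch therefore shows the data decomposition rather than the speedup, which would need a process pool or a language with native threads.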