Deep Learning-Powered Visual Augmentation for the Visually Impaired
- Authors: Gandrapu Satya Sai Surya Subrahmanya Venkata Krishna Mohan (1), Mahammad Firose Shaik (2), G. Usandra Babu (3), Manikandan Hariharan (4), Kiran Kumar Patro (5)
- Affiliations: (1) Department of Electronics and Communication Engineering, Aditya Institute of Technology and Management, Tekkali, India; (2) Department of Electronics and Instrumentation Engineering, Velagapudi Ramakrishna Siddhartha Engineering College (Deemed to be University), Vijayawada, India; (3) Department of Electronics and Communication Engineering, Aditya University, Surampalem, Andhra Pradesh, India; (4) CMR Institute of Technology, Bengaluru, India; (5) Department of Electronics and Communication Engineering, Aditya Institute of Technology and Management, Tekkali, India
- Source: Blockchain-Enabled Internet of Things Applications in Healthcare: Current Practices and Future Directions, pp. 218-233
- Publication Date: January 2025
- Language: English
The interdisciplinary convergence of computer vision and object detection is pivotal for advancing intelligent image analysis. This research goes beyond conventional object recognition by pursuing a more nuanced understanding of images, akin to human visual comprehension. It surveys deep learning and established object detection systems such as convolutional neural networks (CNN), region-based CNN (R-CNN), and You Only Look Once (YOLO). The proposed model excels at real-time object recognition, outperforming its predecessors: earlier systems typically detect only a limited number of objects per image and are most effective at a distance of 5-6 meters. Uniquely, it employs Google Translate for the verbal identification of detected objects, a crucial accessibility feature for individuals with visual impairments. The study integrates computer vision, deep learning, and real-time object recognition to enhance visual perception and assist those facing visual challenges. The proposed method uses the Common Objects in Context (COCO) dataset for image comprehension, performing object detection and object tracking with a deep neural network (DNN). The system's output is converted into spoken words through a text-to-speech feature, enabling visually impaired individuals to comprehend their surroundings effectively. The implementation relies on key libraries such as NumPy, OpenCV, pyttsx3, PyWin32, opencv-contrib-python, and winsound, forming a comprehensive system for computer vision and audio processing. Results demonstrate successful execution, with the camera consistently detecting and labeling 5-6 objects in real time.
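The detection-to-speech pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the model filenames (`deploy.prototxt`, `model.caffemodel`), the truncated class list, and the `detections_to_labels` / `run_realtime_loop` helpers are all assumptions; it assumes a pre-trained Caffe SSD-style detector whose raw output has shape 1x1xNx7, with each row holding `[image_id, class_id, confidence, x1, y1, x2, y2]`.

```python
# Sketch of a DNN-detection-to-speech loop (illustrative, not the
# chapter's actual code). Model files and helper names are assumptions.
import numpy as np

# First few COCO category names; the full dataset defines 80 classes.
COCO_CLASSES = ["person", "bicycle", "car", "motorbike", "aeroplane",
                "bus", "train", "truck", "boat", "traffic light"]


def detections_to_labels(detections, class_names, conf_threshold=0.5):
    """Map raw OpenCV DNN output (shape 1x1xNx7) to class-name strings.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2];
    rows below the confidence threshold are discarded.
    """
    labels = []
    for det in detections[0, 0]:
        confidence = float(det[2])
        if confidence >= conf_threshold:
            class_id = int(det[1])
            if 0 <= class_id < len(class_names):
                labels.append(class_names[class_id])
    return labels


def run_realtime_loop(prototxt="deploy.prototxt",
                      weights="model.caffemodel"):
    """Real-time webcam loop: needs OpenCV, pyttsx3, a camera, and
    pre-trained model files (hypothetical filenames above)."""
    import cv2
    import pyttsx3

    net = cv2.dnn.readNetFromCaffe(prototxt, weights)
    engine = pyttsx3.init()          # offline text-to-speech engine
    cap = cv2.VideoCapture(0)        # default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)))
        net.setInput(blob)
        labels = detections_to_labels(net.forward(), COCO_CLASSES)
        if labels:
            engine.say(", ".join(labels))   # announce detected objects
            engine.runAndWait()
    cap.release()
```

The post-processing step is kept separate from the camera loop so the thresholding and label lookup can be exercised without a webcam or model weights; a real system would also speak each label only when it first appears rather than on every frame.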