Volume 17, Issue 5
  • ISSN: 2666-2558
  • E-ISSN: 2666-2566

Abstract

Introduction: Image caption generation has long been a fundamental challenge at the intersection of computer vision (CV) and natural language processing (NLP). In this research, we present an innovative approach that harnesses the power of Deep Convolutional Generative Adversarial Networks (DCGANs) and adversarial training to generate natural, contextually relevant image captions. Method: Our method significantly improves the fluency, coherence, and contextual relevance of generated captions and demonstrates the effectiveness of reinforcement learning (RL) reward-based fine-tuning. In a comprehensive evaluation on the COCO dataset, our model outperforms baseline and current state-of-the-art (SOTA) models across all metrics, achieving BLEU-4 (0.327), METEOR (0.249), ROUGE (0.525), and CIDEr (1.155) scores. Result: The integration of DCGAN and adversarial training opens new possibilities in image captioning, with applications spanning automated content generation to enhanced accessibility solutions. Conclusion: This research paves the way for more intelligent, context-aware image understanding systems, promising exciting prospects for future exploration and innovation.
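The abstract reports sentence-overlap metrics such as BLEU-4, which scores a generated caption by its clipped 1- to 4-gram precision against a reference caption, damped by a brevity penalty for short outputs. As a rough illustration of how that number is computed (a simplified sentence-level sketch, not the authors' evaluation pipeline, which would typically use corpus-level, smoothed BLEU as in the COCO caption toolkit; the example captions are invented):

```python
from collections import Counter
import math

def bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 with uniform weights and brevity penalty.

    `reference` and `hypothesis` are token lists. This is an
    illustrative sketch of the metric; production evaluations use
    corpus-level BLEU with smoothing.
    """
    log_precisions = []
    for n in range(1, 5):
        hyp_ngrams = Counter(tuple(hypothesis[i:i + n])
                             for i in range(len(hypothesis) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Clipped matches: each hypothesis n-gram counts at most as
        # often as it occurs in the reference.
        matches = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        if matches == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(0.25 * math.log(matches / total))
    # Brevity penalty discourages overly short captions.
    c, r = len(hypothesis), len(reference)
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions))

ref = "a man riding a wave on top of a surfboard".split()
hyp = "a man riding a wave on a surfboard".split()
print(bleu4(ref, ref))  # identical captions score 1.0
print(bleu4(ref, hyp))  # partial n-gram overlap scores between 0 and 1
```

METEOR, ROUGE, and CIDEr follow the same pattern of comparing candidate captions to references, but weight recall, synonymy, or TF-IDF-scaled n-grams differently.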

/content/journals/rascs/10.2174/0126662558282389231229063607
2024-07-01
  • Article Type:
    Research Article
Keyword(s): CNN; DCGAN; decoder; discriminator; encoder; generator; RNN