VISION-LANGUAGE INTEGRATION FOR AUTOMATED IMAGE CAPTIONING USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS

S.Vijay Kumar; Bijigiri Pavan Kalyan

doi:10.64751/

Abstract

Image captioning bridges computer vision and natural language processing by generating meaningful textual descriptions of visual content. This research presents a vision-language framework for automated image caption generation that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks. The CNN component extracts high-level visual features from input images, and the LSTM network decodes these features into grammatically coherent sentences. The model is trained on a large annotated dataset of images paired with captions, from which it learns semantic relationships between objects and linguistic structures. Experimental results indicate that the proposed model generates accurate and fluent captions and outperforms traditional template-based and rule-driven methods. The approach demonstrates the effectiveness of deep learning in understanding and describing visual information, making it valuable for applications in accessibility, content retrieval, and human-computer interaction.
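
The encoder-decoder flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, layer sizes, class name `ToyCaptioner`, and the random untrained weights are all assumptions made for illustration, and the image feature vector is a stand-in for what a CNN would produce. The sketch shows only the decoding idea: the image features initialize the LSTM state, and words are then chosen greedily one step at a time until an end token or a length limit is reached.

```python
import math
import random

# Toy vocabulary and sizes: illustrative assumptions, not taken from the paper.
VOCAB = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]
FEAT_DIM, HIDDEN = 8, 6


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (stands in for trained parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyCaptioner:
    """Hypothetical LSTM decoder conditioned on a CNN-style feature vector."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        n_in = len(VOCAB) + HIDDEN  # one-hot word joined with previous hidden state
        self.W_init = rand_matrix(HIDDEN, FEAT_DIM, rng)  # image features -> initial hidden state
        self.W_gates = {g: rand_matrix(HIDDEN, n_in, rng) for g in "ifog"}
        self.W_out = rand_matrix(len(VOCAB), HIDDEN, rng)  # hidden state -> word scores

    def lstm_step(self, word_id, h, c):
        """One LSTM step: input, forget, and output gates plus a candidate cell value."""
        x = [1.0 if k == word_id else 0.0 for k in range(len(VOCAB))] + h
        i = [sigmoid(z) for z in matvec(self.W_gates["i"], x)]
        f = [sigmoid(z) for z in matvec(self.W_gates["f"], x)]
        o = [sigmoid(z) for z in matvec(self.W_gates["o"], x)]
        g = [math.tanh(z) for z in matvec(self.W_gates["g"], x)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def caption(self, features, max_len=10):
        """Greedy decoding: pick the highest-scoring word at every step."""
        h = [math.tanh(z) for z in matvec(self.W_init, features)]
        c = [0.0] * HIDDEN
        word_id = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            h, c = self.lstm_step(word_id, h, c)
            scores = matvec(self.W_out, h)
            word_id = max(range(len(VOCAB)), key=scores.__getitem__)
            if VOCAB[word_id] == "<end>":
                break
            words.append(VOCAB[word_id])
        return words


if __name__ == "__main__":
    fake_features = [0.1 * k for k in range(FEAT_DIM)]  # placeholder for CNN output
    print(ToyCaptioner(seed=0).caption(fake_features))
```

Because the weights are random, the printed words carry no meaning; in a trained system the feature vector would come from a CNN and the decoder parameters would be learned from image-caption pairs, as the abstract describes.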

Published

2025-11-04

Issue

Vol. 4 No. 4 (2025)

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

How to Cite

S.Vijay Kumar, & Bijigiri Pavan Kalyan. (2025). VISION-LANGUAGE INTEGRATION FOR AUTOMATED IMAGE CAPTIONING USING CONVOLUTIONAL AND RECURRENT NEURAL NETWORKS. International Journal of Data Science and IoT Management System, 4(4), 282–289. https://doi.org/10.64751/