Image Tag Generation Based on Deep Features Using Deep Learning Techniques

Heba Adnan Raheem¹,*, Hiba Jabbar Aleqabie², Ameer Sameer Hamood Mohammed Ali³

¹Department of Computer Science, College of Computer Science and Information Technology, University of Kerbala, Kerbala, Iraq

²Department of Artificial Intelligence Engineering, College of Information Technology Engineering, Al-Zahraa University for Women, Kerbala, Iraq

³Presidency of the University of Babylon, University of Babylon TOEFL Center, Babylon, Iraq

Emails: hiba.adnan@uokerbala.edu.iq; Hiba.jabbar@Alzahraa.edu.iq; pre225.ameer.sameir@uobabylon.edu.iq

Abstract

Automatically generating accurate, descriptive image tags has gained significant attention in recent years due to the exponential growth of image data. Traditional image tagging relies on manual annotation, which is time-consuming and subjective, and the growing volume of unannotated photographs makes it impractical at scale. Automated image description bridges the gap between visual content and human comprehension, making it vital for tasks such as information retrieval, editing, and accessibility. This paper presents a deep learning-based system that combines convolutional neural networks (CNNs) for feature extraction, recurrent neural networks (RNNs) for caption generation, and attention mechanisms that focus on salient image regions. The model uses a sequence-to-sequence architecture in which an attention-enhanced RNN generates coherent captions from pre-trained CNN features. Experiments on the Flickr8k and Flickr30k datasets show improved performance, as measured by the BLEU, ROUGE, and CIDEr metrics. The approach offers a scalable solution for image captioning, with potential extensions to video analysis, richer language generation, and larger datasets.

Keywords: CNN; Deep learning; Feature extraction; Image processing; Tag generation
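
As an illustration of the attention step summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes dot-product scoring between a decoder hidden state and a set of CNN region features; the abstract does not state which scoring function is used. The names `attend`, `region_features`, and `hidden_state` are hypothetical, and the toy vectors stand in for real CNN outputs.

```python
import math

def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, hidden_state):
    """Soft attention over image regions (assumed dot-product scoring).

    region_features: list of feature vectors, one per image region
    hidden_state:    current decoder (RNN) state, same dimensionality
    Returns the attention weights and the weighted context vector.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context

# Toy example: three regions with 2-dimensional features (illustrative values).
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In a captioning model of the kind described, the context vector would typically be fed to the RNN together with the previous word embedding at each decoding step, so that each generated word can draw on different image regions.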