VISION TRANSFORMERS OF AI-GENERATED VISUAL CONTENT CLASSIFICATION
DOI: https://doi.org/10.5281/zenodo.19145351
Abstract
The rapid development of generative artificial intelligence (AI) models has transformed the creation of digital visual content. Modern generative models such as diffusion models and generative adversarial networks (GANs) can produce highly realistic images that are often indistinguishable from genuine photographs. While these technologies have expanded opportunities in creative design, media production, and digital automation, they have also introduced serious challenges related to misinformation, deepfake dissemination, digital forgery, and copyright violations. Consequently, the ability to accurately classify AI-generated images has become a critical requirement for maintaining trust in digital media ecosystems. Traditional image classification approaches rely largely on convolutional neural networks (CNNs), which focus on local spatial features. Although CNNs perform strongly on many vision tasks, they struggle to capture the long-range dependencies and global contextual relationships that matter for detecting the subtle artifacts present in AI-generated images. Vision Transformers (ViTs), which use self-attention to model relationships across the whole image, have emerged as a powerful alternative architecture for visual understanding. This study proposes a Vision Transformer-based framework for detecting and classifying AI-generated visual content. The proposed system uses transformer encoders to extract global contextual representations from images and improves classification performance compared to conventional CNN approaches. Experimental evaluation demonstrates that transformer-based architectures provide superior detection capability for synthetic images generated by modern AI models. The results highlight the potential of Vision Transformers for strengthening image authenticity verification systems and combating the growing threat of synthetic visual misinformation on digital platforms.
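The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: an image is cut into patches, self-attention lets every patch weigh every other patch (the global context that CNN filters lack), and a pooled representation is scored as real or AI-generated. Everything here is an illustrative assumption rather than the authors' method: the function names, the identity query/key/value projections, the single attention head, and the `head_weights` and `head_bias` parameters of the hypothetical classification head. The image sides are assumed to be divisible by the patch size.

```python
import math


def split_into_patches(image, patch_size):
    """Cut a 2D grayscale image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention over patch tokens.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections). A trained ViT learns separate projections.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Every patch is compared with every other patch: global context.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


def score_image(image, patch_size, head_weights, head_bias):
    """Return a probability-like score from a hypothetical linear head."""
    tokens = split_into_patches(image, patch_size)
    attended, _ = self_attention(tokens)
    dim = len(attended[0])
    # Mean-pool the attended tokens into one image-level vector.
    pooled = [sum(tok[i] for tok in attended) / len(attended)
              for i in range(dim)]
    logit = sum(w * x for w, x in zip(head_weights, pooled)) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))
```

In this sketch a 4x4 image with `patch_size=2` yields four tokens of four values each, and each row of the returned attention weights sums to 1. A practical detector would instead use learned patch embeddings, positional encodings, multiple heads and stacked encoder layers trained on labeled real and synthetic images.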
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.






