Researchers have discovered that AI image generation models, despite having access to extensive visual datasets, often resort to a limited range of styles when tasked with producing images from a series of evolving prompts. A study published in the journal Patterns reveals that both the Stable Diffusion XL and LLaVA models default to just a handful of motifs, leading to generic outputs over time.
The study employed a method akin to a game of visual telephone, where the Stable Diffusion XL model generated an initial image based on a detailed prompt. For example, one prompt described a scene where an individual finds an old book while alone in nature. This image was then described by LLaVA, and that description was used to create a new image, continuing this process for 100 cycles. The outcome highlighted how the original visual details were often lost, resembling the way messages transform in a human game of telephone.
Surprisingly, in analyzing 1,000 iterations of the game, researchers noted that most sequences converged into one of just 12 dominant visual motifs. Though shifts between styles were typically gradual, there were instances of abrupt changes. The researchers characterized these styles as “visual elevator music,” akin to generic artwork found in hotel rooms, often comprising scenes like maritime lighthouses, formal interiors, and rustic architecture.
The findings persisted even when different models were used for image generation and description. While the primary coalescence around a particular style often occurred around the 100th iteration, variations emerged in the subsequent rounds; however, these variations still predominantly drew from popular motifs.
The implications of this study suggest that AI technology lacks the creative variability often seen in human communication. In human interpretations, each person introduces their biases and unique perspectives, resulting in diverse outcomes. Conversely, AI models appear to fall back on a narrow selection of familiar styles, regardless of how imaginative the original prompt may have been.
This phenomenon raises questions about the nature of creativity in AI. The models rely heavily on human-generated prompts, indicating that the limitations may lie in the datasets themselves and the inherent tendencies of human creativity. The study suggests that while AI can mimic styles effectively, teaching it a sense of taste remains a significant challenge.
See also
Multimodal AI Transforms Enterprise Efficiency, Enhancing Customer Service and Risk Management
YouTube Bans Two High-Profile Channels for Misleading AI-Generated Movie Trailers
AI Image Generators Limit Creativity, Defaulting to 12 Common Styles, Study Reveals
OpenAI Launches GPT-5 with Advanced Reasoning and Autonomous Agent Capabilities
AWS Launches Bedrock AgentCore for Custom AI Agents, Enhancing Business Automation


















































