CraftStory, a leader in artificial intelligence-generated video content, unveiled its first image-to-video model today, enabling users to create videos up to five minutes long. The new model complements the company’s existing video-to-video model, Model 2.0, which launched in November 2025.
As the demand for video as a primary communication medium grows, the image-to-video functionality is poised to enhance applications in marketing, advertising, business communications, and educational content. It allows teams to produce consistent “on-camera” performances without the need for traditional video production methods.
Current video generation technologies often falter in producing coherent footage longer than 30 seconds. Users typically combine short clips to form extended narratives, but the inherent variations during generation can lead to mismatches in faces, outfits, lighting, and motion. While advanced AI workflows can extend video lengths beyond two minutes, longer sequences frequently devolve into disjointed visual chaos.
CraftStory’s approach to long-form video production relies on a proprietary parallelized diffusion pipeline that processes segments concurrently. The company says this technique keeps clips coherent with one another, so faces, outfits, and lighting stay visually consistent throughout the footage. “Image-to-video is a major step toward fully script-driven video creation,” stated founder and CEO Victor Erukhimov, who previously sold his computer vision company, Itseez Inc., to Intel Corp. “You no longer need to record a video to get a realistic human performance.”
According to Erukhimov, users can now start with just a single image to generate believable human representations in long-form videos, complete with natural gestures and expressiveness that align with the narrative. The model was trained on high-frame rate footage of real actors, capturing the nuances of facial expressions, hand movements, and body language. CraftStory claims this results in high-fidelity production of human “actors” that appear fluid and lifelike rather than static or robotic.
The generated videos are available in both portrait and landscape formats at resolutions of 480p and 720p, with the option to upscale to 1080p for enhanced quality. Additionally, CraftStory has introduced support for dynamic camera movements, allowing for walk-and-talk videos up to 80 seconds in duration with realistic motion throughout the scene.
Users can create videos using a single image alongside a script or audio track. The system generates a scene that adheres to the script while AI actors lip-sync to the dialogue. Built-in gesture alignment aims to maintain natural body movements and synchronize with the emotional tone of the speech.
As companies increasingly integrate video into their communication strategies, CraftStory’s image-to-video model marks a notable advance in AI-driven content creation. By generating longer, coherent videos from minimal input, it opens new avenues for storytelling and engagement across industries.