The Last Leaf

I created this short narrative film entirely in Runway.ml as a personal case study in storytelling with generative AI.

This project explores a core limitation of the medium: consistency. How do you create a character, an environment, and a sequence of shots that feel like they belong to the same world, when the system naturally wants to reinterpret everything each time?

Very quickly, it became clear that generating individual “good shots” isn’t the challenge. The challenge is maintaining coherence across shots: keeping a character’s proportions stable, preserving the environment layout, and ensuring actions connect cleanly from one moment to the next.

Designing for Consistency First

The biggest shift in approach was treating the process less like shot-making and more like pre-production.

Instead of generating shots immediately, I built a consistent visual foundation first:

  • A defined character (the koala) with stable proportions, with and without gear

  • A controlled environment (the island) with a fixed layout and recognizable landmarks

  • A small set of props (spear gun, crate, goggles, etc.) reused across shots

By creating these elements up front, essentially building a character sheet and environment references, I was able to guide the model toward consistency instead of letting it reinvent the scene every time.

Without this step, each shot becomes a reinterpretation. With it, the system starts to behave more like a production pipeline.

Prompting as Direction, Not Description

Another key realization was that prompting isn’t really about describing what an image should look like; it’s about constraining change.

The most effective prompts weren’t the most detailed ones, but the ones that clearly defined:

  • what must stay fixed (camera, layout, character scale)

  • what is allowed to change (motion, pose, expression)

In practice, this meant treating each frame like a locked plate and only introducing controlled variation. Small changes, like moving a hand or shifting posture, had to be explicitly isolated; otherwise the entire scene would drift.
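
For example, a hypothetical prompt in this style (the shot itself is invented for illustration, not taken from the film) might read:

    Locked camera, same framing as the reference frame. The island layout,
    lighting, and the koala’s scale stay exactly as they are. The only
    change: the koala slowly raises the spear gun. No other motion.

Most of the prompt budget goes to what must not change; the action itself is a single, isolated instruction.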

Key Frames as the Foundation

A critical part of this workflow was starting from a strong key frame.

Instead of trying to generate motion directly, I focused on creating a single, stable frame where the character, environment, and composition were exactly right. This meant using the character and prop sheets as references to lock in proportions, styling, and layout before attempting any movement.

From that point, prompts were used to guide small, controlled changes—camera movement, subtle actions, or performance—while preserving the integrity of the original frame.

If the starting frame wasn’t solid, everything that followed would drift. But with a strong key frame, the system became far more predictable and controllable.

The quality of the final result was directly tied to the quality of that initial frame.

One important detail was ensuring that the character and all relevant props were fully visible within the key frame.

If an element wasn’t clearly visible in the starting frame, the system would often reinterpret it when it entered later: changing proportions, design, or scale. By contrast, when everything was established up front, the model had a clear reference to maintain.

In practice, this meant introducing elements while they were fully visible, then allowing them to move or exit the frame. Starting from absence and expecting consistency later was far less reliable.
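
I made this film entirely in Runway’s web app, but the same key-frame-first idea can be scripted against Runway’s developer API. Below is a minimal sketch, assuming the official runwayml Python SDK and its image_to_video endpoint; the key frame URL, prompt, and parameters are illustrative, not the ones used for this film:

    # Minimal sketch: animate an approved key frame with a constrained prompt.
    # Assumes `pip install runwayml` and a RUNWAYML_API_SECRET environment variable.
    import time

    from runwayml import RunwayML

    client = RunwayML()

    # The key frame already shows the character and every prop fully in frame.
    task = client.image_to_video.create(
        model='gen3a_turbo',
        prompt_image='https://example.com/keyframe-koala-island.png',  # hypothetical URL
        prompt_text=(
            'Locked camera, preserve the island layout and character scale. '
            'The koala slowly raises the spear gun. No other changes.'
        ),
        ratio='1280:768',
        duration=5,
    )

    # Poll until the generation finishes, then read the output URL(s).
    while True:
        task = client.tasks.retrieve(task.id)
        if task.status in ('SUCCEEDED', 'FAILED'):
            break
        time.sleep(10)

    if task.status == 'SUCCEEDED':
        print(task.output)  # list of generated video URLs

The prompt follows the same rule as before: state what stays fixed first, then the single allowed change.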

Sound Design

AI-generated audio is still difficult to rely on.

While it’s possible to prompt for sound effects, getting the visuals and the audio right in the same generation is unlikely. The results are inconsistent, and syncing timing and tone to the image is challenging.

For this project, I worked with my coworker and sound designer Steve Orlando to create the final audio.

Some AI-generated sound effects were used as a starting point, but they weren’t reliable enough to carry the film.

It was much more effective to focus on getting the visuals right first, then build the sound afterward.

A Shift in Access

The introduction of generative AI feels similar to when DSLR video first became viable with cameras like the Canon EOS 5D Mark II.

At the time, it made cinematic storytelling accessible in a way it hadn’t been before. Suddenly, individuals had access to tools that were previously limited to large productions.

This feels like a similar shift, but on a much larger scale.

The barrier to creating high-quality visuals continues to drop. Tools that once required entire teams for visual effects, animation, and compositing are becoming accessible to individuals.

Takeaways

This project was not just about making a short film. It was about understanding how to work with generative systems instead of against them.

A few key takeaways:

  • Consistency is solved by better setup, not by better prompts

  • Pre-building characters, props, and environments is essential

  • Prompting is about control, not detail

  • Strong key frames determine the outcome

  • Cuts are often more reliable than motion

  • Iteration is the real advantage, not automation

  • AI-generated audio is nearly impossible to get right, so don’t count on it. As noted above, sound designer Steve Orlando created the audio for this film, using some of the AI-generated SFX as starting points, but they weren’t reliable on their own

This entire film was created by a single person in roughly 25 hours of working time.

Traditionally, a project like this would require a small team of designers, animators, and lighting artists, and would take weeks or months to produce. Generative AI compresses that process dramatically by accelerating iteration and decision-making, not by replacing craft.

That said, the process is still far from perfect. There were multiple shots that proved difficult or impossible to achieve exactly as intended, despite extensive prompting and iteration. Consistency, motion control, and precise performance remain real challenges.

But that is also what makes this moment interesting.

The limitations are visible, and so is the trajectory.

Generative AI today is the least capable it will ever be. The gap between intent and execution is already narrow enough to produce cohesive work, and it is closing quickly.
