Post · 4 min read · 2026-05-15

How AI Turns a Sentence Into a 1080p MP4

Tweenr team · Writer, HK

A behind-the-scenes look at how AI motion-graphics tools turn a prompt into a video — prompt → structured plan → React/JSX code → 900 PNG frames → MP4. Not magic; just a four-step pipeline.

You type "create a credit approval popup, dark mode, with a spring animation" and a minute or two later Tweenr hands you a 1080p MP4. What happens in between?

It looks like magic but it's really a four-step pipeline. Here's the breakdown, with the official Remotion docs linked at the end if you want to go deeper.

Step 1: Sentence → Plan

An LLM (we use Claude Sonnet / Haiku) reads your sentence — but it doesn't produce video. First it answers a series of design questions:

  • What's this scene's purpose? (Informational alert? Brand reveal? Emotional moment?)
  • What elements does it need? (Icons, type, background, animated graphics)
  • What's the rhythm? (Snap / smooth / epic — three preset tiers)
  • Which visual archetype fits? (Tweenr ships 150 brand-flavoured archetypes — Stripe, Apple, Linear, etc.)

The output isn't a video — it's a structured plan: scene layout, type hierarchy, colour palette, the frame numbers at which each element enters and exits. Anthropic Claude's structured-output mode helps here: it lets you force the LLM to return strict JSON instead of free text.
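To make "structured plan" concrete, here is a minimal sketch of what such a plan and a sanity check on it might look like. The field names (purpose, rhythm, archetype, enterFrame and so on) are assumptions chosen for illustration, not Tweenr's actual schema, and validatePlan is a hypothetical helper.

```typescript
// Hypothetical plan shape: every field name here is an assumption for illustration.
type Rhythm = "snap" | "smooth" | "epic";

interface PlanElement {
  id: string;
  kind: "icon" | "text" | "background" | "graphic";
  enterFrame: number; // first frame on which the element is visible
  exitFrame: number; // first frame on which it is no longer visible
}

interface ScenePlan {
  purpose: string;
  rhythm: Rhythm;
  archetype: string;
  palette: string[];
  durationInFrames: number;
  elements: PlanElement[];
}

// Parse the model's JSON reply and return a list of problems (empty means usable).
function validatePlan(raw: string): string[] {
  let plan: ScenePlan;
  try {
    plan = JSON.parse(raw) as ScenePlan;
  } catch {
    return ["reply is not valid JSON"];
  }
  if (typeof plan !== "object" || plan === null) {
    return ["reply is not a JSON object"];
  }
  const problems: string[] = [];
  if (["snap", "smooth", "epic"].indexOf(plan.rhythm) === -1) {
    problems.push("unknown rhythm: " + String(plan.rhythm));
  }
  if (!Array.isArray(plan.elements)) {
    problems.push("elements must be an array");
    return problems;
  }
  for (const el of plan.elements) {
    const inRange = el.enterFrame >= 0 && el.exitFrame <= plan.durationInFrames;
    if (!inRange || el.enterFrame >= el.exitFrame) {
      problems.push("element " + el.id + " has an invalid frame range");
    }
  }
  return problems;
}
```

A check like this is cheap to run before any code is generated: a plan whose elements exit after the scene ends can be rejected or sent back to the model instead of producing a broken render.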

Step 2: Plan → Code

With the plan in hand, the LLM writes actual React/JSX code — not template fill-ins. The code uses Remotion's animation primitives, e.g.:

const opacity = interpolate(frame, [0, 30], [0, 1])

That single line means: from frame 0 to frame 30, opacity goes from 0 to 1 — a 1-second fade-in (at 30fps). Remotion's interpolate() helper is the heart of frame-driven animation, and LLMs are now fluent in writing this kind of expression.
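For intuition, the arithmetic behind that line can be sketched in plain TypeScript. The lerpFrame function below is a simplified, hypothetical stand-in for a two-point interpolate() call, not Remotion's implementation, and its clamp parameter is an assumption made for this sketch.

```typescript
// Simplified stand-in for a two-point interpolate(); not Remotion's implementation.
function lerpFrame(
  frame: number,
  [inStart, inEnd]: [number, number],
  [outStart, outEnd]: [number, number],
  clamp = false,
): number {
  // How far the current frame is through the input range (0 = start, 1 = end).
  const progress = (frame - inStart) / (inEnd - inStart);
  // Optionally pin the progress to [0, 1] so the output never overshoots.
  const t = clamp ? Math.min(1, Math.max(0, progress)) : progress;
  return outStart + t * (outEnd - outStart);
}

const fadeSeconds = 30 / 30; // 30 frames at 30 fps = 1 second
const halfway = lerpFrame(15, [0, 30], [0, 1]); // 0.5
const overshoot = lerpFrame(45, [0, 30], [0, 1]); // 1.5 without clamping
const clamped = lerpFrame(45, [0, 30], [0, 1], true); // 1
```

The overshoot case matters in practice: Remotion's interpolate() also extrapolates past the input range by default, so a fade-in usually passes extrapolateRight: "clamp" in its options to hold opacity at 1 after frame 30.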

[Image: code editor showing TypeScript animation code]
Each scene is its own React component. The LLM writes real TypeScript using primitives like interpolate / spring / Sequence — all provided by Remotion. Photo: Chris Ried / Unsplash.

Why code instead of templates? Templates mean "drop your text in, get back the same animation" — 1,000 users get 1,000 identical videos. Code means every render is a fresh composition: new scene structure, new timing, new layout. Infinite variation.

Step 3: Code → Frames

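Once the code is written, it is transpiled just-in-time with @babel/standalone, and Remotion renders your React component server-side on AWS Lambda. The component is evaluated once for every frame number, 30 frames for each second of video, and each result is captured as an image. Rendering is not real-time: the "30" describes the video's frame rate, not how fast the renderer runs.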

A 30-second ad × 30 fps = 900 PNG frames.
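The frame arithmetic can be written down as a small sketch. The frame-NNN.png naming scheme is a hypothetical choice for illustration; the renderer's real file names may differ.

```typescript
// Number of frames a clip needs at a given frame rate.
function frameCount(durationSeconds: number, fps: number): number {
  return Math.round(durationSeconds * fps);
}

// Hypothetical naming scheme: zero-padded so files sort in playback order.
function frameFileName(index: number, total: number): string {
  const width = String(total - 1).length; // digits needed for the last index
  let padded = String(index);
  while (padded.length < width) {
    padded = "0" + padded;
  }
  return "frame-" + padded + ".png";
}

const total = frameCount(30, 30); // 900 frames, numbered 0 to 899
const first = frameFileName(0, total); // "frame-000.png"
const last = frameFileName(total - 1, total); // "frame-899.png"
```

Frames are numbered from 0, so a 900-frame clip ends at frame 899, which matches the "frame numbers 0–899" input to the render step.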

| Step | Input | Output | Typical time (30s scene) |
| --- | --- | --- | --- |
| 1. Plan | User prompt | Structured scene plan (JSON) | 3–8s |
| 2. Code | Scene plan | ~200 lines of React/JSX | 10–30s |
| 3. Render | JSX + frame numbers 0–899 | 900 PNG frames | 40–90s (Lambda) |
| 4. Encode | 900 PNG frames | 1 MP4 file | 10–20s |

Step 4: Frames → MP4

The 900 PNGs are fed into FFmpeg, encoded with the H.264 codec, and written out as a 1080p MP4. End-to-end, the whole pipeline takes 1–2 minutes on a Hong Kong connection.
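Remotion drives FFmpeg for you, but it can help to see roughly what an equivalent manual encode would look like. The sketch below only builds the argument list. The file pattern and output name are hypothetical, and the flags are one reasonable choice rather than the exact ones the pipeline uses.

```typescript
// Build the argument list for a manual FFmpeg encode of a PNG sequence.
// The file pattern and output name passed in are hypothetical examples.
function ffmpegArgs(fps: number, framePattern: string, output: string): string[] {
  return [
    "-framerate", String(fps), // frame rate at which to read the image sequence
    "-i", framePattern, // printf-style pattern, e.g. frame-%03d.png
    "-c:v", "libx264", // encode the video stream as H.264
    "-pix_fmt", "yuv420p", // pixel format that most players can decode
    output,
  ];
}

const args = ffmpegArgs(30, "frame-%03d.png", "out.mp4");
const command = "ffmpeg " + args.join(" ");
```

Keeping the arguments as an array rather than one string makes them safe to hand to a process-spawning API without shell quoting problems.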

The hard part isn't any of this

The four-step pipeline (LLM writes code, Remotion renders, FFmpeg encodes) is now commodity infrastructure. Any developer with the same stack can replicate it. Remotion even publishes an official LLM system prompt, so the code-generation step is close to standardised as well.

The hard part isn't generating animation — it's judging whether the animation is any good. An AI can write 100 different fade-in variations, but it can't tell you "does this fade-in fit this brand", "does this rhythm stop the scroll", "is this layout over-designed".

That's why we built a library of 150 brand-flavoured style archetypes — the LLM can write any code, but it needs design taste as a guide. More on that in the next post.

Sources

  1. Generate Remotion Code using LLMs — Remotion official docs.
  2. interpolate() helper — Remotion API reference.
  3. Just-in-time compilation of Remotion code — Remotion docs.
  4. Animating properties — Remotion docs.
