final.mp4 on disk. Target time: 5-8 minutes cold-start, per the perf targets. This page assumes you’ve already cloned ugc-cli and opened it in Claude Code — if not, run through Install first.
Open the project in Claude Code
Open Claude Code from the directory you cloned into. In chat, confirm the agent sees the routing table:
are you reading AGENTS.md?
The agent should confirm and list the playbooks it routes to (intake, researcher, scenarist, art-director, editor, producer). If it doesn’t, you’re probably in the wrong directory: pwd should end in /ugc-cli. On Cursor / Copilot / Codex, run ralphy skill install once; see Connect your editor.
Type your brief in chat
Drop a one-liner into chat. Be specific about platform, vibe, and POV — the agent uses this to pick a template and route the generation pipeline.
make a 15-second TikTok about my espresso bar, morning vibe, selfie POV
The agent matches the brief against the intake playbook and starts the protocol.
Answer the intake questions
The agent will ask 3-5 clarifying questions in a single turn. Expect something like:
- Target audience language? EN / RU / other. Drives audio pipeline choice (Kling --audio for EN, ElevenLabs for non-EN).
- Aspect? 9:16 TikTok (default), 16:9 YouTube, or 1:1.
- Brand / named entity? Anything that names a real person, brand, or IP triggers the reference-required gate. For a generic “my espresso bar”, you’re fine without refs.
- Duration / clip count? 15s is the safe default for a first render; the agent confirms.
- Hard constraints? Banned music, brand colors, must-have shots.
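The routing rules implied by these questions can be sketched in a few lines of Python. This is an illustrative sketch only, not Ralphy's actual code: the class name, field names, and the returned pipeline labels are all hypothetical. Only the rules themselves (EN goes to Kling audio, non-EN goes to ElevenLabs, 9:16 is the default aspect, a named entity requires references) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class IntakeAnswers:
    """Hypothetical container for the intake answers (names are assumptions)."""
    language: str = "EN"             # target audience language
    aspect: str = "9:16"             # 9:16 TikTok is the documented default
    names_real_entity: bool = False  # real person, brand, or IP in the brief?
    has_references: bool = False     # did the user supply reference material?
    duration_s: int = 15             # safe default for a first render


def choose_audio_pipeline(language: str) -> str:
    """EN uses Kling's audio option; every other language uses ElevenLabs."""
    return "kling-audio" if language.strip().upper() == "EN" else "elevenlabs"


def needs_references(answers: IntakeAnswers) -> bool:
    """Reference-required gate: a named entity without refs blocks the run."""
    return answers.names_real_entity and not answers.has_references
```

For the generic espresso-bar brief, `needs_references(IntakeAnswers())` is false, which matches the note that you are fine without refs.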
Agent picks a template
Before improvising, the agent looks for a matching template. It surfaces the top-3 matches with one-line descriptions, then proposes one. For an espresso brief the agent will likely pick a creator-lifestyle vibe-reference template. Confirm with “go” or ask for a different angle.
The template encodes the postmortem-validated workflow for that vibe: scene count, model picks, prompt vocabulary. You get a head start instead of starting from blank.
Agent generates scenes one beat at a time
The agent creates the project (espresso-001) and starts generating scene by scene. For each scene it:
- Writes the prompt to workspace/projects/espresso-001/prompts.json.
- Calls ralphy generate image --scene scene-01 ... to make the background plate.
- Shows you the image, asks for OK or a variant.
- Calls ralphy generate video --scene scene-01 ... to animate it.
- Calls ralphy generate voiceover --scene scene-01 ... for the VO line.
Generated assets land in workspace/projects/espresso-001/assets/. Every model call writes an entry to logs/generations.jsonl with the input, output, and cost.
You can reject a scene at any time. Ask for a variant or say “make scene-02 brighter” and the agent regenerates. The old version is preserved on disk as .scene-02-bg-image.v1.png; per AGENTS.md invariant #13, Ralphy never overwrites a generation.
Agent reviews the full sequence with you
After all scenes pass, the agent shows you the asset manifest and asks for a “go” before rendering. This is your last chance to swap a model, regenerate a shot, or change a VO line; once you say render, ffmpeg kicks in.
Render
Say “render” in chat and the agent starts the render. HyperFrames rasterizes the composition headlessly, ffmpeg encodes the final mp4, and final.mp4 lands in workspace/projects/espresso-001/render/. The agent reports back with the file path, duration, file size, and total spend pulled from generations.jsonl.
Total time, cold start. Per the perf targets, a single 15-second video from brief to mp4 should land in ≤ 8 minutes. Most of that is model latency (Kling video gen is ~30s/scene). If you blow past 12 minutes, ask the agent for a postmortem; something’s off.
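To check the reported spend yourself, a rollup over generations.jsonl might look like the sketch below. The page only says that each entry records the input, output, and cost, so the field names `model` and `cost` used here are assumptions; adjust them to whatever your log actually contains.

```python
import json
from collections import defaultdict


def rollup_lines(lines):
    """Sum cost per model over JSON Lines records; returns (per_model, total).

    Assumes each record has "model" and "cost" keys (hypothetical names).
    """
    per_model = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        per_model[entry.get("model", "unknown")] += float(entry.get("cost", 0.0))
    return dict(per_model), sum(per_model.values())


def spend_rollup(log_path):
    """Read a generations.jsonl-style file and roll up its costs."""
    with open(log_path, encoding="utf-8") as f:
        return rollup_lines(f)
```

Splitting the parsing from the file read keeps the rollup easy to test against a handful of in-memory lines before pointing it at a real log.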
When things go sideways
- A scene looks wrong. Tell the agent “regenerate scene-02 with X different”. The old file stays on disk as .v1; you can always promote it back.
- Cost is climbing fast. Ask “show me the spend so far”. The agent reads generations.jsonl and gives you a per-model rollup.
- The render fails. Run ralphy doctor; ffmpeg or bun usually went missing. The error message points at the fix.
- You hate the template. Run ralphy template list (or ask the agent), pick another, restart with ralphy template use <slug>.
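The never-overwrite behavior behind the first bullet can be pictured with a small sketch. This is not Ralphy's implementation: the function names are hypothetical, and the naming rule (a leading dot plus a .vN suffix before the extension) is inferred from the single example .scene-02-bg-image.v1.png, assuming the live file is scene-02-bg-image.png.

```python
from pathlib import Path
from typing import Optional


def versioned_name(filename: str, n: int) -> str:
    """Build the hidden backup name: scene.png becomes .scene.v<n>.png."""
    p = Path(filename)
    return f".{p.stem}.v{n}{p.suffix}"


def preserve_then_replace(asset: Path, new_bytes: bytes) -> Optional[Path]:
    """Move the current file aside under the next free version, then write.

    Returns the backup path, or None if there was nothing to preserve.
    """
    backup = None
    if asset.exists():
        n = 1
        # Find the first version number that is not taken yet.
        while (asset.parent / versioned_name(asset.name, n)).exists():
            n += 1
        backup = asset.parent / versioned_name(asset.name, n)
        asset.rename(backup)
    asset.write_bytes(new_bytes)
    return backup
```

Under this scheme, promoting an old version back amounts to running the same move in reverse: preserve the current file, then copy the chosen backup into place.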
Next
You have an mp4. Now read What just happened to understand which files Ralphy wrote and why; that’s the foundation for everything you’ll do next.
Related
- What just happened — file-by-file walkthrough
- Talking to ralphy — phrasing for daily use
- Starting a project — the intake protocol in depth
- Intake playbook — canonical source