intake.md — read that file too if you want the full receipts.
Step 0 — Ralphy reads your profile
On the first tool call of a session, Ralphy runs bare `ralphy` (no subcommand) and reads `~/.ralphy/user-profile.json`. The output is JSON; the part Ralphy cares about looks like this:
`band` (one of `novice`, `learning`, `intermediate`, `comfortable`, `experienced`, `expert`) controls how chatty Ralphy is. Novices get a mini-lecture after each step (“here’s why we ask about target language”); experts get one-line confirmations. The protocol itself is identical at every level; only the verbosity scales.
If `is_developer` is true, the band is overridden: minimal intake, raw CLI suggestions, and a ship-fast pace. The schema is documented in `cli/lib/user-profile.ts` and on the Memory schemas page.
If this is your first project (`signals.projects_done === 0`) and you haven’t seen the intro before, Ralphy opens with a one-paragraph “here’s the rhythm” preamble, then asks the first question. After that, the band controls verbosity, but every band runs the same five-step skeleton below.
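The profile logic above can be sketched as a small TypeScript function. This is a minimal sketch under stated assumptions, not Ralphy’s code: the `UserProfile` shape is a hypothetical subset of the schema in `cli/lib/user-profile.ts`, `seen_intro` is an invented stand-in for “you haven’t seen the intro”, and the band-to-mode table only pins the documented endpoints (novices get lectures, experts get one-liners). How the middle bands map is a guess.

```typescript
// Hypothetical subset of ~/.ralphy/user-profile.json; the real schema lives
// in cli/lib/user-profile.ts and may differ.
type Band =
  | "novice"
  | "learning"
  | "intermediate"
  | "comfortable"
  | "experienced"
  | "expert";

interface UserProfile {
  band: Band;
  is_developer: boolean;
  signals: { projects_done: number };
  seen_intro?: boolean; // hypothetical flag, not a documented field
}

// Hypothetical mode names for how much Ralphy explains.
type Mode = "lecture" | "context" | "terse" | "developer";

interface SessionPlan {
  mode: Mode;
  showPreamble: boolean;
}

// Only novice -> lecture and expert -> terse are grounded in the docs;
// the middle rows are assumptions.
function modeForBand(band: Band): Mode {
  switch (band) {
    case "novice":
    case "learning":
      return "lecture";
    case "intermediate":
    case "comfortable":
      return "context";
    case "experienced":
    case "expert":
      return "terse";
  }
}

function planSession(profile: UserProfile): SessionPlan {
  // is_developer overrides the band entirely.
  const mode = profile.is_developer ? "developer" : modeForBand(profile.band);
  // First project and intro not yet seen: open with the "rhythm" preamble.
  const showPreamble =
    profile.signals.projects_done === 0 && !profile.seen_intro;
  return { mode, showPreamble };
}
```

Whatever the mode, the five steps that follow stay the same; the mode only decides how much commentary surrounds them.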
Step 1 — The five clarifying questions
These come back as a single turn: three to five questions, each with a default you can accept. The defaults come from `intake.md` and from your `preferences.default_*` settings, if you’ve set any.
- Target audience language. EN, RU, KR, other. Drives the audio pipeline. Kling’s `--audio` flag is canonical for English; for Russian and other non-EN languages, Ralphy routes voiceover through ElevenLabs. Chat language is not the same as video language. Ralphy asks because one project lost about 10 minutes to a default-Russian assumption the user had to override.
- Aspect / platform. 9:16 TikTok, 16:9 YouTube, 1:1 broadcast-realism. Square is the right call for “caught-on-TV” trends; portrait kills the illusion (validated by the kbo broadcast postmortem).
- Brand or named real entity. If your brief names a specific person, a recognizable brand product (e.g. “Coca-Cola can”, “iPhone 16”), or a known IP (“Mickey Mouse”), the reference-required gate fires: Ralphy refuses generation until you attach a reference image, or until you opt out with `--no-ref-consent "<reason>"` on the specific failing `ralphy generate` call. Generic work proceeds without one. See References for the gate logic.
- Existing template fit. Ralphy runs `ralphy template suggest "<your brief>"` and surfaces the top match. If the top result is a strong fit, Ralphy announces the pick and proceeds; if it’s a weak match, Ralphy lists three options and asks once; if nothing’s close, Ralphy enters free-form mode. See Picking a template.
- Duration and hard constraints. Default 15s for the first iteration; scale up after a successful test render. Name any hard constraints now (“no music”, “no English captions”, banned words, brand colors) so Ralphy doesn’t volunteer something that violates them.
Step 2 — Ralphy drafts a plan
Once your answers land, Ralphy writes a plan back to chat, not to a file. The shape:
Step 3 — One beat at a time
After your “go”, Ralphy generates the first beat, surfaces it in chat, and waits for your approval before generating the next. The default cadence:
- Location master plate first. For any project where two or more scenes share a setting, Ralphy generates the room or location as anchor #1, before any character or scene anchor, then passes the plate as `--ref` on every downstream gen. Skipping this cost one project about $4.50 and 45 minutes of “they keep sitting on different couches”.
- Character / persona masters second. One per cast member, each generated with the location plate as `--ref`. Ralphy passes both (location + character) on every scene gen so identity and setting stay locked.
- Scene anchors third. `scene-01` first → you say good → `scene-02` → you say good → only after two solo approvals does Ralphy batch four to six anchors at a time.
- i2v video clips next. Same cadence as anchors. Never i2v an unapproved still.
- Voiceover and music after the visuals lock. Otherwise a re-trim cascades into a music re-sync, exactly the failure mode the playdate-pixel postmortem traced.
- Captions on the locked VO files via `ralphy generate captions`.
- Render via `ralphy editor preflight <id>`, then `ralphy render <id>`.
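The ordering and approval rules in the cadence above can be modeled as data plus a few small rules. This is an illustrative TypeScript sketch, not Ralphy internals: the stage names, `sceneRefs`, `canI2v`, and `nextBatchSize` are hypothetical, and the default batch of 4 is just one point in the documented four-to-six range.

```typescript
// Hypothetical stage names for the Step 3 cadence, in generation order.
const STAGES = [
  "location-plate",
  "character-masters",
  "scene-anchors",
  "i2v-clips",
  "voiceover-music",
  "captions",
  "render",
] as const;
type Stage = (typeof STAGES)[number];

// The location plate is anchor #1 only when 2+ scenes share a setting.
function stagesFor(scenesSharingSetting: number): Stage[] {
  return STAGES.filter(
    (stage) => stage !== "location-plate" || scenesSharingSetting >= 2,
  );
}

// Character masters get the plate as a ref; scene gens get plate + cast.
function characterRefs(plate: string | null): string[] {
  return plate ? [plate] : [];
}

function sceneRefs(plate: string | null, characters: string[]): string[] {
  return [...characterRefs(plate), ...characters];
}

// i2v only ever runs on an approved still.
function canI2v(still: { approved: boolean }): boolean {
  return still.approved;
}

// One anchor at a time until two solo approvals, then batches clamped to
// the 4..6 range, never more than what is left.
function nextBatchSize(
  soloApprovals: number,
  remaining: number,
  batch = 4,
): number {
  if (remaining <= 0) return 0;
  if (soloApprovals < 2) return 1;
  const clamped = Math.min(Math.max(batch, 4), 6);
  return Math.min(clamped, remaining);
}
```

Each `sceneRefs` entry would become one `--ref` on the scene gen; whether the real CLI takes repeated `--ref` flags or a list is not stated here, so treat that mapping as an assumption.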
Step 4 — Mid-flight corrections
When a scene misses, Ralphy retries the same approach once. If the second attempt also misses, Ralphy redesigns the scene rather than fighting model drift. The glitter-cream postmortem documented a $0.84, 20-minute fight between “jar near cheek” and “powder compact” that ended only when the scene was reframed. The redesign comes back to you for approval before any new generation. Old versions are preserved automatically (see Reviewing and iterating), and Ralphy never overwrites your work without explicit “delete” or “wipe” consent (AGENTS invariant #13).
Step 5 — The ship gate
Before Ralphy declares “done”, it runs three checks in order:
- `ralphy editor preflight <id>`: flags aspect, fps, or music-length divergence.
- `ralphy project verify <id>`: flags any drift between the manifest and the files on disk.
- The `/evaluator` skill on the final mp4: emits `eval.json` and `eval-report.md` with scene-by-scene scoring and a retention check.
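A sketch of the ship gate as an ordered list of checks. Assumptions: the gate stops at the first failing check (the docs only say the checks run in order), the command runner `run` is a hypothetical injected function rather than a real process spawn, and `/evaluator` is modeled as a plain callback because it is a skill, not a CLI subcommand. The two argv arrays mirror the commands listed above.

```typescript
interface CheckResult {
  name: string;
  ok: boolean;
}

type Check = (projectId: string) => CheckResult;

interface GateOutcome {
  done: boolean;
  results: CheckResult[];
}

// Run checks in order. Assumption: stop at the first failure so "done" is
// only declared when every check has passed.
function shipGate(projectId: string, checks: Check[]): GateOutcome {
  const results: CheckResult[] = [];
  for (const check of checks) {
    const result = check(projectId);
    results.push(result);
    if (!result.ok) return { done: false, results };
  }
  return { done: true, results };
}

// The two CLI checks from the docs. `run` is hypothetical: it should return
// true when the command exits cleanly.
function cliChecks(run: (argv: string[]) => boolean): Check[] {
  return [
    (id) => ({ name: "preflight", ok: run(["ralphy", "editor", "preflight", id]) }),
    (id) => ({ name: "verify", ok: run(["ralphy", "project", "verify", id]) }),
  ];
}
```

In a real script, `run` would wrap a process spawner and the evaluator callback would read `eval.json`; both are left abstract here because their exact contracts aren’t documented on this page.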
Adapting to your skill band
The protocol is the same at every level. What changes is how much Ralphy explains. A novice sees mini-lectures after each gate (“WHY we anchor location first”); a comfortable user sees the gate and one-line context; an expert sees only the JSON output and a “go?” prompt. The recommendation string in your `whoami` output names which mode is active.
You can always override this per project: “explain it like I’m new” upshifts the verbosity for one session, and “skip the explanation” downshifts it.
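The per-session override can be pictured as a function that takes the band’s default mode and the user’s message. A hypothetical sketch: the level names are invented, and mapping “explain it like I’m new” straight to the most verbose level (rather than one step up) is an assumption.

```typescript
// Hypothetical verbosity levels, from most to least explanation.
type Level = "lecture" | "context" | "terse";

// Returns the level for the current session only; the stored band is not
// modified.
function applyOverride(base: Level, message: string): Level {
  // Normalize curly apostrophes so "I’m" and "I'm" both match.
  const text = message.toLowerCase().replace(/\u2019/g, "'");
  if (text.includes("explain it like i'm new")) return "lecture";
  if (text.includes("skip the explanation")) return "terse";
  return base;
}
```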
Related
- How to talk to Ralphy — phrasing that maps cleanly to playbooks
- Picking a template — what step 1.4 picks from
- Brands, personas, refs — the reference-required gate in detail
- `intake.md` — the canonical playbook this page paraphrases
- Memory schemas — `user-profile.json` field by field