ralphy generate is the single CLI gate for every model call. Image, video, voiceover, music, sfx, captions: all six sub-verbs land here, all log to the same JSONL, all update the same asset-manifest.json, and all version automatically on regen. Most of the time the agent drives these on your behalf during the one-beat-at-a-time intake loop. You’ll call them by hand when you want to regenerate a single slot, sweep variants, or preview cost before firing.
The verb’s whole job is to be the choke point. No raw curl, no bunx tsx against a media API, no ffmpeg shells — every recipe lives behind a ralphy verb so the gen-log, the manifest, and the cost rollup all stay honest. That’s AGENTS invariant #2.
The six sub-verbs at a glance
| Sub-verb | Default model | Output | Per-call cost (typical) |
|---|---|---|---|
| generate image | google/gemini-3-pro-image-preview (multi-ref / character consistency); openai/gpt-5.4-image-2 for premium typography | PNG, slot-named | ~$0.04 |
| generate video | kwaivgi/kling-v3.0-pro | MP4, slot-named | ~$2.40 |
| generate voiceover | elevenlabs/eleven_multilingual_v2 | MP3, slot-named | ~$0.30 per 1k chars |
| generate music | ElevenLabs Music (instrumental default) | MP3, slot-named | ~$0.005/sec |
| generate sfx | ElevenLabs Sound Generation (≤22s) | MP3, slot-named | flat per call |
| generate captions | ElevenLabs Scribe v1 (word-level) | JSON Caption[], slot-named | ~$0.005 per minute |
Defaults are taken from cli/commands/generate.ts as of v0.3.0. Always cross-check MODELS.md before assuming: Claude’s training is stale, and Ralphy reads MODELS.md before every call for a reason.
When the agent drives, when you drive
Most generation happens implicitly during the intake loop. Ralphy generates the location-master-plate, the persona masters, then scene anchors one at a time, surfacing each to you for approval. You don’t type the commands; you say “go” and Ralphy fires them. The CLI invocation is identical either way; what changes is who’s at the keyboard. You’ll run ralphy generate yourself in these cases:
- Regenerate one slot after a miss: “scene-03 looks off, try seedance”.
- Sweep variants: --variants 4 on an image to compare A/B/C/D in parallel.
- Preview cost before firing a long video: --dry-run returns the resolved request and the cost estimate without spending money.
- Model swap: pass --model <id> to override the default for a single call.
- Recovery mid-batch: one slot of a batch failed; rerun just that slot.
Slot IDs and the manifest
Every generated file lands in a slot, and the slot id is the file name’s prefix. The convention is {scene-id}-{type}-{descriptor}.
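To make the naming convention concrete, here is a minimal sketch of how a slot id and its file name could be assembled. The helper names (buildSlotId, slotFileName), the kebab-case check, and the example values are hypothetical illustrations, not part of Ralphy's actual code.

```typescript
// Hypothetical helper: joins the three parts of a slot id in the
// {scene-id}-{type}-{descriptor} order described above.
function buildSlotId(sceneId: string, type: string, descriptor: string): string {
  const parts = [sceneId, type, descriptor];
  for (const part of parts) {
    // Assumption: each part is a lowercase kebab-case token.
    // Ralphy's real validation rules may differ.
    if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(part)) {
      throw new Error(`invalid slot id part: "${part}"`);
    }
  }
  return parts.join("-");
}

// The slot id is the file name's prefix; the extension is appended as-is.
function slotFileName(slotId: string, ext: string): string {
  return `${slotId}.${ext}`;
}

// Example with made-up values:
const exampleSlot = buildSlotId("scene-03", "image", "anchor");
console.log(slotFileName(exampleSlot, "png"));
```

Because a scene id can itself contain hyphens, this sketch only builds ids; splitting an id back into its parts would need more information than the convention alone provides.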
asset-manifest.json at the project root tracks every slot. To inspect it, run ralphy project show <id> --assets. Detail in Asset manifest.
Versioning on regen
When you regenerate a slot that already exists, the new file lands at .v2.<ext>, then .v3, .v4, and so on. The existing file is preserved unchanged. The manifest tracks both; only “promoting” a chosen variant on your explicit say-so flips the manifest pointer to a new winner.
The --force-overwrite flag bypasses versioning and writes in place; you almost never want this. Detail in Reviewing and iterating.
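The versioning rule can be sketched as a small function that picks the next file name for a slot. This is an illustration only: nextVersionedName is a hypothetical name, and it assumes the full pattern is <slot>.<ext> for the first file and <slot>.vN.<ext> for regenerations, which is an inference from the suffixes described above.

```typescript
// Hypothetical sketch of the regen naming rule, not Ralphy's implementation.
// `existing` is the list of file names already present for the project.
function nextVersionedName(slot: string, ext: string, existing: string[]): string {
  const base = `${slot}.${ext}`;
  // First generation: nothing to preserve, so the unversioned name is used.
  if (!existing.includes(base)) return base;

  // The unversioned file counts as version 1; look for higher .vN suffixes.
  let highest = 1;
  const prefix = `${slot}.v`;
  const suffix = `.${ext}`;
  for (const name of existing) {
    if (name.startsWith(prefix) && name.endsWith(suffix)) {
      const n = Number(name.slice(prefix.length, name.length - suffix.length));
      if (Number.isInteger(n) && n > highest) highest = n;
    }
  }
  // Earlier files are never overwritten; the new file gets the next number.
  return `${slot}.v${highest + 1}.${ext}`;
}
```

Called with a slot that already has its base file and a .v2, the sketch returns the .v3 name and leaves both earlier files untouched, which matches the preserve-by-default behavior.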
Image — generate image
The bread-and-butter still gen.
- Model (--model): OpenRouter model id. Default google/gemini-3-pro-image-preview (multi-ref / character consistency, nano-banana-pro lineage, ~$0.20/image, caps at 1 concurrent).
- Reference images (--ref): Reference image(s). URL, local path, or data: URI. Local paths auto-convert to data: URI. Repeat the flag to pass multiple refs.
- Size: Size hint, default 1080x1920. Passed as prompt-level guidance, because gemini and gpt image models don’t accept exact pixel dimensions and round to their natural sizes.
- Variants (--variants): Generate N parallel variants. Writes <slot>-v1.png through <slot>-vN.png. Capped at 8.
- Negative prompt: What the image should not contain.
- Dry run (--dry-run): Print the resolved request and the cost estimate; do not submit. Always free.
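As a sketch of the variant naming described above, the function below lists the files a variant sweep would write. The function name is hypothetical, and throwing a RangeError past the cap is an assumption; the real CLI may clamp or report the limit differently.

```typescript
// Cap stated in the docs above: at most 8 variants per call.
const MAX_VARIANTS = 8;

// Hypothetical helper: returns <slot>-v1.<ext> through <slot>-vN.<ext>.
function variantFileNames(slot: string, count: number, ext: string = "png"): string[] {
  // Assumption: out-of-range counts are rejected rather than clamped.
  if (!Number.isInteger(count) || count < 1 || count > MAX_VARIANTS) {
    throw new RangeError(`variants must be an integer from 1 to ${MAX_VARIANTS}`);
  }
  return Array.from({ length: count }, (_, i) => `${slot}-v${i + 1}.${ext}`);
}

// Example with a made-up slot id:
console.log(variantFileNames("scene-03-image-anchor", 4));
```

Note that variant files use a hyphenated -vN suffix, which is distinct from the dotted .vN suffix used when a slot is regenerated.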
Video — generate video
The expensive call. Always --dry-run first if you’re not sure.
- Model (--model): Default kwaivgi/kling-v3.0-pro. Switch to bytedance/seedance-2.0 for horror, POV, walking, jump-scares, or any non-default physics motion (per MODELS.md and the venom-bodywash postmortem).
- Duration (--duration): Seconds. Per-model supported_durations may be discrete: hailuo accepts only 6 and 10. Run ralphy models show <id> to see the whitelist.
- Anchor image: Anchor image for image-to-video. URL, local path, or data: URI. Strongly recommended for portrait orientation when the prompt has wide-shot bias.
- Native audio: Enable model-native audio. Supported on veo-3.1, kling-v3.0-pro (EN only; accent slip and age drift on RU), seedance-2.0, and most modern i2v endpoints. See MODELS.md per-model audio column.
- Dry run (--dry-run): Validate params, print the resolved request and cost estimate; do not submit.
The CLI validates --duration, --aspect-ratio, --resolution, and --first-frame / --last-frame against the per-model whitelist from the OpenRouter catalog before submitting. If your params don’t fit the model, you get an E_VALIDATION_FAILED with the violated field and a suggestion. Override with --no-validate if you know what you’re doing.
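The whitelist check can be pictured with a minimal sketch for the duration field alone. The ValidationFailure shape, the function name, and the nearest-value suggestion are assumptions for illustration; only the E_VALIDATION_FAILED code and the idea of reporting the violated field plus a suggestion come from the text above.

```typescript
// Hypothetical shape of a validation failure; the real payload may differ.
interface ValidationFailure {
  code: "E_VALIDATION_FAILED";
  field: string;
  suggestion: string;
}

// Checks a requested duration against a model's discrete whitelist.
// Assumption: an empty whitelist means the model has no discrete restriction.
function validateDuration(requested: number, supported: number[]): ValidationFailure | null {
  if (supported.length === 0 || supported.includes(requested)) return null;

  // Suggest the closest supported value (ties resolve to the earlier entry).
  const nearest = supported.reduce((best, d) =>
    Math.abs(d - requested) < Math.abs(best - requested) ? d : best
  );
  return {
    code: "E_VALIDATION_FAILED",
    field: "duration",
    suggestion: `use --duration ${nearest} (supported: ${supported.join(", ")})`,
  };
}

// Example: a model that accepts only 6 and 10 seconds, asked for 9.
console.log(validateDuration(9, [6, 10]));
```

The same pattern would extend to aspect ratio and resolution by swapping the whitelist and the field name.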
Voiceover — generate voiceover
- Voice: ElevenLabs voice id, either a cloned voice or a library voice.
- Text: VO text. RU or EN supported by the default eleven_multilingual_v2 model.
- Stability: 0–1, default 0.55. Lower = more variation (good for emotional / cinematic deliveries); higher = monotone (good for analog-horror PSA / robo-narrator).
- Style: 0–1, default 0.25. 0 = monotone broadcast register, 1 = full dramatic. The analog-horror postmortem documented style 0 + stability 0.5 as the cold-robo-female PSA register.
Music — generate music
Music is instrumental by default. Pass --with-vocals if you actually want vocals. Usually you don’t for ad work, since ElevenLabs Music’s ToS blocks named-artist references and the post-mix sidechain-duck under your voiceover sounds cleaner instrumental.
Captions — generate captions
Output is written to workspace/projects/<id>/assets/captions/<slot>.json by default: per-slot captions, not the legacy shared captions.json (which clobbered between calls in the noski and venom postmortems). Pass --legacy-output if you have scripts that grep the old path.
Cost preview with --dry-run
Every paid sub-verb supports --dry-run. The output names the model, the resolved request, the file Ralphy would write, and the cost estimate — all free.
Run --dry-run before a 10-second kling call or a long ElevenLabs Music render.
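For a rough sense of the arithmetic behind an estimate, the sketch below applies the typical per-unit rates from the table at the top of this page. It is not how --dry-run computes its figure; the function names are hypothetical, and real pricing should come from MODELS.md.

```typescript
// Typical rates copied from the at-a-glance table; treat them as approximate.
const TYPICAL_RATES = {
  voiceoverPer1kChars: 0.3, // ~$0.30 per 1k chars
  musicPerSecond: 0.005, // ~$0.005/sec
  captionsPerMinute: 0.005, // ~$0.005 per minute
};

// Assumption: characters are counted with string length (UTF-16 code units).
function estimateVoiceoverCost(text: string): number {
  return (text.length / 1000) * TYPICAL_RATES.voiceoverPer1kChars;
}

function estimateMusicCost(seconds: number): number {
  return seconds * TYPICAL_RATES.musicPerSecond;
}

function estimateCaptionsCost(audioSeconds: number): number {
  return (audioSeconds / 60) * TYPICAL_RATES.captionsPerMinute;
}

// Formats an estimate the way the table writes costs, e.g. "~$0.75".
function formatUsd(amount: number): string {
  return "~$" + amount.toFixed(2);
}

// Example: a 2,500-character script and a 30-second music bed.
console.log(formatUsd(estimateVoiceoverCost("a".repeat(2500))));
console.log(formatUsd(estimateMusicCost(30)));
```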
The reference-required gate
If your slot is for a named real entity and no --ref is attached, the agent layer refuses before the CLI ever fires. If you genuinely want to skip the gate on a specific call, pass --no-ref-consent "<reason>" (a non-empty string is required). The CLI logs stage: "no-ref-consent" to user-prompts.jsonl. Detail: Brands, personas, refs.
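The gate's decision logic can be sketched as a pure function. Everything here is an illustrative assumption (the GateInput and GateResult shapes, the function name, the refusal message); only the three behaviors come from the text above: refuse when a named real entity has no ref, allow when a non-empty consent reason is given, and record stage "no-ref-consent" in that case.

```typescript
// Hypothetical input shape for the gate check.
interface GateInput {
  namedRealEntity: boolean; // slot depicts a named real entity
  refs: string[]; // values passed via --ref
  noRefConsent?: string; // value passed via --no-ref-consent
}

type GateResult =
  | { allowed: true; logEntry?: { stage: "no-ref-consent"; reason: string } }
  | { allowed: false; reason: string };

function checkReferenceGate(input: GateInput): GateResult {
  // No gate needed when the slot is not a real entity or a ref is attached.
  if (!input.namedRealEntity || input.refs.length > 0) {
    return { allowed: true };
  }
  // Consent must be a non-empty string; whitespace-only is treated as empty
  // (an assumption of this sketch).
  const reason = (input.noRefConsent ?? "").trim();
  if (reason.length > 0) {
    // The returned entry stands in for the line logged to user-prompts.jsonl.
    return { allowed: true, logEntry: { stage: "no-ref-consent", reason } };
  }
  return {
    allowed: false,
    reason: 'named real entity requires --ref or --no-ref-consent "<reason>"',
  };
}
```

Returning the log entry instead of writing it keeps the sketch free of file I/O; a real implementation would append it to the log as part of the same call.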
Related
- Reviewing and iterating (versioning, promoting a winner)
- Brands, personas, refs (the --ref flag and the gate)
- Rendering (what generate feeds into)
- CLI: generation verbs (every flag, every sub-verb)
- Prompt library (battle-tested prompt entries indexed by goal)
- MODELS.md (per-model pricing, lifecycle, parameters)
- cli/commands/generate.ts (source of truth)