ralphy research is the pre-production teardown — turn a URL, handle, or topic into a cited report a scenarist can act on. ralphy eval is the post-render QA — score a rendered mp4 across scene segmentation, audio, captions, and per-scene vision. Both are CLI-native entry points to skills (/researcher, /evaluator); the verbs are the machine contract.
ralphy eval video <path>
Runs the full eval pipeline on a single mp4. Output: eval-report.md (human) + eval.json (machine).
The verb returns { verdict, score, findings, severities, jsonPath, mdPath }. The full report carries scene segmentation, loudness windows, dead-air spots, caption density, and per-scene visual findings, sized for a downstream fixer agent to consume.
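A downstream agent might want to check the envelope before acting on it. The sketch below is one way to do that. The six field names come from the envelope above, but every field type, the `EvalEnvelope` interface, and `parseEvalEnvelope` are assumptions made for illustration, not the CLI's actual types.

```typescript
// Hypothetical typing of the envelope. Field names are from the docs;
// the types are assumptions (severities is left opaque on purpose).
interface EvalEnvelope {
  verdict: string;
  score: number;
  findings: unknown[];
  severities: unknown;
  jsonPath: string;
  mdPath: string;
}

const ENVELOPE_KEYS = [
  "verdict",
  "score",
  "findings",
  "severities",
  "jsonPath",
  "mdPath",
] as const;

// Parse a JSON string and fail loudly if a documented field is missing
// or has a type the fixer agent could not work with.
function parseEvalEnvelope(raw: string): EvalEnvelope {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("eval envelope must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  const missing = ENVELOPE_KEYS.filter((key) => !(key in record));
  if (missing.length > 0) {
    throw new Error("eval envelope is missing: " + missing.join(", "));
  }
  if (typeof record["score"] !== "number") {
    throw new Error("score must be a number");
  }
  if (!Array.isArray(record["findings"])) {
    throw new Error("findings must be an array");
  }
  if (typeof record["jsonPath"] !== "string" || typeof record["mdPath"] !== "string") {
    throw new Error("jsonPath and mdPath must be strings");
  }
  return record as unknown as EvalEnvelope;
}
```

Validating up front keeps a malformed envelope from propagating into the fixer step.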
What the pipeline does
- ffprobe the input — duration, resolution, fps, codec.
- Scene segmentation via ffmpeg scene-detect.
- Audio analysis — EBU R128 loudness windows, dead-air windows.
- Caption check — density per scene, gap windows.
- Per-scene vision pass — frames extracted, sent through cli/lib/providers/llm.ts.
- Score + verdict + findings written to eval-report.md and eval.json.
Every model call goes through cli/lib/providers/llm.ts; agents never reach for raw provider SDKs.
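To make the first pipeline step concrete, here is a minimal sketch of reducing ffprobe's JSON output to duration, resolution, fps, and codec. The ffprobe options and JSON field names (`format.duration`, `streams[].r_frame_rate`, and so on) are standard ffprobe. The `ProbeSummary` type and both function names are hypothetical, and how ralphy actually invokes ffprobe is not stated here.

```typescript
// Hypothetical summary type; not taken from the ralphy codebase.
interface ProbeSummary {
  durationSec: number;
  width: number;
  height: number;
  fps: number;
  codec: string;
}

// Standard ffprobe options that print format and stream info as JSON.
function ffprobeArgs(path: string): string[] {
  return ["-v", "error", "-print_format", "json", "-show_format", "-show_streams", path];
}

// ffprobe reports frame rates as rationals such as "30000/1001".
function parseFrameRate(rate: string): number {
  const parts = rate.split("/");
  const num = Number(parts[0]);
  const den = parts.length > 1 ? Number(parts[1]) : 1;
  if (!isFinite(num) || !isFinite(den) || den === 0) {
    return NaN;
  }
  return num / den;
}

// Reduce ffprobe's JSON output to the fields the eval step lists.
function summarizeProbe(json: string): ProbeSummary {
  const probe = JSON.parse(json) as {
    format?: { duration?: string };
    streams?: Array<{
      codec_type?: string;
      codec_name?: string;
      width?: number;
      height?: number;
      r_frame_rate?: string;
    }>;
  };
  const video = (probe.streams ?? []).filter((s) => s.codec_type === "video")[0];
  if (!video) {
    throw new Error("no video stream found");
  }
  return {
    durationSec: Number(probe.format?.duration ?? NaN),
    width: video.width ?? 0,
    height: video.height ?? 0,
    fps: parseFrameRate(video.r_frame_rate ?? ""),
    codec: video.codec_name ?? "unknown",
  };
}
```

Keeping the parser separate from the process call means it can be tested against a captured ffprobe payload without a media file.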
Quality gates
When the score crosses a refuse threshold, the verb raises one of E_GATE_SCENARIO, E_GATE_IMAGE, or E_GATE_VIDEO (see Error catalog). Two consecutive gate refusals on the same project are the agent’s stop signal: report concrete options to the user rather than render again.
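The stop rule can be sketched as a small check over an agent's attempt history. The three gate codes come from the text above. The history representation (one entry per attempt on a single project, `null` for an attempt that raised nothing) and the function names are assumptions for illustration.

```typescript
// Gate codes named in the docs.
const GATE_CODES = ["E_GATE_SCENARIO", "E_GATE_IMAGE", "E_GATE_VIDEO"] as const;
type GateCode = (typeof GATE_CODES)[number];

function isGateCode(code: string): code is GateCode {
  return GATE_CODES.some((gate) => gate === code);
}

// outcomes: the error code raised by each attempt on ONE project, oldest
// first; null means the attempt raised nothing. (Assumed representation.)
// Returns true when the two most recent attempts were both gate refusals.
function shouldStop(outcomes: Array<string | null>): boolean {
  const lastTwo = outcomes.slice(-2);
  return (
    lastTwo.length === 2 &&
    lastTwo.every((code) => code !== null && isGateCode(code))
  );
}
```

In this sketch any two gate codes in a row count, since the rule speaks of gate refusals generally rather than the same code twice.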
Full surface: /reference/cli/eval.
ralphy research
Topic-level research that aggregates multiple sources into one synthesis. Sibling to ralphy ref <verb> (single URL); research composes refs into a cross-source report.
Output lands in workspace/research/<topic>/.
The /researcher skill is the human entry point; this CLI is the contract underneath. Full surface: /reference/cli/research.
Source pulls
add-source chains ralphy ref pull → ref frames → ref analyze → ref blueprint for each URL, then registers the result against the topic. --meta-only skips the heavy passes (useful for sources you only want to cite, not analyze deeply). --frames <n> caps the sample count.
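The chain and its two flags can be modelled as a small planner. The step order is taken from the text. Which steps count as "heavy" is not specified, so the `heavy` flags below are a guess, and `planAddSource` and its option names are hypothetical rather than the CLI's real internals.

```typescript
interface ChainStep {
  verb: string;
  heavy: boolean;
}

interface PlannedStep {
  verb: string;
  frameCap?: number;
}

interface AddSourceOptions {
  metaOnly?: boolean; // mirrors --meta-only
  frames?: number; // mirrors --frames <n>
}

// Order is from the docs. ASSUMPTION: everything after the pull is "heavy".
const REF_CHAIN: ChainStep[] = [
  { verb: "ref pull", heavy: false },
  { verb: "ref frames", heavy: true },
  { verb: "ref analyze", heavy: true },
  { verb: "ref blueprint", heavy: true },
];

// Decide which ref verbs would run for one URL under the given flags.
function planAddSource(options: AddSourceOptions = {}): PlannedStep[] {
  const steps = REF_CHAIN.filter((step) => !(options.metaOnly && step.heavy));
  return steps.map((step): PlannedStep => {
    if (step.verb === "ref frames" && options.frames !== undefined) {
      // The frame cap only applies to the frame-sampling step.
      return { verb: step.verb, frameCap: options.frames };
    }
    return { verb: step.verb };
  });
}
```

For example, `planAddSource({ metaOnly: true })` yields only the pull step under the assumption above, while `planAddSource({ frames: 8 })` keeps all four steps and attaches the cap to `ref frames`.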
Synthesis
synthesize runs a cross-source LLM pass over every source’s blueprint + transcript + analysis. Default model: google/gemini-2.5-flash. The report includes a cited findings section and per-source pull-quotes the agent can lift verbatim into a scenario.
When to use which
| Symptom | Verb |
|---|---|
| User drops one URL, wants a teardown | ralphy research add-source (or the /researcher skill) |
| User wants a topic-wide audit (“what’s trending in X”) | ralphy research start + several add-source |
| User asks “is this video good?” | ralphy eval video <path> |
| Post-render gate before publish | ralphy eval video <render> |
Related
- Researcher playbook — the agent-side workflow
- Evaluator skill — eval triggers + invariants
- Error catalog — E_GATE_* codes
- cli/commands/eval.ts, cli/commands/research.ts