The storyteller skill teaches the agent to produce a finished short-form video from a single creative brief, end to end, in seven tool calls plus render polling.

What it does

Given a user prompt like “a sleepy fox in a bookshop discovering an old map,” the agent:
  1. Probes capabilities (get_status): which providers are wired up?
  2. Checks the budget (get_ledger) and bails if fewer than 20 credits are available.
  3. Creates a workflow with a single shot.
  4. Drafts a short script in a warm storyteller register.
  5. Attaches a voiceover with auto-captions.
  6. Attaches a music bed that matches the mood, with ducking on.
  7. Sets the shot's native audio to mix mode at volume 0.6.
  8. Renders and polls until done.
  9. Returns the video URL.
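
The nine steps above can be sketched as one function, mainly to make the ordering explicit. This is a minimal sketch, not Lavendly's client: `call` stands in for whatever your MCP client uses to invoke a tool, and the response fields `available` (on providers), `workflow_id`, `shot_ids`, and `render_id`, plus the argument names `shots`, `prompt`, `duration`, `type`, `script`, and `shot_id`, are assumptions made for illustration. The tool names, `subtitleStyle`, `mood`, `ducking`, the volumes, and the idempotency key formats come from the skill body.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: sends a tool name and arguments to the Lavendly MCP
# and returns the decoded result. Swap in your own MCP client's call method.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_storyteller(
    call: ToolCaller,
    brief: str,
    script: str,
    mood: str,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Walk the steps in order and return the rendered video URL."""
    # 1. Capability probe. The "available" flag is an assumed response shape.
    status = call("get_status", {})
    for provider in ("voice", "music"):
        if not status.get(provider, {}).get("available"):
            raise RuntimeError(f"{provider} provider is not set up")

    # 2. Budget check against the 20-credit floor.
    if call("get_ledger", {})["available"] < 20:
        raise RuntimeError("fewer than 20 credits available")

    # 3. One shot, default 5 s. Argument and response names are assumed.
    workflow = call("create_workflow", {"shots": [{"prompt": brief, "duration": 5}]})
    workflow_id = workflow["workflow_id"]
    shot_id = workflow["shot_ids"][0]

    # 4. Drafting happens in the agent; the finished script is passed in.

    # 5. Voiceover with captions and a stable idempotency key.
    call("attach_track", {
        "workflow_id": workflow_id,
        "type": "voiceover",
        "script": script,
        "subtitleStyle": "tiktok",
        "idempotency_key": f"vo-{workflow_id}",
    })

    # 6. Music bed with ducking on.
    call("attach_track", {
        "workflow_id": workflow_id,
        "type": "music",
        "mood": mood,
        "ducking": True,
        "volume": 0.4,
    })

    # 7. Mix the shot's native audio under the voiceover and music.
    call("set_clip_native_audio", {"shot_id": shot_id, "mode": "mix", "volume": 0.6})

    # 8. Start the render, then poll every 4 s until it settles.
    render = call("create_render", {
        "workflow_id": workflow_id,
        "idempotency_key": f"render-{workflow_id}",
    })
    while True:
        state = call("get_render", {"render_id": render["render_id"]})
        if state["status"] == "done":
            return state["result"]["video_url"]  # 9. Hand back the URL.
        if state["status"] == "failed":
            raise RuntimeError("render failed")
        sleep(4)
```

Passing `sleep` as a parameter keeps the sketch testable against a fake `call` without waiting on real timers.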

The skill body

Copy this verbatim into your skill loader, or run npx -y @lavendly/skills install storyteller.
---
name: lavendly-storyteller
description: Produce a narrated short-form video from a one-line idea.
trigger: When the user asks for a video, clip, short, reel, or visual story.
---

# Lavendly · Storyteller

You have access to the Lavendly MCP. Use it to produce a narrated
short-form video from the user's brief.

## Canonical sequence

1. `get_status` → confirm voice + music providers are available. If
   not, ask the user to set them up before continuing.
2. `get_ledger` → bail with a friendly message if `available < 20`.
3. `create_workflow` with a single shot. Duration defaults to 5 s
   unless the user specifies otherwise (cap at 12 s).
4. Draft a 1-2 sentence script in a warm storyteller register. Speak
   in present tense. Avoid clichés ("once upon a time," "in a world
   where").
5. `attach_track` voiceover with the script and `subtitleStyle: "tiktok"`.
   Set `idempotency_key: "vo-${workflow_id}"`.
6. `attach_track` music. Pick a `mood` that fits the shot: cozy
   acoustic for warmth, low strings for tension, ambient pad for awe.
   Set `ducking: true`, `volume: 0.4`.
7. `set_clip_native_audio` on the shot: `{mode: "mix", volume: 0.6}`.
8. `create_render` with `idempotency_key: "render-${workflow_id}"`.
9. Poll `get_render` every 4 s until status is `done` or `failed`.
10. Return `result.video_url` to the user.

## Decision rules

- Captions default ON unless the user says otherwise.
- For shots without dialogue, set native audio to `mode: "off"` in
  step 7 instead of `mix`, so the music bed isn't fighting baked-in
  sound effects.
- If `get_status.voice.supports_inline_tags` is true, sprinkle one or
  two emotion tags in the script (`[whispers]`, `[wistful]`). Don't
  overdo it.

## When to bail

- Available credits below the cost preview.
- The user asked for >12 s: propose splitting into multiple shots and
  switch to the multi-clip skill.
- Render failed twice in a row with the same error code.

Try it

Once installed:
Make me a 5-second video of a barista pulling a perfect espresso shot at dawn. Narrate it like an old French film.
The agent follows the skill verbatim: it checks providers, drafts a script in a specific register, picks a low-strings music bed, renders, and returns the URL. ~30 seconds wall clock.

Why this works better than freeform prompting

Without the skill, an agent given the same brief will often:
  • Skip the capability probe and fail mid-render when voice isn’t wired.
  • Render before attaching audio because nothing told it the order.
  • Use a different idempotency key on retry, double-billing the user.
  • Pick a generic “cinematic” music mood for every brief.
The skill is short (under 400 words) but every line removes one of those failure modes.
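
The double-billing failure mode is the easiest one to demonstrate. The sketch below assumes the server deduplicates requests on `idempotency_key`, which the double-billing remark implies but this page does not spell out. `ToyRenderService` is a hypothetical in-memory stand-in, not Lavendly's implementation; only the key format `render-${workflow_id}` comes from the skill.

```python
from typing import Dict


def render_key(workflow_id: str) -> str:
    """Derive the key from the workflow so every retry reuses it."""
    return f"render-{workflow_id}"


class ToyRenderService:
    """Hypothetical server that charges once per distinct idempotency key."""

    def __init__(self) -> None:
        self.charges = 0
        self._renders: Dict[str, str] = {}

    def create_render(self, workflow_id: str, idempotency_key: str) -> str:
        # A repeated key returns the original render and charges nothing.
        if idempotency_key in self._renders:
            return self._renders[idempotency_key]
        self.charges += 1
        render_id = f"r{self.charges}"
        self._renders[idempotency_key] = render_id
        return render_id
```

Under that assumption, calling `create_render` twice with `render_key("wf1")` yields the same render and a single charge, while an agent that invents a fresh key on each retry is charged once per attempt.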