The multi-clip skill teaches the agent to produce a coherent multi-shot sequence rather than a collection of unrelated clips. The characters look the same. The lighting belongs to one film. The audio threads through.

## What it does

Given a brief with multiple beats, such as “three shots: the explorer enters a cave, finds a glowing crystal, and steps back in awe”, the agent:
  1. Drafts a single shared character anchor (visual descriptors that apply to every shot featuring the protagonist).
  2. Drafts a palette anchor (color and lighting language).
  3. Creates a workflow with one shot per beat, each prompt seeded with both anchors plus the beat-specific action.
  4. Attaches a single voiceover track that spans the full sequence, timed to land beats at the right shots.
  5. Attaches one music bed across all shots.
  6. Renders once. Captions burn in over the joined sequence.
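
As a rough illustration of step 4, the sketch below works out where each narration sentence would start on a single voiceover track that spans the whole sequence. It assumes one sentence per shot and shots played back to back. The `Shot` class, `narration_cues`, and the narration lines are hypothetical, made up for this example; the 3-second default mirrors the "~3 s" guideline in the skill body, and nothing here is part of the Lavendly API.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    action: str           # beat-specific action for this shot
    narration: str        # voiceover sentence meant to land on this shot
    seconds: float = 3.0  # assumed default, following the ~3 s guideline


def narration_cues(shots: list[Shot]) -> list[tuple[float, str]]:
    """Return (start_offset_seconds, sentence) pairs for one spanning track."""
    cues = []
    offset = 0.0
    for shot in shots:
        # Each sentence starts where its shot starts in the joined sequence.
        cues.append((offset, shot.narration))
        offset += shot.seconds
    return cues


# Illustrative narration only; the real script is drafted by the agent.
shots = [
    Shot("the explorer enters a cave", "The map ended at the mouth of the cave."),
    Shot("finds a glowing crystal", "Then the dark gave something back."),
    Shot("steps back in awe", "Some discoveries ask for distance."),
]
cues = narration_cues(shots)
total_seconds = sum(shot.seconds for shot in shots)
```

Keeping the offsets derived from shot durations means that lengthening one shot shifts every later narration cue with it, rather than leaving the script timed against the old cut.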

## The skill body

---
name: lavendly-multi-clip
description: Produce a coherent multi-shot sequence from a multi-beat brief.
trigger: When the user describes 2-6 distinct moments or shots in one request.
---

# Lavendly · Multi-clip

## Anchors first

Before creating the workflow, draft TWO anchor blocks in the
conversation:

- **Character anchor**: 1-2 sentences describing the protagonist's
  appearance (clothing, build, age, distinguishing features). Reused
  verbatim in every shot prompt.
- **Palette anchor**: 1 sentence on lighting, color, and overall
  visual language. Reused verbatim in every shot prompt.

Show the anchors to the user and confirm before spending credits. If
the user says "go," continue.

## Per-shot prompt template

Each shot prompt is constructed as:

```text
<character anchor>
<palette anchor>
<beat-specific action>
```

Place the beat-specific action last so the model attends to it most.

## Sequence

  1. get_status and get_ledger (the standard probes).
  2. create_workflow with one shot per beat. Each shot ~3 s unless the user asks for longer. Stitched total ≤ 30 s.
  3. Draft a voiceover script that bridges the beats. Aim for one sentence per shot. Don’t over-explain; let the visuals carry the story.
  4. attach_track voiceover on the FIRST clip with the full script. Set subtitleStyle: "cinematic".
  5. attach_track music on the FIRST clip, single bed for the whole sequence. ducking: true, volume: 0.35.
  6. For each clip, set_clip_native_audio to {mode: "off"}. The shared voiceover and music carry everything, which keeps native dialogue from fighting the narrator.
  7. create_render with idempotency_key: "multi-${workflow_id}".
  8. Poll. Return URL.

## Decision rules

- If the user provides 7+ beats, push back and propose grouping. Sequences over ~30 s lose viewer attention.
- If a beat is dialogue-driven (character speaks on screen), switch that single clip’s native audio to mix so the baked dialogue comes through under the narrator.
- If get_status.voice.supports_inline_tags is true, vary tone across the voiceover script: [whispers] for intimate beats, [building] for crescendos.

## Why it produces coherent sequences

The trick is the **anchor block discipline**. Models drift when you
write each prompt fresh: the protagonist gets a new jacket, the
lighting shifts from golden hour to blue hour, and the world stops
feeling like one film.

Seeding every shot prompt with the same two anchor blocks (character +
palette) gives the model the same starting state every time, so the
beat-specific action changes only what is supposed to change.
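
A minimal sketch of that discipline in Python, assuming prompts are plain strings with one block per line. `build_shot_prompt` and the anchor wording are hypothetical illustrations, not part of the skill or of any Lavendly API.

```python
def build_shot_prompt(character_anchor: str, palette_anchor: str, action: str) -> str:
    """Join the two anchors verbatim, then the beat-specific action last."""
    return "\n".join([character_anchor, palette_anchor, action])


# Example anchors, invented for this sketch. In practice the agent drafts
# them and the user confirms them before any credits are spent.
CHARACTER_ANCHOR = (
    "A weathered explorer in his fifties, grey beard, olive canvas jacket, "
    "brass headlamp."
)
PALETTE_ANCHOR = "Warm amber lamplight against cold blue cave shadow."

beats = [
    "The explorer enters a cave.",
    "The explorer finds a glowing crystal.",
    "The explorer steps back in awe.",
]

prompts = [build_shot_prompt(CHARACTER_ANCHOR, PALETTE_ANCHOR, beat) for beat in beats]

# Every prompt starts with exactly the same text; only the last line differs.
shared_prefix = CHARACTER_ANCHOR + "\n" + PALETTE_ANCHOR + "\n"
```

Because the anchors are passed in as constants rather than rewritten per shot, a redirect such as "make him older" is a one-line change that reaches every prompt.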

## Try it

> Three shots: explorer enters a cave, finds a glowing crystal, steps
> back in awe. Cinematic. Wide shots. Narration in a low documentary
> voice.

The agent will pause to show you the character anchor and palette
anchor before creating anything. That's the cheap moment to redirect
("make him older," "more amber, less blue"). After confirmation, six
tool calls produce the finished sequence.