The multi-clip skill teaches the agent to produce a coherent
multi-shot sequence rather than a collection of unrelated clips. The
characters look the same in every shot. The lighting belongs to one film. The
audio threads through.
What it does
Given a brief with multiple beats, such as “three shots: the explorer enters a cave, finds a glowing crystal, and steps back in awe”, the agent:
- Drafts a single shared character anchor (visual descriptors that apply to every shot featuring the protagonist).
- Drafts a palette anchor (color and lighting language).
- Creates a workflow with one shot per beat, each prompt seeded with both anchors plus the beat-specific action.
- Attaches a single voiceover track that spans the full sequence, timed so each line lands on its matching shot.
- Attaches one music bed across all shots.
- Renders once. Captions burn in over the joined sequence.
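A minimal sketch of the anchor-seeding step, assuming each shot prompt is a plain string built by prefixing the beat's action with both anchors. The `Anchors` type, the `buildShotPrompts` helper, the `". Action:"` joining format, and the descriptor text are hypothetical illustrations, not part of the skill.

```typescript
// Hypothetical shape: the skill describes the two anchors but defines no type for them.
interface Anchors {
  character: string; // visual descriptors for the protagonist, shared by every shot
  palette: string; // shared color and lighting language
}

// Seeds each beat's action with both anchors, so every shot prompt repeats the
// same character and palette wording and only the action changes.
function buildShotPrompts(anchors: Anchors, beats: string[]): string[] {
  return beats.map((action) => `${anchors.character}. ${anchors.palette}. Action: ${action}.`);
}

// Illustrative anchors for the cave brief; the descriptor text is made up.
const prompts = buildShotPrompts(
  {
    character: "Weathered explorer, canvas jacket, brass headlamp",
    palette: "Cool teal shadows, warm amber key light",
  },
  [
    "the explorer enters a cave",
    "the explorer finds a glowing crystal",
    "the explorer steps back in awe",
  ],
);
```

Repeating the anchor text verbatim, rather than paraphrasing it per shot, is what keeps the character and lighting from drifting between clips.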
The skill body
Sequence
- Run `get_status` and `get_ledger` (the standard probes).
- Call `create_workflow` with one shot per beat. Each shot runs ~3 s unless the user asks for longer; the stitched total stays ≤ 30 s.
- Draft a voiceover script that bridges the beats. Aim for one sentence per shot. Don't over-explain; let the visuals carry.
- Call `attach_track` for the voiceover on the FIRST clip with the full script. Set `subtitleStyle: "cinematic"`.
- Call `attach_track` for music on the FIRST clip: a single bed for the whole sequence, with `ducking: true` and `volume: 0.35`.
- For each clip, call `set_clip_native_audio` with `{mode: "off"}`; the shared voiceover and music carry everything, which avoids native dialogue fighting the narrator.
- Call `create_render` with `idempotency_key: "multi-${workflow_id}"`.
- Poll until the render finishes, then return the URL.
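The audio and render steps can be sketched as a plan of tool calls, assuming `create_workflow` has already returned a workflow ID and one clip ID per beat. The `ToolCall` shape, the `planAudioAndRender` helper, and the argument keys `clip`, `kind`, and `script` are hypothetical; only the tool names and the `subtitleStyle`, `ducking`, `volume`, `mode`, and `idempotency_key` values come from the skill's sequence.

```typescript
// Hypothetical record of one tool call. The skill names these tools, but the
// argument keys used here (clip, kind, script) are assumptions.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

const DEFAULT_SHOT_SECONDS = 3; // ~3 s per shot unless the user asks for longer
const MAX_TOTAL_SECONDS = 30; // stitched total must stay at or under 30 s

// Plans the calls that follow create_workflow, given the workflow ID and the
// clip IDs it returned (assumed: one clip per beat, in beat order).
function planAudioAndRender(
  workflowId: string,
  clipIds: string[],
  voiceoverScript: string,
  shotSeconds: number = DEFAULT_SHOT_SECONDS,
): ToolCall[] {
  const total = clipIds.length * shotSeconds;
  if (total > MAX_TOTAL_SECONDS) {
    throw new Error(`stitched total ${total} s exceeds ${MAX_TOTAL_SECONDS} s`);
  }
  const first = clipIds[0];
  if (first === undefined) {
    throw new Error("workflow has no clips");
  }

  const calls: ToolCall[] = [
    // One voiceover track on the FIRST clip carries the whole script.
    {
      tool: "attach_track",
      args: { clip: first, kind: "voiceover", script: voiceoverScript, subtitleStyle: "cinematic" },
    },
    // One music bed on the FIRST clip spans the sequence, ducked under the voice.
    {
      tool: "attach_track",
      args: { clip: first, kind: "music", ducking: true, volume: 0.35 },
    },
  ];

  // Mute every clip's native audio so it does not fight the narrator.
  for (const clip of clipIds) {
    calls.push({ tool: "set_clip_native_audio", args: { clip, mode: "off" } });
  }

  // A key derived from the workflow ID, so a retried request is recognizable
  // as the same render (assuming the render API deduplicates on this key).
  calls.push({ tool: "create_render", args: { idempotency_key: `multi-${workflowId}` } });
  return calls;
}
```

Building the plan as data makes the ordering rules (both tracks on the first clip, every clip muted, exactly one render) easy to check before any call goes out. Polling is left out because the skill does not name the status call it uses.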
Decision rules
- If the user provides 7+ beats, push back and propose grouping. Sequences over ~30 s lose viewer attention.
- If a beat is dialogue-driven (a character speaks on screen), switch
that single clip’s native audio to
`mix` so the baked-in dialogue comes through under the narrator.
- If `get_status.voice.supports_inline_tags` is true, vary tone across the voiceover script: `[whispers]` for intimate beats, `[building]` for crescendos.
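The decision rules can be sketched as one function over the beats, assuming each beat carries its voiceover line, a dialogue flag, and a mood. The `Beat` and `Decision` types, the mood labels, the `applyDecisionRules` helper, and placing each tag at the start of its line are assumptions for illustration; the 7-beat threshold, the `mix`/`off` modes, and the two tags come from the rules above.

```typescript
// Hypothetical mood labels; the skill only names intimate beats and crescendos.
type Mood = "neutral" | "intimate" | "crescendo";
type NativeAudioMode = "off" | "mix";

interface Beat {
  line: string; // the beat's voiceover sentence
  dialogue: boolean; // true when a character speaks on screen
  mood: Mood;
}

interface Decision {
  pushBack: boolean; // propose grouping beats instead of rendering as-is
  nativeAudio: NativeAudioMode[]; // per-clip native audio mode, in beat order
  script: string; // full voiceover script, tagged only when supported
}

const PUSH_BACK_AT = 7; // 7+ beats triggers a push-back

// Inline tone tags; placing them at the start of a line is an assumption.
const TAG_FOR: Record<Mood, string> = {
  neutral: "",
  intimate: "[whispers] ",
  crescendo: "[building] ",
};

function applyDecisionRules(beats: Beat[], supportsInlineTags: boolean): Decision {
  return {
    pushBack: beats.length >= PUSH_BACK_AT,
    // Dialogue beats mix their baked dialogue under the narrator; the rest stay muted.
    nativeAudio: beats.map((b): NativeAudioMode => (b.dialogue ? "mix" : "off")),
    // supportsInlineTags mirrors get_status.voice.supports_inline_tags.
    script: beats.map((b) => (supportsInlineTags ? TAG_FOR[b.mood] : "") + b.line).join(" "),
  };
}
```

For the cave brief, marking only the crystal beat as dialogue-driven yields native audio modes `off, mix, off`, so the narrator still carries the other two shots.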