
Scenario

Your Vimeo videos contain hours of spoken content locked inside video files. This job pulls each video's text tracks (captions/transcripts) via GET /videos/{id}/texttracks, downloads and cleans the transcript text, then sends it to Mave Chat to extract key messages, claims, and CTAs. Finally, it uses Generate to repurpose the extracted content into blog post drafts, social media posts, and email copy. The result is a content repurposing pipeline that turns every captioned video into five or more written assets, without a human ever watching the footage.

Architecture

Vimeo text tracks → transcript download → VTT/SRT cleanup → Mave Chat extraction → Generate repurposing → written assets

Code

import os, requests, time

VM = os.environ["VIMEO_ACCESS_TOKEN"]
MV = os.environ["MAVERA_API_KEY"]
VM_BASE = "https://api.vimeo.com"
MV_BASE = "https://app.mavera.io/api/v1"
VM_H = {"Authorization": f"Bearer {VM}", "Accept": "application/vnd.vimeo.*+json;version=3.4"}
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

VIDEO_IDS = ["123456789", "987654321", "456789012"]

all_repurposed = []

for video_id in VIDEO_IDS:
    # 1. Fetch text tracks
    tracks_resp = requests.get(
        f"{VM_BASE}/videos/{video_id}/texttracks", headers=VM_H
    ).json()

    tracks = tracks_resp.get("data", [])
    if not tracks:
        print(f"No captions for video {video_id} — skipping")
        continue

    # Prefer an English track; Vimeo reports both "captions" and "subtitles"
    # types, and language codes may be regional (e.g. "en-US")
    caption_track = next(
        (t for t in tracks
         if t.get("type") in ("captions", "subtitles")
         and (t.get("language") or "").startswith("en")),
        tracks[0]
    )

    # 2. Download the transcript file
    transcript_url = caption_track.get("link")
    if not transcript_url:
        print(f"No download link for video {video_id} — skipping")
        continue

    transcript_raw = requests.get(transcript_url).text

    # 3. Clean VTT/SRT formatting to plain text
    lines = []
    for line in transcript_raw.split("\n"):
        line = line.strip()
        if not line or line.startswith("WEBVTT") or "-->" in line or line.isdigit():
            continue
        lines.append(line)
    transcript = " ".join(lines)

    # 4. Get video metadata for context
    video_meta = requests.get(f"{VM_BASE}/videos/{video_id}", headers=VM_H, params={
        "fields": "name,description,duration",
    }).json()
    video_name = video_meta.get("name", f"Video {video_id}")

    print(f"Processing: \"{video_name}\" ({len(transcript)} chars transcript)")

    # 5. Extract key messages via Mave Chat
    extraction = requests.post(f"{MV_BASE}/mave/chat", headers=MV_H, json={
        "message": f"""Extract structured content from this video transcript.

VIDEO: "{video_name}"
TRANSCRIPT:
{transcript[:8000]}

Extract:
1. **Key Messages** (3-5 core points the speaker makes)
2. **Claims & Statistics** (any data points, percentages, or factual claims)
3. **CTAs** (calls to action — explicit or implied)
4. **Quotable Lines** (5-7 sentences that work standalone as social posts)
5. **Topic Tags** (10 keywords for SEO/categorization)""",
    }).json()

    extracted = extraction.get("content", "")

    # 6. Generate repurposed content
    repurpose = requests.post(f"{MV_BASE}/generate", headers=MV_H, json={
        "prompt": f"""Using this extracted content from a video, generate repurposed written assets.

SOURCE VIDEO: "{video_name}"
EXTRACTED CONTENT:
{extracted[:4000]}

Generate these assets:

1. **Blog Post Outline** — 800-word post structure with H2s, key points per section, and SEO title
2. **LinkedIn Post** — 150-word thought leadership post with hook, insight, CTA
3. **Twitter/X Thread** — 5-tweet thread with a hook tweet and numbered insights
4. **Email Newsletter Block** — 100-word summary for a weekly newsletter with subject line
5. **Instagram Caption** — 80 words max with relevant hashtags""",
    }).json()

    all_repurposed.append({
        "video": video_name,
        "video_id": video_id,
        "transcript_length": len(transcript),
        "extracted": extracted[:1000],
        "repurposed": repurpose.get("content", "")[:2000],
    })
    time.sleep(1)

# 7. Output
print("\nCAPTION-BASED CONTENT EXTRACTION")
print("=" * 60)
for item in all_repurposed:
    print(f"\n{'─' * 60}")
    print(f"VIDEO: {item['video']} ({item['transcript_length']:,} chars)")
    print(f"\nEXTRACTED CONTENT:")
    print(item["extracted"][:600])
    print(f"\nREPURPOSED ASSETS:")
    print(item["repurposed"][:1200])

Example Output

Processing: "Q1 Product Launch Keynote" (12,480 chars transcript)
Processing: "Customer Success Webinar — Acme Corp" (8,200 chars transcript)
No captions for video 456789012 — skipping

CAPTION-BASED CONTENT EXTRACTION
============================================================

──────────────────────────────────────────────────────────────
VIDEO: Q1 Product Launch Keynote (12,480 chars)

EXTRACTED CONTENT:
## Key Messages
1. The new dashboard reduces reporting time from 4 hours to 15 minutes
2. AI-powered anomaly detection catches issues 3 days before manual review
3. Integration with existing tools requires zero code changes

## Claims & Statistics
- "4 hours to 15 minutes" (93% time reduction)
- "3 days earlier detection"
- "200+ enterprise customers in beta"
- "99.7% accuracy in anomaly detection"

## CTAs
- "Sign up for the beta at example.com/beta"
- "Book a demo with our team" (implied, slide shown at 4:32)

## Quotable Lines
- "The best dashboard is one you never have to open."
- "We didn't build another analytics tool. We built the one that tells you when to look."

REPURPOSED ASSETS:

### Blog Post Outline
Title: "How AI Anomaly Detection Catches Problems 3 Days Before You Do"
H2: The Hidden Cost of Manual Reporting (problem framing)
H2: From 4 Hours to 15 Minutes (the transformation)
H2: How Anomaly Detection Actually Works (technical credibility)
H2: What 200+ Beta Customers Discovered (social proof)
H2: Getting Started (CTA)

### LinkedIn Post
"Your team spends 4 hours building a report that's outdated by the time
it's finished. We cut that to 15 minutes — and added anomaly detection
that catches issues 3 days before manual review ever would. 200+ beta
customers are already seeing results. Here's what they found →"

### Twitter/X Thread
1/ Your reporting workflow is broken. Here's why — and what 200+
   companies are doing instead. 🧵
2/ Problem: Manual reporting takes 4 hours. By the time it's done,
   the data is stale and the damage is done.
3/ Solution: AI anomaly detection catches issues 3 days before you'd
   find them manually. 99.7% accuracy.
4/ Result: 93% reduction in reporting time. Teams spend those hours
   on strategy instead of spreadsheets.
5/ Want to see it? Beta is open → example.com/beta

Error Handling

Many Vimeo videos lack captions. If GET /videos/{id}/texttracks returns an empty data array, enable Vimeo's auto-captioning feature (available on paid plans) or upload an SRT file first. Alternatively, send the video directly to Mavera for transcript extraction via Video Analysis.

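If you already have an SRT file, uploading it is a three-step flow in the Vimeo API: create the track, PUT the file to the returned upload link, then activate it. A sketch under that assumption; verify the exact field names against the current Vimeo API reference:

```python
import os
import requests

VM_BASE = "https://api.vimeo.com"
VM_H = {
    "Authorization": f"Bearer {os.environ.get('VIMEO_ACCESS_TOKEN', '')}",
    "Accept": "application/vnd.vimeo.*+json;version=3.4",
}

def upload_srt(video_id: str, srt_path: str, language: str = "en") -> None:
    """Attach an SRT caption file to a Vimeo video."""
    # 1. Create the text track; the response should include an upload link.
    track = requests.post(
        f"{VM_BASE}/videos/{video_id}/texttracks",
        headers=VM_H,
        json={"type": "captions", "language": language,
              "name": os.path.basename(srt_path)},
    ).json()
    # 2. PUT the raw SRT bytes to the returned link.
    with open(srt_path, "rb") as f:
        requests.put(track["link"], data=f)
    # 3. Activate the track so it appears in the player and in texttracks.
    requests.patch(f"{VM_BASE}{track['uri']}", headers=VM_H,
                   json={"active": True})
```

Once the track is active, the main pipeline above will pick it up on its next run.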
Vimeo text tracks can be in VTT, SRT, or DFXP format. The cleaning step above strips VTT headers, timestamps, and SRT sequence numbers, but would leave DFXP's XML markup intact. For DFXP (XML-based), use an XML parser to extract the text nodes instead.

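For DFXP/TTML tracks, a minimal sketch that pulls the cue text out of the p elements using only the standard library; the namespace on the tag names is the one subtlety:

```python
import xml.etree.ElementTree as ET

def dfxp_to_text(dfxp_xml: str) -> str:
    """Extract plain text from a DFXP/TTML caption document."""
    root = ET.fromstring(dfxp_xml)
    # TTML elements carry an XML namespace, so tags look like "{...ttml}p";
    # match <p> cues whether or not a namespace is present.
    cues = [el for el in root.iter()
            if el.tag == "p" or el.tag.endswith("}p")]
    lines = []
    for p in cues:
        # itertext() also picks up text nested inside <span> styling tags.
        text = "".join(p.itertext()).strip()
        if text:
            lines.append(text)
    return " ".join(lines)
```

Drop this in as a branch of step 3 when the track's link ends in .dfxp or .xml.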
Very long videos (60+ minutes) produce transcripts that exceed Mave Chat's context window. The code truncates to 8,000 characters, silently dropping everything after roughly the first 10–15 minutes of speech. For longer content, split the transcript into chunks, extract from each separately, and merge the results.
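Since the cleanup step discards timestamps, the simplest chunking is character-based with breaks at sentence boundaries. A sketch; the 8,000-character default mirrors the truncation limit used above, and a single sentence longer than the limit would still overflow its chunk:

```python
def chunk_transcript(transcript: str, max_chars: int = 8000) -> list:
    """Split a transcript into chunks of at most max_chars,
    breaking only at sentence-ending punctuation."""
    # Mark sentence boundaries with a separator, then split on it.
    marked = (transcript.replace("? ", "?|")
                        .replace("! ", "!|")
                        .replace(". ", ".|"))
    chunks, current = [], ""
    for sentence in marked.split("|"):
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Run the Mave Chat extraction prompt once per chunk, then concatenate the extracted sections before the Generate step.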

What’s Next

Vimeo Integration

Back to Vimeo integration overview

Webinar Intelligence

Track engagement across webinar series

Mave Agent

Full reference for POST /api/v1/mave/chat

Generate API

Full reference for POST /api/v1/generate