Scenario

Your company’s video library contains hours of your team speaking — product demos, customer calls, webinars, company updates — and that spoken content carries a distinct voice that your written marketing often fails to capture. This job pulls transcripts from all your Wistia videos, feeds the combined text into Mavera’s Brand Voice creation, and produces a spoken-content voice profile. The result is a brand voice guide derived not from a branding agency’s aspirational document, but from how your team actually sounds when they’re on camera.

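Architecture

Wistia media list → captions per video → cleaned transcripts → combined sample → Mavera Brand Voice → voice profile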

Code

import os
import time

import requests

WS = os.environ["WISTIA_API_TOKEN"]
MV = os.environ["MAVERA_API_KEY"]
WS_BASE = "https://api.wistia.com"
MV_BASE = "https://app.mavera.io/api/v1"
WS_H = {"Authorization": f"Bearer {WS}", "Accept": "application/json"}
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

# 1. Fetch all medias
medias = []
page = 1
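while True:
    r = requests.get(f"{WS_BASE}/v1/medias.json", headers=WS_H, params={
        "per_page": 100, "page": page,
    }, timeout=30)
    # Fail loudly on auth or rate-limit errors; an error body is a dict,
    # which would otherwise be extended into the list and loop forever
    r.raise_for_status()
    resp = r.json()
    if not resp:
        break
    medias.extend(resp)
    page += 1
    time.sleep(0.2)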

print(f"Found {len(medias)} videos")

# 2. Pull transcripts for each video
all_transcripts = []
for media in medias:
    hashed_id = media.get("hashed_id")
    if not hashed_id:
        continue
    name = media.get("name", "Untitled")

    # Fetch caption tracks; videos without captions are skipped below
    captions_resp = requests.get(
        f"{WS_BASE}/v1/medias/{hashed_id}/captions.json", headers=WS_H,
        timeout=30,
    )
    time.sleep(0.15)  # pause after every request, including skipped videos
    if captions_resp.status_code != 200:
        continue

    captions = captions_resp.json()
    if not captions:
        continue

    # Get English captions (or first available)
    caption_track = next(
        (c for c in captions if c.get("language") == "eng"),
        captions[0],
    )

    # Extract text lines from SRT data: drop blanks, cue numbers, timestamps
    srt_text = caption_track.get("text") or ""
    clean_lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or "-->" in line or line.isdigit():
            continue
        clean_lines.append(line)

    transcript = " ".join(clean_lines)
    if len(transcript) < 100:
        continue  # too short to say anything about voice

    all_transcripts.append({
        "name": name,
        "text": transcript,
        "duration": media.get("duration", 0),
        "type": media.get("type", "Video"),
    })

print(f"Extracted transcripts from {len(all_transcripts)} videos")
total_words = sum(len(t["text"].split()) for t in all_transcripts)
print(f"Total words: {total_words:,}")

# 3. Combine transcripts (cap at context window)
combined_text = "\n\n---\n\n".join(
    f"[{t['name']}]\n{t['text'][:2000]}"
    for t in all_transcripts[:20]
)

# 4. Create Brand Voice from spoken content
brand_voice = requests.post(f"{MV_BASE}/brand-voice", headers=MV_H, json={
    "name": "Spoken Content Voice Profile",
    "source_type": "transcript",
    "content": combined_text[:30000],
    "instructions": (
        "Analyze this collection of video transcripts to extract the brand's "
        "authentic speaking voice. Focus on: vocabulary patterns, sentence rhythm, "
        "how they explain complex topics, humor style, filler words to avoid, "
        "transition phrases, opening patterns, and closing patterns. This voice "
        "should capture how the team ACTUALLY sounds — not how a branding agency "
        "WANTS them to sound."
    ),
}).json()

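# 5. Poll for completion (up to ~90 seconds)
if "id" not in brand_voice:
    raise SystemExit(f"Brand voice creation failed: {brand_voice}")

voice_status = {}
for _ in range(30):
    time.sleep(3)
    voice_status = requests.get(
        f"{MV_BASE}/brand-voice/{brand_voice['id']}", headers=MV_H, timeout=30
    ).json()
    if voice_status.get("status") == "completed":
        break
else:
    # Loop ended without a "completed" status
    print("Warning: brand voice did not complete in time; profile may be empty")

profile = voice_status.get("profile", {})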

print("\nWISTIA EMBEDS → BRAND VOICE PROFILE")
print("=" * 60)
print(f"Videos analyzed: {len(all_transcripts)}")
print(f"Total words: {total_words:,}")
print(f"Voice ID: {brand_voice.get('id', 'N/A')}")
print(f"\n{'─' * 60}")
print("VOICE PROFILE:")
print(f"\nTone: {profile.get('tone', 'N/A')}")
print(f"Vocabulary Level: {profile.get('vocabulary_level', 'N/A')}")
print(f"Sentence Style: {profile.get('sentence_style', 'N/A')}")
print(f"\nKey Patterns:")
for pattern in profile.get("patterns", [])[:10]:
    print(f"  • {pattern}")
print(f"\nAvoid:")
for avoid in profile.get("avoid", [])[:5]:
    print(f"  ✗ {avoid}")
print(f"\nFull Profile:\n{profile.get('summary', '')[:1500]}")
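
The caption-cleaning logic in step 2 is easier to verify when pulled out into a standalone function. The sketch below is one way to do that. The name `srt_to_text` is hypothetical (it is not part of Wistia's or Mavera's API), and the extra step of stripping simple inline tags such as `<i>` is an assumption about what caption files may contain.

```python
import re

# Assumption: captions may carry simple inline tags like <i>...</i>
_TAG = re.compile(r"</?[a-zA-Z][^>]*>")

def srt_to_text(srt: str) -> str:
    """Collapse SRT caption data into one line of spoken text."""
    kept = []
    for raw in srt.splitlines():
        line = _TAG.sub("", raw).strip()
        # Skip blank lines, cue numbers, and timestamp lines
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)

sample_srt = (
    "1\n00:00:01,000 --> 00:00:03,000\nSo, have you ever\n\n"
    "2\n00:00:03,000 --> 00:00:05,000\n<i>lost</i> a deal?\n"
)
print(srt_to_text(sample_srt))  # So, have you ever lost a deal?
```

Like the original loop, this drops any caption line that consists only of digits, so a spoken line such as "2024" on its own would be lost. For voice analysis that trade-off is usually acceptable.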

Example Output

Found 87 videos
Extracted transcripts from 62 videos
Total words: 184,200

WISTIA EMBEDS → BRAND VOICE PROFILE
============================================================
Videos analyzed: 62
Total words: 184,200
Voice ID: bv_sp0k3nv01c3

────────────────────────────────────────────────────────────
VOICE PROFILE:

Tone: Conversational expert — confident without being authoritative,
  approachable without being casual.
Vocabulary Level: Mid-level technical. Uses industry terms but always defines
  them in the same sentence ("anomaly detection — basically, the system
  flags things that look weird before you even notice").
Sentence Style: Medium-length, parenthetical-heavy. Frequent use of
  dashes and asides for clarification.

Key Patterns:
  • Opens with a question: "So, have you ever..." (found in 71% of videos)
  • Uses "right?" as a rhetorical confirmation after statements
  • Explains features through metaphor: "Think of it like a..."
  • Transitions with "Now here's where it gets interesting"
  • Closes with a challenge: "Try it and tell me if..."
  • Uses specific numbers: "saves 4 hours" not "saves time"
  • Self-deprecating humor: "We tried 6 approaches before this one worked"
  • Acknowledges the viewer's skepticism before addressing it

Avoid:
  ✗ Corporate jargon: "leverage," "synergize," "best-in-class" (never used)
  ✗ Passive voice (only 4% of sentences)
  ✗ Long preambles before the point (team gets to the point in <10 seconds)
  ✗ Hyperbolic claims without numbers ("the best" without proof)
  ✗ Acronyms without explanation on first use

Error Handling

Wistia auto-generates captions for videos on paid plans. Free accounts may not have captions. Use GET /v1/medias/{id}/captions.json to check availability. If empty, upload SRT files manually or use Mavera’s Video Analysis for transcript extraction.
Auto-generated captions have 85-95% accuracy. Speaker names, technical terms, and brand names are often misrecognized. For brand voice extraction, the volume of text (60+ videos) compensates for per-transcript errors.
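The script keeps the first 2,000 characters of up to 20 transcripts and then truncates the combined text to 30,000 characters, so roughly 15 videos actually reach the Brand Voice request. For larger libraries, prioritize which videos go in first: customer-facing content at the top, internal recordings last. The voice profile reflects only the sample it was given, not the whole library.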
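
The prioritization described above can be sketched as a small selection function. Everything here is illustrative: `build_sample`, the `category` key, and the labels in the `priority` mapping are hypothetical. The `type` field stored by the script describes the media kind rather than the audience, so you would need to tag videos yourself, for example by Wistia project or naming convention.

```python
def build_sample(transcripts, priority, per_video=2000, budget=30000):
    """Pick excerpts in priority order until the character budget is spent."""
    sep = "\n\n---\n\n"
    # Lower rank = higher priority; unknown categories sort last
    ranked = sorted(
        transcripts,
        key=lambda t: priority.get(t.get("category"), len(priority)),
    )
    parts, used = [], 0
    for t in ranked:
        chunk = f"[{t['name']}]\n{t['text'][:per_video]}"
        cost = len(chunk) + (len(sep) if parts else 0)
        if used + cost > budget:
            break  # stop at a transcript boundary instead of cutting mid-sentence
        parts.append(chunk)
        used += cost
    return sep.join(parts)

# Hypothetical category labels, assigned by you rather than by Wistia
priority = {"demo": 0, "webinar": 1, "internal": 2}
videos = [
    {"name": "All-hands", "category": "internal", "text": "x" * 50},
    {"name": "Product demo", "category": "demo", "text": "y" * 50},
]
sample = build_sample(videos, priority, budget=80)
print(sample.splitlines()[0])  # [Product demo]
```

Unlike slicing the joined string at a fixed length, this never sends a half-cut excerpt. The cost is that some of the budget may go unused.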

What’s Next

Wistia Integration

Back to Wistia integration overview

CTA Performance × Focus Group

Optimize CTA placement and messaging

Lead-Qualified Viewer Follow-up

Generate personalized follow-up emails

Brand Voice API

Full reference for POST /api/v1/brand-voice