Wistia Embeds → Brand Voice Source

Scenario

Your company’s video library contains hours of your team speaking (product demos, customer calls, webinars, company updates), and that spoken content carries a distinct voice that written marketing often fails to capture. This job pulls caption transcripts from your Wistia videos, sends a capped sample of the combined text (up to 20 videos, 30,000 characters) to Mavera’s Brand Voice endpoint, and produces a spoken-content voice profile. The result is a brand voice guide derived from how your team actually sounds on camera, not from a branding agency’s aspirational document.

Architecture

Code

import os, requests, time

WS = os.environ["WISTIA_API_TOKEN"]
MV = os.environ["MAVERA_API_KEY"]
WS_BASE = "https://api.wistia.com"
MV_BASE = "https://app.mavera.io/api/v1"
WS_H = {"Authorization": f"Bearer {WS}", "Accept": "application/json"}
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

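# 1. Fetch all medias, 100 per page, until an empty page comes back
medias = []
page = 1
while True:
    resp = requests.get(f"{WS_BASE}/v1/medias.json", headers=WS_H, params={
        "per_page": 100, "page": page,
    }, timeout=30)
    resp.raise_for_status()  # stop on auth or rate-limit errors instead of looping forever
    batch = resp.json()
    if not batch:
        break
    medias.extend(batch)
    page += 1
    time.sleep(0.2)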

print(f"Found {len(medias)} videos")

# 2. Pull transcripts for each video
all_transcripts = []
for media in medias:
    hashed_id = media.get("hashed_id", "")
    name = media.get("name", "Untitled")

    # Wistia auto-generates captions; fetch them
    captions_resp = requests.get(
        f"{WS_BASE}/v1/medias/{hashed_id}/captions.json", headers=WS_H
    )
    if captions_resp.status_code != 200:
        continue

    captions = captions_resp.json()
    if not captions:
        continue

    # Get English captions (or first available)
    caption_track = next(
        (c for c in captions if c.get("language") == "eng"),
        captions[0] if captions else None
    )
    if not caption_track:
        continue

    # Extract text lines from SRT data
    srt_lines = caption_track.get("text", "")
    clean_lines = []
    for line in srt_lines.split("\n"):
        line = line.strip()
        if not line or "-->" in line or line.isdigit():
            continue
        clean_lines.append(line)

    transcript = " ".join(clean_lines)
    if len(transcript) < 100:
        continue

    all_transcripts.append({
        "name": name,
        "text": transcript,
        "duration": media.get("duration", 0),
        "type": media.get("type", "Video"),
    })
    time.sleep(0.15)

print(f"Extracted transcripts from {len(all_transcripts)} videos")
total_words = sum(len(t["text"].split()) for t in all_transcripts)
print(f"Total words: {total_words:,}")

# 3. Combine transcripts: first 20 videos, up to 2,000 chars each
#    (step 4 trims the combined text again to 30,000 chars)
combined_text = "\n\n---\n\n".join(
    f"[{t['name']}]\n{t['text'][:2000]}"
    for t in all_transcripts[:20]
)

# 4. Create Brand Voice from spoken content
create_resp = requests.post(f"{MV_BASE}/brand-voice", headers=MV_H, json={
    "name": "Spoken Content Voice Profile",
    "source_type": "transcript",
    "content": combined_text[:30000],
    "instructions": (
        "Analyze this collection of video transcripts to extract the brand's "
        "authentic speaking voice. Focus on: vocabulary patterns, sentence rhythm, "
        "how they explain complex topics, humor style, filler words to avoid, "
        "transition phrases, opening patterns, and closing patterns. This voice "
        "should capture how the team ACTUALLY sounds — not how a branding agency "
        "WANTS them to sound."
    ),
}, timeout=60)
create_resp.raise_for_status()  # fail here rather than on a missing "id" in step 5
brand_voice = create_resp.json()

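# 5. Poll for completion (30 attempts, 3 seconds apart)
voice_status = {}
for _ in range(30):
    time.sleep(3)
    voice_status = requests.get(
        f"{MV_BASE}/brand-voice/{brand_voice['id']}", headers=MV_H
    ).json()
    if voice_status.get("status") == "completed":
        break
else:
    # The loop ran out without a "completed" status; the profile below may be empty
    print("Warning: brand voice did not complete within 90 seconds")

profile = voice_status.get("profile", {})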

print("\nWISTIA EMBEDS → BRAND VOICE PROFILE")
print("=" * 60)
print(f"Videos analyzed: {len(all_transcripts)}")
print(f"Total words: {total_words:,}")
print(f"Voice ID: {brand_voice.get('id', 'N/A')}")
print(f"\n{'─' * 60}")
print("VOICE PROFILE:")
print(f"\nTone: {profile.get('tone', 'N/A')}")
print(f"Vocabulary Level: {profile.get('vocabulary_level', 'N/A')}")
print(f"Sentence Style: {profile.get('sentence_style', 'N/A')}")
print(f"\nKey Patterns:")
for pattern in profile.get("patterns", [])[:10]:
    print(f"  • {pattern}")
print(f"\nAvoid:")
for avoid in profile.get("avoid", [])[:5]:
    print(f"  ✗ {avoid}")
print(f"\nFull Profile:\n{profile.get('summary', '')[:1500]}")
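Step 2 strips cue numbers with `line.isdigit()`, which also drops any caption line that consists only of a spoken number. A minimal sketch of a block-aware alternative follows. It assumes the caption `text` field is standard SRT (cue index, timestamp line, then text, with blank lines between cues); `srt_to_text` is a hypothetical helper name, not part of either API.

```python
def srt_to_text(srt):
    """Join the spoken lines of an SRT document, skipping cue numbers and timestamps."""
    spoken = []
    # Cues are separated by blank lines; normalize Windows line endings first
    for block in srt.replace("\r\n", "\n").strip().split("\n\n"):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        # Drop the cue index only when it directly precedes a timestamp line
        if len(lines) >= 2 and lines[0].isdigit() and "-->" in lines[1]:
            lines = lines[1:]
        spoken.extend(ln for ln in lines if "-->" not in ln)
    return " ".join(spoken)


sample = (
    "1\n00:00:00,000 --> 00:00:02,000\nWe tried\n\n"
    "2\n00:00:02,000 --> 00:00:04,000\n6\napproaches"
)
print(srt_to_text(sample))
```

In the script, this could replace the inner `for line in srt_lines.split("\n")` loop, so that a line like "6" survives as spoken content.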

Example Output

Found 87 videos
Extracted transcripts from 62 videos
Total words: 184,200

WISTIA EMBEDS → BRAND VOICE PROFILE
============================================================
Videos analyzed: 62
Total words: 184,200
Voice ID: bv_sp0k3nv01c3

────────────────────────────────────────────────────────────
VOICE PROFILE:

Tone: Conversational expert — confident without being authoritative,
  approachable without being casual.
Vocabulary Level: Mid-level technical. Uses industry terms but always defines
  them in the same sentence ("anomaly detection — basically, the system
  flags things that look weird before you even notice").
Sentence Style: Medium-length, parenthetical-heavy. Frequent use of
  dashes and asides for clarification.

Key Patterns:
  • Opens with a question: "So, have you ever..." (found in 71% of videos)
  • Uses "right?" as a rhetorical confirmation after statements
  • Explains features through metaphor: "Think of it like a..."
  • Transitions with "Now here's where it gets interesting"
  • Closes with a challenge: "Try it and tell me if..."
  • Uses specific numbers: "saves 4 hours" not "saves time"
  • Self-deprecating humor: "We tried 6 approaches before this one worked"
  • Acknowledges the viewer's skepticism before addressing it

Avoid:
  ✗ Corporate jargon: "leverage," "synergize," "best-in-class" (never used)
  ✗ Passive voice (only 4% of sentences)
  ✗ Long preambles before the point (team gets to the point in <10 seconds)
  ✗ Hyperbolic claims without numbers ("the best" without proof)
  ✗ Acronyms without explanation on first use

Error Handling

Captions not available

Wistia auto-generates captions for videos on paid plans. Free accounts may not have captions. Use GET /v1/medias/{hashed_id}/captions.json to check availability; the script skips any video where this call returns a non-200 status or an empty list. For those videos, upload SRT files manually or use Mavera’s Video Analysis for transcript extraction.
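To see which videos need manual SRT uploads, the availability check can be separated from the transcript loop. The sketch below is an assumption-laden illustration: it takes the already-parsed JSON from the captions endpoint (a list of tracks with `language` and `text` keys, as the script assumes), and both function names are hypothetical.

```python
def pick_caption_track(captions, preferred_language="eng"):
    """Return the preferred-language track, else the first track with text, else None."""
    if not isinstance(captions, list):
        return None
    # A track only counts as usable if it carries non-blank text
    usable = [
        c for c in captions
        if isinstance(c, dict) and (c.get("text") or "").strip()
    ]
    for track in usable:
        if track.get("language") == preferred_language:
            return track
    return usable[0] if usable else None


def report_missing_captions(captions_by_video):
    """Given {video name: parsed captions payload}, list videos with no usable track."""
    return sorted(
        name for name, captions in captions_by_video.items()
        if pick_caption_track(captions) is None
    )


payloads = {
    "Demo": [{"language": "eng", "text": "1\n00:00:00,000 --> 00:00:01,000\nHi"}],
    "Webinar": [],
    "Update": [{"language": "spa", "text": "   "}],
}
print(report_missing_captions(payloads))
```

Unlike the `next(...)` expression in the script, this version also rejects tracks whose `text` is blank, so an empty auto-generated track is reported rather than silently yielding a zero-length transcript.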

Transcript quality

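Auto-generated captions are roughly 85-95% accurate. Speaker names, technical terms, and brand names are often misrecognized. For brand voice extraction, a large volume of text (60+ videos) compensates for per-transcript errors, but the script only sends excerpts from the first 20 transcripts, so recurring misrecognitions in that sample can still leak into the profile.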
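One way to limit the damage from recurring misrecognitions is a small correction glossary applied before the transcripts are combined. This is a sketch only: the glossary entries below are invented examples, and `apply_glossary` is a hypothetical helper, not a Wistia or Mavera feature.

```python
import re


def apply_glossary(transcript, glossary):
    """Replace known mis-transcriptions (whole words, case-insensitive) with the correct term."""
    for wrong, right in glossary.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement avoids treating backslashes in `right` as escapes
        transcript = re.sub(
            pattern, lambda _match, r=right: r, transcript, flags=re.IGNORECASE
        )
    return transcript


# Hypothetical glossary: build yours from errors you actually see in the captions
glossary = {"my vera": "Mavera", "whisky a": "Wistia"}
print(apply_glossary("We built My Vera for whisky a users", glossary))
```

In the script, the call would fit right after `transcript = " ".join(clean_lines)`.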

Context window limits

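The combined text is truncated to 30,000 characters. With up to 2,000 characters per video plus a title line and separator, that covers roughly 14 to 15 of the 20 selected videos; the rest are cut off. For larger libraries, prioritize videos by type: customer-facing content first, internal recordings last. The voice profile only reflects the sample you send.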
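The prioritization described above can be sketched as a ranking step followed by budget-aware packing, so that no excerpt is cut mid-sentence by the final 30,000-character slice. The `category` field and the priority map are assumptions: Wistia's `type` field describes the media type, not the audience, so you would need to tag each transcript yourself (for example from its project or title).

```python
def select_transcripts(transcripts, priority, char_budget=30000, per_video_cap=2000):
    """Order transcripts by category priority, then pack excerpts until the budget is spent."""
    # Unknown categories sort last; sorted() is stable, so ties keep their original order
    ranked = sorted(
        transcripts,
        key=lambda t: priority.get(t.get("category"), len(priority)),
    )
    separator = "\n\n---\n\n"
    parts, used = [], 0
    for t in ranked:
        block = f"[{t['name']}]\n{t['text'][:per_video_cap]}"
        cost = len(block) + (len(separator) if parts else 0)
        if used + cost > char_budget:
            break  # stop at the first excerpt that does not fit
        parts.append(block)
        used += cost
    return separator.join(parts)


# Hypothetical categories, customer-facing first
priority = {"customer": 0, "webinar": 1, "internal": 2}
transcripts = [
    {"name": "Standup", "text": "a" * 50, "category": "internal"},
    {"name": "Demo", "text": "b" * 50, "category": "customer"},
]
combined_text = select_transcripts(transcripts, priority)
print(combined_text.splitlines()[0])
```

Stopping at the first excerpt that does not fit keeps the logic simple; switching `break` to `continue` would instead let shorter, lower-priority excerpts fill the remaining space.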

What’s Next

Wistia Integration

Back to Wistia integration overview

CTA Performance × Focus Group

Optimize CTA placement and messaging

Lead-Qualified Viewer Follow-up

Generate personalized follow-up emails

Brand Voice API

Full reference for POST /api/v1/brand-voice
