

Scenario

Transcribe podcast episodes with topic detection, paragraph segmentation, and utterance timing, then feed the structured transcripts to Mavera for content strategy: which topics resonate, what derivative content to create, and where the editorial calendar has gaps.

Flow: Deepgram POST /v1/listen?model=nova-3&detect_topics=true&paragraphs=true&utterances=true&diarize=true → topic-tagged transcript → Mavera POST /mave/chat → content strategy

Code

import os, requests, time

DG = os.environ["DEEPGRAM_API_KEY"]
MV = os.environ["MAVERA_API_KEY"]
MV_BASE = "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

EPISODES = [{"file": "ep-47-ai-marketing.mp3", "title": "EP47: AI in Marketing"},
    {"file": "ep-48-brand-strategy.mp3", "title": "EP48: Brand Strategy 2026"},
    {"file": "ep-49-content-ops.mp3", "title": "EP49: Content Ops at Scale"}]
params = {"model": "nova-3", "smart_format": "true", "punctuate": "true",
    "detect_topics": "true", "paragraphs": "true", "utterances": "true",
    "diarize": "true", "language": "en"}

all_eps = []
for ep in EPISODES:
    with open(ep["file"], "rb") as f:
        resp = requests.post("https://api.deepgram.com/v1/listen", params=params,
            headers={"Authorization": f"Token {DG}", "Content-Type": "audio/mpeg"},
            data=f, timeout=180)
    resp.raise_for_status()
    r = resp.json()
    alt = r["results"]["channels"][0]["alternatives"][0]
    topics = [t["topic"] for s in r["results"].get("topics",{}).get("segments",[])
              for t in s.get("topics",[])]
    paragraphs = alt.get("paragraphs",{}).get("paragraphs",[])
    moments = [{"time": f"{int(p['start']//60)}:{int(p['start']%60):02d}",
        "text": " ".join(s.get("text","") for s in p.get("sentences",[]))[:200]}
        for p in paragraphs[:10]]
    dur = r.get("metadata",{}).get("duration",0)
    all_eps.append({"title": ep["title"], "min": round(dur/60,1),
        "words": len(alt["transcript"].split()), "topics": topics[:8], "moments": moments})
    print(f"{ep['title']} — {round(dur/60,1)}min | {len(alt['transcript'].split())} words | {len(topics)} topics")
    time.sleep(2)

corpus = ""
for e in all_eps:
    corpus += f"\n### {e['title']} ({e['min']}min, {e['words']} words)\n"
    corpus += f"Topics: {', '.join(e['topics'][:5])}\n"
    for m in e["moments"][:5]:
        corpus += f"  [{m['time']}] {m['text'][:150]}\n"

time.sleep(1)
resp = requests.post(f"{MV_BASE}/mave/chat", headers=MV_H, json={
    "message": f"Podcast content strategist. {len(all_eps)} episodes:\n\n{corpus[:10000]}\n\n"
        "Produce:\n1. **TOPIC HEATMAP** — Topics across episodes\n"
        "2. **RESONANCE SIGNALS** — Longest discussion topics\n"
        "3. **CONTENT DERIVATIVES** — 10 blogs, 5 social threads, 3 video clips\n"
        "4. **GUEST INSIGHTS** — Key quotes to repurpose\n"
        "5. **EDITORIAL CALENDAR** — Next 4 episode topics\n"
}, timeout=120)
resp.raise_for_status()
strategy = resp.json()
print(strategy.get("content", "")[:4000])

Example Output

EP47 — 42.3min | 6,847 words | 6 topics
EP48 — 38.7min | 5,921 words | 5 topics
EP49 — 51.2min | 8,334 words | 7 topics

TOPIC HEATMAP
  "AI content generation" — 3/3 episodes (14 min total)
  "brand consistency"     — 2/3 episodes (9 min)

CONTENT DERIVATIVES
  Blog: "Why AI Won't Replace Copywriters (But Will Replace Bad Ones)"
  Clip: EP47 [6:12–8:30] — Hot take on AI content quality

EDITORIAL CALENDAR
  EP50: Content Measurement (gap: only 4 min coverage)
  EP51: AI + Brand Voice (connect EP47 + EP48 themes)

Error Handling

Deepgram accepts audio files up to 2 GB. Set timeout=180 for episodes longer than 60 minutes, and process batches sequentially with a 2-second delay between requests.
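The sequential-with-delay pattern can be hardened with a simple retry on transient failures. A minimal sketch: `post_with_retry` and `backoff_delays` are illustrative helpers (not part of the Deepgram or Mavera APIs), and the transport is injected as a callable so the same wrapper works for both services.

```python
import time

def backoff_delays(attempts, base=2.0, cap=30.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

def post_with_retry(do_post, attempts=3, base=2.0,
                    retry_statuses=(429, 500, 502, 503, 504)):
    """Call do_post() up to `attempts` times, sleeping between retriable failures.

    do_post is a zero-arg callable returning a response object with
    .status_code (e.g. lambda: requests.post(url, ...)). The last response
    is returned either way; the caller decides whether to raise_for_status().
    """
    delays = backoff_delays(attempts, base=base)
    resp = None
    for i in range(attempts):
        resp = do_post()
        if resp.status_code not in retry_statuses:
            return resp
        if i < attempts - 1:
            time.sleep(delays[i])
    return resp
```

In the episode loop above, each call would become something like `resp = post_with_retry(lambda: requests.post("https://api.deepgram.com/v1/listen", params=params, headers=..., data=f, timeout=180))` followed by `resp.raise_for_status()`.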
Topic detection works best with 5+ minutes of topically diverse content; monologue-heavy episodes may return only 1-2 topics. When that happens, supplement with a Mavera analysis of the raw transcript.
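One way to wire up that fallback is a small heuristic plus a prompt builder. A sketch under stated assumptions: `needs_topic_fallback` and `fallback_prompt` are hypothetical helpers, and the thresholds (3 topics, 5 minutes) are illustrative defaults, not Deepgram limits.

```python
def needs_topic_fallback(topics, duration_s, min_topics=3, min_duration_s=300):
    """Heuristic: the episode is long enough, but detection found too few topics."""
    return duration_s >= min_duration_s and len(set(topics)) < min_topics

def fallback_prompt(title, transcript, max_chars=8000):
    """Build a Mavera prompt asking it to infer topics directly from the transcript."""
    return (f"Topic detection returned too few topics for '{title}'. "
            f"Identify 5-8 discussion topics with rough minute ranges from "
            f"this transcript:\n\n{transcript[:max_chars]}")
```

Inside the episode loop, this would gate an extra `POST /mave/chat` call: `if needs_topic_fallback(topics, dur): ... json={"message": fallback_prompt(ep["title"], alt["transcript"])}`.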
For interviews with clear turn-taking, diarization accuracy is typically 95%+. For panels with heavy crosstalk, process each microphone track as a separate request and merge the results.
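Merging per-track results can be as simple as interleaving utterances by start time. A minimal sketch, assuming each track was transcribed separately and its utterances carry `start` and `transcript` fields as in the Deepgram utterances array; `merge_tracks` and the speaker labels are illustrative.

```python
def merge_tracks(tracks):
    """Merge per-track utterance lists into one chronological timeline.

    tracks: {"host": [{"start": 0.0, "transcript": "..."}, ...], ...}
    Returns a list of (start, speaker_label, transcript) sorted by start time.
    """
    timeline = []
    for label, utterances in tracks.items():
        for u in utterances:
            timeline.append((u["start"], label, u["transcript"]))
    timeline.sort(key=lambda t: t[0])
    return timeline
```

The merged timeline can then replace the diarized `utterances` output when building moments and the corpus, with the track name standing in for Deepgram's speaker index.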