Voice of Customer → Brand Voice

Scenario

Transcribe customer interviews and testimonial recordings, then extract how customers describe pain points, features, and success stories. Aggregate the language patterns across interviews and feed them into Mavera Brand Voice to create a profile that sounds like your customers, not your marketing team.

Flow: Deepgram POST /v1/listen (multiple interview files) → aggregate customer language → Mavera POST /mave/chat (pattern extraction) → Mavera POST /brand-voices → customer-authentic Brand Voice

Code

import os
import time

import requests

DG = os.environ["DEEPGRAM_API_KEY"]
MV = os.environ["MAVERA_API_KEY"]
MV_BASE = "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

# Interview recordings to transcribe, with a display name for each customer.
INTERVIEWS = [
    {"file": "interviews/sarah-vp-marketing.wav", "name": "Sarah, VP Marketing"},
    {"file": "interviews/marcus-ops-lead.wav", "name": "Marcus, Ops Lead"},
    {"file": "interviews/priya-cmo.wav", "name": "Priya, CMO"},
    {"file": "interviews/james-founder.wav", "name": "James, Founder"},
    {"file": "interviews/elena-director.wav", "name": "Elena, Director"},
]

# Deepgram options: diarize + utterances return per-speaker segments to filter on.
params = {
    "model": "nova-3", "smart_format": "true", "punctuate": "true",
    "diarize": "true", "utterances": "true", "paragraphs": "true", "language": "en",
}

all_text = []
for iv in INTERVIEWS:
    with open(iv["file"], "rb") as f:
        resp = requests.post("https://api.deepgram.com/v1/listen", params=params,
            headers={"Authorization": f"Token {DG}", "Content-Type": "audio/wav"},
            data=f, timeout=120)
    resp.raise_for_status()
    r = resp.json()
    utts = r.get("results",{}).get("utterances",[])
    dur = r.get("metadata",{}).get("duration",0)
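    # Count words per diarized speaker; assume the customer is whoever talks most.
    spk_wc = {}
    for u in utts:
        sp = u.get("speaker", 0)
        spk_wc[sp] = spk_wc.get(sp, 0) + len(u.get("transcript", "").split())
    cust_sp = max(spk_wc, key=spk_wc.get) if spk_wc else 0
    # Keep only the customer's utterances (same default speaker as the count above).
    text = " ".join(u.get("transcript", "") for u in utts if u.get("speaker", 0) == cust_sp)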
    all_text.append({"name": iv["name"], "min": round(dur/60,1), "words": len(text.split()), "text": text})
    print(f"{iv['name']} — {round(dur/60,1)}min | {len(text.split())} customer words")
    time.sleep(2)

total = sum(c["words"] for c in all_text)
corpus = "\n\n".join(f"### {c['name']} ({c['min']}min, {c['words']} words)\n{c['text'][:2000]}"
    for c in all_text)

time.sleep(1)
voice = requests.post(f"{MV_BASE}/mave/chat", headers=MV_H, json={
    "message": f"Voice pattern analyst. {len(all_text)} customer interviews.\n\n{corpus[:10000]}\n\n"
        "Extract consistent patterns:\n"
        "1. **VOCABULARY DNA** — Words/phrases used by 3+ customers\n"
        "2. **SENTENCE PATTERNS** — How they structure descriptions\n"
        "3. **METAPHOR INVENTORY** — Analogies they reach for\n"
        "4. **EMOTIONAL REGISTER** — Where animated, where matter-of-fact\n"
        "5. **JARGON USAGE** — Industry terms used vs. avoided\n"
        "6. **STORYTELLING ARC** — How they narrate before/after\n"
        "7. **VOICE SUMMARY** — 200-word brand voice description\n"
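}, timeout=120)
voice.raise_for_status()  # surface Mavera API errors instead of parsing an error body
voice = voice.json()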

bv = requests.post(f"{MV_BASE}/brand-voices", headers=MV_H, json={
    "name": "Customer Voice — Interviews Q1 2026",
    "description": f"From {len(INTERVIEWS)} interviews ({total:,} words). "
        f"Customers: {', '.join(c['name'] for c in all_text)}.\n\n"
        f"{voice.get('content','')[:4000]}",
}, timeout=60)
bv.raise_for_status()  # fail loudly if the brand voice was not created
bv = bv.json()
print(f"\nBrand Voice created: {bv.get('id')}")
print(voice.get("content","")[:3000])
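
The script treats the speaker with the most words as the customer. The sketch below isolates that heuristic so it can be checked without calling Deepgram. The function name `dominant_speaker_text` and the sample utterances are hypothetical; only the `speaker` and `transcript` keys come from the script above. The heuristic can pick the wrong person when the interviewer talks more than the customer, so it is worth spot-checking the result per interview.

```python
from collections import Counter


def dominant_speaker_text(utterances):
    """Return (speaker_id, joined transcript) for the speaker with the most words."""
    words_per_speaker = Counter()
    for utt in utterances:
        words = len(utt.get("transcript", "").split())
        words_per_speaker[utt.get("speaker", 0)] += words
    if not words_per_speaker:
        return None, ""
    speaker = words_per_speaker.most_common(1)[0][0]
    text = " ".join(
        u.get("transcript", "") for u in utterances if u.get("speaker", 0) == speaker
    )
    return speaker, text


# Hypothetical utterances in the shape the script reads from results.utterances.
sample = [
    {"speaker": 0, "transcript": "What changed for your team?"},
    {"speaker": 1, "transcript": "Honestly it was a game changer for my team."},
    {"speaker": 1, "transcript": "We used to spend a full sprint on that."},
]
speaker, text = dominant_speaker_text(sample)
print(speaker, "|", text)
```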

Example Output

Sarah (18.4min, 2847w) | Marcus (12.7min, 1923w) | Priya (22.1min, 3412w)
James (9.8min, 1541w) | Elena (15.3min, 2398w) → 12,121 words total
Brand Voice created: bv_3f8a2c9d

VOCABULARY DNA (3+ customers)
  "game changer" — 4/5 | "before/after" framing — 5/5
  "my team" — 4/5 center on team, not themselves
SENTENCE PATTERNS
  Short declarative + example: "It just works. Last week we shipped
  40 assets in a day — that used to take a full sprint."
METAPHOR INVENTORY
  "flying blind" → "having a map" (3/5)

VOICE SUMMARY
  Practical, results-oriented. Leads with what changed, not what
  the product does. Tone: "competent peer sharing a win."
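
The "4/5" style counts above come from the model, so a local cross-check can be useful before trusting them. The following is a minimal sketch, assuming a plain word-bigram count is a good enough proxy for shared phrases. `shared_phrases` and the sample texts are hypothetical and not part of the Mavera or Deepgram APIs.

```python
import re
from collections import defaultdict


def shared_phrases(texts_by_customer, n=2, min_customers=3):
    """Map each n-word phrase to how many customers used it at least once."""
    customers_per_phrase = defaultdict(set)
    for name, text in texts_by_customer.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            customers_per_phrase[" ".join(tokens[i:i + n])].add(name)
    return {
        phrase: len(names)
        for phrase, names in customers_per_phrase.items()
        if len(names) >= min_customers
    }


# Hypothetical snippets standing in for the per-customer transcripts.
sample = {
    "A": "It was a game changer for my team.",
    "B": "Total game changer. My team ships faster.",
    "C": "My team called it a game changer.",
    "D": "We were flying blind before.",
}
counts = shared_phrases(sample)
print(counts)
```

On real transcripts, common function-word pairs will dominate the result, so a stopword filter or a longer `n` is likely needed.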

Error Handling

Interview audio quality

Zoom and phone recordings often contain compression artifacts. Nova-3 handles this well, but aim for at least 16 kHz mono audio. If diarization struggles to separate speakers, record a separate track per speaker.
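
A quick pre-flight check can catch low-sample-rate files before they are uploaded. This is a minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM WAV files. `check_wav` is a hypothetical helper, and the demo file is generated silence standing in for a real recording.

```python
import os
import tempfile
import wave


def check_wav(path, min_rate=16000):
    """Return (sample_rate, channels, ok) where ok means sample_rate >= min_rate."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    return rate, channels, rate >= min_rate


# Demo: write one second of 8 kHz mono silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 8000)

rate, channels, ok = check_wav(demo_path)
print(f"{rate} Hz, {channels} channel(s), meets minimum: {ok}")
```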

Not enough interviews

A brand voice needs 5+ interviews (10,000+ words) for reliable patterns. With fewer, the profile over-fits to individual speaking styles. Supplement with written reviews or support tickets.
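
The thresholds above can be enforced before the Mavera calls. The sketch below assumes the same record shape as `all_text` in the script (a `words` count per interview). `corpus_gaps` is a hypothetical helper, and the word counts are taken from the example output.

```python
def corpus_gaps(interviews, min_interviews=5, min_words=10_000):
    """Return a list of unmet thresholds; an empty list means the corpus is large enough."""
    total_words = sum(iv["words"] for iv in interviews)
    gaps = []
    if len(interviews) < min_interviews:
        gaps.append(f"{len(interviews)} interviews, need {min_interviews}+")
    if total_words < min_words:
        gaps.append(f"{total_words:,} words, need {min_words:,}+")
    return gaps


# Word counts from the example output above.
full = [{"words": w} for w in (2847, 1923, 3412, 1541, 2398)]
partial = full[:3]

full_gaps = corpus_gaps(full)
partial_gaps = corpus_gaps(partial)
print(full_gaps, partial_gaps)
```

When the list is non-empty, one option is to stop and add written reviews or support tickets to the corpus before creating the brand voice.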

Brand Voice description length

Mavera accepts descriptions of up to 5,000 characters. For richer profiles, create two brand voices: one for tone and vocabulary rules, and one for example rewrites and do/don't lists.
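
Slicing a description at a fixed index can cut it off mid-sentence. The following sketch trims to the 5,000-character limit stated above and backs up to the last sentence end when one exists. `fit_description` is a hypothetical helper, not part of the Mavera API.

```python
def fit_description(text, limit=5000):
    """Trim text to at most `limit` characters, ending on a full sentence if possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Look for the last period followed by whitespace inside the allowed span.
    end = max(cut.rfind(". "), cut.rfind(".\n"))
    return cut[:end + 1] if end > 0 else cut


long_text = "Short sentence. " * 400  # 6,400 characters
trimmed = fit_description(long_text)
print(len(trimmed), repr(trimmed[-20:]))
```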
