Ad Script → Production-Ready Audio

Scenario

This recipe runs a full pipeline from creative brief to broadcast-ready audio: generate ad script variations with Mavera, produce audio with ElevenLabs, then assess quality with Mave Agent, iterating on voice settings until production standards are met. In the code below, the assessment step scores the script text and the voice settings; the audio file itself is not uploaded.

Flow: Mavera POST /generations (ad scripts) → ElevenLabs TTS (voice settings) → Mavera POST /mave/chat (quality feedback) → Adjust → Iterate

Code

import os, requests, time, re
EL_KEY = os.environ["ELEVENLABS_API_KEY"]
EL_BASE = "https://api.elevenlabs.io/v1"
MV = os.environ["MAVERA_API_KEY"]
MV_BASE = "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}
os.makedirs("ad_production", exist_ok=True)
VOICE_ID, QUALITY_THRESHOLD = "21m00Tcm4TlvDq8ikWAM", 8

# 1. Generate ad script variations
gen = requests.post(f"{MV_BASE}/generations", headers=MV_H, json={
    "prompt": "Write 3 variations of a 30-second radio ad for 'Beacon' AI marketing analytics. "
        "Target: marketing directors. Confident tone. Each: hook, proof point, CTA.\n"
        "VARIATION A: problem-focused. VARIATION B: outcome-focused. VARIATION C: story-focused.",
}).json()
script_text = gen.get("output") or gen.get("content") or ""
# Split on the "VARIATION A/B/C" labels and drop any punctuation that trails the label
variations = [v.strip() for v in re.split(r"VARIATION\s+[A-C]\s*[:.\-]?", script_text) if v.strip()]
print(f"Generated {len(variations)} variations")

results = []
for vi, script in enumerate(variations[:3]):
    label = chr(65 + vi)
    settings = {"stability": 0.5 + vi * 0.05, "similarity_boost": 0.75 + vi * 0.05,
                "style": 0.3 + vi * 0.1, "use_speaker_boost": True}
    best_score, best_path = 0, None

    for take in range(1, 4):
        # 2. Generate audio
        tts = requests.post(f"{EL_BASE}/text-to-speech/{VOICE_ID}",
            headers={"xi-api-key": EL_KEY, "Content-Type": "application/json"},
            json={"text": script, "model_id": "eleven_multilingual_v2", "voice_settings": settings})
        if tts.status_code != 200:
            print(f"  TTS error {tts.status_code}"); time.sleep(3); continue

        path = f"ad_production/var_{label}_take_{take}.mp3"
        with open(path, "wb") as f: f.write(tts.content)
        print(f"Var {label} Take {take}: {path} ({len(tts.content) // 1024} KB)")
        time.sleep(2)

        # 3. Quality assessment
        quality = requests.post(f"{MV_BASE}/mave/chat", headers=MV_H, json={
            "message": f"Audio ad quality assessor. Variation {label}, Take {take}.\n"
                f"SCRIPT:\n{script}\nSETTINGS: {settings}\n\n"
                "Score 1-10: PACING, HOOK STRENGTH, CLARITY, EMOTIONAL IMPACT, CTA EFFECTIVENESS.\n"
                "OVERALL SCORE (1-10). If below 8, suggest adjusted stability/similarity_boost/style values.",
        }).json()
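        feedback = quality.get("content", "")
        # Skip an echoed "(1-10)" scale so its leading 1 is not mistaken for the score
        m = re.search(r"OVERALL(?:\s+SCORE)?\s*(?:\(1-10\))?[^0-9]*(\d+)", feedback, re.I)
        score = min(10, int(m[1])) if m else 0
        print(f"  Score: {score}/10")
        # Keep the first take as a fallback even when no score could be parsed
        if best_path is None or score > best_score: best_score, best_path = score, path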
        if score >= QUALITY_THRESHOLD: print("  Production-ready!"); break

        # 4. Adjust voice settings from feedback
        # (\d*\.?\d+) stops before a sentence-ending period, so "0.6." parses as 0.6
        for key, pat in [("stability", r"stability[:\s]*(\d*\.?\d+)"),
                         ("similarity_boost", r"similarity[_ ]?boost[:\s]*(\d*\.?\d+)"),
                         ("style", r"style[:\s]*(\d*\.?\d+)")]:
            if m := re.search(pat, feedback, re.I):
                settings[key] = min(1.0, max(0.0, float(m[1])))
        time.sleep(2)

    results.append({"var": label, "score": best_score, "path": best_path})

print(f"\n{'='*60}\nPRODUCTION SUMMARY\n{'='*60}")
for r in results:
    status = "APPROVED" if r["score"] >= QUALITY_THRESHOLD else "NEEDS REVIEW"
    print(f"  Variation {r['var']}: {r['score']}/10 — {status} — {r['path']}")
if results:  # max() raises ValueError when no variations were generated
    winner = max(results, key=lambda r: r["score"])
    print(f"\nRecommended: Variation {winner['var']} ({winner['score']}/10)")
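Step 3 depends on pulling a single integer out of free-form model text, which is easy to get subtly wrong. The helper below is a minimal sketch of that extraction, assuming the reply contains a line such as "OVERALL SCORE: 8" or "OVERALL SCORE (1-10): 8/10". The name extract_overall_score and the sample replies are hypothetical and not part of either API.

```python
import re

# Hypothetical helper for illustration; not part of the Mavera or ElevenLabs APIs.
# The optional "(1-10)" group skips an echoed scale so its leading 1 is not captured.
_OVERALL = re.compile(r"OVERALL(?:\s+SCORE)?\s*(?:\(1-10\))?[^0-9]*(\d+)", re.I)

def extract_overall_score(feedback: str) -> int:
    """Return the overall score clamped to 0-10, or 0 when none is found."""
    match = _OVERALL.search(feedback)
    if not match:
        return 0
    return max(0, min(10, int(match.group(1))))

print(extract_overall_score("PACING: 6\nOVERALL SCORE (1-10): 8/10"))  # 8
print(extract_overall_score("No numeric verdict given."))              # 0
```

Returning 0 on a failed parse keeps the loop running rather than raising, at the cost of treating an unparseable reply as a failed take.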

Example Output

Var A Take 1: 112 KB — 7/10 (pacing fast) → Take 2: 118 KB — 8/10 ✓
Var B Take 1: 124 KB — 6/10 (CTA rushed) → Take 2: 121 KB — 8/10 ✓
Var C Take 1: 134 KB — 9/10 ✓

PRODUCTION SUMMARY
  Variation A: 8/10 — APPROVED — ad_production/var_A_take_2.mp3
  Variation B: 8/10 — APPROVED — ad_production/var_B_take_2.mp3
  Variation C: 9/10 — APPROVED — ad_production/var_C_take_1.mp3
Recommended: Variation C (9/10)

Error Handling

Voice settings range

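stability, similarity_boost, and style each accept values from 0.0 to 1.0 (use_speaker_boost is a boolean), and the code clamps any value parsed from feedback to that range. Higher stability gives more consistent delivery; higher style is more expressive but less predictable.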
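The clamping described above can be sketched as a small pure function. This is an illustration under assumptions: apply_suggestions and the sample feedback string are hypothetical, and it presumes the assessor writes each setting name followed directly by a number.

```python
import re

# Matches 1, 0.65 or .65, and stops before a sentence-ending period.
NUMBER = r"(\d*\.?\d+)"
PATTERNS = {
    "stability": r"stability[:=\s]*" + NUMBER,
    "similarity_boost": r"similarity[_ ]?boost[:=\s]*" + NUMBER,
    "style": r"style[:=\s]*" + NUMBER,
}

def apply_suggestions(settings: dict, feedback: str) -> dict:
    """Return a copy of settings with suggested values clamped to [0.0, 1.0]."""
    updated = dict(settings)
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, feedback, re.I)
        if match:
            updated[key] = min(1.0, max(0.0, float(match.group(1))))
    return updated

current = {"stability": 0.5, "similarity_boost": 0.75, "style": 0.3, "use_speaker_boost": True}
print(apply_suggestions(current, "Try stability 0.65 and style: 1.4."))
# {'stability': 0.65, 'similarity_boost': 0.75, 'style': 1.0, 'use_speaker_boost': True}
```

Settings the feedback does not mention are left unchanged, and the out-of-range style suggestion of 1.4 is pulled back to 1.0.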

Iteration cost management

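3 variations × 3 takes means up to 9 TTS calls at roughly 400-500 characters each (about 4,500 characters in the worst case), plus one quality-assessment call per successful take. Set QUALITY_THRESHOLD to 7 for drafts and to 8 or higher for production.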
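The worst-case arithmetic above can be made explicit. This is a rough sketch, assuming every take runs and every script sits at the 500-character upper bound; worst_case_usage is a hypothetical helper, and it counts calls and characters only, not any provider's pricing.

```python
def worst_case_usage(variations: int = 3, max_takes: int = 3,
                     chars_per_script: int = 500) -> dict:
    """Upper bound on calls and TTS characters if no take passes early."""
    tts_calls = variations * max_takes
    return {
        "tts_calls": tts_calls,
        "assessment_calls": tts_calls,  # one /mave/chat call follows each successful take
        "tts_characters": tts_calls * chars_per_script,
    }

print(worst_case_usage())
# {'tts_calls': 9, 'assessment_calls': 9, 'tts_characters': 4500}
```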

Audio format options

The default is 128 kbps MP3 ("mp3_44100_128"). For broadcast quality, pass output_format as a query parameter on the TTS request: "mp3_44100_192" for higher-bitrate MP3, or "pcm_44100" for uncompressed PCM.
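As a sketch of how a format choice might be wired into the TTS call: to my understanding, the ElevenLabs text-to-speech endpoint reads output_format from the query string rather than the JSON body, and PCM output is raw samples without a container, so the file extension should follow the format. The helpers tts_request_args and file_extension, and the choice of a ".pcm" extension, are assumptions for illustration.

```python
def tts_request_args(base: str, voice_id: str, api_key: str, text: str,
                     settings: dict, output_format: str = "mp3_44100_128") -> dict:
    """Build keyword arguments for requests.post without sending anything."""
    return {
        "url": f"{base}/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "params": {"output_format": output_format},  # query string, not JSON body
        "json": {"text": text, "model_id": "eleven_multilingual_v2",
                 "voice_settings": settings},
        "timeout": 60,  # avoid hanging forever on a stalled connection
    }

def file_extension(output_format: str) -> str:
    """Pick a file extension matching the requested format (assumed naming)."""
    return "mp3" if output_format.startswith("mp3_") else "pcm"

args = tts_request_args("https://api.elevenlabs.io/v1", "21m00Tcm4TlvDq8ikWAM",
                        "key", "Hello", {"stability": 0.5}, "mp3_44100_192")
print(args["url"], args["params"], file_extension("pcm_44100"))
```

The returned dictionary can be passed straight through as requests.post(**args), and the path in the main loop would then use file_extension(output_format) in place of the hard-coded ".mp3".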
