

Scenario

Take Mavera-generated content — blog posts, ad scripts, podcast intros, email copy — and convert it to high-quality audio using OpenAI's TTS models. Build an audio content library from your text assets. Choose from six voices (alloy, echo, fable, onyx, nova, shimmer) and two quality tiers to match your brand tone.

Flow: Mavera POST /generations → text content → OpenAI POST /audio/speech (TTS) → audio files (.mp3)

Code

import os, requests, time
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
MV = os.environ["MAVERA_API_KEY"]
MV_BASE = "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

OUTPUT_DIR = "audio_library"
os.makedirs(OUTPUT_DIR, exist_ok=True)

CONTENT_SPECS = [
    {"prompt": "Write a 150-word blog post intro about AI-powered marketing automation.",
     "voice": "nova", "label": "blog-intro"},
    {"prompt": "Write a 30-second radio ad script for a SaaS analytics platform.",
     "voice": "onyx", "label": "radio-ad"},
    {"prompt": "Write a 60-second podcast intro for a marketing strategy episode.",
     "voice": "shimmer", "label": "podcast-intro"},
    {"prompt": "Write a 100-word product announcement for a new dashboard feature.",
     "voice": "alloy", "label": "product-announcement"},
]

results = []
for spec in CONTENT_SPECS:
    # Generate the text asset with Mavera; fail fast on HTTP errors.
    resp = requests.post(f"{MV_BASE}/generations", headers=MV_H, json={
        "prompt": spec["prompt"],
    }, timeout=60)
    resp.raise_for_status()
    gen = resp.json()
    text = gen.get("output") or gen.get("content") or ""
    if not text:
        print(f"[{spec['label']}] No text returned, skipping")
        continue
    word_count = len(text.split())
    print(f"[{spec['label']}] Generated {word_count} words")
    time.sleep(1)

    output_path = os.path.join(OUTPUT_DIR, f"{spec['label']}-{spec['voice']}.mp3")
    # Stream the synthesized audio straight to disk; calling stream_to_file
    # on a buffered response is deprecated in recent openai SDK versions.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1-hd", voice=spec["voice"],
        input=text[:4096],  # the TTS endpoint caps input at 4,096 characters
        response_format="mp3", speed=1.0,
    ) as response:
        response.stream_to_file(output_path)
    file_size = os.path.getsize(output_path)
    results.append({"label": spec["label"], "voice": spec["voice"],
                     "words": word_count, "size_kb": file_size // 1024})
    print(f"  Audio: {output_path} ({file_size // 1024} KB)")
    time.sleep(1)

print(f"\n{'='*60}\nAUDIO LIBRARY SUMMARY\n{'='*60}")
for r in results:
    print(f"  {r['label']:25s} | {r['voice']:8s} | {r['words']:4d} words | {r['size_kb']:5d} KB")
print(f"Total: {len(results)} files, {sum(r['size_kb'] for r in results)} KB")

Example Output

[blog-intro] Generated 152 words → blog-intro-nova.mp3 (187 KB)
[radio-ad] Generated 78 words → radio-ad-onyx.mp3 (96 KB)
[podcast-intro] Generated 134 words → podcast-intro-shimmer.mp3 (164 KB)
[product-announcement] Generated 103 words → product-announcement-alloy.mp3 (128 KB)

AUDIO LIBRARY: 4 files, 575 KB total

Error Handling

TTS accepts up to 4,096 characters per request. For longer content, split at sentence boundaries and concatenate audio files with ffmpeg -f concat -i list.txt -c copy output.mp3.
Each voice has a distinct character: alloy (neutral), echo (warm male), fable (expressive British), onyx (deep authoritative), nova (friendly female), shimmer (soft gentle). Test 2-3 voices per content type before standardizing.
tts-1 is faster and cheaper but lower quality. tts-1-hd produces broadcast-quality audio with less distortion. Use tts-1 for internal drafts and tts-1-hd for published content.
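If you automate the tier choice, a one-line helper keeps the rule in one place; the `published` flag here is an assumed pipeline detail, not an API parameter:

```python
def tts_model(published: bool) -> str:
    """Pick the TTS tier: tts-1 for fast internal drafts, tts-1-hd for release."""
    return "tts-1-hd" if published else "tts-1"
```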