

Scenario

Your brand wants to sound like the community it serves. This job pulls hot posts and comment threads, concatenates organic language into a corpus, then feeds it to Mavera’s Brand Voice endpoint as extracted_content. The output is a “community voice” that captures vocabulary, rhythm, humor, and register.

Architecture

Reddit OAuth (client credentials) → pull hot posts and top comments from /r/{sub} → concatenate into a text corpus → send to Mavera's Brand Voice endpoint as extracted_content → verify with a test generation.

Code

import os, requests, time

auth = requests.auth.HTTPBasicAuth(os.environ["REDDIT_CLIENT_ID"], os.environ["REDDIT_CLIENT_SECRET"])
tk = requests.post("https://www.reddit.com/api/v1/access_token", auth=auth,
    data={"grant_type": "client_credentials"},
    headers={"User-Agent": os.environ["REDDIT_USER_AGENT"]}).json()["access_token"]
RD = "https://oauth.reddit.com"
RD_H = {"Authorization": f"Bearer {tk}", "User-Agent": os.environ["REDDIT_USER_AGENT"]}
MV, MV_BASE = os.environ["MAVERA_API_KEY"], "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

SUB = "skincare"

# 1. Pull hot posts + top comments
r = requests.get(f"{RD}/r/{SUB}/hot", headers=RD_H, params={"limit": 25, "raw_json": 1})
r.raise_for_status()
posts = [p["data"] for p in r.json()["data"]["children"] if p["data"].get("selftext") or p["data"].get("title")]

corpus = []
for post in posts:
    corpus.append(f"POST: {post['title']}\n{post.get('selftext','')[:500]}")
    cr = requests.get(f"{RD}/comments/{post['id']}", headers=RD_H,
        params={"limit": 15, "depth": 2, "sort": "top", "raw_json": 1})
    if cr.status_code == 429:
        # Back off for the remainder of the rate-limit window, then skip this post.
        time.sleep(float(cr.headers.get("X-Ratelimit-Reset", 60)))
        continue
    listing = cr.json()  # [post listing, comment listing]; parse once
    children = listing[1]["data"]["children"] if len(listing) > 1 else []
    for c in children:
        body = c.get("data", {}).get("body", "")  # "more" stubs carry no body
        if body and body not in ("[removed]", "[deleted]"):
            corpus.append(f"COMMENT: {body[:400]}")
    time.sleep(0.7)

text = "\n\n---\n\n".join(corpus)
print(f"Corpus: {len(corpus)} fragments, {len(text):,} chars from /r/{SUB}")

# 2. Create brand voice
bv = requests.post(f"{MV_BASE}/brand-voices", headers=MV_H, json={
    "name": f"Community Voice: /r/{SUB}",
    "extracted_content": text,
    "description": f"Voice from {len(posts)} hot posts and comments on /r/{SUB}.",
}).json()
print(f"Brand Voice: {bv['id']} — Traits: {bv.get('traits', bv.get('voice_summary', 'N/A'))}")

# 3. Test generation
test = requests.post(f"{MV_BASE}/generations", headers=MV_H, json={
    "brand_voice_id": bv["id"],
    "prompt": f"Write a short product announcement for a new moisturizer, using /r/{SUB} community voice. Under 150 words.",
}).json()
print(f"\n{test.get('output', test.get('content', ''))[:500]}")
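Mavera's size limit on extracted_content, if any, isn't stated here. If oversized payloads start failing, a guard like the following keeps the corpus bounded before the POST (the trim_corpus helper and the 120,000-character cap are our assumptions, not part of the API):

```python
SEP = "\n\n---\n\n"  # same separator the script above uses to join fragments

def trim_corpus(fragments, max_chars=120_000):
    """Join fragments with SEP, dropping trailing fragments once the
    joined text would exceed an assumed size cap."""
    kept, total = [], 0
    for frag in fragments:
        cost = len(frag) + (len(SEP) if kept else 0)
        if total + cost > max_chars:
            break
        kept.append(frag)
        total += cost
    return SEP.join(kept)
```

Use `text = trim_corpus(corpus)` in place of the plain join if the endpoint rejects large bodies.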

Example Output

Corpus: 312 fragments, 89,421 chars from /r/skincare
Brand Voice: bv_reddit_skincare_7k2m — Traits: Conversational, ingredient-aware,
  self-deprecating humor, supportive. Uses abbreviations (HG, YMMV).

Ok so I've been testing this new moisturizer for about 3 weeks and I think
we might have a new HG situation? Texture is gel-cream hybrid — absorbs fast,
doesn't pill under sunscreen. Ingredient list solid: ceramides, centella,
niacinamide. No fragrance. YMMV but at this price point it's worth a shot.

Error Handling

Reddit replaces removed and deleted comment bodies with the placeholders [removed] and [deleted]; the loop above filters them out. If the resulting corpus is thin, pull from /r/{sub}/top?t=month instead of /hot for a larger sample.
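The filtering rule can be factored into a small helper for reuse across listings (a sketch; the function name is ours, not Reddit's):

```python
def usable_bodies(children, max_len=400):
    """Pull comment text worth keeping from a Reddit listing's children,
    skipping [removed]/[deleted] placeholders and 'more' stubs (which
    carry no 'body' field)."""
    kept = []
    for child in children:
        body = child.get("data", {}).get("body", "")
        if body and body not in ("[removed]", "[deleted]"):
            kept.append(body[:max_len])
    return kept
```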
Monitor the X-Ratelimit-Remaining and X-Ratelimit-Reset response headers; the loop above sleeps out the window when it hits a 429.
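A proactive variant reads those headers on every response rather than waiting for a rejection. The helper below is a sketch; the five-request safety margin is an assumption, not a Reddit-documented value:

```python
def backoff_seconds(headers, margin=5):
    """Seconds to sleep given Reddit's rate-limit response headers.
    Returns the remaining window when fewer than `margin` requests
    are left, else 0."""
    remaining = float(headers.get("X-Ratelimit-Remaining", margin + 1))
    reset = float(headers.get("X-Ratelimit-Reset", 60))
    return reset if remaining <= margin else 0.0
```

Call `time.sleep(backoff_seconds(cr.headers))` after each request to stay ahead of the limit.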
A 403 response usually means the subreddit is private or quarantined; quarantined subreddits require the authenticated account to opt in before the API will serve their content.
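For logging, a small status-code map covers the common failure modes (the hint wording is ours; Reddit's actual error bodies vary):

```python
def classify_reddit_error(status):
    """Map common Reddit API status codes to a log-friendly hint."""
    hints = {
        401: "token expired or invalid -- re-authenticate",
        403: "forbidden -- private or quarantined subreddit",
        404: "subreddit or post not found",
        429: "rate limited -- back off and retry",
    }
    return hints.get(status, f"unexpected status {status}")
```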
