Documentation Index

Fetch the complete documentation index at: https://docs.mavera.io/llms.txt

Use this file to discover all available pages before exploring further.

Scenario

Claude’s 1M token context window can process entire annual reports, 10-K filings, or industry whitepapers in a single call — no chunking, no summarization chains, no lost context. This job feeds a full document to Claude Opus 4.6 for deep analysis, extracts audience segments and decision-maker profiles, then creates Mavera Custom Personas grounded in real market intelligence. Flow: Read document → Anthropic POST /v1/messages (Claude Opus 4.6, full document) → Extract segments → Mavera POST /personas → Data-driven personas

Code

import os, time, json, anthropic, requests

MV = os.environ["MAVERA_API_KEY"]
MV_BASE = "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}
client = anthropic.Anthropic()

with open("acme_annual_report_2025.txt") as f:
    document_text = f.read()
print(f"Document loaded: {len(document_text):,} chars (~{len(document_text)//4:,} tokens)")

# 1. Claude Opus analyzes the full document
analysis = client.messages.create(
    model="claude-opus-4-6-20250725", max_tokens=4096,
    messages=[{"role": "user",
        "content": "Market intelligence analyst. Analyze this document and extract:\n\n"
            "1. **Audience Segments** 2. **Pain Points** 3. **Decision Criteria** "
            "4. **Communication Preferences** 5. **Budget Indicators**\n\n"
            "Return JSON: [{name, role, industry, pain_points, decision_criteria, "
            "communication_style, budget_range, key_quotes}]\n\n"
            f"DOCUMENT:\n{document_text}"}],
)
raw_output = analysis.content[0].text
print(f"Claude complete — {analysis.usage.input_tokens:,} input, {analysis.usage.output_tokens:,} output tokens")

# 2. Parse extracted personas
try:
    start = raw_output.index("[")
    end = raw_output.rindex("]") + 1
    personas_data = json.loads(raw_output[start:end])
except (ValueError, json.JSONDecodeError):
    personas_data = [{"name": "Extracted Persona", "description": raw_output[:500]}]
print(f"Extracted {len(personas_data)} persona segments")

# 3. Create Mavera personas
created = []
for p in personas_data[:6]:
    desc = (f"Role: {p.get('role', 'N/A')}. Industry: {p.get('industry', 'N/A')}. "
        f"Pain points: {', '.join(p.get('pain_points', [])[:3])}. "
        f"Style: {p.get('communication_style', 'professional')}. Budget: {p.get('budget_range', 'N/A')}.")
    mv = requests.post(f"{MV_BASE}/personas", headers=MV_H, json={
        "name": p.get("name", "Document Persona"), "description": desc,
    }).json()
    created.append({"id": mv["id"], "name": p.get("name")})
    print(f"  Created: {p.get('name')} → {mv['id']}")
    time.sleep(0.3)

print(f"\nPersonas created: {len(created)}")
for c in created:
    print(f"  • {c['name']} ({c['id']})")

Example Output

Document loaded: 847,293 chars (~211,823 tokens)
Claude complete — 211,823 input, 2,847 output tokens
Extracted 4 persona segments
  Created: VP of Operations → per_8a3f2c
  Created: Chief Financial Officer → per_1b7d4e
  Created: IT Director (Mid-Market) → per_9c2a1f
  Created: Procurement Manager → per_4d8e3b

Error Handling

Claude Opus 4.6 supports 1M input tokens. A 500-page annual report is ~200K tokens — well within limits. For documents exceeding 1M tokens, split at natural boundaries (chapters, sections) and run multiple calls.
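One way to do that split, as a minimal sketch: it assumes sections begin with a "## " heading (adjust the delimiter to match your documents) and reuses the rough chars-divided-by-4 token estimate from the job above. The helper name is illustrative, not part of either API.

```python
def split_at_boundaries(text, max_tokens=900_000):
    """Split a document at section headings so each chunk stays under
    a token budget (len(chunk) // 4 as a rough token estimate).
    The "\n## " delimiter is an assumption; adjust for your documents."""
    sections = text.split("\n## ")
    chunks, current = [], ""
    for sec in sections:
        candidate = current + "\n## " + sec if current else sec
        if len(candidate) // 4 > max_tokens and current:
            # Adding this section would blow the budget; start a new chunk.
            chunks.append(current)
            current = sec
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the same analysis call, with the extracted segment lists merged afterward.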
Tier 1 allows 50 req/min. This job uses 1 Claude call + up to 6 Mavera calls. The anthropic SDK retries 429 errors automatically with exponential backoff. For raw HTTP, add retry logic with 30-60s waits.
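A minimal retry-with-backoff sketch for those raw HTTP calls (the SDK already handles this for the Claude call). The `with_retries` helper and the response shape are illustrative, not part of the Mavera or Anthropic APIs:

```python
import random
import time

def with_retries(fn, max_attempts=4, base_wait=30):
    """Call fn() and retry on HTTP 429, doubling the wait each attempt
    (capped at 60s) with a little jitter. fn must return an object with
    a .status_code attribute, e.g. a requests.Response."""
    for attempt in range(max_attempts):
        resp = fn()
        if resp.status_code != 429:
            return resp
        wait = min(base_wait * (2 ** attempt), 60) + random.uniform(0, 1)
        time.sleep(wait)
    return resp  # still rate-limited after all attempts
```

Usage would wrap each Mavera call, e.g. `with_retries(lambda: requests.post(f"{MV_BASE}/personas", headers=MV_H, json=payload))`.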
Claude may wrap JSON in markdown code fences. The substring extraction handles this. If parsing still fails, retry with a system prompt that says “return raw JSON, no markdown.”
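A stricter fallback parser might strip the fences explicitly before extracting the array, along these lines (`parse_json_block` is a hypothetical helper, not part of the job above):

```python
import json
import re

def parse_json_block(raw):
    """Remove a leading ```json / trailing ``` fence if present,
    then parse the first [...] array found in the text."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    start = cleaned.index("[")
    end = cleaned.rindex("]") + 1
    return json.loads(cleaned[start:end])
```

It raises `ValueError` when no array is present, so it slots into the same `try/except` fallback as the inline extraction in the job.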