> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mavera.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Long Document Analysis → Persona Source

> Use Claude's 1M token context window to process entire annual reports, 10-K filings, or whitepapers in a single call — extract audience segments and decision-maker profiles, then create Mavera Custom Personas grounded in real market intelligence

## Scenario

Claude's 1M token context window can process entire annual reports, 10-K filings, or industry whitepapers in a single call — no chunking, no summarization chains, no lost context. This job feeds a full document to Claude Opus 4.6 for deep analysis, extracts audience segments and decision-maker profiles, then creates Mavera Custom Personas grounded in real market intelligence.

**Flow:** Read document → Anthropic `POST /v1/messages` (Claude Opus 4.6, full document) → Extract segments → Mavera `POST /personas` → Data-driven personas

## Code

<CodeGroup>
  ```python Python theme={"dark"}
  import os, time, json, anthropic, requests

  MV = os.environ["MAVERA_API_KEY"]
  MV_BASE = "https://app.mavera.io/api/v1"
  MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}
  client = anthropic.Anthropic()

  with open("acme_annual_report_2025.txt") as f:
      document_text = f.read()
  print(f"Document loaded: {len(document_text):,} chars (~{len(document_text)//4:,} tokens)")

  # 1. Claude Opus analyzes the full document
  analysis = client.messages.create(
      model="claude-opus-4-6-20250725", max_tokens=4096,
      input=[{"role": "user",
          "content": "Market intelligence analyst. Analyze this document and extract:\n\n"
              "1. **Audience Segments** 2. **Pain Points** 3. **Decision Criteria** "
              "4. **Communication Preferences** 5. **Budget Indicators**\n\n"
              "Return JSON: [{name, role, industry, pain_points, decision_criteria, "
              "communication_style, budget_range, key_quotes}]\n\n"
              f"DOCUMENT:\n{document_text}"}],
  )
  raw_output = analysis.content[0].text
  print(f"Claude complete — {analysis.usage.input_tokens:,} input, {analysis.usage.output_tokens:,} output tokens")

  # 2. Parse extracted personas
  try:
      start = raw_output.index("[")
      end = raw_output.rindex("]") + 1
      personas_data = json.loads(raw_output[start:end])
  except (ValueError, json.JSONDecodeError):
      personas_data = [{"name": "Extracted Persona", "description": raw_output[:500]}]
  print(f"Extracted {len(personas_data)} persona segments")

  # 3. Create Mavera personas
  created = []
  for p in personas_data[:6]:
      desc = (f"Role: {p.get('role', 'N/A')}. Industry: {p.get('industry', 'N/A')}. "
          f"Pain points: {', '.join(p.get('pain_points', [])[:3])}. "
          f"Style: {p.get('communication_style', 'professional')}. Budget: {p.get('budget_range', 'N/A')}.")
      mv = requests.post(f"{MV_BASE}/personas", headers=MV_H, json={
          "name": p.get("name", "Document Persona"), "description": desc,
      }).json()
      created.append({"id": mv["id"], "name": p.get("name")})
      print(f"  Created: {p.get('name')} → {mv['id']}")
      time.sleep(0.3)

  print(f"\nPersonas created: {len(created)}")
  for c in created:
      print(f"  • {c['name']} ({c['id']})")
  ```

  ```javascript JavaScript theme={"dark"}
  import Anthropic from "@anthropic-ai/sdk";
  import fs from "fs";

  const MV = process.env.MAVERA_API_KEY;
  const MV_BASE = "https://app.mavera.io/api/v1";
  const MV_H = { Authorization: `Bearer ${MV}`, "Content-Type": "application/json" };
  const client = new Anthropic();

  const documentText = fs.readFileSync("acme_annual_report_2025.txt", "utf-8");

  const analysis = await client.messages.create({
    model: "claude-opus-4-6-20250725", max_tokens: 4096,
    input: [{ role: "user",
      content: `Market intelligence analyst. Extract: 1. Audience Segments 2. Pain Points `
        + `3. Decision Criteria 4. Comms Preferences 5. Budget Indicators\n`
        + `Return JSON: [{name, role, industry, pain_points, decision_criteria, communication_style, budget_range}]\n\n`
        + `DOCUMENT:\n${documentText}` }],
  });
  const rawOutput = analysis.content[0].text;
  console.log(`Complete — ${analysis.usage.input_tokens.toLocaleString()} input tokens`);

  let personasData;
  try {
    personasData = JSON.parse(rawOutput.slice(rawOutput.indexOf("["), rawOutput.lastIndexOf("]") + 1));
  } catch { personasData = [{ name: "Extracted Persona", description: rawOutput.slice(0, 500) }]; }

  const created = [];
  for (const p of personasData.slice(0, 6)) {
    const desc = `Role: ${p.role || "N/A"}. Pain points: ${(p.pain_points || []).slice(0, 3).join(", ")}. `
      + `Style: ${p.communication_style || "professional"}. Budget: ${p.budget_range || "N/A"}.`;
    const mv = await fetch(`${MV_BASE}/personas`, { method: "POST", headers: MV_H,
      body: JSON.stringify({ name: p.name || "Document Persona", description: desc }),
    }).then(r => r.json());
    created.push({ id: mv.id, name: p.name });
    console.log(`  Created: ${p.name} → ${mv.id}`);
    await new Promise(r => setTimeout(r, 300));
  }
  console.log(`Personas created: ${created.length}`);
  ```
</CodeGroup>

## Example Output

```text theme={"dark"}
Document loaded: 847,293 chars (~211,823 tokens)
Claude complete — 211,823 input, 2,847 output tokens
Extracted 4 persona segments
  Created: VP of Operations → per_8a3f2c
  Created: Chief Financial Officer → per_1b7d4e
  Created: IT Director (Mid-Market) → per_9c2a1f
  Created: Procurement Manager → per_4d8e3b
```

## Error Handling

<AccordionGroup>
  <Accordion title="Token limits">Claude Opus 4.6 supports 1M input tokens. A 500-page annual report is \~200K tokens — well within limits. For documents exceeding 1M tokens, split at natural boundaries (chapters, sections) and run multiple calls.</Accordion>
  <Accordion title="Rate limits and retries">Tier 1 allows 50 req/min. This job uses 1 Claude call + up to 6 Mavera calls. The `anthropic` SDK retries 429 errors automatically with exponential backoff. For raw HTTP, add retry logic with 30-60s waits.</Accordion>
  <Accordion title="JSON parsing failures">Claude may wrap JSON in markdown code fences. The substring extraction handles this. If parsing still fails, retry with a system prompt that says "return raw JSON, no markdown."</Accordion>
</AccordionGroup>
