Documentation Index

Fetch the complete documentation index at: https://docs.mavera.io/llms.txt

Use this file to discover all available pages before exploring further.
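
For example, a quick way to pull the index for inspection (a minimal sketch using requests):

import requests

index = requests.get("https://docs.mavera.io/llms.txt", timeout=30).text
print(index[:500])  # preview the available pages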

Scenario

Before sending video content to Mavera’s analysis pipeline, use GPT-4.1’s vision capability to describe keyframes. Extract frames from a video (one per 10 seconds), send each to GPT-4.1 vision for a description, then aggregate those descriptions and pass them to Mavera for marketing analysis — scene composition, brand alignment, emotional tone, and recommendations. Flow: Extract keyframes → OpenAI GPT-4.1 vision (describe frames) → aggregate → Mavera POST /mave/chat → Marketing analysis

Code

import os, requests, time, base64, glob
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
MV = os.environ["MAVERA_API_KEY"]
MV_BASE = "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

VIDEO_FILE = "product-demo.mp4"
FRAME_DIR = "frames"

# 1. Extract keyframes (one every 10 seconds)
os.makedirs(FRAME_DIR, exist_ok=True)
os.system(f'ffmpeg -i "{VIDEO_FILE}" -vf "fps=1/10" -q:v 2 {FRAME_DIR}/frame_%04d.jpg -y 2>/dev/null')
frames = sorted(glob.glob(f"{FRAME_DIR}/frame_*.jpg"))
print(f"Extracted {len(frames)} keyframes from {VIDEO_FILE}")

# 2. Describe each frame with GPT-4.1 vision (cap at 20 frames)
selected = frames[:20]
descriptions = []
for i, frame_path in enumerate(selected):
    with open(frame_path, "rb") as img:
        b64 = base64.b64encode(img.read()).decode()

    resp = client.responses.create(
        model="gpt-4.1",
        input=[{"role": "user", "content": [
            {"type": "input_text", "text": "Describe this video frame in 2-3 sentences. "
                "Focus on: subjects, actions, setting, colors, text overlays, branding, mood."},
            {"type": "input_image", "image_url": f"data:image/jpeg;base64,{b64}", "detail": "low"},
        ]}],
        max_output_tokens=200,
    )
    desc = resp.output_text
    descriptions.append(f"[{i * 10}s] {desc}")  # tag with the approximate timestamp
    print(f"Frame {i+1}/{len(selected)}: {desc[:80]}...")
    time.sleep(0.5)  # light pacing between vision calls

# 3. Aggregate and send to Mavera
frame_corpus = "\n\n".join(descriptions)
time.sleep(1)  # brief pause before the aggregate request

resp = requests.post(f"{MV_BASE}/mave/chat", headers=MV_H, json={
    "message": f"Video marketing analyst. Analyze these {len(descriptions)} frame descriptions.\n\n"
        f"FRAME DESCRIPTIONS:\n{frame_corpus[:10000]}\n\n"
        "Produce:\n"
        "1. **VISUAL NARRATIVE ARC** — Story across frames\n"
        "2. **BRAND CONSISTENCY** — Colors, logo, typography\n"
        "3. **EMOTIONAL PROGRESSION** — Mood shifts\n"
        "4. **KEY SCENES** — Most impactful frames\n"
        "5. **CONTENT GAPS** — Missing elements\n"
        "6. **RECOMMENDATIONS** — Improvements for conversion\n"
}, timeout=120)
resp.raise_for_status()
analysis = resp.json()

print(f"\n{'='*60}\nVIDEO MARKETING ANALYSIS\n{'='*60}")
print(analysis.get("content", "")[:4000])

Example Output

Extracted 18 keyframes, described 18 frames (4,271 chars)

VISUAL NARRATIVE ARC
Problem → solution → proof. Frames 1-4 set context, 5-10 show
the product, 11-16 results, 17-18 CTA.

BRAND CONSISTENCY
- Blue (#2563EB) and white consistent in 16/18 frames
- Logo visible in 11/18 frames

KEY SCENES
- Frame 5 [40s]: Dashboard reveal with live data
- Frame 12 [110s]: "340% pipeline growth" metric overlay

CONTENT GAPS
- No social proof overlay
- Missing pricing anchor before CTA

Notes

Each image consumes tokens based on detail level. Use detail: "low" (fixed 85 tokens) for keyframe descriptions. Switch to "high" only for frames requiring fine-grained text extraction.
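
As a rough budget check, a back-of-envelope sketch; the 85-token figure for low detail is from the note above, while PROMPT_TOKENS is an illustrative guess, not a measured value:

IMAGE_TOKENS_LOW = 85   # fixed cost per image at detail="low"
PROMPT_TOKENS = 40      # assumed size of the per-frame instruction
frames_sent = 20

total_input = frames_sent * (IMAGE_TOKENS_LOW + PROMPT_TOKENS)
print(f"~{total_input} input tokens for {frames_sent} low-detail frames")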
Install with brew install ffmpeg (macOS), apt install ffmpeg (Ubuntu), or choco install ffmpeg (Windows). Alternatively, use opencv-python to extract frames programmatically.
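
A minimal opencv-python sketch of the same extraction (assumes pip install opencv-python; writes one JPEG roughly every 10 seconds):

import cv2, os

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("product-demo.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30   # fall back if FPS metadata is missing
step = int(fps * 10)                    # one frame every ~10 seconds

idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        cv2.imwrite(f"frames/frame_{saved:04d}.jpg", frame)
        saved += 1
    idx += 1
cap.release()
print(f"Saved {saved} keyframes")

This decodes every frame, which is fine for short clips; for long videos, seeking with cap.set(cv2.CAP_PROP_POS_MSEC, ...) before each read is faster.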
A 10-minute video at 1 frame/10s yields 60 frames. Cap at 20 and increase the interval for longer videos. Adjust the ffmpeg filter to fps=1/30 for 30-minute+ content.
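
Instead of hard-coding the interval, one option is to derive it from the video's duration using ffprobe (ships with ffmpeg) and the frame cap, a minimal sketch:

import subprocess

MAX_FRAMES = 20  # cap used in the code above

duration = float(subprocess.check_output([
    "ffprobe", "-v", "error", "-show_entries", "format=duration",
    "-of", "default=noprint_wrappers=1:nokey=1", "product-demo.mp4",
]))
interval = max(10, int(duration / MAX_FRAMES))  # never sample finer than 10s
print(f'{duration:.0f}s video -> use -vf "fps=1/{interval}"')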