The Mine Works
Browse on Apify
Build a Reddit Intelligence Agent with Claude and the Reddit Scraper
← All posts
tutorial September 8, 2025 · 9 min read

Build a Reddit Intelligence Agent with Claude and the Reddit Scraper

How to combine Apify's Reddit Scraper with Claude to build an autonomous brand monitoring agent, sentiment analysis pipeline

Try the scraper

The actor referenced in this article is live on Apify. Pay only for results delivered.

Open on Apify →

Reddit is where developers, early adopters, and power users talk candidly about products, competitors, and pain points — without PR polish. A brand that monitors Reddit in real time has intelligence that most of its competitors lack. This guide shows you how to build that system using Apify’s Reddit Scraper paired with Claude as the reasoning layer.

TL;DR: Build a Reddit brand intelligence agent by combining the Apify Reddit Scraper (handles auth and pagination) with Claude (classifies sentiment, intent, urgency, and action-required in one pass). Process posts in parallel batches of 5, generate weekly narrative reports via a synthesis prompt, and fire same-day alerts for posts with urgency ≥ 4 or engagement > 500 upvotes. Total cost: under $1/week.

The architecture is straightforward: the scraper handles data collection (auth, pagination, rate limiting), Claude handles interpretation (sentiment, entity extraction, actionable summaries), and Python glues them together. The result is an agent that runs on a schedule, reads Reddit so you don’t have to, and emails you a weekly digest of everything worth knowing.

What You Will Build

By the end of this guide you will have:

  1. A Reddit data pipeline that scrapes posts and comments from any subreddit or keyword search
  2. A Claude-powered sentiment and theme classifier
  3. A scheduled weekly report generator that summarizes findings in plain English
  4. An alert system that fires immediately when a high-signal post is detected

Prerequisites

pip install apify-client anthropic python-dotenv
APIFY_TOKEN=your_apify_token_here
ANTHROPIC_API_KEY=your_anthropic_key_here

Step 1: Collect Reddit Data

The Reddit Scraper handles the OAuth complexity. You specify subreddits or search queries, it returns clean JSON:

from apify_client import ApifyClient
import os

apify = ApifyClient(os.environ["APIFY_TOKEN"])

def fetch_reddit_data(subreddits: list[str], keywords: list[str], max_posts: int = 100) -> list[dict]:
    run = apify.actor("themineworks/reddit-scraper").call(run_input={
        "subreddits": subreddits,
        "searchQuery": " OR ".join(keywords) if keywords else None,
        "maxPostsPerSubreddit": max_posts,
        "maxCommentsPerPost": 20,
        "sortBy": "new",
        "includeComments": True,
    })
    
    posts = []
    for item in apify.dataset(run["defaultDatasetId"]).iterate_items():
        posts.append(item)
    
    return posts

This returns posts with full metadata: title, body, score, comment count, flair, author, timestamp, and the top comments nested inside.

Step 2: Classify Sentiment and Intent with Claude

Rather than running a separate sentiment model, use Claude to classify and extract structure in a single pass. Claude understands context that keyword-based classifiers miss — sarcasm, indirect complaints, feature requests buried inside praise.

import anthropic
import json

claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def classify_post(post: dict) -> dict:
    """Classify a single Reddit post using Claude."""
    
    # Build the text to analyze
    content = f"Title: {post['title']}\n\nBody: {post.get('body', '')[:1000]}"
    if post.get("comments"):
        top_comments = "\n".join([c["body"][:200] for c in post["comments"][:5]])
        content += f"\n\nTop comments:\n{top_comments}"
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Analyze this Reddit post and return a JSON object with:
- sentiment: "positive" | "negative" | "neutral" | "mixed"
- intent: "complaint" | "praise" | "question" | "discussion" | "feature_request" | "comparison" | "news"
- urgency: 1-5 (5 = needs immediate attention)
- key_entities: list of products/companies/people mentioned
- summary: one sentence summary
- action_required: true/false (does this need a human response?)

Post:
{content}

Return only valid JSON, no explanation."""
        }]
    )
    
    try:
        result = json.loads(response.content[0].text)
        result["post_id"] = post["id"]
        result["url"] = post["url"]
        result["score"] = post["score"]
        result["title"] = post["title"]
        return result
    except json.JSONDecodeError:
        return {"post_id": post["id"], "error": "parse_failed"}

Step 3: Batch Processing with Rate Control

Processing 100 posts one at a time is slow. Batch them in groups of 10 while staying within Claude’s rate limits:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def classify_posts_batch(posts: list[dict], max_workers: int = 5) -> list[dict]:
    results = []
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(classify_post, post): post for post in posts}
        
        for future in as_completed(futures):
            try:
                result = future.result()
                results.append(result)
            except Exception as e:
                print(f"Classification failed: {e}")
    
    # Sort by urgency descending
    results.sort(key=lambda x: x.get("urgency", 0), reverse=True)
    return results

Step 4: Generate the Weekly Intelligence Report

This is where Claude earns its place. Rather than dumping a spreadsheet of classified posts, ask Claude to synthesize the week’s data into an executive brief:

def generate_weekly_report(classified_posts: list[dict], brand_name: str) -> str:
    """Generate a narrative intelligence report from a week of Reddit data."""
    
    # Summarize the dataset for Claude's context
    total = len(classified_posts)
    sentiment_counts = {}
    intent_counts = {}
    high_urgency = []
    
    for p in classified_posts:
        s = p.get("sentiment", "unknown")
        i = p.get("intent", "unknown")
        sentiment_counts[s] = sentiment_counts.get(s, 0) + 1
        intent_counts[i] = intent_counts.get(i, 0) + 1
        if p.get("urgency", 0) >= 4:
            high_urgency.append(p)
    
    # Build the context block
    summaries = "\n".join([
        f"- [{p['sentiment'].upper()}] {p['title']} (score: {p['score']}, urgency: {p['urgency']}): {p.get('summary', '')}"
        for p in classified_posts[:50]
    ])
    
    urgent_items = "\n".join([
        f"- {p['title']}\n  URL: {p['url']}\n  Summary: {p.get('summary', '')}"
        for p in high_urgency[:10]
    ])
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""You are a brand intelligence analyst. Write a weekly Reddit intelligence report for {brand_name}.

DATA SUMMARY:
- Total posts analyzed: {total}
- Sentiment breakdown: {sentiment_counts}
- Intent breakdown: {intent_counts}
- High-urgency posts (4-5/5): {len(high_urgency)}

POST SUMMARIES:
{summaries}

HIGH-URGENCY POSTS REQUIRING ATTENTION:
{urgent_items}

Write a professional intelligence report covering:
1. Executive summary (3-4 sentences)
2. Sentiment trend analysis
3. Top themes and recurring complaints/praise
4. Competitive intelligence (mentions of competitors)
5. Feature requests worth considering
6. Recommended actions this week

Write in plain, direct prose. No bullet-point lists. Be specific and cite examples from the data."""
        }]
    )
    
    return response.content[0].text

Step 5: Real-Time Alert System

Not everything can wait for the weekly report. A post going viral with a complaint, a competitor comparison gaining traction, or a customer asking a question that hints at churn — these need same-day awareness.

def check_for_alerts(classified_posts: list[dict], alert_threshold: int = 4) -> list[dict]:
    """Return posts that need immediate human attention."""
    alerts = []
    
    for post in classified_posts:
        reasons = []
        
        if post.get("urgency", 0) >= alert_threshold:
            reasons.append(f"urgency score {post['urgency']}/5")
        
        if post.get("score", 0) > 500:
            reasons.append(f"high engagement ({post['score']} upvotes)")
        
        if post.get("action_required"):
            reasons.append("action required")
        
        if post.get("intent") == "complaint" and post.get("score", 0) > 50:
            reasons.append("complaint gaining traction")
        
        if reasons:
            alerts.append({
                **post,
                "alert_reasons": reasons
            })
    
    return alerts

Step 6: Putting It Together

def run_reddit_intelligence_pipeline(
    brand_name: str,
    subreddits: list[str],
    keywords: list[str],
):
    print(f"Fetching Reddit data for {brand_name}...")
    posts = fetch_reddit_data(subreddits, keywords, max_posts=100)
    print(f"  {len(posts)} posts fetched")
    
    print("Classifying with Claude...")
    classified = classify_posts_batch(posts, max_workers=5)
    print(f"  {len(classified)} posts classified")
    
    # Check for immediate alerts
    alerts = check_for_alerts(classified)
    if alerts:
        print(f"\n ALERTS ({len(alerts)} items need attention):")
        for alert in alerts:
            print(f"  - {alert['title']}")
            print(f"    Reasons: {', '.join(alert['alert_reasons'])}")
            print(f"    URL: {alert['url']}")
    
    # Generate weekly report
    print("\nGenerating weekly intelligence report...")
    report = generate_weekly_report(classified, brand_name)
    
    # Save report
    report_path = f"reddit_report_{brand_name.lower().replace(' ', '_')}.txt"
    with open(report_path, "w") as f:
        f.write(report)
    
    print(f"\nReport saved to {report_path}")
    return classified, report

# Run it
classified, report = run_reddit_intelligence_pipeline(
    brand_name="Your Product Name",
    subreddits=["SaaS", "webdev", "datascience", "Python"],
    keywords=["your-product-name", "competitor-name"],
)

Real-World Use Cases

SaaS Product Teams. Monitor subreddits in your category weekly. Every complaint is a bug report or UX issue that your support tickets are missing. Every competitor comparison is a sales intelligence signal.

Investor Due Diligence. Before investing in or acquiring a company, run 6 months of Reddit history through this pipeline. Authentic user opinion is a leading indicator that PR-managed review sites cannot fake.

Content Marketing. The feature request and discussion posts tell you exactly what questions your audience has. Each one is a blog post or documentation page that will rank because it answers a real question.

Competitive Intelligence. Set keywords to your top three competitors. When a negative post gains traction, you have a window to reach out to the poster and offer an alternative.

Cost Estimate

Running this pipeline weekly on 100 posts costs roughly:

  • Apify Reddit Scraper: ~$0.50 (PPE billing, ~500 posts/comments at $0.001 each)
  • Claude API: ~$0.30 (50 classification calls + 1 report synthesis at Sonnet pricing)
  • Total: under $1/week for continuous brand intelligence

The return on that investment is knowing about problems before they become crises, and opportunities before your competitors see them.

Frequently Asked Questions

Why is Claude better than keyword-based classifiers for Reddit sentiment analysis?

Keyword classifiers miss context, sarcasm, and domain-specific language. “This tool is literally broken” — keyword classifiers flag “broken” as negative; Claude understands the full sentence means the same thing but also processes “I can’t believe how good this is” without misclassifying “can’t” as negative. For developer communities in particular, where understatement and sarcasm are common, LLM classification achieves 20-30% higher accuracy than VADER or keyword-based sentiment models.

What fields does Claude extract when classifying a Reddit post?

The recommended schema includes: sentiment (positive/negative/neutral/mixed), intent (complaint/praise/question/comparison/mention), urgency (1-5 scale), response_recommended (boolean), suggested_response_type (acknowledge/support/clarify/thank/none), and key_issue (one sentence for complaints/questions, null otherwise). Extracting all six fields in a single API call costs the same as extracting one — batch the classification rather than making separate calls per field.

How do you handle rate limits when classifying hundreds of Reddit posts with Claude?

Process posts in parallel batches of 5-10, not one at a time. Use asyncio with a semaphore limiting concurrent API calls to 10. Claude Haiku can handle 400,000 tokens per minute on the standard tier — at ~200 tokens per post classification, that is 2,000 posts per minute before hitting rate limits. For weekly brand monitoring (typically 200-500 posts), rate limits are not a practical constraint. For larger volumes, implement exponential backoff on 429 responses.

What triggers an immediate alert in a Reddit brand monitoring system?

Two conditions warrant same-day alerts: (1) urgency ≥ 4 on Claude’s 1-5 scale — posts indicating a serious complaint, outage report, or reputational threat; (2) engagement > 500 upvotes on any mention — high-engagement posts spread rapidly and require fast response to shape the narrative. Configure your alert pipeline to send these directly to Slack or email rather than waiting for the weekly digest. Everything below these thresholds can go into the weekly summary report.

What is the typical weekly cost of running an automated Reddit intelligence pipeline?

A pipeline monitoring 3-5 brand/competitor keywords across 10 relevant subreddits — collecting ~500 posts/week, classifying with Claude Haiku, and generating a weekly narrative report with Claude Sonnet — costs approximately $0.50-1.00/week. The Apify Reddit Scraper at PPE rates costs $0.30-0.50 for 500 posts. Claude Haiku classification for 500 posts at ~150 tokens each is under $0.05. The weekly Sonnet report synthesis is under $0.10. Under $1/week total.

Related Actor

Try the scraper referenced in this article — live on Apify, pay only for results.

Open reddit-scraper on Apify →