Build a Reddit Intelligence Agent with Claude and the Reddit Scraper
How to combine Apify's Reddit Scraper with Claude to build an autonomous brand monitoring agent, sentiment analysis pipeline
The actor referenced in this article is live on Apify. Pay only for results delivered.
Reddit is where developers, early adopters, and power users talk candidly about products, competitors, and pain points — without PR polish. A brand that monitors Reddit in real time has intelligence that most of its competitors lack. This guide shows you how to build that system using Apify’s Reddit Scraper paired with Claude as the reasoning layer.
TL;DR: Build a Reddit brand intelligence agent by combining the Apify Reddit Scraper (handles auth and pagination) with Claude (classifies sentiment, intent, urgency, and action-required in one pass). Process posts in parallel batches of 5, generate weekly narrative reports via a synthesis prompt, and fire same-day alerts for posts with urgency ≥ 4 or engagement > 500 upvotes. Total cost: under $1/week.
The architecture is straightforward: the scraper handles data collection (auth, pagination, rate limiting), Claude handles interpretation (sentiment, entity extraction, actionable summaries), and Python glues them together. The result is an agent that runs on a schedule, reads Reddit so you don’t have to, and emails you a weekly digest of everything worth knowing.
What You Will Build
By the end of this guide you will have:
- A Reddit data pipeline that scrapes posts and comments from any subreddit or keyword search
- A Claude-powered sentiment and theme classifier
- A scheduled weekly report generator that summarizes findings in plain English
- An alert system that fires immediately when a high-signal post is detected
Prerequisites
pip install apify-client anthropic python-dotenv
APIFY_TOKEN=your_apify_token_here
ANTHROPIC_API_KEY=your_anthropic_key_here
Step 1: Collect Reddit Data
The Reddit Scraper handles the OAuth complexity. You specify subreddits or search queries, it returns clean JSON:
from apify_client import ApifyClient
import os
apify = ApifyClient(os.environ["APIFY_TOKEN"])
def fetch_reddit_data(subreddits: list[str], keywords: list[str], max_posts: int = 100) -> list[dict]:
run = apify.actor("themineworks/reddit-scraper").call(run_input={
"subreddits": subreddits,
"searchQuery": " OR ".join(keywords) if keywords else None,
"maxPostsPerSubreddit": max_posts,
"maxCommentsPerPost": 20,
"sortBy": "new",
"includeComments": True,
})
posts = []
for item in apify.dataset(run["defaultDatasetId"]).iterate_items():
posts.append(item)
return posts
This returns posts with full metadata: title, body, score, comment count, flair, author, timestamp, and the top comments nested inside.
Step 2: Classify Sentiment and Intent with Claude
Rather than running a separate sentiment model, use Claude to classify and extract structure in a single pass. Claude understands context that keyword-based classifiers miss — sarcasm, indirect complaints, feature requests buried inside praise.
import anthropic
import json
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
def classify_post(post: dict) -> dict:
"""Classify a single Reddit post using Claude."""
# Build the text to analyze
content = f"Title: {post['title']}\n\nBody: {post.get('body', '')[:1000]}"
if post.get("comments"):
top_comments = "\n".join([c["body"][:200] for c in post["comments"][:5]])
content += f"\n\nTop comments:\n{top_comments}"
response = claude.messages.create(
model="claude-sonnet-4-6",
max_tokens=500,
messages=[{
"role": "user",
"content": f"""Analyze this Reddit post and return a JSON object with:
- sentiment: "positive" | "negative" | "neutral" | "mixed"
- intent: "complaint" | "praise" | "question" | "discussion" | "feature_request" | "comparison" | "news"
- urgency: 1-5 (5 = needs immediate attention)
- key_entities: list of products/companies/people mentioned
- summary: one sentence summary
- action_required: true/false (does this need a human response?)
Post:
{content}
Return only valid JSON, no explanation."""
}]
)
try:
result = json.loads(response.content[0].text)
result["post_id"] = post["id"]
result["url"] = post["url"]
result["score"] = post["score"]
result["title"] = post["title"]
return result
except json.JSONDecodeError:
return {"post_id": post["id"], "error": "parse_failed"}
Step 3: Batch Processing with Rate Control
Processing 100 posts one at a time is slow. Batch them in groups of 10 while staying within Claude’s rate limits:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
def classify_posts_batch(posts: list[dict], max_workers: int = 5) -> list[dict]:
results = []
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {executor.submit(classify_post, post): post for post in posts}
for future in as_completed(futures):
try:
result = future.result()
results.append(result)
except Exception as e:
print(f"Classification failed: {e}")
# Sort by urgency descending
results.sort(key=lambda x: x.get("urgency", 0), reverse=True)
return results
Step 4: Generate the Weekly Intelligence Report
This is where Claude earns its place. Rather than dumping a spreadsheet of classified posts, ask Claude to synthesize the week’s data into an executive brief:
def generate_weekly_report(classified_posts: list[dict], brand_name: str) -> str:
"""Generate a narrative intelligence report from a week of Reddit data."""
# Summarize the dataset for Claude's context
total = len(classified_posts)
sentiment_counts = {}
intent_counts = {}
high_urgency = []
for p in classified_posts:
s = p.get("sentiment", "unknown")
i = p.get("intent", "unknown")
sentiment_counts[s] = sentiment_counts.get(s, 0) + 1
intent_counts[i] = intent_counts.get(i, 0) + 1
if p.get("urgency", 0) >= 4:
high_urgency.append(p)
# Build the context block
summaries = "\n".join([
f"- [{p['sentiment'].upper()}] {p['title']} (score: {p['score']}, urgency: {p['urgency']}): {p.get('summary', '')}"
for p in classified_posts[:50]
])
urgent_items = "\n".join([
f"- {p['title']}\n URL: {p['url']}\n Summary: {p.get('summary', '')}"
for p in high_urgency[:10]
])
response = claude.messages.create(
model="claude-sonnet-4-6",
max_tokens=2000,
messages=[{
"role": "user",
"content": f"""You are a brand intelligence analyst. Write a weekly Reddit intelligence report for {brand_name}.
DATA SUMMARY:
- Total posts analyzed: {total}
- Sentiment breakdown: {sentiment_counts}
- Intent breakdown: {intent_counts}
- High-urgency posts (4-5/5): {len(high_urgency)}
POST SUMMARIES:
{summaries}
HIGH-URGENCY POSTS REQUIRING ATTENTION:
{urgent_items}
Write a professional intelligence report covering:
1. Executive summary (3-4 sentences)
2. Sentiment trend analysis
3. Top themes and recurring complaints/praise
4. Competitive intelligence (mentions of competitors)
5. Feature requests worth considering
6. Recommended actions this week
Write in plain, direct prose. No bullet-point lists. Be specific and cite examples from the data."""
}]
)
return response.content[0].text
Step 5: Real-Time Alert System
Not everything can wait for the weekly report. A post going viral with a complaint, a competitor comparison gaining traction, or a customer asking a question that hints at churn — these need same-day awareness.
def check_for_alerts(classified_posts: list[dict], alert_threshold: int = 4) -> list[dict]:
"""Return posts that need immediate human attention."""
alerts = []
for post in classified_posts:
reasons = []
if post.get("urgency", 0) >= alert_threshold:
reasons.append(f"urgency score {post['urgency']}/5")
if post.get("score", 0) > 500:
reasons.append(f"high engagement ({post['score']} upvotes)")
if post.get("action_required"):
reasons.append("action required")
if post.get("intent") == "complaint" and post.get("score", 0) > 50:
reasons.append("complaint gaining traction")
if reasons:
alerts.append({
**post,
"alert_reasons": reasons
})
return alerts
Step 6: Putting It Together
def run_reddit_intelligence_pipeline(
brand_name: str,
subreddits: list[str],
keywords: list[str],
):
print(f"Fetching Reddit data for {brand_name}...")
posts = fetch_reddit_data(subreddits, keywords, max_posts=100)
print(f" {len(posts)} posts fetched")
print("Classifying with Claude...")
classified = classify_posts_batch(posts, max_workers=5)
print(f" {len(classified)} posts classified")
# Check for immediate alerts
alerts = check_for_alerts(classified)
if alerts:
print(f"\n ALERTS ({len(alerts)} items need attention):")
for alert in alerts:
print(f" - {alert['title']}")
print(f" Reasons: {', '.join(alert['alert_reasons'])}")
print(f" URL: {alert['url']}")
# Generate weekly report
print("\nGenerating weekly intelligence report...")
report = generate_weekly_report(classified, brand_name)
# Save report
report_path = f"reddit_report_{brand_name.lower().replace(' ', '_')}.txt"
with open(report_path, "w") as f:
f.write(report)
print(f"\nReport saved to {report_path}")
return classified, report
# Run it
classified, report = run_reddit_intelligence_pipeline(
brand_name="Your Product Name",
subreddits=["SaaS", "webdev", "datascience", "Python"],
keywords=["your-product-name", "competitor-name"],
)
Real-World Use Cases
SaaS Product Teams. Monitor subreddits in your category weekly. Every complaint is a bug report or UX issue that your support tickets are missing. Every competitor comparison is a sales intelligence signal.
Investor Due Diligence. Before investing in or acquiring a company, run 6 months of Reddit history through this pipeline. Authentic user opinion is a leading indicator that PR-managed review sites cannot fake.
Content Marketing. The feature request and discussion posts tell you exactly what questions your audience has. Each one is a blog post or documentation page that will rank because it answers a real question.
Competitive Intelligence. Set keywords to your top three competitors. When a negative post gains traction, you have a window to reach out to the poster and offer an alternative.
Cost Estimate
Running this pipeline weekly on 100 posts costs roughly:
- Apify Reddit Scraper: ~$0.50 (PPE billing, ~500 posts/comments at $0.001 each)
- Claude API: ~$0.30 (50 classification calls + 1 report synthesis at Sonnet pricing)
- Total: under $1/week for continuous brand intelligence
The return on that investment is knowing about problems before they become crises, and opportunities before your competitors see them.
Frequently Asked Questions
Why is Claude better than keyword-based classifiers for Reddit sentiment analysis?
Keyword classifiers miss context, sarcasm, and domain-specific language. “This tool is literally broken” — keyword classifiers flag “broken” as negative; Claude understands the full sentence means the same thing but also processes “I can’t believe how good this is” without misclassifying “can’t” as negative. For developer communities in particular, where understatement and sarcasm are common, LLM classification achieves 20-30% higher accuracy than VADER or keyword-based sentiment models.
What fields does Claude extract when classifying a Reddit post?
The recommended schema includes: sentiment (positive/negative/neutral/mixed), intent (complaint/praise/question/comparison/mention), urgency (1-5 scale), response_recommended (boolean), suggested_response_type (acknowledge/support/clarify/thank/none), and key_issue (one sentence for complaints/questions, null otherwise). Extracting all six fields in a single API call costs the same as extracting one — batch the classification rather than making separate calls per field.
How do you handle rate limits when classifying hundreds of Reddit posts with Claude?
Process posts in parallel batches of 5-10, not one at a time. Use asyncio with a semaphore limiting concurrent API calls to 10. Claude Haiku can handle 400,000 tokens per minute on the standard tier — at ~200 tokens per post classification, that is 2,000 posts per minute before hitting rate limits. For weekly brand monitoring (typically 200-500 posts), rate limits are not a practical constraint. For larger volumes, implement exponential backoff on 429 responses.
What triggers an immediate alert in a Reddit brand monitoring system?
Two conditions warrant same-day alerts: (1) urgency ≥ 4 on Claude’s 1-5 scale — posts indicating a serious complaint, outage report, or reputational threat; (2) engagement > 500 upvotes on any mention — high-engagement posts spread rapidly and require fast response to shape the narrative. Configure your alert pipeline to send these directly to Slack or email rather than waiting for the weekly digest. Everything below these thresholds can go into the weekly summary report.
What is the typical weekly cost of running an automated Reddit intelligence pipeline?
A pipeline monitoring 3-5 brand/competitor keywords across 10 relevant subreddits — collecting ~500 posts/week, classifying with Claude Haiku, and generating a weekly narrative report with Claude Sonnet — costs approximately $0.50-1.00/week. The Apify Reddit Scraper at PPE rates costs $0.30-0.50 for 500 posts. Claude Haiku classification for 500 posts at ~150 tokens each is under $0.05. The weekly Sonnet report synthesis is under $0.10. Under $1/week total.
Try the scraper referenced in this article — live on Apify, pay only for results.
Open reddit-scraper on Apify →How to Scrape AmbitionBox Company Reviews and Ratings
AmbitionBox is India largest employer review platform with 300,000 companies. Learn how to pull ratings, review counts, salary data, and dimension scores as structured JSON without any official API.
AliExpress Product Data API: Prices, Ratings, and Orders in Python
AliExpress affiliate API has restricted coverage. Learn how to scrape AliExpress product listings for prices, ratings, order counts, and seller data as structured JSON — no affiliate approval needed.
ClinicalTrials.gov API v2: How to Search 500,000 Studies and Track Trial Status
ClinicalTrials.gov upgraded to a v2 REST API in 2024. Here is how to use it, what changed from v1, and how to build automated trial monitoring pipelines in Python.