How to Build a Competitor Intelligence System Using Web Scrapers
A practical guide to building automated competitor monitoring — pricing, job postings, content, and review tracking
The actor referenced in this article is live on Apify. Pay only for results delivered.
Knowing what competitors are doing is not optional at any stage of building a company. The question is whether you are tracking this manually (expensive in human time) or systematically (cheap at scale). Web scrapers make systematic competitive intelligence tractable for teams of any size.
TL;DR: Automate competitive intelligence across 5 dimensions: pricing page change detection (hash-compare daily), job posting analysis (hiring spikes reveal product roadmap 6-12 months out), content/SEO tracking (blog topics show positioning), Reddit social listening (real customer complaints and feature requests), and Google Trends brand momentum. Full pipeline runs for a few dollars per month.
This guide covers the five intelligence dimensions most worth automating, with the tools and code to run each.
Dimension 1: Pricing Intelligence
Competitor pricing pages are public and usually structured enough to extract reliably. Changes to pricing — new tiers, removed features, price increases — are high-signal competitive events.
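```python
import hashlib
import json
from pathlib import Path

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Placeholder storage: one JSON file mapping url -> {"hash", "markdown"}.
# Swap these three helpers for your own database and alerting.
STORE = Path('pricing_pages.json')

def load_stored_page(url):
    return json.loads(STORE.read_text()).get(url) if STORE.exists() else None

def save_page(record):
    data = json.loads(STORE.read_text()) if STORE.exists() else {}
    data[record['url']] = record
    STORE.write_text(json.dumps(data))

def send_alert(message):
    print(message)  # e.g. post to Slack or send an email instead

# Crawl competitor pricing pages
run = client.actor('themineworks/rag-crawler').call(run_input={
    'startUrls': [
        {'url': 'https://competitor-a.com/pricing'},
        {'url': 'https://competitor-b.com/pricing'},
        {'url': 'https://competitor-c.com/pricing'},
    ],
    'maxPages': 1,  # if the actor applies this to the whole run, raise it to cover all start URLs
    'renderJs': True,
    'outputFormat': 'markdown',
})

for page in client.dataset(run['defaultDatasetId']).iterate_items():
    content_hash = hashlib.md5(page['markdown'].encode()).hexdigest()
    # Compare to the hash saved on the previous run
    stored = load_stored_page(page['url'])
    if stored and stored['hash'] != content_hash:
        send_alert(f"Pricing page changed: {page['url']}")
    save_page({'url': page['url'], 'hash': content_hash, 'markdown': page['markdown']})
```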
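Run this daily. A changed hash means something on the page changed, not necessarily the prices (a new banner or an updated footer date is enough), so review the diff against the stored markdown to identify what actually changed.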
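Reviewing the diff can be partly automated with the standard library. The sketch below assumes you kept the previous run's markdown (the `save_page` record above stores it) and uses `difflib.unified_diff`, keeping only added or removed lines that mention a currency symbol or a per-month/per-user price. The function name `pricing_diff` and the filter regex are illustrative choices, not part of the crawler.

```python
import difflib
import re

# Heuristic filter: lines mentioning a currency symbol or a per-month/per-user price
PRICE_LINE = re.compile(r'[$€£]|\bper (?:month|user|seat)\b|/mo\b', re.IGNORECASE)

def pricing_diff(old_markdown: str, new_markdown: str) -> list[str]:
    """Return added/removed lines between two page snapshots that look price-related."""
    diff = list(difflib.unified_diff(
        old_markdown.splitlines(),
        new_markdown.splitlines(),
        lineterm='',
    ))
    changes = []
    for line in diff[2:]:  # the first two lines are the ---/+++ file headers
        if line.startswith('@@'):
            continue  # hunk marker
        if line.startswith(('+', '-')) and PRICE_LINE.search(line):
            changes.append(line)
    return changes
```

A copy change such as "Contact us" becoming "Contact sales" is filtered out, while "$10 per month" becoming "$12 per month" shows up as a `-`/`+` pair.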
Dimension 2: Job Posting Intelligence
What your competitors are hiring for reveals their product roadmap 6-12 months ahead. A surge in ML engineer postings signals an AI feature push. New growth marketing hires signal expansion into a new channel.
```python
import re
from collections import Counter

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Track competitor hiring using ATS Jobs
run = client.actor('themineworks/ats-jobs').call(run_input={
    'companies': [
        'competitor-a-slug',
        'competitor-b-slug',
        'competitor-c-slug',
    ],
    'maxJobsPerCompany': 100,
    'includeDescription': True,
})

SKILLS = ['machine learning', 'llm', 'genai', 'rust', 'go', 'react native']
# Whole-word patterns so 'rust' does not match 'trust' and 'go' does not match 'good'.
# ('go' can still match the English verb, so treat that count as noisy.)
SKILL_PATTERNS = {s: re.compile(rf'\b{re.escape(s)}\b') for s in SKILLS}

department_counts = Counter()
skill_mentions = Counter()
for job in client.dataset(run['defaultDatasetId']).iterate_items():
    company = job['company_slug']
    department_counts[f"{company}/{job.get('department') or 'unknown'}"] += 1
    desc = (job.get('description_plain') or '').lower()
    for skill, pattern in SKILL_PATTERNS.items():
        if pattern.search(desc):
            skill_mentions[f"{company}/{skill}"] += 1

print("Department distribution:", department_counts.most_common(20))
print("Skill mentions:", skill_mentions.most_common(20))
```
Track this weekly. A single run only shows the roles open today, so compare each week's counts with earlier snapshots: when a competitor's open engineering roles in one area spike, that is where they are building.
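Spotting a spike means comparing snapshots. The sketch below assumes you save each week's `department_counts` (a `Counter` keyed by `company/department`, as in the code above) and flags keys whose count both at least doubled and rose by a minimum absolute amount. The function name `hiring_spikes` and the `ratio` and `min_increase` defaults are illustrative, not calibrated values.

```python
from collections import Counter

def hiring_spikes(previous: Counter, current: Counter,
                  ratio: float = 2.0, min_increase: int = 5) -> list[tuple[str, int, int]]:
    """Return (key, before, after) for keys whose open-role count jumped sharply."""
    spikes = []
    for key, after in current.items():
        before = previous.get(key, 0)
        # Require both an absolute and a relative jump, so 1 -> 2 is not flagged
        if after - before >= min_increase and after >= ratio * max(before, 1):
            spikes.append((key, before, after))
    # Largest absolute increase first
    return sorted(spikes, key=lambda s: s[2] - s[1], reverse=True)
```

Requiring both conditions keeps small teams from tripping the alert on one new posting, while a jump from 3 to 12 open roles in one department is reported.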
Dimension 3: Content and SEO Intelligence
Which topics are competitors publishing about, and how often? Content strategy reveals product positioning and growth strategy. A crawl shows what they publish; how well those posts rank requires search-ranking data from a separate source.
```python
import re

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Crawl competitor blog indexes
run = client.actor('themineworks/rag-crawler').call(run_input={
    'startUrls': [
        {'url': 'https://competitor-a.com/blog'},
    ],
    'maxPages': 50,
    'renderJs': True,
    'outputFormat': 'markdown',
    'includeUrlPatterns': ['*/blog/*'],
})

# Extract post topics: use the first level-1 heading as the post title
topics = []
for page in client.dataset(run['defaultDatasetId']).iterate_items():
    h1 = re.search(r'^# (.+)$', page['markdown'], re.MULTILINE)
    if h1:
        topics.append({
            'url': page['url'],
            'title': h1.group(1).strip(),
            'word_count': len(page['markdown'].split()),
        })

# The crawl stops at maxPages, so this counts crawled posts, not the full archive
print(f"Crawled {len(topics)} posts with a title")
# URL order is not publication order, so this is only a sample of titles
print("Sample titles:", [t['title'] for t in topics[:5]])
```
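Because URL order says nothing about publication date, one simple way to find new posts is to compare this crawl's URLs with the previous crawl's. A minimal sketch, assuming you persist the `topics` list from each run (storage is left to you); the helper names are hypothetical. Since the crawl is capped at `maxPages`, an older post that fell outside last week's crawl can occasionally show up as "new".

```python
def _normalize(url: str) -> str:
    # Drop any #fragment and trailing slash so trivially different URLs compare equal
    return url.split('#', 1)[0].rstrip('/')

def new_posts(previous_topics: list[dict], current_topics: list[dict]) -> list[dict]:
    """Return posts from this crawl whose URL was not seen in the previous crawl."""
    seen = {_normalize(t['url']) for t in previous_topics}
    return [t for t in current_topics if _normalize(t['url']) not in seen]
```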
Dimension 4: Social Listening
What are customers saying about competitors on Reddit? What complaints are surfacing? What features are being requested?
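```python
import re

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Monitor competitor mentions on Reddit
run = client.actor('themineworks/reddit-scraper').call(run_input={
    'mode': 'search',
    'searchQuery': '"competitor-name" OR "competitor-a" OR "@competitor_handle"',
    'subreddits': ['your_niche', 'software', 'entrepreneur', 'startups'],
    'sortBy': 'new',
    'maxPosts': 200,
    'includeComments': True,
    'timeFilter': 'week',
})

complaint_keywords = ['bug', 'broken', 'stopped working', 'terrible', 'cancelled', 'switching to']
feature_request_keywords = ['wish', 'would be great', 'need', 'missing', 'please add']

def keyword_pattern(keywords):
    # Whole-word/phrase match, so 'bug' does not fire on 'debug'
    return re.compile(r'\b(?:' + '|'.join(map(re.escape, keywords)) + r')\b')

complaint_re = keyword_pattern(complaint_keywords)
feature_re = keyword_pattern(feature_request_keywords)

complaints = []
feature_requests = []
for post in client.dataset(run['defaultDatasetId']).iterate_items():
    # Only the post title and body are checked; fetched comments are not scanned here
    text = f"{post.get('title') or ''} {post.get('selftext') or ''}".lower()
    if complaint_re.search(text):
        complaints.append(post)
    if feature_re.search(text):
        feature_requests.append(post)

print(f"{len(complaints)} complaint posts, {len(feature_requests)} feature-request posts")
```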
This surfaces real customer pain points: things people wish competitors did differently. These are your product opportunities, though keyword matching is crude, so read the matched posts before acting on them.
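To see which themes dominate, rather than just how many posts matched, you can count how many posts mention each keyword. A minimal sketch; the function name `keyword_themes` is hypothetical, and it counts each keyword at most once per post.

```python
import re
from collections import Counter

def keyword_themes(texts: list[str], keywords: list[str]) -> Counter:
    """Count how many texts mention each keyword (whole-word, case-insensitive)."""
    patterns = {kw: re.compile(rf'\b{re.escape(kw)}\b', re.IGNORECASE) for kw in keywords}
    themes = Counter()
    for text in texts:
        for kw, pattern in patterns.items():
            if pattern.search(text):
                themes[kw] += 1
    return themes
```

For example, `keyword_themes([f"{p.get('title') or ''} {p.get('selftext') or ''}" for p in complaints], complaint_keywords).most_common()` ranks this week's complaint themes; saving that ranking each week shows which themes are growing.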
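Dimension 5: Brand Momentum
Search interest is a rough proxy for brand momentum: Google Trends shows whether interest in each competitor is rising or falling relative to your own brand. Review sites such as G2, Product Hunt, and app stores publish structured reviews, which can add review volume and sentiment to this picture over time.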
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Track Google Trends for competitor brand names
run = client.actor('themineworks/google-trends-pro').call(run_input={
    'keywords': ['competitor-a', 'competitor-b', 'your-brand'],
    'timeframe': 'today 12-m',
    'geo': 'US',
})

def average(points):
    return sum(p['value'] for p in points) / len(points) if points else 0.0

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    iot = item['interest_over_time']  # weekly points for a 12-month timeframe
    recent_avg = average(iot[-8:])    # last 8 weeks (~2 months)
    older_avg = average(iot[-26:-8])  # 9-26 weeks ago (~2-6 months)
    growth = (recent_avg - older_avg) / max(older_avg, 1) * 100
    print(f"{item['keyword']}: {growth:+.0f}% trend (last 8 weeks vs 9-26 weeks ago)")
```
Putting It Together: A Weekly Intelligence Digest
Schedule the five components (pricing checks daily, the rest weekly) and compile the output into a structured weekly report:
- Pricing changes: List any competitor pricing pages that changed
- Hiring signals: Departments and skills seeing notable growth
- Content published: New posts from competitors with topic analysis
- Community sentiment: Top complaints and feature requests from social listening
- Brand momentum: Google Trends trajectory for each competitor vs your brand
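The report itself can be plain text. A minimal sketch, assuming each component has already reduced its output to a list of short strings; the section names mirror the list above, and the `findings` dictionary shape and `build_digest` name are illustrative choices.

```python
from datetime import date

SECTIONS = [
    'Pricing changes',
    'Hiring signals',
    'Content published',
    'Community sentiment',
    'Brand momentum',
]

def build_digest(findings: dict[str, list[str]], week_of: date) -> str:
    """Render a plain-text weekly digest; empty sections say so explicitly."""
    lines = [f"Competitor intelligence digest, week of {week_of.isoformat()}", ""]
    for section in SECTIONS:
        lines.append(f"{section}:")
        items = findings.get(section, [])
        if items:
            lines.extend(f"  - {item}" for item in items)
        else:
            lines.append("  - No notable changes")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Printing "No notable changes" instead of dropping empty sections makes it obvious that a component ran and found nothing, rather than silently failing.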
This information, delivered weekly to your product and marketing teams, replaces dozens of hours of manual research with an automated pipeline that runs for a few dollars per month.
Frequently Asked Questions
How do you detect when a competitor changes their pricing page?
Fetch the pricing page daily, extract the pricing-relevant content (strip navigation, footer, and boilerplate), and store an MD5 hash. When the hash changes, diff the old and new versions to identify what changed. Focus on sections containing currency symbols, plan names, and feature bullets. A Python script using the Apify RAG Crawler plus a simple hash comparison catches pricing changes within 24 hours of them going live.
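The "extract pricing-relevant content" step can be approximated with a line filter applied before hashing, so that a footer date or cookie-banner change does not trigger an alert. A sketch assuming the page is available as markdown (as the RAG Crawler example returns it); the `pricing_fingerprint` name, the currency regex, and the plan-name list are illustrative and will need tuning per competitor.

```python
import hashlib
import re

CURRENCY = re.compile(r'[$€£]\s?\d')

def pricing_fingerprint(markdown: str, plan_names: list[str]) -> str:
    """MD5 of only the lines that look pricing-related, whitespace-normalized."""
    plans = [p.lower() for p in plan_names]
    kept = []
    for line in markdown.splitlines():
        text = ' '.join(line.split())  # collapse runs of whitespace
        lowered = text.lower()
        is_bullet = text.startswith(('- ', '* '))  # likely a feature bullet
        if CURRENCY.search(text) or is_bullet or any(p in lowered for p in plans):
            kept.append(text)
    return hashlib.md5('\n'.join(kept).encode()).hexdigest()
```

Storing this fingerprint instead of a hash of the whole page trades a little recall (a change in an unfiltered line is missed) for far fewer false alarms.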
What does competitor hiring data reveal about product strategy?
Job posting patterns are a 6-12 month leading indicator of product direction. A company posting 5+ ML engineer roles is building internal AI infrastructure. Multiple new sales roles in a specific geography signals expansion into that market. A sudden cluster of data engineering postings means they are building a data product. Reading job descriptions carefully reveals tech stack choices, integration priorities, and customer segments — all of which inform where the market is heading.
How do you monitor competitor brand mentions on Reddit at scale?
Use the Apify Reddit Scraper to search for your competitor’s brand name and product names across all relevant subreddits weekly. Classify each mention by sentiment and intent using Claude Haiku — this costs under $0.05 per 100 posts. Track complaint themes over time: recurring complaints about a competitor’s product are opportunities to position against. Rising positive mentions of a competitor after a specific date often signal a successful launch or feature release.
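Tracking complaint themes over time needs a time axis. A minimal sketch that buckets posts by ISO week, assuming each scraped post carries a Unix timestamp in a `created_utc` field (the name Reddit's own API uses; check what the scraper actually returns). The function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

def mentions_per_week(posts: list[dict]) -> Counter:
    """Count posts per ISO week, keyed like '2024-W05'."""
    weeks = Counter()
    for post in posts:
        ts = post.get('created_utc')
        if ts is None:
            continue  # skip posts without a timestamp
        year, week, _ = datetime.fromtimestamp(float(ts), tz=timezone.utc).isocalendar()
        weeks[f"{year}-W{week:02d}"] += 1
    return weeks
```

Running it separately on complaints and on positive mentions gives two weekly series; a step change after a given week is the kind of launch signal described above.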
What is the most actionable competitive intelligence signal for a product team?
Hiring velocity changes are the most actionable signal: they reflect internal decisions made months earlier and predict product direction before anything is public. A competitor posting 10+ engineering roles in one month when they previously posted 2-3 is a significant signal. Combine hiring velocity with the specific roles (frontend vs. backend vs. ML vs. sales) to infer which product areas they are investing in.
How much does it cost to run an automated competitor intelligence system?
A full pipeline monitoring 5 competitors weekly (pricing change detection, job posting analysis, Reddit sentiment tracking, and Google Trends brand momentum) costs approximately $3-8/month in Apify credits at pay-per-event (PPE) rates, plus under $1 in Claude API calls for classification and summarization. The total is under $10/month for intelligence that would otherwise require 10+ hours of manual research weekly.