Build a Talent Intelligence System with Claude and ATS Job Scrapers
Combine Greenhouse, Lever, and Ashby job data with Claude to automate candidate sourcing research, salary benchmarking, and skills gap analysis.
The actor referenced in this article is live on Apify. Pay only for results delivered.
Greenhouse, Lever, and Ashby all expose public job board APIs with zero authentication required. That means any company using these ATSs is broadcasting their open roles, hiring velocity, required skills, and team growth patterns to anyone who knows where to look.
TL;DR: Build a talent intelligence system combining ATS public APIs with Claude: score job fit against a candidate profile (Claude reasons about culture signals and unstated requirements, not just keyword overlap), analyze competitor hiring patterns (department breakdowns reveal product roadmap), extract skills trends from job descriptions, and alert only on new roles above a fit threshold. A developer using this system found their ideal role 4 hours after it posted.
Most people don’t look. This guide shows you how to build a system that does — using the ATS Jobs scraper to collect the raw data and Claude to turn it into actionable intelligence.
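To make the "public job board API" claim concrete, here is a minimal sketch of where that data lives. The URL templates reflect each vendor's public job-board endpoint as I understand it at the time of writing; treat the exact paths as assumptions and check the vendor documentation before relying on them. `board_url` and `fetch_board` are hypothetical helper names, not part of the scraper used later in this guide.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoints; verify against each vendor's current docs.
ENDPOINTS = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{slug}/jobs",
    "lever": "https://api.lever.co/v0/postings/{slug}?mode=json",
    "ashby": "https://api.ashbyhq.com/posting-api/job-board/{slug}",
}


def board_url(platform: str, slug: str) -> str:
    """Build the public job-board URL for one company slug."""
    try:
        template = ENDPOINTS[platform.lower()]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform!r}") from None
    # Percent-encode the slug so unusual characters cannot alter the path.
    return template.format(slug=urllib.parse.quote(slug, safe=""))


def fetch_board(platform: str, slug: str, timeout: float = 10.0):
    """Fetch and decode the raw JSON for one board (performs a network call)."""
    with urllib.request.urlopen(board_url(platform, slug), timeout=timeout) as resp:
        return json.load(resp)
```

Each platform returns a different JSON shape, which is the main reason to use a scraper that normalizes all three rather than calling the endpoints directly.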
What This System Enables
For job seekers: Automatically monitor target companies for new roles, get personalized fit analysis for each posting, and receive daily alerts for matches above a threshold score.
For recruiters and talent teams: Track competitor hiring to understand their growth areas, identify skill gaps in their team, and find candidates who recently left companies in your category.
For market researchers: Map hiring trends across an industry, identify which skills are becoming table stakes, and detect technology adoption signals embedded in job descriptions.
Setup
pip install apify-client anthropic python-dotenv
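from apify_client import ApifyClient
from dotenv import load_dotenv
import anthropic
import json
import os

load_dotenv()  # read APIFY_TOKEN and ANTHROPIC_API_KEY from a local .env file, if present
apify = ApifyClient(os.environ["APIFY_TOKEN"])
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])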
Step 1: Collect Jobs from Multiple ATSs
The ATS Jobs scraper handles all three platforms in a single call:
def fetch_jobs(company_slugs: dict, max_jobs_per_company: int = 50) -> list[dict]:
    """
    company_slugs format:
    {
        "greenhouse": ["stripe", "airbnb", "figma"],
        "lever": ["linear", "vercel"],
        "ashby": ["notion", "loom"]
    }
    """
    all_jobs = []
    for platform, slugs in company_slugs.items():
        for slug in slugs:
            # One actor run per company; results land in the run's default dataset
            run = apify.actor("themineworks/ats-jobs").call(run_input={
                "platform": platform,
                "companySlug": slug,
                "maxJobs": max_jobs_per_company,
            })
            for job in apify.dataset(run["defaultDatasetId"]).iterate_items():
                # Tag each job with its source so later steps can group by company
                job["_company"] = slug
                job["_platform"] = platform
                all_jobs.append(job)
    return all_jobs
Pattern 1: Personalized Job Match Scoring
Rather than keyword-matching your resume against job descriptions, use Claude to reason about fit — including unstated requirements, culture signals, and growth trajectory.
def score_job_fit(jobs: list[dict], candidate_profile: str) -> list[dict]:
    """Score job postings against a candidate profile."""
    scored_jobs = []
    for job in jobs:
        # Build job summary
        job_text = f"""
Company: {job.get('_company')} ({job.get('_platform')})
Title: {job.get('title')}
Department: {job.get('department', 'N/A')}
Location: {job.get('location', 'N/A')}
Description excerpt: {job.get('description', '')[:2000]}
"""
        response = claude.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=400,
            messages=[{
                "role": "user",
                "content": f"""Score this job against the candidate profile. Return JSON only.
CANDIDATE PROFILE:
{candidate_profile}
JOB:
{job_text}
Return:
{{
"fit_score": 1-10,
"match_reasons": ["reason 1", "reason 2"],
"gaps": ["gap 1", "gap 2"],
"apply_recommendation": "strong_yes" | "yes" | "maybe" | "no",
"standout_angle": "one sentence on how this candidate should position themselves for this role"
}}"""
            }]
        )
        try:
            analysis = json.loads(response.content[0].text)
            scored_jobs.append({
                **job,
                "fit_analysis": analysis,
                "fit_score": analysis.get("fit_score", 0),
            })
        except json.JSONDecodeError:
            # Unparseable reply: keep the job but rank it last
            scored_jobs.append({**job, "fit_score": 0})
    return sorted(scored_jobs, key=lambda x: x["fit_score"], reverse=True)
# Example candidate profile
candidate = """
Senior software engineer with 6 years experience.
Strong: Python, TypeScript, distributed systems, PostgreSQL, Redis.
Moderate: Kubernetes, AWS, React.
Looking for: Series A-C companies, technical leadership opportunity, data-heavy products.
Not interested in: pure frontend, enterprise sales tooling, crypto/web3.
"""
# Slugs are illustrative; confirm which ATS each company uses before running.
companies = {
    "greenhouse": ["stripe"],
    "lever": ["linear", "vercel", "planetscale"],
    "ashby": ["notion"],
}
jobs = fetch_jobs(companies, max_jobs_per_company=30)
scored = score_job_fit(jobs, candidate)
print(f"Top 5 matches out of {len(scored)} jobs:")
for job in scored[:5]:
    analysis = job.get("fit_analysis", {})
    print(f"\n{job['fit_score']}/10 - {job['title']} at {job['_company']}")
    print(f"  Recommendation: {analysis.get('apply_recommendation')}")
    print(f"  Angle: {analysis.get('standout_angle')}")
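The `json.loads(response.content[0].text)` call in `score_job_fit` assumes the reply is bare JSON. Models sometimes wrap JSON in a Markdown fence or add a sentence around it, and every such reply would be scored 0 by the `except` branch. The sketch below is one tolerant way to parse replies; `parse_json_reply` is a hypothetical helper, not part of the Anthropic SDK.

```python
import json
import re

# Built indirectly so this snippet can itself be shown inside a fenced block.
FENCE = "`" * 3
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_json_reply(text: str):
    """Parse a model reply expected to contain JSON; return None on failure."""
    candidate = text.strip()
    # Case 1: the whole reply is wrapped in a Markdown code fence.
    fenced = _FENCED.match(candidate)
    if fenced:
        candidate = fenced.group(1)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        pass
    # Case 2: JSON object embedded in surrounding prose; try the outermost braces.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(candidate[start:end + 1])
        except json.JSONDecodeError:
            return None
    return None
```

Swapping this in for the direct `json.loads` call means a `None` result, rather than an exception, signals an unusable reply.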
Pattern 2: Competitive Hiring Intelligence
Track what your competitors are hiring for. A company ramping up ML engineers is building a product feature. A company posting multiple sales roles in a new geography is expanding. These signals are public.
def analyze_competitor_hiring(competitor_slugs: list[str], platform: str = "greenhouse") -> str:
    """Analyze what competitors are hiring for and what it signals."""
    all_jobs = []
    for slug in competitor_slugs:
        run = apify.actor("themineworks/ats-jobs").call(run_input={
            "platform": platform,
            "companySlug": slug,
            "maxJobs": 100,
        })
        for job in apify.dataset(run["defaultDatasetId"]).iterate_items():
            job["_company"] = slug
            all_jobs.append(job)
    # Build department/role summary per company
    company_summaries = {}
    for job in all_jobs:
        summary = company_summaries.setdefault(
            job["_company"],
            {"total_jobs": 0, "departments": {}, "titles": [], "locations": []},
        )
        summary["total_jobs"] += 1
        dept = job.get("department", "Unknown")
        summary["departments"][dept] = summary["departments"].get(dept, 0) + 1
        summary["titles"].append(job.get("title", ""))
        # Locations are included because the prompt asks about geographic patterns
        summary["locations"].append(job.get("location", ""))
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""You are a competitive intelligence analyst. Analyze this competitor hiring data.
HIRING DATA BY COMPANY:
{json.dumps(company_summaries, indent=2)}
Write a 500-word analysis covering:
1. Which company is growing fastest (by volume)
2. What each company's hiring pattern reveals about their product strategy
3. Skill sets and departments they're investing in most
4. Geographic patterns if visible in the titles/locations
5. What this collectively tells us about where the market is heading
Be specific. Inferences from job titles are acceptable, for example: "3 new ML Platform engineers at Company X suggests they're building internal ML infrastructure rather than using off-the-shelf tools"."""
        }]
    )
    return response.content[0].text
Pattern 3: Skills Trend Extraction
Job descriptions are a leading indicator of technology adoption. If 40% of senior backend roles this month mention vector databases, that technology is crossing the chasm. This pattern extracts skills from a batch of job descriptions (the code caps each run at 100) and ranks them by frequency; rerunning it on a schedule and comparing the results shows how that frequency moves over time.
def extract_skills_trends(jobs: list[dict], role_filter: str = None) -> dict:
    """Extract and rank skills mentioned across job postings."""
    # Filter by role if specified
    if role_filter:
        jobs = [j for j in jobs if role_filter.lower() in j.get("title", "").lower()]
    # Batch job descriptions for Claude analysis
    batch_size = 10
    all_skills = {}
    for i in range(0, min(len(jobs), 100), batch_size):
        batch = jobs[i:i+batch_size]
        descriptions = "\n\n---\n\n".join([
            f"Role: {j.get('title')}\n{j.get('description', '')[:500]}"
            for j in batch
        ])
        response = claude.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Extract all technical skills, tools, frameworks, and methodologies mentioned in these job descriptions.
{descriptions}
Return a JSON object mapping each skill to the number of job descriptions it appeared in:
{{"Python": 7, "Kubernetes": 4, "PostgreSQL": 6, ...}}
Include programming languages, frameworks, databases, cloud platforms, methodologies (e.g. "distributed systems"), and domain knowledge (e.g. "ML/AI", "real-time data").
Return only valid JSON."""
            }]
        )
        try:
            batch_skills = json.loads(response.content[0].text)
            for skill, count in batch_skills.items():
                all_skills[skill] = all_skills.get(skill, 0) + count
        except json.JSONDecodeError:
            # Skip batches whose reply is not valid JSON
            pass
    # Sort by frequency
    return dict(sorted(all_skills.items(), key=lambda x: x[1], reverse=True))
def skills_trend_report(jobs: list[dict], target_roles: list[str]) -> str:
    """Generate a narrative report on skills trends across target roles."""
    skills_by_role = {}
    for role in target_roles:
        matching = [j for j in jobs if role.lower() in j.get("title", "").lower()]
        skills_by_role[role] = {
            # Raw counts need a denominator before "60%+ of postings" means anything;
            # extract_skills_trends analyzes at most 100 postings per call.
            "postings_analyzed": min(len(matching), 100),
            "skills": extract_skills_trends(matching),
        }
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"""Analyze the skills landscape for these engineering roles.
SKILLS FREQUENCY BY ROLE:
{json.dumps(skills_by_role, indent=2)}
Write a technical hiring trends report covering:
1. Table stakes skills (mentioned in 60%+ of postings)
2. Differentiators (skills that appear frequently but aren't universal)
3. Emerging skills (mentioned less frequently but appear in forward-looking job descriptions)
4. Surprising absences or declining skills
5. Recommendations for a developer looking to maximize their market value
Be specific and practical. Developers reading this will use it to prioritize what to learn next."""
        }]
    )
    return response.content[0].text
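The functions above produce a single snapshot of skill counts. To see a trend, two snapshots have to be put on the same scale and compared. The sketch below converts raw counts into the share of postings mentioning each skill and reports the percentage-point change between runs. `skill_shares` and `share_changes` are hypothetical helpers, and the numbers in the usage note are made up for illustration.

```python
def skill_shares(skill_counts: dict[str, int], postings: int) -> dict[str, float]:
    """Convert mention counts into the fraction of postings mentioning each skill."""
    if postings <= 0:
        return {}
    # Cap at 1.0 in case the model over-counts within a batch.
    return {skill: min(count / postings, 1.0) for skill, count in skill_counts.items()}


def share_changes(previous: dict[str, float], current: dict[str, float]) -> list[tuple[str, float]]:
    """Percentage-point change per skill between two snapshots, largest rise first."""
    skills = set(previous) | set(current)
    deltas = [
        (skill, round((current.get(skill, 0.0) - previous.get(skill, 0.0)) * 100, 1))
        for skill in skills
    ]
    # Sort by change descending, then by name so ties are deterministic.
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


# Illustrative (invented) counts from two monthly runs of 50 postings each.
last_month = skill_shares({"Python": 30, "Kubernetes": 10}, postings=50)
this_month = skill_shares({"Python": 33, "Kubernetes": 20, "pgvector": 5}, postings=50)
changes = share_changes(last_month, this_month)
```

Comparing shares rather than raw counts matters because the number of postings analyzed usually differs from run to run.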
Pattern 4: Daily New Job Alert
Run this daily. It fetches the latest postings from your target list and alerts you only about roles that are genuinely new and above a relevance threshold.
import hashlib
from pathlib import Path
SEEN_JOBS_FILE = Path("seen_jobs.json")
def load_seen_jobs() -> set:
    if SEEN_JOBS_FILE.exists():
        return set(json.loads(SEEN_JOBS_FILE.read_text()))
    return set()

def save_seen_jobs(seen: set):
    SEEN_JOBS_FILE.write_text(json.dumps(list(seen)))

def daily_job_alert(target_companies: dict, candidate_profile: str, min_fit_score: int = 7):
    seen = load_seen_jobs()
    all_jobs = fetch_jobs(target_companies, max_jobs_per_company=50)
    # Filter to new jobs only
    new_jobs = []
    for job in all_jobs:
        job_id = hashlib.md5(f"{job.get('_company')}-{job.get('title')}-{job.get('id', '')}".encode()).hexdigest()
        if job_id not in seen:
            job["_id"] = job_id
            new_jobs.append(job)
    if not new_jobs:
        print("No new jobs since last check.")
        return
    print(f"{len(new_jobs)} new jobs found. Scoring against profile...")
    scored = score_job_fit(new_jobs, candidate_profile)
    # Filter to matches above threshold
    strong_matches = [j for j in scored if j.get("fit_score", 0) >= min_fit_score]
    if strong_matches:
        print(f"\n{len(strong_matches)} strong matches:")
        for job in strong_matches:
            print(f"\n  {job['fit_score']}/10 - {job['title']} at {job['_company']}")
            print(f"  {job['fit_analysis'].get('standout_angle')}")
            print(f"  URL: {job.get('hostedUrl', 'N/A')}")
    # Mark all new jobs as seen, whether or not they cleared the threshold
    for job in new_jobs:
        seen.add(job["_id"])
    save_seen_jobs(seen)
Real-World Outcomes
One developer using this system described monitoring 12 target companies daily: “I found out about a staff engineer role at a company I really wanted to work at 4 hours after it posted. I applied same day. The hiring manager told me I was the first applicant and it set a different tone for the whole process.”
For a talent team at a Series B startup: “We ran competitor hiring analysis every Monday morning. When we saw two of our direct competitors both posting 5+ data engineering roles in the same month, we knew they were building a feature we hadn’t prioritized. We moved it up the roadmap.”
The data is public. The question is whether you read it manually, once a week, or whether you build a system that reads it for you, every day, and tells you only what matters.
Frequently Asked Questions
How does Claude improve job fit scoring compared to keyword matching on resumes?
Keyword matching produces false positives (a DevOps engineer who listed “Python” years ago matches “senior Python developer”) and false negatives (a strong candidate whose resume says “data pipelines” doesn’t match “ETL engineer”). Claude reasons about transferability, seniority signals embedded in job descriptions, culture fit indicators in the language (“we move fast” vs “enterprise environment”), and unstated requirements that experienced readers infer. In practice, Claude’s fit scores correlate more closely with interview conversion rates than keyword-match scores.
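The two failure modes of keyword matching are easy to reproduce. The sketch below is a deliberately naive scorer, written only to illustrate the problem; `keyword_overlap` is a hypothetical function and the resume and posting strings are invented.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase word tokens, keeping characters common in tech names (C++, C#)."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def keyword_overlap(resume: str, posting: str) -> float:
    """Fraction of the posting's distinct words that also appear in the resume."""
    posting_words = _words(posting)
    if not posting_words:
        return 0.0
    return len(posting_words & _words(resume)) / len(posting_words)


# False negative: relevant experience, zero shared words.
missed = keyword_overlap("Built data pipelines in Airflow", "ETL engineer")

# False positive: a single stale keyword produces a nonzero match.
spurious = keyword_overlap(
    "DevOps engineer; wrote some Python scripts in 2015",
    "Senior Python developer",
)
```

Here `missed` is zero even though data pipeline work is exactly what an ETL role involves, while `spurious` is positive on the strength of one word.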
What does competitor hiring data reveal about product strategy?
Job posting patterns lead product announcements by 6-12 months. A company posting ML Platform engineers is building internal AI infrastructure rather than using APIs. Multiple new data engineering roles signal a data product in development. A cluster of mobile engineers at a web-first company means a native app is coming. Sales roles in specific verticals reveal which customer segments they are prioritizing. Reading the full job descriptions rather than just titles provides even more signal — tech stack choices, integration priorities, and customer pain points are often described explicitly.
How do you avoid re-alerting on jobs you have already seen?
Hash each job into a deterministic ID built from the company slug, the title, and the job ID reported by the ATS: md5(f"{company}-{title}-{id}"), as in the daily alert code above. Including the ATS job ID keeps two distinct postings with the same title apart. Store seen IDs in a local file or SQLite table. On each daily run, compute IDs for all fetched jobs, filter to those not in the seen set, score only the new ones, and alert only on above-threshold matches. Add the new IDs to the seen set after alerting. This prevents the same role from triggering repeated alerts across multiple collection runs.
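For the SQLite option, a minimal sketch using only the standard library might look like the following. The table name, column names, and the helpers `open_seen_store` and `mark_if_new` are assumptions chosen for illustration.

```python
import sqlite3


def open_seen_store(path: str = "seen_jobs.db") -> sqlite3.Connection:
    """Open (or create) the store of job IDs that have already been processed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_jobs ("
        "job_id TEXT PRIMARY KEY, "
        "first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def mark_if_new(conn: sqlite3.Connection, job_id: str) -> bool:
    """Record job_id and return True only the first time it is seen."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO seen_jobs (job_id) VALUES (?)", (job_id,)
        )
    # rowcount is 1 when the row was inserted, 0 when the primary key already existed.
    return cursor.rowcount == 1


# In-memory database for demonstration; pass a file path to persist between runs.
store = open_seen_store(":memory:")
first_time = mark_if_new(store, "abc123")
second_time = mark_if_new(store, "abc123")
```

Note that `mark_if_new` records the ID immediately. To keep the "mark as seen after alerting" order described above, collect the new IDs first and call it only once the alerts have gone out.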
How do you extract technology skills from job descriptions using Claude?
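Send a batch of description excerpts to Claude Haiku with a prompt asking for a JSON object that maps each skill name to the number of descriptions mentioning it. The code above truncates each description to its first 500 characters, so skills listed further down a posting are missed; raise the limit if coverage matters more than cost. Specify categories to extract: programming languages, frameworks, databases, cloud platforms, methodologies, and domain knowledge. Ask for normalized skill names (“postgres” → “PostgreSQL”, “k8s” → “Kubernetes”) so aggregations are accurate; the prompt shown above does not yet include that instruction. Process 10 descriptions per API call to reduce per-call overhead. Token usage scales with excerpt length (ten 500-character excerpts already exceed 1,000 input tokens at a rough four characters per token), so measure real usage against current Haiku pricing before budgeting.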
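Even with a normalization instruction in the prompt, different batches can come back with different spellings of the same skill. A small post-processing step, sketched below, merges them before aggregation. The alias table is a tiny illustrative sample and `normalize_skill` and `merge_skill_counts` are hypothetical helpers.

```python
# Illustrative alias table; extend it with the variants seen in real output.
ALIASES = {
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "k8s": "Kubernetes",
    "kubernetes": "Kubernetes",
}


def normalize_skill(name: str) -> str:
    """Map a skill name to its canonical spelling, or return it trimmed if unknown."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def merge_skill_counts(batches: list[dict]) -> dict[str, int]:
    """Sum per-batch skill counts under canonical names, most frequent first."""
    totals: dict[str, int] = {}
    for batch in batches:
        for skill, count in batch.items():
            # Ignore malformed counts (strings, floats, booleans) from the model.
            if isinstance(count, bool) or not isinstance(count, int):
                continue
            canonical = normalize_skill(skill)
            totals[canonical] = totals.get(canonical, 0) + count
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


merged = merge_skill_counts([
    {"postgres": 3, "k8s": 2},
    {"PostgreSQL": 4, "Kubernetes": 1, "Python": "many"},
])
```

In this example the two spellings of each skill collapse into one entry and the malformed "Python" count is dropped rather than crashing the aggregation.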
What is the most valuable signal in ATS job posting data for competitive intelligence?
Department-level posting velocity is the most actionable signal — it directly reveals where a competitor is investing resources. A company adding 10 ML engineers over 3 months is building an AI capability; the job descriptions tell you which capability. Combine department velocity with title-level granularity: “ML Research Engineer” suggests foundational work while “ML Platform Engineer” suggests productionization. Track this weekly and the pattern across 6 months builds a clear picture of a competitor’s technical roadmap before any of it is public.
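Department velocity can be approximated by diffing two snapshots of a company's open postings. The sketch below assumes each job dict carries a "department" field, as the code earlier in this article does; `department_velocity` is a hypothetical helper and the sample snapshots are invented.

```python
from collections import Counter


def department_velocity(previous_jobs: list[dict], current_jobs: list[dict]) -> dict[str, int]:
    """Net change in open postings per department between two snapshots."""
    before = Counter(job.get("department") or "Unknown" for job in previous_jobs)
    after = Counter(job.get("department") or "Unknown" for job in current_jobs)
    # Counter returns 0 for missing keys, so departments that appear or vanish work too.
    return {
        dept: after[dept] - before[dept]
        for dept in sorted(set(before) | set(after))
        if after[dept] != before[dept]
    }


# Invented snapshots taken a week apart.
last_week = [{"department": "ML"}, {"department": "Sales"}]
this_week = [{"department": "ML"}, {"department": "ML"}, {"department": "ML"}, {"department": None}]
velocity = department_velocity(last_week, this_week)
```

This measures net change in open roles, so a filled or withdrawn posting offsets a new one. Tracking individual job IDs between snapshots would separate new postings from closures.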