Build a Talent Intelligence System with Claude and ATS Job Scrapers
tutorial September 22, 2025 · 10 min read

Combine Greenhouse, Lever, and Ashby job data with Claude to automate candidate sourcing research, salary benchmarking, and skills gap analysis.

Try the scraper

The actor referenced in this article is live on Apify. Pay only for results delivered.

Open on Apify →

Greenhouse, Lever, and Ashby all expose public job board APIs with zero authentication required. That means any company using these ATSs is broadcasting their open roles, hiring velocity, required skills, and team growth patterns to anyone who knows where to look.

TL;DR: Build a talent intelligence system combining ATS public APIs with Claude: score job fit against a candidate profile (Claude reasons about culture signals and unstated requirements, not just keyword overlap), analyze competitor hiring patterns (department breakdowns reveal product roadmap), extract skills trends from job descriptions, and alert only on new roles above a fit threshold. A developer using this system found their ideal role 4 hours after it posted.

Most people don’t look. This guide shows you how to build a system that does — using the ATS Jobs scraper to collect the raw data and Claude to turn it into actionable intelligence.
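For context, here is a minimal sketch of what querying those public job boards directly could look like, without the scraper. The URL templates are assumptions based on each vendor's public job-board API documentation and should be verified before you rely on them; `job_board_url` and `fetch_raw` are hypothetical helper names, not part of the actor.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed public, unauthenticated job-board endpoints (verify against vendor docs).
ENDPOINT_TEMPLATES = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{slug}/jobs?content=true",
    "lever": "https://api.lever.co/v0/postings/{slug}?mode=json",
    "ashby": "https://api.ashbyhq.com/posting-api/job-board/{slug}",
}


def job_board_url(platform: str, slug: str) -> str:
    """Build the job-board URL for one company on one platform."""
    try:
        template = ENDPOINT_TEMPLATES[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform: {platform!r}") from None
    # Percent-encode the slug so unusual characters cannot alter the path
    return template.format(slug=quote(slug, safe=""))


def fetch_raw(platform: str, slug: str, timeout: float = 30.0):
    """Fetch and decode the raw JSON payload for one company (network call)."""
    with urlopen(job_board_url(platform, slug), timeout=timeout) as response:
        return json.load(response)
```

Each platform returns a differently shaped payload, which is the main reason to let a single scraper normalize them.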

What This System Enables

For job seekers: Automatically monitor target companies for new roles, get personalized fit analysis for each posting, and receive daily alerts for matches above a threshold score.

For recruiters and talent teams: Track competitor hiring to understand their growth areas, identify skill gaps in their team, and find candidates who recently left companies in your category.

For market researchers: Map hiring trends across an industry, identify which skills are becoming table stakes, and detect technology adoption signals embedded in job descriptions.

Setup

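pip install apify-client anthropic python-dotenv

from apify_client import ApifyClient
import anthropic
from dotenv import load_dotenv
import json
import os

# Load APIFY_TOKEN and ANTHROPIC_API_KEY from a local .env file, if one exists
load_dotenv()

apify = ApifyClient(os.environ["APIFY_TOKEN"])
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])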

Step 1: Collect Jobs from Multiple ATSs

The ATS Jobs scraper handles all three platforms through the same actor. The helper below starts one run per company slug and tags each job with its source:

def fetch_jobs(company_slugs: dict, max_jobs_per_company: int = 50) -> list[dict]:
    """
    company_slugs format:
    {
        "greenhouse": ["stripe", "airbnb", "figma"],
        "lever": ["linear", "vercel"],
        "ashby": ["notion", "loom"]
    }
    """
    all_jobs = []
    
    for platform, slugs in company_slugs.items():
        for slug in slugs:
            run = apify.actor("themineworks/ats-jobs").call(run_input={
                "platform": platform,
                "companySlug": slug,
                "maxJobs": max_jobs_per_company,
            })
            
            for job in apify.dataset(run["defaultDatasetId"]).iterate_items():
                job["_company"] = slug
                job["_platform"] = platform
                all_jobs.append(job)
    
    return all_jobs
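Because a slug listed under the wrong platform is likely to come back with no jobs, it can help to check the result of `fetch_jobs` before scoring anything. The sketch below relies only on the `_platform` and `_company` tags added above; `count_jobs_by_source` and `empty_sources` are hypothetical helper names.

```python
from collections import Counter


def count_jobs_by_source(jobs: list[dict]) -> Counter:
    """Count postings per (platform, company) using the tags fetch_jobs adds."""
    return Counter((job.get("_platform"), job.get("_company")) for job in jobs)


def empty_sources(company_slugs: dict, jobs: list[dict]) -> list[tuple]:
    """List the (platform, slug) pairs that were requested but returned no jobs."""
    counts = count_jobs_by_source(jobs)
    return [
        (platform, slug)
        for platform, slugs in company_slugs.items()
        for slug in slugs
        if counts[(platform, slug)] == 0  # Counter returns 0 for unseen keys
    ]
```

An empty source usually means a typo in the slug, a company that moved to a different ATS, or simply no open roles.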

Pattern 1: Personalized Job Match Scoring

Rather than keyword-matching your resume against job descriptions, use Claude to reason about fit — including unstated requirements, culture signals, and growth trajectory.

def score_job_fit(jobs: list[dict], candidate_profile: str) -> list[dict]:
    """Score job postings against a candidate profile."""
    
    scored_jobs = []
    
    for job in jobs:
        # Build job summary
        job_text = f"""
Company: {job.get('_company')} ({job.get('_platform')})
Title: {job.get('title')}
Department: {job.get('department', 'N/A')}
Location: {job.get('location', 'N/A')}
Description excerpt: {job.get('description', '')[:2000]}
"""
        
        response = claude.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=400,
            messages=[{
                "role": "user",
                "content": f"""Score this job against the candidate profile. Return JSON only.

CANDIDATE PROFILE:
{candidate_profile}

JOB:
{job_text}

Return:
{{
  "fit_score": 1-10,
  "match_reasons": ["reason 1", "reason 2"],
  "gaps": ["gap 1", "gap 2"],
  "apply_recommendation": "strong_yes" | "yes" | "maybe" | "no",
  "standout_angle": "one sentence on how this candidate should position themselves for this role"
}}"""
            }]
        )
        
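        try:
            analysis = json.loads(response.content[0].text)
            score = int(analysis.get("fit_score", 0))
        except (json.JSONDecodeError, AttributeError, TypeError, ValueError):
            # Unparseable or malformed reply: keep the job but rank it last
            analysis, score = {}, 0
        
        # Always attach fit_analysis so downstream code can call .get() on it
        scored_jobs.append({**job, "fit_analysis": analysis, "fit_score": score})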
    
    return sorted(scored_jobs, key=lambda x: x["fit_score"], reverse=True)


# Example candidate profile
candidate = """
Senior software engineer with 6 years experience. 
Strong: Python, TypeScript, distributed systems, PostgreSQL, Redis.
Moderate: Kubernetes, AWS, React.
Looking for: Series A-C companies, technical leadership opportunity, data-heavy products.
Not interested in: pure frontend, enterprise sales tooling, crypto/web3.
"""

# Illustrative slugs only; list each company under the ATS it actually uses
companies = {
    "greenhouse": ["stripe", "linear", "notion"],
    "lever": ["vercel", "planetscale"],
}

jobs = fetch_jobs(companies, max_jobs_per_company=30)
scored = score_job_fit(jobs, candidate)

print(f"Top 5 matches out of {len(scored)} jobs:")
for job in scored[:5]:
    analysis = job.get("fit_analysis", {})
    print(f"\n{job['fit_score']}/10 — {job['title']} at {job['_company']}")
    print(f"  Recommendation: {analysis.get('apply_recommendation')}")
    print(f"  Angle: {analysis.get('standout_angle')}")

Pattern 2: Competitive Hiring Intelligence

Track what your competitors are hiring for. A company ramping up ML engineers is building a product feature. A company posting multiple sales roles in a new geography is expanding. These signals are public.

def analyze_competitor_hiring(competitor_slugs: list[str], platform: str = "greenhouse") -> str:
    """Analyze what competitors are hiring for and what it signals."""
    
    all_jobs = []
    for slug in competitor_slugs:
        run = apify.actor("themineworks/ats-jobs").call(run_input={
            "platform": platform,
            "companySlug": slug,
            "maxJobs": 100,
        })
        for job in apify.dataset(run["defaultDatasetId"]).iterate_items():
            job["_company"] = slug
            all_jobs.append(job)
    
    # Build department/role summary per company
    company_summaries = {}
    for job in all_jobs:
        company = job["_company"]
        if company not in company_summaries:
            company_summaries[company] = {"total_jobs": 0, "departments": {}, "titles": [], "locations": []}
        
        summary = company_summaries[company]
        summary["total_jobs"] += 1
        dept = job.get("department", "Unknown")
        summary["departments"][dept] = summary["departments"].get(dept, 0) + 1
        summary["titles"].append(job.get("title", ""))
        # The prompt asks about geographic patterns, so pass locations through as well
        summary["locations"].append(job.get("location", "N/A"))
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""You are a competitive intelligence analyst. Analyze this competitor hiring data.

HIRING DATA BY COMPANY:
{json.dumps(company_summaries, indent=2)}

Write a 500-word analysis covering:
1. Which company is growing fastest (by volume)
2. What each company's hiring pattern reveals about their product strategy
3. Skill sets and departments they're investing in most
4. Geographic patterns if visible in the titles/locations
5. What this collectively tells us about where the market is heading

Be specific. Inferences from job titles are acceptable — "3 new ML Platform engineers at Company X suggests they're building internal ML infrastructure rather than using off-the-shelf tools"."""
        }]
    )
    
    return response.content[0].text
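A single run shows only what is open today, while "ramping up" is a claim about change. One way to measure it, assuming you save the per-company `departments` counts from each run, is to diff two snapshots. The function name and the snapshot storage are assumptions for illustration.

```python
def department_velocity(previous: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Change in open-role count per department between two snapshots."""
    departments = set(previous) | set(current)
    deltas = {d: current.get(d, 0) - previous.get(d, 0) for d in departments}
    # Largest increases first; ties broken alphabetically for stable output
    return dict(sorted(deltas.items(), key=lambda item: (-item[1], item[0])))


# Example with made-up counts for one company, one week apart
last_week = {"Engineering": 4, "Sales": 2}
this_week = {"Engineering": 9, "Sales": 1, "Data": 3}
velocity = department_velocity(last_week, this_week)
```

A department that appears only in the newer snapshot shows up with its full count as the delta, which is often the most interesting signal.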

Pattern 3: Skills Trend Extraction

Job descriptions are a leading indicator of technology adoption. If 40% of senior backend roles this month mention vector databases, that technology is crossing the chasm. This pattern extracts skills from a sample of job descriptions (the code below caps each run at 100) and ranks them by frequency. To track trends over time, save each run's counts with a date and compare runs.

def extract_skills_trends(jobs: list[dict], role_filter: str = None) -> dict:
    """Extract and rank skills mentioned across job postings."""
    
    # Filter by role if specified
    if role_filter:
        jobs = [j for j in jobs if role_filter.lower() in j.get("title", "").lower()]
    
    # Batch job descriptions for Claude analysis
    batch_size = 10
    all_skills = {}
    
    for i in range(0, min(len(jobs), 100), batch_size):
        batch = jobs[i:i+batch_size]
        descriptions = "\n\n---\n\n".join([
            f"Role: {j.get('title')}\n{j.get('description', '')[:500]}"
            for j in batch
        ])
        
        response = claude.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Extract all technical skills, tools, frameworks, and methodologies mentioned in these job descriptions.

{descriptions}

Return a JSON object mapping each skill to the number of job descriptions it appeared in:
{{"Python": 7, "Kubernetes": 4, "PostgreSQL": 6, ...}}

Include programming languages, frameworks, databases, cloud platforms, methodologies (e.g. "distributed systems"), and domain knowledge (e.g. "ML/AI", "real-time data").
Return only valid JSON."""
            }]
        )
        
        try:
            batch_skills = json.loads(response.content[0].text)
            for skill, count in batch_skills.items():
                all_skills[skill] = all_skills.get(skill, 0) + count
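        except (json.JSONDecodeError, AttributeError):
            # Do not fail the whole run on one bad reply, but make the gap visible
            print(f"Skipping batch starting at job {i}: reply was not a JSON object")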
    
    # Sort by frequency
    return dict(sorted(all_skills.items(), key=lambda x: x[1], reverse=True))


def skills_trend_report(jobs: list[dict], target_roles: list[str]) -> str:
    """Generate a narrative report on skills trends across target roles."""
    
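    skills_by_role = {}
    for role in target_roles:
        # Include the number of postings analyzed so percentage thresholds can be computed
        matching = [j for j in jobs if role.lower() in j.get("title", "").lower()]
        skills_by_role[role] = {
            "postings_analyzed": min(len(matching), 100),
            "skill_counts": extract_skills_trends(jobs, role_filter=role),
        }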
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"""Analyze the skills landscape for these engineering roles.

SKILLS FREQUENCY BY ROLE:
{json.dumps(skills_by_role, indent=2)}

Write a technical hiring trends report covering:
1. Table stakes skills (mentioned in 60%+ of postings)
2. Differentiators (skills that appear frequently but aren't universal)
3. Emerging skills (mentioned less frequently but appear in forward-looking job descriptions)
4. Surprising absences or declining skills
5. Recommendations for a developer looking to maximize their market value

Be specific and practical. Developers reading this will use it to prioritize what to learn next."""
        }]
    )
    
    return response.content[0].text
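Two steps of this pattern can also be done deterministically before handing anything to Claude: merging spelling variants of the same skill, and converting counts into a share of postings so the "60%+" table-stakes rule is checkable. The sketch below assumes a small hand-maintained alias table; the aliases and function names are illustrative only.

```python
# Hypothetical alias table: lowercase variant -> canonical name
SKILL_ALIASES = {
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "k8s": "Kubernetes",
    "kubernetes": "Kubernetes",
}


def normalize_skill(name: str) -> str:
    """Map a skill name to its canonical spelling when an alias is known."""
    cleaned = name.strip()
    return SKILL_ALIASES.get(cleaned.lower(), cleaned)


def merge_skill_counts(counts: dict[str, int]) -> dict[str, int]:
    """Sum counts for skills that normalize to the same canonical name."""
    merged: dict[str, int] = {}
    for name, count in counts.items():
        canonical = normalize_skill(name)
        merged[canonical] = merged.get(canonical, 0) + count
    return merged


def table_stakes(counts: dict[str, int], total_postings: int, threshold: float = 0.6) -> list[str]:
    """Skills mentioned in at least `threshold` of postings, most common first."""
    if total_postings <= 0:
        return []
    merged = merge_skill_counts(counts)
    # Merging aliases can double-count a posting that uses both spellings,
    # so cap the share at 1.0
    shares = {skill: min(1.0, n / total_postings) for skill, n in merged.items()}
    qualifying = [skill for skill, share in shares.items() if share >= threshold]
    return sorted(qualifying, key=lambda skill: (-shares[skill], skill))
```

Passing `total_postings` explicitly matters: the raw counts alone do not say how many descriptions were analyzed.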

Pattern 4: Daily New Job Alert

Run this daily. It fetches the latest postings from your target list and alerts you only about roles that are genuinely new and above a relevance threshold.

import hashlib
from pathlib import Path

SEEN_JOBS_FILE = Path("seen_jobs.json")

def load_seen_jobs() -> set:
    if SEEN_JOBS_FILE.exists():
        return set(json.loads(SEEN_JOBS_FILE.read_text()))
    return set()

def save_seen_jobs(seen: set):
    SEEN_JOBS_FILE.write_text(json.dumps(list(seen)))

def daily_job_alert(target_companies: dict, candidate_profile: str, min_fit_score: int = 7):
    seen = load_seen_jobs()
    
    all_jobs = fetch_jobs(target_companies, max_jobs_per_company=50)
    
    # Filter to new jobs only
    new_jobs = []
    for job in all_jobs:
        job_id = hashlib.md5(f"{job.get('_company')}-{job.get('title')}-{job.get('id', '')}".encode()).hexdigest()
        if job_id not in seen:
            job["_id"] = job_id
            new_jobs.append(job)
    
    if not new_jobs:
        print("No new jobs since last check.")
        return
    
    print(f"{len(new_jobs)} new jobs found. Scoring against profile...")
    scored = score_job_fit(new_jobs, candidate_profile)
    
    # Filter to matches above threshold
    strong_matches = [j for j in scored if j.get("fit_score", 0) >= min_fit_score]
    
    if strong_matches:
        print(f"\n{len(strong_matches)} strong matches:")
        for job in strong_matches:
            print(f"\n  {job['fit_score']}/10 — {job['title']} at {job['_company']}")
            print(f"  {job['fit_analysis'].get('standout_angle')}")
            print(f"  URL: {job.get('hostedUrl', 'N/A')}")
    
    # Mark all new jobs as seen
    for job in new_jobs:
        seen.add(job["_id"])
    save_seen_jobs(seen)

Real-World Outcomes

One developer using this system described monitoring 12 target companies daily: “I found out about a staff engineer role at a company I really wanted to work at 4 hours after it posted. I applied same day. The hiring manager told me I was the first applicant and it set a different tone for the whole process.”

For a talent team at a Series B startup: “We ran competitor hiring analysis every Monday morning. When we saw two of our direct competitors both posting 5+ data engineering roles in the same month, we knew they were building a feature we hadn’t prioritized. We moved it up the roadmap.”

The data is public. The question is whether you read it manually, once a week, or whether you build a system that reads it for you, every day, and tells you only what matters.

Frequently Asked Questions

How does Claude improve job fit scoring compared to keyword matching on resumes?

Keyword matching produces false positives (a DevOps engineer who listed “Python” years ago matches “senior Python developer”) and false negatives (a strong candidate whose resume says “data pipelines” doesn’t match “ETL engineer”). Claude reasons about transferability, seniority signals embedded in job descriptions, culture fit indicators in the language (“we move fast” vs “enterprise environment”), and unstated requirements that experienced readers infer. In practice, Claude’s fit scores correlate more closely with interview conversion rates than keyword-match scores.
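To make those failure modes concrete, here is a deliberately naive keyword-overlap scorer of the kind being contrasted with Claude. It is a baseline for illustration only, and `keyword_overlap` is a hypothetical name.

```python
import re


def keyword_overlap(resume: str, posting: str) -> float:
    """Share of the posting's distinct words that also appear in the resume."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9+#]+", text.lower()))

    posting_words = words(posting)
    if not posting_words:
        return 0.0
    return len(posting_words & words(resume)) / len(posting_words)


# False negative: relevant experience, no shared vocabulary
miss = keyword_overlap("Built data pipelines in Airflow and Python", "ETL engineer")

# False positive: a single stale keyword still produces a match
hit = keyword_overlap("DevOps engineer; wrote some Python scripts in 2015", "Senior Python developer")
```

The first score is zero despite a clearly relevant background, while the second is nonzero purely because the word "Python" appears in both texts.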

What does competitor hiring data reveal about product strategy?

Job posting patterns lead product announcements by 6-12 months. A company posting ML Platform engineers is building internal AI infrastructure rather than using APIs. Multiple new data engineering roles signal a data product in development. A cluster of mobile engineers at a web-first company means a native app is coming. Sales roles in specific verticals reveal which customer segments they are prioritizing. Reading the full job descriptions rather than just titles provides even more signal — tech stack choices, integration priorities, and customer pain points are often described explicitly.

How do you avoid re-alerting on jobs you have already seen?

Hash each job using a deterministic ID built from the company slug, title, and the ATS job ID, as in md5(f"{company}-{title}-{id}"). Store seen IDs in a local file or SQLite table. On each daily run, compute IDs for all fetched jobs, filter to those not in the seen set, score only the new ones, and alert only on above-threshold matches. Add the new IDs to the seen set after alerting. This prevents the same role from triggering repeated alerts across multiple collection runs.
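For the SQLite option, a minimal sketch using only the standard library might look like the following. The table layout, file name, and the `SeenJobs` class are assumptions for illustration, not something the scraper provides.

```python
import sqlite3


class SeenJobs:
    """Tiny SQLite-backed store of job IDs that have already been processed."""

    def __init__(self, path: str = "seen_jobs.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "job_id TEXT PRIMARY KEY, "
            "first_seen TEXT DEFAULT CURRENT_TIMESTAMP)"
        )
        self.conn.commit()

    def filter_new(self, job_ids: list[str]) -> list[str]:
        """Return the IDs that are not yet in the store, preserving order."""
        new_ids = []
        for job_id in job_ids:
            row = self.conn.execute(
                "SELECT 1 FROM seen WHERE job_id = ?", (job_id,)
            ).fetchone()
            if row is None:
                new_ids.append(job_id)
        return new_ids

    def mark_seen(self, job_ids: list[str]) -> None:
        """Record IDs as seen; IDs already present are left untouched."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR IGNORE INTO seen (job_id) VALUES (?)",
                [(job_id,) for job_id in job_ids],
            )
```

Compared with rewriting a JSON file on every run, this keeps a first-seen timestamp per job, which is handy if you later want to measure how quickly you learn about new roles.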

How do you extract technology skills from job descriptions using Claude?

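Send job descriptions to Claude Haiku with a prompt asking for a JSON object mapping skill names to the number of descriptions that mention them. The code above truncates each description to 500 characters to control cost; raise that limit if requirements tend to appear late in the posting. Specify categories to extract: programming languages, frameworks, databases, cloud platforms, methodologies, and domain knowledge. Request normalized skill names (“postgres” → “PostgreSQL”, “k8s” → “Kubernetes”) so aggregations are accurate. Process in batches of 10 descriptions per API call to minimize cost: at 500 characters per description, a batch is roughly 1,400 to 1,500 input tokens plus up to 500 output tokens, so 1,000 descriptions is about 100 calls. Multiply those token counts by current Haiku pricing to estimate the bill.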
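The arithmetic behind a batch-cost estimate can be sketched as follows. It assumes a rough heuristic of about 4 characters per token and a fixed prompt overhead, both of which are approximations; prices are passed in as parameters because they change over time, so look them up rather than trusting any number hard-coded here.

```python
import math


def estimate_usage(
    n_descriptions: int,
    batch_size: int = 10,
    chars_per_description: int = 500,
    prompt_overhead_tokens: int = 150,  # assumed size of the instruction text
    max_output_tokens: int = 500,
    chars_per_token: float = 4.0,  # rough heuristic, not an exact tokenizer
) -> dict[str, int]:
    """Upper-bound token usage for batched skill extraction."""
    batches = math.ceil(n_descriptions / batch_size)
    input_per_batch = (
        math.ceil(batch_size * chars_per_description / chars_per_token)
        + prompt_overhead_tokens
    )
    return {
        "batches": batches,
        "input_tokens": batches * input_per_batch,
        "max_output_tokens": batches * max_output_tokens,
    }


def estimate_cost(usage: dict[str, int], usd_per_million_input: float, usd_per_million_output: float) -> float:
    """Convert token usage to dollars using caller-supplied prices."""
    return (
        usage["input_tokens"] / 1_000_000 * usd_per_million_input
        + usage["max_output_tokens"] / 1_000_000 * usd_per_million_output
    )
```

With the defaults, 1,000 descriptions come to 100 batches and about 140,000 input tokens; the output figure is a ceiling, since most replies will be shorter than `max_output_tokens`.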

What is the most valuable signal in ATS job posting data for competitive intelligence?

Department-level posting velocity is the most actionable signal — it directly reveals where a competitor is investing resources. A company adding 10 ML engineers over 3 months is building an AI capability; the job descriptions tell you which capability. Combine department velocity with title-level granularity: “ML Research Engineer” suggests foundational work while “ML Platform Engineer” suggests productionization. Track this weekly and the pattern across 6 months builds a clear picture of a competitor’s technical roadmap before any of it is public.
