Recruitment Automation: Building a Job Intelligence Pipeline with Free ATS Data
use-case October 20, 2025 · 6 min read

How to use public Greenhouse, Lever, and Ashby APIs to build automated job monitoring, salary benchmarking, and skills demand tracking.

Try the scraper

The actor referenced in this article is live on Apify. Pay only for results delivered.

Open on Apify →

Recruitment teams spend hours each week manually researching which companies are hiring for which roles, what compensation ranges look like, and which skills are in demand. This is exactly the kind of research that structured data can automate — and public ATS APIs provide the data for free.

TL;DR: Build three recruitment automation tools from free public ATS APIs: a daily new-posting alert filtered by skill keywords, a salary benchmarking tool that uses Claude to extract salary ranges from job description text (most ATS APIs lack structured salary fields), and a weekly skills demand tracker using regex patterns across job descriptions. Real-time intelligence at the cost of a few API calls.

This guide shows how to build three practical recruitment automation tools using public job board APIs.

Tool 1: Candidate Pipeline Alert System

Monitor a curated list of target companies for new job openings in your specialty areas. Send a daily digest of new postings.

from apify_client import ApifyClient
from datetime import datetime, timedelta
import smtplib
from email.mime.text import MIMEText

client = ApifyClient('YOUR_API_TOKEN')

TARGET_COMPANIES = [
    'stripe', 'notion', 'linear', 'vercel', 'planetscale',
    'neon', 'supabase', 'railway', 'render', 'fly',
]

ALERT_KEYWORDS = [
    'machine learning', 'llm', 'generative ai', 'python',
    'data engineer', 'ml engineer',
]

def get_new_jobs(since_date: datetime) -> list[dict]:
    run = client.actor('themineworks/ats-jobs').call(run_input={
        'companies': TARGET_COMPANIES,
        'maxJobsPerCompany': 50,
        'includeDescription': True,
    })
    
    new_jobs = []
    for job in client.dataset(run['defaultDatasetId']).iterate_items():
        pub_date = job.get('published_at')
        if pub_date:
            try:
                # Normalize a trailing 'Z' so fromisoformat accepts it on older Python versions
                job_date = datetime.fromisoformat(pub_date.replace('Z', '+00:00'))
                if job_date.date() >= since_date.date():
                    # Check for relevant keywords
                    desc = (job.get('description_plain') or '').lower()
                    title = (job.get('title') or '').lower()
                    if any(kw in desc or kw in title for kw in ALERT_KEYWORDS):
                        new_jobs.append(job)
            except (ValueError, AttributeError):
                pass
    
    return new_jobs

# Run daily and send digest
yesterday = datetime.utcnow() - timedelta(days=1)
new_jobs = get_new_jobs(yesterday)

if new_jobs:
    body = f"Found {len(new_jobs)} relevant new postings:\n\n"
    for job in new_jobs[:20]:
        body += f"• {job['title']} at {job['company_slug'].upper()}\n"
        body += f"  {job.get('location', 'Location not specified')}"
        if job.get('is_remote'):
            body += " (Remote)"
        body += f"\n  {job['url']}\n\n"
    
    print(body)  # or send email
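
The snippet above imports smtplib and MIMEText but stops at print(body). A minimal sketch of the email step might look like the following. The function names, subject line, and SMTP settings are assumptions for illustration, not part of the original pipeline; swap in your own mail provider's host, port, and credentials.

```python
import smtplib
from email.mime.text import MIMEText


def build_digest_email(body: str, sender: str, recipient: str) -> MIMEText:
    """Wrap the plain-text digest in an email message (UTF-8 so bullets survive)."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = "Daily job posting digest"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def send_digest(msg: MIMEText, host: str, port: int, user: str, password: str) -> None:
    """Send the message over SMTP with STARTTLS. Server details are assumptions."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Building the message separately from sending it keeps the digest easy to test without a mail server: call build_digest_email(body, ...) first, inspect the result, then pass it to send_digest.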

Tool 2: Salary Benchmarking from Job Descriptions

Most ATS job postings lack a structured salary field, but many mention salary ranges within the job description text. LLM-based extraction can surface these ranges for benchmarking.

import re
import anthropic

claude = anthropic.Anthropic()

def extract_salary_from_description(description: str) -> dict | None:
    """Use Claude to extract salary information from job description text."""
    if not description or len(description) < 100:
        return None
    
    # Quick regex check — skip if no numeric patterns that look like salaries
    if not re.search(r'\$\d|€\d|£\d|\d+k\b|\d+,\d{3}', description, re.IGNORECASE):
        return None
    
    response = claude.messages.create(
        model='claude-haiku-4-5-20251001',
        max_tokens=200,
        messages=[{
            'role': 'user',
            'content': f"""Extract salary information from this job description. 
Return JSON: {{"min": number_or_null, "max": number_or_null, "currency": "USD/EUR/GBP/etc", "period": "annual/monthly/hourly"}}
Return null if no salary mentioned.

Description excerpt: {description[:500]}

JSON:"""
        }]
    )
    
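    import json  # local import so this function works when copied on its own

    try:
        return json.loads(response.content[0].text)
    except (json.JSONDecodeError, IndexError, AttributeError):
        # Model returned no content, or text that is not valid JSON
        return None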

# Collect and analyze
run = client.actor('themineworks/ats-jobs').call(run_input={
    'companies': ['stripe', 'notion', 'linear', 'vercel', 'figma'],
    'maxJobsPerCompany': 100,
    'includeDescription': True,
})

salary_data = []
for job in client.dataset(run['defaultDatasetId']).iterate_items():
    salary = extract_salary_from_description(job.get('description_plain', ''))
    if salary and salary.get('min'):
        salary_data.append({
            'company': job['company_slug'],
            'title': job['title'],
            'department': job.get('department'),
            'salary_min': salary['min'],
            'salary_max': salary.get('max'),
            'currency': salary.get('currency', 'USD'),
        })

# Summarize by department (averages mix currencies unless you filter by currency first)
import pandas as pd
df = pd.DataFrame(salary_data)
if not df.empty:
    print(df.groupby('department')[['salary_min', 'salary_max']].mean())
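
One caveat with the extraction above: the regex pre-filter scans the whole description, but only the first 500 characters are sent to Claude, and salary ranges often sit near the end of a posting. A possible refinement, sketched here under that assumption, is to send a window of text around the first salary-like match instead. The helper name salary_excerpt and the pattern SALARY_HINT are hypothetical.

```python
import re
from typing import Optional

# Currency symbol followed by a digit, "120k", or "120,000" (illustrative pattern)
SALARY_HINT = re.compile(r"[$€£]\s?\d|\b\d{2,3}k\b|\b\d{1,3},\d{3}\b", re.IGNORECASE)


def salary_excerpt(description: str, width: int = 500) -> Optional[str]:
    """Return up to `width` characters around the first salary-like match."""
    text = description or ""
    match = SALARY_HINT.search(text)
    if match is None:
        return None  # nothing salary-like, so skip the API call entirely
    # Start half a window before the match so context on both sides is kept
    start = max(0, match.start() - width // 2)
    return text[start:start + width]
```

The excerpt could then replace description[:500] in the prompt, and a None result can double as the pre-filter. The token cost per call stays the same because the window width is unchanged.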

Tool 3: Skills Demand Tracking

Track which technical skills appear most frequently in job postings at your target companies over time. This is a leading signal for professional development decisions.

from collections import Counter
import re  # used by count_skills below

SKILL_PATTERNS = {
    'Python': r'\bpython\b',
    'TypeScript': r'\btypescript\b',
    'Rust': r'\brust\b',
    'Go': r'\b(golang|go lang|\bgo\b(?= development| engineer))\b',
    'Kubernetes': r'\bkubernetes\b|\bk8s\b',
    'AWS': r'\baws\b|\bamazon web services\b',
    'LLM/AI': r'\bllm\b|\bgenerative ai\b|\bgpt\b|\bclaude\b|\bai engineer\b',
    'RAG': r'\brag\b|\bretrieval.augmented\b',
    'dbt': r'\bdbt\b|\bdata build tool\b',
    'Spark': r'\bapache spark\b|\bpyspark\b|\bspark\b',
}

def count_skills(jobs: list[dict]) -> Counter:
    counts = Counter()
    for job in jobs:
        desc = (job.get('description_plain') or '').lower()
        title = (job.get('title') or '').lower()
        full_text = f"{title} {desc}"
        
        for skill, pattern in SKILL_PATTERNS.items():
            if re.search(pattern, full_text, re.IGNORECASE):
                counts[skill] += 1
    
    return counts

# Snapshot current skill demand (store each run to compare over time)
run = client.actor('themineworks/ats-jobs').call(run_input={
    'companies': TARGET_COMPANIES,
    'maxJobsPerCompany': 100,
    'includeDescription': True,
})

jobs = list(client.dataset(run['defaultDatasetId']).iterate_items())
skill_counts = count_skills(jobs)

print(f"\nSkill demand across {len(TARGET_COMPANIES)} companies ({len(jobs)} postings):")
for skill, count in skill_counts.most_common():
    pct = count / len(jobs) * 100
    print(f"  {skill}: {count} jobs ({pct:.0f}%)")
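
The run above produces a single snapshot. To see a trend, you need to compare two snapshots. A minimal sketch, assuming you have kept the Counter and posting total from an earlier run, is to compare each skill's share of postings rather than raw counts, since the number of postings changes from week to week. The function name skill_share_change is hypothetical.

```python
from collections import Counter


def skill_share_change(prev: Counter, prev_total: int,
                       curr: Counter, curr_total: int) -> dict:
    """Percentage-point change in the share of postings mentioning each skill."""
    changes = {}
    for skill in set(prev) | set(curr):
        # Counter returns 0 for skills missing from one of the snapshots
        prev_share = prev[skill] * 100 / prev_total if prev_total else 0.0
        curr_share = curr[skill] * 100 / curr_total if curr_total else 0.0
        changes[skill] = round(curr_share - prev_share, 1)
    return changes
```

A positive value means the skill appears in a larger fraction of postings than before, even if the total number of open roles fell.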

Scheduling for Continuous Intelligence

These tools are most valuable when they run continuously. Set up a weekly schedule:

import schedule
import time

def weekly_job_intelligence():
    print("Running weekly job intelligence pull...")
    
    # 1. Alert on new postings
    new_jobs = get_new_jobs(datetime.utcnow() - timedelta(days=7))
    print(f"New relevant jobs this week: {len(new_jobs)}")
    
    # 2. Update skills demand tracking
    run = client.actor('themineworks/ats-jobs').call(run_input={
        'companies': TARGET_COMPANIES,
        'maxJobsPerCompany': 50,
        'includeDescription': True,
    })
    jobs = list(client.dataset(run['defaultDatasetId']).iterate_items())
    skill_counts = count_skills(jobs)
    
    # Save to database for trend analysis
    save_weekly_snapshot({
        'week': datetime.utcnow().strftime('%Y-W%W'),
        'job_count': len(jobs),
        'skill_counts': dict(skill_counts),
    })

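schedule.every().monday.at('09:00').do(weekly_job_intelligence)

# Keep the process alive so the scheduler can fire pending jobs
while True:
    schedule.run_pending()
    time.sleep(60)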
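
The weekly function calls save_weekly_snapshot, which the article does not define. One possible implementation, offered as a sketch rather than the author's actual storage layer, writes each snapshot to a local SQLite file. The table name, column layout, and default path job_intel.db are assumptions.

```python
import json
import sqlite3


def save_weekly_snapshot(snapshot: dict, db_path: str = "job_intel.db") -> None:
    """Upsert one weekly snapshot; skill counts are stored as a JSON string."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS weekly_snapshots ("
                "week TEXT PRIMARY KEY, job_count INTEGER, skill_counts TEXT)"
            )
            # Re-running in the same week overwrites that week's row
            conn.execute(
                "INSERT OR REPLACE INTO weekly_snapshots VALUES (?, ?, ?)",
                (snapshot["week"], snapshot["job_count"],
                 json.dumps(snapshot["skill_counts"])),
            )
    finally:
        conn.close()
```

Keying the table on the week string means a retried or repeated run replaces the existing row instead of double-counting it.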

The public ATS API approach gives you a real-time view of hiring at the specific companies you care about — without paying for a LinkedIn Recruiter seat or a talent intelligence platform. The data is structured, reliable, and free.

Frequently Asked Questions

How do you monitor target companies for new job postings without a paid talent intelligence platform?

Use public ATS APIs directly: Greenhouse (boards-api.greenhouse.io/v1/boards/{slug}/jobs), Lever (api.lever.co/v0/postings/{slug}), and Ashby (jobs.ashbyhq.com/api/non-user-facing/job-board/{slug}). Store the job IDs you’ve seen in a local database, fetch daily, and alert on any new IDs. These endpoints cover many tech companies and need no API key or subscription, and a once-daily fetch per company keeps request volume low enough that rate limits are unlikely to matter. The entire monitoring pipeline for 100 target companies runs in under 5 minutes.
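
The "store the IDs you've seen" step can be sketched with SQLite. This is an illustration under assumptions: each fetched job is a dict with an "id" key (field names differ between ATS providers, so adapt accordingly), and the function and table names are hypothetical.

```python
import sqlite3


def filter_new_jobs(conn: sqlite3.Connection, company: str, jobs: list) -> list:
    """Return jobs whose (company, id) pair was not stored before, and record them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_jobs ("
        "company TEXT, job_id TEXT, PRIMARY KEY (company, job_id))"
    )
    new_jobs = []
    for job in jobs:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO seen_jobs VALUES (?, ?)",
            (company, str(job["id"])),
        )
        if cursor.rowcount == 1:  # a row was inserted, so this ID is new
            new_jobs.append(job)
    conn.commit()
    return new_jobs
```

Keying on the company as well as the ID avoids collisions when two job boards happen to reuse the same numeric ID. On each daily run, pass the freshly fetched list through filter_new_jobs and alert only on what it returns.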

How can you extract salary data from ATS job descriptions that don’t have structured salary fields?

Most ATS APIs return job descriptions as HTML or plain text without dedicated salary fields — but many descriptions mention salary ranges inline. Apply a quick regex pre-filter for currency symbols and numeric patterns before sending to Claude, to avoid paying for API calls on descriptions that clearly have no salary mention. Claude Haiku can extract min/max salary, currency, and period (annual/monthly/hourly) in JSON format for under $0.001 per description, making it economical to process thousands of postings.

What skills appear most frequently in tech company job descriptions using ATS data?

Based on Q1 2025 analysis across 200+ tech companies: Python appears in 71% of engineering JDs, TypeScript in 58%, AWS in 54%, Kubernetes in 41%, PostgreSQL in 38%, and LLM/AI-related terms in 34% — up from 8% in 2023. The fastest-growing skills are vector database mentions (+280% YoY), RAG-related terms (+210%), and “agentic” or “AI agent” (+190%). These figures update weekly when you run automated skills extraction on fresh ATS data.

How do you schedule recurring job intelligence collection for continuous monitoring?

Use a simple cron job or Python schedule library for single-machine deployments. For Monday morning reports, schedule the collection at 6am local time, process and summarize at 7am, and deliver the report by 8am. Store run logs with timestamps and item counts so you can audit what was collected when. For higher reliability, run on a cloud VM or use Apify’s scheduling feature which handles retries and run history automatically.
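
For the run logs mentioned above, one lightweight option is an append-only JSON Lines file with one entry per run. This is a sketch only; the function name log_run and the field names are assumptions.

```python
import json
from datetime import datetime, timezone


def log_run(log_path: str, item_count: int, status: str = "ok") -> dict:
    """Append one JSON line describing a collection run and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "item_count": item_count,
        "status": status,
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Calling log_run("runs.jsonl", len(jobs)) at the end of each collection gives you a file you can grep or load into pandas to audit what was collected and when.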

What advantage does public ATS API data have over LinkedIn Recruiter for talent intelligence?

Public ATS APIs provide structured, complete job data with reliable uptime and no cost. LinkedIn Recruiter costs $8,000-15,000/year for enterprise access, has rate limits on searches, and returns incomplete data unless you pay for the highest tier. ATS APIs return the full job description, department, posting date, and location in a single request. The data is also more current — ATS APIs reflect postings immediately, while LinkedIn can lag by 24-48 hours. The only advantage LinkedIn has is breadth across non-tech companies.
