The Mine Works
How to Aggregate Job Postings from 500+ Companies Using Public ATS APIs
tutorial August 11, 2025 · 6 min read

Greenhouse, Lever, and Ashby expose zero-auth public job board APIs. This guide shows how to build a job aggregator that pulls from all three.

Try the scraper

The actor referenced in this article is live on Apify. Pay only for results delivered.

Open on Apify →

The most underrated dataset in recruiting tech is the one sitting in plain sight: public job board APIs from Greenhouse, Lever, and Ashby. Every company using these ATSes exposes a fully queryable endpoint with no API key, no rate limiting beyond reasonable use, and structured JSON output.

TL;DR: Build a job aggregator by detecting which ATS each company uses (try Greenhouse, Lever, then Ashby in order), defining a normalized Job dataclass, and running async collection across 500+ companies in parallel with aiohttp. The hardest part is slug discovery — company names in lowercase work for roughly 80% of cases. Full aggregator in about 200 lines of Python.

Most major tech companies use one of these three. Stripe, Coinbase, Notion, Vercel, Linear, Figma, Shopify, and hundreds of others expose their open roles through these APIs. This guide shows how to build a functional job aggregator in about 200 lines of Python.

Step 1: Company Slug Discovery

The hardest part of building an ATS aggregator is not the API calls — it is knowing which company uses which ATS and what their slug is.

Greenhouse slugs: Usually the company’s name in lowercase, for example stripe, airbnb, notion, coinbase, or cloudflare. You can confirm a company’s Greenhouse slug by visiting boards.greenhouse.io/{slug} and checking whether it loads their job board.

Lever slugs: Also usually the company name. Check jobs.lever.co/{slug}.

Ashby slugs: Check jobs.ashbyhq.com/{slug}.
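
When you start from display names rather than known slugs, it can help to generate a few candidate slugs per company and probe each one. The sketch below is one way to do that; slug_candidates and its list of legal suffixes are hypothetical helpers, not part of any ATS API, and the candidates still need to be checked against the live endpoints.

```python
import re
import unicodedata

# Hypothetical list of legal suffixes to ignore; extend it for your own data.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def slug_candidates(company_name: str) -> list[str]:
    """Return likely ATS slugs for a company name, most likely first."""
    # Strip accents and lowercase so "Café" becomes "cafe".
    ascii_name = (
        unicodedata.normalize("NFKD", company_name)
        .encode("ascii", "ignore")
        .decode("ascii")
        .lower()
    )
    # Keep letters, digits, spaces, and hyphens, then split into words.
    cleaned = re.sub(r"[^a-z0-9\s-]", "", ascii_name).replace("-", " ")
    words = cleaned.split()
    if not words:
        return []
    core = [w for w in words if w not in LEGAL_SUFFIXES] or words
    # Joined, hyphenated, and first-word forms; dict.fromkeys dedupes in order.
    candidates = ["".join(core), "-".join(core), core[0]]
    return list(dict.fromkeys(candidates))


print(slug_candidates("Scale AI, Inc."))
```

Each candidate can then be fed to the detection step, stopping at the first one that returns postings.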

A practical approach is to maintain a company list and detect which ATS each uses:

import asyncio
import aiohttp

COMPANIES = [
    'stripe', 'notion', 'linear', 'vercel', 'supabase',
    'figma', 'loom', 'webflow', 'retool', 'airtable',
    'coda', 'clickup', 'monday', 'asana', 'basecamp',
    # ... add more
]

async def detect_ats(session: aiohttp.ClientSession, slug: str) -> dict:
    """Try all three ATSes and return which one has data."""
    endpoints = {
        'greenhouse': f'https://boards-api.greenhouse.io/v1/boards/{slug}/jobs',
        'lever': f'https://api.lever.co/v0/postings/{slug}?mode=json&limit=200',
        'ashby': f'https://api.ashbyhq.com/posting-api/job-board/{slug}',
    }
    
    for ats, url in endpoints.items():
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                if resp.status == 200:
                    data = await resp.json()
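                    count = (
                        len(data.get('jobs', [])) if ats == 'greenhouse' else
                        len(data) if ats == 'lever' else
                        # Ashby's posting API lists postings under 'jobs';
                        # 'jobPostings' is kept as a fallback.
                        len(data.get('jobs') or data.get('jobPostings', []))
                    )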
                    if count > 0:
                        return {'slug': slug, 'ats': ats, 'count': count}
        except Exception:
            continue
    
    return {'slug': slug, 'ats': None, 'count': 0}

async def build_company_map(companies: list[str]) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        tasks = [detect_ats(session, slug) for slug in companies]
        return await asyncio.gather(*tasks)

company_map = asyncio.run(build_company_map(COMPANIES))
print(f"Found: {sum(1 for c in company_map if c['ats'])} / {len(COMPANIES)} companies")

Step 2: Normalized Data Model

The three ATSes return different schemas. Define a normalized output format before writing any collection code:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Job:
    id: str
    ats: str                    # 'greenhouse' | 'lever' | 'ashby'
    company_slug: str
    title: str
    department: Optional[str]
    location: Optional[str]
    is_remote: Optional[bool]
    employment_type: Optional[str]
    url: str
    description_html: Optional[str]
    description_plain: Optional[str]
    published_at: Optional[datetime]
    collected_at: datetime
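
Because the model holds datetime values, it does not serialize to JSON directly. A minimal sketch of one way to turn a Job into a JSON-safe record follows; job_to_record is a hypothetical helper name, the Job class is repeated only so the snippet runs on its own, and the example values are made up.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Job:
    id: str
    ats: str
    company_slug: str
    title: str
    department: Optional[str]
    location: Optional[str]
    is_remote: Optional[bool]
    employment_type: Optional[str]
    url: str
    description_html: Optional[str]
    description_plain: Optional[str]
    published_at: Optional[datetime]
    collected_at: datetime


def job_to_record(job: Job) -> dict:
    """Convert a Job to a plain dict with ISO 8601 strings for datetimes."""
    record = asdict(job)
    for key in ("published_at", "collected_at"):
        value = record[key]
        record[key] = value.isoformat() if value is not None else None
    return record


# Illustrative values only; not real posting data.
example = Job(
    id="123", ats="greenhouse", company_slug="example", title="Engineer",
    department=None, location="Remote", is_remote=True, employment_type=None,
    url="https://example.invalid/jobs/123", description_html=None,
    description_plain=None, published_at=None,
    collected_at=datetime(2025, 8, 11, tzinfo=timezone.utc),
)
print(json.dumps(job_to_record(example), indent=2))
```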

Step 3: Per-ATS Collectors

from datetime import datetime, timezone  # timezone gives aware UTC timestamps

async def collect_greenhouse(session, slug: str) -> list[Job]:
    async with session.get(
        f'https://boards-api.greenhouse.io/v1/boards/{slug}/jobs',
        params={'content': 'true'}  # Include description
    ) as resp:
        resp.raise_for_status()
        data = await resp.json()
    
    jobs = []
    for j in data.get('jobs', []):
        # 'location' can be missing or null, so fall back to an empty dict
        location = (j.get('location') or {}).get('name') or None
        jobs.append(Job(
            id=str(j['id']),
            ats='greenhouse',
            company_slug=slug,
            title=j['title'],
            department=j['departments'][0]['name'] if j.get('departments') else None,
            location=location,
            is_remote='remote' in location.lower() if location else None,
            employment_type=None,  # Greenhouse doesn't expose this
            url=j['absolute_url'],
            description_html=j.get('content'),
            description_plain=None,
            # updated_at is used as a stand-in for the publish date
            published_at=datetime.fromisoformat(j['updated_at'].replace('Z', '+00:00')) if j.get('updated_at') else None,
            collected_at=datetime.now(timezone.utc),
        ))
    return jobs

async def collect_lever(session, slug: str) -> list[Job]:
    async with session.get(
        f'https://api.lever.co/v0/postings/{slug}?mode=json&limit=200'
    ) as resp:
        resp.raise_for_status()
        data = await resp.json()
    
    jobs = []
    for j in data:
        cats = j.get('categories') or {}
        # workplaceType is a top-level posting field; categories is checked as a fallback
        workplace = (j.get('workplaceType') or cats.get('workplaceType') or '').lower()
        jobs.append(Job(
            id=j['id'],
            ats='lever',
            company_slug=slug,
            title=j['text'],
            department=cats.get('department'),
            location=cats.get('location'),
            is_remote=workplace == 'remote',
            employment_type=cats.get('commitment'),
            url=j['hostedUrl'],
            description_html=j.get('description'),
            description_plain=j.get('descriptionPlain'),
            # createdAt is epoch milliseconds; convert to an aware UTC datetime
            published_at=datetime.fromtimestamp(j['createdAt'] / 1000, tz=timezone.utc) if j.get('createdAt') else None,
            collected_at=datetime.now(timezone.utc),
        ))
    return jobs

async def collect_ashby(session, slug: str) -> list[Job]:
    async with session.get(
        f'https://api.ashbyhq.com/posting-api/job-board/{slug}'
    ) as resp:
        resp.raise_for_status()
        data = await resp.json()
    
    jobs = []
    # Ashby's posting API lists postings under 'jobs'; 'jobPostings' and the
    # alternate field names below are kept as fallbacks.
    for j in data.get('jobs') or data.get('jobPostings', []):
        published = j.get('publishedAt') or j.get('publishedDate')
        jobs.append(Job(
            id=j['id'],
            ats='ashby',
            company_slug=slug,
            title=j['title'],
            department=j.get('department') or j.get('team') or j.get('teamName'),
            location=j.get('location') or j.get('locationName'),
            is_remote=j.get('isRemote', False),
            employment_type=j.get('employmentType'),
            url=j.get('jobUrl') or f"https://jobs.ashbyhq.com/{slug}/{j['id']}",
            description_html=j.get('descriptionHtml') or j.get('jobDescription'),
            description_plain=j.get('descriptionPlain'),
            published_at=datetime.fromisoformat(published.replace('Z', '+00:00')) if published else None,
            collected_at=datetime.now(timezone.utc),
        ))
    return jobs
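
The collectors leave description_plain empty whenever the ATS only supplies HTML. A rough sketch of filling it with the standard library follows. html_to_plain is a hypothetical helper, and the escaped flag reflects an assumption worth verifying against a live response: that the Greenhouse content field arrives HTML-entity-escaped and needs html.unescape first.

```python
import html
from html.parser import HTMLParser
from typing import Optional

# Tags that should produce a line break in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "h1", "h2", "h3", "h4", "tr"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts newlines around block-level tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_plain(raw: Optional[str], escaped: bool = False) -> Optional[str]:
    """Return a plain-text version of an HTML description, or None if empty."""
    if not raw:
        return None
    markup = html.unescape(raw) if escaped else raw
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace per line and drop blank lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    text = "\n".join(line for line in lines if line)
    return text or None


sample = "&lt;p&gt;Build &amp;amp; ship&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Python&lt;/li&gt;&lt;/ul&gt;"
print(html_to_plain(sample, escaped=True))
```

This is deliberately simple: it ignores script and style content handling and is not a substitute for a full HTML sanitizer.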

Step 4: Parallel Collection

With 500 companies, sequential collection takes too long. Run them concurrently:

COLLECTORS = {
    'greenhouse': collect_greenhouse,
    'lever': collect_lever,
    'ashby': collect_ashby,
}

async def collect_all(company_map: list[dict]) -> list[Job]:
    async with aiohttp.ClientSession() as session:
        tasks = []
        for company in company_map:
            if company['ats']:
                collector = COLLECTORS[company['ats']]
                tasks.append(collector(session, company['slug']))
        
        results = await asyncio.gather(*tasks, return_exceptions=True)
        
        all_jobs = []
        for result in results:
            if isinstance(result, list):
                all_jobs.extend(result)
        
        return all_jobs

all_jobs = asyncio.run(collect_all(company_map))
print(f"Collected {len(all_jobs)} jobs from {len([c for c in company_map if c['ats']])} companies")
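
As written, collect_all starts every request at once. A semaphore is one common way to cap how many run simultaneously. The sketch below is independent of aiohttp: gather_limited is a hypothetical helper that takes zero-argument callables returning coroutines, and the default limit of 10 is an illustrative value.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def gather_limited(
    factories: list[Callable[[], Awaitable[T]]],
    limit: int = 10,
) -> list:
    """Run the coroutines produced by `factories`, at most `limit` at a time.

    Results keep the input order. Exceptions are returned in place of results,
    matching asyncio.gather(..., return_exceptions=True).
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory: Callable[[], Awaitable[T]]) -> T:
        # The coroutine is only created and awaited once a slot is free.
        async with semaphore:
            return await factory()

    return await asyncio.gather(
        *(run_one(factory) for factory in factories),
        return_exceptions=True,
    )
```

Inside collect_all, each task could be wrapped as functools.partial(collector, session, company['slug']) and the resulting list passed to gather_limited in place of the bare asyncio.gather call.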

Using the Managed Version

If you want this without the infrastructure, our ATS Jobs scraper runs this exact pipeline on Apify:

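from apify_client import ApifyClient

client = ApifyClient('<YOUR_APIFY_TOKEN>')  # token from your Apify account
run = client.actor('themineworks/ats-jobs').call(run_input={
    'companies': COMPANIES,
    'maxJobsPerCompany': 100,
    'includeDescription': True,
})
items = list(client.dataset(run['defaultDatasetId']).iterate_items())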

The output is the same normalized schema regardless of which ATS each company uses.

Frequently Asked Questions

How do you detect which ATS a company uses without manual research?

Try each ATS in order of market share: Greenhouse first (boards.greenhouse.io/{slug}), then Lever (jobs.lever.co/{slug}), then Ashby (jobs.ashbyhq.com/{slug}). The first URL that returns a 200 with job data is the correct ATS. Automate this with aiohttp and run the probes in parallel — detecting the ATS for 500 companies takes under 2 minutes. About 80% of tech companies use Greenhouse, making it the right first check.

What is the best way to handle rate limiting when aggregating hundreds of ATS companies?

Greenhouse, Lever, and Ashby are all public APIs with no authentication, so they have soft rate limits rather than hard API quotas. Use asyncio with a concurrency cap of 10-20 simultaneous requests, add a 0.5-second delay between batches per platform, and implement exponential backoff on 429 responses. A full crawl of 500 companies across all three platforms typically completes in 5-10 minutes at moderate concurrency.
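
The backoff on 429 responses could look roughly like the sketch below. It assumes session behaves like an aiohttp.ClientSession (get() used as an async context manager, with status, headers, raise_for_status(), and json() on the response). fetch_json_with_backoff, its defaults, and the injectable sleep parameter are hypothetical choices made for illustration and testing.

```python
import asyncio
import random


async def fetch_json_with_backoff(
    session,
    url: str,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep=asyncio.sleep,
):
    """GET `url` and return parsed JSON, retrying with backoff on HTTP 429."""
    for attempt in range(max_attempts):
        async with session.get(url) as resp:
            if resp.status != 429:
                resp.raise_for_status()  # surface other HTTP errors immediately
                return await resp.json()
            retry_after = resp.headers.get("Retry-After")
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        if retry_after and retry_after.isdigit():
            # Honor the server's hint when it is given in whole seconds.
            delay = float(retry_after)
        else:
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus noise.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        await sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts: {url}")
```

The jitter spreads retries out so that many concurrent tasks hitting the same platform do not all retry at the same instant.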

Why do you need a normalized data model when aggregating multiple ATS platforms?

Each ATS returns different field names for the same data: Greenhouse and Ashby use title, while Lever uses text. Location is a nested object with a name field in Greenhouse but a plain string under categories in Lever. Without a normalized Job dataclass, downstream code has to branch on platform type everywhere, and that branching breaks silently when a platform changes its schema. A normalized model means all consumer code works on a single predictable structure regardless of source.

How do you find the job board slug for a company on Greenhouse, Lever, or Ashby?

The slug is usually the company name lowercased with spaces replaced by hyphens, as in stripe, linear, vercel, and notion. For companies with uncommon names, check their careers page: the ATS URL is usually visible in links like boards.greenhouse.io/stripe or embedded in the page source. Automated slug detection works for roughly 80% of companies; the remaining 20% require a manual lookup or fuzzy matching against a maintained company-slug database.

What does it take to build a job aggregator covering 500+ tech companies?

The core is a 200-line Python script: ATS detection, async data collection with aiohttp, a normalized Job dataclass, and a simple SQLite store. The hard part is maintaining the company-slug list, because companies get acquired, change ATSes, or go dark. Plan for 5-10% of your company list to need updating per quarter. A weekly cron job that runs the full crawl and alerts on companies with sudden zero-job counts catches most breakage before it becomes stale data.
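
The SQLite store and the zero-job alert can be sketched with the standard library alone. Everything here is an assumption made for illustration: the table names, the reduced set of columns, and the helpers open_store, upsert_jobs, record_counts, and went_dark. Timestamps are stored as ISO 8601 strings in UTC so that text ordering matches time ordering.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    ats TEXT NOT NULL,
    id TEXT NOT NULL,
    company_slug TEXT NOT NULL,
    title TEXT NOT NULL,
    url TEXT NOT NULL,
    collected_at TEXT NOT NULL,
    PRIMARY KEY (ats, id)
);
CREATE TABLE IF NOT EXISTS crawl_counts (
    company_slug TEXT NOT NULL,
    crawled_at TEXT NOT NULL,
    job_count INTEGER NOT NULL
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_jobs(conn: sqlite3.Connection, jobs: list[dict]) -> None:
    """Insert jobs, replacing any row with the same (ats, id)."""
    conn.executemany(
        "INSERT OR REPLACE INTO jobs (ats, id, company_slug, title, url, collected_at) "
        "VALUES (:ats, :id, :company_slug, :title, :url, :collected_at)",
        jobs,
    )
    conn.commit()


def record_counts(conn: sqlite3.Connection, counts: dict[str, int], crawled_at: str) -> None:
    """Store how many jobs each company had in this crawl."""
    conn.executemany(
        "INSERT INTO crawl_counts (company_slug, crawled_at, job_count) VALUES (?, ?, ?)",
        [(slug, crawled_at, count) for slug, count in counts.items()],
    )
    conn.commit()


def went_dark(conn: sqlite3.Connection) -> list[str]:
    """Slugs whose latest crawl found zero jobs after the previous one found some."""
    flagged = []
    slugs = [row[0] for row in conn.execute(
        "SELECT DISTINCT company_slug FROM crawl_counts ORDER BY company_slug"
    )]
    for slug in slugs:
        rows = conn.execute(
            "SELECT job_count FROM crawl_counts WHERE company_slug = ? "
            "ORDER BY crawled_at DESC LIMIT 2",
            (slug,),
        ).fetchall()
        if len(rows) == 2 and rows[0][0] == 0 and rows[1][0] > 0:
            flagged.append(slug)
    return flagged
```

A weekly job would call upsert_jobs and record_counts after each crawl, then send whatever went_dark returns to a reviewer, since a sudden zero usually means a changed slug or ATS rather than a hiring freeze.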
