How to Aggregate Job Postings from 500+ Companies Using Public ATS APIs
Greenhouse, Lever, and Ashby expose zero-auth public job board APIs. This guide shows how to build a job aggregator that pulls from all three and normalizes the results into one schema.
The actor referenced in this article is live on Apify. Pay only for results delivered.
The most underrated dataset in recruiting tech is the one sitting in plain sight: public job board APIs from Greenhouse, Lever, and Ashby. Every company that publishes a job board on one of these ATSes exposes a read-only endpoint with no API key, no rate limiting beyond reasonable use, and structured JSON output.
TL;DR: Build a job aggregator by detecting which ATS each company uses (try Greenhouse, Lever, then Ashby in order), defining a normalized Job dataclass, and running async collection across 500+ companies in parallel with aiohttp. The hardest part is slug discovery — company names in lowercase work for roughly 80% of cases. Full aggregator in about 200 lines of Python.
Many well-known tech companies use one of these three. Stripe, Coinbase, Notion, Vercel, Linear, Figma, Shopify, and hundreds of others expose their open roles through these APIs.
Step 1: Company Slug Discovery
The hardest part of building an ATS aggregator is not the API calls — it is knowing which company uses which ATS and what their slug is.
Greenhouse slugs: Usually the company’s name in lowercase, such as stripe, airbnb, notion, coinbase, or cloudflare. You can check a candidate slug by visiting boards.greenhouse.io/{slug} and seeing whether it loads the company’s job board.
Lever slugs: Also usually the company name. Check jobs.lever.co/{slug}.
Ashby slugs: Check jobs.ashbyhq.com/{slug}.
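Because slugs are only usually the lowercase company name, it helps to try a few spellings per company. A minimal sketch follows; slug_candidates is a hypothetical helper, the list of legal suffixes it strips is an assumption, and the underscore variant is only a guess. It keeps ASCII letters and digits only, so names with accents need separate handling.

```python
import re


def slug_candidates(company_name: str) -> list[str]:
    """Return plausible ATS board slugs for a company name, most likely first."""
    name = company_name.strip().lower()
    # Strip a trailing legal suffix such as ", Inc." (assumed list; extend as needed)
    name = re.sub(r"[,.]?\s+(?:inc|llc|ltd|corp|co)\.?$", "", name)
    # Keep ASCII letters and digits only; everything else acts as a separator
    words = re.findall(r"[a-z0-9]+", name)
    candidates = [
        "".join(words),   # "Scale AI" -> "scaleai"
        "-".join(words),  # "Scale AI" -> "scale-ai"
        "_".join(words),  # underscore variant, included as a guess
    ]
    # dict.fromkeys drops duplicates while keeping order; filter out empty strings
    return [c for c in dict.fromkeys(candidates) if c]
```

Each candidate can be passed to the detect_ats function below, stopping at the first one that returns an ATS.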
A practical approach is to maintain a company list and detect which ATS each uses:
import asyncio
import aiohttp

COMPANIES = [
    'stripe', 'notion', 'linear', 'vercel', 'supabase',
    'figma', 'loom', 'webflow', 'retool', 'airtable',
    'coda', 'clickup', 'monday', 'asana', 'basecamp',
    # ... add more
]

async def detect_ats(session: aiohttp.ClientSession, slug: str) -> dict:
    """Try Greenhouse, Lever, then Ashby and return the first one with open jobs."""
    endpoints = {  # dicts keep insertion order, so this is also the probe order
        'greenhouse': f'https://boards-api.greenhouse.io/v1/boards/{slug}/jobs',
        'lever': f'https://api.lever.co/v0/postings/{slug}?mode=json&limit=200',
        'ashby': f'https://api.ashbyhq.com/posting-api/job-board/{slug}',
    }
    for ats, url in endpoints.items():
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                if resp.status != 200:
                    continue  # typically 404: no board under this slug
                data = await resp.json()
            # Lever returns a bare list; Greenhouse and Ashby wrap postings in "jobs"
            count = len(data) if ats == 'lever' else len(data.get('jobs', []))
            if count > 0:
                return {'slug': slug, 'ats': ats, 'count': count}
        except Exception:
            continue  # network error, timeout, bad JSON, or unexpected shape: try the next ATS
    return {'slug': slug, 'ats': None, 'count': 0}

async def build_company_map(companies: list[str]) -> list[dict]:
    async with aiohttp.ClientSession() as session:
        tasks = [detect_ats(session, slug) for slug in companies]
        return await asyncio.gather(*tasks)

company_map = asyncio.run(build_company_map(COMPANIES))
print(f"Found: {sum(1 for c in company_map if c['ats'])} / {len(COMPANIES)} companies")
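Detection costs up to three requests per company, while a company's ATS rarely changes between crawls. A minimal sketch, assuming a local JSON file is an acceptable cache (save_company_map and load_company_map are hypothetical names), stores the result of build_company_map so later collection runs can reuse it and detection only reruns occasionally.

```python
import json
from pathlib import Path


def save_company_map(company_map: list[dict], path: str) -> None:
    """Write detected {slug, ats, count} entries to JSON, skipping misses."""
    # Companies with no detected ATS are left out so they get re-probed next time
    found = [c for c in company_map if c.get("ats")]
    Path(path).write_text(json.dumps(found, indent=2))


def load_company_map(path: str) -> list[dict]:
    """Read a saved company map; a missing file means nothing is cached yet."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []
```

For example, save_company_map(company_map, 'company_map.json') after detection, then load_company_map('company_map.json') at the start of the next collection run.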
Step 2: Normalized Data Model
The three ATSes return different schemas. Define a normalized output format before writing any collection code:
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Job:
    id: str
    ats: str  # 'greenhouse' | 'lever' | 'ashby'
    company_slug: str
    title: str
    department: Optional[str]
    location: Optional[str]
    is_remote: Optional[bool]
    employment_type: Optional[str]
    url: str
    description_html: Optional[str]
    description_plain: Optional[str]
    published_at: Optional[datetime]
    collected_at: datetime
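The dataclass is convenient inside Python but is not directly JSON-serializable, because published_at and collected_at are datetime objects. A small hypothetical helper, to_record, converts any such dataclass instance into a plain dict with ISO 8601 timestamp strings, which suits JSON Lines output or loading into a database.

```python
import dataclasses
from datetime import datetime
from typing import Any


def to_record(obj: Any) -> dict[str, Any]:
    """Convert a Job (or any dataclass instance) into a JSON-serializable dict."""
    # asdict copies every field; datetimes are then rendered as ISO 8601 strings,
    # e.g. "2024-05-01T12:00:00+00:00" for a timezone-aware UTC value
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in dataclasses.asdict(obj).items()
    }
```

Writing one json.dumps(to_record(job)) per line gives a JSON Lines file that other tools can stream.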
Step 3: Per-ATS Collectors
import html
from datetime import datetime, timezone

async def collect_greenhouse(session, slug: str) -> list[Job]:
    async with session.get(
        f'https://boards-api.greenhouse.io/v1/boards/{slug}/jobs',
        params={'content': 'true'},  # include descriptions and departments
    ) as resp:
        resp.raise_for_status()
        data = await resp.json()
    jobs = []
    for j in data.get('jobs', []):
        location = (j.get('location') or {}).get('name') or None
        content = j.get('content')
        jobs.append(Job(
            id=str(j['id']),
            ats='greenhouse',
            company_slug=slug,
            title=j['title'],
            department=j['departments'][0]['name'] if j.get('departments') else None,
            location=location,
            is_remote='remote' in location.lower() if location else None,
            employment_type=None,  # Greenhouse doesn't expose this
            url=j['absolute_url'],
            # Greenhouse returns content HTML-escaped (&lt;p&gt;...), so unescape it
            description_html=html.unescape(content) if content else None,
            description_plain=None,
            # updated_at is a last-modified timestamp, used as a stand-in for the publish date
            published_at=datetime.fromisoformat(j['updated_at'].replace('Z', '+00:00')) if j.get('updated_at') else None,
            collected_at=datetime.now(timezone.utc),
        ))
    return jobs

async def collect_lever(session, slug: str) -> list[Job]:
    # limit=200 caps the response; boards with more postings come back truncated
    async with session.get(
        f'https://api.lever.co/v0/postings/{slug}?mode=json&limit=200'
    ) as resp:
        resp.raise_for_status()
        data = await resp.json()
    jobs = []
    for j in data:
        cats = j.get('categories') or {}
        # workplaceType is a top-level posting field; categories is only a fallback
        workplace = (j.get('workplaceType') or cats.get('workplaceType') or '').lower()
        jobs.append(Job(
            id=j['id'],
            ats='lever',
            company_slug=slug,
            title=j['text'],
            department=cats.get('department'),
            location=cats.get('location'),
            is_remote=(workplace == 'remote') if workplace not in ('', 'unspecified') else None,
            employment_type=cats.get('commitment'),
            url=j['hostedUrl'],
            description_html=j.get('description'),
            description_plain=j.get('descriptionPlain'),
            # createdAt is epoch milliseconds; convert to an aware UTC datetime
            published_at=datetime.fromtimestamp(j['createdAt'] / 1000, tz=timezone.utc) if j.get('createdAt') else None,
            collected_at=datetime.now(timezone.utc),
        ))
    return jobs

async def collect_ashby(session, slug: str) -> list[Job]:
    async with session.get(
        f'https://api.ashbyhq.com/posting-api/job-board/{slug}'
    ) as resp:
        resp.raise_for_status()
        data = await resp.json()
    jobs = []
    # Ashby's public posting API wraps postings in {"jobs": [...]}
    for j in data.get('jobs', []):
        published = j.get('publishedAt')
        jobs.append(Job(
            id=j['id'],
            ats='ashby',
            company_slug=slug,
            title=j['title'],
            department=j.get('department') or j.get('team'),
            location=j.get('location'),
            is_remote=j.get('isRemote'),
            employment_type=j.get('employmentType'),
            url=j.get('jobUrl') or f"https://jobs.ashbyhq.com/{slug}/{j['id']}",
            description_html=j.get('descriptionHtml'),
            description_plain=j.get('descriptionPlain'),
            published_at=datetime.fromisoformat(published.replace('Z', '+00:00')) if published else None,
            collected_at=datetime.now(timezone.utc),
        ))
    return jobs
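The Greenhouse collector leaves description_plain empty. One way to fill it, sketched here with only the standard library's html.parser (html_to_text and the set of tags treated as line breaks are assumptions), is to unescape the raw content field, since Greenhouse returns it HTML-escaped, keep the text nodes, and turn block-level tags into line breaks. It is a rough converter, not a full HTML renderer.

```python
import html
import re
from html.parser import HTMLParser
from typing import Optional

# Tags treated as line breaks; an assumed set covering common job-description markup
BLOCK_TAGS = {"p", "br", "div", "li", "ul", "ol", "h1", "h2", "h3", "h4", "tr"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends inside text
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(raw: Optional[str]) -> Optional[str]:
    """Convert (possibly HTML-escaped) description markup to plain text."""
    if not raw:
        return None
    parser = _TextExtractor()
    parser.feed(html.unescape(raw))  # escaped markup such as &lt;p&gt; becomes <p>
    parser.close()
    # Collapse whitespace within each line and drop blank lines
    lines = (re.sub(r"\s+", " ", line).strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Inside collect_greenhouse this would be used as description_plain=html_to_text(j.get('content')).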
Step 4: Parallel Collection
With 500 companies, sequential collection takes too long. Run them concurrently:
COLLECTORS = {
    'greenhouse': collect_greenhouse,
    'lever': collect_lever,
    'ashby': collect_ashby,
}

async def collect_all(company_map: list[dict]) -> list[Job]:
    companies = [c for c in company_map if c['ats']]
    async with aiohttp.ClientSession() as session:
        tasks = [COLLECTORS[c['ats']](session, c['slug']) for c in companies]
        # return_exceptions=True keeps one failing company from aborting the whole run
        results = await asyncio.gather(*tasks, return_exceptions=True)
    all_jobs = []
    for company, result in zip(companies, results):
        if isinstance(result, BaseException):
            print(f"{company['slug']} ({company['ats']}) failed: {result!r}")
        else:
            all_jobs.extend(result)
    return all_jobs

all_jobs = asyncio.run(collect_all(company_map))
print(f"Collected {len(all_jobs)} jobs from {sum(1 for c in company_map if c['ats'])} companies")
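The Lever collector requests limit=200, so a company with more open postings than that comes back truncated. Lever's postings endpoint also accepts a skip parameter for paging (per Lever's postings API docs; worth confirming before relying on it). The sketch below pages until a short page arrives. fetch_page is a hypothetical coroutine you would implement with aiohttp; passing it in keeps the paging loop testable without network access.

```python
import asyncio
from typing import Awaitable, Callable

# fetch_page(skip, limit) is a hypothetical coroutine returning one page as a list, e.g.
#   async def fetch_page(skip, limit):
#       url = f"https://api.lever.co/v0/postings/{slug}"
#       params = {"mode": "json", "skip": str(skip), "limit": str(limit)}
#       async with session.get(url, params=params) as resp:
#           resp.raise_for_status()
#           return await resp.json()
FetchPage = Callable[[int, int], Awaitable[list]]


async def fetch_all_lever_postings(
    fetch_page: FetchPage, page_size: int = 100, max_pages: int = 50
) -> list:
    """Page through a Lever board until a short (or empty) page comes back."""
    postings: list = []
    for page in range(max_pages):  # max_pages guards against a runaway loop
        batch = await fetch_page(page * page_size, page_size)
        postings.extend(batch)
        if len(batch) < page_size:  # fewer than requested: no more pages
            break
    return postings
```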
Using the Managed Version
If you want this without the infrastructure, our ATS Jobs scraper runs this exact pipeline on Apify:
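from apify_client import ApifyClient

client = ApifyClient('<YOUR_APIFY_TOKEN>')  # placeholder: use your own API token
run = client.actor('themineworks/ats-jobs').call(run_input={
    'companies': COMPANIES,
    'maxJobsPerCompany': 100,
    'includeDescription': True,
})
# The run's results are stored in its default dataset
jobs = list(client.dataset(run['defaultDatasetId']).iterate_items())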
The output is the same normalized schema regardless of which ATS each company uses.
Frequently Asked Questions
How do you detect which ATS a company uses without manual research?
Try each ATS in turn: Greenhouse first, then Lever, then Ashby, probing each platform's public API for the candidate slug (boards-api.greenhouse.io/v1/boards/{slug}/jobs, api.lever.co/v0/postings/{slug}, api.ashbyhq.com/posting-api/job-board/{slug}). The first endpoint that returns a 200 with at least one job identifies the ATS. Automate this with aiohttp and run detection for all companies concurrently; detecting the ATS for 500 companies takes under 2 minutes. Greenhouse goes first because it is the most widely used of the three among tech companies.
What is the best way to handle rate limiting when aggregating jobs from hundreds of companies?
Greenhouse, Lever, and Ashby are all public APIs with no authentication, so they have soft rate limits rather than hard API quotas. Use asyncio with a concurrency cap of 10-20 simultaneous requests, add a 0.5-second delay between batches per platform, and implement exponential backoff on 429 responses. A full crawl of 500 companies across all three platforms typically completes in 5-10 minutes at moderate concurrency.
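The concurrency cap and the 429 backoff can live in one helper. A minimal sketch, assuming each request is wrapped as a zero-argument coroutine that returns (status, payload); that keeps the helper independent of aiohttp and testable offline. fetch_with_backoff and RateLimited are hypothetical names. A shared asyncio.Semaphore bounds in-flight requests, and the backoff sleep happens outside the semaphore so a throttled request does not hold a slot while it waits.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Tuple

# A request is a zero-argument coroutine factory returning (status, payload), e.g.
#   async def request():
#       async with session.get(url) as resp:
#           return resp.status, (await resp.json() if resp.status == 200 else None)
Request = Callable[[], Awaitable[Tuple[int, Any]]]


class RateLimited(Exception):
    """Raised when a request still gets HTTP 429 after every retry."""


async def fetch_with_backoff(
    request: Request,
    sem: asyncio.Semaphore,
    max_retries: int = 4,
    base_delay: float = 1.0,
) -> Tuple[int, Any]:
    for attempt in range(max_retries + 1):
        async with sem:  # at most N requests in flight across all callers
            status, payload = await request()
        if status != 429:
            return status, payload  # other statuses (200, 404, ...) go back to the caller
        if attempt < max_retries:
            # Exponential backoff: base, 2x, 4x, ... plus up to 10% random jitter
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise RateLimited(f"still rate-limited after {max_retries} retries")
```

Create one semaphore per crawl, for example asyncio.Semaphore(15) inside the coroutine that runs the crawl, and pass it to every call so the cap applies globally.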
Why do you need a normalized data model when aggregating multiple ATS platforms?
Each ATS returns different field names for the same data: Greenhouse and Ashby call the job title title, while Lever calls it text. Location is a nested object in Greenhouse (location.name), sits under categories.location in Lever, and is a plain string in Ashby. Without a normalized Job dataclass, downstream code has to branch on platform type everywhere, and that branching breaks silently when a platform changes its schema. A normalized model means all consumer code works on a single predictable structure regardless of source.
How do you find the job board slug for a company on Greenhouse, Lever, or Ashby?
The slug is almost always the company name lowercased with spaces replaced by hyphens — stripe, linear, vercel, notion. For companies with uncommon names, check their careers page: the ATS URL is usually visible in links like boards.greenhouse.io/stripe or embedded in the page source. Automated slug detection works for roughly 80% of companies; the remaining 20% require a manual lookup or fuzzy matching against a maintained company-slug database.
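Checking the careers page can be partly automated. A rough sketch, assuming you have already downloaded the page HTML: scan it for the board URL shapes used in this article, plus Greenhouse's embed script URL (an assumption based on the common embed format). ATS_LINK_PATTERNS and find_ats_links are hypothetical names, and pages that load their board through other scripts or iframes will not match.

```python
import re

# (ats, pattern) pairs checked in order; group 1 captures the board slug
ATS_LINK_PATTERNS = [
    ("greenhouse", re.compile(r"greenhouse\.io/embed/job_board(?:/js)?\?for=([\w-]+)")),
    # The negative lookahead skips embed URLs, which the pattern above handles
    ("greenhouse", re.compile(r"(?:boards|job-boards)\.greenhouse\.io/(?!embed\b)([\w-]+)")),
    ("lever", re.compile(r"jobs\.lever\.co/([\w-]+)")),
    ("ashby", re.compile(r"jobs\.ashbyhq\.com/([\w-]+)")),
]


def find_ats_links(page_html: str) -> list[tuple[str, str]]:
    """Return unique (ats, slug) pairs found in a careers page, in pattern order."""
    found: dict[tuple[str, str], None] = {}  # dict keys act as an ordered set
    for ats, pattern in ATS_LINK_PATTERNS:
        for match in pattern.finditer(page_html):
            found.setdefault((ats, match.group(1)), None)
    return list(found)
```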
What does it take to build a job aggregator covering 500+ tech companies?
The core is a 200-line Python script: ATS detection, async data collection with aiohttp, a normalized Job dataclass, and a simple SQLite store. The hard part is maintaining the company-slug list — companies get acquired, change ATSs, or go dark. Plan for 5-10% of your company list to need updating per quarter. A weekly cron job that runs the full crawl and alerts on companies with sudden zero-job counts catches most breakage before it becomes stale data.
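The SQLite store and the zero-count alert can both be sketched with the standard library. This assumes jobs arrive as plain dicts whose timestamps are already ISO strings; the table layout and the names upsert_jobs and companies_gone_dark are hypothetical. Upserting on (ats, company_slug, id) keeps one row per posting across weekly runs, and comparing per-company counts between runs flags boards that suddenly went empty.

```python
import sqlite3
from typing import Any, Iterable, Mapping

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    ats          TEXT NOT NULL,
    company_slug TEXT NOT NULL,
    id           TEXT NOT NULL,
    title        TEXT NOT NULL,
    url          TEXT NOT NULL,
    location     TEXT,
    collected_at TEXT NOT NULL,
    PRIMARY KEY (ats, company_slug, id)
)
"""


def upsert_jobs(conn: sqlite3.Connection, records: Iterable[Mapping[str, Any]]) -> None:
    """Insert new postings and refresh existing ones, keyed on (ats, company_slug, id)."""
    conn.execute(SCHEMA)
    # Named placeholders are read from each mapping; extra keys are ignored.
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.executemany(
        """
        INSERT INTO jobs (ats, company_slug, id, title, url, location, collected_at)
        VALUES (:ats, :company_slug, :id, :title, :url, :location, :collected_at)
        ON CONFLICT (ats, company_slug, id) DO UPDATE SET
            title = excluded.title,
            url = excluded.url,
            location = excluded.location,
            collected_at = excluded.collected_at
        """,
        records,
    )
    conn.commit()


def companies_gone_dark(previous: Mapping[str, int], current: Mapping[str, int]) -> list[str]:
    """Slugs that had open jobs last run but zero (or no result) this run."""
    return sorted(
        slug for slug, count in previous.items()
        if count > 0 and current.get(slug, 0) == 0
    )
```

Per-company counts for each run can come from collections.Counter(r['company_slug'] for r in records).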