ClinicalTrials.gov API v2: How to Search 500,000 Studies and Track Trial Status
ClinicalTrials.gov upgraded to a v2 REST API in 2024. Here is how to use it, what changed from v1, and how to build automated trial monitoring pipelines in Python.
ClinicalTrials.gov is the US registry for clinical studies. As of 2024, it holds more than 500,000 study records from 221 countries. The underlying data covers trial phase, status, enrollment, eligibility criteria, sponsor, interventions, outcomes, and location.
In May 2023, NLM (the National Library of Medicine, which operates the registry) announced the retirement of the v1 API. The v2 API went live in 2024 with a different base URL, different pagination model, and different filter syntax. Code written against v1 breaks silently against v2.
This post covers what changed, how to use the v2 API, and how to build monitoring pipelines for trial status tracking.
TL;DR: The v2 base URL is https://clinicaltrials.gov/api/v2/. Pagination uses pageToken (cursor-based), not min_rnk/max_rnk offsets. Filters use query.cond, query.intr, filter.overallStatus, filter.phase, and query.spons. The response schema changed significantly: a nested protocolSection replaces the flat v1 field structure.
What Changed from v1
The v1 API was at https://clinicaltrials.gov/api/query/. It used numeric offset pagination (min_rnk, max_rnk), returned XML or JSON, and had a flat field structure.
V2 breaks all of these:
| Aspect | V1 | V2 |
|---|---|---|
| Base URL | /api/query/ | /api/v2/ |
| Pagination | offset: min_rnk/max_rnk | cursor: pageToken |
| Response shape | Flat field map | Nested protocolSection |
| Default format | XML | JSON |
| Filter syntax | expr= (Lucene-like) | Typed params: query.cond, filter.phase, etc. |
V1 is fully retired. Any existing code using /api/query/ needs to be rewritten.
The V2 Endpoint Structure
The main endpoints in v2:
| Endpoint | What it returns |
|---|---|
| /studies | Search and filter studies, paginated |
| /studies/{nctId} | Full record for a single study |
| /stats/size | Total number of studies in the registry |
| /stats/fieldValues/{field} | All distinct values for a field (useful for discovery) |
The studies endpoint is the workhorse. All searches go through it.
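As a minimal sketch of how the single-study endpoint can be addressed, the helper below builds a /studies/{nctId} URL and rejects malformed identifiers before any request is made. The function name study_url and the normalization step are assumptions made for illustration; the check relies only on the NCT-plus-eight-digits identifier format.

```python
import re

BASE = "https://clinicaltrials.gov/api/v2"

# NCT identifiers are the letters "NCT" followed by eight digits.
NCT_ID_PATTERN = re.compile(r"NCT\d{8}")


def study_url(nct_id: str) -> str:
    """Return the single-study URL for an NCT ID (hypothetical helper)."""
    normalized = nct_id.strip().upper()
    if not NCT_ID_PATTERN.fullmatch(normalized):
        raise ValueError(f"Not a valid NCT ID: {nct_id!r}")
    return f"{BASE}/studies/{normalized}"


print(study_url("nct05094674"))
```

Validating the ID locally avoids spending a request on an identifier that can never match a record.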
Query Parameters
Search query params (free-text search over specific fields):
| Param | Searches | Example |
|---|---|---|
| query.cond | Condition or disease | query.cond=pancreatic+cancer |
| query.intr | Intervention or drug | query.intr=pembrolizumab |
| query.titles | Official and brief title | query.titles=phase+3+checkpoint |
| query.outc | Outcome measures | query.outc=overall+survival |
| query.spons | Sponsor and collaborators | query.spons=Pfizer |
| query.lead | Lead sponsor only | query.lead=Merck |
| query.term | Full-text across all fields | query.term=BRCA1 |
Filter params (exact/structured filtering):
| Param | Values | Example |
|---|---|---|
| filter.overallStatus | RECRUITING, NOT_YET_RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, TERMINATED, WITHDRAWN, SUSPENDED, ENROLLING_BY_INVITATION | filter.overallStatus=RECRUITING |
| filter.phase | PHASE1, PHASE2, PHASE3, PHASE4, NA, EARLY_PHASE1 | filter.phase=PHASE3 |
| filter.studyType | INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS | filter.studyType=INTERVENTIONAL |
| filter.advanced | Query in Essie expression syntax (advanced users) | filter.advanced=AREA[StartDate]RANGE[2024-01-01,2024-12-31] |
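The filter.advanced example in the table can be generated rather than typed by hand. The sketch below assumes only the AREA[StartDate]RANGE[...] form shown above; the helper name start_date_range is made up for illustration.

```python
from datetime import date


def start_date_range(first: date, last: date) -> str:
    """Build an AREA[StartDate]RANGE[...] expression (hypothetical helper)."""
    if first > last:
        raise ValueError("first must not be after last")
    return f"AREA[StartDate]RANGE[{first.isoformat()},{last.isoformat()}]"


params = {
    "query.cond": "pancreatic cancer",
    "filter.advanced": start_date_range(date(2024, 1, 1), date(2024, 12, 31)),
}
print(params["filter.advanced"])
```

Building the expression from date objects keeps the ISO formatting consistent and catches reversed ranges before the request goes out.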
Pagination and display params:
| Param | Notes |
|---|---|
| pageSize | Results per page. Default 10, max 1000 |
| pageToken | Cursor for next page, taken from previous response |
| fields | Comma-separated list of fields to return (reduces payload) |
| sort | Field and direction, e.g. sort=StartDate:desc |
| countTotal | true to include total result count in first response |
Multiple values for filter.overallStatus and filter.phase are comma-separated:
filter.overallStatus=RECRUITING,ACTIVE_NOT_RECRUITING
filter.phase=PHASE2,PHASE3
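One way to keep multi-value filters tidy in Python is to pass lists and join them just before the request. This is a sketch, not part of any official client: build_params and its double-underscore naming convention (query__cond becomes query.cond) are invented here so that dotted parameter names can be written as keyword arguments.

```python
from urllib.parse import urlencode


def build_params(**filters) -> dict:
    """Turn keyword filters into v2 query params (hypothetical helper).

    "__" in a keyword stands for "." and list values are comma-joined.
    """
    params = {}
    for key, value in filters.items():
        if value is None:
            continue  # skip filters the caller left unset
        name = key.replace("__", ".")
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    return params


params = build_params(
    query__cond="pancreatic cancer",
    filter__overallStatus=["RECRUITING", "ACTIVE_NOT_RECRUITING"],
    filter__phase=["PHASE2", "PHASE3"],
    pageSize=100,
)
print(params)
print(urlencode(params))
```

The resulting dict can be passed directly as the params argument of requests.get. Note that urlencode percent-encodes the commas as %2C, which is an equivalent encoding of the same query string.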
The Response Schema
The v2 response wraps each study in a protocolSection object with nested sub-sections. The key sections:
```json
{
  "studies": [
    {
      "protocolSection": {
        "identificationModule": {
          "nctId": "NCT05123456",
          "briefTitle": "...",
          "officialTitle": "..."
        },
        "statusModule": {
          "overallStatus": "RECRUITING",
          "startDateStruct": {"date": "2024-01-15", "type": "ACTUAL"},
          "completionDateStruct": {"date": "2026-06-30", "type": "ESTIMATED"},
          "studyFirstSubmitDate": "2023-10-01"
        },
        "sponsorCollaboratorsModule": {
          "leadSponsor": {"name": "Merck Sharp & Dohme LLC", "class": "INDUSTRY"},
          "collaborators": []
        },
        "descriptionModule": {
          "briefSummary": "...",
          "detailedDescription": "..."
        },
        "conditionsModule": {
          "conditions": ["Non-Small Cell Lung Cancer"],
          "keywords": ["NSCLC", "immunotherapy"]
        },
        "designModule": {
          "studyType": "INTERVENTIONAL",
          "phases": ["PHASE3"],
          "enrollmentInfo": {"count": 450, "type": "ESTIMATED"}
        },
        "armsInterventionsModule": {
          "interventions": [
            {"type": "DRUG", "name": "Pembrolizumab", "description": "..."}
          ]
        },
        "eligibilityModule": {
          "eligibilityCriteria": "Inclusion criteria:\n...\nExclusion criteria:\n...",
          "sex": "ALL",
          "minimumAge": "18 Years",
          "maximumAge": "N/A"
        },
        "contactsLocationsModule": {
          "locations": [
            {
              "facility": "Memorial Sloan Kettering Cancer Center",
              "city": "New York",
              "state": "New York",
              "country": "United States",
              "status": "RECRUITING"
            }
          ]
        }
      }
    }
  ],
  "nextPageToken": "NF0g5AEBAQBZ...",
  "totalCount": 1247
}
```
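Because every field now sits two or three levels deep, a small path helper can make lookups less error-prone than chained .get() calls. The sketch below is one possible approach; the name dig and the dot-separated path format are choices made for this example, not part of the API.

```python
from typing import Any


def dig(record: dict, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested dicts; return default on a miss."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed-down record in the shape shown above
study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT05123456"},
        "statusModule": {"overallStatus": "RECRUITING"},
    }
}

print(dig(study, "protocolSection.statusModule.overallStatus"))
print(dig(study, "protocolSection.designModule.enrollmentInfo.count", default=0))
```

The second lookup falls back to the default because this trimmed record has no designModule, which is the common case when modules are optional or excluded from the response.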
Python: Pulling All Phase 3 Trials for a Drug in Active Recruitment
```python
import time

import pandas as pd
import requests

BASE = "https://clinicaltrials.gov/api/v2"


def search_trials(params, max_results=None):
    """
    Search ClinicalTrials.gov v2 API with cursor pagination.
    Returns a list of study dicts (full protocolSection).
    """
    all_studies = []
    total = None
    # countTotal is only sent with the first request
    page_params = {**params, "pageSize": 1000, "countTotal": "true"}
    while True:
        response = requests.get(f"{BASE}/studies", params=page_params, timeout=30)
        response.raise_for_status()
        data = response.json()
        all_studies.extend(data.get("studies", []))
        # Later pages carry no totalCount, so keep the value from the first one
        total = data.get("totalCount", total)
        print(f"Fetched {len(all_studies)} / {total}")
        if max_results and len(all_studies) >= max_results:
            break
        next_token = data.get("nextPageToken")
        if not next_token:
            break  # no cursor means this was the last page
        page_params = {**params, "pageSize": 1000, "pageToken": next_token}
        time.sleep(0.3)  # brief pause between pages
    return all_studies[:max_results] if max_results else all_studies


def extract_study_summary(study):
    """Flatten a v2 study record to a single row dict."""
    ps = study.get("protocolSection", {})
    id_mod = ps.get("identificationModule", {})
    status_mod = ps.get("statusModule", {})
    sponsor_mod = ps.get("sponsorCollaboratorsModule", {})
    design_mod = ps.get("designModule", {})
    conditions_mod = ps.get("conditionsModule", {})
    contacts_mod = ps.get("contactsLocationsModule", {})
    locations = contacts_mod.get("locations", [])
    # Deduplicate countries, skipping locations that do not list one
    countries = sorted({loc["country"] for loc in locations if loc.get("country")})
    return {
        "nct_id": id_mod.get("nctId"),
        "brief_title": id_mod.get("briefTitle"),
        "status": status_mod.get("overallStatus"),
        "start_date": (status_mod.get("startDateStruct") or {}).get("date"),
        "completion_date": (status_mod.get("completionDateStruct") or {}).get("date"),
        "sponsor": (sponsor_mod.get("leadSponsor") or {}).get("name"),
        "sponsor_class": (sponsor_mod.get("leadSponsor") or {}).get("class"),
        "phases": "|".join(design_mod.get("phases", [])),
        "study_type": design_mod.get("studyType"),
        "enrollment": (design_mod.get("enrollmentInfo") or {}).get("count"),
        "conditions": "|".join(conditions_mod.get("conditions", [])),
        "location_count": len(locations),
        "countries": "|".join(countries),
    }


# All Phase 3 pembrolizumab trials currently recruiting
pembro_trials = search_trials({
    "query.intr": "pembrolizumab",
    "filter.phase": "PHASE3",
    "filter.overallStatus": "RECRUITING",
    "filter.studyType": "INTERVENTIONAL",
})
df = pd.DataFrame([extract_study_summary(s) for s in pembro_trials])
print(f"Phase 3 pembrolizumab trials recruiting: {len(df)}")
print(df[["nct_id", "brief_title", "sponsor", "enrollment", "completion_date"]].head(10).to_string(index=False))
```
Python: Getting All Trials for a Sponsor
```python
# All trials sponsored by Moderna, any status
moderna_trials = search_trials({
    "query.spons": "Moderna",
})
df_moderna = pd.DataFrame([extract_study_summary(s) for s in moderna_trials])
print(f"\nTotal Moderna trials: {len(df_moderna)}")
print(df_moderna.groupby("status")["nct_id"].count().sort_values(ascending=False))
```
The query.spons parameter searches both lead sponsor and collaborators. Use query.lead if you want only lead sponsor matches.
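If you have already pulled results with query.spons and want to split them by role afterwards, the distinction can also be recovered client-side from sponsorCollaboratorsModule. The sketch below is illustrative only: sponsor_role and make_study are made-up helpers, the sponsor names are placeholders, and it assumes each collaborator entry carries a name key like leadSponsor does.

```python
def sponsor_role(study: dict, name: str) -> str:
    """Classify a sponsor as "lead", "collaborator", or "none" for one study.

    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    module = study.get("protocolSection", {}).get("sponsorCollaboratorsModule", {})
    needle = name.lower()
    lead_name = (module.get("leadSponsor") or {}).get("name") or ""
    if needle in lead_name.lower():
        return "lead"
    for collaborator in module.get("collaborators") or []:
        if needle in (collaborator.get("name") or "").lower():
            return "collaborator"
    return "none"


def make_study(lead, collaborators=()):
    """Build a minimal placeholder record for the demo below."""
    return {"protocolSection": {"sponsorCollaboratorsModule": {
        "leadSponsor": {"name": lead, "class": "INDUSTRY"},
        "collaborators": [{"name": c, "class": "OTHER"} for c in collaborators],
    }}}


studies = [
    make_study("Acme Biotech Inc."),
    make_study("Example University", ["Acme Biotech Inc."]),
    make_study("Example University"),
]
roles = [sponsor_role(s, "acme") for s in studies]
print(roles)
```

Filtering locally like this costs nothing extra in requests, at the price of downloading collaborator-only matches you may then discard.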
Python: Building a Status Change Monitor
This is the most operationally useful pattern: watch a set of trials and alert when their status changes.
```python
import json
from datetime import datetime, timezone
from pathlib import Path

SNAPSHOT_FILE = "trial_snapshot.json"


def fetch_trial(nct_id):
    """Fetch a single trial by NCT ID. Returns None if the ID is unknown."""
    response = requests.get(f"{BASE}/studies/{nct_id}", timeout=30)
    if response.status_code == 404:
        return None
    response.raise_for_status()
    study = response.json()
    return extract_study_summary(study)


def check_for_status_changes(nct_ids, snapshot_path=SNAPSHOT_FILE):
    """
    Compare current trial statuses against a saved snapshot.
    Returns a list of trials that have changed status since last check.
    """
    # Load previous snapshot
    previous = {}
    if Path(snapshot_path).exists():
        with open(snapshot_path) as f:
            previous = json.load(f)

    current = {}
    changes = []
    for nct_id in nct_ids:
        trial = fetch_trial(nct_id)
        if not trial:
            continue
        current[nct_id] = trial
        prev = previous.get(nct_id, {})
        # On the first run there is no snapshot, so every trial is reported
        # once with old_status UNKNOWN
        if prev.get("status") != trial["status"]:
            changes.append({
                "nct_id": nct_id,
                "brief_title": trial["brief_title"],
                "old_status": prev.get("status", "UNKNOWN"),
                "new_status": trial["status"],
                "checked_at": datetime.now(timezone.utc).isoformat(),
            })
        time.sleep(0.2)  # brief pause between requests

    # Save updated snapshot
    with open(snapshot_path, "w") as f:
        json.dump(current, f, indent=2)
    return changes


# List of trials you care about
watched_trials = [
    "NCT05094674",
    "NCT04516746",
    "NCT03668418",
]
changes = check_for_status_changes(watched_trials)
if changes:
    for c in changes:
        print(f"STATUS CHANGE: {c['nct_id']} | {c['old_status']} -> {c['new_status']} | {c['brief_title']}")
else:
    print("No status changes detected.")
```
Run this on a schedule (daily or weekly depending on how fast-moving your portfolio is) to build a lightweight competitive intelligence alert system without paying for a commercial clinical trial tracking service.
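For scheduled runs it can help to keep the comparison logic separate from the network calls so it can be tested offline. The sketch below is one way to do that, not the only one: diff_statuses is a hypothetical helper, the statuses in the sample snapshots are made up, and reporting never-seen trials as NEW instead of as a change is a design choice of this example.

```python
def diff_statuses(previous: dict, current: dict) -> list:
    """Compare two {nct_id: {"status": ...}} snapshots and list the differences."""
    changes = []
    for nct_id, trial in sorted(current.items()):
        old = previous.get(nct_id)
        if old is None:
            # First time this trial appears in a snapshot
            changes.append({"nct_id": nct_id, "kind": "NEW",
                            "old": None, "new": trial["status"]})
        elif old.get("status") != trial["status"]:
            changes.append({"nct_id": nct_id, "kind": "CHANGED",
                            "old": old.get("status"), "new": trial["status"]})
    return changes


# Illustrative snapshots; the statuses are invented for the demo
previous = {
    "NCT05094674": {"status": "RECRUITING"},
    "NCT04516746": {"status": "RECRUITING"},
}
current = {
    "NCT05094674": {"status": "ACTIVE_NOT_RECRUITING"},
    "NCT04516746": {"status": "RECRUITING"},
    "NCT03668418": {"status": "COMPLETED"},
}

for change in diff_statuses(previous, current):
    print(change)
```

Separating NEW from CHANGED means adding a trial to the watch list does not trigger a status alert on its first check.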
Useful Field Subsets
The full study record is large. Use the fields parameter to request only what you need:
```python
# Request only identification and status fields (much faster pagination)
response = requests.get(f"{BASE}/studies", params={
    "query.cond": "alzheimer",
    "filter.phase": "PHASE3",
    "fields": "NCTId,BriefTitle,OverallStatus,StartDate,CompletionDate,LeadSponsorName",
    "pageSize": 1000,
})
```
The fields parameter accepts ClinicalTrials.gov field names (the v2 docs publish the full list at https://clinicaltrials.gov/data-api/api). Restricting fields reduces payload size significantly for large result sets.
Use Cases
Pharma competitive intelligence. Track competitor pipeline by sponsor name. Pull all active trials for competing drugs in your indication, sorted by phase and expected completion date. Build a landscape view of what is coming to market and when.
Patient recruitment. Identify recruiting sites for a trial by parsing the locations array and filtering by status RECRUITING. Build a database of active sites for enrollment support or partnership outreach.
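A minimal sketch of the site-extraction step, assuming the locations shape shown in the response schema section. The function name recruiting_sites is made up, and the second site in the sample is a placeholder added to show the filter working.

```python
def recruiting_sites(study: dict) -> list:
    """Return one row per location whose site-level status is RECRUITING."""
    ps = study.get("protocolSection", {})
    nct_id = ps.get("identificationModule", {}).get("nctId")
    locations = ps.get("contactsLocationsModule", {}).get("locations") or []
    return [
        {
            "nct_id": nct_id,
            "facility": loc.get("facility"),
            "city": loc.get("city"),
            "country": loc.get("country"),
        }
        for loc in locations
        if loc.get("status") == "RECRUITING"
    ]


study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT05123456"},
        "contactsLocationsModule": {
            "locations": [
                {"facility": "Memorial Sloan Kettering Cancer Center",
                 "city": "New York", "country": "United States",
                 "status": "RECRUITING"},
                # Placeholder site that should be filtered out
                {"facility": "Example Hospital", "city": "Example City",
                 "country": "Canada", "status": "COMPLETED"},
            ]
        },
    }
}

sites = recruiting_sites(study)
print(sites)
```

Each row keeps the NCT ID, so rows from many studies can be concatenated into a single site table.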
Clinical trial landscape mapping. For a given disease area, pull all trials by phase and status. This gives you a view of how crowded the indication is, where trials are stalling (high number in ACTIVE_NOT_RECRUITING or TERMINATED), and where gaps exist.
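The phase-by-status view can be sketched with a Counter over flattened rows, here assumed to have the same phases and status keys that extract_study_summary produces. The helper names, the sample rows, and the choice to treat ACTIVE_NOT_RECRUITING and TERMINATED as the stalled statuses (following the paragraph above) are all illustrative.

```python
from collections import Counter

# Statuses treated as "stalled" in this sketch
STALLED = {"ACTIVE_NOT_RECRUITING", "TERMINATED"}


def landscape(rows: list) -> Counter:
    """Count trials per (phases, status) pair."""
    counts = Counter()
    for row in rows:
        phase = row.get("phases") or "UNSPECIFIED"
        counts[(phase, row.get("status"))] += 1
    return counts


def stalled_share(rows: list) -> float:
    """Fraction of rows whose status is in STALLED (0.0 for an empty list)."""
    if not rows:
        return 0.0
    stalled = sum(1 for row in rows if row.get("status") in STALLED)
    return stalled / len(rows)


# Made-up rows for demonstration
rows = [
    {"phases": "PHASE3", "status": "RECRUITING"},
    {"phases": "PHASE3", "status": "TERMINATED"},
    {"phases": "PHASE2", "status": "ACTIVE_NOT_RECRUITING"},
    {"phases": "PHASE2|PHASE3", "status": "RECRUITING"},
]

for (phase, status), n in sorted(landscape(rows).items()):
    print(phase, status, n)
print(f"Stalled share: {stalled_share(rows):.0%}")
```

Combined-phase trials such as PHASE2|PHASE3 are kept as their own bucket here; splitting them into both phases is an alternative if double counting is acceptable.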
CRO business development. Contract research organizations use trial data to identify sponsors running studies in therapeutic areas they serve. Filter by sponsor class INDUSTRY, phase, and status to find prospective clients before they have finished enrollment.
Regulatory and publication tracking. Trials registered on ClinicalTrials.gov are increasingly required to report results. The resultsSection of the v2 response (where present) contains outcome data submitted post-trial. Pulling and analyzing this is useful for systematic reviews and evidence synthesis.
The ClinicalTrials.gov scraper handles v2 pagination, scheduled monitoring runs, and output normalization for teams that need trial data on a recurring basis without maintaining their own API integration.