ClinicalTrials.gov API v2: How to Search 500,000 Studies and Track Trial Status
tutorial June 22, 2026 · 7 min read Updated June 22, 2026

ClinicalTrials.gov upgraded to a v2 REST API in 2024. Here is how to use it, what changed from v1, and how to build automated trial monitoring pipelines in Python.

ClinicalTrials.gov is the US registry for clinical studies. As of 2024, it holds more than 500,000 study records from 221 countries. The underlying data covers trial phase, status, enrollment, eligibility criteria, sponsor, interventions, outcomes, and location.

In May 2023, NLM (the National Library of Medicine, which operates the registry) announced the retirement of the v1 API. The v2 API went live in 2024 with a different base URL, different pagination model, and different filter syntax. Code written against v1 breaks silently against v2.

This post covers what changed, how to use the v2 API, and how to build monitoring pipelines for trial status tracking.

TL;DR: The v2 base URL is https://clinicaltrials.gov/api/v2/. Pagination uses pageToken (cursor-based), not min_rnk/max_rnk offset. Filters use query.cond, query.intr, filter.overallStatus, filter.phase, and query.spons. The response schema changed significantly: nested protocolSection replaces the flat v1 field structure.

What Changed from v1

The v1 API was at https://clinicaltrials.gov/api/query/. It used numeric offset pagination (min_rnk, max_rnk), returned XML or JSON, and had a flat field structure.

V2 breaks all of these:

| Aspect | V1 | V2 |
| --- | --- | --- |
| Base URL | /api/query/ | /api/v2/ |
| Pagination | offset: min_rnk/max_rnk | cursor: pageToken |
| Response shape | Flat field map | Nested protocolSection |
| Default format | XML | JSON |
| Filter syntax | expr= (Lucene-like) | Typed params: query.cond, filter.phase, etc. |

V1 is fully retired. Any existing code using /api/query/ needs to be rewritten.

The V2 Endpoint Structure

The main endpoints in v2:

| Endpoint | What it returns |
| --- | --- |
| /studies | Search and filter studies, paginated |
| /studies/{nctId} | Full record for a single study |
| /stats/size | Total number of studies in the registry |
| /stats/fieldValues/{field} | All distinct values for a field (useful for discovery) |

The studies endpoint is the workhorse. All searches go through it.
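
As a minimal sketch of how the two study endpoints map to URLs, the helpers below build them from the base URL given in this post. The function names are hypothetical, and the check assumes the usual NCT ID shape of "NCT" followed by eight digits; nothing here contacts the API.

```python
import re

BASE = "https://clinicaltrials.gov/api/v2"

# Assumption: NCT IDs are "NCT" followed by eight digits, e.g. NCT05123456.
NCT_ID_PATTERN = re.compile(r"NCT\d{8}")

def studies_url():
    """URL of the search endpoint (hypothetical helper)."""
    return f"{BASE}/studies"

def study_url(nct_id):
    """URL of the single-study endpoint; rejects malformed IDs before any request is made."""
    normalized = nct_id.strip().upper()
    if not NCT_ID_PATTERN.fullmatch(normalized):
        raise ValueError(f"Not a valid NCT ID: {nct_id!r}")
    return f"{BASE}/studies/{normalized}"
```

Validating the ID locally keeps typos in a watch list from turning into 404 responses later on.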

Query Parameters

Search query params (free-text search over specific fields):

| Param | Searches | Example |
| --- | --- | --- |
| query.cond | Condition or disease | query.cond=pancreatic+cancer |
| query.intr | Intervention or drug | query.intr=pembrolizumab |
| query.titles | Official and brief title | query.titles=phase+3+checkpoint |
| query.outc | Outcome measures | query.outc=overall+survival |
| query.spons | Sponsor and collaborators | query.spons=Pfizer |
| query.lead | Lead sponsor only | query.lead=Merck |
| query.term | Full-text across all fields | query.term=BRCA1 |

Filter params (exact/structured filtering):

| Param | Values | Example |
| --- | --- | --- |
| filter.overallStatus | RECRUITING, NOT_YET_RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, TERMINATED, WITHDRAWN, SUSPENDED, ENROLLING_BY_INVITATION | filter.overallStatus=RECRUITING |
| filter.phase | PHASE1, PHASE2, PHASE3, PHASE4, NA, EARLY_PHASE1 | filter.phase=PHASE3 |
| filter.studyType | INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS | filter.studyType=INTERVENTIONAL |
| filter.advanced | Essie expression syntax (advanced users) | filter.advanced=AREA[StartDate]RANGE[2024-01-01,2024-12-31] |

Pagination and display params:

| Param | Notes |
| --- | --- |
| pageSize | Results per page. Default 10, max 1000 |
| pageToken | Cursor for next page, taken from previous response |
| fields | Comma-separated list of fields to return (reduces payload) |
| sort | Field and direction, e.g. sort=StartDate:desc |
| countTotal | true to include total result count in first response |

Multiple values for filter.overallStatus and filter.phase are comma-separated:

filter.overallStatus=RECRUITING,ACTIVE_NOT_RECRUITING
filter.phase=PHASE2,PHASE3
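
A small sketch of how these parameters might be assembled in code, using only the parameter names listed above. The helper name build_search_params and its keyword arguments are hypothetical; the function only builds a dict and never contacts the API.

```python
from urllib.parse import urlencode

def build_search_params(cond=None, intr=None, statuses=(), phases=(),
                        fields=(), page_size=100):
    """Assemble a /studies query dict; list-valued inputs are comma-joined."""
    if not 1 <= page_size <= 1000:
        raise ValueError("pageSize must be between 1 and 1000")
    params = {}
    if cond:
        params["query.cond"] = cond
    if intr:
        params["query.intr"] = intr
    if statuses:
        params["filter.overallStatus"] = ",".join(statuses)
    if phases:
        params["filter.phase"] = ",".join(phases)
    if fields:
        params["fields"] = ",".join(fields)
    params["pageSize"] = page_size
    return params

params = build_search_params(
    cond="pancreatic cancer",
    statuses=["RECRUITING", "ACTIVE_NOT_RECRUITING"],
    phases=["PHASE2", "PHASE3"],
)
# urlencode turns spaces into "+" and commas into "%2C"
query_string = urlencode(params)
```

The resulting dict can be passed directly as the params argument of requests.get, which applies the same URL encoding.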

The Response Schema

The v2 response wraps each study in a protocolSection object with nested sub-sections. The key sections:

{
  "studies": [
    {
      "protocolSection": {
        "identificationModule": {
          "nctId": "NCT05123456",
          "briefTitle": "...",
          "officialTitle": "..."
        },
        "statusModule": {
          "overallStatus": "RECRUITING",
          "startDateStruct": {"date": "2024-01-15", "type": "ACTUAL"},
          "completionDateStruct": {"date": "2026-06-30", "type": "ESTIMATED"},
          "studyFirstSubmitDate": "2023-10-01"
        },
        "sponsorCollaboratorsModule": {
          "leadSponsor": {"name": "Merck Sharp & Dohme LLC", "class": "INDUSTRY"},
          "collaborators": []
        },
        "descriptionModule": {
          "briefSummary": "...",
          "detailedDescription": "..."
        },
        "conditionsModule": {
          "conditions": ["Non-Small Cell Lung Cancer"],
          "keywords": ["NSCLC", "immunotherapy"]
        },
        "designModule": {
          "studyType": "INTERVENTIONAL",
          "phases": ["PHASE3"],
          "enrollmentInfo": {"count": 450, "type": "ESTIMATED"}
        },
        "armsInterventionsModule": {
          "interventions": [
            {"type": "DRUG", "name": "Pembrolizumab", "description": "..."}
          ]
        },
        "eligibilityModule": {
          "eligibilityCriteria": "Inclusion criteria:\n...\nExclusion criteria:\n...",
          "sex": "ALL",
          "minimumAge": "18 Years",
          "maximumAge": "N/A"
        },
        "contactsLocationsModule": {
          "locations": [
            {
              "facility": "Memorial Sloan Kettering Cancer Center",
              "city": "New York",
              "state": "New York",
              "country": "United States",
              "status": "RECRUITING"
            }
          ]
        }
      }
    }
  ],
  "nextPageToken": "NF0g5AEBAQBZ...",
  "totalCount": 1247
}
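
Because every value sits several levels deep, and modules can be absent from a given record, a small path helper keeps lookups short. This is a sketch only: dig is a hypothetical name, and the sample record is an abbreviated copy of the example above.

```python
def dig(record, *path, default=None):
    """Walk nested dicts/lists along path; return default if any step is missing."""
    current = record
    for key in path:
        if isinstance(current, dict):
            current = current.get(key)
        elif (isinstance(current, list) and isinstance(key, int)
              and -len(current) <= key < len(current)):
            current = current[key]
        else:
            return default
        if current is None:
            return default
    return current

# Abbreviated copy of the example record shown above
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT05123456"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE3"]},
    }
}

status = dig(sample_study, "protocolSection", "statusModule", "overallStatus")
first_phase = dig(sample_study, "protocolSection", "designModule", "phases", 0)
# The sample has no sponsor module, so this falls back to the default
sponsor = dig(sample_study, "protocolSection", "sponsorCollaboratorsModule",
              "leadSponsor", "name", default="unknown")
```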

Python: Pulling All Phase 3 Trials for a Drug in Active Recruitment

import requests
import pandas as pd
import time

BASE = "https://clinicaltrials.gov/api/v2"

def search_trials(params, max_results=None):
    """
    Search ClinicalTrials.gov v2 API with cursor pagination.
    Returns a list of study dicts (full protocolSection).
    """
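    all_studies = []
    total = None  # set from the first response only
    page_params = {**params, "pageSize": 1000, "countTotal": "true"}

    while True:
        response = requests.get(f"{BASE}/studies", params=page_params, timeout=30)
        response.raise_for_status()
        data = response.json()

        studies = data.get("studies", [])
        all_studies.extend(studies)

        # countTotal is sent only with the first request, so later pages
        # carry no totalCount; keep the value from the first page.
        if total is None:
            total = data.get("totalCount", 0)
        print(f"Fetched {len(all_studies)} / {total}")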

        if max_results and len(all_studies) >= max_results:
            break

        next_token = data.get("nextPageToken")
        if not next_token:
            break

        page_params = {**params, "pageSize": 1000, "pageToken": next_token}
        time.sleep(0.3)

    return all_studies[:max_results] if max_results else all_studies

def extract_study_summary(study):
    """Flatten a v2 study record to a single row dict."""
    ps = study.get("protocolSection", {})
    id_mod = ps.get("identificationModule", {})
    status_mod = ps.get("statusModule", {})
    sponsor_mod = ps.get("sponsorCollaboratorsModule", {})
    design_mod = ps.get("designModule", {})
    conditions_mod = ps.get("conditionsModule", {})
    contacts_mod = ps.get("contactsLocationsModule", {})

    locations = contacts_mod.get("locations", [])
    # Skip locations with no country and sort so the output is stable across runs
    countries = sorted({loc["country"] for loc in locations if loc.get("country")})

    return {
        "nct_id":          id_mod.get("nctId"),
        "brief_title":     id_mod.get("briefTitle"),
        "status":          status_mod.get("overallStatus"),
        "start_date":      (status_mod.get("startDateStruct") or {}).get("date"),
        "completion_date": (status_mod.get("completionDateStruct") or {}).get("date"),
        "sponsor":         (sponsor_mod.get("leadSponsor") or {}).get("name"),
        "sponsor_class":   (sponsor_mod.get("leadSponsor") or {}).get("class"),
        "phases":          "|".join(design_mod.get("phases", [])),
        "study_type":      design_mod.get("studyType"),
        "enrollment":      (design_mod.get("enrollmentInfo") or {}).get("count"),
        "conditions":      "|".join(conditions_mod.get("conditions", [])),
        "location_count":  len(locations),
        "countries":       "|".join(countries),
    }

# All Phase 3 pembrolizumab trials currently recruiting
pembro_trials = search_trials({
    "query.intr":         "pembrolizumab",
    "filter.phase":       "PHASE3",
    "filter.overallStatus": "RECRUITING",
    "filter.studyType":   "INTERVENTIONAL",
})

df = pd.DataFrame([extract_study_summary(s) for s in pembro_trials])
print(f"Phase 3 pembrolizumab trials recruiting: {len(df)}")
print(df[["nct_id", "brief_title", "sponsor", "enrollment", "completion_date"]].head(10).to_string(index=False))
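
The cursor loop can also be written as a generator that yields studies as they arrive instead of holding every page in memory. The sketch below is one possible shape, not part of the API: iter_studies and the fetch_page callable are hypothetical names, and fetch_page is assumed to take a params dict and return the decoded JSON response.

```python
def iter_studies(fetch_page, params, page_size=1000, max_results=None):
    """Yield studies one at a time, following nextPageToken until it runs out."""
    if max_results is not None and max_results <= 0:
        return
    token = None
    yielded = 0
    while True:
        page_params = {**params, "pageSize": page_size}
        if token:
            page_params["pageToken"] = token
        data = fetch_page(page_params)
        for study in data.get("studies", []):
            yield study
            yielded += 1
            # Stop as soon as the cap is reached so no extra page is requested
            if max_results is not None and yielded >= max_results:
                return
        token = data.get("nextPageToken")
        if not token:
            return
```

With the script above, fetch_page could be lambda p: requests.get(f"{BASE}/studies", params=p, timeout=30).json(). Passing the fetcher in as an argument also makes the pagination logic testable without network access.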

Python: Getting All Trials for a Sponsor

# All trials sponsored by Moderna, any status
moderna_trials = search_trials({
    "query.spons": "Moderna",
})

df_moderna = pd.DataFrame([extract_study_summary(s) for s in moderna_trials])
print(f"\nTotal Moderna trials: {len(df_moderna)}")
print(df_moderna.groupby("status")["nct_id"].count().sort_values(ascending=False))

The query.spons parameter searches both lead sponsor and collaborators. Use query.lead if you want only lead sponsor matches.
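
When a query.spons result set has to be split after the fact, the sponsor module of each record shows which role matched. A minimal sketch, assuming collaborator entries carry a name key like leadSponsor does (the schema example above only shows an empty collaborators list); sponsor_role is a hypothetical helper.

```python
def sponsor_role(study, name):
    """Return "lead", "collaborator", or None for a case-insensitive substring match."""
    module = study.get("protocolSection", {}).get("sponsorCollaboratorsModule", {})
    needle = name.casefold()
    lead_name = (module.get("leadSponsor") or {}).get("name") or ""
    if needle in lead_name.casefold():
        return "lead"
    # Assumption: each collaborator is a dict with a "name" key
    for collaborator in module.get("collaborators") or []:
        if needle in (collaborator.get("name") or "").casefold():
            return "collaborator"
    return None
```

Substring matching is deliberately loose here; sponsor names in the registry often include legal suffixes, so an exact comparison would miss most records.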

Python: Building a Status Change Monitor

This is the most operationally useful pattern: watch a set of trials and alert when their status changes.

import json
from pathlib import Path
from datetime import datetime, timezone

# Reuses requests, time, BASE, and extract_study_summary from the script above
SNAPSHOT_FILE = "trial_snapshot.json"

def fetch_trial(nct_id):
    """Fetch a single trial by NCT ID. Returns None if the ID is not found."""
    response = requests.get(f"{BASE}/studies/{nct_id}", timeout=30)
    if response.status_code == 404:
        return None
    response.raise_for_status()
    study = response.json()
    return extract_study_summary(study)

def check_for_status_changes(nct_ids, snapshot_path=SNAPSHOT_FILE):
    """
    Compare current trial statuses against a saved snapshot.
    Returns a list of trials that have changed status since last check.
    On the first run every trial is reported with old_status UNKNOWN.
    """
    # Load previous snapshot
    previous = {}
    if Path(snapshot_path).exists():
        with open(snapshot_path) as f:
            previous = json.load(f)

    current = {}
    changes = []

    for nct_id in nct_ids:
        trial = fetch_trial(nct_id)
        if not trial:
            continue
        current[nct_id] = trial

        prev = previous.get(nct_id, {})
        if prev.get("status") != trial["status"]:
            changes.append({
                "nct_id":       nct_id,
                "brief_title":  trial["brief_title"],
                "old_status":   prev.get("status", "UNKNOWN"),
                "new_status":   trial["status"],
                # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated in Python 3.12+
                "checked_at":   datetime.now(timezone.utc).isoformat(),
            })
        time.sleep(0.2)

    # Save updated snapshot
    with open(snapshot_path, "w") as f:
        json.dump(current, f, indent=2)

    return changes

# List of trials you care about
watched_trials = [
    "NCT05094674",
    "NCT04516746",
    "NCT03668418",
]

changes = check_for_status_changes(watched_trials)
if changes:
    for c in changes:
        print(f"STATUS CHANGE: {c['nct_id']} | {c['old_status']} -> {c['new_status']} | {c['brief_title']}")
else:
    print("No status changes detected.")

Run this on a schedule (daily or weekly depending on how fast-moving your portfolio is) to build a lightweight competitive intelligence alert system without paying for a commercial clinical trial tracking service.
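
The monitor above reports a trial it has not seen before as a change from UNKNOWN. If alerts should separate genuinely changed trials from newly added ones, the comparison can be factored into a pure function over two snapshot dicts. This is a sketch under that assumption; diff_snapshots is a hypothetical name, and both inputs are taken to be the nct_id-to-row mapping stored in the snapshot file.

```python
def diff_snapshots(previous, current):
    """Split watched trials into new, changed, and missing relative to the last snapshot."""
    prev_ids, curr_ids = set(previous), set(current)
    changed = []
    for nct_id in sorted(prev_ids & curr_ids):
        old = previous[nct_id].get("status")
        new = current[nct_id].get("status")
        if old != new:
            changed.append({"nct_id": nct_id, "old_status": old, "new_status": new})
    return {
        "new": sorted(curr_ids - prev_ids),      # first time seen, not a real change
        "changed": changed,                      # status differs from the last snapshot
        "missing": sorted(prev_ids - curr_ids),  # in the last snapshot, absent now
    }
```

Keeping the comparison free of network and file access means it can be exercised with hand-written snapshots before the scheduled job goes live.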

Useful Field Subsets

The full study record is large. Use the fields parameter to request only what you need:

# Request only identification and status fields (much faster pagination)
response = requests.get(f"{BASE}/studies", params={
    "query.cond": "alzheimer",
    "filter.phase": "PHASE3",
    "fields": "NCTId,BriefTitle,OverallStatus,StartDate,CompletionDate,LeadSponsorName",
    "pageSize": 1000,
})

The fields parameter accepts ClinicalTrials.gov field names (the v2 docs publish the full list at https://clinicaltrials.gov/data-api/api). Requesting only the fields you need reduces payload size significantly for large result sets.

Use Cases

Pharma competitive intelligence. Track competitor pipeline by sponsor name. Pull all active trials for competing drugs in your indication, sorted by phase and expected completion date. Build a landscape view of what is coming to market and when.

Patient recruitment. Identify recruiting sites for a trial by parsing the locations array and filtering by status RECRUITING. Build a database of active sites for enrollment support or partnership outreach.

Clinical trial landscape mapping. For a given disease area, pull all trials by phase and status. This gives you a view of how crowded the indication is, where trials are stalling (high number in ACTIVE_NOT_RECRUITING or TERMINATED), and where gaps exist.

CRO business development. Contract research organizations use trial data to identify sponsors running studies in therapeutic areas they serve. Filter by sponsor class INDUSTRY, phase, and status to find prospective clients before they have finished enrollment.

Regulatory and publication tracking. Trials registered on ClinicalTrials.gov are increasingly required to report results. The resultsSection of the v2 response (where present) contains outcome data submitted post-trial. Pulling and analyzing this is useful for systematic reviews and evidence synthesis.
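
Two of the use cases above, recruiting-site extraction and landscape mapping, reduce to a few lines once the records are in hand. The sketch below assumes the location shape from the schema example and rows shaped like the output of extract_study_summary, where phases is a "|"-joined string. Both function names are hypothetical.

```python
from collections import Counter

def recruiting_sites(study):
    """List the locations of one study whose site-level status is RECRUITING."""
    ps = study.get("protocolSection", {})
    nct_id = ps.get("identificationModule", {}).get("nctId")
    locations = ps.get("contactsLocationsModule", {}).get("locations") or []
    return [
        {
            "nct_id": nct_id,
            "facility": loc.get("facility"),
            "city": loc.get("city"),
            "country": loc.get("country"),
        }
        for loc in locations
        if loc.get("status") == "RECRUITING"
    ]

def landscape_counts(rows):
    """Count trials per (phase, status) pair from flattened summary rows."""
    counts = Counter()
    for row in rows:
        phases = row.get("phases") or "UNSPECIFIED"
        status = row.get("status") or "UNKNOWN"
        # A trial listed as PHASE2|PHASE3 is counted once under each phase
        for phase in phases.split("|"):
            counts[(phase, status)] += 1
    return counts
```

Calling landscape_counts(rows).most_common() gives a quick ranking of the most crowded phase and status combinations in an indication.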

The ClinicalTrials.gov scraper handles v2 pagination, scheduled monitoring runs, and output normalization for teams that need trial data on a recurring basis without maintaining their own API integration.

Related Actor

Try the scraper referenced in this article — live on Apify, pay only for results.

Open clinicaltrials-scraper on Apify →