ClinicalTrials.gov upgraded to a v2 REST API in 2024. Here is how to use it, what changed from v1, and how to build automated trial monitoring pipelines in Python.

ClinicalTrials.gov is the US registry for clinical studies. As of 2024, it holds more than 500,000 study records from 221 countries. The underlying data covers trial phase, status, enrollment, eligibility criteria, sponsor, interventions, outcomes, and location.

In May 2023, NLM (the National Library of Medicine, which operates the registry) announced the retirement of the v1 API. The v2 API went live in 2024 with a different base URL, different pagination model, and different filter syntax. Code written against v1 breaks silently against v2.

This post covers what changed, how to use the v2 API, and how to build monitoring pipelines for trial status tracking.

TL;DR: The v2 base URL is https://clinicaltrials.gov/api/v2/. Pagination uses pageToken (cursor-based), not min_rnk/max_rnk offsets. Searches use typed parameters such as query.cond, query.intr, and query.spons; filtering uses filter.overallStatus plus filter.advanced for fields such as phase. The response schema changed significantly: a nested protocolSection replaces the flat v1 field structure.

What Changed from v1

The v1 API was at https://clinicaltrials.gov/api/query/. It used numeric offset pagination (min_rnk, max_rnk), returned XML or JSON, and had a flat field structure.

V2 breaks all of these:

Aspect	V1	V2
Base URL	`/api/query/`	`/api/v2/`
Pagination	offset: `min_rnk`/`max_rnk`	cursor: `pageToken`
Response shape	Flat field map	Nested `protocolSection`
Default format	XML	JSON
Filter syntax	`expr=` (Lucene-like)	Typed params: `query.cond`, `filter.overallStatus`, etc.

V1 is fully retired. Any existing code using /api/query/ needs to be rewritten.

The V2 Endpoint Structure

The main endpoints in v2:

Endpoint	What it returns
`/studies`	Search and filter studies, paginated
`/studies/{nctId}`	Full record for a single study
`/stats/size`	Study size statistics, including the total number of studies in the registry
`/stats/field/values`	Value statistics for the fields named in the `fields` query param (useful for discovery)

The studies endpoint is the workhorse. All searches go through it.

Query Parameters

Search query params (free-text search over specific fields):

Param	Searches	Example
`query.cond`	Condition or disease	`query.cond=pancreatic+cancer`
`query.intr`	Intervention or drug	`query.intr=pembrolizumab`
`query.titles`	Official and brief title	`query.titles=phase+3+checkpoint`
`query.outc`	Outcomes measures	`query.outc=overall+survival`
`query.spons`	Sponsor and collaborators	`query.spons=Pfizer`
`query.lead`	Lead sponsor only	`query.lead=Merck`
`query.term`	Full-text across all fields	`query.term=BRCA1`

Filter params (exact/structured filtering):

Param	Values	Example
`filter.overallStatus`	`RECRUITING`, `NOT_YET_RECRUITING`, `ACTIVE_NOT_RECRUITING`, `COMPLETED`, `TERMINATED`, `WITHDRAWN`, `SUSPENDED`, `ENROLLING_BY_INVITATION`	`filter.overallStatus=RECRUITING`
`filter.advanced` (phase)	Essie expression on `Phase`: `PHASE1`, `PHASE2`, `PHASE3`, `PHASE4`, `NA`, `EARLY_PHASE1`	`filter.advanced=AREA[Phase]PHASE3`
`filter.advanced` (study type)	Essie expression on `StudyType`: `INTERVENTIONAL`, `OBSERVATIONAL`, `EXPANDED_ACCESS`	`filter.advanced=AREA[StudyType]INTERVENTIONAL`
`filter.advanced` (anything else)	Any Essie expression, the registry's own query syntax (advanced users)	`filter.advanced=AREA[StartDate]RANGE[2024-01-01,2024-12-31]`
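
Assembling a filter.advanced string by hand gets error-prone once the dates come from variables. The following is a minimal sketch that assumes only the AREA[...]RANGE[...] form shown in the table above; the helper name essie_date_range is hypothetical and not part of any official client.

```python
from datetime import date
from urllib.parse import urlencode

def essie_date_range(area: str, start: date, end: date) -> str:
    """Build an AREA[...]RANGE[...] expression (illustrative helper, not an official API)."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"AREA[{area}]RANGE[{start.isoformat()},{end.isoformat()}]"

expr = essie_date_range("StartDate", date(2024, 1, 1), date(2024, 12, 31))
params = {"query.cond": "pancreatic cancer", "filter.advanced": expr}

# urlencode percent-encodes the brackets and commas so the value is URL-safe
query_string = urlencode(params)
print(expr)
print(query_string)
```

Passing the same dict to requests via params= performs this encoding automatically; urlencode is used here only to make the encoded form visible.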

Pagination and display params:

Param	Notes
`pageSize`	Results per page. Default 10, max 1000
`pageToken`	Cursor for next page, taken from previous response
`fields`	Comma-separated list of fields to return (reduces payload)
`sort`	Field and direction, e.g. `sort=StartDate:desc`
`countTotal`	`true` to include total result count in first response

Multiple values for filter.overallStatus are comma-separated. Multiple phases are combined with OR inside the filter.advanced expression:

filter.overallStatus=RECRUITING,ACTIVE_NOT_RECRUITING
filter.advanced=AREA[Phase]PHASE2 OR AREA[Phase]PHASE3
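
To keep the comma-joining out of calling code, one option is a small parameter builder. This is a sketch only: build_params is a hypothetical helper, and the double-underscore convention is a workaround for Python keyword names (which cannot contain dots), not something the API defines.

```python
from urllib.parse import urlencode

def build_params(**kwargs):
    """Turn keyword arguments into v2 query params (illustrative helper).

    Double underscores become dots (query__cond -> query.cond).
    Lists and tuples are joined with commas for multi-value params.
    """
    params = {}
    for key, value in kwargs.items():
        if value is None:
            continue  # skip options the caller left unset
        name = key.replace("__", ".")
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params[name] = value
    return params

params = build_params(
    query__cond="pancreatic cancer",
    filter__overallStatus=["RECRUITING", "ACTIVE_NOT_RECRUITING"],
    pageSize=100,
    pageToken=None,
)
print(urlencode(params))
```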

The Response Schema

The v2 response wraps each study in a protocolSection object with nested sub-sections. The key sections:

{
  "studies": [
    {
      "protocolSection": {
        "identificationModule": {
          "nctId": "NCT05123456",
          "briefTitle": "...",
          "officialTitle": "..."
        },
        "statusModule": {
          "overallStatus": "RECRUITING",
          "startDateStruct": {"date": "2024-01-15", "type": "ACTUAL"},
          "completionDateStruct": {"date": "2026-06-30", "type": "ESTIMATED"},
          "studyFirstSubmitDate": "2023-10-01"
        },
        "sponsorCollaboratorsModule": {
          "leadSponsor": {"name": "Merck Sharp & Dohme LLC", "class": "INDUSTRY"},
          "collaborators": []
        },
        "descriptionModule": {
          "briefSummary": "...",
          "detailedDescription": "..."
        },
        "conditionsModule": {
          "conditions": ["Non-Small Cell Lung Cancer"],
          "keywords": ["NSCLC", "immunotherapy"]
        },
        "designModule": {
          "studyType": "INTERVENTIONAL",
          "phases": ["PHASE3"],
          "enrollmentInfo": {"count": 450, "type": "ESTIMATED"}
        },
        "armsInterventionsModule": {
          "interventions": [
            {"type": "DRUG", "name": "Pembrolizumab", "description": "..."}
          ]
        },
        "eligibilityModule": {
          "eligibilityCriteria": "Inclusion criteria:\n...\nExclusion criteria:\n...",
          "sex": "ALL",
          "minimumAge": "18 Years",
          "maximumAge": "N/A"
        },
        "contactsLocationsModule": {
          "locations": [
            {
              "facility": "Memorial Sloan Kettering Cancer Center",
              "city": "New York",
              "state": "New York",
              "country": "United States",
              "status": "RECRUITING"
            }
          ]
        }
      }
    }
  ],
  "nextPageToken": "NF0g5AEBAQBZ...",
  "totalCount": 1247
}
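
Because any module or field can be missing from a given record, chained lookups fail easily. A minimal sketch of a tolerant accessor follows; dig is a hypothetical helper name, and the sample record reuses only a few fields from the schema above.

```python
def dig(record, *path, default=None):
    """Follow keys and list indexes through nested data, returning default on any miss."""
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# Trimmed-down record shaped like the schema above
sample = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT05123456"},
        "designModule": {"phases": ["PHASE3"], "enrollmentInfo": {"count": 450}},
    }
}

nct_id = dig(sample, "protocolSection", "identificationModule", "nctId")
first_phase = dig(sample, "protocolSection", "designModule", "phases", 0)
# statusModule is absent in this sample, so the default is returned
start_date = dig(sample, "protocolSection", "statusModule", "startDateStruct", "date")
print(nct_id, first_phase, start_date)
```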

Python: Pulling All Phase 3 Trials for a Drug in Active Recruitment

import requests
import pandas as pd
import time

BASE = "https://clinicaltrials.gov/api/v2"

def search_trials(params, max_results=None):
    """
    Search ClinicalTrials.gov v2 API with cursor pagination.
    Returns a list of study dicts (full protocolSection).
    """
    all_studies = []
    total = None  # totalCount is only returned when countTotal is sent (first page)
    page_params = {**params, "pageSize": 1000, "countTotal": "true"}

    while True:
        response = requests.get(f"{BASE}/studies", params=page_params, timeout=30)
        response.raise_for_status()
        data = response.json()

        all_studies.extend(data.get("studies", []))

        # Keep the first-page total so later progress lines stay meaningful
        total = data.get("totalCount", total)
        print(f"Fetched {len(all_studies)} / {total}")

        if max_results and len(all_studies) >= max_results:
            break

        # A missing nextPageToken means this was the last page
        next_token = data.get("nextPageToken")
        if not next_token:
            break

        page_params = {**params, "pageSize": 1000, "pageToken": next_token}
        time.sleep(0.3)  # pause briefly between pages

    return all_studies[:max_results] if max_results else all_studies

def extract_study_summary(study):
    """Flatten a v2 study record to a single row dict."""
    ps = study.get("protocolSection", {})
    id_mod = ps.get("identificationModule", {})
    status_mod = ps.get("statusModule", {})
    sponsor_mod = ps.get("sponsorCollaboratorsModule", {})
    design_mod = ps.get("designModule", {})
    conditions_mod = ps.get("conditionsModule", {})
    contacts_mod = ps.get("contactsLocationsModule", {})

    locations = contacts_mod.get("locations", [])
    # Skip locations without a country and sort for a stable output order
    countries = sorted({loc["country"] for loc in locations if loc.get("country")})

    return {
        "nct_id":          id_mod.get("nctId"),
        "brief_title":     id_mod.get("briefTitle"),
        "status":          status_mod.get("overallStatus"),
        "start_date":      (status_mod.get("startDateStruct") or {}).get("date"),
        "completion_date": (status_mod.get("completionDateStruct") or {}).get("date"),
        "sponsor":         (sponsor_mod.get("leadSponsor") or {}).get("name"),
        "sponsor_class":   (sponsor_mod.get("leadSponsor") or {}).get("class"),
        "phases":          "|".join(design_mod.get("phases", [])),
        "study_type":      design_mod.get("studyType"),
        "enrollment":      (design_mod.get("enrollmentInfo") or {}).get("count"),
        "conditions":      "|".join(conditions_mod.get("conditions", [])),
        "location_count":  len(locations),
        "countries":       "|".join(countries),
    }

# All Phase 3 pembrolizumab trials currently recruiting
pembro_trials = search_trials({
    "query.intr":           "pembrolizumab",
    "filter.overallStatus": "RECRUITING",
    # Phase and study type are filtered through an Essie expression
    "filter.advanced":      "AREA[Phase]PHASE3 AND AREA[StudyType]INTERVENTIONAL",
})

df = pd.DataFrame([extract_study_summary(s) for s in pembro_trials])
print(f"Phase 3 pembrolizumab trials recruiting: {len(df)}")
print(df[["nct_id", "brief_title", "sponsor", "enrollment", "completion_date"]].head(10).to_string(index=False))

Python: Getting All Trials for a Sponsor

# All trials sponsored by Moderna, any status
moderna_trials = search_trials({
    "query.spons": "Moderna",
})

df_moderna = pd.DataFrame([extract_study_summary(s) for s in moderna_trials])
print(f"\nTotal Moderna trials: {len(df_moderna)}")
print(df_moderna.groupby("status")["nct_id"].count().sort_values(ascending=False))

The query.spons parameter searches both lead sponsor and collaborators. Use query.lead if you want only lead sponsor matches.
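
If results were already fetched with query.spons, the lead and collaborator matches can be separated client-side instead of issuing a second request. The sketch below is an illustration under assumptions: sponsor_role is a hypothetical helper, collaborator entries are assumed to carry a name key like leadSponsor does, and the sample records are made up.

```python
def sponsor_role(study, name):
    """Return 'lead', 'collaborator', or None using a case-insensitive substring match."""
    module = study.get("protocolSection", {}).get("sponsorCollaboratorsModule", {})
    needle = name.lower()

    lead = (module.get("leadSponsor") or {}).get("name") or ""
    if needle in lead.lower():
        return "lead"

    # Assumption: each collaborator is a dict with a "name" key
    for collab in module.get("collaborators") or []:
        if needle in (collab.get("name") or "").lower():
            return "collaborator"
    return None

# Made-up records for illustration only
studies = [
    {"protocolSection": {"sponsorCollaboratorsModule": {
        "leadSponsor": {"name": "Example Pharma Inc."}, "collaborators": []}}},
    {"protocolSection": {"sponsorCollaboratorsModule": {
        "leadSponsor": {"name": "Sample University"},
        "collaborators": [{"name": "Example Pharma Inc."}]}}},
    {"protocolSection": {}},
]

roles = [sponsor_role(s, "example pharma") for s in studies]
print(roles)
```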

Python: Building a Status Change Monitor

This is the most operationally useful pattern: watch a set of trials and alert when their status changes.

import json
from pathlib import Path
from datetime import datetime

SNAPSHOT_FILE = "trial_snapshot.json"

def fetch_trial(nct_id):
    """Fetch a single trial by NCT ID."""
    response = requests.get(f"{BASE}/studies/{nct_id}")
    if response.status_code == 404:
        return None
    response.raise_for_status()
    study = response.json()
    return extract_study_summary(study)

def check_for_status_changes(nct_ids, snapshot_path=SNAPSHOT_FILE):
    """
    Compare current trial statuses against a saved snapshot.
    Returns a list of trials that have changed status since last check.
    """
    # Load previous snapshot
    previous = {}
    if Path(snapshot_path).exists():
        with open(snapshot_path) as f:
            previous = json.load(f)

    current = {}
    changes = []

    for nct_id in nct_ids:
        trial = fetch_trial(nct_id)
        if not trial:
            continue
        current[nct_id] = trial

        prev = previous.get(nct_id, {})
        if prev.get("status") != trial["status"]:
            changes.append({
                "nct_id":       nct_id,
                "brief_title":  trial["brief_title"],
                "old_status":   prev.get("status", "UNKNOWN"),
                "new_status":   trial["status"],
                # UTC timestamp; avoids datetime.utcnow(), deprecated since Python 3.12
                "checked_at":   time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            })
        time.sleep(0.2)

    # Save updated snapshot
    with open(snapshot_path, "w") as f:
        json.dump(current, f, indent=2)

    return changes

# List of trials you care about
watched_trials = [
    "NCT05094674",
    "NCT04516746",
    "NCT03668418",
]

changes = check_for_status_changes(watched_trials)
if changes:
    for c in changes:
        print(f"STATUS CHANGE: {c['nct_id']} | {c['old_status']} -> {c['new_status']} | {c['brief_title']}")
else:
    print("No status changes detected.")

Run this on a schedule (daily or weekly depending on how fast-moving your portfolio is) to build a lightweight competitive intelligence alert system without paying for a commercial clinical trial tracking service.
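
On the first scheduled run there is no snapshot yet, so every watched trial shows up as a change from UNKNOWN. One way to avoid that noise is to keep the comparison in a pure function that treats unseen IDs as a baseline. This is a sketch of that idea, not a drop-in replacement; diff_statuses is a hypothetical name and the IDs and statuses are placeholders.

```python
def diff_statuses(previous, current):
    """Compare two {nct_id: {"status": ...}} snapshots.

    Returns (changes, new_ids): real status transitions, plus IDs seen for
    the first time, which are treated as a baseline rather than a change.
    """
    changes, new_ids = [], []
    for nct_id, trial in current.items():
        if nct_id not in previous:
            new_ids.append(nct_id)
            continue
        old = previous[nct_id].get("status")
        new = trial.get("status")
        if old != new:
            changes.append({"nct_id": nct_id, "old_status": old, "new_status": new})
    return changes, new_ids

# Placeholder snapshots for illustration only
previous = {
    "NCT_EXAMPLE_1": {"status": "RECRUITING"},
    "NCT_EXAMPLE_2": {"status": "RECRUITING"},
}
current = {
    "NCT_EXAMPLE_1": {"status": "ACTIVE_NOT_RECRUITING"},
    "NCT_EXAMPLE_2": {"status": "RECRUITING"},
    "NCT_EXAMPLE_3": {"status": "NOT_YET_RECRUITING"},
}

changes, new_ids = diff_statuses(previous, current)
print(changes, new_ids)
```

Keeping the comparison free of network and file I/O also makes it easy to unit test before wiring it into a scheduler.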

Useful Field Subsets

The full study record is large. Use the fields parameter to request only what you need:

# Request only identification and status fields (much faster pagination)
response = requests.get(f"{BASE}/studies", params={
    "query.cond": "alzheimer",
    "filter.advanced": "AREA[Phase]PHASE3",  # phase is filtered via an Essie expression
    "fields": "NCTId,BriefTitle,OverallStatus,StartDate,CompletionDate,LeadSponsorName",
    "pageSize": 1000,
})

The fields parameter accepts ClinicalTrials.gov field names (the v2 docs publish the full list at https://clinicaltrials.gov/data-api/api). Trimming the response this way reduces payload size significantly for large result sets.

Use Cases

Pharma competitive intelligence. Track competitor pipeline by sponsor name. Pull all active trials for competing drugs in your indication, sorted by phase and expected completion date. Build a landscape view of what is coming to market and when.

Patient recruitment. Identify recruiting sites for a trial by parsing the locations array and filtering by status RECRUITING. Build a database of active sites for enrollment support or partnership outreach.
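
A minimal sketch of the site extraction described above, assuming the locations layout shown in the response schema; recruiting_sites is a hypothetical helper and the second location in the sample is made up.

```python
def recruiting_sites(study):
    """Return one row per location whose own status is RECRUITING."""
    ps = study.get("protocolSection", {})
    nct_id = ps.get("identificationModule", {}).get("nctId")
    locations = ps.get("contactsLocationsModule", {}).get("locations") or []
    return [
        {
            "nct_id": nct_id,
            "facility": loc.get("facility"),
            "city": loc.get("city"),
            "country": loc.get("country"),
        }
        for loc in locations
        if loc.get("status") == "RECRUITING"
    ]

sample = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT05123456"},
        "contactsLocationsModule": {
            "locations": [
                {"facility": "Memorial Sloan Kettering Cancer Center", "city": "New York",
                 "country": "United States", "status": "RECRUITING"},
                # Made-up second site that is not recruiting
                {"facility": "Example Clinic", "city": "Toronto",
                 "country": "Canada", "status": "COMPLETED"},
            ]
        },
    }
}

sites = recruiting_sites(sample)
print(sites)
```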

Clinical trial landscape mapping. For a given disease area, pull all trials by phase and status. This gives you a view of how crowded the indication is, where trials are stalling (high number in ACTIVE_NOT_RECRUITING or TERMINATED), and where gaps exist.
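
The phase-by-status view can be sketched with a plain counter over fetched records. landscape_counts is a hypothetical helper; note that a trial listing two phases is counted once under each, which is a choice made here rather than a registry rule. The sample records are made up.

```python
from collections import Counter

def landscape_counts(studies):
    """Count studies per (phase, status) pair; multi-phase trials count once per phase."""
    counts = Counter()
    for study in studies:
        ps = study.get("protocolSection", {})
        status = ps.get("statusModule", {}).get("overallStatus", "UNKNOWN")
        phases = ps.get("designModule", {}).get("phases") or ["NA"]
        for phase in phases:
            counts[(phase, status)] += 1
    return counts

def make_study(phases, status):
    """Build a minimal made-up record for the example."""
    return {"protocolSection": {
        "statusModule": {"overallStatus": status},
        "designModule": {"phases": phases},
    }}

studies = [
    make_study(["PHASE3"], "RECRUITING"),
    make_study(["PHASE3"], "RECRUITING"),
    make_study(["PHASE2", "PHASE3"], "TERMINATED"),
    make_study([], "COMPLETED"),
]

counts = landscape_counts(studies)
for (phase, status), n in sorted(counts.items()):
    print(f"{phase:8} {status:12} {n}")
```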

CRO business development. Contract research organizations use trial data to identify sponsors running studies in therapeutic areas they serve. Filter by sponsor class INDUSTRY, phase, and status to find prospective clients before they have finished enrollment.

Regulatory and publication tracking. Trials registered on ClinicalTrials.gov are increasingly required to report results. The resultsSection of the v2 response (where present) contains outcome data submitted post-trial. Pulling and analyzing this is useful for systematic reviews and evidence synthesis.
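
A first step for results tracking is separating studies that carry posted results from those that do not. The sketch below assumes resultsSection sits alongside protocolSection at the top level of each study record; it deliberately does not look inside resultsSection, whose layout is not covered here. The helper name and sample records are hypothetical.

```python
def split_by_results(studies):
    """Return (with_results, without_results) lists of NCT IDs."""
    with_results, without_results = [], []
    for study in studies:
        nct_id = (
            study.get("protocolSection", {})
            .get("identificationModule", {})
            .get("nctId")
        )
        # Assumption: a non-empty top-level resultsSection means results were posted
        if study.get("resultsSection"):
            with_results.append(nct_id)
        else:
            without_results.append(nct_id)
    return with_results, without_results

# Made-up records; the resultsSection content is a placeholder
studies = [
    {"protocolSection": {"identificationModule": {"nctId": "NCT_EXAMPLE_1"}},
     "resultsSection": {"placeholder": True}},
    {"protocolSection": {"identificationModule": {"nctId": "NCT_EXAMPLE_2"}}},
]

posted, pending = split_by_results(studies)
print(posted, pending)
```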

The ClinicalTrials.gov scraper handles v2 pagination, scheduled monitoring runs, and output normalization for teams that need trial data on a recurring basis without maintaining their own API integration.
