CourtListener exposes 10M+ court opinions and dockets via a free REST API. Here is how to query it, what the rate limits actually are, and when a scraper is faster.

If you have ever tried to pull US court records programmatically, you have probably run into one of three walls: PACER’s pay-per-page model, a state court portal that returns HTML tables with no pagination, or a commercial legal database that costs $500 a month and requires a sales call to get API access.

CourtListener is a different story. It is a free, open-access platform run by the Free Law Project that gives you a proper REST API over 10 million court opinions, dockets, and oral argument recordings. You can search by keyword, filter by jurisdiction, pull full opinion text, and paginate through results with a standard cursor.

TL;DR: CourtListener’s REST API covers 10M+ federal and state court opinions. Unauthenticated requests are limited to 5,000 per day; authenticated requests (free API key) get you 5,000 per hour. The data model has three layers: dockets (the case), clusters (a group of opinions), and opinions (the actual document text). For bulk exports, scheduled pulls, or structured JSON output at scale, a managed scraper is faster than hand-rolling pagination.

What CourtListener Is

CourtListener is maintained by the Free Law Project, a 501(c)(3) nonprofit. The project started as an archive of federal circuit court opinions and has expanded to cover:

Federal district courts, circuit courts, and the Supreme Court
State supreme courts and appellate courts in most US states
PACER dockets (the procedural history of a case, not the underlying filings)
Oral argument audio recordings with transcripts

The database currently holds more than 10 million opinion documents going back to the 19th century in some jurisdictions. Coverage varies by court. Federal circuit courts are nearly complete. State trial courts have spotty coverage.

One important distinction: CourtListener contains court opinions, not the underlying case filings. If you want the actual briefs, motions, and exhibits from PACER, you need PACER credentials and you pay PACER's per-page fees. CourtListener scrapes and archives the publicly available PACER docket metadata, but the underlying PDFs still live on PACER.

The Official API

The base URL is https://www.courtlistener.com/api/rest/v4/. The API follows standard REST conventions: GET for reads, JSON responses, cursor-based pagination.

Authentication: Register at courtlistener.com/sign-in/ and generate a token from your profile. The token goes in the Authorization header: Token YOUR_TOKEN_HERE.

Rate limits:

Unauthenticated: 5,000 requests per day
Authenticated (free): 5,000 requests per hour
The API returns X-RateLimit-Remaining and X-RateLimit-Reset headers
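
A minimal sketch of how a client might act on those limits. It assumes throttled requests come back as HTTP 429 and that a Retry-After header, when present, holds a whole number of seconds; neither is confirmed by the text above, so check the live responses before relying on it. The function names, the floor of 10, and the backoff constants are illustrative choices.

```python
import random


def should_pause(headers, floor=10):
    """True when the remaining quota reported by the API is at or below `floor`."""
    remaining = headers.get("X-RateLimit-Remaining")
    if remaining is None:
        return False  # header missing: nothing to act on
    try:
        return int(remaining) <= floor
    except ValueError:
        return False  # unexpected format: do not guess


def backoff_delay(status_code, headers, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying; 0.0 means no retry is needed."""
    if status_code != 429 and status_code < 500:
        return 0.0
    # Assumption: Retry-After, if sent, is an integer number of seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return min(float(retry_after), cap)
    # Otherwise use exponential backoff with up to 10% random jitter.
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay * 0.1)
```

With requests, response.headers is case-insensitive and can be passed in directly; a plain dict needs the exact header spelling used here.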

Key endpoints:

Endpoint	What it returns
`/opinions/`	Individual opinion documents with full text
`/clusters/`	Groups of opinions from the same case instance
`/dockets/`	Case-level metadata (parties, court, filing date)
`/courts/`	Court metadata and identifiers
`/search/`	Full-text search across opinions

Python: Searching by Keyword and Court

The most common starting point is a keyword search filtered to a specific court. Here is how to pull opinions mentioning “tortious interference” from the 9th Circuit:

import requests

API_TOKEN = "your_token_here"
BASE_URL = "https://www.courtlistener.com/api/rest/v4"

headers = {"Authorization": f"Token {API_TOKEN}"}

def search_opinions(keyword, court_id=None, max_results=50):
    """Search CourtListener opinions by keyword."""
    params = {
        "q": keyword,
        "type": "o",          # 'o' = opinions
        "order_by": "score desc",
        "stat_Published": "on",
    }
    if court_id:
        params["court"] = court_id
    
    results = []
    url = f"{BASE_URL}/search/"
    
    while url and len(results) < max_results:
        # A timeout keeps a stalled connection from hanging the loop
        response = requests.get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()
        
        results.extend(data["results"])
        url = data.get("next")   # cursor URL for next page
        params = {}              # params are encoded in the cursor URL
    
    return results[:max_results]

# 9th Circuit court ID is 'ca9'
opinions = search_opinions("tortious interference", court_id="ca9")

for op in opinions[:5]:
    print(op["caseName"])
    print(op["dateFiled"])
    print(op["absolute_url"])
    print()

Court IDs follow a short-code convention. Federal circuits are ca1 through ca11 plus cadc and cafc. The Supreme Court is scotus. District courts follow the pattern dcd (DC District), nysd (Southern District of New York), and so on. The /courts/ endpoint returns the full list.
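
To avoid hard-coding court IDs, the /courts/ listing can be turned into a lookup table. The sketch below assumes the listing is paginated with results and next keys like the search endpoint, and that each court object carries id, full_name, and short_name fields; these field names are assumptions to verify against a real response. The fetch_json parameter is a hypothetical callable that takes a URL and returns parsed JSON.

```python
def iter_results(fetch_json, first_url):
    """Yield every item from a paginated listing by following `next` links."""
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("results", [])
        url = page.get("next")  # None or missing ends the loop


def court_lookup(fetch_json, base_url="https://www.courtlistener.com/api/rest/v4"):
    """Map court ID (e.g. 'ca9') to a display name."""
    return {
        # Assumed field names: id, full_name, short_name
        court["id"]: court.get("full_name") or court.get("short_name") or ""
        for court in iter_results(fetch_json, f"{base_url}/courts/")
    }
```

With the headers dict defined earlier, fetch_json could be as simple as lambda url: requests.get(url, headers=headers, timeout=30).json(). Passing the fetcher in also makes the pagination logic easy to test without network access.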

Python: Retrieving Full Opinion Text

The search endpoint returns summary metadata. To get the full opinion text, fetch the opinion object directly:

def get_opinion_text(opinion_id):
    """Retrieve full text for a single opinion."""
    url = f"{BASE_URL}/opinions/{opinion_id}/"
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    opinion = response.json()
    
    # Text comes in multiple formats; prefer plain text, fall back to HTML
    text = (
        opinion.get("plain_text") or
        opinion.get("html_with_citations") or
        opinion.get("html") or
        ""
    )
    
    # The case name and filing date belong to the parent cluster, not the
    # opinion, so return the cluster ID and look them up via /clusters/.
    return {
        "id": opinion["id"],
        "cluster_id": opinion.get("cluster_id"),
        "author": opinion.get("author_str"),
        "text": text,
        "download_url": opinion.get("download_url"),
    }

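# Pull the first opinion from your search results.
# Assumption: a v4 search hit is a cluster whose documents sit in a nested
# "opinions" list; fall back to a top-level "id" if that list is absent.
first_hit = opinions[0]
nested = first_hit.get("opinions") or []
first_opinion_id = nested[0]["id"] if nested else first_hit["id"]
full_opinion = get_opinion_text(first_opinion_id)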
print(f"Characters in opinion: {len(full_opinion['text'])}")

The plain_text field holds text already extracted from the source document; when it is empty, the function falls back to the HTML fields. The html_with_citations field includes hyperlinked citations to other opinions, which is useful for building citation graphs.
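
As a rough illustration of the citation-graph idea, the sketch below collects the numeric IDs from citation links in an html_with_citations string using only the standard library. It assumes the links are anchor tags whose href contains a path of the form /opinion/<number>/; that path shape is an assumption, and whether the number identifies an opinion or its cluster should be checked against real data.

```python
import re
from html.parser import HTMLParser

# Assumed link shape: /opinion/<numeric id>/<slug>/
OPINION_HREF = re.compile(r"/opinion/(\d+)/")


class CitationLinkParser(HTMLParser):
    """Collects numeric IDs from <a href="/opinion/123/..."> links."""

    def __init__(self):
        super().__init__()
        self.cited_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = OPINION_HREF.search(href)
        if match:
            self.cited_ids.append(int(match.group(1)))


def cited_ids(html):
    """Return the cited IDs in first-seen order, without duplicates."""
    parser = CitationLinkParser()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.cited_ids))
```

Each returned ID becomes an edge from the current opinion to the cited one. Citations that were never resolved to a record carry no matching link, so they simply do not appear in the output.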

The Data Model: Dockets, Clusters, and Opinions

CourtListener uses a three-layer hierarchy that confuses most first-time users.

Docket is the case. A docket represents a lawsuit from filing to disposition. It has metadata: parties, attorneys, court, filing date, cause of action. One docket can generate multiple rulings at different stages.

Cluster is a decision instance. When a court issues a ruling, all documents from that ruling event group together under one cluster. A cluster has a date, the presiding judges, and the citation if one has been assigned (e.g., 123 F.3d 456).

Opinion is the actual document. One cluster can contain multiple opinions: the majority, concurrences, and dissents. Each opinion has its own text, author, and type code.

The relationship is: Docket 1:N Clusters 1:N Opinions.

When building a legal dataset, you typically want to work at the opinion level for text and the docket level for case metadata. The cluster ties them together.
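
One way to picture the hierarchy is as nested records that get flattened to one row per opinion. The classes and field names below are a simplified, hypothetical model for illustration, not the API's actual schema, and the type values in the usage line are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Opinion:
    id: int
    type: str          # illustrative label, e.g. "majority" or "dissent"
    author: str = ""


@dataclass
class Cluster:
    id: int
    date_filed: str
    citation: str = ""
    opinions: List[Opinion] = field(default_factory=list)


@dataclass
class Docket:
    id: int
    case_name: str
    court: str
    clusters: List[Cluster] = field(default_factory=list)


def flatten(docket):
    """One row per opinion, repeating docket and cluster fields on each row."""
    return [
        {
            "docket_id": docket.id,
            "case_name": docket.case_name,
            "court": docket.court,
            "cluster_id": cluster.id,
            "date_filed": cluster.date_filed,
            "citation": cluster.citation,
            "opinion_id": opinion.id,
            "opinion_type": opinion.type,
            "author": opinion.author,
        }
        for cluster in docket.clusters
        for opinion in cluster.opinions
    ]
```

For example, a docket with one cluster holding a majority opinion and a dissent flattens to two rows that share the same docket_id and cluster_id, which is the shape most dataframes and SQL tables expect.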

Limitations Worth Knowing

PACER filings are not here. CourtListener has PACER docket metadata (entries, filing dates, party lists), but the underlying PDFs for motions, briefs, and exhibits are paywalled on PACER. If you need those, you need PACER credentials and the pay-per-page system.

Coverage varies significantly. Federal circuit court coverage goes back decades and is nearly complete. State court coverage depends on whether the state publishes machine-readable opinions. Some state trial courts have almost no coverage.

Text quality is inconsistent. Older opinions were OCR-scanned from PDFs. OCR errors appear in documents from the 1980s and earlier. The plain_text field is usable for modern opinions; older ones may need additional cleaning.
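
A small cleaning pass along these lines can help with older text. This is a sketch of generic whitespace and line-break repair, not a fix for OCR character errors, and the rules are assumptions about what typical scanned text looks like.

```python
import re


def clean_ocr_text(text):
    """Light cleanup for scanned opinion text."""
    # Rejoin words split by a hyphen at a line break: "inter-\nference"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single newlines inside a paragraph into spaces; keep blank lines
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs
    text = re.sub(r"[ \t]+", " ", text)
    # Reduce three or more newlines to a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

The hyphen rule also merges genuinely hyphenated words that happen to break at a line end (for example "well-" followed by "known"), so review a sample before applying it to a whole corpus.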

Citations are not always resolved. The html_with_citations field links to other CourtListener opinions, but not every citation in a document points to a record that exists in the system.

When to Use the Scraper vs the Raw API

The raw API works well when you need a targeted keyword search that returns no more than a few hundred results, or when you want to pull a specific docket or opinion by ID. Authentication is straightforward and the response structure is consistent.

The managed CourtListener scraper is faster in four situations:

Bulk exports. The API cursor pagination works, but iterating through tens of thousands of results in a single script means handling retries, rate-limit backoff, and cursor state yourself. The scraper handles all of this and writes results to a structured dataset.

Scheduled monitoring. If you want to watch for new opinions mentioning a company name or legal doctrine, you need a scheduled job that polls for new results since a given date. That is infrastructure you would otherwise build and maintain yourself.

Structured JSON without cleanup. The scraper normalizes the docket/cluster/opinion hierarchy into flat rows that are immediately usable in a dataframe or database without traversing nested objects.

No-code environments. If your legal team wants to pull data into Airtable or Google Sheets without writing Python, the scraper’s dataset output connects directly via Apify’s HTTP API.
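
For a sense of what the scheduled-monitoring case involves if you build it yourself, here is a minimal sketch of the "new since a given date" step. It filters search results client-side on the dateFiled field used earlier and assumes that value starts with an ISO date (YYYY-MM-DD); the function names are hypothetical, and the scheduling itself (cron, a task queue) is left out.

```python
from datetime import date


def filed_date(item):
    """Parse a result's dateFiled value; return None if absent or malformed."""
    raw = item.get("dateFiled") or ""
    try:
        return date.fromisoformat(raw[:10])
    except ValueError:
        return None


def new_since(results, cutoff):
    """Return (items filed strictly after cutoff, cutoff for the next poll)."""
    fresh = []
    latest = cutoff
    for item in results:
        filed = filed_date(item)
        if filed is None or filed <= cutoff:
            continue
        fresh.append(item)
        latest = max(latest, filed)
    return fresh, latest
```

Each run would call search_opinions, pass the results and the stored cutoff to new_since, then persist the returned cutoff. Because the comparison is strict, opinions added later with the same filing date as the cutoff are skipped; comparing with >= and de-duplicating on ID is the safer variant if that matters.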

Use Cases

Litigation intelligence. Track all new opinions mentioning a competitor, a patent number, or a regulatory topic. Set up scheduled runs and pipe results into a Slack notification or CRM.

Contract clause research. Pull opinions where courts have interpreted specific contract language. Feed the opinion text into an LLM to extract how courts rule on particular clause types across jurisdictions.

Legal RAG systems. Build a retrieval-augmented generation system over case law. Index opinion text into a vector database, then let users query in plain English and retrieve relevant precedents with citations.

Regulatory monitoring. Track how federal district and circuit courts are ruling on a specific regulatory regime. Useful for compliance teams watching litigation trends in their industry.

Academic research. CourtListener is used by legal scholars for empirical analysis: how sentencing patterns shift over time, how often circuit courts reverse district courts, how citation networks evolve.

The combination of free access, a documented REST API, and 10 million documents makes CourtListener one of the most underused data sources in the legal intelligence space. The data model takes 30 minutes to internalize. After that, the limits are computational, not access-related.

CourtListener API: How to Search US Court Records and Case Law Programmatically