tutorial November 24, 2025 · 7 min read

How to Export Google Trends Data at Scale for Market Research

Exporting Google Trends data for dozens or hundreds of keywords while avoiding rate limits and handling the normalization quirks.

Google Trends lets you compare up to 5 keywords at a time in the UI. For market research that requires 50, 500, or 5,000 keywords, you need a programmatic approach with careful handling of the normalization quirks that make multi-keyword analysis tricky.

TL;DR: Bulk-exporting Google Trends for hundreds of keywords requires anchor normalization: include a stable high-volume keyword (e.g., “python”) in every batch so results are comparable across queries. Rate-limit between batches (2+ seconds). Store datasets with deterministic IDs, handle zero values with linear interpolation, and export to CSV for spreadsheet analysis.

The Normalization Problem

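Google Trends data is always indexed relative to the peak within a single query: the highest point among the keywords you compare, for the timeframe and region you select, is set to 100 and everything else is scaled against it. This creates a problem when comparing keywords across separate API calls.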

Example: You query keywords A, B, and C together. In that query, A is the highest-traffic keyword with value 100 (peak), B is half that at 50, and C is 10.

If you query B and D in a separate call, B might score 100 in that query (if it is the higher-traffic keyword of the pair). Read side by side with the first query, B now looks as popular as A, and D looks far bigger than it really is. Both conclusions are wrong.
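
To see the distortion concretely, the sketch below mimics per-query indexing with made-up absolute volumes. The volumes and the `index_query` helper are hypothetical: Google does not publish absolute search counts, and real Trends values are time series rather than single numbers.

```python
def index_query(volumes: dict[str, float]) -> dict[str, int]:
    """Rescale a group of keywords so the largest value becomes 100."""
    peak = max(volumes.values())
    return {kw: round(100 * v / peak) for kw, v in volumes.items()}

# Hypothetical absolute volumes, for illustration only
volumes = {'A': 1000, 'B': 500, 'C': 100, 'D': 400}

query_1 = index_query({kw: volumes[kw] for kw in ('A', 'B', 'C')})
query_2 = index_query({kw: volumes[kw] for kw in ('B', 'D')})

print(query_1)  # {'A': 100, 'B': 50, 'C': 10}
print(query_2)  # {'B': 100, 'D': 80}
```

B scores 50 in the first query and 100 in the second even though its underlying volume never changed, which is why the raw numbers from the two queries cannot be placed on one chart.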

The solution: Anchor keywords

Always include the same 1-2 high-traffic “anchor” keywords in every batch call. This gives you a common reference point across all batches.

# Anchor with well-known stable keyword
ANCHOR_KEYWORD = 'python'  # High volume, stable trend — good anchor

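def batch_keywords_with_anchor(keywords: list[str], anchor: str, batch_size: int = 4) -> list[list[str]]:
    """Split keywords into batches of up to batch_size, each prefixed with the anchor.

    With the anchor added, each batch holds at most batch_size + 1 terms, so the
    default of 4 stays within the 5-keyword comparison limit.
    """
    # Drop the anchor from the input so it is never duplicated within a batch
    others = [kw for kw in keywords if kw != anchor]
    batches = []
    for i in range(0, len(others), batch_size):
        # Prepend the anchor without discarding any keyword from the slice
        batches.append([anchor] + others[i:i + batch_size])
    return batches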

Then use the anchor’s value to normalize across batches:

def normalize_across_batches(batch_results: list[dict], anchor: str) -> dict:
    """Normalize all keywords to a common scale using anchor values."""
    
    # Collect anchor values per batch
    anchor_values = {}
    for result in batch_results:
        series = result['interest_over_time']
        # Skip empty series to avoid dividing by zero
        if result['keyword'] == anchor and series:
            avg_val = sum(p['value'] for p in series) / len(series)
            anchor_values[result['batch_id']] = avg_val
    
    # Overall anchor average
    overall_anchor_avg = sum(anchor_values.values()) / len(anchor_values) if anchor_values else 1
    
    # Scale all other keywords relative to their batch's anchor
    normalized = {}
    for result in batch_results:
        if result['keyword'] == anchor:
            continue
        
        batch_anchor_avg = anchor_values.get(result['batch_id'], overall_anchor_avg)
        scale_factor = overall_anchor_avg / max(batch_anchor_avg, 0.1)
        
        normalized[result['keyword']] = [
            {'date': p['date'], 'value': p['value'] * scale_factor}
            for p in result['interest_over_time']
        ]
    
    return normalized
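
As a worked example of the scaling step, assume the anchor averages 80 in one batch and 40 in another. The numbers and the `scale_factor` helper below are illustrative, not output from a real run; the helper simply isolates the arithmetic used in the function above.

```python
def scale_factor(batch_anchor_avg: float, overall_anchor_avg: float) -> float:
    """Factor that maps one batch's values onto the shared anchor scale."""
    return overall_anchor_avg / max(batch_anchor_avg, 0.1)

# Illustrative anchor averages per batch
anchor_avgs = {0: 80.0, 1: 40.0}
overall = sum(anchor_avgs.values()) / len(anchor_avgs)  # 60.0

# A keyword that scored a raw 20 in each batch
raw = {0: 20.0, 1: 20.0}
rescaled = {b: raw[b] * scale_factor(anchor_avgs[b], overall) for b in raw}

print(rescaled)  # {0: 15.0, 1: 30.0}
```

The same raw score of 20 ends up twice as large in batch 1, because there it was half of the anchor's level rather than a quarter of it.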

Large-Scale Collection

For hundreds of keywords, rate limiting is the main challenge. Google Trends blocks bursts of requests from the same session.

from apify_client import ApifyClient
import time

client = ApifyClient('YOUR_API_TOKEN')

def collect_at_scale(keywords: list[str], timeframe: str, geo: str, delay_sec: float = 2.0) -> list[dict]:
    """Collect Trends data for many keywords with rate limit management."""
    
    ANCHOR = 'python'  # Or choose a domain-appropriate anchor
    batches = batch_keywords_with_anchor(keywords, ANCHOR, batch_size=4)
    
    all_results = []
    
    for batch_idx, batch in enumerate(batches):
        print(f"Batch {batch_idx + 1}/{len(batches)}: {batch}")
        
        run = client.actor('themineworks/google-trends-pro').call(run_input={
            'keywords': batch,
            'timeframe': timeframe,
            'geo': geo,
            'includeRelatedQueries': False,  # Skip to reduce API calls
        })
        
        for item in client.dataset(run['defaultDatasetId']).iterate_items():
            item['batch_id'] = batch_idx
            all_results.append(item)
        
        # Rate limit between batches
        if batch_idx < len(batches) - 1:
            time.sleep(delay_sec)
    
    return all_results

Building a Reproducible Research Dataset

For market research that needs to be reproducible (comparison over time, sharing with colleagues), structure your data for versioned storage:

import json
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def save_trends_dataset(results: list[dict], metadata: dict, output_dir: str = './trends-datasets'):
    """Save a fully reproducible trends dataset with metadata."""
    
    # parents=True so nested output paths are created as well
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    
    # Deterministic hash of the query parameters: identical metadata always yields the same hash
    query_hash = hashlib.md5(
        json.dumps(metadata, sort_keys=True).encode()
    ).hexdigest()[:8]
    
    # Take one timezone-aware UTC timestamp and reuse it for the ID and the metadata
    # (datetime.utcnow() is deprecated since Python 3.12)
    collected_at = datetime.now(timezone.utc)
    dataset_id = f"{collected_at.strftime('%Y%m%d')}_{query_hash}"
    
    dataset = {
        'id': dataset_id,
        'metadata': {
            **metadata,
            'collected_at': collected_at.isoformat(),
            'record_count': len(results),
        },
        'data': results,
    }
    
    filepath = Path(output_dir) / f"{dataset_id}.json"
    with open(filepath, 'w') as f:
        json.dump(dataset, f, indent=2)
    
    print(f"Saved dataset {dataset_id}: {len(results)} keyword records to {filepath}")
    return filepath

# Example usage
keywords_to_research = ['web scraping', 'data extraction', 'apify', 'puppeteer', 'playwright']
results = collect_at_scale(keywords_to_research, timeframe='today 5-y', geo='US')

save_trends_dataset(results, {
    'query_type': 'competitive_analysis',
    'keywords': keywords_to_research,
    'timeframe': 'today 5-y',
    'geo': 'US',
    'anchor': 'python',
    'analyst': 'your-name',
})

Handling Missing Data

Google Trends returns value 0 for weeks/days with insufficient data. Distinguish between “zero searches” and “no data”:

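def fill_missing_values(iot: list[dict], method: str = 'interpolate') -> list[dict]:
    """Handle zero values in interest over time without mutating the input."""
    
    if method == 'interpolate':
        values = [p['value'] for p in iot]
        # Positions of the real (non-zero) observations
        known = [i for i, v in enumerate(values) if v > 0]
        if not known:
            # Nothing to interpolate from: return an unmodified copy
            return [dict(p) for p in iot]
        
        filled = []
        for i, point in enumerate(iot):
            value = point['value']
            if value == 0:
                # Nearest non-zero neighbours, taken from the original values
                prev_i = max((k for k in known if k < i), default=None)
                next_i = min((k for k in known if k > i), default=None)
                
                if prev_i is not None and next_i is not None:
                    # Linear interpolation weighted by distance, so runs of zeros form a straight line
                    weight = (i - prev_i) / (next_i - prev_i)
                    value = values[prev_i] + weight * (values[next_i] - values[prev_i])
                elif prev_i is not None:
                    value = values[prev_i]  # trailing zeros: carry the last value forward
                else:
                    value = values[next_i]  # leading zeros: carry the first value backward
            filled.append({**point, 'value': value})
        return filled
    
    elif method == 'mark_missing':
        # Preserve zeros but add flag
        return [{**p, 'is_missing': p['value'] == 0} for p in iot]
    
    return iot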

Export Formats

import csv

# CSV export for spreadsheet analysis
def export_csv(normalized_data: dict, filepath: str):
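    # Index each series by date so keywords with different date coverage still line up
    by_keyword = {
        kw: {p['date']: p['value'] for p in series}
        for kw, series in normalized_data.items()
    }
    # Union of all dates, kept in first-seen order
    dates = list(dict.fromkeys(
        p['date'] for series in normalized_data.values() for p in series
    ))
    
    with open(filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['date'] + list(by_keyword))
        
        for date in dates:
            row = [date]
            for kw in by_keyword:
                value = by_keyword[kw].get(date)
                # Leave the cell empty when a keyword has no value for this date
                row.append('' if value is None else round(value, 1))
            writer.writerow(row)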
    
    print(f"Exported to {filepath}")

The complete workflow — anchor-normalized batching, rate limit management, reproducible datasets, and clean CSV export — gives you research-grade Google Trends data at any scale.

Frequently Asked Questions

Why can you not directly compare Google Trends values from separate API calls?

Google Trends indexes values relative to the peak within the specific query — a value of 100 means “highest in this batch,” not an absolute level. If you query “python” alone and get 100, and query “javascript” alone and get 100, they appear equal. But if you include both in the same query, you might see python at 85 and javascript at 100, revealing their actual relative popularity. Each separate API call creates its own independent 0-100 scale.

What is an anchor keyword, and why is it essential for large-scale Trends research?

An anchor keyword is a stable, high-volume keyword included in every batch query — typically something like “python” or “weather” — that serves as a common reference point. Because every batch includes the same anchor, you can normalize all values relative to it: normalized_value = raw_value / anchor_value. This makes values from different batches comparable. Without an anchor, you cannot meaningfully compare interest in “rag crawler” collected in batch 1 against “reddit scraper” collected in batch 2.

How do you handle Google Trends rate limiting when processing hundreds of keywords?

Add at least 2 seconds between API requests, and 10-15 seconds between batches of 5 keywords. If you receive a 429 or a TooManyRequestsError, back off exponentially starting at 60 seconds. Use the Apify Google Trends Pro actor to avoid implementing this yourself — it handles rate limiting, proxy rotation, and retries internally. Running 200 keywords through the actor typically takes 15-20 minutes versus hours of managing rate limits manually.
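
If you do manage retries yourself, a minimal backoff wrapper might look like the sketch below. `RateLimited` is a hypothetical stand-in for whatever rate-limit error your client raises, and `fetch` is any zero-argument callable that performs one request; neither name comes from a real library.

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for your client's 429 / rate-limit error."""

def with_backoff(fetch, max_retries: int = 4, base_delay: float = 60.0, sleep=time.sleep):
    """Call fetch(); on a rate-limit error wait 60s, 120s, 240s, ... and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter is a design choice that lets you test the retry schedule without actually waiting.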

What does a Google Trends value of zero mean, and how should you handle it?

A zero value means search interest was too low to register in the index during that period — not literally zero searches. Treat zeros as “below detection threshold” rather than “no searches.” For time-series analysis, replace leading/trailing zeros with the first/last non-zero value, and fill interior zeros with linear interpolation between adjacent non-zero values. Avoid computing growth rates across zero values — division by zero produces infinite growth that corrupts trend analysis.
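
One way to keep zeros out of growth calculations is to return no result whenever either side of the comparison is zero. The `growth_rate` helper and the sample series below are illustrative assumptions, not part of any Trends client.

```python
from typing import Optional

def growth_rate(previous: float, current: float) -> Optional[float]:
    """Fractional change between two points, or None if either is below threshold."""
    if previous <= 0 or current <= 0:
        return None  # growth is undefined here, not infinite or -100%
    return (current - previous) / previous

series = [0, 0, 20, 25, 0, 30]
rates = [growth_rate(a, b) for a, b in zip(series, series[1:])]

print(rates)  # [None, None, 0.25, None, None]
```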

How do you structure a reproducible Google Trends research dataset?

Give each dataset a deterministic ID based on the keyword, timeframe, and geography — md5(f"{keyword}:{timeframe}:{geo}"). Store the raw API response alongside the processed values so you can reprocess if your normalization logic changes. Tag every record with the anchor keyword and its value in that batch, so future readers can verify or re-normalize. Store timestamps in UTC and use ISO 8601 format. A well-structured dataset should produce identical results when reprocessed months later.
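
A per-record ID along those lines can be sketched with `hashlib`. The `record_id` name is hypothetical, and trimming and case-folding the inputs is an assumption added here so that trivially different spellings map to the same ID; drop it if you want to treat them as distinct. MD5 serves only as a stable fingerprint, not as a security measure.

```python
import hashlib

def record_id(keyword: str, timeframe: str, geo: str) -> str:
    """Stable ID for one keyword/timeframe/geo combination."""
    # Assumption: normalize whitespace and case before hashing
    key = f"{keyword.strip().lower()}:{timeframe}:{geo.strip().upper()}"
    return hashlib.md5(key.encode('utf-8')).hexdigest()

id_a = record_id('Web Scraping', 'today 5-y', 'us')
id_b = record_id('web scraping ', 'today 5-y', 'US')
print(id_a == id_b)  # True
```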
