How to Export Google Trends Data at Scale for Market Research
Exporting Google Trends for dozens or hundreds of keywords while avoiding rate limits and handling the normalization quirks
The actor referenced in this article is live on Apify. Pay only for results delivered.
Google Trends lets you compare up to 5 keywords at a time in the UI. For market research that requires 50, 500, or 5,000 keywords, you need a programmatic approach with careful handling of the normalization quirks that make multi-keyword analysis tricky.
TL;DR: Bulk-exporting Google Trends for hundreds of keywords requires anchor normalization: include a stable high-volume keyword (e.g., “python”) in every batch so results are comparable across queries. Rate-limit between batches (2+ seconds). Store datasets with deterministic IDs, handle zero values with linear interpolation, and export to CSV for spreadsheet analysis.
The Normalization Problem
Google Trends data is always indexed relative to the highest point among the terms in a single query, within your selected timeframe and region. This creates a problem when comparing keywords across separate API calls.
Example: You query keywords A, B, and C together. In that query, A is the highest-traffic keyword with value 100 (peak), B is half that at 50, and C is 10.
If you query B and D in a separate call, B might be 100 in that query (if it’s the highest traffic of the pair), making it look like B and D have similar traffic to A — which is wrong.
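To make the effect concrete, here is a minimal sketch that mimics per-query indexing on made-up numbers. The `raw_volume` figures and the `index_query` helper are hypothetical: Google does not publish absolute volumes, and real Trends indexing also involves sampling and per-period scaling that this toy model ignores.

```python
# Hypothetical absolute search volumes; Google Trends never exposes these.
raw_volume = {"A": 1000, "B": 500, "C": 100, "D": 450}


def index_query(keywords: list[str]) -> dict[str, int]:
    """Rescale the given keywords so the largest one in the query reads 100."""
    peak = max(raw_volume[kw] for kw in keywords)
    return {kw: round(100 * raw_volume[kw] / peak) for kw in keywords}


first_call = index_query(["A", "B", "C"])   # B is indexed against A
second_call = index_query(["B", "D"])       # B is now the peak itself
print(first_call)
print(second_call)
```

B reads 50 in the first call and 100 in the second even though its underlying volume never changed, which is why values from separate calls cannot be compared directly.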
The solution: Anchor keywords
Always include the same 1-2 high-traffic “anchor” keywords in every batch call. This gives you a common reference point across all batches.
# Anchor with well-known stable keyword
ANCHOR_KEYWORD = 'python'  # High volume, stable trend: a good anchor

def batch_keywords_with_anchor(keywords: list[str], anchor: str, batch_size: int = 4) -> list[list[str]]:
    """Split keywords into batches of `batch_size`, prepending the anchor to each."""
    # Remove the anchor from the input so it never appears twice in a batch
    others = [kw for kw in keywords if kw != anchor]
    batches = []
    for i in range(0, len(others), batch_size):
        # anchor + 4 keywords = 5 terms, the comparison limit mentioned above
        batches.append([anchor] + others[i:i + batch_size])
    return batches
Then use the anchor’s value to normalize across batches:
def normalize_across_batches(batch_results: list[dict], anchor: str) -> dict:
    """Normalize all keywords to a common scale using anchor values."""
    # Collect the anchor's average value per batch
    anchor_values = {}
    for result in batch_results:
        points = result['interest_over_time']
        if result['keyword'] == anchor and points:  # skip empty series to avoid dividing by zero
            anchor_values[result['batch_id']] = sum(p['value'] for p in points) / len(points)

    # Overall anchor average across batches
    overall_anchor_avg = sum(anchor_values.values()) / len(anchor_values) if anchor_values else 1

    # Scale all other keywords relative to their batch's anchor
    normalized = {}
    for result in batch_results:
        if result['keyword'] == anchor:
            continue
        batch_anchor_avg = anchor_values.get(result['batch_id'], overall_anchor_avg)
        scale_factor = overall_anchor_avg / max(batch_anchor_avg, 0.1)
        normalized[result['keyword']] = [
            {'date': p['date'], 'value': p['value'] * scale_factor}
            for p in result['interest_over_time']
        ]
    return normalized
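The scale-factor arithmetic in `normalize_across_batches` is easier to follow on a tiny example. The sketch below repeats the same calculation on synthetic per-batch averages; the numbers and the `batches` structure are illustrative assumptions, not real Trends output.

```python
# Synthetic per-batch averages (illustrative numbers, not real Trends output).
batches = {
    0: {"python": 80.0, "web scraping": 20.0},
    1: {"python": 40.0, "data extraction": 20.0},
}
ANCHOR = "python"

# Average anchor level across all batches: (80 + 40) / 2 = 60
overall_anchor = sum(b[ANCHOR] for b in batches.values()) / len(batches)

rescaled = {}
for batch_id, averages in batches.items():
    # Batches where the anchor read low get scaled up, and vice versa
    scale = overall_anchor / averages[ANCHOR]
    for keyword, value in averages.items():
        if keyword != ANCHOR:
            rescaled[keyword] = value * scale

print(rescaled)
```

Both keywords read 20 in their own batch, but "data extraction" scored 20 against an anchor of 40 while "web scraping" scored 20 against an anchor of 80, so after rescaling the former comes out twice as high.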
Large-Scale Collection
For hundreds of keywords, rate limiting is the main challenge. Google Trends blocks bursts of requests from the same session.
from apify_client import ApifyClient
import time

client = ApifyClient('YOUR_API_TOKEN')

def collect_at_scale(keywords: list[str], timeframe: str, geo: str, delay_sec: float = 2.0) -> list[dict]:
    """Collect Trends data for many keywords with rate limit management."""
    ANCHOR = 'python'  # Or choose a domain-appropriate anchor
    batches = batch_keywords_with_anchor(keywords, ANCHOR, batch_size=4)
    all_results = []
    for batch_idx, batch in enumerate(batches):
        print(f"Batch {batch_idx + 1}/{len(batches)}: {batch}")
        run = client.actor('themineworks/google-trends-pro').call(run_input={
            'keywords': batch,
            'timeframe': timeframe,
            'geo': geo,
            'includeRelatedQueries': False,  # Skip to reduce API calls
        })
        for item in client.dataset(run['defaultDatasetId']).iterate_items():
            item['batch_id'] = batch_idx
            all_results.append(item)
        # Rate limit between batches
        if batch_idx < len(batches) - 1:
            time.sleep(delay_sec)
    return all_results
Building a Reproducible Research Dataset
For market research that needs to be reproducible (comparison over time, sharing with colleagues), structure your data for versioned storage:
import json
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def save_trends_dataset(results: list[dict], metadata: dict, output_dir: str = './trends-datasets'):
    """Save a fully reproducible trends dataset with metadata."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Hash the query parameters so identical queries map to the same suffix
    query_hash = hashlib.md5(
        json.dumps(metadata, sort_keys=True).encode()
    ).hexdigest()[:8]

    # Take one timezone-aware UTC timestamp and reuse it (utcnow() is deprecated since Python 3.12)
    now = datetime.now(timezone.utc)
    dataset_id = f"{now.strftime('%Y%m%d')}_{query_hash}"

    dataset = {
        'id': dataset_id,
        'metadata': {
            **metadata,
            'collected_at': now.isoformat(),
            'record_count': len(results),
        },
        'data': results,
    }
    filepath = Path(output_dir) / f"{dataset_id}.json"
    with open(filepath, 'w') as f:
        json.dump(dataset, f, indent=2)
    print(f"Saved dataset {dataset_id}: {len(results)} keyword records to {filepath}")
    return filepath
# Example usage
keywords_to_research = ['web scraping', 'data extraction', 'apify', 'puppeteer', 'playwright']
results = collect_at_scale(keywords_to_research, timeframe='today 5-y', geo='US')
save_trends_dataset(results, {
    'query_type': 'competitive_analysis',
    'keywords': keywords_to_research,
    'timeframe': 'today 5-y',
    'geo': 'US',
    'anchor': 'python',
    'analyst': 'your-name',
})
Handling Missing Data
Google Trends returns value 0 for weeks/days with insufficient data. Distinguish between “zero searches” and “no data”:
def fill_missing_values(iot: list[dict], method: str = 'interpolate') -> list[dict]:
    """Handle zero values in interest over time without mutating the input."""
    if method == 'interpolate':
        filled = [dict(p) for p in iot]  # copy each point so the caller's data is untouched
        # Indices of points with a real (non-zero) reading in the original series
        known = [i for i, p in enumerate(iot) if p['value'] > 0]
        for i, point in enumerate(filled):
            if point['value'] != 0 or not known:
                continue
            prev = max((k for k in known if k < i), default=None)
            nxt = min((k for k in known if k > i), default=None)
            if prev is not None and nxt is not None:
                # Linear interpolation, weighted by distance to each neighbour
                frac = (i - prev) / (nxt - prev)
                point['value'] = iot[prev]['value'] + frac * (iot[nxt]['value'] - iot[prev]['value'])
            elif prev is not None:
                point['value'] = iot[prev]['value']  # trailing zeros: carry the last value forward
            else:
                point['value'] = iot[nxt]['value']  # leading zeros: back-fill the first value
        return filled
    elif method == 'mark_missing':
        # Preserve zeros but add flag
        return [{**p, 'is_missing': p['value'] == 0} for p in iot]
    return iot
Export Formats
import csv

# CSV export for spreadsheet analysis
def export_csv(normalized_data: dict, filepath: str):
    # Get all dates from first keyword
    first = list(normalized_data.values())[0]
    dates = [p['date'] for p in first]
    with open(filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['date'] + list(normalized_data.keys()))
        for i, date in enumerate(dates):
            row = [date] + [
                round(normalized_data[kw][i]['value'], 1)
                for kw in normalized_data
            ]
            writer.writerow(row)
    print(f"Exported to {filepath}")
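`export_csv` takes its dates from the first keyword and indexes every other series by position, so it assumes all keywords share exactly the same dates. If batches can come back with different date ranges, one possible variant aligns rows by date instead. This is a sketch under assumptions: `export_csv_aligned` is a hypothetical name, it writes to any file-like object, and it relies on dates being ISO-formatted strings so that sorting them as text is chronological.

```python
import csv
import io


def export_csv_aligned(normalized_data: dict, out) -> None:
    """Write one row per date seen in any series; missing points become empty cells."""
    # Index each keyword's points by date for direct lookups
    by_keyword = {
        kw: {p["date"]: p["value"] for p in points}
        for kw, points in normalized_data.items()
    }
    # Union of all dates; ISO strings sort chronologically
    all_dates = sorted({d for series in by_keyword.values() for d in series})
    writer = csv.writer(out)
    writer.writerow(["date"] + list(by_keyword))
    for date in all_dates:
        writer.writerow(
            [date] + [
                round(series[date], 1) if date in series else ""
                for series in by_keyword.values()
            ]
        )


sample = {
    "apify": [{"date": "2024-01-07", "value": 12.34}, {"date": "2024-01-14", "value": 15.0}],
    "playwright": [{"date": "2024-01-14", "value": 40.26}],
}
buffer = io.StringIO()
export_csv_aligned(sample, buffer)
print(buffer.getvalue())
```

Leaving gaps as empty cells keeps "no data for this date" visibly distinct from a genuine low value in the spreadsheet.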
The complete workflow — anchor-normalized batching, rate limit management, reproducible datasets, and clean CSV export — gives you research-grade Google Trends data at any scale.
Frequently Asked Questions
Why can you not directly compare Google Trends values from separate API calls?
Google Trends indexes values relative to the peak within the specific query — a value of 100 means “highest in this batch,” not an absolute level. If you query “python” alone and get 100, and query “javascript” alone and get 100, they appear equal. But if you include both in the same query, you might see python at 85 and javascript at 100, revealing their actual relative popularity. Each separate API call creates its own independent 0-100 scale.
What is an anchor keyword, and why is it essential for large-scale Trends research?
An anchor keyword is a stable, high-volume keyword included in every batch query (typically something like “python” or “weather”) that serves as a common reference point. Because every batch includes the same anchor, you can rescale each batch against it: normalized_value = raw_value × (overall_anchor_avg / batch_anchor_avg), where batch_anchor_avg is the anchor's average in that batch and overall_anchor_avg is its average across all batches. This makes values from different batches comparable. Without an anchor, you cannot meaningfully compare interest in “rag crawler” collected in batch 1 against “reddit scraper” collected in batch 2.
How do you handle Google Trends rate limiting when processing hundreds of keywords?
Add at least 2 seconds between API requests, and 10-15 seconds between batches of 5 keywords. If you receive a 429 or a TooManyRequestsError, back off exponentially starting at 60 seconds. Use the Apify Google Trends Pro actor to avoid implementing this yourself — it handles rate limiting, proxy rotation, and retries internally. Running 200 keywords through the actor typically takes 15-20 minutes versus hours of managing rate limits manually.
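If you do manage rate limits yourself, the exponential backoff described above might look like the following sketch. `RateLimited` is a hypothetical stand-in for whatever error your HTTP client or Trends library raises on a 429, and `call_with_backoff` and its parameters are illustrative names, not part of any library. The `sleep` argument is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the rate-limit error your client raises."""


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `fetch` on RateLimited, doubling the wait each time (60s, 120s, 240s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demo: fail twice, then succeed, recording waits instead of sleeping.
waits: list[float] = []
outcomes = iter([RateLimited(), RateLimited(), "ok"])


def flaky_fetch() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_backoff(flaky_fetch, sleep=waits.append)
print(result, waits)
```

In real use you would drop the `sleep` override and wrap the per-batch request in `fetch`; adding random jitter to each wait is a common refinement that this sketch leaves out.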
What does a Google Trends value of zero mean, and how should you handle it?
A zero value means search interest was too low to register in the index during that period — not literally zero searches. Treat zeros as “below detection threshold” rather than “no searches.” For time-series analysis, replace leading/trailing zeros with the first/last non-zero value, and fill interior zeros with linear interpolation between adjacent non-zero values. Avoid computing growth rates across zero values — division by zero produces infinite growth that corrupts trend analysis.
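One way to keep zeros out of growth calculations is to return an explicit "undefined" marker instead of a number. In this sketch, `period_growth` is a hypothetical helper, and treating a zero on either side of the comparison as undefined is an assumption made here; you may prefer to interpolate first and then compute growth.

```python
from typing import Optional


def period_growth(values: list[float]) -> list[Optional[float]]:
    """Period-over-period growth; None wherever either side is a zero reading."""
    growth: list[Optional[float]] = []
    for prev, curr in zip(values, values[1:]):
        if prev == 0 or curr == 0:
            growth.append(None)  # undefined, rather than infinite or a misleading -100%
        else:
            growth.append((curr - prev) / prev)
    return growth


print(period_growth([10, 0, 20, 30]))
```

Only the last step (20 to 30) yields a number here, a growth of 0.5; the two steps touching the zero are reported as `None`.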
How do you structure a reproducible Google Trends research dataset?
Give each dataset a deterministic ID based on the keyword, timeframe, and geography — md5(f"{keyword}:{timeframe}:{geo}"). Store the raw API response alongside the processed values so you can reprocess if your normalization logic changes. Tag every record with the anchor keyword and its value in that batch, so future readers can verify or re-normalize. Store timestamps in UTC and use ISO 8601 format. A well-structured dataset should produce identical results when reprocessed months later.
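The `md5(...)` expression above is shorthand; in Python, `hashlib.md5` takes bytes, so the string has to be encoded first. A minimal sketch, where the `dataset_id` function name and the 12-character truncation are assumptions for illustration:

```python
import hashlib


def dataset_id(keyword: str, timeframe: str, geo: str) -> str:
    """Short, stable ID derived only from the query parameters."""
    key = f"{keyword}:{timeframe}:{geo}"
    # hashlib works on bytes, so encode the string before hashing
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:12]


first = dataset_id("web scraping", "today 5-y", "US")
second = dataset_id("web scraping", "today 5-y", "US")
print(first, first == second)
```

Because the ID depends only on the three parameters, re-running the same query months later maps to the same ID, and changing any one parameter produces a different one. MD5 is used here purely as a fingerprint, not for security.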