How to Scrape Meta Threads Data in 2025 (Without Getting Blocked)
tutorial June 23, 2025 · 6 min read

Meta Threads has no public API for third-party developers. This guide shows the current working approaches for extracting profile data, post content

Try the scraper

The actor referenced in this article is live on Apify. Pay only for results delivered.

Open on Apify →

Meta launched Threads in July 2023 and reached 200 million monthly active users by the end of 2024. Unlike Twitter/X, which offers an API (at significant cost), Threads has no public API for third-party data access. The official API is restricted to content publishing only — you can post, but you cannot read.

TL;DR: Threads has no public read API. All data access requires a valid Instagram session cookie — anonymous requests are blocked. The most robust approach simulates the mobile app using consistent device headers. At any meaningful volume, GraphQL endpoint changes (every few weeks) and session management make a managed scraper the only practical option.

For researchers, marketers, and developers who need Threads data, web scraping is the only path. This post covers the current state of Threads scraping, what blocks you, and what works.

Why Threads Is Hard to Scrape

Threads loads all content through GraphQL APIs that are gated behind authentication. Unlike Reddit (which has a public OAuth path) or standard social networks with public endpoints, Threads requires a logged-in Instagram account to fetch any data.

The technical blockers:

1. Instagram session cookies. Threads uses the same authentication infrastructure as Instagram. To fetch any Threads data, you need a valid Instagram session (sessionid cookie). Anonymous requests get redirected to a login wall.

2. Anti-automation detection. Meta uses device fingerprinting and behavioral analysis across the Instagram/Threads session. Accounts that exhibit automation patterns (consistent timing intervals, high request rates, uniform device signatures) get rate limited or banned.

3. GraphQL endpoint obfuscation. The GraphQL operation names and document hashes change with app updates. Hard-coded endpoint targeting breaks every few weeks.

What Data Is Available

Despite the authentication requirement, once you have a valid session, the data available is comprehensive:

  • Profile data: Username, display name, biography, follower/following count, verification status, profile photo URL
  • Posts: Text content, media attachments (images, videos), post timestamp, like count, reply count, repost count, quote count
  • Replies: Nested reply threads up to several levels deep
  • Following/followers: The accounts a user follows and their followers (limited by Meta’s pagination)
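One way to hold these fields in code is a small record type. The sketch below is illustrative only: the class name `ThreadsPost` and its attribute names are assumptions chosen for readability, not Meta's actual response schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ThreadsPost:
    """One post record. Field names are hypothetical, not Meta's schema."""
    username: str
    text: str
    created_at: datetime
    like_count: int = 0
    reply_count: int = 0
    repost_count: int = 0
    quote_count: int = 0
    media_urls: List[str] = field(default_factory=list)
    replies: List["ThreadsPost"] = field(default_factory=list)

    def engagement(self) -> int:
        """Sum of likes, replies, reposts, and quotes on this post."""
        return (self.like_count + self.reply_count
                + self.repost_count + self.quote_count)

    def reply_depth(self) -> int:
        """Number of nested reply levels below this post (0 if none)."""
        if not self.replies:
            return 0
        return 1 + max(reply.reply_depth() for reply in self.replies)


# Example: a post with one direct reply
when = datetime(2025, 1, 1, tzinfo=timezone.utc)
root = ThreadsPost("alice", "hello", when, like_count=3, reply_count=1)
root.replies.append(ThreadsPost("bob", "hi back", when))
```

Storing replies as a nested list mirrors the reply-thread structure described above, at the cost of recursion when flattening for export.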

Approach 1: Session Extraction from a Real Instagram Account

The manual setup path for low-volume use:

  1. Log into Instagram in Chrome with a dedicated scraper account (not your personal account)
  2. Extract the sessionid and csrftoken cookies from DevTools
  3. Use these in your HTTP requests
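
import json

import requests

INSTAGRAM_SESSION = 'your_sessionid_here'  # sessionid cookie from DevTools
CSRF_TOKEN = 'your_csrftoken_here'         # csrftoken cookie from DevTools

headers = {
    'Cookie': f'sessionid={INSTAGRAM_SESSION}; csrftoken={CSRF_TOKEN}',
    'X-CSRFToken': CSRF_TOKEN,
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)',
    'Referer': 'https://www.threads.net/',
}

# Fetch a user's profile. THREADS_DOC_ID is a placeholder: substitute the
# current doc_id for the profile query, which changes with app updates.
res = requests.get(
    'https://www.threads.net/api/graphql',
    params={
        'doc_id': 'THREADS_DOC_ID',
        # Serialize variables with json.dumps instead of hand-writing JSON
        'variables': json.dumps({'username': 'zuck'}),
    },
    headers=headers,
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
res.raise_for_status()  # surface HTTP errors instead of failing silently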

Limitations: Session cookies expire (typically 90 days). The doc_id values for GraphQL operations change with app updates. This approach requires maintenance.
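Because an expired session and a stale doc_id fail in different ways, it can help to classify each response before parsing it. The helper below is a rough sketch: the function name `classify_response` and the specific status-code heuristics are assumptions, based only on the observation above that unauthenticated requests are redirected to a login wall.

```python
from typing import Optional

REDIRECT_CODES = (301, 302, 303, 307, 308)


def classify_response(status_code: int,
                      location: Optional[str],
                      content_type: Optional[str]) -> str:
    """Label a response as 'ok', 'login_wall', 'rate_limited', or 'unexpected'.

    These rules are assumptions for illustration, not documented behavior.
    """
    # A redirect whose target mentions "login" suggests the session is invalid
    if status_code in REDIRECT_CODES and location and "login" in location.lower():
        return "login_wall"
    # Explicit auth failures are treated the same way
    if status_code in (401, 403):
        return "login_wall"
    # 429 is the standard HTTP status for "too many requests"
    if status_code == 429:
        return "rate_limited"
    # A JSON body with status 200 is the only case worth parsing
    if status_code == 200 and content_type and "json" in content_type.lower():
        return "ok"
    return "unexpected"
```

With requests, this would typically be called with `allow_redirects=False` on the request, then `classify_response(res.status_code, res.headers.get('Location'), res.headers.get('Content-Type'))`, so the redirect itself is visible rather than silently followed.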

Approach 2: Mobile App Session Simulation

More robust than browser cookie extraction. This approach simulates the Threads iOS or Android app using the app’s actual API endpoints and authentication flow.

The Threads app communicates with i.instagram.com and www.threads.net endpoints using a combination of the sessionid from Instagram login and device-specific headers that prove the session came from a mobile device.

Key headers the mobile app sends:

X-IG-App-ID: 238260118697367
User-Agent: Barcelona 289.0.0.77.109 Android
X-Device-ID: {consistent_device_uuid}

Limitations: Meta flags device IDs that make requests at non-human rates. Rotating device IDs too frequently also triggers detection.
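Since rotating device IDs too often is itself a signal, one option is to derive a stable ID per scraper account instead of generating a random one on each run. The sketch below assumes a deterministic UUID is acceptable as a device identifier; the helper names `device_id_for` and `mobile_headers` and the `threads-device:` prefix are hypothetical, while the header names and values are the ones listed above.

```python
import uuid

APP_ID = "238260118697367"                       # X-IG-App-ID value from the text
USER_AGENT = "Barcelona 289.0.0.77.109 Android"  # User-Agent value from the text


def device_id_for(account: str) -> str:
    """Derive a stable UUID from the account name (same input, same ID)."""
    # uuid5 is deterministic: it hashes the namespace and name together
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"threads-device:{account}"))


def mobile_headers(account: str) -> dict:
    """Build the device headers for one account with a consistent device ID."""
    return {
        "X-IG-App-ID": APP_ID,
        "User-Agent": USER_AGENT,
        "X-Device-ID": device_id_for(account),
    }
```

Deriving the ID from the account name means no extra state has to be persisted between runs, though a stored random UUID per account would work equally well.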

Approach 3: Managed Scraper

For any volume beyond a few hundred posts, a managed solution is the practical choice. Session management, account rotation, proxy infrastructure, and GraphQL endpoint tracking are significant ongoing maintenance burdens.

Our Threads Scraper handles this automatically:

from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Start the actor and block until the run finishes
run = client.actor('themineworks/threads-scraper').call(run_input={
    'mode': 'profile',
    'profileUsernames': ['zuck', 'mosseri', 'instagram'],
    'maxPosts': 50,
    'includeReplies': False,
})

# Read the results from the run's default dataset
for post in client.dataset(run['defaultDatasetId']).iterate_items():
    text = post.get('text') or ''  # guard against a missing or empty text field
    print(post['username'], text[:100], post['likeCount'])

Available modes:

  • profile — all posts from a specific user
  • post — a single post and its reply thread
  • search — posts matching a keyword (limited availability)
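Once items are in hand, a small aggregation is often the first processing step. The sketch below totals likes per account; it assumes each item is a dict with the `username` and `likeCount` keys used in the example above, and the function name `likes_by_user` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def likes_by_user(items: Iterable[Mapping]) -> Dict[str, int]:
    """Sum likeCount per username, skipping items that lack a username."""
    totals: Dict[str, int] = defaultdict(int)
    for item in items:
        username = item.get("username")
        if not username:
            continue
        # Treat a missing or null likeCount as zero
        totals[username] += int(item.get("likeCount") or 0)
    return dict(totals)


# Example with hand-written items shaped like the dataset rows
sample = [
    {"username": "zuck", "text": "a", "likeCount": 10},
    {"username": "zuck", "text": "b", "likeCount": 5},
    {"username": "mosseri", "text": "c", "likeCount": None},
]
totals = likes_by_user(sample)
```

The same function accepts the iterator returned by `iterate_items()` directly, since it only needs an iterable of dict-like rows.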

Rate Limits and Ethical Considerations

Meta’s terms of service prohibit automated data collection from Threads and Instagram. Before scraping Threads data, ensure your use case is compliant with applicable law, including EU data protection regulations (GDPR) and US state privacy laws.

From a practical standpoint:

  • Keep request rates below 1 per second per session
  • Use aged accounts with organic posting history, not freshly created accounts
  • Do not collect or store personal data beyond what your use case requires
  • Respect user privacy settings — accounts set to private are private for a reason
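The request-rate guideline above can be enforced with a small throttle that also adds random jitter, so requests do not arrive at perfectly regular intervals. This is a minimal sketch under stated assumptions: the class name `JitteredThrottle` and the default jitter of 1.5 seconds are illustrative choices, not values from Meta.

```python
import random
import time
from typing import Callable, Optional


class JitteredThrottle:
    """Space calls at least min_interval apart, plus a random extra delay."""

    def __init__(self,
                 min_interval: float = 1.0,
                 max_jitter: float = 1.5,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None) -> None:
        self._min_interval = min_interval
        self._max_jitter = max_jitter
        self._clock = clock
        self._sleep = sleep
        self._rng = rng or random.Random()
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            # Target gap = fixed minimum + random jitter in [0, max_jitter]
            target = self._min_interval + self._rng.uniform(0.0, self._max_jitter)
            elapsed = self._clock() - self._last
            delay = max(0.0, target - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Use one throttle per session and call `throttle.wait()` immediately before each request. The clock and sleep functions are injectable so the timing logic can be tested without real delays.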

What You Cannot Get

  • Direct messages
  • Posts from private accounts
  • Real-time streaming (no webhooks or SSE)
  • Complete historical data before 2023
  • Advertiser or business account analytics

Comparison: Threads vs Twitter/X Data Access

Attribute | Threads | Twitter/X
Public API | No (publish only) | Yes ($100/month minimum)
Free tier | None | Very limited
Historical data | Limited | Available (expensive)
Scraping feasibility | Medium | Medium

Both platforms have made programmatic data access deliberately expensive or technically difficult. Threads does not charge because it has no read API to charge for. Twitter/X charges $100/month for 10,000 tweets, a volume comparable to what a managed scraper can deliver without the API overhead.

Frequently Asked Questions

Does Meta Threads have a public API for reading data?

Threads has a publish-only official API for brands and creators — you can post content but cannot read data programmatically. There is no public read API for third-party developers. All data collection from Threads requires scraping via a valid Instagram session.

What data can you extract from Threads scraping?

With a valid Instagram session, you can access profile data (username, follower counts, bio, verification status), post content (text, media attachments, like/reply/repost counts, timestamps), nested reply threads, and follower/following lists. Direct messages, private account data, and real-time streaming are not accessible.

How does Threads prevent automated scraping?

Threads uses three main defenses: Instagram session requirements that block all anonymous requests; Meta’s device fingerprinting and behavioral analysis, which flag automation patterns; and GraphQL operation hashes that change with app updates, typically every few weeks, breaking scrapers that hard-code specific endpoint identifiers.

How often do Threads GraphQL endpoints change?

Meta’s GraphQL operation names and document hashes change with each app update, typically every few weeks. Any scraper hard-coding specific doc IDs or operation names will break. Maintainable scrapers dynamically discover current endpoints from the app’s JavaScript bundle rather than relying on static values.

How does scraping Threads compare to using the Twitter/X API?

Twitter/X offers a public API at $100/month for 10,000 tweets. Threads offers no read API at any price. In practice, both platforms are scraped via browser automation at comparable cost and reliability. Twitter/X’s official API is more stable but expensive; Threads scraping is cheaper but requires ongoing session and endpoint maintenance.
