How to Scrape Meta Threads Data in 2025 (Without Getting Blocked)
Meta Threads has no public API for third-party developers. This guide shows the current working approaches for extracting profile data, post content
The actor referenced in this article is live on Apify. Pay only for results delivered.
Meta launched Threads in July 2023, and the app passed 200 million monthly active users during 2024. Unlike Twitter/X, which offers an API (at significant cost), Threads has no public API for reading third-party data. The official API is restricted to content publishing: you can post, but you cannot read.
TL;DR: Threads has no public read API. All data access requires a valid Instagram session cookie — anonymous requests are blocked. The most robust approach simulates the mobile app using consistent device headers. At any meaningful volume, GraphQL endpoint changes (every few weeks) and session management make a managed scraper the only practical option.
For researchers, marketers, and developers who need Threads data, web scraping is the only path. This post covers the current state of Threads scraping, what blocks you, and what works.
Why Threads Is Hard to Scrape
Threads loads all content through GraphQL APIs that are gated behind authentication. Unlike Reddit (which has a public OAuth path) or standard social networks with public endpoints, Threads requires a logged-in Instagram account to fetch any data.
The technical blockers:
1. Instagram session cookies
Threads uses the same authentication infrastructure as Instagram. To fetch any Threads data, you need a valid Instagram session (sessionid cookie). Anonymous requests get redirected to a login wall.
2. Anti-automation detection
Meta uses device fingerprinting and behavioral analysis across the Instagram/Threads session. Accounts that exhibit automation patterns (consistent timing intervals, high request rates, uniform device signatures) get rate limited or banned.
3. GraphQL endpoint obfuscation
The GraphQL operation names and document hashes change with app updates. Hard-coded endpoint targeting breaks every few weeks.
What Data Is Available
Despite the authentication requirement, once you have a valid session, the data available is comprehensive:
- Profile data: Username, display name, biography, follower/following count, verification status, profile photo URL
- Posts: Text content, media attachments (images, videos), post timestamp, like count, reply count, repost count, quote count
- Replies: Nested reply threads up to several levels deep
- Following/followers: The accounts a user follows and their followers (limited by Meta’s pagination)
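To make the shape of that data concrete, here is a minimal sketch of how the fields listed above could be modeled in Python. The class and attribute names are illustrative assumptions, not the names Threads uses in its responses.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ThreadsPost:
    """One post or reply. Field names are hypothetical, chosen for this sketch."""
    text: str
    timestamp: datetime
    like_count: int = 0
    reply_count: int = 0
    repost_count: int = 0
    quote_count: int = 0
    media_urls: List[str] = field(default_factory=list)
    replies: List["ThreadsPost"] = field(default_factory=list)


@dataclass
class ThreadsProfile:
    """Profile data plus the posts collected for that profile."""
    username: str
    display_name: str
    biography: str = ""
    follower_count: int = 0
    following_count: int = 0
    is_verified: bool = False
    profile_photo_url: Optional[str] = None
    posts: List[ThreadsPost] = field(default_factory=list)


def count_replies(post: ThreadsPost) -> int:
    """Count every nested reply under a post, at any depth."""
    return sum(1 + count_replies(reply) for reply in post.replies)
```

Modeling replies as a list of the same post type keeps nested threads simple to walk recursively, as count_replies does.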
Approach 1: Session Extraction from a Real Instagram Account
The manual setup path for low-volume use:
- Log into Instagram in Chrome with a dedicated scraper account (not your personal account)
- Extract the sessionid and csrftoken cookies from DevTools
- Use these in your HTTP requests
import json

import requests

# Cookie values copied from DevTools for the dedicated scraper account
INSTAGRAM_SESSION = 'your_sessionid_here'
CSRF_TOKEN = 'your_csrftoken_here'

headers = {
    'Cookie': f'sessionid={INSTAGRAM_SESSION}; csrftoken={CSRF_TOKEN}',
    'X-CSRFToken': CSRF_TOKEN,
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)',
    'Referer': 'https://www.threads.net/',
}

# Fetch a user's profile. THREADS_DOC_ID is a placeholder for the current
# doc_id of the profile query, which changes with app updates.
res = requests.get(
    'https://www.threads.net/api/graphql',
    params={'doc_id': 'THREADS_DOC_ID', 'variables': json.dumps({'username': 'zuck'})},
    headers=headers,
    timeout=30,
)
Limitations: Session cookies expire (typically 90 days). The doc_id values for GraphQL operations change with app updates. This approach requires maintenance.
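Because sessions expire and anonymous requests are redirected to a login wall, it helps to classify each response before trusting its body. The sketch below is one possible heuristic; the status codes and the "login" substring check are assumptions made for illustration, not documented Meta behavior.

```python
from typing import Mapping


def classify_response(status_code: int, headers: Mapping[str, str]) -> str:
    """Return 'ok', 'login_required', 'rate_limited', or 'error'.

    The rules are illustrative guesses. Header lookup is case-sensitive
    when a plain dict is passed in.
    """
    location = headers.get("Location", "")
    content_type = headers.get("Content-Type", "")
    # A redirect towards a login page suggests the session is no longer valid.
    if status_code in (301, 302, 303, 307, 308) and "login" in location:
        return "login_required"
    if status_code in (401, 403):
        return "login_required"
    if status_code == 429:
        return "rate_limited"
    # Only treat the body as usable when it is declared as JSON.
    if status_code == 200 and "json" in content_type:
        return "ok"
    return "error"
```

With requests, this could be called as classify_response(res.status_code, res.headers). Redirects are only visible this way if the request was sent with allow_redirects=False.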
Approach 2: Mobile App Session Simulation
More robust than browser cookie extraction. This approach simulates the Threads iOS or Android app using the app’s actual API endpoints and authentication flow.
The Threads app communicates with i.instagram.com and www.threads.net endpoints using a combination of the sessionid from Instagram login and device-specific headers that prove the session came from a mobile device.
Key headers the mobile app sends:
X-IG-App-ID: 238260118697367
User-Agent: Barcelona 289.0.0.77.109 Android
X-Device-ID: {consistent_device_uuid}
Limitations: Meta flags device IDs that make requests at non-human rates. Rotating device IDs too frequently also triggers detection.
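One way to keep the device ID consistent is to derive it deterministically from the account name, so the same account always presents the same device. This is a minimal sketch: the namespace UUID and the helper names are hypothetical, and the header values are the ones listed above.

```python
import uuid

# Hypothetical fixed namespace for this sketch; any constant UUID works.
DEVICE_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def device_id_for(account: str) -> str:
    """Derive a stable device UUID from an account name (same input, same ID)."""
    return str(uuid.uuid5(DEVICE_NAMESPACE, account))


def mobile_headers(account: str) -> dict:
    """Build the mobile-style headers for one account."""
    return {
        "X-IG-App-ID": "238260118697367",
        "User-Agent": "Barcelona 289.0.0.77.109 Android",
        "X-Device-ID": device_id_for(account),
    }
```

Deriving the ID rather than generating a random one on each run avoids accidental rotation when the scraper restarts.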
Approach 3: Managed Scraper
For any volume beyond a few hundred posts, a managed solution is the practical choice. Session management, account rotation, proxy infrastructure, and GraphQL endpoint tracking are significant ongoing maintenance burdens.
Our Threads Scraper handles this automatically:
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

# Start the actor run and wait for it to finish
run = client.actor('themineworks/threads-scraper').call(run_input={
    'mode': 'profile',
    'profileUsernames': ['zuck', 'mosseri', 'instagram'],
    'maxPosts': 50,
    'includeReplies': False,
})

# Print one line per scraped post
for post in client.dataset(run['defaultDatasetId']).iterate_items():
    print(post['username'], post['text'][:100], post['likeCount'])
Available modes:
- profile: all posts from a specific user
- post: a single post and its reply thread
- search: posts matching a keyword (limited availability)
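Once items are in the dataset, a common next step is a per-account summary. The sketch below assumes each item carries the username and likeCount fields used in the snippet above; treating a missing likeCount as zero is an assumption of this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_by_user(items: Iterable[Mapping]) -> Dict[str, Dict[str, float]]:
    """Aggregate post count, total likes, and average likes per username."""
    totals = defaultdict(lambda: {"posts": 0, "likes": 0})
    for item in items:
        user = item.get("username")
        if not user:
            continue  # skip items without a username
        totals[user]["posts"] += 1
        totals[user]["likes"] += item.get("likeCount") or 0
    return {
        user: {**t, "avg_likes": t["likes"] / t["posts"]}
        for user, t in totals.items()
    }
```

It accepts any iterable of dict-like items, so the result of iterate_items() can be passed in directly.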
Rate Limits and Ethical Considerations
Meta’s terms of service prohibit automated data collection from Threads and Instagram. Before scraping Threads data, ensure your use case is compliant with applicable law, including EU data protection regulations (GDPR) and US state privacy laws.
From a practical standpoint:
- Keep request rates below 1 per second per session
- Use aged accounts with organic posting history, not freshly created accounts
- Do not collect or store personal data beyond what your use case requires
- Respect user privacy settings — accounts set to private are private for a reason
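The rate guidance can be enforced in code with a small pacer that spaces requests more than one second apart and adds random jitter so the intervals are not uniform. This is a sketch under stated assumptions: the 1.2 second floor and 2 second jitter ceiling are illustrative values, not thresholds published by Meta.

```python
import random
import time


class JitteredPacer:
    """Space out calls by at least min_interval seconds plus random jitter."""

    def __init__(self, min_interval=1.2, max_jitter=2.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.max_jitter = max_jitter
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the earliest time for the following request.
        start = max(now, self._next_allowed)
        self._next_allowed = start + self.min_interval + random.uniform(0, self.max_jitter)
        return delay
```

Create one pacer per session and call pacer.wait() immediately before each request.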
What You Cannot Get
- Direct messages
- Posts from private accounts
- Real-time streaming (no webhooks or SSE)
- Complete historical data before 2023
- Advertiser or business account analytics
Comparison: Threads vs Twitter/X Data Access
| Attribute | Threads | Twitter/X |
|---|---|---|
| Public API | No (publish only) | Yes ($100/month minimum) |
| Free tier | None | Very limited |
| Historical data | Limited | Available (expensive) |
| Scraping feasibility | Medium | Medium |
Both platforms have made programmatic data access deliberately expensive or technically difficult. Threads does not charge because it has no API to charge for. Twitter/X charges $100/month for 10,000 tweets — comparable to what you can get from a managed scraper without the API overhead.
Frequently Asked Questions
Does Meta Threads have a public API for reading data?
Threads has a publish-only official API for brands and creators — you can post content but cannot read data programmatically. There is no public read API for third-party developers. All data collection from Threads requires scraping via a valid Instagram session.
What data can you extract from Threads scraping?
With a valid Instagram session, you can access profile data (username, follower counts, bio, verification status), post content (text, media attachments, like/reply/repost counts, timestamps), nested reply threads, and follower/following lists. Direct messages, private account data, and real-time streaming are not accessible.
How does Threads prevent automated scraping?
Threads uses three main defenses: Instagram session requirements that block all anonymous requests, Meta’s device fingerprinting and behavioral analysis that flags automation patterns, and GraphQL operation hashes that change with app updates, roughly every few weeks, breaking scrapers that hard-code specific endpoint identifiers.
How often do Threads GraphQL endpoints change?
Meta’s GraphQL operation names and document hashes change with each app update, typically every few weeks. Any scraper hard-coding specific doc IDs or operation names will break. Maintainable scrapers dynamically discover current endpoints from the app’s JavaScript bundle rather than relying on static values.
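As a rough illustration of dynamic discovery, the sketch below scans bundle text for operation name and doc ID pairs with a regular expression. The bundle layout is not documented, so the pattern and the sample string in the usage note are invented for illustration and would need to be adapted to whatever the current bundle actually contains.

```python
import re
from typing import Dict

# Hypothetical pattern: assumes pairs appear as id:"<digits>",name:"<OperationName>".
# The real bundle layout may differ and can change between releases.
DOC_ID_PATTERN = re.compile(r'id:"(?P<doc_id>\d+)",name:"(?P<name>\w+)"')


def extract_doc_ids(bundle_text: str, pattern=DOC_ID_PATTERN) -> Dict[str, str]:
    """Map operation names to doc IDs found in the given bundle text."""
    return {m.group("name"): m.group("doc_id") for m in pattern.finditer(bundle_text)}
```

For example, extract_doc_ids('params:{id:"111",name:"ExampleProfileQuery"}') returns a dict mapping ExampleProfileQuery to "111". Keeping the pattern as a parameter means only the regular expression has to change when the bundle format does.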
How does scraping Threads compare to using the Twitter/X API?
Twitter/X offers a public API at $100/month for 10,000 tweets. Threads offers no read API at any price. In practice, both platforms are scraped via browser automation at comparable cost and reliability. Twitter/X’s official API is more stable but expensive; Threads scraping is cheaper but requires ongoing session and endpoint maintenance.