Reddit's nested comment structure is complex to collect correctly. This guide covers the complete API approach for deep comment trees, deleted comments

Reddit’s comment system is significantly more complex than a flat list. Comments nest arbitrarily deep, popular threads can have tens of thousands of comments, and Reddit’s API truncates deep branches into “load more” placeholders that require separate API calls to expand.

TL;DR: Reddit’s comment API returns at most ~200 top-level comments per request. Deep branches are replaced with “more” placeholder objects containing IDs you must fetch separately via /api/morechildren (up to 100 IDs per call). For a complete thread you need to recursively expand all more-chains while tracking a queue of unresolved IDs, respecting the 100 req/min rate limit.

This guide covers the complete approach to collecting Reddit comment trees correctly.

Reddit’s Comment Structure

A Reddit post has a tree of comments. At the API level:

Top-level comments are direct children of the post
Each comment can have replies — another listing of comments
Deep branches are replaced with a more object containing children (comment IDs to fetch separately)
Reddit’s API returns at most ~200 comments per request at the top level

{
  "kind": "t1",
  "data": {
    "id": "abc123",
    "body": "Comment text",
    "author": "username",
    "score": 42,
    "replies": {
      "kind": "Listing",
      "data": {
        "children": [
          // More comment objects, recursively
          {
            "kind": "more",
            "data": {
              "id": "xyz",
              "children": ["def456", "ghi789"]  // IDs to fetch separately
            }
          }
        ]
      }
    }
  }
}

Fetching a Post and Top-Level Comments

const USER_AGENT = 'android:com.reddit.frontpage:v2024.45.0 (by /u/yourusername)';

async function getPostWithComments(postId, token) {
  const res = await fetch(
    `https://oauth.reddit.com/comments/${postId}?limit=200&sort=top&depth=10`,
    {
      headers: {
        'Authorization': `Bearer ${token}`,
        'User-Agent': USER_AGENT,
      }
    }
  );
  
  const [postListing, commentListing] = await res.json();
  
  const post = postListing.data.children[0].data;
  const comments = commentListing.data.children;
  
  return { post, comments };
}

The depth=10 parameter requests up to 10 levels of nesting. Reddit still truncates large branches, but you get deeper coverage than the default depth of 2.

Expanding “More” Placeholders

When you encounter a more object, fetch those IDs explicitly:

async function expandMoreComments(moreIds, postId, token) {
  // Reddit's morechildren endpoint accepts up to 100 IDs per call
  const batchSize = 100;
  const allComments = [];
  
  for (let i = 0; i < moreIds.length; i += batchSize) {
    const batch = moreIds.slice(i, i + batchSize);
    
    const res = await fetch('https://oauth.reddit.com/api/morechildren', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'User-Agent': USER_AGENT,
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      body: new URLSearchParams({
        link_id: `t3_${postId}`,
        children: batch.join(','),
        sort: 'top',
        api_type: 'json',
      }),
    });
    
    const data = await res.json();
    const comments = data.json.data.things
      .filter(t => t.kind === 't1')
      .map(t => t.data);
    
    allComments.push(...comments);
    
    // Rate limit between batches
    await new Promise(r => setTimeout(r, 600));
  }
  
  return allComments;
}

Building the Full Tree

function flattenCommentTree(comments) {
  const flat = [];
  
  function traverse(items, depth = 0, parentId = null) {
    for (const item of items) {
      if (item.kind === 'more') continue;  // Handle separately
      
      const comment = item.data || item;
      flat.push({
        id: comment.id,
        body: comment.body,
        author: comment.author,
        score: comment.score,
        gilded: comment.gilded,
        created_utc: comment.created_utc,
        depth,
        parent_id: parentId,
        is_deleted: comment.body === '[deleted]' || comment.author === '[deleted]',
      });
      
      if (comment.replies?.data?.children) {
        traverse(comment.replies.data.children, depth + 1, comment.id);
      }
    }
  }
  
  traverse(comments);
  return flat;
}

Collecting All Comments (Full Thread)

For a complete thread with hundreds of “load more” chains:

async def collect_full_thread(post_id: str, token: str, max_iterations: int = 10) -> list[dict]:
    """Collect all comments from a Reddit thread, expanding all 'more' chains."""
    
    all_comments = {}  # id → comment
    more_queue = []
    
    # Initial fetch
    post, initial_comments = await get_post_with_comments(post_id, token)
    
    def process_comment_listing(comments, depth=0, parent_id=None):
        for item in comments:
            if item['kind'] == 'more':
                more_queue.extend(item['data']['children'])
            elif item['kind'] == 't1':
                c = item['data']
                all_comments[c['id']] = {
                    'id': c['id'],
                    'body': c.get('body', ''),
                    'author': c.get('author', ''),
                    'score': c.get('score', 0),
                    'depth': depth,
                    'parent_id': parent_id,
                    'created_utc': c.get('created_utc'),
                }
                if c.get('replies') and c['replies'].get('data'):
                    process_comment_listing(
                        c['replies']['data']['children'],
                        depth + 1,
                        c['id']
                    )
    
    process_comment_listing(initial_comments)
    
    # Expand more chains
    iteration = 0
    while more_queue and iteration < max_iterations:
        batch = more_queue[:100]
        more_queue = more_queue[100:]
        
        expanded = await expand_more_comments(batch, post_id, token)
        process_comment_listing([{'kind': 't1', 'data': c} for c in expanded])
        iteration += 1
    
    return list(all_comments.values())

Using the Managed Scraper

The session management, more chain expansion, and rate limiting in the above code is handled automatically in our Reddit Scraper:

run = client.actor('themineworks/reddit-scraper').call(run_input={
    'mode': 'post',
    'postIds': ['abc123', 'def456'],
    'includeComments': True,
    'maxComments': 500,
    'expandMore': True,  # Expand all "load more" chains
})

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"Post: {item['title']}")
    print(f"Comments: {len(item['comments'])}")
    
    # Comments include depth and parent_id for tree reconstruction
    top_level = [c for c in item['comments'] if c['depth'] == 0]
    print(f"Top-level: {len(top_level)}")

Working with Deleted Comments

Reddit’s API returns deleted content as [deleted] body with [deleted] or [removed] author. “Deleted” means the user removed it; “removed” means a moderator removed it. Both are irretrievably gone from the API.

For research requiring deleted comment recovery, Pushshift (when available) historically cached comments before deletion. Current access is limited — see the current Pushshift status for researchers.

For NLP work, filter deleted comments before any analysis to avoid training on [deleted] tokens.

Frequently Asked Questions

How does Reddit’s comment tree structure work at the API level?

Reddit comments form a recursive tree. Each comment has a replies field containing more comment listings. When a branch exceeds Reddit’s depth limit or a thread has too many comments, the API returns a more object instead of actual comments, containing the IDs of unexpanded comments you must fetch separately via /api/morechildren.

How do you get all comments from a Reddit thread including hidden branches?

First fetch the post with depth=10 to get the initial tree with up to 10 levels of nesting. Collect all more objects encountered, which contain IDs of unexpanded comments. Batch those IDs (up to 100 per request) to /api/morechildren. Process results recursively — expanded branches may contain further more objects until the queue is empty.

What is the rate limit for fetching Reddit comment trees?

The Reddit OAuth API enforces 100 requests per minute per token. Each /api/morechildren call counts against this limit. For a thread with 5,000 comments you may need 50+ morechildren calls — space them at least 600ms apart per Reddit’s terms of service to avoid rate limit errors.

How do you handle deleted and removed comments in Reddit data?

Deleted comments appear as [deleted] in both the body and author fields when the user removed it. Moderator removals show [removed] in the body with [deleted] as author. The content is permanently gone from the API — filter these before any NLP work to avoid training models on the [deleted] token.

What fields does Reddit’s comment API return for each comment?

Each comment object includes: id, body (text content), author, score, gilded, created_utc timestamp, parent_id (the comment or post being replied to), depth in the tree, edited timestamp if modified, is_submitter (whether the author is the original poster), and distinguished (marking moderator or admin comments).

Scraping Reddit Comments and Full Thread Trees in 2025