Build an India Job Market Intelligence Tool with Claude and the Naukri Scraper
tutorial September 29, 2025 · 10 min read

Use Apify's Naukri Jobs scraper with Claude to automate salary benchmarking, skills demand analysis, and hiring trend tracking for the Indian tech market.

Try the scraper

The actor referenced in this article is live on Apify. Pay only for results delivered.

Open on Apify →

India’s tech job market generates an enormous amount of data, almost all of it locked inside Naukri.com, the country’s dominant job board with 70+ million registered job seekers. For HR tech companies, compensation consultancies, staffing firms, and investors tracking the Indian tech sector, this data is valuable.

TL;DR: Build an India job market intelligence tool by combining the Naukri Scraper (handles Akamai bypass and session warming) with Claude for 4 patterns: salary benchmarking by experience band with LPA statistics, skills demand heatmap extracted from job descriptions, company hiring velocity tracking by month, and remote/hybrid work distribution analysis. Claude turns raw aggregates into structured, actionable reports automatically.

The challenge is extraction. Naukri has no public API, runs Akamai bot detection, and requires session warming before any data can be fetched. The Naukri Jobs scraper handles all of that. This guide shows you what to build with the data once you have it — using Claude as the analysis layer.

Core Setup

pip install apify-client anthropic pandas python-dotenv schedule

from apify_client import ApifyClient
import anthropic
import json
import os
from collections import defaultdict

apify = ApifyClient(os.environ["APIFY_TOKEN"])
claude = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def fetch_naukri_jobs(
    keyword: str,
    location: str = "",
    experience_min: int = 0,
    experience_max: int = 20,
    max_jobs: int = 100,
    include_descriptions: bool = False,
) -> list[dict]:
    run = apify.actor("themineworks/naukri-jobs").call(run_input={
        "searchKeywords": keyword,
        "location": location,
        "experienceMin": experience_min,
        "experienceMax": experience_max,
        "maxJobs": max_jobs,
        "includeJobDescription": include_descriptions,
    })
    return list(apify.dataset(run["defaultDatasetId"]).iterate_items())

Pattern 1: Salary Benchmarking for Specific Roles

Naukri discloses salary ranges on roughly 40% of postings — enough to build reliable benchmarks. This pattern aggregates salary data by experience band and generates a compensation report.

def build_salary_benchmark(role: str, location: str = "") -> str:
    """Generate a salary benchmark report for a specific role."""
    
    # Fetch across experience levels
    all_jobs = []
    for exp_range in [(0, 2), (2, 5), (5, 8), (8, 12), (12, 20)]:
        jobs = fetch_naukri_jobs(
            keyword=role,
            location=location,
            experience_min=exp_range[0],
            experience_max=exp_range[1],
            max_jobs=50,
        )
        for job in jobs:
            job["_exp_range"] = f"{exp_range[0]}-{exp_range[1]} years"
        all_jobs.extend(jobs)
    
    # Extract salary data
    salary_data = defaultdict(list)
    for job in all_jobs:
        salary_min = job.get("salaryMin")
        salary_max = job.get("salaryMax")
        exp_range = job.get("_exp_range", "unknown")
        
        if salary_min and salary_max:
            salary_data[exp_range].append({
                "min": salary_min,
                "max": salary_max,
                "midpoint": (salary_min + salary_max) / 2,
                "company": job.get("company"),
                "title": job.get("title"),
                "location": job.get("location"),
            })
    
    # Compute stats per band. Percentiles are approximated by indexing into the
    # sorted midpoints; values are assumed to be in rupees (1 lakh = 100,000).
    stats = {}
    for exp_range, salaries in salary_data.items():
        if salaries:
            midpoints = sorted(s["midpoint"] for s in salaries)
            n = len(midpoints)
            stats[exp_range] = {
                "sample_size": n,
                "median_midpoint_lpa": round(midpoints[n // 2] / 100000, 1),
                "p25_lpa": round(midpoints[n // 4] / 100000, 1),
                "p75_lpa": round(midpoints[3 * n // 4] / 100000, 1),
                "min_seen_lpa": round(min(s["min"] for s in salaries) / 100000, 1),
                "max_seen_lpa": round(max(s["max"] for s in salaries) / 100000, 1),
            }
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"""Write a professional salary benchmark report for {role} roles in India{' in ' + location if location else ''}.

SALARY DATA BY EXPERIENCE BAND (amounts in LPA - Lakhs Per Annum):
{json.dumps(stats, indent=2)}

TOTAL JOBS WITH SALARY DISCLOSED: {sum(v['sample_size'] for v in stats.values())}

Write a clear, actionable report covering:
1. Salary ranges by experience level (0-2, 2-5, 5-8, 8-12, 12+ years)
2. Which experience transitions show the biggest compensation jumps
3. How these compare to market expectations (use your knowledge of India tech compensation)
4. Advice for a candidate negotiating salary at each level
5. Caveats (disclosed salary bias, location variance, company size differences)

Format with clear sections. Use LPA as the unit throughout."""
        }]
    )
    
    return response.content[0].text


# Example usage
report = build_salary_benchmark("python developer", location="Bangalore")
print(report)
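The index-based percentiles in the function above are a quick approximation. As a minimal sketch, the same band statistics can be computed with the standard library `statistics` module, which interpolates between data points. `band_stats` is a hypothetical helper name, and the input is assumed to be salary midpoints in rupees, as in the function above.

```python
import statistics


def band_stats(midpoints: list[float]) -> dict:
    """Summarize salary midpoints (in rupees) as LPA statistics."""
    if not midpoints:
        return {"sample_size": 0}

    # 1 lakh = 100,000 rupees
    lpa = sorted(m / 100_000 for m in midpoints)
    summary = {
        "sample_size": len(lpa),
        "median_lpa": round(statistics.median(lpa), 1),
    }

    # statistics.quantiles() needs at least two data points
    if len(lpa) >= 2:
        q1, _, q3 = statistics.quantiles(lpa, n=4, method="inclusive")
        summary["p25_lpa"] = round(q1, 1)
        summary["p75_lpa"] = round(q3, 1)
    return summary
```

With very small samples per band, the quartiles are only indicative, so it is worth passing `sample_size` to Claude alongside them.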

Pattern 2: Skills Demand Heatmap

Which skills do companies actually require, and which are just nice-to-have? Job descriptions encode this implicitly. Claude can extract the signal from a batch of descriptions.

def skills_demand_analysis(role: str, max_jobs: int = 100) -> dict:
    """Extract skills demand from job descriptions."""
    
    jobs = fetch_naukri_jobs(
        keyword=role,
        max_jobs=max_jobs,
        include_descriptions=True,
    )
    
    # Filter to jobs that have descriptions
    jobs_with_desc = [j for j in jobs if j.get("description") and len(j.get("description", "")) > 100]
    
    all_skills = defaultdict(int)
    
    # Process in batches of 10
    batch_size = 10
    for i in range(0, len(jobs_with_desc), batch_size):
        batch = jobs_with_desc[i:i+batch_size]
        text = "\n\n---\n\n".join([
            # .get() avoids a KeyError when a listing has no title
            f"Title: {j.get('title')}\nCompany: {j.get('company')}\n{j['description'][:600]}"
            for j in batch
        ])
        
        response = claude.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=600,
            messages=[{
                "role": "user",
                "content": f"""Extract all technical requirements from these {role} job descriptions.

{text}

Return a JSON object where keys are skills/technologies and values are the count of descriptions that mention them. Include:
- Programming languages
- Frameworks and libraries
- Databases
- Cloud platforms (AWS/GCP/Azure)
- Tools and methodologies
- Domain knowledge (e.g. "microservices", "ML/AI", "data pipelines")

Return only valid JSON. Normalize skill names (e.g., "postgres" -> "PostgreSQL")."""
            }]
        )
        
        try:
            batch_skills = json.loads(response.content[0].text)
            for skill, count in batch_skills.items():
                all_skills[skill] += count
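        except (json.JSONDecodeError, AttributeError, TypeError):
            # Skip replies that are not a JSON object of counts, but make the loss visible
            print(f"Skipping batch starting at job {i}: reply was not a valid JSON object of counts")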
    
    return dict(sorted(all_skills.items(), key=lambda x: x[1], reverse=True))


def generate_skills_report(role: str, location: str = "") -> str:
    # Note: location only labels the report; skills_demand_analysis() does not filter by it
    skills = skills_demand_analysis(role)
    top_skills = dict(list(skills.items())[:30])
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": f"""Analyze this skills demand data for {role} roles in India{' (' + location + ')' if location else ''}.

TOP SKILLS BY FREQUENCY:
{json.dumps(top_skills, indent=2)}

Write a skills demand analysis covering:
1. Must-have skills (table stakes for any job application)
2. High-value differentiators (appear in 30-60% of postings)
3. Emerging skills worth learning now
4. Surprising findings or India-specific patterns
5. A 6-month learning roadmap for someone entering or leveling up in this role

Keep it practical. Developers reading this want to know what to focus on."""
        }]
    )
    
    return response.content[0].text
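Model replies sometimes wrap JSON in a code fence or add a sentence around it, in which case a bare `json.loads` call fails and the batch is lost. The following is a minimal sketch of a more tolerant parser; `parse_skill_counts` is a hypothetical helper, not part of the Anthropic SDK, and it assumes the reply contains a single JSON object.

```python
import json


def parse_skill_counts(raw: str) -> dict[str, int]:
    """Parse a model reply into {skill: count}, tolerating fences and stray prose."""
    # Take the outermost {...} span, which skips code fences and surrounding text
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only numeric counts; drop values the model formatted as text
    return {
        str(skill): int(count)
        for skill, count in data.items()
        if isinstance(count, (int, float)) and not isinstance(count, bool)
    }
```

Inside the batch loop, `parse_skill_counts(response.content[0].text)` could replace the direct `json.loads` call, with an empty result treated as a failed batch.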

Pattern 3: Company Hiring Velocity Tracker

Track how many roles a specific company is actively recruiting for each month. A sudden spike indicates expansion; a drop may indicate a hiring freeze. For investors and job seekers alike, this is a meaningful leading indicator.

from datetime import datetime, timedelta

def track_company_hiring_velocity(
    company_names: list[str],
    role_category: str,
    months_back: int = 3,
) -> str:
    """Track and compare hiring velocity across companies."""
    
    company_data = {}
    
    for company in company_names:
        # Search by company name + role category
        jobs = fetch_naukri_jobs(
            keyword=f"{role_category} {company}",
            max_jobs=100,
        )
        
        # Filter to jobs actually at this company
        company_jobs = [
            j for j in jobs
            if company.lower() in j.get("company", "").lower()
        ]
        
        # Group by posting date
        monthly_counts = defaultdict(int)
        for job in company_jobs:
            posted_date = job.get("postedDate", "")
            if posted_date:
                try:
                    dt = datetime.fromisoformat(posted_date.replace("Z", "+00:00"))
                    month_key = dt.strftime("%Y-%m")
                    monthly_counts[month_key] += 1
                except ValueError:
                    pass
        
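        company_data[company] = {
            "total_active_listings": len(company_jobs),
            "monthly_posting_counts": dict(sorted(monthly_counts.items())),
            # Rough average: assumes every matched listing was posted within
            # the last `months_back` months, since no date cutoff is applied
            "avg_monthly": round(len(company_jobs) / max(months_back, 1), 1),
            "sample_titles": [j.get("title") for j in company_jobs[:5]],
        }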
    
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1200,
        messages=[{
            "role": "user",
            "content": f"""Analyze hiring velocity data for these companies in the {role_category} space in India.

COMPANY HIRING DATA:
{json.dumps(company_data, indent=2)}

Write a hiring velocity analysis:
1. Which companies are growing fastest (by active listing volume)
2. What the role types reveal about each company's strategy
3. Any anomalies (sudden spikes or drops)
4. Interpretation for a job seeker: which company is most likely to close roles fastest vs. longest hiring cycle
5. Investment/market signal interpretation: what does relative hiring velocity say about company health

Be specific with numbers. This analysis is for decision-making, not general commentary."""
        }]
    )
    
    return response.content[0].text
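In the function above, `months_back` only divides the listing total; no date window is applied. A minimal sketch of restricting the monthly counts to the last few calendar months follows. It assumes, as the function above does, that `postedDate` is an ISO 8601 string; `month_keys_back` and `monthly_counts_in_window` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone


def month_keys_back(now: datetime, months_back: int) -> list[str]:
    """Return 'YYYY-MM' keys for the current month and the ones before it, oldest first."""
    keys = []
    year, month = now.year, now.month
    for _ in range(months_back):
        keys.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return keys[::-1]


def monthly_counts_in_window(posted_dates, months_back=3, now=None) -> dict[str, int]:
    """Count postings per month, keeping only months inside the window."""
    now = now or datetime.now(timezone.utc)
    counts = Counter()
    for raw in posted_dates:
        try:
            dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            continue  # skip missing or non-ISO dates
        counts[dt.strftime("%Y-%m")] += 1
    # Months with no postings are reported as 0 rather than omitted
    return {key: counts.get(key, 0) for key in month_keys_back(now, months_back)}
```

Reporting empty months as zero matters here: a month with no postings is itself a signal, and omitting it would hide a drop.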

Pattern 4: Remote Work Trend Tracker

The shift from WFH to hybrid to return-to-office has been uneven across India’s tech sector. This pattern quantifies where each company stands and how trends are shifting.

def remote_work_analysis(role: str, companies: list[str] = None) -> dict:
    """Analyze remote/hybrid/office distribution across a role category."""
    
    jobs = fetch_naukri_jobs(keyword=role, max_jobs=200)
    
    if companies:
        jobs = [j for j in jobs if any(
            c.lower() in j.get("company", "").lower() for c in companies
        )]
    
    # Classify work mode
    work_mode_counts = {"remote": 0, "hybrid": 0, "office": 0, "unspecified": 0}
    company_work_modes = defaultdict(lambda: defaultdict(int))
    
    for job in jobs:
        # `or` guards against fields that are present but null in the dataset
        work_type = (job.get("workType") or "").lower()
        location = (job.get("location") or "").lower()
        company = job.get("company") or "Unknown"
        
        if "work from home" in work_type or "remote" in work_type or "wfh" in location:
            mode = "remote"
        elif "hybrid" in work_type or "hybrid" in location:
            mode = "hybrid"
        elif work_type or location:
            mode = "office"
        else:
            mode = "unspecified"
        
        work_mode_counts[mode] += 1
        company_work_modes[company][mode] += 1
    
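    return {
        "overall_distribution": work_mode_counts,
        "total_jobs_analyzed": len(jobs),
        "by_company": {
            company: dict(modes)
            # Keep the 20 companies with the most postings, not the first 20 seen
            for company, modes in sorted(
                company_work_modes.items(),
                key=lambda item: sum(item[1].values()),
                reverse=True,
            )[:20]
        }
    }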
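The function above returns raw counts, while work-mode distributions are usually compared as percentage shares. A minimal sketch of the conversion follows; `to_percentages` is a hypothetical helper, and excluding the "unspecified" bucket from the denominator is an assumption about how the shares should be read.

```python
def to_percentages(
    counts: dict[str, int],
    exclude: tuple[str, ...] = ("unspecified",),
) -> dict[str, float]:
    """Convert work-mode counts into percentage shares, ignoring excluded modes."""
    kept = {mode: n for mode, n in counts.items() if mode not in exclude}
    total = sum(kept.values())
    if total == 0:
        return {mode: 0.0 for mode in kept}
    return {mode: round(100 * n / total, 1) for mode, n in kept.items()}
```

For example, `to_percentages(remote_work_analysis("data engineer")["overall_distribution"])` would give shares that can be compared week over week even when the number of fetched jobs changes.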

Scheduling and Automation

Run weekly and store outputs for trend tracking:

import time

import schedule

def weekly_india_market_report():
    roles_to_track = ["python developer", "data engineer", "ml engineer", "devops engineer"]
    
    report_sections = []
    for role in roles_to_track:
        salary_section = build_salary_benchmark(role, location="Bangalore")
        skills_section = generate_skills_report(role)
        report_sections.append(f"## {role.title()}\n\n### Salary\n{salary_section}\n\n### Skills\n{skills_section}")
    
    full_report = "# India Tech Market Weekly Report\n\n" + "\n\n---\n\n".join(report_sections)
    
    # One dated file per run, so earlier reports are kept for comparison
    filename = f"india_market_report_{datetime.now().strftime('%Y_%m_%d')}.md"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(full_report)
    
    print(f"India market report written to {filename}.")

schedule.every().monday.at("07:00").do(weekly_india_market_report)

# schedule only registers the job; this loop is what actually runs it
while True:
    schedule.run_pending()
    time.sleep(60)

Why This Data Matters

India’s tech talent market is one of the fastest-moving in the world, but compensation data is opaque and skills demand shifts quickly. A data pipeline that tracks both weekly gives HR teams, founders, and developers a structural advantage — knowing what the market pays before sitting down to negotiate, and knowing which skills are becoming table stakes before they show up in every job description.

The Naukri scraper makes the data collection tractable. Claude makes the interpretation fast. The combination turns what would be a days-long research project into a scheduled script.

Frequently Asked Questions

How do you benchmark salaries for tech roles in India using Naukri job data?

Collect 200-500 postings per role category using the Naukri scraper with includeJobDescription: true. About 40% of postings disclose salary. Where the output has numeric salaryMin and salaryMax fields, use them directly as in Pattern 1; where only a display string is available, parse the “X-Y Lacs PA” format with a regex. Compute the median, P25, and P75 of the range midpoints by experience band (0-2, 2-5, 5-8, 8-12, 12+ years). Pass the statistics to Claude Sonnet with a prompt requesting a narrative compensation report covering salary bands, transition jump sizes, location variances, and negotiation advice. The result is comparable to what compensation consultancies sell for thousands of dollars.

How does skills demand analysis from Naukri job descriptions work?

Fetch job descriptions with includeJobDescription: true and process in batches of 10 through Claude Haiku. Ask Claude to return a JSON object mapping skill names to the count of descriptions mentioning them, with normalized names (“postgres” → “PostgreSQL”). Aggregate counts across all batches and sort by frequency. Skills appearing in 60%+ of descriptions are table stakes; 30-60% are differentiators; under 15% are emerging. Run this monthly to detect which skills are crossing adoption thresholds.
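The thresholds above apply to shares of descriptions, so the aggregated counts need to be divided by the number of descriptions analyzed. A minimal sketch of that tiering follows; `tier_skills` is a hypothetical helper, and the "unclassified" label for the 15-30% band is an assumption, since the text assigns no label to it.

```python
def tier_skills(skill_counts: dict[str, int], total_descriptions: int) -> dict[str, list[str]]:
    """Bucket skills by the share of descriptions that mention them."""
    tiers = {"table_stakes": [], "differentiator": [], "emerging": [], "unclassified": []}
    if total_descriptions <= 0:
        return tiers

    # Most frequent skills first, so each tier list is already ranked
    ranked = sorted(skill_counts.items(), key=lambda kv: kv[1], reverse=True)
    for skill, count in ranked:
        share = count / total_descriptions
        if share >= 0.60:
            tier = "table_stakes"
        elif share >= 0.30:
            tier = "differentiator"
        elif share < 0.15:
            tier = "emerging"
        else:
            tier = "unclassified"  # 15-30%: no label given in the text
        tiers[tier].append(skill)
    return tiers
```

Here `total_descriptions` would be the number of jobs that passed the description filter, not the number of jobs fetched.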

How can you track hiring velocity as a signal for company health or market trends?

Search by company name using the Naukri scraper and group results by posting date. Count new postings per month per company. A company going from 5 postings/month to 25 over two months is in active expansion; a drop to zero signals a hiring freeze. For market-level analysis, aggregate velocity across all companies in a category — a sector-wide hiring surge precedes revenue growth by 3-6 months, making it a leading economic indicator for India’s tech sector.
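A minimal sketch of turning monthly counts into a coarse label follows. The doubling threshold and the label names are illustrative assumptions, not calibrated values, and `classify_velocity` is a hypothetical helper that expects keys in "YYYY-MM" form.

```python
def classify_velocity(monthly_counts: dict[str, int], spike_ratio: float = 2.0) -> str:
    """Label a company's posting trend by comparing the first and last month."""
    months = sorted(monthly_counts)  # "YYYY-MM" keys sort chronologically
    if len(months) < 2:
        return "insufficient data"

    first = monthly_counts[months[0]]
    last = monthly_counts[months[-1]]

    if last == 0:
        return "possible freeze" if first > 0 else "steady"
    if first == 0 or last >= spike_ratio * first:
        return "expansion"
    if last <= first / spike_ratio:
        return "slowdown"
    return "steady"
```

Comparing only the endpoints is deliberately simple. With more months of history, a moving average would be less sensitive to a single unusual month.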

What does India’s remote/hybrid work distribution look like in 2025 based on job data?

Q1 2025 Naukri data shows hybrid at 44% of tech postings (up from 28% in 2023), full remote at 18% (down from 40%+ pandemic peak), and full office at 38%. The distribution varies significantly by company type: product companies and startups skew hybrid (55%+); GCCs and IT services firms skew office (60%+). Bangalore and Hyderabad have higher remote availability than other cities. These figures shift quarterly — the scraper makes it easy to track the trend in real time.

How do you automate India tech market intelligence reporting using Claude?

Build a weekly pipeline: Monday morning, trigger the Naukri scraper for 3-5 role categories; Wednesday, run salary benchmarking and skills extraction using Claude Haiku in batches; Thursday, pass aggregated statistics to Claude Sonnet to generate a structured report covering compensation trends, skills demand shifts, company velocity highlights, and work-mode distribution. Save the report as markdown and email it or post to Slack. The full pipeline runs in under 3 hours and costs $10-15/month — less than one hour of analyst time.
