Cloudflare protects over 20% of all websites on the internet. If you're building a web scraper in 2026, you will inevitably run into Cloudflare protection — whether it's a Turnstile widget on a login form, a JS Challenge interstitial, or full Bot Management at the WAF level. This guide breaks down each protection type, explains the most common bypass approaches, and shows you the fastest way to get through them with working code.
## Types of Cloudflare protection
Cloudflare offers three main layers of protection that you'll encounter when scraping. Understanding which one you're facing is the first step to bypassing it.
### How to identify the protection type
Before choosing a bypass method, figure out what you're dealing with:
- **Turnstile:** Look for a `<div>` with class `cf-turnstile` or a script loading `challenges.cloudflare.com/turnstile`. The HTML source will contain a `data-sitekey` attribute.
- **JS Challenge:** You'll see a full-page "Checking your browser..." screen. The response has a 403 status code and sets a `__cf_bm` cookie. After passing, you get `cf_clearance`.
- **Bot Management / WAF:** You get 403 Forbidden immediately with no challenge page, or a Cloudflare "Access denied" error page. Blocking is often based on IP reputation or TLS fingerprinting.
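The checks above can be sketched as a small classifier. This is a heuristic, not an official API: the `cf-mitigated` response header is one signal some Cloudflare configurations set on challenge responses, and the marker strings are drawn from the list above, so treat the exact conditions as assumptions to tune per target.

```python
def identify_protection(status: int, headers: dict, body: str) -> str:
    """Heuristically classify the Cloudflare protection on a response."""
    body = body.lower()
    # Turnstile: widget div or script loaded from challenges.cloudflare.com
    if "cf-turnstile" in body or "challenges.cloudflare.com/turnstile" in body:
        return "turnstile"
    # JS Challenge: interstitial page; some configs also set cf-mitigated
    if headers.get("cf-mitigated") == "challenge" or "checking your browser" in body:
        return "js_challenge"
    # Hard block with no challenge page to solve
    if status == 403:
        return "waf_block"
    return "none"
```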
## Common bypass approaches compared
There are three mainstream approaches to bypassing Cloudflare. Each has different trade-offs for speed, reliability, and cost.
| Approach | Speed | Reliability | Detection Risk | Cost per 1K solves |
|---|---|---|---|---|
| Headless Browser | 5-30s | 60-80% | High | $2-5 |
| Stealth Plugins | 5-20s | 70-85% | Medium | $1-3 |
| Solver API (recommended) | 0.25-5s | 95%+ | None | $0.80-1.00 |
### When to use what
Different protection types require different approaches. Here's a quick decision guide:
**Turnstile:** Send the `siteKey` and page URL to a solver API. You get back a valid token that you submit with your form data. No browser needed — just plain HTTP requests. NSLSolver solves Turnstile in ~250ms.
**JS Challenge:** The solver runs the JavaScript challenge and returns `cf_clearance` cookies and the matching User-Agent string. Use both in your subsequent requests with the same IP/proxy. NSLSolver returns these in 2-5 seconds.
**Bot Management / WAF:** WAF rules inspect your IP reputation and TLS fingerprint. Use a residential proxy to pass IP checks, and a solver API to handle any challenge pages. Always match the User-Agent from the solve response.
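The decision guide above reduces to a small lookup. A sketch only: the `"type"` values mirror the solver request bodies used later in this guide, while the protection-type names and the `needs_proxy` flag are this sketch's own conventions, not part of any API.

```python
def choose_strategy(protection: str) -> dict:
    """Map a detected protection type to a solve strategy."""
    strategies = {
        "turnstile": {"type": "turnstile", "needs_proxy": False},     # token only, plain HTTP
        "js_challenge": {"type": "challenge", "needs_proxy": False},  # cookies + User-Agent
        "waf_block": {"type": "challenge", "needs_proxy": True},      # residential proxy required
    }
    return strategies.get(protection, {"type": None, "needs_proxy": False})
```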
## Solving Turnstile with NSLSolver
For Turnstile-protected forms, you only need the siteKey and the page URL. The API returns a token you submit with the form:
```python
import requests

API_KEY = "nsl_YOUR_API_KEY"

# 1. Solve the Turnstile widget
solve = requests.post(
    "https://api.nslsolver.com/solve",
    headers={"X-API-Key": API_KEY},
    json={
        "type": "turnstile",
        "siteKey": "0x4XXXXXXXXXXXXXXXXX",
        "url": "https://target-site.com/login"
    }
)
token = solve.json()["token"]

# 2. Submit the form with the solved token
result = requests.post(
    "https://target-site.com/login",
    data={
        "username": "user@example.com",
        "password": "your_password",
        "cf-turnstile-response": token
    }
)
print(result.status_code)  # 200
```

## Solving CF Challenge with NSLSolver
For JS Challenge pages, the API runs the browser check and returns the clearance cookies and User-Agent. You must use these together on the same IP:
```python
import requests

API_KEY = "nsl_YOUR_API_KEY"

# 1. Solve the CF Challenge — returns cookies + user agent
solve = requests.post(
    "https://api.nslsolver.com/solve",
    headers={"X-API-Key": API_KEY},
    json={
        "type": "challenge",
        "url": "https://target-site.com/protected-page"
    }
)
data = solve.json()

# 2. Use the returned cookies and user agent
session = requests.Session()
session.headers["User-Agent"] = data["userAgent"]
for cookie in data["cookies"]:
    session.cookies.set(cookie["name"], cookie["value"])

# 3. Access the protected page
page = session.get("https://target-site.com/protected-page")
print(page.status_code)  # 200
print(page.text[:500])
```

## Handling Bot Management / WAF
When a site uses advanced Bot Management, you need to combine a residential proxy with the solver API. Pass your proxy to the solve request so the cookies are bound to that IP:
```python
import requests

API_KEY = "nsl_YOUR_API_KEY"
PROXY = "http://user:pass@proxy.example.com:8080"

# 1. Solve the challenge through a residential proxy
solve = requests.post(
    "https://api.nslsolver.com/solve",
    headers={"X-API-Key": API_KEY},
    json={
        "type": "challenge",
        "url": "https://target-site.com/data",
        "proxy": PROXY
    }
)
data = solve.json()

# 2. Use the same proxy + cookies for subsequent requests
session = requests.Session()
session.proxies = {"http": PROXY, "https": PROXY}
session.headers["User-Agent"] = data["userAgent"]
for cookie in data["cookies"]:
    session.cookies.set(cookie["name"], cookie["value"])

page = session.get("https://target-site.com/data")
print(page.json())
```

## Performance: headless browser vs solver API
Running your own headless browser setup might seem like the DIY approach, but the numbers tell a different story:
| Metric | Headless Browser | NSLSolver API |
|---|---|---|
| Turnstile solve time | 5-15s | ~250ms |
| Challenge solve time | 10-30s | 2-5s |
| Success rate | 60-80% | 95%+ |
| Monthly cost (10K solves) | $50-200 | $8-10 |
| Maintenance | High — constant patching | None |
*Based on internal benchmarks as of April 2026. Headless browser costs include server, proxy, and browser maintenance.*
## Cost analysis: build vs buy
Let's break down the real cost of running your own headless browser scraping infrastructure versus using a solver API:
**Running your own headless browsers:**
- VPS/cloud server with 4+ GB RAM
- Residential proxy bandwidth
- Browser patching & anti-detect updates
- DevOps time for maintenance

**Using a solver API:**
- Pay only per solve — no fixed costs
- No servers to maintain
- No proxy management needed (for Turnstile)
- 100 free solves to start
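As a sanity check on the figures above, the arithmetic is simple. This sketch uses the low-end numbers from the performance table (10K solves/month, $0.80 per 1K solves, $50/month as the DIY fixed-cost floor before any DevOps time), which are assumptions from this guide's own benchmarks rather than universal pricing.

```python
solves_per_month = 10_000
diy_fixed_monthly = 50.0   # low-end server + proxy + maintenance floor, $/month
api_price_per_1k = 0.80    # low-end solver API price, $/1K solves

api_monthly = solves_per_month / 1000 * api_price_per_1k
print(f"DIY floor: ${diy_fixed_monthly:.2f}/mo  API: ${api_monthly:.2f}/mo")
```

At this volume the API lands at $8/month against a $50/month DIY floor, consistent with the table above; the gap only widens once maintenance hours are priced in.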
## Best practices for Cloudflare scraping
Regardless of which method you use, follow these practices to maximize your success rate and avoid getting blocked:
- **Rotate proxies:** Don't send all requests through the same IP. Use a pool of residential proxies and rotate per session or per request batch.
- **Match User-Agents:** Always use the exact User-Agent string returned by the solver. Cloudflare ties `cf_clearance` cookies to the User-Agent that solved the challenge.
- **Handle rate limits:** Respect 429 responses. Implement exponential backoff (1s, 2s, 4s, 8s). Don't hammer the target site — spread requests over time.
- **Add retry logic:** Transient failures happen. Retry failed solves 2-3 times before giving up. The production example below shows this pattern.
- **Reuse sessions:** Once you have valid cookies, reuse them for multiple requests. `cf_clearance` cookies typically last 15-30 minutes. Don't re-solve for every request.
- **Mind your TLS fingerprint:** If using your own HTTP client, ensure its TLS fingerprint matches a real browser. Libraries like `curl_cffi` (Python) or `got-scraping` (Node.js) help with this.
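The session-reuse advice above can be wrapped in a small per-domain cache. A minimal sketch, assuming a conservative 15-minute TTL (the low end of the typical `cf_clearance` lifetime) so entries expire before the cookie does; the class and method names are this sketch's own:

```python
import time

class ClearanceCache:
    """Per-domain cache for solved cf_clearance cookies + User-Agent."""

    def __init__(self, ttl_seconds: float = 15 * 60):
        self.ttl = ttl_seconds
        self._store = {}  # domain -> (expires_at, cookies, user_agent)

    def put(self, domain: str, cookies: list, user_agent: str) -> None:
        self._store[domain] = (time.monotonic() + self.ttl, cookies, user_agent)

    def get(self, domain: str):
        """Return (cookies, user_agent), or None if missing or expired."""
        entry = self._store.get(domain)
        if entry is None:
            return None
        expires_at, cookies, user_agent = entry
        if time.monotonic() >= expires_at:
            del self._store[domain]  # drop the stale entry
            return None
        return cookies, user_agent
```

Check the cache before calling the solver; on a miss, solve once and `put` the result so every other request to that domain skips the solve call.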
## Production-ready scraper example
Here's a complete Python scraper that handles Cloudflare Challenge pages with proxy rotation, retry logic, and session reuse:
```python
import requests
import time
import random

API_KEY = "nsl_YOUR_API_KEY"
BASE_URL = "https://api.nslsolver.com/solve"
PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

def solve_cloudflare(url, solve_type="challenge", retries=3):
    """Solve Cloudflare protection with retry logic."""
    for attempt in range(retries):
        try:
            proxy = random.choice(PROXIES)
            resp = requests.post(
                BASE_URL,
                headers={"X-API-Key": API_KEY},
                json={
                    "type": solve_type,
                    "url": url,
                    "proxy": proxy,
                },
                timeout=30,
            )
            resp.raise_for_status()
            data = resp.json()
            if data.get("success"):
                return data, proxy
            print(f"Attempt {attempt + 1} failed: solver returned success=false")
        except requests.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
        if attempt < retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s

    return None, None

def scrape_page(url):
    """Scrape a Cloudflare-protected page."""
    data, proxy = solve_cloudflare(url)
    if not data:
        raise Exception("Failed to solve Cloudflare challenge")
    session = requests.Session()
    session.headers["User-Agent"] = data["userAgent"]
    session.proxies = {"http": proxy, "https": proxy}
    for cookie in data["cookies"]:
        session.cookies.set(cookie["name"], cookie["value"])
    response = session.get(url)
    response.raise_for_status()
    return response.text

# Usage
html = scrape_page("https://target-site.com/data")
print(f"Got {len(html)} bytes")
```

**Pro tip:** `cf_clearance` cookies are valid for 15-30 minutes. Cache and reuse them across requests to the same domain to avoid unnecessary solve calls and save money.
## Start scraping Cloudflare sites today
100 free solves on signup. No credit card required. Turnstile in ~250ms, Challenge in 2-5s.