
Anti-bot defences in 2026: a practitioner's view of what holds up and what doesn't

The anti-bot stack has layered from IP reputation to fingerprinting to behavioural ML. Here's what each layer actually catches, where the defender economics still work, and what vision-capable agents are starting to change.

Published 2026-04-20 · security, AI/ML, bot detection


The honest frame for bot defence isn’t “how do we stop automation?” — because the answer at the limit is “you can’t, if the attacker is patient and funded enough.” The honest frame is “how do we raise the cost per successful request until the attacker’s business model collapses?” A defender who pushes the cost from $0.001 to $0.10 per success has killed most scraping, ticket-reselling, and credential-stuffing economics in one move, without ever hitting 100% detection.

That cost-asymmetry view is the one thing I keep wishing more teams internalised before picking tools. You’ll ship a better stack if you start from “what does the attacker currently pay per success, and by how much can we raise it” than from “what’s the best bot-detection product.”
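
To make the arithmetic concrete, here is a minimal sketch of the cost-per-success calculation. The function names and the $0.02 value per success are my own illustrative assumptions, and it assumes attempts are independent, so expected spend per success is simply cost per attempt divided by success rate.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend for one successful request, assuming independent attempts."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def attacker_margin(value_per_success: float,
                    cost_per_attempt: float,
                    success_rate: float) -> float:
    """Profit per success; a negative margin means the business model no longer pays."""
    return value_per_success - cost_per_success(cost_per_attempt, success_rate)


# Hypothetical numbers: each success is worth $0.02 to the attacker.
# Before: every $0.001 request succeeds, so a success costs $0.001.
before = attacker_margin(0.02, cost_per_attempt=0.001, success_rate=1.0)
# After: requests cost $0.01 and only 10% succeed, so a success costs $0.10.
after = attacker_margin(0.02, cost_per_attempt=0.01, success_rate=0.10)
```

Note that the defender reached $0.10 per success while still letting 10% of attempts through: the margin changes sign long before detection reaches 100%.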

Here’s what the stack actually looks like in 2026, layer by layer, and where the honest gaps are.

Layer 1 — IP reputation and velocity

Still the cheapest control, still valuable. Block known hosting ranges, rate-limit obvious patterns, filter on ASN. What it catches: commodity scrapers running from AWS / Hetzner / OVH that never bothered to hide. What it misses: essentially all serious actors, because the residential-proxy ecosystem exists specifically to move traffic out of datacentre IP space and into networks indistinguishable from real consumers. Recent industry reports put residential-proxy traffic somewhere above half of automated traffic against large targets; on categories like sneaker drops and ticket sales, it’s the majority.
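
As a rough sketch of what this layer amounts to in code: an ASN blocklist plus a sliding-window velocity limit per IP. The class and function names, the thresholds, and the three ASNs are illustrative assumptions rather than a recommended configuration; a real deployment would pull hosting ranges from a maintained feed.

```python
from collections import defaultdict, deque

# Illustrative hosting ASNs only (AWS, Hetzner, OVH); verify against a maintained feed.
HOSTING_ASNS = {16509, 24940, 16276}


class VelocityLimiter:
    """Sliding-window limiter: at most max_requests per key in any window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        recent = self.hits[key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


def layer1_decision(ip: str, asn: int, now: float, limiter: VelocityLimiter) -> str:
    """Return 'block', 'throttle', or 'allow' from Layer 1 signals alone."""
    if asn in HOSTING_ASNS:
        return "block"
    return "allow" if limiter.allow(ip, now) else "throttle"
```

A residential-proxy request sails through both checks at low per-IP volume, which is exactly the gap described above.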

The question a defender should ask isn’t “how do I block residential IPs” (you can’t, they belong to real people) but “what signal separates a consumer browsing my site from a consumer whose home connection is being rented out”. That question pushes you up the stack.

Layer 2 — JS challenges and runtime integrity

Proof-of-work, SRI (Subresource Integrity), loader-count verification, runtime environment checks. Intended to raise the per-request compute cost or force an attacker into a full browser. What it catches: headless clients that can’t execute JS, scrapers using requests / httpx directly, naïve Puppeteer setups that leak navigator.webdriver. What it misses: anyone running a real browser engine under automation — Playwright, Puppeteer with stealth, CDP-based tools. These are browsers. They pass JS challenges because they run the JS.
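
For the proof-of-work piece, a hashcash-style sketch shows where the asymmetry comes from: the client searches for a nonce, the server verifies with one hash. The challenge format, function names, and difficulty are assumptions for illustration, not any particular vendor's scheme.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash, regardless of difficulty."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits


nonce = solve("session-abc", difficulty_bits=12)  # hypothetical challenge string
assert verify("session-abc", nonce, difficulty_bits=12)
```

The cost lands on every client that runs the code, real browsers included, which is why this raises per-request compute without distinguishing an automated browser from a human one.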

The useful observation here: SRI wasn’t designed as a bot control. It’s a supply-chain integrity primitive — it stops a CDN from serving you tampered code. Using it as a bot defence is repurposing it, and any time a primitive is being repurposed outside its original threat model, the failure surface grows. Treat SRI as table stakes for legitimate integrity and don’t expect it to carry load it wasn’t built for.
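
To see why SRI is an integrity primitive rather than a bot signal, it helps to look at what it actually computes: a base64-encoded digest of the resource bytes, prefixed with the hash algorithm. The helper names below are mine; the value format follows the `integrity` attribute convention.

```python
import base64
import hashlib

ALLOWED = {"sha256", "sha384", "sha512"}


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in ALLOWED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches(content: bytes, integrity: str) -> bool:
    """Check fetched bytes against a single-hash integrity value."""
    algorithm, _, _ = integrity.partition("-")
    return algorithm in ALLOWED and sri_value(content, algorithm) == integrity


script = b"console.log('hello');"
pinned = sri_value(script)
assert matches(script, pinned)
assert not matches(script + b"// tampered", pinned)
```

Nothing in that check says anything about who is requesting the script, only whether the bytes changed in transit.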

Layer 3 — browser fingerprinting

Canvas, WebGL, audio-context, font enumeration, navigator properties, timing side-channels. Builds a stable-ish identifier per browser that survives cookie clearing. What it catches: automation that forgets to spoof fingerprints, or spoofs them inconsistently (e.g. claims macOS in the User-Agent but renders WebGL like a Linux box).
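
The inconsistency case is the easiest one to sketch. The check below compares the OS claimed in the User-Agent with the OS implied by the WebGL renderer string. The substring hints are deliberately crude assumptions for illustration (desktop only); real detectors use far richer cross-checks.

```python
from typing import Optional

# Illustrative hints only: substrings that tend to imply a specific desktop OS.
RENDERER_HINTS = {
    "linux": ("mesa", "llvmpipe"),
    "windows": ("direct3d", "d3d11"),
    "macos": ("metal",),
}


def claimed_os(user_agent: str) -> Optional[str]:
    """OS the User-Agent claims, or None if unrecognised (mobile is ignored here)."""
    ua = user_agent.lower()
    if "android" in ua or "iphone" in ua:
        return None
    if "mac os x" in ua:
        return "macos"
    if "windows nt" in ua:
        return "windows"
    if "linux" in ua:
        return "linux"
    return None


def implied_os(webgl_renderer: str) -> Optional[str]:
    """OS suggested by the renderer string, or None if no hint matches."""
    renderer = webgl_renderer.lower()
    for os_name, hints in RENDERER_HINTS.items():
        if any(hint in renderer for hint in hints):
            return os_name
    return None


def is_inconsistent(user_agent: str, webgl_renderer: str) -> bool:
    """Flag only when both sides give an answer and the answers disagree."""
    claim, implied = claimed_os(user_agent), implied_os(webgl_renderer)
    return claim is not None and implied is not None and claim != implied
```

Returning False whenever either side is unknown is a design choice: this signal should stay quiet rather than guess, because a false positive here lands on a real user.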

Where this layer started to lose ground is the availability of fingerprint-spoofing tooling that matches real-browser distributions convincingly, including tooling that samples from real-browser corpora rather than inventing fingerprints from scratch. If the attacker is drawing fingerprints from a distribution the defender also sees in their real traffic, fingerprinting alone becomes a false-positive minefield. Privacy regulation (GDPR, CCPA) and browser-level privacy protections (Apple’s ITP) have also forced defenders to be more selective about what they collect, which shrinks the signal.

Fingerprinting is still useful as one input to a scoring model. It’s not a standalone defence any more.

Layer 4 — behavioural signals

Mouse movement kinematics, scroll dynamics, keystroke timing, touch-pressure curves on mobile. What it catches: automation that doesn’t simulate interaction at all, or simulates it with robotic constant-velocity movement. What it misses: attackers who record real human sessions and replay the input distributions, and — increasingly — agents that generate realistic pointer trajectories procedurally.
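
The "robotic constant-velocity" case can be sketched with one statistic: the coefficient of variation of pointer speed along a trajectory. The sample format `(t, x, y)`, the function names, and the 0.05 threshold are assumptions for illustration; a production model would use many more features than this.

```python
import math
import statistics


def speeds(samples):
    """Per-segment speed for a list of (t_seconds, x, y) pointer samples."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            out.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return out


def speed_variation(samples) -> float:
    """Coefficient of variation of speed: near 0 means constant velocity."""
    v = speeds(samples)
    if len(v) < 2:
        raise ValueError("need at least three samples with increasing timestamps")
    mean = statistics.fmean(v)
    return 0.0 if mean == 0 else statistics.pstdev(v) / mean


def looks_robotic(samples, min_cv: float = 0.05) -> bool:
    """Flag trajectories whose speed barely changes (threshold is a guess)."""
    return speed_variation(samples) < min_cv


# A straight line at fixed speed versus an ease-in/ease-out movement.
linear = [(i * 0.01, i * 5.0, i * 2.0) for i in range(20)]
eased = [(i * 0.01, 50 * (1 - math.cos(math.pi * i / 19)), 0.0) for i in range(20)]
```

With these inputs `looks_robotic(linear)` is true and `looks_robotic(eased)` is false. A replayed human session passes this check trivially, which is the miss described above.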

Behavioural modelling is the layer that rewards defenders with lots of real user data and punishes defenders without it. It’s also the layer that struggles most with accessibility users and assistive devices, which produce atypical signals that look bot-shaped.

Layer 5 — ML classifiers and vision challenges

The current frontier. Score-based risk models (reCAPTCHA v3-style), image classification puzzles (hCaptcha, Arkose’s FunCaptcha), behavioural ML that combines every signal from layers 1–4 into a per-request risk score. What it catches: attackers who haven’t adapted to a specific defence. What it still mostly catches today: solver farms that treat each challenge as an isolated API call.
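
The "combine every signal into a per-request risk score" idea can be sketched as a logistic combination. Everything specific here is an assumption: the signal names, the hand-set weights, the bias, and the action thresholds. A real system would learn the weights from labelled traffic rather than hard-code them.

```python
import math
from typing import Mapping

# Hypothetical, hand-set weights; each signal is expected in [0, 1].
WEIGHTS = {
    "hosting_asn": 2.0,                 # Layer 1
    "failed_js_challenge": 2.5,         # Layer 2
    "fingerprint_inconsistent": 1.5,    # Layer 3
    "robotic_pointer": 1.5,             # Layer 4
}
BIAS = -3.0  # keeps the score low when no signal fires


def risk_score(signals: Mapping[str, float]) -> float:
    """Logistic combination of layer signals into a score in (0, 1)."""
    z = BIAS + sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def action(score: float) -> str:
    """Map a score to a response; thresholds are illustrative."""
    if score < 0.3:
        return "allow"
    if score < 0.8:
        return "challenge"
    return "block"
```

With these weights no single signal is enough to block on its own, which matches treating each layer as one input rather than a verdict.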

Where this gets interesting — and where I think the frontier actually is — is the vision-capable agent question.

What vision-capable agents are changing

Two years ago, “identify every square containing a traffic light” was an asymmetric tax: seconds of human time, tens of seconds of compute time, cents of API cost to solve at scale. That asymmetry underwrote an entire category of defence.

It’s collapsing. Commodity vision-language models now classify standard CAPTCHA images at near-zero marginal cost. A local Qwen2.5-VL on a consumer GPU solves the same puzzles for effectively free once the model is loaded. The cost-per-solve curve that defenders priced their controls against two years ago no longer describes the current attacker.

This doesn’t mean vision challenges are dead. It means the specific class of “classify this static image” is mostly spent, and the defences that still work are the ones that:

  • Require multi-step interaction over time — a vision model can see the puzzle, but solving it requires a sequence of actions whose cost compounds.
  • Combine the visual challenge with a behavioural one — drag this slider in a way that looks human, which a VLM can propose but a dumb executor can’t convincingly perform.
  • Push the cost onto hardware the attacker doesn’t control at scale — on-device attestation, TPM-signed challenges, WebAuthn flows.

The honest read is that this is an active adversarial ML problem now, not a signatures problem. Both sides train models. The defender’s advantage is labelled data from their own traffic; the attacker’s advantage is a generative model that generalises across targets. Whose advantage is bigger depends on the scale and sophistication of both.

What a builder should be thinking about

Practical framing I wish I’d had earlier:

  1. Measure attacker cost per success, not detection rate. A defence that catches 60% of bots but doubles the attacker’s per-request cost may be better than one that catches 95% with zero cost impact on the other 5%.
  2. Defence in depth, not a single flagship signal. Each layer is individually beatable. What matters is the stack cost.
  3. Instrument the feedback loop. Detection feeds training feeds detection. Static rule sets rot in months against an attacker who iterates.
  4. The legitimate-user cost is the real constraint. A control that adds 2% friction to checkout can easily cost more real revenue than the bots it catches were costing you. This is the calculation most vendors won’t do for you.
  5. Watch the cost curve of general-purpose AI. Every time the price of “look at a screen and decide what to do” drops another order of magnitude, an entire category of defence needs to be re-costed. That’s the meta-trend.
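
The friction calculation in point 4 is short enough to write down. This is a back-of-envelope sketch with made-up numbers; the function names and the inputs (monthly revenue, conversion drop, bot losses prevented) are assumptions you would replace with your own measurements.

```python
def net_benefit(monthly_revenue: float,
                friction_drop: float,
                bot_loss_prevented: float) -> float:
    """Monthly value of a control: bot losses avoided minus revenue lost to friction."""
    return bot_loss_prevented - monthly_revenue * friction_drop


def break_even_friction(monthly_revenue: float, bot_loss_prevented: float) -> float:
    """Largest conversion drop the control can cause before it loses money."""
    return bot_loss_prevented / monthly_revenue


# Hypothetical shop: $1M monthly revenue, control prevents $12k of bot losses
# but reduces legitimate checkout conversion by 2%.
result = net_benefit(1_000_000, friction_drop=0.02, bot_loss_prevented=12_000)
limit = break_even_friction(1_000_000, bot_loss_prevented=12_000)
```

In this example the control loses about $8,000 a month, and it would only break even if the conversion drop stayed under roughly 1.2%.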

What’s still unsolved

Three honest gaps, in order of how much they bother me:

  • Vision-capable agents running locally at zero marginal cost. A $400 consumer GPU running a 4B-ish VLM defeats most image-based challenges today for free. The ten-years-from-now version of this is worse, not better.
  • Behavioural models need data defenders don’t always have. If you’re not at scale, your baseline for “what normal users look like” is too noisy to reliably separate bots from unusual-but-real humans. This disadvantages smaller defenders structurally.
  • Some attacks can’t be stopped at the edge. A well-funded attacker who behaves exactly like a human user is a user to the edge. The detection has to move downstream — into transaction graphs, trust signals, post-hoc anomaly detection — and those live in different parts of the stack than most bot-defence products sell into.

None of this is bleak. It’s the normal shape of an adversarial problem: the solved part keeps expanding, the frontier keeps moving, and the good defenders are the ones who plan for the next shift rather than optimising the last one.