How Modern Bot Detection Actually Works: A Technical Deep Dive

Ijaz Ur Rahim
10 Mar 2026

12 min read

Modern commercial bot detection systems protect thousands of websites, sitting between users and servers to fingerprint every request at multiple protocol layers before deciding if a visitor is human. I spent several weeks reverse engineering one such system's entire client-side stack: deobfuscating JavaScript, disassembling a custom bytecode virtual machine, and mapping over 200 browser signals it collects.

This post covers what I found - from the obfuscation layers protecting detection code, to the VM architecture hiding the most sensitive fingerprinting logic, to the categories of signals these systems check. If you're building bot defenses or doing security research, understanding how the current state of the art works is essential for building better systems.

Disclaimer

This research was conducted for educational purposes and authorized security testing. The techniques described here are presented to help security professionals understand and improve bot detection systems.

The specific vendor analyzed is not named to avoid enabling circumvention. Always ensure you have proper authorization before testing any system's defenses.

Layered client-side architecture

Modern bot detection doesn't ship a single script. Detection typically runs across multiple JavaScript bundles, each with a different role and different protection level:

| Component | Role | When it runs |
|---|---|---|
| Primary fingerprinter | Always-on signal collection | Every page load |
| Challenge bundle | Interactive challenge UI + extra checks | Challenge pages only |
| Interstitial bundle | Invisible challenge + extra checks | Interstitial pages only |
| VM module | Bytecode-protected sensitive checks | Challenge pages only |

The primary fingerprinting script loads on every protected page, collects hundreds of browser signals, encrypts them, and sends them to the detection servers. The challenge bundles only load when a visitor has been flagged - they run the same signal collection plus additional checks, and include a bytecode VM module for the most sensitive fingerprinting logic.

[Figure: Layered architecture of a bot detection system]

Obfuscation techniques

All scripts in the system I analyzed use the same obfuscation toolkit. Understanding one unlocks all of them. These are common techniques across the industry, not unique to any single vendor.

String encoding

Every string literal in the code is replaced with a function call into a lookup array. There are typically two encoding schemes per script - one using standard base64 and another using a custom base64 alphabet unique to each bundle:

```javascript
// What the obfuscated code looks like
var x = i(234);  // Standard base64 decode from lookup array
var y = o(1560); // Custom base64 decode from different array

// After decoding
var x = "navigator";
var y = "webdriver";
```

Each script has its own unique custom base64 alphabet, meaning a generic decoder won't work - you need to extract the alphabet from each script individually. Custom AST-based decoders can handle this. Across the system I analyzed, decoding recovered thousands of unique strings from tens of thousands of encoded function calls.
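Since each bundle ships its own alphabet, a decoder has to be parameterized by it. A minimal sketch of the idea - the CUSTOM alphabet here is hypothetical (in practice you extract it from each script's decoder function), and Node's Buffer stands in for the browser's atob:

```javascript
// Translate the custom alphabet back to the standard one, then
// apply a normal base64 decode. '=' padding passes through untouched.
const STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const CUSTOM   = "QWERTYUIOPASDFGHJKLZXCVBNMqwertyuiopasdfghjklzxcvbnm0123456789-_";

function decodeCustomBase64(encoded) {
  const translated = [...encoded].map(c => {
    const i = CUSTOM.indexOf(c);
    return i === -1 ? c : STANDARD[i]; // pass through padding etc.
  }).join("");
  return Buffer.from(translated, "base64").toString("utf8");
}

// Inverse, useful for round-trip testing the extracted alphabet.
function encodeCustomBase64(plain) {
  const std = Buffer.from(plain, "utf8").toString("base64");
  return [...std].map(c => {
    const i = STANDARD.indexOf(c);
    return i === -1 ? c : CUSTOM[i];
  }).join("");
}
```

In an AST-based decoder, the same translation runs once per encoded call site, and the call expression is replaced with the recovered string literal.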

Mixed Boolean Arithmetic (MBA)

Array indices aren't just numbers - they're wrapped in MBA expressions that compute the index at runtime:

```javascript
// Instead of e(42), the code has:
e(k(123, 456))

// Where k is something like:
function k(A, B) {
  return (A & B) * 2 + (A & ~B) * 7 - ~(A & B) * 6 +
         ~(A | B) * 5 + ~(A | ~B) * 6 + ~A * 1;
}
```

The number of MBA functions varies by bundle - from a handful in the primary fingerprinter to dozens in the challenge bundles. Simplifying MBA expressions is a known hard problem in deobfuscation, but since these are all constant expressions with known inputs, you can just evaluate them directly.
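A sketch of that evaluate-directly approach, using the k function shown above. This particular MBA identity happens to reduce to plain addition, which is typical - the wrapper exists to hide a trivial constant:

```javascript
// The MBA wrapper from the example above. Because the inputs are
// compile-time constants, we simply evaluate it instead of trying
// to simplify the expression symbolically.
function k(A, B) {
  return (A & B) * 2 + (A & ~B) * 7 - ~(A & B) * 6 +
         ~(A | B) * 5 + ~(A | ~B) * 6 + ~A * 1;
}

// e(k(123, 456)) collapses to a plain constant index:
console.log(k(123, 456)); // 579 - this identity reduces to A + B
```

A deobfuscator walks the AST, finds calls to known MBA functions with literal arguments, evaluates them, and substitutes the constant.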

Control flow flattening

Scripts contain large 2D lookup tables used as state machine dispatchers. Instead of if/else chains or switch statements, the code jumps between states using table lookups:

```javascript
var Q = (function() {
  var D = [];
  for (w = 0; w < 128; w++) D[w] = new Array(512);
  for (A = 0; A < 512; A++)
    for (g = 0; g < 128; g++)
      D[g][A] = D[hashFunction(g, ..., 128, A)];
  return D[seed];
})();
```

This makes the control flow non-linear and impossible to follow statically without resolving the lookup table. The standard approach is to run the table construction, dump the resolved values, then replace table lookups with direct jumps in the deobfuscated output.
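A toy illustration of the idea (not the vendor's dispatcher): a function whose step order is hidden behind a state table, and the direct-control-flow equivalent you would emit after dumping that table:

```javascript
// The real execution order of the three steps is hidden behind NEXT,
// which in production code is a large precomputed 2D lookup table.
const NEXT = [2, -1, 1]; // state -> next state

function flattened(x) {
  let state = 0, r = x;
  while (state !== -1) {
    if (state === 0) r = r * 3;      // runs first
    else if (state === 1) r = r - 4; // runs last
    else if (state === 2) r = r + 1; // runs second
    state = NEXT[state];
  }
  return r;
}

// After resolving NEXT, the deobfuscated equivalent is just:
function devirtualized(x) {
  return (x * 3 + 1) - 4;
}
```

The same transformation scales up: run the table constructor once, record which state follows which, then rewrite each dispatch as a direct jump.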

Bundle structure

Under the obfuscation, the code is typically a standard module bundle (Browserify, Webpack, etc.). The primary fingerprinter unbundles into around a dozen modules - the largest being the core detection engine. The challenge bundles split into modules covering the slider UI, mouse trajectory analysis, canvas fingerprinting, and the VM.

Tools like webcrack handle the initial unbundling, after which custom string decoders can be applied to the extracted modules. The result: fully readable JavaScript with descriptive string literals.

Bytecode VMs in bot detection

This is where detection systems get serious. Challenge bundles can include a custom virtual machine that decodes a large base64-encoded bytecode blob, loads it into memory alongside VM registers and padding, and executes it instruction by instruction.

Why a VM?

The VM exists to hide the most sensitive fingerprinting checks from static analysis. When fingerprinting logic lives in JavaScript, anyone with browser DevTools can set breakpoints, inspect variables, and understand what's being checked.

With a bytecode VM, the logic is opaque. It's just a stream of bytes being interpreted by a generic execution loop. You can't set breakpoints on specific checks because they don't exist as JavaScript statements. The entire fingerprinting routine becomes a black box that requires full reverse engineering to understand.

Architecture

A typical detection VM operates on a single large array containing several regions:

| Region | Purpose |
|---|---|
| PRNG padding | Random noise to hinder memory analysis |
| VM registers | Instruction pointer, stack pointer, frame pointer, opcode dispatch table |
| Bytecode | The actual program |
| Register file | General-purpose registers |

These VMs use a hybrid register-stack architecture: operations push and pop from a stack, but also read/write named registers at fixed offsets. The execution loop is straightforward - fetch a byte, look up the handler function in an opcode table, execute it, dispatch to the next instruction. A hard iteration limit prevents infinite loops.
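A minimal sketch of such a loop - the opcodes, memory layout, and iteration cap here are all illustrative, not the vendor's:

```javascript
// Fetch a byte, look up its handler, execute, repeat - with a hard
// iteration limit so malformed bytecode can't spin forever.
function runVM(memory, codeStart) {
  const stack = [];
  let ip = codeStart;
  const handlers = {
    0x01: () => stack.push(memory[ip++]),              // PUSH_IMM
    0x02: () => stack.push(stack.pop() + stack.pop()), // ADD
    0x03: () => stack.push(stack.pop() * stack.pop()), // MUL
  };
  const MAX_ITERATIONS = 100000;
  for (let n = 0; n < MAX_ITERATIONS; n++) {
    const opcode = memory[ip++];
    if (opcode === 0x00) return stack.pop(); // HALT
    handlers[opcode]();                      // dispatch
  }
  throw new Error("iteration limit exceeded");
}

// Program: PUSH 6, PUSH 7, MUL, HALT
runVM([0x01, 6, 0x01, 7, 0x03, 0x00], 0); // -> 42
```

A production detection VM adds register reads/writes at fixed offsets into the same memory array, and fuses several of these micro-operations into single opcodes.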

Instruction set

The instruction set covers everything you'd expect from a general-purpose VM:

  1. Arithmetic and bitwise: ADD, SUB, MUL, DIV, MOD, AND, OR, XOR, shifts
  2. Comparison: strict/loose equality, greater/less than, in operator
  3. Control flow: forward/backward jumps, conditional jumps
  4. Functions: closure creation with captured variables, return, halt
  5. Property access: get/set properties, immediate property access
  6. Objects: object/array creation, for-in iteration

What makes disassembly harder is the presence of many fused opcodes - compound instructions that combine multiple micro-operations into a single opcode for performance. For example, a single opcode might XOR the top of stack, store it to a register, pop, then push two other registers. You need to understand each compound operation to follow the control flow.

String encryption in bytecode

Strings inside the bytecode aren't stored in plaintext. They're typically XOR-encrypted with rolling keys - each character is XOR'd with a key value that changes for the next character, and strings are null-terminated. This prevents simple string extraction from the bytecode blob.
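A sketch of the decryption, assuming a simple increment-by-one key schedule - real implementations derive the next key differently, so the schedule must be recovered per target:

```javascript
// Decrypt a null-terminated, rolling-key XOR'd string from a byte array.
function decryptString(bytes, offset, key) {
  let out = "";
  for (let i = offset; bytes[i] !== 0x00 /* null terminator */; i++) {
    out += String.fromCharCode(bytes[i] ^ key);
    key = (key + 1) & 0xff; // rolling key: changes for each character
  }
  return out;
}
```

Once the key schedule is known, a disassembler can decrypt every string constant in the blob up front, which is usually the fastest way to orient yourself in the bytecode.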

What VMs actually check

After full disassembly, the VM typically adds several checks that the primary fingerprinting script does NOT perform:

  1. CSS Box Model fingerprinting - Creates hidden elements, sets specific CSS properties, and measures computed styles. Headless browsers with broken CSS layout engines produce different measurements than real browsers.

  2. Animation API timing - Checks Web Animations API values that are absent or return unusual values in non-rendering environments.

  3. Document metadata - Checks that can detect synthetically injected pages not loaded through normal navigation.

  4. Navigation timing validation - Compares multiple timing sources to detect time manipulation.

  5. Binary fingerprint construction - Instead of JSON-encoding signals, VMs can build raw binary buffers, encrypt them, and base64-encode them. This binary format is harder to spoof because you need to match the exact byte layout the server expects.
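An illustrative sketch of such a buffer. The field layout below is invented - the point is that every offset, width, and byte order must match what the server parses:

```javascript
// Pack a few signals into a fixed binary layout (big-endian, as is
// DataView's default). Field order and widths are hypothetical.
function buildBinaryFingerprint(signals) {
  const buf = new ArrayBuffer(8);
  const view = new DataView(buf);
  view.setUint8(0, signals.webdriver ? 1 : 0); // 1 byte: automation flag
  view.setUint16(1, signals.screenWidth);      // 2 bytes: screen width
  view.setUint16(3, signals.screenHeight);     // 2 bytes: screen height
  view.setUint8(5, signals.colorDepth);        // 1 byte: color depth
  view.setUint16(6, signals.pluginCount);      // 2 bytes: plugin count
  return new Uint8Array(buf);
}
```

Compared to a JSON payload, there are no field names to guess from: a spoofer has to reproduce the exact byte layout, which generally means reversing the VM that builds it.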

Anti-analysis features

Detection VMs have multiple layers designed to frustrate reverse engineering:

  1. Bytecode opacity: All logic lives in bytecode, not JavaScript. No source-level breakpoints work.
  2. String encryption: Rolling-key encryption on all bytecode strings.
  3. PRNG padding: Memory positions that don't contain bytecode are filled with PRNG output, making dumps harder to parse.
  4. Register indirection: Property names are stored in registers, not as inline string literals.
  5. Timing measurement: High-resolution timers are called before and after VM execution, so debugging slowdown is detectable.

[Figure: VM architecture and anti-analysis layers]

What gets fingerprinted

After full deobfuscation, these are the major categories of signals that modern bot detection systems collect. The system I analyzed had over 200 individual signals.

Automation markers

Detection systems check for known automation framework globals:

  1. WebDriver flag - The standard navigator.webdriver property, checked in both the main frame and iframes (catches spoofing that only patches the main frame)
  2. Framework-specific globals - Automation tools like Selenium, Playwright, and Puppeteer leave identifiable global variables and properties
  3. Legacy automation tools - Older tools like PhantomJS and Nightmare are still checked by name

Plugin and API integrity

Modern detection doesn't just check surface-level values. It runs multiple integrity checks on browser APIs:

  1. Count verification - Does the reported count (e.g. navigator.plugins.length) match the number of entries actually enumerable?
  2. Property descriptor checks - Are properties on the instance or the prototype? Real browsers keep certain properties on the prototype.
  3. Native code verification - Does toString() show [native code]? What about toString.toString()?
  4. Prototype chain consistency - Properties overridden via Object.defineProperty are detectable through descriptor comparison between the instance and its prototype.

If you patch browser APIs with spoofed values, these layered integrity checks will catch inconsistencies between the count, the prototype chain, the toString() output, and the property descriptors.
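The prototype-vs-instance check can be illustrated with plain objects standing in for Navigator.prototype and the navigator instance:

```javascript
// `proto` plays the role of Navigator.prototype, `nav` the instance.
const proto = { get platform() { return "Win32"; } };
const nav = Object.create(proto);

function hasInstanceOverride(obj, prop) {
  // Real browsers keep properties like this on the prototype;
  // a descriptor on the instance itself indicates spoofing.
  return Object.getOwnPropertyDescriptor(obj, prop) !== undefined;
}

hasInstanceOverride(nav, "platform"); // false - looks genuine

// A naive spoof patches the instance directly...
Object.defineProperty(nav, "platform", { value: "Linux x86_64" });
hasInstanceOverride(nav, "platform"); // true - caught
```

Spoofing that survives this check has to redefine the property on the prototype itself, which in turn trips the native-code and toString checks described below.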

Native function verification

Bot detection uses a multi-layered approach to detect monkey-patched browser APIs:

```javascript
// Primary check
fn.toString().match(/\{\s*\[native code\]\s*\}$/m)

// Double toString (catches toString spoofing)
fn.toString.toString() // Must also match native pattern

// Debug detection
fn.toString().includes('("debug",arguments);')

// Property descriptor check
Object.getOwnPropertyDescriptor(navigator, "platform")
// Detects instance-level overrides vs prototype-level
```

The prototype chain check is clever - if you override a property on the navigator instance, getOwnPropertyDescriptor will show it exists on the instance rather than the prototype. Real browsers don't have certain properties on the instance level.

Audio fingerprinting

Detection systems collect many audio-related signals using the AudioContext API - channel counts, sample rates, offline rendering results, codec support. Each signal also has a timing companion that measures how long each API call takes, useful for detecting VM environments where audio APIs are stubbed with instant returns.

Screen and window geometry

Numerous measurements of the browser window and screen:

  1. Viewport dimensions (inner, outer, client)
  2. Screen dimensions and available area
  3. Color depth and pixel ratio
  4. DevTools detection - the difference between outer and inner dimensions can reveal if developer tools are open
  5. Fullscreen detection - minimal difference between outer and inner height indicates fullscreen mode
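A hypothetical helper sketching the last two checks - the 160px threshold and the exact conditions are illustrative, not taken from the analyzed system:

```javascript
// A large delta between outer and inner window dimensions suggests
// docked DevTools; a zero delta suggests fullscreen or a stripped shell.
function geometrySignals(win) {
  const dw = win.outerWidth - win.innerWidth;
  const dh = win.outerHeight - win.innerHeight;
  return {
    devtoolsLikely: dw > 160 || dh > 160,
    fullscreenLikely: dw === 0 && dh === 0,
  };
}

geometrySignals({ outerWidth: 1920, innerWidth: 1520,
                  outerHeight: 1080, innerHeight: 1080 });
// -> { devtoolsLikely: true, fullscreenLikely: false }
```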

WebGL fingerprinting

The WEBGL_debug_renderer_info extension exposes the actual GPU hardware. Headless browsers running on servers with no GPU report software renderers - an immediate bot signal.

Signal encryption and transmission

Collected signals don't leave the browser in plaintext. A typical encoding pipeline:

  1. Signals serialized to key-value pairs
  2. UTF-8 encoded
  3. Encrypted with a stream cipher using a PRNG-derived keystream (seed provided by the server)
  4. Custom encoding applied
  5. Sent as POST body to the detection endpoint

The server-provided seed means each page load gets a different encryption key. You can't precompute encrypted payloads - the encryption is bound to the session.
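A sketch of steps 1-3 of the pipeline with a toy LCG-based keystream - the real generator, constants, and final custom encoding are unknown, so this only illustrates the structure:

```javascript
// Derive a keystream from the server-provided seed.
function keystream(seed, length) {
  const out = new Uint8Array(length);
  let state = seed >>> 0;
  for (let i = 0; i < length; i++) {
    state = (state * 1664525 + 1013904223) >>> 0; // LCG step
    out[i] = state & 0xff;
  }
  return out;
}

function encryptSignals(signals, seed) {
  const plaintext = new TextEncoder().encode(JSON.stringify(signals)); // steps 1-2
  const ks = keystream(seed, plaintext.length);                        // step 3
  return plaintext.map((b, i) => b ^ ks[i]); // XOR stream cipher
}

// XOR is symmetric: the server decrypts with the same seed.
function decryptSignals(ciphertext, seed) {
  const ks = keystream(seed, ciphertext.length);
  const plain = ciphertext.map((b, i) => b ^ ks[i]);
  return JSON.parse(new TextDecoder().decode(plain));
}
```

Because the seed changes per page load, the same signal set produces a different ciphertext every session, which is what defeats payload replay.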

[Figure: Signal collection and encryption pipeline]

Key takeaways

Layered obfuscation works as defense-in-depth. String encoding, MBA expressions, and control flow flattening each add barriers that compound in difficulty. Even if one layer is broken, the others still slow analysis.

Bytecode VMs provide the strongest protection for sensitive logic. When fingerprinting logic lives in bytecode rather than JavaScript, it requires building a full disassembler before analysis can even begin.

Integrity checks are more valuable than raw data collection. Checking that navigator.plugins reports a consistent count across property descriptors, prototype chains, and toString() output catches more spoofing attempts than just reading the value.

Timing signals catch simulation. Measuring how long API calls take, not just their return values, detects environments where APIs are stubbed.

Multi-layer detection is essential. TLS fingerprinting at the network layer, JavaScript fingerprinting at the browser layer, and behavioral analysis at the interaction layer each catch different classes of bots.

Session-bound encryption prevents replay. Server-provided seeds for signal encryption mean captured payloads can't be reused across sessions.

The system I analyzed collected 200+ signals across automation markers, API integrity checks, audio fingerprinting, screen geometry, and WebGL - all encrypted per-session before transmission.

FAQ

Can you detect bots without a real browser?

For initial page loads, network-layer signals like TLS fingerprinting are effective without any client-side JavaScript. But the strongest detection comes from challenges that require actual rendering - CSS measurement, animation timing, and binary fingerprint construction all need a genuine browser engine. This is why defense-in-depth across network and browser layers is so important.

Why do detection systems use custom VMs instead of just more JavaScript obfuscation?

JavaScript obfuscation, no matter how complex, can be reversed with enough patience because the code eventually executes as readable JS statements. A bytecode VM adds a fundamental barrier - the logic never exists as JavaScript. An attacker needs to first reverse the VM architecture, build a disassembler, then analyze thousands of bytecode instructions across dozens of closures. VMs also enable anti-debugging (timing checks on execution speed) and make it easy to update fingerprinting logic by shipping new bytecode without changing the interpreter.

What's the difference between an interstitial and a captcha challenge?

An interstitial is an invisible challenge - the detection JavaScript collects browser fingerprints and submits them automatically. If the fingerprint checks out, the visitor gets a cookie and a redirect back to the site with no user interaction needed. A captcha is an interactive challenge - typically a slider puzzle requiring human input. The captcha page collects the same fingerprints as the interstitial, plus mouse movement trajectory data (velocity, acceleration, curvature) to verify human-like input. Interstitials are the common case; captchas appear when the signals are suspicious but not definitively bot-like.

How can defenders stay ahead of evolving bot techniques?

Focus on signals that are fundamentally hard to fake: timing consistency across multiple APIs, prototype chain integrity, CSS rendering behavior, and cross-frame consistency checks. Rotate encryption seeds per session. Use bytecode VMs for your most sensitive checks so they can be updated independently of the JavaScript shell. And layer your detection - network fingerprinting, browser fingerprinting, and behavioral analysis each catch different evasion strategies.
