RTB systems must evaluate, score, and respond to bid requests within 10ms at 100,000+ queries per second. Here is how the infrastructure works at that scale.
A bid request arrives from the ad exchange. The bidding infrastructure has 10ms to evaluate the impression, score it against active campaigns, compute a bid price, and return a response.
Miss the deadline and the impression is gone. There is no retry. At 100,000+ QPS, even a 1% timeout rate means 1,000 lost opportunities per second.
Co-locate bidders with exchanges to reclaim evaluation time
Network latency between the bidder and the ad exchange directly reduces the time available for bid evaluation. A bidder sitting 50ms of network distance from the exchange has already lost the auction before its bidding code even runs.
DSP development teams that co-locate bidder instances with major exchanges reclaim critical milliseconds:
Deploying in 6-8 global data centers reclaims 5-15ms of evaluation time per request compared to centralized deployment
Auto-scaling groups in each region respond to traffic patterns - US East peaks during business hours while APAC scales down
Regional bidder instances maintain local copies of campaign targeting data, synchronized every 5-10 seconds from the central store
Fiber optic cables between data centers carry synchronization traffic, but the bid evaluation path never crosses regional boundaries
Data center selection is a first-order engineering decision for any demand-side platform. Every millisecond of network distance translates directly into lower win rates.
Co-location with ad exchanges reclaims 5-15ms of evaluation time per bid request
Eliminate memory allocation from the bid evaluation hot path
At 100K QPS, memory allocation patterns determine whether the system meets its latency budget. Garbage collection pauses invisible at 100 QPS become catastrophic at scale.
The bid evaluation path uses specific techniques:
Pre-computed lookup tables for campaign targeting criteria, frequency caps, and budget constraints - updated asynchronously every 5-10 seconds from the main campaign store
Object pooling and arena allocation to eliminate per-request heap allocations entirely
Lockless data structures for shared state - bloom filters for frequency capping, atomic counters for budget pacing
Pre-built decision trees for targeting evaluation - building the tree costs seconds, evaluating it costs microseconds
Zero allocation on the hot path is not an optimization. It is a requirement at 100K QPS.
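Of the techniques above, the bloom filter for frequency capping is the easiest to show compactly. This sketch illustrates the idea in Python (a real bidder would use a lock-free bitset in C++ or Go); the bit-array size and hash count are illustrative defaults, not tuned values:

```python
import hashlib


class FrequencyCapBloom:
    """Approximate 'have we already served this user this campaign?' check.

    A bloom filter answers membership in O(1) with a fixed memory footprint
    and no per-request allocation. False positives are possible, but they
    only cause under-serving (skipping an eligible user), never a
    frequency-cap violation.
    """

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self._size = size_bits
        self._k = num_hashes
        self._bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from one 128-bit digest of the key.
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        for i in range(self._k):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, "little") % self._size

    def add(self, key):
        for pos in self._positions(key):
            self._bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, key):
        return all(self._bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(key))
```

A typical key would combine user and campaign identifiers (e.g. `"user42:camp7"`), with the filter reset or rotated at the cap window boundary.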
Run ML model inference in under 3ms per bid
Click-through rate prediction and conversion probability models must execute inference within 2-3ms as part of the overall bid evaluation pipeline. Every millisecond spent on inference is unavailable for other bid logic.
ONNX Runtime with quantized INT8 models provides the best latency-to-accuracy tradeoff:
Feature extraction from the bid request in under 0.5ms using pre-computed feature stores with user and context signals
Model inference in 1-2ms using batched ONNX evaluation with thread-pinned execution - no context switching during scoring
Score calibration and bid price calculation in under 0.5ms using pre-computed price curves per campaign tier
Model updates deployed via blue-green switching - new model loads in shadow mode, validates against production predictions, then swaps atomically
ML inference at 100K QPS requires hardware-optimized model serving with zero allocation overhead
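The final stage above, turning a calibrated score into a bid price via pre-computed price curves, amounts to a binary search over a per-tier lookup table. A minimal sketch, with illustrative (not tuned) curve values and hypothetical tier names:

```python
import bisect

# Pre-computed price curve per campaign tier: (pCTR threshold, bid CPM).
# Rebuilt asynchronously when campaigns change; read-only on the hot path.
PRICE_CURVES = {
    "premium": [(0.001, 0.50), (0.005, 1.20), (0.02, 3.00), (0.05, 6.50)],
    "standard": [(0.001, 0.20), (0.005, 0.60), (0.02, 1.50), (0.05, 3.00)],
}

# Split thresholds and prices once, so the hot path allocates nothing.
_THRESHOLDS = {t: [x for x, _ in c] for t, c in PRICE_CURVES.items()}
_BIDS = {t: [p for _, p in c] for t, c in PRICE_CURVES.items()}


def bid_price(tier, pctr):
    """Map a predicted CTR to a bid CPM with one binary search."""
    idx = bisect.bisect_right(_THRESHOLDS[tier], pctr) - 1
    return _BIDS[tier][idx] if idx >= 0 else 0.0  # below curve: no bid
```

Keeping the curve pre-computed means the expensive calibration math runs offline; the per-request cost is one `bisect` call and two list lookups.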
Monitor at scale without adding overhead to the bidding path
Traditional logging at 100K QPS generates more load than the bidding logic itself. The monitoring stack must be as performance-conscious as the application:
Sampling-based metrics collection - log 1 in 1,000 requests in detail, aggregate the rest into counters and histograms updated atomically
Real-time percentile tracking at p50, p95, and p99 per region, per ad exchange, per campaign tier
Automated circuit breakers pulling a bidder instance from rotation when its p99 response times exceed the exchange timeout
Anomaly detection on bid rate drops, win rate changes, and spend velocity deviations - catching stale models and network degradation faster than error-rate monitoring
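The sampling-plus-histogram approach above can be sketched as follows. This is an illustrative Python version (production systems typically use atomics and an HDR-style histogram); the bucket boundaries and 1-in-1,000 sample rate follow the numbers in the text:

```python
class SampledLatencyMonitor:
    """Latency tracking that stays cheap at high QPS.

    Every request does a handful of integer operations on a fixed-size
    bucketed histogram; nothing is stored per request. record() signals
    roughly 1 in sample_rate requests for detailed logging, and
    percentiles are read directly from the histogram.
    """

    BUCKET_MS = [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15]  # upper bounds

    def __init__(self, sample_rate=1000):
        self._counts = [0] * (len(self.BUCKET_MS) + 1)  # +1 overflow bucket
        self._total = 0
        self._sample_rate = sample_rate

    def record(self, latency_ms):
        """Returns True when the caller should log this request in detail."""
        self._total += 1
        for i, bound in enumerate(self.BUCKET_MS):
            if latency_ms <= bound:
                self._counts[i] += 1
                break
        else:
            self._counts[-1] += 1  # exceeded the largest bucket
        return self._total % self._sample_rate == 0

    def percentile(self, p):
        """Upper bound of the bucket containing the p-th percentile."""
        target = p / 100.0 * self._total
        running = 0
        for i, count in enumerate(self._counts):
            running += count
            if running >= target:
                return self.BUCKET_MS[i] if i < len(self.BUCKET_MS) else float("inf")
        return float("inf")
```

The p99 read here is exactly what the circuit breaker above would compare against the exchange timeout.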
Handle the SSP side of the auction equation
A supply-side platform faces the mirror image of this problem. It broadcasts bid requests to dozens of bidders, collects responses, evaluates floor prices, runs the auction, and returns a winner - all within its own tight timeout.
SSP development teams face additional complexity:
Header bidding means running multiple auctions in parallel - each bidder's timeout must fit within the SSP's own latency budget
Floor price optimization using ML models must execute within the same auction window without adding latency
High-performance auction logic evaluates 20-50 bid responses per impression, selecting the winner in under 1ms
Programmatic advertising at scale demands that both sides of the auction optimize relentlessly for low latency.
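The winner-selection step described above is, at its core, a filter-sort-pick over the responses that beat the timeout. A minimal sketch of second-price selection with a floor (real SSPs layer on deal priorities, brand-safety filters, and bid adjustments):

```python
def run_auction(bids, floor_cpm):
    """Select the winner among collected bid responses.

    bids: list of (bidder_id, bid_cpm) tuples that arrived before the
    timeout. Returns (winner_id, clearing_price) under second-price
    rules with a floor, or None if no bid clears the floor.
    """
    eligible = [(price, bidder) for bidder, price in bids if price >= floor_cpm]
    if not eligible:
        return None  # impression goes unfilled or to a fallback
    eligible.sort(reverse=True)  # highest bid first
    _, winner = eligible[0]
    # Second-price: winner pays the runner-up's bid, but never below the floor.
    clearing = eligible[1][0] if len(eligible) > 1 else floor_cpm
    return winner, max(clearing, floor_cpm)
```

With 20-50 responses per impression, this filter-and-sort is trivially within the sub-1ms budget cited above; the hard part in practice is collecting the responses, not picking the winner.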
Adapt bidding infrastructure for mobile app and in-app inventory
In-app bid requests carry different signals than web requests. Device-level identifiers (where available), app context, and SDK-reported viewability replace cookie-based signals.
Click-through rate models trained on web inventory need retraining for in-app contexts where user interaction patterns differ substantially.
Attribution modeling for mobile requires server-to-server postback integration with MMPs (mobile measurement partners)
Deduplication across multiple attribution windows prevents double-counting conversions
Probabilistic and deterministic match reconciliation runs asynchronously - results feed back into the bidding model within 24 hours
A real-time bidding platform built only for web inventory leaves 40-60% of programmatic advertising spend on the table. Mobile and in-app require dedicated infrastructure investment.
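The deduplication step above boils down to keeping one conversion per (user, campaign) when multiple attribution windows claim it. A sketch under common assumptions - click-through attribution outranks view-through, and earlier events win ties; the event field names here are illustrative, not an MMP schema:

```python
def dedupe_conversions(events):
    """Collapse duplicate attribution claims to one conversion each.

    events: list of dicts with user_id, campaign_id, type ('click' or
    'view'), and ts (epoch seconds). When both a click-through and a
    view-through window claim the same conversion, the click wins;
    within the same type, the earliest event wins.
    """
    PRIORITY = {"click": 0, "view": 1}  # lower rank wins
    best = {}
    for ev in events:
        key = (ev["user_id"], ev["campaign_id"])
        rank = (PRIORITY[ev["type"]], ev["ts"])
        if key not in best or rank < best[key][0]:
            best[key] = (rank, ev)
    return [ev for _, ev in best.values()]
```

Because this runs asynchronously, it can afford a full pass over the day's events; only the reconciled totals feed back into the bidding model.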
Need help building this?
Our engineering team specializes in AdTech solutions. Let's discuss how we can bring your project to life.