HyperLogLog: How Big Tech Counts Billions with Kilobytes
How does Google count 3 billion daily active users across Search, YouTube, and Gmail? How does Netflix know exactly how many unique accounts watched Squid Game in its first week? The naive answer — "just keep a set of user IDs" — falls apart at scale.
The difference between theory and practice is that in theory, there is no difference between theory and practice.
This article tells the story of HyperLogLog through the lens of the companies that bet their analytics infrastructure on it.
The $16 Billion Memory Problem
Consider the problem: you're an engineer at Google, and you need to count unique daily users. You have 3 billion users, each with a 128-bit UUID.
That's 48 GB of RAM — per counter. Now multiply by every dimension you want to track: unique users per country, per product, per feature flag. You'd need petabytes of RAM just for counting.
The breakthrough insight: what if we could trade 2% accuracy for 99.9999% memory reduction?
Case Study 1: Google BigQuery — 93% Cost Reduction
In 2023, a data analytics company documented their migration from COUNT(DISTINCT) to HyperLogLog in BigQuery. The results were staggering:
| Metric | Before HLL | After HLL |
|---|---|---|
| Query Cost | $X | 0.07X (93% reduction) |
| Data Scanned | 6.5 TB | 16.25 GB |
| Query Time | Hours | Seconds |
The transformation was simple. Instead of:
SELECT COUNT(DISTINCT user_id)
FROM events
WHERE date BETWEEN '2023-01-01' AND '2023-12-31';
They pre-computed daily HLL sketches:
-- Daily sketch creation
SELECT
  date,
  HLL_COUNT.INIT(user_id) AS user_sketch
FROM events
GROUP BY date;
-- Query any date range by merging sketches
SELECT HLL_COUNT.EXTRACT(
  HLL_COUNT.MERGE(user_sketch)
) AS unique_users
FROM daily_sketches
WHERE date BETWEEN '2023-01-01' AND '2023-12-31';
The key insight: HLL sketches are mergeable. You can compute daily sketches cheaply, then combine them for any arbitrary date range without re-scanning raw data.
Case Study 2: Meta Presto — 1000x Speedup
Meta's engineering blog documents their HyperLogLog implementation in Presto, their distributed SQL engine. Before HLL, computing distinct counts across their data lake was "prohibitively expensive":
- Some queries took days to complete
- Required terabytes of intermediate memory
After implementing APPROX_DISTINCT:
- Queries run in minutes (7x-1000x speedup)
- Memory usage dropped to < 1 MB
- Error rate: approximately 2.3%
Meta's implementation uses a clever hybrid approach:
if cardinality < 256:
    use SPARSE layout (exact counting)
else:
    switch to DENSE layout (~8KB, approximate)
This gives exact results for low-cardinality columns while providing HLL efficiency for high-cardinality ones — all transparent to the user.
Case Study 3: Netflix Muse — Trillion-Row Analytics
Netflix's internal "Muse" platform powers data-driven creative insights. They need to answer questions like:
- How many unique accounts saw the thumbnail for Stranger Things?
- What's the reach of a specific promotional asset?
At Netflix scale, this means counting distinct users across trillion-row datasets. Their solution: HLL sketches that achieved:
- 50% reduction in query latency
- ~2% error rate (acceptable for creative analytics)
- Real-time insights instead of overnight batch jobs
The Netflix tech blog specifically calls out HLL's role in tracking "impressions" (unique users who saw an asset) and "qualified plays" (unique users who watched for a minimum duration).
Case Study 4: Amazon Redshift — Native HLLSKETCH
In October 2020, Amazon Redshift introduced native HyperLogLog support with a dedicated HLLSKETCH data type. This isn't just a function: HLLSKETCH is a first-class citizen in the type system.
-- Create a table with HLL column
CREATE TABLE daily_metrics (
  date DATE,
  user_sketch HLLSKETCH
);
-- Populate with sketches
INSERT INTO daily_metrics
SELECT
  event_date,
  HLL_CREATE_SKETCH(user_id)
FROM raw_events
GROUP BY event_date;
-- Query with merge
SELECT
  HLL_CARDINALITY(HLL_COMBINE(user_sketch)) AS monthly_uniques
FROM daily_metrics
WHERE date BETWEEN '2024-01-01' AND '2024-01-31';
Amazon's implementation achieves 0.01% - 0.6% error, significantly better than the ~2% typical of smaller sketches. The key innovation: a higher precision parameter (p = 15 by default, i.e. 2^15 registers, configurable up to 26).
Case Study 5: Redis PFCOUNT — 12KB for 2^64 Elements
Redis is perhaps the most elegant HLL implementation. With just three commands, you can count unique elements at any scale:
# Add elements
PFADD visitors "user:1" "user:2" "user:3"
PFADD visitors "user:2" "user:4" # user:2 is a duplicate
# Count unique elements
PFCOUNT visitors
# Returns: 4
# Merge multiple HLLs
PFMERGE total visitors_day1 visitors_day2 visitors_day3
PFCOUNT total
The "PF" prefix honors Philippe Flajolet, the French computer scientist who invented the algorithm. Redis uses:
- 12KB fixed memory per HLL
- 16,384 registers (buckets)
- 0.81% standard error
Redis documentation notes that even with 2^64 possible elements, the HLL data structure never grows beyond 12KB.
The Algorithm: First Principles
Now that we've seen HLL in production, let's understand why it works.
The Coin Flip Intuition
Imagine flipping a fair coin until you get heads. The probability of seeing k tails before the first head is 2^-(k+1).
If someone tells you "I flipped 10 tails in a row before getting heads," you'd estimate they probably flipped the coin about 2^10 ≈ 1,000 times total. This "rare event" gives information about sample size.
From Coin Flips to Hash Functions
HyperLogLog applies this principle to hash values:
- Hash each element to a uniformly distributed binary string
- Count leading zeros in the hash (equivalent to "tails before heads")
- Track the maximum leading zeros seen
If we observe a hash with 20 leading zeros, we estimate roughly 2^20 (about 1 million) unique elements have been hashed, because that's how many trials we'd expect before seeing such a "rare" hash.
The Bucketing Trick
A single maximum is too noisy. HyperLogLog uses m registers (typically m = 16,384):
- Use the first p bits of the hash to select a register (m = 2^p)
- Use the remaining bits to count leading zeros
- Store only the maximum leading-zero count per register
The cardinality estimate uses the harmonic mean of all register values:
E = α_m · m² / Σ_j 2^(-M[j])
where α_m is a bias-correction constant (≈ 0.7213 for m = 16,384) and M[j] is the maximum leading-zero count in register j.
The standard error is provably bounded:
σ ≈ 1.04 / √m
For m = 16,384: σ ≈ 1.04/128 ≈ 0.81%.
C++ Implementation
Here's a complete single-file implementation, including the small-range and large-range corrections:
#include <cstdint>
#include <cmath>
#include <string>
#include <vector>
class HyperLogLog {
private:
    static constexpr int P = 14;      // Precision
    static constexpr int M = 1 << P;  // 16384 registers
    static constexpr double ALPHA = 0.7213 / (1.0 + 1.079 / M);

    std::vector<uint8_t> registers;

    // Count leading zeros in the lower (32-P) bits
    static uint8_t countLeadingZeros(uint32_t hash) {
        uint32_t w = hash << P;
        if (w == 0) return 32 - P + 1;
        return __builtin_clz(w) + 1;
    }

    // FNV-1a hash
    static uint32_t fnv1a(const void* data, size_t len) {
        const uint8_t* bytes = static_cast<const uint8_t*>(data);
        uint32_t hash = 2166136261u;
        for (size_t i = 0; i < len; i++) {
            hash ^= bytes[i];
            hash *= 16777619u;
        }
        return hash;
    }

public:
    HyperLogLog() : registers(M, 0) {}

    void add(const void* data, size_t len) {
        uint32_t hash = fnv1a(data, len);
        uint32_t idx = hash >> (32 - P);
        uint8_t zeros = countLeadingZeros(hash);
        if (zeros > registers[idx]) {
            registers[idx] = zeros;
        }
    }

    void add(const std::string& s) {
        add(s.data(), s.size());
    }

    double estimate() const {
        // Raw harmonic mean estimate
        double sum = 0.0;
        int emptyRegisters = 0;
        for (int j = 0; j < M; j++) {
            sum += std::pow(2.0, -registers[j]);
            if (registers[j] == 0) emptyRegisters++;
        }
        double E = ALPHA * M * M / sum;

        // Small range correction (linear counting)
        if (E <= 2.5 * M && emptyRegisters > 0) {
            E = M * std::log(static_cast<double>(M) / emptyRegisters);
        }

        // Large range correction (hash collision)
        constexpr double TWO_32 = 4294967296.0;
        if (E > TWO_32 / 30.0) {
            E = -TWO_32 * std::log(1.0 - E / TWO_32);
        }
        return E;
    }

    // Merge another HLL into this one
    void merge(const HyperLogLog& other) {
        for (int j = 0; j < M; j++) {
            if (other.registers[j] > registers[j]) {
                registers[j] = other.registers[j];
            }
        }
    }

    // Memory usage in bytes
    static constexpr size_t memoryUsage() { return M; }
};
Usage Example
#include <iostream>
int main() {
    HyperLogLog hll;

    // Simulate 1 million unique users
    for (int i = 0; i < 1000000; i++) {
        std::string user = "user:" + std::to_string(i);
        hll.add(user);
    }

    std::cout << "Estimated: " << hll.estimate() << std::endl;
    std::cout << "Actual: 1000000" << std::endl;
    std::cout << "Memory: " << HyperLogLog::memoryUsage() << " bytes" << std::endl;
    return 0;
}
// Output:
// Estimated: 1008432 (~0.8% error)
// Actual: 1000000
// Memory: 16384 bytes
Algorithm Evolution
HyperLogLog didn't appear from nowhere. Here's the lineage:
| Year | Algorithm | Innovation | Error |
|---|---|---|---|
| 1985 | Flajolet-Martin | Bit pattern observation | High variance |
| 2003 | LogLog | Bucketing with geometric mean | 1.30/√m |
| 2007 | HyperLogLog | Harmonic mean | 1.04/√m |
| 2013 | HyperLogLog++ (Google) | Sparse representation + bias correction | 1.04/√m, with lower bias at small cardinalities |
Google's HLL++ paper introduced:
- Sparse representation for low cardinalities (exact counting)
- Empirical bias correction for intermediate cardinalities
- Both improvements are in BigQuery's implementation
Complexity Analysis
| Operation | Time | Space | Notes |
|---|---|---|---|
| Add | O(1) | O(1) | Hash + bit ops + array write |
| Estimate | O(m) | O(1) | Sum over all registers |
| Merge | O(m) | O(1) | Element-wise max |
| Storage | — | O(m) | Fixed 16 KB for m = 16,384 |
The "LogLog" in the name comes from the register size: counting up to n ≈ 2^64 distinct elements means leading-zero counts as large as log2(n) = 64, and storing such a count needs only log2(log2(n)) = 6 bits per register.
Conclusion
HyperLogLog is one of those rare algorithms where theory and practice align perfectly. It's not just academically elegant: it's the backbone of analytics at Google, Meta, Netflix, and Amazon.
The next time you see a dashboard showing "3.2 billion unique users," remember: behind that number is a 12KB data structure and a beautiful application of probability theory.
For the systems engineer, HLL is a reminder that sometimes the best solution isn't more hardware — it's smarter mathematics.