The single-counter approach has a weakness: it depends entirely on one extreme observation. A single unlucky hash can throw the whole estimate off, and since we only ever keep the maximum, there is no way to recover from it.
The solution is to run many independent experiments at the same time and combine their results. When one experiment goes wrong, the others balance it out.
Rather than hashing each item multiple times, we get multiple experiments for free by splitting a single hash into two parts:
For `m` buckets, we need `p` index bits, where `m = 2^p`. More buckets give a more accurate estimate, but require more memory. And because each bucket is updated independently, one extreme value can only affect one out of `m` results; the rest keep the estimate grounded.
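To make the trade-off concrete, here is a quick sketch using the standard HyperLogLog accuracy figure: the relative standard error is roughly `1.04 / sqrt(m)` (the constant comes from the original Flajolet et al. analysis). The specific `p` values below are just illustrative.

```typescript
// Relative standard error of HyperLogLog is ~1.04 / sqrt(m), so each
// extra index bit doubles the memory but only cuts the error by ~30%.
for (const p of [4, 6, 10, 14]) {
  const m = 1 << p;
  const err = 1.04 / Math.sqrt(m);
  console.log(`p=${p}: m=${m} buckets, ~${(err * 100).toFixed(1)}% typical error`);
}
```

With `p = 6` (the default below), that works out to 64 one-byte registers and a typical error around 13%.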
The code splits each hash into two parts: the top `p` bits select the bucket, and the remaining `32 - p` bits are measured for their leading-zero count.
class HyperLogLog {
  // p = number of index bits
  // m = number of buckets (always 2^p)
  readonly p: number;
  readonly m: number;
  readonly registers: Uint8Array;

  constructor(p = 6) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new Uint8Array(this.m);
  }

  add(hash: number) {
    // Top p bits select the bucket index
    const index = hash >>> (32 - this.p);
    // Shift left to remove the index bits, leaving only the measurement bits
    const w = (hash << this.p) >>> 0;
    // Rank = leading zeros + 1; keep the maximum seen per bucket
    const k = Math.clz32(w) + 1;
    if (k > this.registers[index]) this.registers[index] = k;
  }

  // We will implement the real estimator next
  estimate(): number {
    throw new Error("Estimator not implemented yet");
  }
}
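To trace what `add` does with concrete bits, here is a standalone walk-through with `p = 6` and a hypothetical 32-bit hash value. (Producing the hash itself is a separate concern; any well-mixed 32-bit hash function works.)

```typescript
const p = 6;
// A hypothetical 32-bit hash, written in binary for readability
const hash = 0b10110100_00000111_00000000_00000000 >>> 0;

// Top 6 bits are 101101, so this item lands in bucket 45
const index = hash >>> (32 - p);

// Drop the index bits; the measurement bits now start at the top
const w = (hash << p) >>> 0;

// Seven leading zeros before the first 1-bit, so k = 8
const k = Math.clz32(w) + 1;

console.log(index, k); // 45 8
```

Bucket 45's register becomes 8 if it was smaller, and every other bucket is untouched, which is exactly how one unlucky hash stays contained to a single experiment.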