Back Home

Multiple Counters

The single-counter approach has a weakness: it depends entirely on one extreme observation. A single unlucky hash can throw the whole estimate off, and since we only ever keep the maximum, there is no way to recover from it.

The solution is to run many independent experiments at the same time and combine their results. When one experiment goes wrong, the others balance it out.

Splitting the hash into buckets

Rather than hashing each item multiple times, we get multiple experiments for free by splitting a single hash into two parts:

  • Index bits (left side): These bits decide which bucket — also called a register — gets updated.
  • Measurement bits (right side): The remaining bits are used to find the position of the first 1-bit, the same way we did before.

For mm buckets, we need pp index bits where 2p=m2^p = m. More buckets give a more accurate estimate, but require more memory. And because each bucket is updated independently, one extreme value can only affect one out of mm results — the rest keep the estimate grounded.

Implementation

The code splits each hash into two parts: the left bits select the bucket, and the right bits are measured for the leading-zero count.

class HyperLogLog {
  // p = number of index bits
  // m = number of buckets (always 2^p)
  readonly p: number;
  readonly m: number;
  readonly registers: Uint8Array;
 
  constructor(p = 6) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new Uint8Array(this.m);
  }
 
  add(hash: number) {
    // Calculate bucket index
    const index = hash >>> (32 - this.p);
 
    // Shift left to remove index bits, leaving only measurement bits
    const w = (hash << this.p) >>> 0;
 
    // Count leading zeroes and put in the target bucket.
    const k = Math.clz32(w) + 1;
    if (k > this.registers[index]) this.registers[index] = k;
  }
 
  // We will implement the real estimator next
  estimate(): number {
    throw new Error("Estimator not implemented yet");
  }
}