Algorithm for estimating distinct elements in streams

Dec 8, 2023

It is such a beautiful algorithm that it fits in a tweet.

cvm-algorithm

The F0-Estimator algorithm uses a sampling strategy to keep the sample set small. It adjusts the sampling rate to prevent the set from exceeding a threshold.

After processing the data stream, it outputs |X|/p where 'p' is the final sampling rate.

Arxiv preprint