Major challenges
Keep track of a large universe of values, e.g., pairs of IP addresses rather than ages
Methodology
Synopses (trade-off between accuracy and storage)
Use synopsis data structures, which are much smaller (O(log^k N) space) than their base data set (O(N) space)
Compute an approximate answer within a small error range (factor ε of the actual answer)
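As one illustration of a synopsis that trades storage for accuracy, here is a minimal sketch of a Count-Min sketch, a structure not named in the outline itself. The class name, the parameters `epsilon` and `delta`, and the use of salted BLAKE2b as a stand-in for a pairwise-independent hash family are all assumptions made for this example. Under the standard analysis, the estimate never falls below the true count and exceeds it by more than `epsilon` times the stream length only with probability at most `delta`; note that this is an additive bound, not a relative one.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter using O((1/epsilon) * log(1/delta)) counters."""

    def __init__(self, epsilon, delta):
        # Width controls the size of the error, depth controls how often it is exceeded.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # number of items seen so far (stream length)

    def _index(self, row, item):
        # Assumption: salted BLAKE2b stands in for one hash function per row.
        digest = hashlib.blake2b(
            repr(item).encode(),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


# Hypothetical stream of (source, destination) IP address pairs.
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
heavy_pair = ("10.0.0.1", "10.0.0.2")
for _ in range(50):
    sketch.add(heavy_pair)
for i in range(100):
    sketch.add(("192.168.0.%d" % i, "10.0.0.2"))

print(sketch.width, sketch.depth, sketch.estimate(heavy_pair))
```

The table size depends only on `epsilon` and `delta`, not on how many distinct pairs appear, which is why this kind of synopsis suits a large universe.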
Major methods
Random sampling
Histograms
Sliding windows
Multi-resolution model
Sketches
Randomized algorithms
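To make the first method in this list concrete, the following is a small sketch of reservoir sampling (often called Algorithm R), one common way to keep a uniform random sample of a stream whose length is not known in advance. The function name `reservoir_sample` and its parameters are chosen for this example only; the outline does not prescribe a particular sampling scheme.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 10 items from a stream of 1000 using a seeded generator.
sample = reservoir_sample(range(1000), 10, random.Random(0))
print(sample)
```

Memory use stays at `k` items no matter how long the stream runs, and each item is examined exactly once, which matches the single-pass constraint of stream processing.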