Optimal Frequency Calculation

Different algorithms and structures have been used to solve rate-limiting. In this post I want to focus on use of probabilistic method & count-min sketch in particular.

Why

To save space or time. It approximates the result.

There is extensive research on solving the membership problem using data structures from hash tables to balanced trees and B-tree indices, and these form the backbone of systems from OSs, compilers, databases and beyond. Many of these data structures have been in widespread use for forty or more years.

Count-min consumes a stream of events and produces approximate frequency of each of the events. It can be queried for frequency of a certain event and it will return the frequency of that event with certain probability.

Usage

Compressed sensing, Networking, Databases, NLP, Security (cryptography, finding primes), Computation Geometry (finding vertices),  Machine Learning.

Implementation

Simple python implementation from github.

def query(self, x):
"""
Return an estimation of the amount of times 
`x` has occurred. The returned value always 
overestimates the real value.
"""
return min(table[i] for table, i in zip(self.tables, 
                                           self._hash(x))

References

  • Count-Min C Implementation – https://sites.google.com/site/countminsketch/home
  • Count-Min Go Implementation – https://github.com/tylertreat/BoomFilters
  • Algorithms to live by – http://algorithmstoliveby.com/
  • C-Implementation – http://www.cs.rutgers.edu/∼muthu/ massdal-code-index.html

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s