.. Python BloomFilter documentation master file, created by sphinx-quickstart on Wed Mar 31 16:25:58 2010. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. Welcome to Python BloomFilter's documentation! ============================================== If you are here, you probably don't need to be reminded about the nature of a Bloom filter. If you need to learn more, just visit the `wikipedia page `_ to learn more. This module implements a Bloom filter in python that's fast and uses mmap files for better scalability. Did I mention that it's fast? Here's a quick example:: from pybloomfilter import BloomFilter bf = BloomFilter(10000000, 0.01, 'filter.bloom') with open("/usr/share/dict/words") as f: for word in f: bf.add(word.rstrip()) print 'apple' in bf #outputs True That wasn't so hard, was it? Now, there are a lot of other things we can do. For instance, let's say we want to create a similar filter with just a few pieces of fruit:: fruitbf = bf.copy_template("fruit.bloom") fruitbf.update(("apple", "banana", "orange", "pear")) print fruitbf.to_base64() "eJzt2k13ojAUBuA9f8WFyofF5TWChlTHaPzqrlqFCtj6gQi/frqZM2N7aq3Gis59d2ye85KTRbhk" "0lyu1NRmsQrgRda0I+wZCfXIaxuWv+jqDxA8vdaf21HIOSn1u6LRE0VL9Z/qghfbBmxZoHsqM3k8" "N5XyPAxH2p22TJJoqwU9Q0y0dNDYrOHBIa3BwuznapG+KZZq69JUG0zu1tqI5weJKdpGq7PNJ6tB" "GKmzcGWWy8o0FeNNYNZAQpSdJwajt7eRhJ2YM2NOkTnSsBOCGGKIIYbY2TA663GgWWyWfUwn3oIc" "fyLYxeQwiF07RqBg9NgHrG5ba3jba5yl4zS2LtEMMcQQQwwxmRiBhPGOJOywIPafYhUwqnTvZOfY" "Zu40HH/YxDexZojJwsx6ObDcT7D8vVOtJBxiAhD/AjMmjeF2Wnqd+5RrHdo4azPEzoANabiUhh0b" "xBBDDDHEENsf8twlrizswEjDhnTbzWazbGKpQ5k07E9Ox2iFvXBZ2D9B7DawyqLFu5lshhhiiGUK" "a4nUloa9yxkwR7XhgPPXYdhRIa77uDtnyvqaIXalGK02ufv3J36GmsnG4lquPnN9gJo1VNxqgYbt" "ji/EC8s1PWG5fuVizW4Jox6/3o9XxBBDDLFbwcg9v/AwjrPHtTRsX34O01mxLw37bhCTjJk0+PLK" "08HYd4MYYojdKmYnBfjsktEpySY2tGGZzWaIIfYDGB271Yaieaat/AaOkNKb" Reference ------------ All of the reference information is available below: .. toctree:: :maxdepth: 2 ref Why pybloomfilter --------------------- As I already mentioned, there are a couple reasons to use this module: * It natively uses `mmaped files `_. * It natively does the set things you want a Bloom filter to do. * It is Fast (see Benchmarks). Benchmarks --------------------- Simple load and add speed ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I have a simple benchmark in `test/speedtest.py `_ which compares this module to the good `pybloom module `_:: (pybloom module) pybloom load took 0.76436 s/run pybloom tests took 0.16205 s/run Errors: 0.25% positive 0.00% negative (this module) pybloomfilter load took 0.05423 s/run pybloomfilter tests took 0.00659 s/run Errors: 0.26% positive 0.00% negative In this test we just looked at adding words from a dictionary file, then testing to see if each word of another file was in the dictionary. Serialization ^^^^^^^^^^^^^^^^^ Since this package natively uses mmap files, **no serialization is needed**. Therefore, if you have to do a lot of moving between disks etc, this module is an obvious win. Install --------------------- You do not need Cython to install from sources, since I keep a cached version of the c output in the source distribution. Thus, to install you should only need to run:: $ sudo pip install pybloomfiltermmap You can also download the latest tar file from the `github tags `_. Once you download it, you should only have to run:: $ sudo python setup.py install to build and install the module. Develop ----------------------- To develop you will need Cython. The setup.py script should automatically build from Cython source if the Cython module is available.