BloomFilter Class Reference

Platforms: Unix, Windows

class pybloomfilter.BloomFilter(capacity : int, error_rate : float, filename : string)

Create a new BloomFilter object with a given capacity and error_rate. Note that we do not check capacity. This is important, because I want to be able to support logical OR and AND (see below). The capacity and error_rate then together serve as a contract—you add less than capacity items, and the Bloom Filter will have an error rate less than error_rate.

NEW: If you specify None for the filename, then the bloom filter will be backed by malloc’d memory, rather than by a file.

Static Methods

static BloomFilter.open(filename)

Open an already existing Bloomfilter file.

static BloomFilter.from_base64(filename, input[, perm = 0755])

Create a new BloomFilter object on filename from the input base64 string. Example:

>>> bf = BloomFilter.from_base64("/tmp/mike.bf",
     "eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
     "qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
     "Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
     "zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
     "gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
>>> "MIKE" in bf
True

Instance Attributes

BloomFilter.capacity

The number of elements for this filter.

BloomFilter.error_rate

The acceptable probability of false positives.

BloomFilter.hash_seeds

The integer seeds used for the random hashing.

BloomFilter.name

The file name (compatible with file objects)

BloomFilter.num_bits

The number of bits used in the filter as buckets

BloomFilter.num_hashes

The number of hash functions used when computing

Instance Methods

BloomFilter.add(item) → Boolean

Add the item to the bloom filter.

Parameters:
  • item – Hashable object
Return type:

Boolean (True if item already in the filter)

BloomFilter.clear_all()

Remove all elements from the bloom filter at once.

BloomFilter.copy(filename) → BloomFilter

Copies the current BloomFilter object to another object with new filename.

Parameters:
  • filename – string filename
Return type:

new BloomFilter object

BloomFilter.copy_template(filename[, perm=0755]) → BloomFilter

Creates a new BloomFilter object with the same parameters–same hash seeds, same size.. everything. Once this is performed, the two filters are comparable, so you can perform logical operators. Example:

>>> apple = BloomFilter(100, 0.1, '/tmp/apple')
>>> apple.add('apple')
False
>>> pear = apple.copy_template('/tmp/pear')
>>> pear.add('pear')
False
>>> pear |= apple
BloomFilter.sync()

Forces a sync() call on the underlying mmap file object. Use this if you are about to copy the file and you want to be Sure (TM) you got everything correctly.

BloomFilter.to_base64() → string

Creates a compressed, base64 encoded version of the Bloom filter. Since the bloom filter is efficiently in binary on the file system this may not be too useful. I find it useful for debugging so I can copy filters from one terminal to another in their entirety.

Return type:Base64 encoded string representing filter
BloomFilter.update(iterable)

Calls add() on all items in the iterable.

BloomFilter.union(filter) → BloomFilter

Perform a set OR with another comparable filter. You can (only) construct comparable filters with copy_template above. See the example in copy_template. In that example, pear will have both “apple” and “pear”.

The result will occur in place. That is, calling:

bf.union(bf2)

is a way to add all the elements of bf2 to bf.

N.B.: Calling this function will render future calls to len() invalid.

BloomFilter.intersection(filter) → BloomFilter

The same as union() above except it uses a set AND instead of a set OR.

N.B.: Calling this function will render future calls to len() invalid.

Magic Methods

BloomFilter.__len__(item) → Integer

Returns the number of distinct elements that have been added to the BloomFilter object, subject to the error given in error_rate.

Example:

>>> bf = BloomFilter(100, 0.1, '/tmp/fruit.bloom')
>>> bf.add("Apple")
>>> bf.add('Apple')
>>> bf.add('orange')
>>> len(bf)
2
>>> bf2 = bf.copy_template('/tmp/new.bloom')
>>> bf2 |= bf
>>> len(bf2)
Traceback (most recent call last):
  ...
pybloomfilter.IndeterminateCountError: Length of BloomFilter object is unavailable after intersection or union called.
BloomFilter.__in__(item) → Boolean

Check to see if item is contained in the filter, with an acceptable false positive rate of error_rate (see above).

BloomFilter.__ior__(filter) → BloomFilter

See union(filter)

BloomFilter.__iand__(filter) → BloomFilter

See intersection(filter)

Exceptions

class pybloomfilter.IndeterminateCountError(message)

The exception that is raised if len() is called on a BloomFilter object after |=, &=, intersection(), or union() is used.

Table Of Contents

Previous topic

Welcome to Python BloomFilter’s documentation!

This Page