BloomFilter Class Reference

Platforms: Unix, Windows

class pybloomfilter.BloomFilter(capacity : int, error_rate : float, filename : string)

Create a new BloomFilter object with a given capacity and error_rate. Note that capacity is not checked or enforced. This is important, because I want to be able to support logical OR and AND (see below). The capacity and error_rate together serve as a contract: if you add fewer than capacity items, the Bloom filter will have an error rate less than error_rate.

NEW: If you specify None for the filename, then the bloom filter will be backed by malloc’d memory, rather than by a file.
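To make the capacity and error_rate contract concrete, here is a minimal sketch of the textbook false-positive estimate for a Bloom filter. The function name and the 480-bit, 3-hash configuration are hypothetical values chosen for illustration; nothing here calls pybloomfilter, and the library's own numbers may differ.

```python
import math


def expected_false_positive_rate(num_bits: int, num_hashes: int, items_added: int) -> float:
    """Textbook estimate (1 - e^(-k*n/m))^k of a Bloom filter's false-positive rate."""
    fraction_set = 1.0 - math.exp(-num_hashes * items_added / num_bits)
    return fraction_set ** num_hashes


# Hypothetical filter: 480 bits and 3 hash functions (assumed values, not read from the library).
rates = {n: expected_false_positive_rate(480, 3, n) for n in (50, 100, 200)}
for n, rate in rates.items():
    print(f"{n:>3} items -> estimated false-positive rate {rate:.3f}")
```

The estimate grows as more items are added, which is why staying under capacity is the caller's side of the contract.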

Static Methods

static BloomFilter.open(filename): Open an already existing BloomFilter file.

static BloomFilter.from_base64(filename, input[, perm = 0755])

Create a new BloomFilter object, backed by filename, from the input base64 string. Example:

>>> bf = BloomFilter.from_base64("/tmp/mike.bf",
     "eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
     "qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
     "Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
     "zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
     "gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
>>> "MIKE" in bf
True

Instance Attributes

BloomFilter.capacity: The number of elements this filter is sized to hold.

BloomFilter.error_rate: The acceptable probability of false positives.

BloomFilter.hash_seeds: The integer seeds used for the random hashing.

BloomFilter.name: The file name (compatible with file objects).

BloomFilter.num_bits: The number of bits used in the filter as buckets.

BloomFilter.num_hashes: The number of hash functions used when adding or testing an item.
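As a rough guide to how num_bits and num_hashes relate to capacity and error_rate, the sketch below applies the standard Bloom filter sizing formulas. The helper name optimal_parameters is hypothetical, and the values pybloomfilter actually chooses may differ from these.

```python
import math
from typing import Tuple


def optimal_parameters(capacity: int, error_rate: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) from the standard sizing formulas.

    m = -n * ln(p) / (ln 2)^2   and   k = (m / n) * ln 2
    """
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


# Same capacity and error_rate as the examples in this reference.
num_bits, num_hashes = optimal_parameters(100, 0.1)
print(num_bits, num_hashes)
```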

Instance Methods

BloomFilter.add(item) → Boolean

Add the item to the bloom filter.

Parameters:	item – Hashable object
Return type:	Boolean (True if item already in the filter)
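The return value of add() can be illustrated with a tiny pure-Python Bloom filter. This is a sketch under stated assumptions, not pybloomfilter's implementation: the TinyBloom class, the SHA-256 based hashing, and the seed values are all hypothetical.

```python
import hashlib


class TinyBloom:
    """Illustrative in-memory Bloom filter (hypothetical, not the library's code)."""

    def __init__(self, num_bits, hash_seeds):
        self.num_bits = num_bits
        self.hash_seeds = list(hash_seeds)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # One bit position per seed, derived from a seeded SHA-256 digest.
        for seed in self.hash_seeds:
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def add(self, item):
        # Report whether the item already appeared to be present, then set its bits.
        already_present = item in self
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        return already_present


bf = TinyBloom(num_bits=480, hash_seeds=[11, 22, 33])
first = bf.add("apple")   # filter is empty, so the item cannot appear present
second = bf.add("apple")  # all of its bits are now set
print(first, second)
```

Because membership is probabilistic, a True from add() can also be a false positive for an item that was never added.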

BloomFilter.clear_all(): Remove all elements from the bloom filter at once.

BloomFilter.copy(filename) → BloomFilter

Copies the current BloomFilter object to a new object backed by the given filename.

Parameters:	filename – string filename
Return type:	new BloomFilter object

BloomFilter.copy_template(filename[, perm=0755]) → BloomFilter

Creates a new BloomFilter object with the same parameters as this one: the same hash seeds, the same size, everything. Once this is done, the two filters are comparable, so you can apply logical operators to them. Example:

>>> apple = BloomFilter(100, 0.1, '/tmp/apple')
>>> apple.add('apple')
False
>>> pear = apple.copy_template('/tmp/pear')
>>> pear.add('pear')
False
>>> pear |= apple

BloomFilter.sync(): Forces a sync() call on the underlying mmap file object. Use this if you are about to copy the file and you want to be Sure (TM) that everything has been written.

BloomFilter.to_base64() → string

Creates a compressed, base64 encoded version of the Bloom filter. Since the bloom filter is already stored efficiently in binary on the file system, this may not be too useful. I find it useful for debugging, so I can copy filters from one terminal to another in their entirety.

Return type:	Base64 encoded string representing filter
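The idea of a compressed, base64 encoded filter can be sketched with the standard library. This assumes a zlib-then-base64 scheme over a raw bit array; the exact serialization pybloomfilter uses is not described here, so the helper names and layout below are hypothetical.

```python
import base64
import zlib


def encode_bits(raw: bytes) -> str:
    """Compress a raw bit array, then base64 encode it (assumed scheme)."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_bits(text: str) -> bytes:
    """Reverse encode_bits: base64 decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


# Hypothetical 480-bit filter with a single bit set.
bits = bytearray(60)
bits[3] = 0x10
raw = bytes(bits)

encoded = encode_bits(raw)
print(encoded)
```

A sparsely filled bit array compresses well, which is what makes the string short enough to paste between terminals.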

BloomFilter.update(iterable): Calls add() on all items in the iterable.

BloomFilter.union(filter) → BloomFilter

Perform a set OR with another comparable filter. You can (only) construct comparable filters with copy_template above. See the example in copy_template. In that example, pear will have both “apple” and “pear”.

The operation happens in place. That is, calling:

bf.union(bf2)

is a way to add all the elements of bf2 to bf.

N.B.: Calling this function will render future calls to len() invalid.

BloomFilter.intersection(filter) → BloomFilter

The same as union() above except it uses a set AND instead of a set OR.

N.B.: Calling this function will render future calls to len() invalid.
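Conceptually, set OR and set AND on comparable filters reduce to bitwise operations on two bit arrays of the same size. The sketch below shows that on plain byte arrays; the function names and the sample bit patterns are hypothetical, and this is not the library's code.

```python
def union_in_place(target: bytearray, other: bytes) -> None:
    """Bitwise OR `other` into `target`; both must have the same length."""
    if len(target) != len(other):
        raise ValueError("filters are not comparable: sizes differ")
    for i, byte in enumerate(other):
        target[i] |= byte


def intersection_in_place(target: bytearray, other: bytes) -> None:
    """Bitwise AND `other` into `target`; both must have the same length."""
    if len(target) != len(other):
        raise ValueError("filters are not comparable: sizes differ")
    for i, byte in enumerate(other):
        target[i] &= byte


# Hypothetical 16-bit arrays standing in for two comparable filters.
apple = bytes([0b00000101, 0b00000000])
pear = bytes([0b00000110, 0b00000001])

merged = bytearray(pear)
union_in_place(merged, apple)

common = bytearray(pear)
intersection_in_place(common, apple)
```

The bits only line up if both filters use the same size and hash seeds, which is why comparable filters come from copy_template. Once bits are merged, the number of distinct items behind them can no longer be recovered, which matches the note about len().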

Magic Methods

BloomFilter.__len__() → Integer

Returns the approximate number of distinct elements that have been added to the BloomFilter object; the count is subject to the false-positive rate given by error_rate.

Example:

>>> bf = BloomFilter(100, 0.1, '/tmp/fruit.bloom')
>>> bf.add('Apple')
False
>>> bf.add('Apple')
True
>>> bf.add('orange')
False
>>> len(bf)
2
>>> bf2 = bf.copy_template('/tmp/new.bloom')
>>> bf2 |= bf
>>> len(bf2)
Traceback (most recent call last):
  ...
pybloomfilter.IndeterminateCountError: Length of BloomFilter object is unavailable after intersection or union called.

BloomFilter.__contains__(item) → Boolean: Check whether item is contained in the filter, as in `item in bf`, with an acceptable false positive rate of error_rate (see above).

BloomFilter.__ior__(filter) → BloomFilter: See union(filter)

BloomFilter.__iand__(filter) → BloomFilter: See intersection(filter)

Exceptions

class pybloomfilter.IndeterminateCountError(message): The exception that is raised if len() is called on a BloomFilter object after |=, &=, intersection(), or union() is used.
