Platforms: Unix, Windows
Create a new BloomFilter object with a given capacity and error_rate. Note that the capacity is not checked as items are added. This is important, because I want to be able to support logical OR and AND (see below). The capacity and error_rate together serve as a contract: if you add fewer than capacity items, the Bloom filter will have an error rate less than error_rate.
NEW: If you specify None for the filename, then the bloom filter will be backed by malloc’d memory, rather than by a file.
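To make the capacity and error_rate contract concrete, here is a minimal sketch of how those two values are commonly turned into a bit count and a hash count. It uses the standard textbook Bloom filter formulas; the function name `bloom_parameters` is hypothetical, and this library may size its filters differently.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Return (num_bits, num_hashes) using the textbook sizing formulas.

    Assumption: these are the classic formulas, not necessarily the ones
    this library uses internally.
    """
    # Bit count that meets error_rate at capacity: m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # Hash count that minimises false positives: k = (m / n) * ln 2
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


num_bits, num_hashes = bloom_parameters(100, 0.1)
```

A lower error_rate or a higher capacity both increase the bit count, which is why the two values have to be agreed up front.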
Open an existing BloomFilter file.
Create a new BloomFilter object on filename from the input base64 string. Example:
>>> bf = BloomFilter.from_base64("/tmp/mike.bf",
...     "eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
...     "qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
...     "Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
...     "zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
...     "gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
>>> "MIKE" in bf
True
The number of elements for this filter.
The acceptable probability of false positives.
The integer seeds used for the random hashing.
The file name (compatible with file objects).
The number of bits used in the filter as buckets.
The number of hash functions used when computing an item's bit positions.
Add the item to the bloom filter.
Return type: | Boolean (True if the item was already in the filter) |
---|---|
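The return value of add() can be pictured with a small in-memory model. The sketch below is illustrative only: `TinyBloom`, its SHA-256 based hashing, and the use of a Python integer as the bit array are all assumptions made for the example, not how this library is implemented.

```python
import hashlib


class TinyBloom:
    """Illustrative in-memory Bloom filter (hypothetical, not the library's code)."""

    def __init__(self, num_bits, hash_seeds):
        self.num_bits = num_bits
        self.hash_seeds = list(hash_seeds)
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # One bit position per seed; SHA-256 is an arbitrary choice for the sketch.
        data = str(item).encode("utf-8")
        for seed in self.hash_seeds:
            digest = hashlib.sha256(seed.to_bytes(8, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if every bit was already set."""
        already_present = True
        for pos in self._positions(item):
            mask = 1 << pos
            if not self.bits & mask:
                already_present = False
                self.bits |= mask
        return already_present

    def __contains__(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))


bf = TinyBloom(num_bits=480, hash_seeds=[1, 2, 3])
first = bf.add("apple")   # False: the filter was empty
second = bf.add("apple")  # True: all of the item's bits are already set
```

Because membership is decided by bits alone, add() can also return True for an item that was never added (a false positive), which is the error that error_rate bounds.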
Remove all elements from the bloom filter at once.
Copies the current BloomFilter object to another object with a new filename.
Return type: | new BloomFilter object |
---|---|
Creates a new BloomFilter object with the same parameters: same hash seeds, same size, everything. Once this is done, the two filters are comparable, so you can apply logical operators to them. Example:
>>> apple = BloomFilter(100, 0.1, '/tmp/apple')
>>> apple.add('apple')
False
>>> pear = apple.copy_template('/tmp/pear')
>>> pear.add('pear')
False
>>> pear |= apple
Forces a sync() call on the underlying mmap file object. Use this if you are about to copy the file and you want to be Sure (TM) you got everything correctly.
Creates a compressed, base64 encoded version of the Bloom filter. Since the Bloom filter is already stored efficiently in binary on the file system, this may not be too useful. I find it useful for debugging, so I can copy filters from one terminal to another in their entirety.
Return type: | Base64 encoded string representing filter |
---|
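The idea of a compressed, base64 encoded copy can be sketched with the standard library alone. This is a round trip under an assumption: it uses zlib for the compression step, and the helper names `encode_bits` and `decode_bits` are hypothetical. The actual serialization format used by to_base64() and from_base64() is not specified here and may carry additional header data.

```python
import base64
import zlib


def encode_bits(raw):
    """Compress raw filter bytes, then encode them as printable ASCII."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_bits(text):
    """Reverse encode_bits(): base64 decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


raw = bytes(60)              # stand-in for a 480-bit filter, all bits clear
encoded = encode_bits(raw)   # safe to paste into another terminal
restored = decode_bits(encoded)
```

A sparse filter compresses well, so the encoded text is usually much shorter than the raw bit array.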
Calls add() on all items in the iterable.
Perform a set OR with another comparable filter. You can (only) construct comparable filters with copy_template above. See the example in copy_template. In that example, pear will have both “apple” and “pear”.
The operation happens in place. That is, calling:
bf.union(bf2)
is a way to add all the elements of bf2 to bf.
N.B.: Calling this function will render future calls to len() invalid.
The same as union() above except it uses a set AND instead of a set OR.
N.B.: Calling this function will render future calls to len() invalid.
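Union and intersection only make sense between filters that share a size and hash seeds, because they combine the two bit arrays position by position. A minimal sketch, assuming hypothetical shared values `NUM_BITS` and `HASH_SEEDS` and a SHA-256 based hash that stands in for the library's own:

```python
import hashlib

NUM_BITS = 480          # hypothetical size shared by both filters
HASH_SEEDS = (1, 2, 3)  # hypothetical seeds shared by both filters


def bit_mask(item):
    """Return an int with the item's bit positions set."""
    mask = 0
    for seed in HASH_SEEDS:
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % NUM_BITS)
    return mask


def contains(bits, item):
    """True if every bit belonging to the item is set in bits."""
    mask = bit_mask(item)
    return (bits & mask) == mask


apple = bit_mask("apple")  # a filter holding only "apple"
pear = bit_mask("pear")    # a comparable filter holding only "pear"

union = pear | apple         # set OR: keeps the bits of both filters
intersection = pear & apple  # set AND: keeps only the bits they share
```

If the two filters used different seeds or sizes, the same item would map to different positions in each, and the combined bits would be meaningless.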
Returns the number of distinct elements that have been added to the BloomFilter object, subject to the error given in error_rate.
Example:
>>> bf = BloomFilter(100, 0.1, '/tmp/fruit.bloom')
>>> bf.add("Apple")
False
>>> bf.add('Apple')
True
>>> bf.add('orange')
False
>>> len(bf)
2
>>> bf2 = bf.copy_template('/tmp/new.bloom')
>>> bf2 |= bf
>>> len(bf2)
Traceback (most recent call last):
...
pybloomfilter.IndeterminateCountError: Length of BloomFilter object is unavailable after intersection or union called.
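One way to see why the count becomes unavailable: a count of distinct additions can be kept by bumping a counter whenever add() reports a new item, but two such counters cannot be merged, since the filters may overlap by an unknown amount. The sketch below models only that bookkeeping. `CountedFilter` is hypothetical, it uses an exact set in place of the bit array, and it raises ValueError where the library raises IndeterminateCountError.

```python
class CountedFilter:
    """Hypothetical model of the element count; not the library's code."""

    def __init__(self):
        self._seen = set()  # stand-in for the bit array
        self._count = 0     # becomes None once it can no longer be trusted

    def add(self, item):
        """Add the item; return True if it was already present."""
        present = item in self._seen
        if not present:
            self._seen.add(item)
            if self._count is not None:
                self._count += 1
        return present

    def union(self, other):
        self._seen |= other._seen
        self._count = None  # overlapping counts cannot simply be summed
        return self

    def __len__(self):
        if self._count is None:
            raise ValueError("length is unavailable after union")
        return self._count


fruit = CountedFilter()
fruit.add("Apple")
fruit.add("Apple")   # duplicate, does not change the count
fruit.add("orange")
distinct = len(fruit)

merged = CountedFilter().union(fruit)  # len(merged) now raises ValueError
```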
Check to see if item is contained in the filter, with an acceptable false positive rate of error_rate (see above).
See union(filter)
See intersection(filter)