BloomFilter Class Reference
============================

.. toctree::
   :maxdepth: 2

.. module:: pybloomfilter
   :platform: Unix, Windows
   :synopsis: A fast BloomFilter for Python
.. moduleauthor:: Michael Axiak

.. class:: BloomFilter(capacity : int, error_rate : float, filename : string)

    Create a new BloomFilter object with a given capacity and error_rate.
    **Note that we do not check capacity.** This is important, because
    I want to be able to support logical OR and AND (see below).
    The capacity and error_rate together serve as a contract: if you add
    fewer than capacity items, the Bloom filter will have an error rate
    less than error_rate.

    **NEW**: If you specify ``None`` for the filename, then the Bloom
    filter will be backed by malloc'd memory, rather than by a file.

Static Methods
------------------

.. staticmethod:: BloomFilter.open(filename)

    Open an already existing BloomFilter file.

.. staticmethod:: BloomFilter.from_base64(filename, input, [perm=0755])

    Create a new BloomFilter object on filename from the input
    base64 string. Example::

        >>> bf = BloomFilter.from_base64("/tmp/mike.bf",
                "eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
                "qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
                "Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
                "zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
                "gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
        >>> "MIKE" in bf
        True

Instance Attributes
---------------------

.. attribute:: BloomFilter.capacity

    The number of elements this filter is sized to hold.

.. attribute:: BloomFilter.error_rate

    The acceptable probability of false positives.

.. attribute:: BloomFilter.hash_seeds

    The integer seeds used for the random hashing.

.. attribute:: BloomFilter.name

    The file name (compatible with file objects).

.. attribute:: BloomFilter.num_bits

    The number of bits used in the filter as buckets.

.. attribute:: BloomFilter.num_hashes

    The number of hash functions used when computing

Instance Methods
-------------------

.. method:: BloomFilter.add(item) -> Boolean

    Add the item to the Bloom filter.

    :param item: Hashable object
    :rtype: Boolean (True if item already in the filter)

.. method:: BloomFilter.clear_all()

    Remove all elements from the Bloom filter at once.

.. method:: BloomFilter.copy(filename) -> BloomFilter

    Copy the current BloomFilter object to a new object backed by
    the given filename.

    :param filename: string filename
    :rtype: new BloomFilter object

.. method:: BloomFilter.copy_template(filename, [perm=0755]) -> BloomFilter

    Create a new, empty BloomFilter object with the same *parameters*:
    the same hash seeds, the same size, everything. Once this is done,
    the two filters are *comparable*, so you can apply logical operators
    to them. Example::

        >>> apple = BloomFilter(100, 0.1, '/tmp/apple')
        >>> apple.add('apple')
        False
        >>> pear = apple.copy_template('/tmp/pear')
        >>> pear.add('pear')
        False
        >>> pear |= apple

.. method:: BloomFilter.sync()

    Force a sync() call on the underlying mmap file object. Use this if
    you are about to copy the file and you want to be Sure (TM) you got
    everything correctly.

.. method:: BloomFilter.to_base64() -> string

    Create a compressed, base64 encoded version of the Bloom filter.
    Since the Bloom filter is already stored efficiently in binary on the
    file system, this may not be too useful. I find it useful for debugging
    so I can copy filters from one terminal to another in their entirety.

    :rtype: Base64 encoded string representing filter

.. method:: BloomFilter.update(iterable)

    Call add() on every item in the iterable.

.. method:: BloomFilter.union(filter) -> BloomFilter

    Perform a set OR with another *comparable* filter. You can (only)
    construct comparable filters with **copy_template** above. See the
    example in copy_template. In that example, pear will have both
    "apple" and "pear".

    The operation is performed **in place**. That is, calling::

        bf.union(bf2)

    is a way to add all the elements of bf2 to bf.

    *N.B.: Calling this function will render future calls to len()
    invalid.*

.. method:: BloomFilter.intersection(filter) -> BloomFilter

    The same as union() above, except that it uses a set AND instead of a
    set OR.

    *N.B.: Calling this function will render future calls to len()
    invalid.*

Magic Methods
--------------

.. method:: BloomFilter.__len__() -> Integer

    Return the number of distinct elements that have been added to the
    BloomFilter object, subject to the error given in error_rate.
    Example::

        >>> bf = BloomFilter(100, 0.1, '/tmp/fruit.bloom')
        >>> bf.add("Apple")
        False
        >>> bf.add('Apple')
        True
        >>> bf.add('orange')
        False
        >>> len(bf)
        2
        >>> bf2 = bf.copy_template('/tmp/new.bloom')
        >>> bf2 |= bf
        >>> len(bf2)
        Traceback (most recent call last):
          ...
        pybloomfilter.IndeterminateCountError: Length of BloomFilter object is unavailable after intersection or union called.

.. method:: BloomFilter.__contains__(item) -> Boolean

    Check whether item is contained in the filter (``item in bf``), with
    an acceptable false positive rate of error_rate (see above).

.. method:: BloomFilter.__ior__(filter) -> BloomFilter

    See union(filter).

.. method:: BloomFilter.__iand__(filter) -> BloomFilter

    See intersection(filter).

Exceptions
--------------

.. class:: IndeterminateCountError(message)

    The exception that is raised if len() is called on a BloomFilter
    object after ``|=``, ``&=``, intersection(), or union() is used.
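
Illustrative Sketch
--------------------

The capacity/error_rate contract, the ``add()`` return value, and the notion of *comparable* filters can be sketched in plain Python. This is not how pybloomfilter is implemented: ``optimal_parameters`` and ``ToyBloomFilter`` are hypothetical names, the sizing uses the textbook Bloom filter formulas (the library may round differently), and seeded SHA-256 stands in for the library's hashing.

```python
import hashlib
import math


def optimal_parameters(capacity, error_rate):
    """Textbook sizing (an assumption, not necessarily pybloomfilter's)."""
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


class ToyBloomFilter:
    """Hypothetical in-memory stand-in that only illustrates the semantics."""

    def __init__(self, capacity, error_rate, hash_seeds=None):
        self.capacity = capacity
        self.error_rate = error_rate
        self.num_bits, self.num_hashes = optimal_parameters(capacity, error_rate)
        if hash_seeds is None:
            hash_seeds = range(self.num_hashes)
        self.hash_seeds = list(hash_seeds)
        self._bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # One bit position per seed; seeded SHA-256 is a stand-in hash.
        data = repr(item).encode("utf-8")
        for seed in self.hash_seeds:
            digest = hashlib.sha256(seed.to_bytes(8, "little") + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        """Set the item's bits; return True if all were already set."""
        already_present = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self._bits[byte] & mask:
                already_present = False
                self._bits[byte] |= mask
        return already_present

    def __contains__(self, item):
        return all(self._bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

    def copy_template(self):
        """Return an empty filter with the same size and seeds."""
        return ToyBloomFilter(self.capacity, self.error_rate, self.hash_seeds)

    def union(self, other):
        """In-place bitwise OR; only valid for comparable filters."""
        if (self.num_bits, self.hash_seeds) != (other.num_bits, other.hash_seeds):
            raise ValueError("filters are not comparable")
        for i, byte in enumerate(other._bits):
            self._bits[i] |= byte
        return self


apple = ToyBloomFilter(100, 0.1)
apple.add("apple")
pear = apple.copy_template()   # same seeds and size, but empty
pear.add("pear")
pear.union(apple)              # pear now answers for both items
print("apple" in pear, "pear" in pear)
```

Because a union only ORs bit arrays together, the result no longer tracks how many distinct items were added, which is why ``len()`` becomes unavailable after ``union()`` or ``intersection()``.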