org.apache.cassandra.utils
Class BloomCalculations
java.lang.Object
org.apache.cassandra.utils.BloomCalculations
public class BloomCalculations
- extends java.lang.Object
The following calculations are taken from:
http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html
"Bloom Filters - the math"
This class's static methods are meant to facilitate the use of the Bloom
Filter class by helping to choose correct values of 'bits per element' and
'number of hash functions, k'.
Nested Class Summary |
static class |
BloomCalculations.BloomSpecification
A wrapper class that holds two key parameters for a Bloom Filter: the
number of hash functions used, and the number of buckets per element used. |
Method Summary |
static int |
computeBestK(int bucketsPerElement)
Given the number of buckets that can be used per element, return the optimal
number of hash functions in order to minimize the false positive rate. |
static BloomCalculations.BloomSpecification |
computeBucketsAndK(double maxFalsePosProb)
Given a maximum tolerable false positive probability, compute a Bloom
specification which will give less than the specified false positive rate,
but minimize the number of buckets per element and the number of hash
functions used. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
BloomCalculations
public BloomCalculations()
computeBestK
public static int computeBestK(int bucketsPerElement)
- Given the number of buckets that can be used per element, return the optimal
number of hash functions in order to minimize the false positive rate.
- Parameters:
bucketsPerElement
-
- Returns:
- The number of hash functions that minimize the false positive rate.
computeBucketsAndK
public static BloomCalculations.BloomSpecification computeBucketsAndK(double maxFalsePosProb)
- Given a maximum tolerable false positive probability, compute a Bloom
specification which will give less than the specified false positive rate,
but minimize the number of buckets per element and the number of hash
functions used. Because bandwidth (and therefore total bitvector size)
is considered more expensive than computing power, preference is given
to minimizing buckets per element rather than number of hash functions.
- Parameters:
maxFalsePosProb
- The maximum tolerable false positive rate.
- Returns:
- A Bloom Specification which would result in a false positive rate
less than specified by the function call.
Copyright © 2009 The Apache Software Foundation