org.apache.commons.math.random
Class EmpiricalDistributionImpl

java.lang.Object
  extended by org.apache.commons.math.random.EmpiricalDistributionImpl
All Implemented Interfaces:
Serializable, EmpiricalDistribution

public class EmpiricalDistributionImpl
extends Object
implements Serializable, EmpiricalDistribution

Implements the EmpiricalDistribution interface. This implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:

Digesting the input file

  1. Pass the file once to compute min and max.
  2. Divide the range from min to max into binCount "bins."
  3. Pass the data file again, computing bin counts and univariate statistics (mean, std dev.) for each of the bins.
  4. Divide the interval (0,1) into subintervals associated with the bins, with the length of a bin's subinterval proportional to its count.

Generating random values from the distribution

  1. Generate a uniformly distributed value in (0,1).
  2. Select the subinterval to which the value belongs.
  3. Generate a random Gaussian value with mean = mean of the associated bin and std dev = std dev of the associated bin.
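
The two procedures above can be sketched in plain Java. This is a simplified illustration, not the Commons Math source: the class name EmpiricalSketch, its constructor arguments, and the use of java.util.Random are assumptions made for the example, and it reads from an in-memory array instead of a file.

```java
import java.util.Random;

/** Hypothetical, simplified sketch of the digest/generate steps; not the library code. */
final class EmpiricalSketch {
    private final int binCount;
    private final double[] binMean;      // mean of the values in each bin
    private final double[] binStd;       // sample std dev of the values in each bin
    private final double[] upperBounds;  // cumulative proportions, last entry is 1.0
    private final Random rng;

    EmpiricalSketch(double[] data, int binCount, long seed) {
        if (data.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need data and at least one bin");
        }
        this.binCount = binCount;
        this.rng = new Random(seed);

        // Pass 1: compute min and max.
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : data) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double delta = (max - min) / binCount;

        // Pass 2: per-bin count, sum, and sum of squares.
        long[] count = new long[binCount];
        double[] sum = new double[binCount];
        double[] sumSq = new double[binCount];
        for (double x : data) {
            int b = (delta == 0.0) ? 0 : (int) Math.ceil((x - min) / delta) - 1;
            b = Math.min(Math.max(b, 0), binCount - 1);
            count[b]++;
            sum[b] += x;
            sumSq[b] += x * x;
        }

        // Per-bin statistics, and subintervals of (0,1) proportional to bin counts.
        binMean = new double[binCount];
        binStd = new double[binCount];
        upperBounds = new double[binCount];
        long seen = 0;
        for (int i = 0; i < binCount; i++) {
            if (count[i] > 0) {
                binMean[i] = sum[i] / count[i];
            }
            if (count[i] > 1) {
                double var = (sumSq[i] - count[i] * binMean[i] * binMean[i]) / (count[i] - 1);
                binStd[i] = Math.sqrt(Math.max(var, 0.0));
            }
            seen += count[i];
            upperBounds[i] = (double) seen / data.length;
        }
    }

    /** Draws a uniform value, selects its subinterval, then samples that bin's Gaussian. */
    double nextValue() {
        double u = rng.nextDouble();
        for (int i = 0; i < binCount; i++) {
            if (u < upperBounds[i]) {
                return binMean[i] + binStd[i] * rng.nextGaussian();
            }
        }
        throw new IllegalStateException("unreachable: the last upper bound is 1.0");
    }
}
```

Because the subinterval lengths are proportional to bin counts, empty bins get zero-length subintervals and are never selected in this sketch.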

USAGE NOTES:

Version:
$Revision: 772119 $ $Date: 2009-05-06 05:43:28 -0400 (Wed, 06 May 2009) $
See Also:
Serialized Form

Nested Class Summary
private  class EmpiricalDistributionImpl.ArrayDataAdapter
          DataAdapter for data provided as array of doubles.
private  class EmpiricalDistributionImpl.DataAdapter
          Provides methods for computing sampleStats and binStats abstracting the source of data.
private  class EmpiricalDistributionImpl.DataAdapterFactory
          Factory of DataAdapter objects.
private  class EmpiricalDistributionImpl.StreamDataAdapter
          DataAdapter for data provided through some input stream.
 
Field Summary
private  int binCount
          number of bins
private  List<SummaryStatistics> binStats
          List of SummaryStatistics objects characterizing the bins
private  boolean loaded
          is the distribution loaded?
private  RandomData randomData
          RandomData instance to use in repeated calls to getNextValue()
private  SummaryStatistics sampleStats
          Sample statistics
private static long serialVersionUID
          Serializable version identifier
private  double[] upperBounds
          upper bounds of subintervals in (0,1) "belonging" to the bins
 
Constructor Summary
EmpiricalDistributionImpl()
          Creates a new EmpiricalDistribution with the default bin count.
EmpiricalDistributionImpl(int binCount)
          Creates a new EmpiricalDistribution with the specified bin count.
 
Method Summary
private  void fillBinStats(Object in)
          Fills binStats array (second pass through data file).
private  int findBin(double min, double value, double delta)
          Returns the index of the bin to which the given value belongs.
 int getBinCount()
          Returns the number of bins.
 List<SummaryStatistics> getBinStats()
          Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins.
 double getNextValue()
          Generates a random value from this distribution.
 StatisticalSummary getSampleStats()
          Returns a StatisticalSummary describing this distribution.
 double[] getUpperBounds()
          Returns (a fresh copy of) the array of upper bounds for the bins.
 boolean isLoaded()
          Property indicating whether or not the distribution has been loaded.
 void load(double[] in)
          Computes the empirical distribution from the provided array of numbers.
 void load(File file)
          Computes the empirical distribution from the input file.
 void load(URL url)
          Computes the empirical distribution using data read from a URL.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
Serializable version identifier

See Also:
Constant Field Values

binStats

private List<SummaryStatistics> binStats
List of SummaryStatistics objects characterizing the bins


sampleStats

private SummaryStatistics sampleStats
Sample statistics


binCount

private int binCount
number of bins


loaded

private boolean loaded
is the distribution loaded?


upperBounds

private double[] upperBounds
upper bounds of subintervals in (0,1) "belonging" to the bins


randomData

private RandomData randomData
RandomData instance to use in repeated calls to getNextValue()

Constructor Detail

EmpiricalDistributionImpl

public EmpiricalDistributionImpl()
Creates a new EmpiricalDistribution with the default bin count.


EmpiricalDistributionImpl

public EmpiricalDistributionImpl(int binCount)
Creates a new EmpiricalDistribution with the specified bin count.

Parameters:
binCount - number of bins
Method Detail

load

public void load(double[] in)
Computes the empirical distribution from the provided array of numbers.

Specified by:
load in interface EmpiricalDistribution
Parameters:
in - the input data array

load

public void load(URL url)
          throws IOException
Computes the empirical distribution using data read from a URL.

Specified by:
load in interface EmpiricalDistribution
Parameters:
url - url of the input file
Throws:
IOException - if an IO error occurs

load

public void load(File file)
          throws IOException
Computes the empirical distribution from the input file.

Specified by:
load in interface EmpiricalDistribution
Parameters:
file - the input file
Throws:
IOException - if an IO error occurs

fillBinStats

private void fillBinStats(Object in)
                   throws IOException
Fills binStats array (second pass through data file).

Parameters:
in - object providing access to the data
Throws:
IOException - if an IO error occurs

findBin

private int findBin(double min,
                    double value,
                    double delta)
Returns the index of the bin to which the given value belongs.

Parameters:
min - the minimum value
value - the value whose bin we are trying to find
delta - the grid size
Returns:
the index of the bin containing the value
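
One way to compute such an index, assuming equal-width bins that are closed at their upper bound, is sketched below. The class name BinIndexSketch and the extra binCount parameter are hypothetical; in the documented class binCount is a field, and the actual method body may differ.

```java
/** Hypothetical sketch of a bin lookup for equal-width bins closed on the right. */
final class BinIndexSketch {
    /**
     * @param min      the minimum value of the data
     * @param value    the value whose bin is wanted
     * @param delta    the grid size, (max - min) / binCount, assumed positive
     * @param binCount the number of bins (passed explicitly for this sketch)
     */
    static int findBin(double min, double value, double delta, int binCount) {
        // ceil(...) - 1 places a value that sits exactly on an upper bound in the lower bin.
        int raw = (int) Math.ceil((value - min) / delta) - 1;
        // Clamp so that value == min lands in bin 0 and out-of-range values stay valid.
        return Math.min(Math.max(raw, 0), binCount - 1);
    }
}
```

With min = 0, delta = 2.5 and four bins, the value 2.5 falls in bin 0 and 2.6 falls in bin 1.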

getNextValue

public double getNextValue()
                    throws IllegalStateException
Generates a random value from this distribution.

Specified by:
getNextValue in interface EmpiricalDistribution
Returns:
the random value.
Throws:
IllegalStateException - if the distribution has not been loaded

getSampleStats

public StatisticalSummary getSampleStats()
Returns a StatisticalSummary describing this distribution. Preconditions: the distribution must be loaded before invoking this method.

Specified by:
getSampleStats in interface EmpiricalDistribution
Returns:
the sample statistics
Throws:
IllegalStateException - if the distribution has not been loaded

getBinCount

public int getBinCount()
Returns the number of bins.

Specified by:
getBinCount in interface EmpiricalDistribution
Returns:
the number of bins.

getBinStats

public List<SummaryStatistics> getBinStats()
Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.

Specified by:
getBinStats in interface EmpiricalDistribution
Returns:
List of bin statistics.

getUpperBounds

public double[] getUpperBounds()
Returns (a fresh copy of) the array of upper bounds for the bins. Bins are:
[min,upperBounds[0]],(upperBounds[0],upperBounds[1]],..., (upperBounds[binCount-2],upperBounds[binCount-1] = max]

Specified by:
getUpperBounds in interface EmpiricalDistribution
Returns:
array of bin upper bounds
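
As an illustration of the bin layout and of the "fresh copy" contract, the sketch below builds equal-width upper bounds on the data scale and returns a defensive copy. The class UpperBoundsSketch is hypothetical, and it assumes the array has binCount entries with the last one equal to max; it is not the library implementation.

```java
/** Hypothetical sketch: equal-width bin upper bounds with a defensive-copy getter. */
final class UpperBoundsSketch {
    private final double[] upperBounds;

    UpperBoundsSketch(double min, double max, int binCount) {
        if (binCount < 1 || !(max >= min)) {
            throw new IllegalArgumentException("need binCount >= 1 and max >= min");
        }
        double delta = (max - min) / binCount;
        upperBounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            upperBounds[i] = min + delta * (i + 1);
        }
        // Pin the last bound to max so rounding cannot exclude the largest value.
        upperBounds[binCount - 1] = max;
    }

    /** Returns a copy, so callers cannot modify the internal array. */
    double[] getUpperBounds() {
        return upperBounds.clone();
    }
}
```

Returning a clone keeps the internal state safe if a caller writes into the returned array.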

isLoaded

public boolean isLoaded()
Property indicating whether or not the distribution has been loaded.

Specified by:
isLoaded in interface EmpiricalDistribution
Returns:
true if the distribution has been loaded


Copyright (c) 2003-2010 Apache Software Foundation