org.apache.commons.math3.stat.inference
Class ChiSquareTest

java.lang.Object
  extended by org.apache.commons.math3.stat.inference.ChiSquareTest

public class ChiSquareTest
extends Object

Implements Chi-Square test statistics.

This implementation handles both known and unknown distributions.

Two-sample tests are used when the distribution is unknown a priori but is provided by one sample. The second sample is then compared against the first.

Version:
$Id: ChiSquareTest.java 1244107 2012-02-14 16:17:55Z erans $

Constructor Summary
ChiSquareTest()
          Constructs a ChiSquareTest.
 
Method Summary
private  void checkArray(long[][] in)
          Checks to make sure that the input long[][] array is rectangular, has at least 2 rows and 2 columns, and has all non-negative entries.
private  void checkNonNegative(long[] in)
          Checks that all entries of the input array are >= 0.
private  void checkNonNegative(long[][] in)
          Checks that all entries of the input array are >= 0.
private  void checkPositive(double[] in)
          Checks that all entries of the input array are strictly positive.
private  void checkRectangular(long[][] in)
          Throws DimensionMismatchException if the input array is not rectangular.
 double chiSquare(double[] expected, long[] observed)
          Computes the Chi-Square statistic comparing observed and expected frequency counts.
 double chiSquare(long[][] counts)
          Computes the Chi-Square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.
 double chiSquareDataSetsComparison(long[] observed1, long[] observed2)
          Computes a Chi-Square two sample test statistic comparing bin frequency counts in observed1 and observed2.
 double chiSquareTest(double[] expected, long[] observed)
          Returns the observed significance level, or p-value, associated with a Chi-square goodness of fit test comparing the observed frequency counts to those in the expected array.
 boolean chiSquareTest(double[] expected, long[] observed, double alpha)
          Performs a Chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha.
 double chiSquareTest(long[][] counts)
          Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.
 boolean chiSquareTest(long[][] counts, double alpha)
          Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input 2-way table are independent of the rows, with significance level alpha.
 double chiSquareTestDataSetsComparison(long[] observed1, long[] observed2)
          Returns the observed significance level, or p-value, associated with a Chi-Square two sample test comparing bin frequency counts in observed1 and observed2.
 boolean chiSquareTestDataSetsComparison(long[] observed1, long[] observed2, double alpha)
          Performs a Chi-Square two sample test comparing two binned data sets.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ChiSquareTest

public ChiSquareTest()
Constructs a ChiSquareTest.

Method Detail

chiSquare

public double chiSquare(double[] expected,
                        long[] observed)
                 throws NotPositiveException,
                        NotStrictlyPositiveException,
                        DimensionMismatchException
Computes the Chi-Square statistic comparing observed and expected frequency counts.

This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that the observed counts follow the expected distribution.

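Preconditions:
Expected counts must all be strictly positive.
Observed counts must all be >= 0.
The observed and expected arrays must have the same length and their common length must be at least 2.

If any of the preconditions are not met, an IllegalArgumentException is thrown.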

Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.

Parameters:
observed - array of observed frequency counts
expected - array of expected frequency counts
Returns:
chiSquare test statistic
Throws:
NotPositiveException - if one element of observed is negative
NotStrictlyPositiveException - if one element of expected is not strictly positive
DimensionMismatchException - if the arrays have different lengths or their common length is less than 2
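
The computation described above can be illustrated with a minimal plain-Java sketch of a Pearson goodness-of-fit statistic that rescales the expected counts. This is not the library's source: the class name GoodnessOfFitSketch is hypothetical, the sketch always rescales (the ratio is simply 1 when the totals already agree), and it throws a plain IllegalArgumentException instead of the specific exception types listed above.

```java
/** Hypothetical illustration; not part of Commons Math. */
final class GoodnessOfFitSketch {

    private GoodnessOfFitSketch() { }

    /**
     * Sums (observed - expected)^2 / expected over all bins, after scaling
     * expected so that it has the same total as observed.
     */
    static double chiSquare(double[] expected, long[] observed) {
        if (expected.length < 2 || expected.length != observed.length) {
            throw new IllegalArgumentException("arrays must share a common length of at least 2");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (!(expected[i] > 0.0)) {
                throw new IllegalArgumentException("expected counts must be strictly positive");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be non-negative");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Assumption of this sketch: always rescale by the ratio of the totals.
        final double ratio = sumObserved / sumExpected;
        double statistic = 0.0;
        for (int i = 0; i < expected.length; i++) {
            final double scaled = ratio * expected[i];
            final double deviation = observed[i] - scaled;
            statistic += deviation * deviation / scaled;
        }
        return statistic;
    }

    public static void main(String[] args) {
        // Expected is given as equal proportions and is rescaled to the observed total of 30.
        System.out.println(chiSquare(new double[] {1, 1, 1}, new long[] {8, 12, 10}));
    }
}
```

With three equally likely bins and observed counts 8, 12 and 10, each rescaled expected count is 10, so the statistic is 4/10 + 4/10 + 0 = 0.8.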

chiSquareTest

public double chiSquareTest(double[] expected,
                            long[] observed)
                     throws NotPositiveException,
                            NotStrictlyPositiveException,
                            DimensionMismatchException,
                            MaxCountExceededException
Returns the observed significance level, or p-value, associated with a Chi-square goodness of fit test comparing the observed frequency counts to those in the expected array.

The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts.

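Preconditions:
Expected counts must all be strictly positive.
Observed counts must all be >= 0.
The observed and expected arrays must have the same length and their common length must be at least 2.

If any of the preconditions are not met, an IllegalArgumentException is thrown.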

Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.

Parameters:
observed - array of observed frequency counts
expected - array of expected frequency counts
Returns:
p-value
Throws:
NotPositiveException - if one element of observed is negative
NotStrictlyPositiveException - if one element of expected is not strictly positive
DimensionMismatchException - if the arrays have different lengths or their common length is less than 2
MaxCountExceededException - if an error occurs computing the p-value

chiSquareTest

public boolean chiSquareTest(double[] expected,
                             long[] observed,
                             double alpha)
                      throws NotPositiveException,
                             NotStrictlyPositiveException,
                             DimensionMismatchException,
                             OutOfRangeException,
                             MaxCountExceededException
Performs a Chi-square goodness of fit test evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

Example:
To test the hypothesis that observed follows expected at the 99% level, use

chiSquareTest(expected, observed, 0.01)

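Preconditions:
Expected counts must all be strictly positive.
Observed counts must all be >= 0.
The observed and expected arrays must have the same length and their common length must be at least 2.
alpha must be in the range (0, 0.5].

If any of the preconditions are not met, an IllegalArgumentException is thrown.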

Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.

Parameters:
observed - array of observed frequency counts
expected - array of expected frequency counts
alpha - significance level of the test
Returns:
true iff null hypothesis can be rejected with confidence 1 - alpha
Throws:
NotPositiveException - if one element of observed is negative
NotStrictlyPositiveException - if one element of expected is not strictly positive
DimensionMismatchException - if the arrays have different lengths or their common length is less than 2
OutOfRangeException - if alpha is not in the range (0, 0.5]
MaxCountExceededException - if an error occurs computing the p-value
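
The relationship between the p-value and the boolean result can be sketched as follows. The sketch assumes that the boolean overload is equivalent to checking that alpha lies in (0, 0.5] and then testing whether the p-value is strictly below alpha. The class and method names are hypothetical, and a plain IllegalArgumentException stands in for OutOfRangeException.

```java
/** Hypothetical illustration of the decision rule; not part of Commons Math. */
final class SignificanceDecisionSketch {

    private SignificanceDecisionSketch() { }

    /** Returns true when the null hypothesis is rejected at significance level alpha. */
    static boolean rejectNull(double pValue, double alpha) {
        // Written as a negated conjunction so that NaN also fails the range check.
        if (!(alpha > 0.0 && alpha <= 0.5)) {
            throw new IllegalArgumentException("alpha must be in (0, 0.5], got " + alpha);
        }
        return pValue < alpha;
    }

    public static void main(String[] args) {
        System.out.println(rejectNull(0.004, 0.01)); // p-value below alpha: reject
        System.out.println(rejectNull(0.200, 0.01)); // p-value above alpha: do not reject
    }
}
```

Under this reading, a call with alpha = 0.01 rejects the null hypothesis only when the p-value is smaller than 0.01, which corresponds to the 99% level used in the example above.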

chiSquare

public double chiSquare(long[][] counts)
                 throws NullArgumentException,
                        NotPositiveException,
                        DimensionMismatchException
Computes the Chi-Square statistic associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.

The rows of the 2-way table are counts[0], ... , counts[counts.length - 1].

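Preconditions:
All counts must be >= 0.
The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

If any of the preconditions are not met, an IllegalArgumentException is thrown.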

Parameters:
counts - array representation of 2-way table
Returns:
chiSquare test statistic
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or 2 columns
NotPositiveException - if one entry is negative
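
For readers who want to see what a chi-square statistic for a two-way table involves, here is a minimal plain-Java sketch. It uses the standard estimate of each expected cell count, (row total * column total) / grand total. The class name IndependenceSketch is hypothetical, the code is an illustration rather than the library's implementation, and it throws a plain IllegalArgumentException for every violated precondition.

```java
/** Hypothetical illustration; not part of Commons Math. */
final class IndependenceSketch {

    private IndependenceSketch() { }

    /** Pearson chi-square statistic for a two-way table of counts. */
    static double chiSquare(long[][] counts) {
        if (counts == null) {
            throw new IllegalArgumentException("counts must not be null");
        }
        if (counts.length < 2 || counts[0].length < 2) {
            throw new IllegalArgumentException("table needs at least 2 rows and 2 columns");
        }
        final int nRows = counts.length;
        final int nCols = counts[0].length;
        final double[] rowSum = new double[nRows];
        final double[] colSum = new double[nCols];
        double total = 0.0;
        for (int r = 0; r < nRows; r++) {
            if (counts[r].length != nCols) {
                throw new IllegalArgumentException("table must be rectangular");
            }
            for (int c = 0; c < nCols; c++) {
                if (counts[r][c] < 0) {
                    throw new IllegalArgumentException("counts must be non-negative");
                }
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        double statistic = 0.0;
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                // Expected count under independence of rows and columns.
                final double expected = rowSum[r] * colSum[c] / total;
                final double deviation = counts[r][c] - expected;
                statistic += deviation * deviation / expected;
            }
        }
        return statistic;
    }

    public static void main(String[] args) {
        System.out.println(chiSquare(new long[][] {{10, 20}, {20, 10}}));
    }
}
```

For the table {{10, 20}, {20, 10}} every expected count is 15, so each of the four cells contributes 25/15 and the statistic is 100/15. A table whose rows are proportional, such as {{10, 20}, {20, 40}}, gives a statistic of 0.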

chiSquareTest

public double chiSquareTest(long[][] counts)
                     throws NullArgumentException,
                            DimensionMismatchException,
                            NotPositiveException,
                            MaxCountExceededException
Returns the observed significance level, or p-value, associated with a chi-square test of independence based on the input counts array, viewed as a two-way table.

The rows of the 2-way table are counts[0], ... , counts[counts.length - 1].

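Preconditions:
All counts must be >= 0.
The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
The 2-way table represented by counts must have at least 2 columns and at least 2 rows.

If any of the preconditions are not met, an IllegalArgumentException is thrown.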

Parameters:
counts - array representation of 2-way table
Returns:
p-value
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or 2 columns
NotPositiveException - if one entry is negative
MaxCountExceededException - if an error occurs computing the p-value

chiSquareTest

public boolean chiSquareTest(long[][] counts,
                             double alpha)
                      throws NullArgumentException,
                             DimensionMismatchException,
                             NotPositiveException,
                             OutOfRangeException,
                             MaxCountExceededException
Performs a chi-square test of independence evaluating the null hypothesis that the classifications represented by the counts in the columns of the input 2-way table are independent of the rows, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

The rows of the 2-way table are counts[0], ... , counts[counts.length - 1].

Example:
To test the null hypothesis that the counts in counts[0], ... , counts[counts.length - 1] all correspond to the same underlying probability distribution at the 99% level, use

chiSquareTest(counts, 0.01)

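Preconditions:
All counts must be >= 0.
The counts array must be rectangular (i.e. all counts[i] subarrays must have the same length).
The 2-way table represented by counts must have at least 2 columns and at least 2 rows.
alpha must be in the range (0, 0.5].

If any of the preconditions are not met, an IllegalArgumentException is thrown.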

Parameters:
counts - array representation of 2-way table
alpha - significance level of the test
Returns:
true iff null hypothesis can be rejected with confidence 1 - alpha
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or 2 columns
NotPositiveException - if one entry is negative
OutOfRangeException - if alpha is not in the range (0, 0.5]
MaxCountExceededException - if an error occurs computing the p-value

chiSquareDataSetsComparison

public double chiSquareDataSetsComparison(long[] observed1,
                                          long[] observed2)
                                   throws DimensionMismatchException,
                                          NotPositiveException,
                                          ZeroException

Computes a Chi-Square two sample test statistic comparing bin frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is

∑[(K * observed1[i] - observed2[i]/K)^2 / (observed1[i] + observed2[i])] where
K = √[∑(observed2) / ∑(observed1)]

This statistic can be used to perform a Chi-Square test evaluating the null hypothesis that both observed counts follow the same distribution.

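Preconditions:
Observed counts must be non-negative.
Observed counts for a specific bin must not both be zero.
Observed counts for a specific sample must not all be zero.
The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

If any of the preconditions are not met, an IllegalArgumentException is thrown.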

Parameters:
observed1 - array of observed frequency counts of the first data set
observed2 - array of observed frequency counts of the second data set
Returns:
chiSquare test statistic
Throws:
DimensionMismatchException - if the lengths of the arrays do not match
NotPositiveException - if one entry in observed1 or observed2 is negative
ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
Since:
1.2
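
The formula above can be written out as a minimal plain-Java sketch. The class name TwoSampleSketch is hypothetical, the code only illustrates the documented formula with K = sqrt(sum of observed2 / sum of observed1), and it throws a plain IllegalArgumentException where the method documented here throws more specific exception types.

```java
/** Hypothetical illustration of the documented formula; not part of Commons Math. */
final class TwoSampleSketch {

    private TwoSampleSketch() { }

    /** Two-sample chi-square statistic for two binned data sets. */
    static double chiSquareDataSetsComparison(long[] observed1, long[] observed2) {
        if (observed1.length < 2 || observed1.length != observed2.length) {
            throw new IllegalArgumentException("arrays must share a common length of at least 2");
        }
        long sum1 = 0;
        long sum2 = 0;
        for (int i = 0; i < observed1.length; i++) {
            if (observed1[i] < 0 || observed2[i] < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            if (observed1[i] == 0 && observed2[i] == 0) {
                throw new IllegalArgumentException("bin " + i + " is empty in both samples");
            }
            sum1 += observed1[i];
            sum2 += observed2[i];
        }
        if (sum1 == 0 || sum2 == 0) {
            throw new IllegalArgumentException("each sample needs at least one non-zero count");
        }
        // K compensates for different sample sizes; it equals 1 when the totals match.
        final double k = Math.sqrt((double) sum2 / (double) sum1);
        double statistic = 0.0;
        for (int i = 0; i < observed1.length; i++) {
            final double deviation = k * observed1[i] - observed2[i] / k;
            statistic += deviation * deviation / (observed1[i] + observed2[i]);
        }
        return statistic;
    }

    public static void main(String[] args) {
        System.out.println(chiSquareDataSetsComparison(new long[] {10, 20}, new long[] {20, 10}));
    }
}
```

When the second sample is an exact multiple of the first, for example {10, 20, 30} against {20, 40, 60}, every weighted deviation vanishes and the statistic is 0 up to rounding. For {10, 20} against {20, 10} the totals match, K is 1, and the statistic is 100/30 + 100/30.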

chiSquareTestDataSetsComparison

public double chiSquareTestDataSetsComparison(long[] observed1,
                                              long[] observed2)
                                       throws DimensionMismatchException,
                                              NotPositiveException,
                                              ZeroException,
                                              MaxCountExceededException

Returns the observed significance level, or p-value, associated with a Chi-Square two sample test comparing bin frequency counts in observed1 and observed2.

The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.

See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the test statistic. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

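Preconditions:
Observed counts must be non-negative.
Observed counts for a specific bin must not both be zero.
Observed counts for a specific sample must not all be zero.
The arrays observed1 and observed2 must have the same length and their common length must be at least 2.

If any of the preconditions are not met, an IllegalArgumentException is thrown.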

Parameters:
observed1 - array of observed frequency counts of the first data set
observed2 - array of observed frequency counts of the second data set
Returns:
p-value
Throws:
DimensionMismatchException - if the lengths of the arrays do not match
NotPositiveException - if one entry in observed1 or observed2 is negative
ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
MaxCountExceededException - if an error occurs computing the p-value
Since:
1.2
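
To make the role of the degrees of freedom concrete, the following plain-Java sketch converts a statistic into a p-value using the upper tail of a chi-square distribution with (number of bins - 1) degrees of freedom. It is deliberately restricted: it relies on the closed form exp(-x/2) * sum over k < df/2 of (x/2)^k / k!, which holds only when the number of degrees of freedom is even, so it accepts only an odd number of bins. The class name EvenDfPValueSketch is hypothetical and the code is not how the library computes its p-value.

```java
/** Hypothetical illustration for even degrees of freedom only; not part of Commons Math. */
final class EvenDfPValueSketch {

    private EvenDfPValueSketch() { }

    /** Upper-tail probability of the statistic with df = binCount - 1, df even. */
    static double pValue(double statistic, int binCount) {
        final int df = binCount - 1;
        if (df < 2 || df % 2 != 0) {
            throw new IllegalArgumentException("this sketch needs an even, positive number of degrees of freedom");
        }
        if (statistic < 0.0) {
            throw new IllegalArgumentException("statistic must be non-negative");
        }
        final double half = statistic / 2.0;
        double term = 1.0; // (x/2)^k / k! for k = 0
        double sum = 0.0;
        for (int k = 0; k < df / 2; k++) {
            sum += term;
            term *= half / (k + 1);
        }
        return Math.exp(-half) * sum;
    }

    public static void main(String[] args) {
        // Three bins give 2 degrees of freedom, where the p-value is exp(-x / 2).
        System.out.println(pValue(5.991, 3));
    }
}
```

With three bins the p-value reduces to exp(-x/2), so a statistic of 5.991 yields a p-value close to 0.05. A general implementation needs the regularized incomplete gamma function to cover odd degrees of freedom as well.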

chiSquareTestDataSetsComparison

public boolean chiSquareTestDataSetsComparison(long[] observed1,
                                               long[] observed2,
                                               double alpha)
                                        throws DimensionMismatchException,
                                               NotPositiveException,
                                               ZeroException,
                                               OutOfRangeException,
                                               MaxCountExceededException

Performs a Chi-Square two sample test comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.

See chiSquareDataSetsComparison(long[], long[]) for details on the formula used to compute the Chi-Square statistic used in the test. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.

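Preconditions:
Observed counts must be non-negative.
Observed counts for a specific bin must not both be zero.
Observed counts for a specific sample must not all be zero.
The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
alpha must be in the range (0, 0.5].

If any of the preconditions are not met, an IllegalArgumentException is thrown.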

Parameters:
observed1 - array of observed frequency counts of the first data set
observed2 - array of observed frequency counts of the second data set
alpha - significance level of the test
Returns:
true iff null hypothesis can be rejected with confidence 1 - alpha
Throws:
DimensionMismatchException - if the lengths of the arrays do not match
NotPositiveException - if one entry in observed1 or observed2 is negative
ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays
OutOfRangeException - if alpha is not in the range (0, 0.5]
MaxCountExceededException - if an error occurs performing the test
Since:
1.2

checkArray

private void checkArray(long[][] in)
                 throws NullArgumentException,
                        DimensionMismatchException,
                        NotPositiveException
Checks to make sure that the input long[][] array is rectangular, has at least 2 rows and 2 columns, and has all non-negative entries.

Parameters:
in - input 2-way table to check
Throws:
NullArgumentException - if the array is null
DimensionMismatchException - if the array is not rectangular or has fewer than 2 rows or 2 columns
NotPositiveException - if one entry is negative

checkRectangular

private void checkRectangular(long[][] in)
                       throws NullArgumentException,
                              DimensionMismatchException
Throws DimensionMismatchException if the input array is not rectangular.

Parameters:
in - array to be tested
Throws:
NullArgumentException - if input array is null
DimensionMismatchException - if input array is not rectangular

checkPositive

private void checkPositive(double[] in)
                    throws NotStrictlyPositiveException
Checks that all entries of the input array are strictly positive.

Parameters:
in - Array to be tested.
Throws:
NotStrictlyPositiveException - if one entry is not strictly positive.

checkNonNegative

private void checkNonNegative(long[] in)
                       throws NotPositiveException
Checks that all entries of the input array are >= 0.

Parameters:
in - Array to be tested.
Throws:
NotPositiveException - if one entry is negative.

checkNonNegative

private void checkNonNegative(long[][] in)
                       throws NotPositiveException
Checks that all entries of the input array are >= 0.

Parameters:
in - Array to be tested.
Throws:
NotPositiveException - if one entry is negative.


Copyright (c) 2003-2013 Apache Software Foundation