org.biojava.bio.dist
Class DistributionTools

java.lang.Object
  extended by org.biojava.bio.dist.DistributionTools

public final class DistributionTools
extends Object

A class to hold static methods for calculations and manipulations using Distributions.

Since:
1.2
Author:
Mark Schreiber, Matthew Pocock

Method Summary
static boolean areEmissionSpectraEqual(Distribution[] a, Distribution[] b)
          Compares the emission spectra of two distribution arrays.
static boolean areEmissionSpectraEqual(Distribution a, Distribution b)
          Compares the emission spectra of two distributions.
static Distribution average(Distribution[] dists)
          Averages two or more distributions.
static double bitsOfInformation(Distribution observed)
          Calculates the total bits of information for a distribution.
static Distribution countToDistribution(Count c)
          Make a distribution from a count.
static Distribution[] distOverAlignment(Alignment a)
          Equivalent to distOverAlignment(a, false, 0.0).
static Distribution[] distOverAlignment(Alignment a, boolean countGaps)
          Creates an array of distributions, one for each column of the alignment.
static Distribution[] distOverAlignment(Alignment a, boolean countGaps, double nullWeight)
          Creates an array of distributions, one for each column of the alignment.
protected static Sequence generateOrderNSequence(String name, OrderNDistribution d, int length)
          Deprecated. Use generateSequence() or generateSymbolList() instead.
static Sequence generateSequence(String name, Distribution d, int length)
          Produces a sequence by randomly sampling the Distribution.
static SymbolList generateSymbolList(Distribution d, int length)
          Produces a SymbolList by randomly sampling a Distribution.
static Distribution jointDistOverAlignment(Alignment a, boolean countGaps, double nullWeight, int[] cols)
          Creates a joint distribution.
static HashMap KLDistance(Distribution observed, Distribution expected, double logBase)
          A method to calculate the Kullback-Leibler Distance (relative entropy).
static void randomizeDistribution(Distribution d)
          Randomizes the weights of a Distribution.
static Distribution readFromXML(InputStream is)
          Read a distribution from XML.
static HashMap shannonEntropy(Distribution observed, double logBase)
          A method to calculate the Shannon Entropy for a Distribution.
static double totalEntropy(Distribution observed)
          Calculates the total Entropy for a Distribution.
static void writeToXML(Distribution d, OutputStream os)
          Writes a Distribution to XML that can be read with the readFromXML method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

writeToXML

public static void writeToXML(Distribution d,
                              OutputStream os)
                       throws IOException
Writes a Distribution to XML that can be read with the readFromXML method.

Parameters:
d - the Distribution to write.
os - where to write it to.
Throws:
IOException - if writing fails

readFromXML

public static Distribution readFromXML(InputStream is)
                                throws IOException,
                                       SAXException
Read a distribution from XML.

Parameters:
is - an InputStream to read from
Returns:
a Distribution parameterised by the XML read from is
Throws:
IOException - if reading from is fails
SAXException - if the contents of is could not be processed as XML

randomizeDistribution

public static void randomizeDistribution(Distribution d)
                                  throws ChangeVetoException
Randomizes the weights of a Distribution.

Parameters:
d - the Distribution to randomize
Throws:
ChangeVetoException - if the Distribution is locked
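
To picture what randomizing weights involves, the sketch below draws one random number per symbol and rescales so the weights sum to 1. It is plain Java over a Map and is not the BioJava implementation; the class name RandomizeSketch and the method randomWeights are hypothetical, and the source does not say how the random weights are drawn.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in: a "distribution" is a map from symbol name to weight.
class RandomizeSketch {
    // Gives each symbol a random positive weight, then rescales so the weights sum to 1.
    static Map<String, Double> randomWeights(List<String> symbols, Random rng) {
        Map<String, Double> weights = new LinkedHashMap<>();
        double total = 0.0;
        for (String s : symbols) {
            double w = 1.0 - rng.nextDouble(); // lies in (0, 1], so never zero
            weights.put(s, w);
            total += w;
        }
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(randomWeights(Arrays.asList("a", "c", "g", "t"), new Random()));
    }
}
```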

countToDistribution

public static Distribution countToDistribution(Count c)
Make a distribution from a count.

Parameters:
c - the count
Returns:
a Distribution over the same FiniteAlphabet as c and trained with the counts of c
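
As an illustration of turning counts into weights, the following sketch divides each count by the total. It assumes simple maximum-likelihood normalization, which this page does not confirm; CountSketch and toWeights are hypothetical names and no BioJava types are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: counts and weights are maps keyed by symbol name.
class CountSketch {
    // Divides every count by the total so the resulting weights sum to 1.
    static Map<String, Double> toWeights(Map<String, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("counts must sum to a positive value");
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 3.0);
        counts.put("c", 1.0);
        System.out.println(toWeights(counts));
    }
}
```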

areEmissionSpectraEqual

public static final boolean areEmissionSpectraEqual(Distribution a,
                                                    Distribution b)
                                             throws BioException
Compares the emission spectra of two distributions.

Parameters:
a - A Distribution with the same Alphabet as b
b - A Distribution with the same Alphabet as a
Returns:
true if alphabets and symbol weights are equal for the two distributions.
Throws:
BioException - if one or both of the Distributions are over infinite alphabets.
Since:
1.2

areEmissionSpectraEqual

public static final boolean areEmissionSpectraEqual(Distribution[] a,
                                                    Distribution[] b)
                                             throws BioException
Compares the emission spectra of two distribution arrays.

Parameters:
a - A Distribution[] consisting of Distributions over a FiniteAlphabet
b - A Distribution[] consisting of Distributions over a FiniteAlphabet
Returns:
true if alphabets and symbol weights are equal for each pair of distributions. Will return false if the arrays are of unequal length.
Throws:
BioException - if one of the Distributions is over an infinite alphabet.
Since:
1.3
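
The comparison described for the two overloads can be sketched over plain maps: two spectra match when they cover the same symbols with the same weights, and two lists match when they have equal length and match pairwise. The names below are hypothetical, and exact floating-point equality is an assumption; the library may compare differently.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: an emission spectrum is a map from symbol name to weight.
class SpectraSketch {
    // True if both maps have the same symbols and identical weights.
    static boolean spectraEqual(Map<String, Double> a, Map<String, Double> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (String s : a.keySet()) {
            if (a.get(s).doubleValue() != b.get(s).doubleValue()) {
                return false;
            }
        }
        return true;
    }

    // True if the lists have equal length and every pair of spectra is equal.
    static boolean spectraEqual(List<Map<String, Double>> a, List<Map<String, Double>> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!spectraEqual(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```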

KLDistance

public static final HashMap KLDistance(Distribution observed,
                                       Distribution expected,
                                       double logBase)
A method to calculate the Kullback-Leibler Distance (relative entropy).

Parameters:
observed - the observed frequency of Symbols.
expected - the expected or background frequency.
logBase - the log base for the entropy calculation. 2 is standard.
Returns:
a HashMap mapping Symbol to (Double) relative entropy.
Since:
1.2
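
For orientation, the textbook per-symbol relative entropy term is p * log(p / q), where p is the observed and q the expected probability. The sketch below computes that term for each symbol in a chosen log base. It assumes this is the quantity stored per Symbol, which the page does not spell out; KLSketch and klTerms are hypothetical names, and both maps are assumed to share the same symbols.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: distributions are maps from symbol name to probability.
class KLSketch {
    // Per-symbol term p * log(p / q), expressed in the given log base.
    static Map<String, Double> klTerms(Map<String, Double> observed,
                                       Map<String, Double> expected,
                                       double logBase) {
        Map<String, Double> terms = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            double p = e.getValue();
            double q = expected.get(e.getKey());
            // By convention, a symbol that is never observed contributes nothing.
            double term = (p == 0.0) ? 0.0 : p * Math.log(p / q) / Math.log(logBase);
            terms.put(e.getKey(), term);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Double> obs = new LinkedHashMap<>();
        obs.put("a", 0.5);
        obs.put("b", 0.5);
        Map<String, Double> bg = new LinkedHashMap<>();
        bg.put("a", 0.25);
        bg.put("b", 0.75);
        System.out.println(klTerms(obs, bg, 2.0));
    }
}
```

Individual terms can be negative when a symbol is observed less often than expected; only their sum is guaranteed to be non-negative.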

shannonEntropy

public static final HashMap shannonEntropy(Distribution observed,
                                           double logBase)
A method to calculate the Shannon Entropy for a Distribution.

Parameters:
observed - the observed frequency of Symbols.
logBase - the log base for the entropy calculation. 2 is standard.
Returns:
a HashMap mapping Symbol to (Double) entropy.
Since:
1.2

totalEntropy

public static double totalEntropy(Distribution observed)
Calculates the total Entropy for a Distribution. Entropies for individual Symbols are weighted by their probability of occurrence.

Parameters:
observed - the observed frequency of Symbols.
Returns:
the total entropy of the Distribution.
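
The weighting described above can be sketched with the textbook definitions: each symbol has a surprisal of -log(p), and the total entropy is the sum of those values weighted by p. Whether the library stores exactly this per-symbol value is an assumption; EntropySketch and its methods are hypothetical names, and base 2 is assumed for the total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a distribution is a map from symbol name to probability.
class EntropySketch {
    // Per-symbol surprisal -log_b(p); zero-probability symbols are reported as 0.
    static Map<String, Double> surprisal(Map<String, Double> observed, double logBase) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : observed.entrySet()) {
            double p = e.getValue();
            result.put(e.getKey(), p == 0.0 ? 0.0 : -Math.log(p) / Math.log(logBase));
        }
        return result;
    }

    // Total entropy in bits: each symbol's surprisal weighted by its probability.
    static double totalEntropy(Map<String, Double> observed) {
        double h = 0.0;
        for (Map.Entry<String, Double> e : surprisal(observed, 2.0).entrySet()) {
            h += observed.get(e.getKey()) * e.getValue();
        }
        return h;
    }

    public static void main(String[] args) {
        Map<String, Double> uniform = new LinkedHashMap<>();
        for (String s : new String[] {"a", "c", "g", "t"}) {
            uniform.put(s, 0.25);
        }
        System.out.println(totalEntropy(uniform));
    }
}
```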

bitsOfInformation

public static final double bitsOfInformation(Distribution observed)
Calculates the total bits of information for a distribution.

Parameters:
observed - the observed frequency of Symbols.
Returns:
the total information content of the Distribution.
Since:
1.2
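
A common definition of information content, used for example in sequence logos, is the maximum possible entropy log2(N) for an alphabet of N symbols minus the observed entropy. The sketch below assumes that definition, which this page does not state; InformationSketch and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a distribution is a map from symbol name to probability.
class InformationSketch {
    // Shannon entropy in bits: the sum of -p * log2(p) over symbols with p > 0.
    static double entropyBits(Map<String, Double> observed) {
        double h = 0.0;
        for (double p : observed.values()) {
            if (p > 0.0) {
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // Assumed definition: log2(alphabet size) minus the observed entropy.
    static double bitsOfInformation(Map<String, Double> observed) {
        double maxEntropy = Math.log(observed.size()) / Math.log(2.0);
        return maxEntropy - entropyBits(observed);
    }

    public static void main(String[] args) {
        Map<String, Double> conserved = new LinkedHashMap<>();
        conserved.put("a", 1.0);
        conserved.put("c", 0.0);
        conserved.put("g", 0.0);
        conserved.put("t", 0.0);
        System.out.println(bitsOfInformation(conserved));
    }
}
```

Under this definition a fully conserved DNA column carries 2 bits and a uniform column carries none.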

distOverAlignment

public static Distribution[] distOverAlignment(Alignment a)
                                        throws IllegalAlphabetException
Equivalent to distOverAlignment(a, false, 0.0).

Parameters:
a - the Alignment
Returns:
an array of Distribution instances representing columns of the alignment
Throws:
IllegalAlphabetException - if the alignment alphabet is not compatible

jointDistOverAlignment

public static final Distribution jointDistOverAlignment(Alignment a,
                                                        boolean countGaps,
                                                        double nullWeight,
                                                        int[] cols)
                                                 throws IllegalAlphabetException
Creates a joint distribution.

Parameters:
a - the Alignment to build the joint Distribution over.
countGaps - if true, gaps will be included in the distribution (not yet implemented; currently either value produces the same result)
nullWeight - the number of pseudo counts to add to each distribution
cols - a list of positions in the alignment to include in the joint distribution
Returns:
a Distribution
Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet
Since:
1.2
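
A joint distribution over selected columns can be pictured as counting, for every row of the alignment, the tuple of symbols found at those columns. The sketch below does that for rows given as plain strings. It uses 0-based column indices, ignores pseudo counts and gap handling, and uses hypothetical names throughout; none of these choices is confirmed by this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: the alignment is an array of equal-length strings.
class JointSketch {
    // Relative frequency of each symbol tuple observed at the chosen columns.
    static Map<String, Double> jointWeights(String[] rows, int[] cols) {
        Map<String, Double> joint = new LinkedHashMap<>();
        for (String row : rows) {
            StringBuilder key = new StringBuilder();
            for (int c : cols) {
                key.append(row.charAt(c)); // 0-based column index (an assumption)
            }
            joint.merge(key.toString(), 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : joint.entrySet()) {
            e.setValue(e.getValue() / rows.length);
        }
        return joint;
    }

    public static void main(String[] args) {
        String[] rows = {"acg", "acg", "atg", "gcg"};
        System.out.println(jointWeights(rows, new int[] {0, 1}));
    }
}
```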

distOverAlignment

public static final Distribution[] distOverAlignment(Alignment a,
                                                     boolean countGaps,
                                                     double nullWeight)
                                              throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment.

Parameters:
a - the Alignment to build the Distribution[] over.
countGaps - if true gaps will be included in the distributions
nullWeight - the number of pseudo counts to add to each distribution. Pseudo counts are not added for gaps, so a column with no gaps gets no gap counts.
Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
Throws:
IllegalAlphabetException - if the sequences do not all use the same alphabet
Since:
1.2
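
The interplay of countGaps and nullWeight can be sketched over an alignment given as plain strings: every non-gap symbol starts at nullWeight, gaps start at zero and are only counted when countGaps is true, and each column is then normalized. This is one reading of the parameter descriptions above, not the library's code; the names, the '-' gap character and the explicit alphabet string are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: the alignment is an array of equal-length strings.
class ColumnSketch {
    static List<Map<Character, Double>> columnWeights(String[] rows, String alphabet,
                                                      boolean countGaps, double nullWeight) {
        List<Map<Character, Double>> result = new ArrayList<>();
        int width = rows[0].length();
        for (int col = 0; col < width; col++) {
            Map<Character, Double> counts = new LinkedHashMap<>();
            for (char sym : alphabet.toCharArray()) {
                counts.put(sym, nullWeight); // pseudo counts for real symbols only
            }
            if (countGaps) {
                counts.put('-', 0.0); // gaps receive no pseudo counts
            }
            for (String row : rows) {
                char c = row.charAt(col);
                if (c == '-' && !countGaps) {
                    continue; // gaps are skipped entirely
                }
                Double old = counts.get(c);
                if (old == null) {
                    throw new IllegalArgumentException("symbol not in alphabet: " + c);
                }
                counts.put(c, old + 1.0);
            }
            double total = 0.0;
            for (double v : counts.values()) {
                total += v;
            }
            for (Map.Entry<Character, Double> e : counts.entrySet()) {
                // An all-gap column with no pseudo counts is left at zero weights.
                e.setValue(total == 0.0 ? 0.0 : e.getValue() / total);
            }
            result.add(counts);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {"ac", "a-", "gc"};
        System.out.println(columnWeights(rows, "acgt", true, 0.0));
    }
}
```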

distOverAlignment

public static final Distribution[] distOverAlignment(Alignment a,
                                                     boolean countGaps)
                                              throws IllegalAlphabetException
Creates an array of distributions, one for each column of the alignment. No pseudo counts are used.

Parameters:
a - the Alignment to build the Distribution[] over.
countGaps - if true, gaps will be included in the distributions
Returns:
a Distribution[] where each member of the array is a Distribution of the Symbols found at that position of the Alignment.
Throws:
IllegalAlphabetException - if the sequences in the alignment do not all use the same alphabet
Since:
1.2

average

public static final Distribution average(Distribution[] dists)
Averages two or more distributions. NOTE: the current implementation ignores the null model.

Parameters:
dists - the Distributions to average
Returns:
a Distribution where the weight of each Symbol is the average of the weights of that Symbol in each Distribution.
Since:
1.2
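
The averaging described above amounts to a per-symbol arithmetic mean. The sketch below shows it over plain maps; AverageSketch is a hypothetical name, and treating a symbol that is missing from one map as weight 0 is an assumption made only for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a distribution is a map from symbol name to weight.
class AverageSketch {
    // Mean weight of each symbol across all distributions in the list.
    static Map<String, Double> average(List<Map<String, Double>> dists) {
        Map<String, Double> avg = new LinkedHashMap<>();
        for (Map<String, Double> d : dists) {
            for (Map.Entry<String, Double> e : d.entrySet()) {
                // Accumulate each distribution's share of the mean.
                avg.merge(e.getKey(), e.getValue() / dists.size(), Double::sum);
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        Map<String, Double> first = new LinkedHashMap<>();
        first.put("a", 1.0);
        first.put("c", 0.0);
        Map<String, Double> second = new LinkedHashMap<>();
        second.put("a", 0.5);
        second.put("c", 0.5);
        System.out.println(average(Arrays.asList(first, second)));
    }
}
```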

generateSequence

public static final Sequence generateSequence(String name,
                                              Distribution d,
                                              int length)
Produces a sequence by randomly sampling the Distribution.

Parameters:
name - the name for the sequence
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and then used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
Returns:
a Sequence whose name and URN are both set to name, with an empty Annotation.

generateSymbolList

public static final SymbolList generateSymbolList(Distribution d,
                                                  int length)
Produces a SymbolList by randomly sampling a Distribution.

Parameters:
d - the distribution to sample. If this distribution is of order N, a seed sequence is generated, allowed to 'burn in' for 1000 iterations, and then used to produce a sequence over the conditioned alphabet.
length - the number of symbols in the sequence.
Returns:
a SymbolList of length length
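
For a simple (order 0) distribution, random sampling can be sketched as picking a point in the cumulative weight range and returning the symbol whose interval contains it. The code below illustrates that idea with plain Java types. It does not cover the order N burn-in case, and SampleSketch and sample are hypothetical names rather than BioJava API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in: a distribution is a map from symbol character to weight.
class SampleSketch {
    // Draws `length` symbols independently, each with probability proportional to its weight.
    static String sample(Map<Character, Double> weights, int length, Random rng) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble() * total; // point in [0, total)
            double cumulative = 0.0;
            char chosen = 0;
            for (Map.Entry<Character, Double> e : weights.entrySet()) {
                chosen = e.getKey();
                cumulative += e.getValue();
                if (r < cumulative) {
                    break; // r falls inside this symbol's interval
                }
            }
            sb.append(chosen);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Double> dna = new LinkedHashMap<>();
        dna.put('a', 0.4);
        dna.put('c', 0.1);
        dna.put('g', 0.1);
        dna.put('t', 0.4);
        System.out.println(sample(dna, 20, new Random()));
    }
}
```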

generateOrderNSequence

protected static final Sequence generateOrderNSequence(String name,
                                                       OrderNDistribution d,
                                                       int length)
Deprecated. Use generateSequence() or generateSymbolList() instead.

Generate a sequence by sampling a distribution.

Parameters:
name - the name of the sequence
d - the distribution to sample
length - the length of the sequence
Returns:
a new sequence with the required composition