org.apache.commons.math3.stat.clustering
Class DBSCANClusterer<T extends Clusterable<T>>

java.lang.Object
  extended by org.apache.commons.math3.stat.clustering.DBSCANClusterer<T>
Type Parameters:
T - type of the points to cluster

public class DBSCANClusterer<T extends Clusterable<T>>
extends Object

DBSCAN (density-based spatial clustering of applications with noise) algorithm.

The DBSCAN algorithm forms clusters based on the idea of density connectivity, i.e. a point p is density connected to another point q, if there exists a chain of points pi, with i = 1 .. n and p1 = p and pn = q, such that each pair <pi, pi+1> is directly density-reachable. A point q is directly density-reachable from point p if it is in the ε-neighborhood of this point.

Any point that is not density-reachable from a formed cluster is treated as noise, and will thus not be present in the result.

The algorithm requires two parameters:

Note: as DBSCAN is not a centroid-based clustering algorithm, the resulting Cluster objects will have no defined center, i.e. Cluster.getCenter() will return null.

Since:
3.1
Version:
$Id: DBSCANClusterer.java 1410882 2012-11-18 12:49:49Z tn $
See Also:
DBSCAN (wikipedia), A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

Nested Class Summary
private static class DBSCANClusterer.PointStatus
          Status of a point during the clustering process.
 
Field Summary
private  double eps
          Maximum radius of the neighborhood to be considered.
private  int minPts
          Minimum number of points needed for a cluster.
 
Constructor Summary
DBSCANClusterer(double eps, int minPts)
          Creates a new instance of a DBSCANClusterer.
 
Method Summary
 List<Cluster<T>> cluster(Collection<T> points)
          Performs DBSCAN cluster analysis.
private  Cluster<T> expandCluster(Cluster<T> cluster, T point, List<T> neighbors, Collection<T> points, Map<Clusterable<T>,DBSCANClusterer.PointStatus> visited)
          Expands the cluster to include density-reachable items.
 double getEps()
          Returns the maximum radius of the neighborhood to be considered.
 int getMinPts()
          Returns the minimum number of points needed for a cluster.
private  List<T> getNeighbors(T point, Collection<T> points)
          Returns a list of density-reachable neighbors of a point.
private  List<T> merge(List<T> one, List<T> two)
          Merges two lists together.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

eps

private final double eps
Maximum radius of the neighborhood to be considered.


minPts

private final int minPts
Minimum number of points needed for a cluster.

Constructor Detail

DBSCANClusterer

public DBSCANClusterer(double eps,
                       int minPts)
                throws NotPositiveException
Creates a new instance of a DBSCANClusterer.

Parameters:
eps - maximum radius of the neighborhood to be considered
minPts - minimum number of points needed for a cluster
Throws:
NotPositiveException - if eps < 0.0 or minPts < 0
Method Detail

getEps

public double getEps()
Returns the maximum radius of the neighborhood to be considered.

Returns:
maximum radius of the neighborhood

getMinPts

public int getMinPts()
Returns the minimum number of points needed for a cluster.

Returns:
minimum number of points needed for a cluster

cluster

public List<Cluster<T>> cluster(Collection<T> points)
                                                throws NullArgumentException
Performs DBSCAN cluster analysis.

Note: as DBSCAN is not a centroid-based clustering algorithm, the resulting Cluster objects will have no defined center, i.e. Cluster.getCenter() will return null.

Parameters:
points - the points to cluster
Returns:
the list of clusters
Throws:
NullArgumentException - if the data points are null

expandCluster

private Cluster<T> expandCluster(Cluster<T> cluster,
                                 T point,
                                 List<T> neighbors,
                                 Collection<T> points,
                                 Map<Clusterable<T>,DBSCANClusterer.PointStatus> visited)
Expands the cluster to include density-reachable items.

Parameters:
cluster - Cluster to expand
point - Point to add to cluster
neighbors - List of neighbors
points - the data set
visited - the set of already visited points
Returns:
the expanded cluster

getNeighbors

private List<T> getNeighbors(T point,
                             Collection<T> points)
Returns a list of density-reachable neighbors of a point.

Parameters:
point - the point to look for
points - possible neighbors
Returns:
the List of neighbors

merge

private List<T> merge(List<T> one,
                      List<T> two)
Merges two lists together.

Parameters:
one - first list
two - second list
Returns:
merged lists


Copyright (c) 2003-2013 Apache Software Foundation