Parent

Files

Ai4r::Clusterers::Diana

DIANA (Divisive ANAlysis) (Kaufman and Rousseeuw, 1990; Macnaughton - Smith et al. 1964) is a Divisive Hierarchical Clusterer. It begins with only one cluster with all data items, and divides the clusters until the desired clusters number is reached.

Attributes

clusters[R]
data_set[R]
number_of_clusters[R]

Public Class Methods

new() click to toggle source
# File lib/ai4r/clusterers/diana.rb, line 31
def initialize
  @distance_function = lambda do |a,b| 
      Ai4r::Data::Proximity.squared_euclidean_distance(
        a.select {|att_a| att_a.is_a? Numeric} , 
        b.select {|att_b| att_b.is_a? Numeric})
    end
end

Public Instance Methods

build(data_set, number_of_clusters) click to toggle source

Build a new clusterer, using divisive analysis (DIANA algorithm)

# File lib/ai4r/clusterers/diana.rb, line 40
def build(data_set, number_of_clusters)
  @data_set = data_set
  @number_of_clusters = number_of_clusters
  @clusters = [@data_set[0..-1]]
  
  while(@clusters.length < @number_of_clusters)
    cluster_index_to_split = max_diameter_cluster(@clusters)
    cluster_to_split = @clusters[cluster_index_to_split]
    splinter_cluster = init_splinter_cluster(cluster_to_split)
    while true
      dist_diff, index = max_distance_difference(cluster_to_split, splinter_cluster)
      break if dist_diff < 0
      splinter_cluster << cluster_to_split.data_items[index]
      cluster_to_split.data_items.delete_at(index)
    end
    @clusters << splinter_cluster
  end
 
  return self
end
eval(data_item) click to toggle source

Classifies the given data item, returning the cluster index it belongs to (0-based).

# File lib/ai4r/clusterers/diana.rb, line 63
def eval(data_item)
  get_min_index(@clusters.collect do |cluster|
    distance_sum(data_item, cluster) / cluster.data_items.length
    end)
end

Protected Instance Methods

cluster_diameter(cluster) click to toggle source

Max distance between 2 items in a cluster

# File lib/ai4r/clusterers/diana.rb, line 86
def cluster_diameter(cluster)
  diameter = 0
  cluster.data_items.each_with_index do |item_a, item_a_pos|
    item_a_pos.times do |item_b_pos|
      d = @distance_function.call(item_a, cluster.data_items[item_b_pos])
      diameter = d if d > diameter
    end
  end
  return diameter
end
distance_sum(item_a, cluster) click to toggle source

Sum up the distance between an item and all the items in a cluster

# File lib/ai4r/clusterers/diana.rb, line 131
def distance_sum(item_a, cluster)
  cluster.data_items.inject(0.0) do |sum, item_b|
    sum + @distance_function.call(item_a, item_b)
  end
end
init_splinter_cluster(cluster_to_split) click to toggle source

Create a cluster with the item with mx distance to the rest of the cluster's items. That item is removed from the initial cluster.

# File lib/ai4r/clusterers/diana.rb, line 100
def init_splinter_cluster(cluster_to_split)
  max = 0.0
  max_index = 0
  cluster_to_split.data_items.each_with_index do |item, index|
    sum = distance_sum(item, cluster_to_split)
    max, max_index = sum, index if sum > max
  end
  splinter_cluster = cluster_to_split[max_index]
  cluster_to_split.data_items.delete_at(max_index)
  return splinter_cluster
end
max_diameter_cluster(clusters) click to toggle source

return the cluster with max diameter

# File lib/ai4r/clusterers/diana.rb, line 72
def max_diameter_cluster(clusters)
  max_index = 0
  max_diameter = 0
  clusters.each_with_index do |cluster, index|
    diameter = cluster_diameter(cluster)
    if diameter > max_diameter
      max_index = index
      max_diameter = diameter
    end
  end
  return max_index
end
max_distance_difference(cluster_to_split, splinter_cluster) click to toggle source

Return the max average distance between any item of cluster_to_split and the rest of items in that cluster, minus the average distance with the items of splinter_cluster, and the index of the item. A positive value means that the items is closer to the splinter group than to its current cluster.

# File lib/ai4r/clusterers/diana.rb, line 118
def max_distance_difference(cluster_to_split, splinter_cluster)
  max_diff = -1.0/0
  max_diff_index = 0
  cluster_to_split.data_items.each_with_index do |item, index|
    dist_a = distance_sum(item, cluster_to_split) / (cluster_to_split.data_items.length-1)
    dist_b = distance_sum(item, splinter_cluster) / (splinter_cluster.data_items.length)
    dist_diff = dist_a - dist_b
    max_diff, max_diff_index = dist_diff, index if dist_diff > max_diff
  end
  return max_diff, max_diff_index
end

[Validate]

Generated with the Darkfish Rdoc Generator 2.