Class Ai4r::Classifiers::ID3
In: lib/ai4r/classifiers/id3.rb
Parent: Classifier

Introduction

This is an implementation of the ID3 algorithm (Quinlan). Given a set of preclassified examples, it builds a decision tree by top-down induction, choosing at each node the attribute with the highest information gain (an entropy-based measure).
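
To make the entropy and information gain measures concrete, here is a minimal plain-Ruby sketch. It is not taken from the ai4r source: the method names entropy and information_gain and the ROWS constant are chosen for illustration only. The rows are the four '[30-50)' items from the example data on this page.

```ruby
# Shannon entropy (in bits) of a list of class labels.
def entropy(classes)
  total = classes.size.to_f
  classes.tally.values.sum do |count|
    ratio = count / total
    -ratio * Math.log2(ratio)
  end
end

# Information gain obtained by splitting rows on the attribute at `index`.
# Assumption: the class is the last element of each row.
def information_gain(rows, index)
  remainder = rows.group_by { |row| row[index] }.values.sum do |subset|
    subset.size.to_f / rows.size * entropy(subset.map(&:last))
  end
  entropy(rows.map(&:last)) - remainder
end

# The four '[30-50)' rows of the example data (city, age_range, gender, class).
ROWS = [
  ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N']
].freeze

puts information_gain(ROWS, 0).round(3)  # city:   1.0
puts information_gain(ROWS, 2).round(3)  # gender: 0.311
```

Splitting on city separates the two classes completely, so it has the higher gain. This is consistent with the generated rules, which test city inside the '[30-50)' branch.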

How to use it

  DATA_LABELS = [ 'city', 'age_range', 'gender', 'marketing_target'  ]

  DATA_ITEMS = [
         ['New York',  '<30',      'M', 'Y'],
         ['Chicago',     '<30',      'M', 'Y'],
         ['Chicago',     '<30',      'F', 'Y'],
         ['New York',  '<30',      'M', 'Y'],
         ['New York',  '<30',      'M', 'Y'],
         ['Chicago',     '[30-50)',  'M', 'Y'],
         ['New York',  '[30-50)',  'F', 'N'],
         ['Chicago',     '[30-50)',  'F', 'Y'],
         ['New York',  '[30-50)',  'F', 'N'],
         ['Chicago',     '[50-80]', 'M', 'N'],
         ['New York',  '[50-80]', 'F', 'N'],
         ['New York',  '[50-80]', 'M', 'N'],
         ['Chicago',     '[50-80]', 'M', 'N'],
         ['New York',  '[50-80]', 'F', 'N'],
         ['Chicago',     '>80',      'F', 'Y']
       ]

  data_set = DataSet.new(:data_items=>DATA_ITEMS, :data_labels=>DATA_LABELS)
  id3 = Ai4r::Classifiers::ID3.new.build(data_set)

  id3.get_rules
    # =>  if age_range=='<30' then marketing_target='Y'
    #     elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
    #     elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
    #     elsif age_range=='[50-80]' then marketing_target='N'
    #     elsif age_range=='>80' then marketing_target='Y'
    #     else raise 'There was not enough information during training to do a proper induction for this data element' end

  id3.eval(['New York', '<30', 'M'])
    # =>  'Y'
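
If you are curious how such rules can come out of the data, the following is a rough plain-Ruby sketch of top-down induction. It is an illustration under stated assumptions, not the ai4r implementation: build_tree, classify, and split_entropy are hypothetical helper names, the tree is kept as nested arrays and hashes, and the example data is repeated so the snippet runs on its own.

```ruby
DATA_LABELS = %w[city age_range gender marketing_target].freeze

DATA_ITEMS = [
  ['New York', '<30', 'M', 'Y'],     ['Chicago', '<30', 'M', 'Y'],
  ['Chicago', '<30', 'F', 'Y'],      ['New York', '<30', 'M', 'Y'],
  ['New York', '<30', 'M', 'Y'],     ['Chicago', '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'], ['New York', '[50-80]', 'M', 'N'],
  ['Chicago', '[50-80]', 'M', 'N'],  ['New York', '[50-80]', 'F', 'N'],
  ['Chicago', '>80', 'F', 'Y']
].freeze

# Entropy (in bits) of the class column, assumed to be the last element.
def entropy(rows)
  total = rows.size.to_f
  rows.map(&:last).tally.values.sum do |count|
    ratio = count / total
    -ratio * Math.log2(ratio)
  end
end

# Weighted entropy that remains after splitting on the attribute at `index`.
def split_entropy(rows, index)
  rows.group_by { |row| row[index] }.values.sum do |subset|
    subset.size.to_f / rows.size * entropy(subset)
  end
end

# Returns a class label (leaf) or [attribute_index, { value => subtree }].
def build_tree(rows, indexes)
  classes = rows.map(&:last).uniq
  return classes.first if classes.size == 1
  # No attributes left: fall back to the majority class.
  return rows.map(&:last).tally.max_by { |_, n| n }.first if indexes.empty?

  # Lowest remaining entropy is the same as highest information gain.
  best = indexes.min_by { |i| split_entropy(rows, i) }
  branches = rows.group_by { |row| row[best] }.transform_values do |subset|
    build_tree(subset, indexes - [best])
  end
  [best, branches]
end

# Walks the tree until a leaf (a class label string) is reached.
def classify(tree, item)
  until tree.is_a?(String)
    index, branches = tree
    tree = branches.fetch(item[index]) do
      raise ArgumentError, "value #{item[index].inspect} was not seen during training"
    end
  end
  tree
end

tree = build_tree(DATA_ITEMS, [0, 1, 2])
puts DATA_LABELS[tree.first]                    # age_range
puts classify(tree, ['New York', '<30', 'M'])   # Y
```

The root attribute chosen by this sketch is age_range, and only the '[30-50)' branch needs a second test (on city), which matches the shape of the rules shown above.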

A better way to load the data

In real life you will use many more training examples, with more attributes. Consider moving your data to an external CSV (comma-separated values) file.

  data_file = "#{File.dirname(__FILE__)}/data_set.csv"
  data_set = DataSet.load_csv_with_labels data_file
  id3 = Ai4r::Classifiers::ID3.new.build(data_set)
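
As a rough idea of what such a file might look like, the sketch below assumes (as the method name load_csv_with_labels suggests) that the first row holds the labels and every following row holds one item. The helper parse_labeled_csv and the CSV_TEXT constant are hypothetical and only illustrate the layout; they are not part of ai4r. The rows are taken from the example data above.

```ruby
# Assumed layout of data_set.csv: labels first, then one item per line.
CSV_TEXT = <<~CSV
  city,age_range,gender,marketing_target
  New York,<30,M,Y
  Chicago,[30-50),F,Y
  New York,[50-80],F,N
CSV

# Naive parser for illustration: a plain comma split with no support for
# quoted fields. Use Ruby's csv library for real files.
def parse_labeled_csv(text)
  rows = text.lines.map { |line| line.chomp.split(',') }.reject(&:empty?)
  labels = rows.shift
  { data_labels: labels, data_items: rows }
end

parsed = parse_labeled_csv(CSV_TEXT)
puts parsed[:data_labels].inspect
puts parsed[:data_items].length   # 3
```

The resulting hash has the same two keys that DataSet.new takes in the example above, so the labels and items end up in the same shape as DATA_LABELS and DATA_ITEMS.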

A nice tip for data evaluation

  id3 = Ai4r::Classifiers::ID3.new.build(data_set)

  age_range = '<30'
  marketing_target = nil
  eval id3.get_rules
  puts marketing_target
    # =>  'Y'
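
The same trick can be wrapped in a small helper so that the variables do not leak into your own scope. This is only a sketch: apply_rules is a hypothetical helper, not part of ai4r, and RULES is a copy of the rule text shown earlier so the snippet runs without the library.

```ruby
# Copy of the rules shown above, as a string of Ruby code.
RULES = <<~RUBY
  if age_range=='<30' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  elsif age_range=='[50-80]' then marketing_target='N'
  elsif age_range=='>80' then marketing_target='Y'
  else raise 'There was not enough information during training to do a proper induction for this data element' end
RUBY

# Hypothetical helper: evaluates the rules inside a separate binding.
# inputs maps attribute names to values; target names the class variable.
def apply_rules(rules, inputs, target)
  scope = binding
  inputs.each { |name, value| scope.local_variable_set(name, value) }
  scope.local_variable_set(target, nil)   # the rules assign to this variable
  scope.eval(rules)
  scope.local_variable_get(target)
end

puts apply_rules(RULES, { age_range: '[30-50)', city: 'Chicago' }, :marketing_target)
# prints Y
```

Only the attributes that the matching rule actually tests need to be set. A value that was never seen during training falls through to the final else branch and raises.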

More about ID3 and decision trees

About the project

Author: Sergio Fierens
License: MPL 1.1
Url: ai4r.rubyforge.org/

Methods

build   eval   get_rules  

Constants

LOG2 = Math.log(2)

Attributes

data_set  [R] 

Public Instance methods

build

Create a new ID3 classifier. You must provide a DataSet instance as a parameter. The last attribute of each item is treated as the item's class.

eval

You can evaluate new data, predicting its category, e.g.

  id3.eval(['New York',  '<30', 'F'])  # => 'Y'

get_rules

This method returns the generated rules as Ruby code, e.g.

  id3.get_rules
    # =>  if age_range=='<30' then marketing_target='Y'
    #     elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
    #     elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
    #     elsif age_range=='[50-80]' then marketing_target='N'
    #     elsif age_range=='>80' then marketing_target='Y'
    #     else raise 'There was not enough information during training to do a proper induction for this data element' end

It is a nice way to inspect induction results, and also to execute them:

    age_range = '<30'
    marketing_target = nil
    eval id3.get_rules
    puts marketing_target
      # =>  'Y'
