Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition D.

Input:

  • Data partition, D, which is a set of training tuples and their associated class labels;
  • attribute_list, the set of candidate attributes;
  • Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split point or a splitting subset.
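
As a concrete illustration of what an Attribute_selection_method might compute, here is a minimal Python sketch using information gain (one common choice; the helper names, the convention that tuples are plain sequences with the class label last, and the use of attribute indices are assumptions for the example, not part of the algorithm):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(D, attribute, class_index=-1):
    """Expected reduction in entropy from splitting D on a discrete attribute.

    D is a list of tuples; `attribute` is the index of a discrete-valued
    attribute; the class label sits at position `class_index`.
    """
    labels = [t[class_index] for t in D]
    before = entropy(labels)
    partitions = {}                      # attribute value -> labels in that partition
    for t in D:
        partitions.setdefault(t[attribute], []).append(t[class_index])
    after = sum(len(p) / len(D) * entropy(p) for p in partitions.values())
    return before - after

def attribute_selection_method(D, attribute_list):
    """Return the candidate attribute with the highest information gain."""
    return max(attribute_list, key=lambda a: information_gain(D, a))
```

A perfectly separating attribute yields a gain equal to the full entropy of the class labels, which is why it is selected first.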


Output:

A decision tree


Method:

create a node N;

if tuples in D are all of the same class C then
	return N as a leaf node labeled with the class C;

if attribute_list is empty then
	return N as a leaf node labeled with the majority class in D;	// majority voting

apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;

label node N with splitting_criterion;

if splitting_attribute is discrete-valued and multiway splits allowed then	// not restricted to binary trees
	attribute_list ← attribute_list - splitting_attribute;	// remove splitting_attribute

for each outcome j of splitting_criterion	// partition the tuples and grow subtrees for each partition
	let Dj be the set of data tuples in D satisfying outcome j;	// a partition
	if Dj is empty then
		attach a leaf labeled with the majority class in D to node N;
	else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
endfor

return N;
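
The method above translates almost line-for-line into code. A minimal Python sketch, assuming tuples are plain sequences with the class label last and a pluggable `select` function standing in for Attribute_selection_method; for simplicity, branch outcomes are taken from the values observed in D, so the empty-partition case (the majority-class leaf inside the loop) does not arise here:

```python
from collections import Counter

class Node:
    def __init__(self, label=None, attribute=None):
        self.label = label          # class label if this node is a leaf
        self.attribute = attribute  # splitting attribute index if internal
        self.children = {}          # outcome value -> subtree

def majority_class(D, class_index=-1):
    """Majority voting over the class labels in D."""
    return Counter(t[class_index] for t in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select, class_index=-1):
    labels = {t[class_index] for t in D}
    if len(labels) == 1:                  # all tuples share one class: leaf
        return Node(label=labels.pop())
    if not attribute_list:                # no attributes left: majority vote
        return Node(label=majority_class(D, class_index))
    best = select(D, attribute_list)      # "best" splitting attribute
    node = Node(attribute=best)
    # discrete-valued multiway split: remove the attribute before recursing
    remaining = [a for a in attribute_list if a != best]
    for outcome in {t[best] for t in D}:  # one branch per observed value
        Dj = [t for t in D if t[best] == outcome]   # the partition for outcome j
        node.children[outcome] = generate_decision_tree(
            Dj, remaining, select, class_index)
    return node
```

A toy `select` that simply returns the first candidate attribute is enough to exercise the recursion; in practice `select` would be an impurity-based measure such as information gain or the Gini index.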


