Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition D.
Input:
- Data partition, D, which is a set of training tuples and their associated class labels;
- Attribute_list, the set of candidate attributes;
- Attribute_selection_method, a procedure to determine the splitting criterion that "best" partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split point or a splitting subset.
Output:
A decision tree.
Method:
create a node N;
if tuples in D are all of the same class, C, then
    return N as a leaf node labeled with the class C;
if attribute_list is empty then
    return N as a leaf node labeled with the majority class in D; // majority voting
apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
    attribute_list ← attribute_list − splitting_attribute; // remove splitting_attribute
for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else
        attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
endfor
return N;
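The procedure above can be sketched as runnable Python. This is a minimal illustration under some assumptions not fixed by the pseudocode: attributes are discrete-valued, splits are multiway, and information gain is used as the Attribute_selection_method (the algorithm deliberately leaves the selection measure open). The dict-based tree representation and the `classify` helper are illustrative choices, not part of the original algorithm.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(D, attr):
    """Reduction in entropy from splitting D on attribute attr.
    D is a list of (features_dict, class_label) tuples."""
    labels = [label for _, label in D]
    remainder = 0.0
    for value in {features[attr] for features, _ in D}:
        subset = [label for features, label in D if features[attr] == value]
        remainder += len(subset) / len(D) * entropy(subset)
    return entropy(labels) - remainder

def generate_decision_tree(D, attribute_list):
    """Returns a leaf (a class label) or an internal node
    {'attribute': a, 'branches': {outcome: subtree}}."""
    labels = [label for _, label in D]
    # tuples in D are all of the same class: return a leaf
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    # attribute_list is empty: leaf labeled with the majority class
    if not attribute_list:
        return majority
    # apply the selection method (here: information gain) to pick the split
    best = max(attribute_list, key=lambda a: information_gain(D, a))
    # multiway split on a discrete attribute: remove it from the candidate list
    remaining = [a for a in attribute_list if a != best]
    node = {'attribute': best, 'branches': {}}
    # one branch per outcome; iterating over observed values keeps Dj nonempty,
    # but the empty-partition fallback is kept for domain-wide outcome sets
    for value in {features[best] for features, _ in D}:
        Dj = [(f, label) for f, label in D if f[best] == value]
        node['branches'][value] = (
            generate_decision_tree(Dj, remaining) if Dj else majority
        )
    return node

def classify(tree, features):
    """Walk the tree from the root to a leaf for one tuple."""
    while isinstance(tree, dict):
        tree = tree['branches'][features[tree['attribute']]]
    return tree
```

For example, on four training tuples with a single `outlook` attribute, the induced tree is one internal node whose branches lead directly to class-label leaves, and `classify` follows the branch matching the tuple's attribute value.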