Implementation of Decision Tree Algorithm in Machine Learning

Types of Decision Tree Algorithms

1. ID3: The ID3 (Iterative Dichotomiser 3) algorithm was invented by Ross Quinlan to create trees from datasets. It calculates the entropy for every attribute in the dataset and splits the data into subsets on the attribute with the minimum entropy, which is the attribute with the highest information gain. Information gain is the difference in entropy before and after the data is split on an attribute, and ID3 uses it to decide on the root node. Once a node has been created for that attribute, the algorithm recurses on each subset using the remaining attributes.
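
The entropy and information gain calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not Quinlan's implementation: the function names and the four-row toy dataset are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    # Group the target labels by each value the attribute takes.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[target])
    after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return before - after


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain (the root node)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy data, invented for this sketch.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
root = best_attribute(rows, ["outlook", "windy"], "play")
print(root)
```

In this toy data, splitting on outlook leaves two pure subsets, so it has the higher gain and is chosen as the root. A full ID3 implementation would then repeat the same selection on each subset with the remaining attributes.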

2. C4.5: Quinlan followed ID3 with the C4.5 algorithm. It is also based on the information gain method, and it extends ID3 to handle continuous attributes and missing attribute values. It is widely used: many users run it in Weka as J48, the open-source Java version of C4.5.

For example, with a list of values like the following:

[c]85,80,83,70,68,65,64,72,69,75,75,72,81,71
[/c]
C4.5 will work out a split point for the attribute (a) and give a simple decision criterion of:
[c]a <= 80 or a > 80
[/c]
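
One way to picture the search for a split point is to try each distinct value as a threshold and keep the one with the highest information gain. The sketch below assumes that approach. The values are the ones listed above, but the class labels are hypothetical: the text gives none, so they were chosen here to separate cleanly at 80. The function name `best_threshold` is also an assumption of this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split of the form value <= threshold."""
    base = entropy(labels)
    best = (None, -1.0)
    # Every distinct value except the largest is a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [c for v, c in zip(values, labels) if v <= t]
        right = [c for v, c in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(values)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


values = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
# Hypothetical labels, invented for this sketch: "no" above 80, "yes" otherwise.
labels = ["no" if v > 80 else "yes" for v in values]

threshold, gain = best_threshold(values, labels)
print(f"a <= {threshold} or a > {threshold}")
```

With these made-up labels the search settles on 80, matching the criterion above. Real data rarely separates this cleanly, and the chosen threshold depends entirely on how the classes are distributed.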

C4.5 can work despite missing attribute values. The missing values are marked with a question mark [c](?)[/c]. The gain and entropy calculations are simply skipped when there is no data available.
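
The idea of skipping missing values can be sketched as follows. This is a simplified illustration that only drops rows marked with a question mark before computing the gain. It leaves out any further weighting that a full C4.5 implementation applies, and the function name and sample data are hypothetical.

```python
from collections import Counter
from math import log2

MISSING = "?"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ignoring_missing(values, labels):
    """Information gain computed only over rows whose attribute value is known."""
    known = [(v, c) for v, c in zip(values, labels) if v != MISSING]
    if not known:
        return 0.0  # no data available, so there is nothing to calculate
    subsets = {}
    for v, c in known:
        subsets.setdefault(v, []).append(c)
    after = sum(len(s) / len(known) * entropy(s) for s in subsets.values())
    return entropy([c for _, c in known]) - after


# Hypothetical sample: the third row has no recorded attribute value.
values = ["sunny", "sunny", "?", "rainy", "rainy"]
labels = ["no", "no", "yes", "yes", "yes"]
gain = gain_ignoring_missing(values, labels)
print(gain)
```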

3. CHAID: The CHAID (Chi-squared Automatic Interaction Detection) technique was developed by Gordon V. Kass in 1980. It was mainly used in marketing, but it has also been used in medical and psychiatric research.

4. MARS: For numerical data, it might be worth investigating the MARS (Multivariate Adaptive Regression Splines) algorithm. You might see it offered as an open-source alternative called “Earth,” because MARS is trademarked by Salford Systems.