Wednesday 15 March 2017

Let's Start with R: Decision Tree

Install R Package :


Use the below command in R console to install the package. 

install.packages("party")

The package "party" provides the function ctree(), which is used to create and analyze decision trees.
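
If a script is run more than once, the install step can be skipped when the package is already present. The following is a minimal sketch of that idea; the CRAN mirror URL is an assumption and can be replaced with any mirror you prefer.

```r
# Install "party" only when it is not already available.
# The repos URL is an assumed CRAN mirror, not something the package requires.
if (!requireNamespace("party", quietly = TRUE)) {
  install.packages("party", repos = "https://cloud.r-project.org")
}

# Attach the package so that ctree() can be called directly.
library(party)
```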

Syntax :

The basic syntax to create a decision tree is −
ctree(formula, data)

Where
  • formula is a formula describing the response and predictor variables.
  • data is the name of the data set used.
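
To make the formula argument concrete, here is a small sketch in base R. It uses the column names from the example further down; the object names tree.formula, vars, response and predictors are only illustrative.

```r
# A formula puts the response on the left of ~ and the predictors on the right.
tree.formula <- nativeSpeaker ~ age + shoeSize + score

# List the variable names used in the formula, response first.
vars <- all.vars(tree.formula)
print(vars)

# Split the names into the response and the predictors.
response   <- vars[1]
predictors <- vars[-1]
```

An object like tree.formula can then be passed as the first argument of ctree(), with the data frame that holds these columns as the second.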


Input Data :


We will use the data set named readingSkills, which ships with the party package, to create a decision tree. Each record holds a person's age, shoe size, reading score, and whether the person is a native speaker or not. We will predict the "nativeSpeaker" column from the variables "age", "shoeSize" and "score".
Here is the sample data.
# Load the party package.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))

When we execute the above code, it produces the following result −

  nativeSpeaker   age   shoeSize      score
1           yes     5   24.83189   32.29385
2           yes     6   25.95238   36.63105
3            no    11   30.42170   49.60593
4           yes     7   28.66450   40.28456
5           yes    11   31.88207   55.46085
6           yes    10   30.07843   52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................
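
Beyond the first few records, it can help to check the column types and the class balance before building a tree. A short sketch using standard R functions, assuming the party package is installed:

```r
library(party)

# Column names, types and a preview of the values.
str(readingSkills)

# Per-column summary: counts for the factor, quartiles for the numeric columns.
print(summary(readingSkills))

# Number of native and non-native speakers in the data set.
print(table(readingSkills$nativeSpeaker))
```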


Example:

We will use the ctree() function to create the decision tree and see its graph.

library(party)

# Create the input data frame.
input.dat <- readingSkills[c(1:105),]

# Give the chart file a name.
png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.dat)

# Plot the tree.
plot(output.tree)

# Save the file.
dev.off()
When we execute the above code, it produces the following result −
null device 
          1 
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
[Image: Decision Tree using R]
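
The example above trains on only the first 105 rows of readingSkills. One possible way to check how well the tree generalizes is to predict the remaining rows and compare the result with the true labels. This is a sketch rather than part of the original example; the names train.dat, test.dat, fit, pred, conf.mat and accuracy are illustrative.

```r
library(party)

# Same training rows as in the example; the remaining rows act as a test set.
train.dat <- readingSkills[1:105, ]
test.dat  <- readingSkills[106:nrow(readingSkills), ]

# Fit the tree on the training rows only.
fit <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train.dat)

# Predicted class for each held-out row.
pred <- predict(fit, newdata = test.dat)

# Confusion table: rows are predictions, columns are the true labels.
conf.mat <- table(predicted = pred, actual = test.dat$nativeSpeaker)
print(conf.mat)

# Share of held-out rows that were classified correctly.
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
print(accuracy)
```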


Conclusion:

From the above tree, we can conclude that anyone whose reading score is less than 38.3 and whose age is more than 6 is classified as not a native speaker.
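
The split rules behind a conclusion like this can also be read as text instead of from the plot. The sketch below refits the same tree and prints it; the names fit and node.id are illustrative, and the exact thresholds shown depend on the fitted tree.

```r
library(party)

# Refit the tree on the same 105 rows used in the example.
train.dat <- readingSkills[1:105, ]
fit <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train.dat)

# Text version of the tree, with the split rule of each node.
print(fit)

# Terminal node reached by each training row.
node.id <- predict(fit, type = "node")

# Class counts per terminal node, to see which leaves are mostly "no".
print(table(node = node.id, nativeSpeaker = train.dat$nativeSpeaker))
```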

That's it!
Thank you and keep visiting.