
Growing some Trees

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),

> MYOCARDE=read.table(
+ "http://freakonometrics.free.fr/saporta.csv",
+ header=TRUE,sep=";")

The default classification tree is

> library(rpart)
> library(rpart.plot)
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> rpart.plot(arbre,type=4,extra=6)


We can change the options here, such as the minimum number of observations a node must contain before a split is attempted (minsplit, which is 20 by default),

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+       control=rpart.control(minsplit=10))
> rpart.plot(arbre,type=4,extra=6)


or

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+        control=rpart.control(minsplit=5))
> rpart.plot(arbre,type=4,extra=6)
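To get a feel for what minsplit does without looking at the plots, one can count the leaves of the fitted trees. The sketch below is only an illustration: it uses simulated data rather than MYOCARDE, and n_leaves is a hypothetical helper, not part of rpart. Note that when only minsplit is given, rpart also sets minbucket to round(minsplit/3), so both constraints loosen together; a smaller minsplit typically (but not necessarily) gives a larger tree.

```r
library(rpart)

# Simulated stand-in data (NOT the MYOCARDE dataset): 60 observations,
# two numeric features, and a noisy binary outcome depending on both.
set.seed(1)
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(ifelse(x1 + x2 + rnorm(n, sd = 0.8) > 0, "yes", "no"))
d  <- data.frame(y, x1, x2)

# Hypothetical helper: number of terminal nodes of a fitted rpart tree
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

# Same model, three values of minsplit (20 is the rpart default)
for (ms in c(20, 10, 5)) {
  fit <- rpart(y ~ ., data = d, control = rpart.control(minsplit = ms))
  cat("minsplit =", ms, "-> leaves:", n_leaves(fit), "\n")
}
```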


To visualize that classification, use the following code (to get a projection on the first two components)

> library(FactoMineR) # PCA (on the continuous variables)
> X=MYOCARDE[,1:7]
> acp=PCA(X,ncp=ncol(X))
> M=acp$var$coord
> Minv=solve(M) # inverse of M, used below to map component coordinates back to the variables
> m=apply(X,2,mean)
> s=apply(X,2,sd)
> 
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> pred2=function(d1,d2,Mat,tree){
+   z=Mat %*% c(d1,d2,rep(0,ncol(X)-2))
+   newd=data.frame(t(z*s+m))
+   names(newd)=names(X)
+   predict(tree,newdata=newd,
+           type="prob")[2] }
> p=function(d1,d2) pred2(d1,d2,Minv,arbre)
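The key step inside pred2 is mapping a point of the PCA plane back to the scale of the original variables, so that the tree can be asked for a prediction there. The sketch below isolates that step under different assumptions from the post: it uses simulated data and base R's prcomp instead of FactoMineR, and back is a hypothetical helper name. With prcomp the loadings matrix is orthonormal, so no explicit inverse is needed.

```r
# Simulated stand-in for the continuous variables (NOT MYOCARDE)
set.seed(2)
X <- as.data.frame(matrix(rnorm(50 * 4), 50, 4))
names(X) <- c("a", "b", "c", "d")

m <- apply(X, 2, mean)   # column means
s <- apply(X, 2, sd)     # column standard deviations

pc <- prcomp(X, center = TRUE, scale. = TRUE)
R  <- pc$rotation        # orthonormal loadings: scores = scale(X) %*% R

# Hypothetical helper: map coordinates (d1, d2) on the first two
# components (all other components set to 0) back to a one-row
# data frame on the original scale of the variables.
back <- function(d1, d2) {
  scores <- c(d1, d2, rep(0, ncol(X) - 2))
  z <- R %*% scores                  # standardized variables
  newd <- data.frame(t(z * s + m))   # undo centring and scaling
  names(newd) <- names(X)
  newd
}

back(0, 0)   # the origin of the PCA plane maps to the variable means
back(1, -1)  # any other point gives one row ready for predict()
```

The returned data frame has the same column names as X, which is what predict() needs in its newdata argument.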

> Outer <- function(x,y,fun) {
+   mat <- matrix(NA, length(x), length(y))
+   for (i in seq_along(x)) {
+     for (j in seq_along(y)) 
+       mat[i,j]=fun(x[i],y[j])}
+   return(mat)}
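The loop-based Outer helper is needed because base R's outer() passes whole vectors to the function, while p handles one point at a time. A possible alternative, sketched here with a toy scalar function f (an assumption standing in for p), is to wrap the function with Vectorize before handing it to outer().

```r
# Toy scalar function standing in for p(d1, d2); like p, it only
# accepts one pair of values at a time.
f <- function(a, b) if (a > b) 1 else 0

# Loop-based helper, as in the post
Outer <- function(x, y, fun) {
  mat <- matrix(NA, length(x), length(y))
  for (i in seq_along(x)) {
    for (j in seq_along(y)) mat[i, j] <- fun(x[i], y[j])
  }
  mat
}

x <- seq(-1, 1, length = 5)
y <- seq(-1, 1, length = 4)

z_loop <- Outer(x, y, f)
z_base <- outer(x, y, Vectorize(f))  # Vectorize wraps f with mapply

all.equal(z_loop, z_base)            # both give the same 5 x 4 matrix
```

Both versions still call the function once per grid point, so this is a matter of style rather than speed.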

> xgrid=seq(-5,5,length=251)
> ygrid=seq(-5,5,length=251)
> zgrid=Outer(xgrid,ygrid,p)
> bluereds=c(
+   rgb(1,0,0,(10:0)/25),rgb(0,0,1,(0:10)/25))

> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgrid,add=TRUE,col=bluereds)
> contour(xgrid,ygrid,zgrid,add=TRUE,levels=.5)


It is also possible to consider the case where the tree is grown with minsplit=5 (after refitting, recompute zgrid and redraw the plot as above),

> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+        control=rpart.control(minsplit=5))


Finally, one can also grow more trees, obtained by sampling. This is the idea of bagging: we bootstrap our observations, grow a tree on each bootstrap sample, and then aggregate the predicted values. On the grid

> xgrid=seq(-5,5,length=201)
> ygrid=seq(-5,5,length=201)

the code is the following,

> Z = matrix(0,201,201)
> for(i in 1:200){
+ indice = sample(1:nrow(MYOCARDE),
+          size=nrow(MYOCARDE),
+          replace=TRUE)
+ ECHANTILLON=MYOCARDE[indice,]
+ arbre_b = rpart(factor(PRONO)~.,
+   data=ECHANTILLON)
+ p2 = function(d1,d2) pred2(d1,d2, Minv,arbre_b)
+ zgrid2_b = Outer(xgrid,ygrid,p2)
+ Z = Z+zgrid2_b }
> Zgrid = Z/200
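The same three steps (bootstrap, grow, aggregate) can be seen more directly on a handful of points instead of a full grid. The following is a minimal sketch under stated assumptions: the data are simulated rather than MYOCARDE, and the number of trees B = 50 is an arbitrary choice.

```r
library(rpart)

# Simulated stand-in data (NOT MYOCARDE)
set.seed(3)
n  <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- factor(ifelse(x1 - x2 + rnorm(n) > 0, "1", "0"))
d  <- data.frame(y, x1, x2)

# A few new points at which to predict
newd <- data.frame(x1 = c(-2, 0, 2), x2 = c(2, 0, -2))

B <- 50                    # number of bootstrap trees (arbitrary)
P <- numeric(nrow(newd))   # running sum of predicted probabilities

for (b in 1:B) {
  idx  <- sample(1:n, size = n, replace = TRUE)  # bootstrap sample
  tree <- rpart(y ~ ., data = d[idx, ])          # one tree per sample
  # column "1" holds the estimated probability of class "1"
  P <- P + predict(tree, newdata = newd, type = "prob")[, "1"]
}

P_bag <- P / B             # bagged probability at each new point
P_bag
```

Averaging probabilities over trees is what smooths the piecewise-constant prediction of a single tree.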

To visualize it, use

> plot(acp2, habillage = 8,
+ col.hab=c("red","blue"))
> image(xgrid,ygrid,Zgrid,add=TRUE,
+ col=bluereds)


> contour(xgrid,ygrid,Zgrid,add=TRUE,
+ levels=.5,lwd=3)


Last, but not least, it is possible to use a random forest algorithm. The method combines Breiman’s bagging idea (mentioned previously) with the random selection of features at each split.

> library(randomForest)
> foret = randomForest(factor(PRONO)~.,
+          data=MYOCARDE)
> pF=function(d1,d2) pred2(d1,d2,Minv,foret)
> zgridF=Outer(xgrid,ygrid,pF)
 
> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgridF,add=TRUE,
+ col=bluereds)
> contour(xgrid,ygrid,zgridF,
+ add=TRUE,levels=.5,lwd=3)

