Wednesday, 15 December 2010

Decision trees with R and i-Class

Once again the dataset "A Week in the Life of a Browser" is in use. This time the in-database decision tree procedure is presented.

Here, a decision tree is used to find the variables that discriminate between two classes of Firefox users: those who use only Firefox and those who use Firefox and some other browser (question Q2 in the survey, see the dataset description https://testpilot.mozillalabs.com/testcases/a-week-life-2/aggregated-data.html).

i-Class provides an in-database DECTREE procedure. Its R wrapper, nzDecTree(), is available in the nza package and is presented below. Note that the tree model is built in the database, and only the model description is downloaded to R.

This has two advantages: the model is built in parallel, and there is no need to send a large dataset from the database to R.
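For contrast, here is a minimal sketch of the conventional in-memory route, where the full table has to sit in R before the tree is fitted. It assumes the rpart package as a local stand-in (it is not part of nza), and the data frame survey_local is a made-up toy table with random values, not the real survey.

```r
# Local (in-memory) analogue of an in-database tree fit, for contrast only.
# Assumption: rpart stands in for the local fitting step; it is not part of nza.
library(rpart)

set.seed(1)
n <- 200
# Toy stand-in for the survey table; values are random, not real survey answers.
survey_local <- data.frame(
  Q1 = factor(sample(0:3, n, replace = TRUE)),
  Q5 = factor(sample(0:5, n, replace = TRUE)),
  Q6 = factor(sample(0:4, n, replace = TRUE)),
  Q7 = factor(sample(0:3, n, replace = TRUE)),
  Q2 = factor(sample(c("only Firefox", "other browsers"), n, replace = TRUE))
)

# The whole data frame must already be in R's memory before fitting.
fit_local <- rpart(Q2 ~ Q1 + Q5 + Q6 + Q7, data = survey_local,
                   method = "class",
                   control = rpart.control(minsplit = 10))
print(fit_local)
```

With the in-database approach the step that builds survey_local disappears: the rows stay in the database and only the fitted model travels to R.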

> nzConnect("user","password","10.1.1.74","witl") 
> nzSurvey = nz.data.frame("survey")
> # build decision tree in NPS and download it to R
> treeT = nzDecTree(Q2~Q1+Q5+Q6+Q7, nzSurvey, id="IUSER_ID", minsplit=10)
> # relabel nodes and levels; the original raw variable names
> #   (Q2, Q5, Q8) are not very human-readable
> levels(treeT$frame$yval) = c("only Firefox","other browsers")
> levels(treeT$frame$var) = c("", "How long use Firefox", "How old are you", 
+            "How much time do you spend on the Web")
> # generic function for decision tree plotting and printing 
> plot(treeT)
> treeT
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 4081 0 other browsers ( 0.3416 0.6584 )  
   2) How long use Firefox=0 24 0 other browsers ( 0.1667 0.8333 )  
     4) How much time do you spend on the Web=0 20 0 other browsers ( 0.0000 1.0000 ) *
     5) How much time do you spend on the Web < >0 4 0 only Firefox ( 0.7500 0.2500 ) *
   3) How long use Firefox < >0 4057 0 other browsers ( 0.3379 0.6621 )  
     6) How old are you=0 450 0 only Firefox ( 0.5067 0.4933 )  
      12) How long use Firefox=0 33 0 other browsers ( 0.2424 0.7576 ) *
      13) How long use Firefox < >0 417 0 only Firefox ( 0.5276 0.4724 ) *
     7) How old are you < >0 3607 0 other browsers ( 0.3169 0.6831 ) *
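
To read the printout: each line gives the node number, the split, the number of respondents n, the deviance, the predicted class, and the class probabilities. The sketch below is an illustration rather than part of the original session. It copies the terminal nodes by hand from the output above and compares the tree's training error with always predicting the majority class at the root. The names leaves, p_only, tree_error and root_error are made up for this example.

```r
# Terminal nodes (marked with *) transcribed by hand from the printed tree above.
leaves <- data.frame(
  node   = c(4, 5, 12, 13, 7),
  n      = c(20, 4, 33, 417, 3607),
  p_only = c(0.0000, 0.7500, 0.2424, 0.5276, 0.3169)  # P(only Firefox)
)

# Every respondent lands in exactly one leaf, so leaf sizes add up to the root size.
stopifnot(sum(leaves$n) == 4081)

# A leaf predicts its majority class, so its error share is the minority probability.
leaves$p_err <- pmin(leaves$p_only, 1 - leaves$p_only)

# Training error of the tree: leaf errors weighted by leaf size.
tree_error <- sum(leaves$n * leaves$p_err) / sum(leaves$n)

# Baseline: always predict the root's majority class ("other browsers").
root_error <- 0.3416

round(c(root = root_error, tree = tree_error), 4)
```

On these numbers the tree misclassifies about 33.1% of the training respondents, against about 34.2% for the root-only baseline. So the splits separate the two classes only slightly on the training data.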