In the following example, the dataset "A Week in the Life of a Browser" is used to show how to prepare simple data visualizations with i-Class and NPS (Netezza Performance Server, see www.netezza.com). i-Class is used in version LA1 (Limited Availability v1). There is no guarantee that this code will work in later releases of i-Class.
The dataset "A Week in the Life of a Browser" was published by Mozilla Labs and is available under the terms of the Creative Commons License. The dataset and its description are available here: https://testpilot.mozillalabs.com/testcases/a-week-life-2/aggregated-data.html
We use only the R interface, so no SQL code is needed to prepare the following figures. Note that the data are stored on a remote server and are too large to download to the R client. Thus all aggregates are computed in NPS, and only small summaries are downloaded to the R client to prepare the figures.
Two R packages are used to connect to the database and compute in-database aggregates: nzr and nza. The nzConnect() function is used to connect to the database witl. There are three tables in this database: users, events and survey.
They are described here: https://testpilot.mozillalabs.com/testcases/a-week-life-2/aggregated-data.html. The nzTable() function is then used to compute a contingency table in NPS; this table is downloaded and used to prepare the plot.
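To make the notion of a contingency table concrete, the sketch below computes one locally with base R on a tiny made-up data frame. It only illustrates the kind of small summary that ends up in the R client: the data frame survey_local, its values, and the labels are hypothetical, and no NPS function is called here.

```r
# Hypothetical local stand-in for the remote survey table (made-up values).
survey_local <- data.frame(
  Q6 = c(1, 2, 2, 3, 3, 3, 4),  # answer codes for one question, illustrative only
  Q5 = c(0, 1, 0, 0, 1, 0, 0)   # answer codes for another question, illustrative only
)

# One-variable contingency table: number of rows per answer code.
tab <- table(survey_local$Q6)

# Turn it into a plain named vector with readable labels,
# given in the order of the sorted answer codes.
labels <- c("Under 18", "18-25", "26-35", "36-45")
counts <- setNames(as.vector(tab), labels)
print(counts)

# Two-variable contingency table built from a one-sided formula.
two_way <- xtabs(~ Q5 + Q6, data = survey_local)
print(two_way)
```

Only objects of this size (a handful of counts) need to travel from the server to the client, which is why the aggregation is pushed into the database.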
# Simple function which summarizes a single qualitative variable.
# form is an R formula with one variable
# nzdf is an nz.data.frame object
# names is a vector of names for the corresponding values of the considered variable
# title is the main title for the produced figure
getSimpleSummary <- function(form, nzdf, names, title) {
  # get the contingency table from NPS
  tmp = nzTable(form, nzdf, FALSE)
  tmp = nzSparse2matrix(na.omit(as.data.frame(tmp$tab)))
  # sort the counts by the numeric answer code
  tmp = tmp[order(as.numeric(names(tmp)))]
  # add names
  names(tmp) = names
  # prepare graphical layout: dot chart on top, stacked bar below
  par(xpd=FALSE, mar=c(5,2,2,2))
  layout(1:2, widths=c(1,1), heights=c(4,1))
  # plot the variable summaries
  dotchart(tmp, xlim=c(0,2000), pch=19, main=title, xlab="number of surveys")
  par(xpd=FALSE, mar=c(1,2,2,2))
  barplot(as.matrix(tmp), horiz=TRUE, xaxt="n")
  # allow labels to be drawn outside the plot region
  par(xpd=NA)
  text(cumsum(tmp), 1.5, names(tmp), adj=c(1,1), cex=0.8)
}
# connect to database and create pointer to table survey
nzConnect("user","password","10.1.1.74","witl")
nzSurvey = nz.data.frame("survey")
# show summaries for question 6 from survey
getSimpleSummary(~Q6, nzSurvey,
  c("Under 18", "18-25", "26-35", "36-45", "46-55", "Older than 55"),
  "How old are you?")
# show summaries for question 7 from survey
getSimpleSummary(~Q7, nzSurvey,
  c("Less than 1 hour", "1-2 hours", "2-4 hours", "4-6 hours", "6-8 hours",
    "8-10 hours", "More than 10 hours"),
  "How much time do you spend on the Web each day?")
#
# for two variables
# relation between gender and computer skill level
tt = nzTable(Q5~Q8, nzSurvey, TRUE)$mat
rownames(tt) = c("male","female")
mosaicplot(tt, col=rainbow(10), main="", ylab="Computer/web skill level", xlab="Gender")
# empirical distribution functions of skill level, one curve per gender
plot(ecdf(as.numeric(rep(names(tt[1,]), tt[1,]))), main="", xlab="Computer/web skill level")
plot(ecdf(as.numeric(rep(names(tt[2,]), tt[2,]))), add=TRUE, col="red")
legend("left", c("male","female"), col=c(1,2), lwd=2)
The resulting plots are presented below.
Figure for question 6
Figure for question 7
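The rep()/ecdf() idiom at the end of the listing expands a row of counts back into individual observations before the empirical distribution function is computed. The following is a minimal local sketch of that step, assuming a table shaped like the one used in the listing; the matrix tt_local and its counts are made up for illustration, and expand_row is a hypothetical helper, not part of nzr or nza.

```r
# Hypothetical 2 x 4 table of counts (made-up numbers):
# rows are gender, columns are skill-level codes.
tt_local <- matrix(c(5, 10, 20, 15,
                     8, 12, 10,  5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("male", "female"), c("1", "2", "3", "4")))

# Expand one row of counts into a numeric vector with one entry per respondent.
expand_row <- function(counts) {
  as.numeric(rep(names(counts), counts))
}

male_ecdf   <- ecdf(expand_row(tt_local["male", ]))
female_ecdf <- ecdf(expand_row(tt_local["female", ]))

# Share of respondents with skill level <= 2 in each group.
print(male_ecdf(2))
print(female_ecdf(2))

# Overlay both curves, as in the listing.
plot(male_ecdf, main = "", xlab = "Computer/web skill level")
plot(female_ecdf, add = TRUE, col = "red")
legend("left", c("male", "female"), col = c(1, 2), lwd = 2)
```

Expanding the counts is cheap here because the table is already aggregated; the expanded vector has as many entries as there are respondents in the row, not as many as there are rows in the remote table.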


