In order to perform PCA the dot product will be used. The dot product is computed in parallel fashion in NPS. The aggregate is downloaded to R client. Then mean values are subtracted from the dot product to obtain covariance matrix. The covariance matrix is then rescaled to correlation matrix. This matrix is passed to princomp() function.
Two R packages are used to connect with database and compute in-database aggregates. These are: nzr and nza. The nzConnect() function is used to connect with database witl. Then the nzDotProduct() function is used to compute dot product in parallel in NPS. Below an example usage of PCA is presented.
> nzConnect("user","password","10.1.1.74","witl")
> nzSurvey = nz.data.frame("survey")
> tmp = nzDotProduct(~Q8+factor(Q2)+factor(Q5), nzSurvey, TRUE)
> p = ncol(tmp$mat)
> means = tmp$mat[p,-p] / tmp$mat[p,p]
> # centering the dot product matrix to covariance matrix
> centered = tmp$mat[-p,-p] - outer(means, means) * tmp$mat[p,p]
> # scaling covariance matrix to correlation matrix
> scaled = cov2cor(centered)
> # PCA with use of princomp()
> (pca = princomp(covmat = scaled))
Call:
princomp(covmat = scaled)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
1.481848e+00 1.403128e+00 1.155283e+00 9.222890e-01 8.062642e-01 2.980232e-08 0.000000e+00
7 variables and NA observations.
> plot(pca)
> loadings(pca)
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
Q8 0.323 -0.243 0.860 0.312
Q2.1 -0.304 0.559 0.755 -0.142
Q2.2 -0.222 -0.662 -0.127 -0.697
Q2.3 0.282 0.644 -0.702
Q5.1 -0.392 0.146 0.454 0.459 -0.506 -0.390
Q5.2 0.583 -0.269 0.248 -0.175 -0.700
Q5.3 -0.427 0.220 -0.587 0.242 -0.598
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Proportion Var 0.143 0.143 0.143 0.143 0.143 0.143 0.143
Cumulative Var 0.143 0.286 0.429 0.571 0.714 0.857 1.000
> par(mar=c(5,5,2,2))
> plot(-1:1, -1:1, col="white", xlab="PC1", ylab="PC2")
> arrows(0,0,pca$loadings[,1], pca$loadings[,2])
> text(1.2* pca$loadings[,1], 1.2* pca$loadings[,2], rownames(pca$loadings))
Below the output figures are presented.
To perform PCA also the nzPCA() function from nza package may be used.

