Soil chemical variables in samples on a Tectona grandis plantation

Wagner H. Bonat Walmes M. Zeviani

`wbonat@ufpr.br` `walmes@ufpr.br`

LEG/DEST/UFPR LEG/DEST/UFPR

Wagner H. Bonat	Walmes M. Zeviani
`wbonat@ufpr.br`	`walmes@ufpr.br`
LEG/DEST/UFPR	LEG/DEST/UFPR

62^a RBras & 17^o SEAGRO
July 24–28, 2017
UFLA, Lavras/MG

1 Data description and goals
2 Multivariate linear model (MLM)
3 Modelling the covariance structure using the mcglm package
4 References
5 Session information

1 Data description and goals

The dataset contains the values of chemical variables of 3 layers of the soil in 50 sites cultivated with teak (Tectona grandis). The study was carried out in 2015, in Teca plantations belonging to two farms located in the western region of the state of Mato Grosso. The selection of study areas within the farms was done by means of free walking, covering the entire cultivated area of 1869 ha, making field observations and delimiting plots within the stands from the characteristics of the soils, position in the landscape and development of culture. Fifty plots with 600 m² (20 x 30 m) each were allocated. As a selection criterion, only fields with an area of more than seven hectares (7 ha) were selected, using only the areas with the same planting density, management practices aged 13-14 years.

The main goal of the study is to conduct a profile analysis to verify if there is a general pattern of cation concentration (K⁺, Ca²⁺ and Mg²⁺) in the soil layers.

This dataset is part of data set with more tables (teca_arv, teca_crapar, teca_cra, teca_gran and teca_qui). These data are available on the package EACS (http://leg.ufpr.br/~walmes/pacotes/EACS/index.html). Please do not use datasets without the researchers’ permission because the results of these researches are being submitted to journals at the moment. If you want to use the data to propose statistical methodologies, contact the researchers as well.

2 Multivariate linear model (MLM)

2.1 Loading the dataset

#-----------------------------------------------------------------------
# Packages.

library(latticeExtra)
library(car)
library(reshape2)
library(corrplot)
library(grid)
library(gridExtra)
library(mcglm)
library(Matrix)

#-----------------------------------------------------------------------
# Getting the dataset.

# Online documentation of the EACS::teca_qui dataset.
u <- "http://leg.ufpr.br/~walmes/pacotes/EACS/reference/teca_qui.html"
browseURL(u)

csv <- "https://raw.githubusercontent.com/walmes/EACS/master/data-raw/teca_qui.csv"
teca <- read.csv2(file = csv, dec = ".")
str(teca)

## 'data.frame':    150 obs. of  15 variables:
##  $ loc: int  1 1 1 2 2 2 3 3 3 4 ...
##  $ cam: Factor w/ 3 levels "[0, 5)","[40, 80)",..: 1 3 2 1 3 2 1 3 2 1 ...
##  $ ph : num  6.8 6.7 6.7 4.7 4.7 4.9 7.6 6.8 6.9 6.6 ...
##  $ p  : num  22.51 0.83 0.01 3.89 0.69 ...
##  $ k  : num  72.24 13.42 7.23 48.13 12.34 ...
##  $ ca : num  8.27 2.91 2.33 0.97 0.76 0.21 7.63 3.22 2.22 5.54 ...
##  $ mg : num  1.7 1.77 0.51 0.16 0.14 0 1.53 1.31 1.09 1.77 ...
##  $ al : num  0 0 0 0.3 0.6 0.6 0 0 0 0 ...
##  $ ctc: num  12.47 6.57 4.52 5.3 4.17 ...
##  $ sat: num  81.4 71.7 63.2 23.7 22.4 ...
##  $ mo : num  72.2 25.6 9.7 34.4 8.7 9.7 50.4 15.6 9.5 50.2 ...
##  $ arg: num  184 215 286 232 213 ...
##  $ are: num  770 750 674 741 775 ...
##  $ cas: num  1.8 2.2 3.7 0.4 1.1 1.7 8.4 19 14.3 5.5 ...
##  $ acc: num  770 750 676 741 775 ...

teca$cam <- factor(teca$cam, labels = 1:3)

2.2 Exploratory data analysis

#-----------------------------------------------------------------------
# Graphical exploratory analysis.

# The cations.
v <- c("k", "ca", "mg")
summary(teca[v])

##        k                ca               mg        
##  Min.   :  4.13   Min.   : 0.210   Min.   :0.0000  
##  1st Qu.: 15.64   1st Qu.: 1.305   1st Qu.:0.3825  
##  Median : 31.47   Median : 2.445   Median :1.1300  
##  Mean   : 44.46   Mean   : 3.567   Mean   :1.2361  
##  3rd Qu.: 59.25   3rd Qu.: 4.835   3rd Qu.:1.8025  
##  Max.   :188.96   Max.   :14.410   Max.   :4.9200

# Only the cations.
densityplot(~k + ca + I(mg + 0.1),
            outer = TRUE,
            groups = cam,
            scales = list(relation = "free",
                          x = list(log = 10)),
            as.table = TRUE,
            data = teca)

scatterplotMatrix(~log(k) + log(ca) + log(mg + 0.1) | cam,
                  data = teca,
                  gap = 0,
                  smooth = FALSE,
                  reg.line = FALSE,
                  ellipse = TRUE,
                  by.groups = TRUE,
                  diagonal = "qqplot")

# Transformed copies of the variables.
teca <- transform(teca,
                  lk = log(k),
                  lca = log(ca),
                  lmg = log(mg + 0.1))

# Names of the transformed variables.
v <- tail(names(teca), n = 3)
v

## [1] "lk"  "lca" "lmg"

#-----------------------------------------------------------------------
# Repeated measures design.

# Outer factor: none.
# Inner factor: soil layer (`cam`)
# Responses: 3 cations x 3 layers = 9 conditions.

# Long format.
tecal <- melt(data = teca[c("loc", "cam", v)],
              measure.vars = v,
              variable.name = "res")
str(tecal)

## 'data.frame':    450 obs. of  4 variables:
##  $ loc  : int  1 1 1 2 2 2 3 3 3 4 ...
##  $ cam  : Factor w/ 3 levels "1","2","3": 1 3 2 1 3 2 1 3 2 1 ...
##  $ res  : Factor w/ 3 levels "lk","lca","lmg": 1 1 1 1 1 1 1 1 1 1 ...
##  $ value: num  4.28 2.6 1.98 3.87 2.51 ...

bwplot(value ~ cam | res,
       pch = "|",
       # scales = list(y = list(relation = "free")),
       layout = c(NA, 1),
       xlab = "Soil layer",
       ylab = "Observed values in the transformed scale",
       data = tecal) +
    layer(panel.xyplot(x = x,
                       y = y,
                       jitter.x = TRUE,
                       type = c("p", "a")))

# Combine 3 responses x 3 layers = 9 response variables.
tecal$res.cam <- with(tecal, paste(res, cam, sep = "."))

# Wide format.
tecaw <- dcast(data = tecal,
               formula = loc ~ res.cam,
               value = "value",
               stringsAsFactors = FALSE)

# Presented order of the responses.
ord <- c(t(outer(Y = levels(tecal$cam),
                 X = v,
                 FUN = paste,
                 sep = ".")))

tecaw <- tecaw[, c("loc", ord)]
str(tecaw)

## 'data.frame':    50 obs. of  10 variables:
##  $ loc  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ lk.1 : num  4.28 3.87 4.74 3.99 4.04 ...
##  $ lk.2 : num  1.98 2.16 3.29 2.61 2.61 ...
##  $ lk.3 : num  2.6 2.51 3.25 2.41 3.1 ...
##  $ lca.1: num  2.1126 -0.0305 2.0321 1.712 1.0953 ...
##  $ lca.2: num  0.8459 -1.5606 0.7975 -0.2877 0.0198 ...
##  $ lca.3: num  1.0682 -0.2744 1.1694 0.8109 -0.0726 ...
##  $ lmg.1: num  0.588 -1.347 0.489 0.626 0.798 ...
##  $ lmg.2: num  -0.494 -2.303 0.174 -1.347 -1.966 ...
##  $ lmg.3: num  0.626 -1.427 0.344 -1.171 -0.478 ...

2.3 Repeated measures analysis

#-----------------------------------------------------------------------
# Repeated measures analysis.

dput(names(tecaw)[-1])

## c("lk.1", "lk.2", "lk.3", "lca.1", "lca.2", "lca.3", "lmg.1", 
## "lmg.2", "lmg.3")

# Multivariate linear model for 9 responses.
m0 <- lm(as.matrix(tecaw[, 2:10]) ~ 1)
m0

## 
## Call:
## lm(formula = as.matrix(tecaw[, 2:10]) ~ 1)
## 
## Coefficients:
##              lk.1     lk.2     lk.3     lca.1    lca.2    lca.3    lmg.1  
## (Intercept)   4.2521   2.9425   3.1651   1.5684   0.3443   0.7683   0.4740
##              lmg.2    lmg.3  
## (Intercept)  -0.4681  -0.1056

# summary(m0)
# summary.aov(m0)
anova(m0)

## Analysis of Variance Table
## 
##             Df  Pillai approx F num Df den Df    Pr(>F)    
## (Intercept)  1 0.99413   771.53      9     41 < 2.2e-16 ***
## Residuals   49                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Extract the raw residuals.
r <- residuals(m0)

# Checking the models assumptions on the residuals.
scatterplotMatrix(r,
                  gap = 0,
                  smooth = FALSE,
                  reg.line = FALSE,
                  ellipse = TRUE,
                  diagonal = "qqplot")

corrplot(cor(r),
         type = "upper",
         tl.pos = "d",
         outline = TRUE,
         method = "ellipse")

# Inner factors data design.
# DANGER this order changes de SS.
idata <- expand.grid(cam = levels(tecal$cam),
                     res = v,
                     KEEP.OUT.ATTRS = FALSE)
str(idata)

## 'data.frame':    9 obs. of  2 variables:
##  $ cam: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
##  $ res: Factor w/ 3 levels "lk","lca","lmg": 1 1 1 2 2 2 3 3 3

Anova(m0,
      idata = idata,
      idesign = ~res * cam)

## Note: model has only an intercept; equivalent type-III tests substituted.

## 
## Type III Repeated Measures MANOVA Tests: Pillai test statistic
##             Df test stat approx F num Df den Df    Pr(>F)    
## (Intercept)  1   0.86275   308.00      1     49 < 2.2e-16 ***
## res          1   0.97606   978.44      2     48 < 2.2e-16 ***
## cam          1   0.86876   158.87      2     48 < 2.2e-16 ***
## res:cam      1   0.32750     5.60      4     46  0.000929 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.4 Testing linear hypotheses on the responses

#-----------------------------------------------------------------------

# Regression parameters.
B <- coef(m0)
colnames(B)

## [1] "lk.1"  "lk.2"  "lk.3"  "lca.1" "lca.2" "lca.3" "lmg.1" "lmg.2" "lmg.3"

# The L matrix is only a scalar.
L <- matrix(1, nrow = 1, ncol = 1)

# Check the orders.
cbind(idata, res.cam = cbind(colnames(B)))

##   cam res res.cam
## 1   1  lk    lk.1
## 2   2  lk    lk.2
## 3   3  lk    lk.3
## 4   1 lca   lca.1
## 5   2 lca   lca.2
## 6   3 lca   lca.3
## 7   1 lmg   lmg.1
## 8   2 lmg   lmg.2
## 9   3 lmg   lmg.3

# To test effect of responses.
M_res <- cbind(c(-1, -1, -1, -1, -1, -1, 2, 2, 2),
               c(-1, -1, -1,  1,  1,  1, 0, 0, 0))

linearHypothesis(m0,
                 test = "Pillai",
                 hypothesis.matrix = L,
                 P = M_res)

## 
##  Response transformation matrix:
##       [,1] [,2]
## lk.1    -1   -1
## lk.2    -1   -1
## lk.3    -1   -1
## lca.1   -1    1
## lca.2   -1    1
## lca.3   -1    1
## lmg.1    2    0
## lmg.2    2    0
## lmg.3    2    0
## 
## Sum of squares and products for the hypothesis:
##          [,1]     [,2]
## [1,] 8764.849 5083.320
## [2,] 5083.320 2948.156
## 
## Sum of squares and products for error:
##           [,1]      [,2]
## [1,] 254.64335  60.57668
## [2,]  60.57668 175.97410
## 
## Multivariate Test: 
##        Df test stat approx F num Df den Df     Pr(>F)    
## Pillai  1 0.9760584 978.4396      2     48 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# To test effect of soil layer.
M_cam <- cbind(c(-1, -1, 2, -1, -1, 2, -1, -1, 2),
               c(-1,  1, 0, -1,  1, 0, -1,  1, 0))

linearHypothesis(m0,
                 test = "Pillai",
                 hypothesis.matrix = L,
                 P = M_cam)

## 
##  Response transformation matrix:
##       [,1] [,2]
## lk.1    -1   -1
## lk.2    -1    1
## lk.3     2    0
## lca.1   -1   -1
## lca.2   -1    1
## lca.3    2    0
## lmg.1   -1   -1
## lmg.2   -1    1
## lmg.3    2    0
## 
## Sum of squares and products for the hypothesis:
##          [,1]     [,2]
## [1,] 106.2188 253.3122
## [2,] 253.3122 604.1031
## 
## Sum of squares and products for error:
##           [,1]     [,2]
## [1,] 108.22479 19.95788
## [2,]  19.95788 94.89703
## 
## Multivariate Test: 
##        Df test stat approx F num Df den Df     Pr(>F)    
## Pillai  1 0.8687589 158.8695      2     48 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# GOOD: To easy get the M matrices, creates a model matrix.
M <- model.matrix(~res * cam,
                  data = idata,
                  contrasts = list(res = contr.sum,
                                   cam = contr.sum))
a <- attr(M, "assign")

# This design matrix has all the slices needed.
M

##   (Intercept) res1 res2 cam1 cam2 res1:cam1 res2:cam1 res1:cam2 res2:cam2
## 1           1    1    0    1    0         1         0         0         0
## 2           1    1    0    0    1         0         0         1         0
## 3           1    1    0   -1   -1        -1         0        -1         0
## 4           1    0    1    1    0         0         1         0         0
## 5           1    0    1    0    1         0         0         0         1
## 6           1    0    1   -1   -1         0        -1         0        -1
## 7           1   -1   -1    1    0        -1        -1         0         0
## 8           1   -1   -1    0    1         0         0        -1        -1
## 9           1   -1   -1   -1   -1         1         1         1         1
## attr(,"assign")
## [1] 0 1 1 2 2 3 3 3 3
## attr(,"contrasts")
## attr(,"contrasts")$res
##     [,1] [,2]
## lk     1    0
## lca    0    1
## lmg   -1   -1
## 
## attr(,"contrasts")$cam
##   [,1] [,2]
## 1    1    0
## 2    0    1
## 3   -1   -1

# ATTENTION: The order or lines in the `idesign` and in the `B`
# parameter matrix must match.

linearHypothesis(m0,
                 test = "Pillai",
                 hypothesis.matrix = L,
                 P = M[, a == 3])

## 
##  Response transformation matrix:
##       res1:cam1 res2:cam1 res1:cam2 res2:cam2
## lk.1          1         0         0         0
## lk.2          0         0         1         0
## lk.3         -1         0        -1         0
## lca.1         0         1         0         0
## lca.2         0         0         0         1
## lca.3         0        -1         0        -1
## lmg.1        -1        -1         0         0
## lmg.2         0         0        -1        -1
## lmg.3         1         1         1         1
## 
## Sum of squares and products for the hypothesis:
##           res1:cam1  res2:cam1  res1:cam2  res2:cam2
## res1:cam1 12.875668  5.5941310  3.5528360 -1.5596193
## res2:cam1  5.594131  2.4304993  1.5436116 -0.6776126
## res1:cam2  3.552836  1.5436116  0.9803486 -0.4303522
## res2:cam2 -1.559619 -0.6776126 -0.4303522  0.1889154
## 
## Sum of squares and products for error:
##           res1:cam1 res2:cam1 res1:cam2 res2:cam2
## res1:cam1 36.581045 28.060288  9.006497  9.776567
## res2:cam1 28.060288 40.915091  7.408952 15.453836
## res1:cam2  9.006497  7.408952 16.163552  9.554192
## res2:cam2  9.776567 15.453836  9.554192 25.045473
## 
## Multivariate Test: 
##        Df test stat approx F num Df den Df     Pr(>F)    
## Pillai  1 0.3275004 5.600383      4     46 0.00092895 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.5 Univariate specification

#-----------------------------------------------------------------------
# The univariate regression model assumes independent observations
# (between responses and between soil layers) inside each location (a
# strong assumption in this case).

an0 <- lm(value ~ res * cam, data = tecal)
anova(an0)

## Analysis of Variance Table
## 
## Response: value
##            Df Sum Sq Mean Sq  F value Pr(>F)    
## res         2 978.30  489.15 890.2577 <2e-16 ***
## cam         2 106.58   53.29  96.9932 <2e-16 ***
## res:cam     4   3.75    0.94   1.7044 0.1479    
## Residuals 441 242.31    0.55                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# The point estimates for means are the same.
doBy::LSmeans(an0, effect = c("cam", "res"))

##     estimate       se  df    t.stat       p.value cam res
## 1  4.2521180 0.104828 441 40.562826 6.381831e-151   1  lk
## 2  2.9425024 0.104828 441 28.069825 3.424403e-100   2  lk
## 3  3.1650635 0.104828 441 30.192934 2.185772e-109   3  lk
## 4  1.5683802 0.104828 441 14.961469  3.182203e-41   1 lca
## 5  0.3442527 0.104828 441  3.283978  1.105057e-03   2 lca
## 6  0.7683067 0.104828 441  7.329216  1.113877e-12   3 lca
## 7  0.4740344 0.104828 441  4.522023  7.882329e-06   1 lmg
## 8 -0.4681484 0.104828 441 -4.465874  1.014363e-05   2 lmg
## 9 -0.1055624 0.104828 441 -1.007006  3.144841e-01   3 lmg

cbind(c(coef(m0)))

##             [,1]
##  [1,]  4.2521180
##  [2,]  2.9425024
##  [3,]  3.1650635
##  [4,]  1.5683802
##  [5,]  0.3442527
##  [6,]  0.7683067
##  [7,]  0.4740344
##  [8,] -0.4681484
##  [9,] -0.1055624

3 Modelling the covariance structure using the `mcglm` package

3.1 Package instalation from GitHub

# Installing the package from the github repository.
devtools::install_github("wbonat/mcglm", ref = "devel")

# Load the package.
library(mcglm)
# packageVersion("mcglm")

# Sort the lines per subject (`loc`) followed by `cam`.
teca <- plyr::arrange(teca, loc, cam)

# Keep only the variables that will be used.
teca <- teca[, c(1, 2, 16, 17, 18)]
str(teca)

## 'data.frame':    150 obs. of  5 variables:
##  $ loc: int  1 1 1 2 2 2 3 3 3 4 ...
##  $ cam: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ...
##  $ lk : num  4.28 1.98 2.6 3.87 2.16 ...
##  $ lca: num  2.1126 0.8459 1.0682 -0.0305 -1.5606 ...
##  $ lmg: num  0.588 -0.494 0.626 -1.347 -2.303 ...

# Head and tail of the data.frame.
head(teca)

##   loc cam       lk         lca        lmg
## 1   1   1 4.279994  2.11263451  0.5877867
## 2   1   2 1.978239  0.84586827 -0.4942963
## 3   1   3 2.596746  1.06815308  0.6259384
## 4   2   1 3.873906 -0.03045921 -1.3470736
## 5   2   2 2.156403 -1.56064775 -2.3025851
## 6   2   3 2.512846 -0.27443685 -1.4271164

tail(teca)

##     loc cam       lk      lca         lmg
## 145  49   1 4.750914 2.534490  0.55961579
## 146  49   2 2.749832 1.211941  0.25464222
## 147  49   3 3.155297 1.691939  0.66782937
## 148  50   1 4.541165 2.431857  0.73716407
## 149  50   2 2.931727 1.501853 -0.01005034
## 150  50   3 3.491343 2.129421  0.30748470

3.2 Covariance structures for the repeated measures

#-----------------------------------------------------------------------
# Covariance structures for the inner subject factor `cam`.

# Independent covariance structure.
Z0 <- mc_id(teca)
Z0[[1]][1:6,1:6]

## 6 x 6 diagonal matrix of class "ddiMatrix"
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    .    .    .    .    .
## [2,]    .    1    .    .    .    .
## [3,]    .    .    1    .    .    .
## [4,]    .    .    .    1    .    .
## [5,]    .    .    .    .    1    .
## [6,]    .    .    .    .    .    1

# Unstructured model for covariance among cam.
Z_ns <- mc_ns(teca, id = "loc")
Z_ns[[1]][1:6,1:6]

## 6 x 6 sparse Matrix of class "dgCMatrix"
##                 
## [1,] . 1 . . . .
## [2,] 1 . . . . .
## [3,] . . . . . .
## [4,] . . . . 1 .
## [5,] . . . 1 . .
## [6,] . . . . . .

Z_ns[[2]][1:6,1:6]

## 6 x 6 sparse Matrix of class "dgCMatrix"
##                 
## [1,] . . 1 . . .
## [2,] . . . . . .
## [3,] 1 . . . . .
## [4,] . . . . . 1
## [5,] . . . . . .
## [6,] . . . 1 . .

Z_ns[[3]][1:6,1:6]

## 6 x 6 sparse Matrix of class "dgCMatrix"
##                 
## [1,] . . . . . .
## [2,] . . 1 . . .
## [3,] . 1 . . . .
## [4,] . . . . . .
## [5,] . . . . . 1
## [6,] . . . . 1 .

# Moving average first order structure.
Z_ma1 <- mc_ma(id = "loc",
               time = "cam",
               data = teca,
               order = 1)
Z_ma1[[1]][1:6,1:6]

## 6 x 6 sparse Matrix of class "nsCMatrix"
##                 
## [1,] . | . . . .
## [2,] | . | . . .
## [3,] . | . . . .
## [4,] . . . . | .
## [5,] . . . | . |
## [6,] . . . . | .

# Distance based covariance structure.
Z_dist <- mc_dist(id = "loc",
                  time = "cam",
                  data = teca,
                  method = "euclidean")
Z_dist[[1]][1:6,1:6]

## 6 x 6 sparse Matrix of class "dsCMatrix"
##                         
## [1,] .   1 0.5 .   . .  
## [2,] 1.0 . 1.0 .   . .  
## [3,] 0.5 1 .   .   . .  
## [4,] .   . .   .   1 0.5
## [5,] .   . .   1.0 . 1.0
## [6,] .   . .   0.5 1 .

# Random walk covariance structure.
Z_rw <- mc_rw(id = "loc",
              time = "cam",
              data = teca,
              order = 1,
              proper = TRUE)
Z_rw[[1]][1:6,1:6]

## 6 x 6 sparse Matrix of class "dtCMatrix"
##                 
## [1,] 1 . . . . .
## [2,] . 2 . . . .
## [3,] . . 1 . . .
## [4,] . . . 1 . .
## [5,] . . . . 2 .
## [6,] . . . . . 1

Z_rw[[2]][1:6,1:6]

## 6 x 6 sparse Matrix of class "dsCMatrix"
##                       
## [1,]  0 -1  .  .  .  .
## [2,] -1  0 -1  .  .  .
## [3,]  . -1  0  .  .  .
## [4,]  .  .  .  0 -1  .
## [5,]  .  .  . -1  0 -1
## [6,]  .  .  .  . -1  0

#-----------------------------------------------------------------------

3.3 Model fitting

#-----------------------------------------------------------------------
# Fitting several competing models.

# Linear predictors (the same RHS for all responses).
form <- list(lk ~ cam,
             lca ~ cam,
             lmg ~ cam)

# Standard MANOVA (ignores the correlation among soil layers).
fit1 <- mcglm(linear_pred = form,
              matrix_pred = list(Z0, Z0, Z0),
              data = teca)

## Automatic initial values selected.

# MANOVA + repeated measures using unstructured matrix.
fit2 <- mcglm(linear_pred = form,
              matrix_pred = list(c(Z0, Z_ns),
                                 c(Z0, Z_ns),
                                 c(Z0, Z_ns)),
              control_algorithm = list(tunning = 0.9,
                                       max_iter = 100),
              data = teca)

## Automatic initial values selected.

# MANOVA + repeated measures using moving average first order.
fit3 <- mcglm(linear_pred = form,
              matrix_pred = list(c(Z0, Z_ma1),
                                 c(Z0, Z_ma1),
                                 c(Z0, Z_ma1)),
              control_algorithm = list(tunning = 0.8),
              data = teca)

## Automatic initial values selected.

# MANOVA + repeated measures using distance based.
fit4 <- mcglm(linear_pred = form,
              matrix_pred = list(c(Z0, Z_dist),
                                 c(Z0, Z_dist),
                                 c(Z0, Z_dist)),
              control_algorithm = list(tunning = 0.8),
              data = teca)

## Automatic initial values selected.

# MANOVA + repeated measures using distance based + expm covariance link
# function.
fit5 <- mcglm(linear_pred = form,
              matrix_pred = list(c(Z0, Z_dist),
                                 c(Z0, Z_dist),
                                 c(Z0, Z_dist)),
              covariance = c("expm",
                             "expm",
                             "expm"),
              control_algorithm = list(tunning = 0.8),
              data = teca)

## Automatic initial values selected.

# MANOVA + repeated measures using random walk + inverse covariance link
# function.
fit6 <- mcglm(linear_pred = form,
              matrix_pred = list(c(Z_rw),
                                 c(Z_rw),
                                 c(Z_rw)),
              covariance = c("inverse",
                             "inverse",
                             "inverse"),
              control_algorithm = list(tunning = 0.5, max_iter = 100),
              data = teca)

## Automatic initial values selected.

3.4 Comparing the models

#-----------------------------------------------------------------------
# Comparing fitted models.

m <- c("Indep",
       "Unstr",
       "MA(1)",
       "Eucl",
       "Eucl expm",
       "RW expm")
meas <- rbind(gof(fit1),
              gof(fit2),
              gof(fit3),
              gof(fit4),
              gof(fit5),
              gof(fit6))
meas <- cbind(model = m,
              as.data.frame(meas),
              stringsAsFactors = FALSE)
meas$model <- with(meas,
                   factor(model,
                          levels = model[order(Df)]))
meas <- plyr::arrange(meas, Df)
meas

##       model plogLik Df   pAIC    pKLIC     pBIC
## 1     Indep -422.22 15 874.44 885.3233 936.0787
## 2     MA(1) -373.19 18 782.38 860.1365 856.3465
## 3      Eucl -362.21 18 760.42 857.5303 834.3865
## 4 Eucl expm -355.59 18 747.18 803.5611 821.1465
## 5   RW expm -380.48 18 796.96 886.0974 870.9265
## 6     Unstr -343.01 24 734.02 710.6130 832.6419

barchart(plogLik + pAIC + pBIC + pKLIC~ model,
         outer = TRUE,
         layout = c(1, NA),
         as.table = TRUE,
         ylab = "Measures of model fitting",
         xlab = "Fitted models",
         data = meas,
         scales = "free") +
    layer(grid.text(x = unit(1:nlevels(x), "native"),
                    y = unit(0.03, "npc"),
                    label = meas$Df))

3.5 Hotelling-Lawley multivariate hypotheses tests for `McGLM`

#-----------------------------------------------------------------------
# Multivariate hypotheses tests for McGLM.

source("../review/functions.R")
library(Matrix)

manova.mcglm(fit1)

## Note: method with signature 'Matrix#numLike' chosen for function '%*%',
##  target signature 'dgTMatrix#numeric'.
##  "TsparseMatrix#ANY" would also be valid

## Note: method with signature 'sparseMatrix#matrix' chosen for function '%*%',
##  target signature 'dgTMatrix#matrix'.
##  "TsparseMatrix#ANY" would also be valid

##     Effects Df Hotelling.Lawley Qui.square      p_value
## 1 Intercept  3           20.449   3067.386 0.000000e+00
## 2       cam  6            0.992    148.753 1.419538e-29

manova.mcglm(fit2)

##     Effects Df Hotelling.Lawley Qui.square      p_value
## 1 Intercept  3           17.218   2582.655 0.000000e+00
## 2       cam  6            3.074    461.140 1.963509e-96

3.6 Estimated correlations

#--------------------------------------------
# Correlations.

# Correlation between responses.
summary(fit2, print = "Correlation")

## Correlation matrix:
##   Parameters Estimates  Std.error  Z value
## 1      rho12 0.2474244 0.06883107 3.594661
## 2      rho13 0.3700137 0.08349973 4.431316
## 3      rho23 0.3229868 0.08022769 4.025877
## 
## Algorithm: chaser
## Correction: TRUE
## Number iterations: 29

Sigma_b <- matrix(NA,ncol = 3, nrow = 3)
Sigma_b[lower.tri(Sigma_b)] <- fit2$Covariance[1:3]
diag(Sigma_b) <- 1
Sigma_b <- forceSymmetric(Sigma_b, uplo = FALSE)

# Correlation among `cam` for the `lk` response.
COR_lk <- matrix(NA, 3, 3)
COR_lk[lower.tri(COR_lk)] <- fit2$Covariance[5:7]
COR_lk[upper.tri(COR_lk)] <- fit2$Covariance[5:7]
diag(COR_lk) <- fit2$Covariance[4]
cov2cor(COR_lk)

##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.4560899 0.6310219
## [2,] 0.4560899 1.0000000 0.7648163
## [3,] 0.6310219 0.7648163 1.0000000

Sigma_r1 <- COR_lk

# Correlation among `cam` for the `lca` response.
COR_lca <- matrix(NA, 3, 3)
COR_lca[lower.tri(COR_lca)] <- fit2$Covariance[9:11]
COR_lca[upper.tri(COR_lca)] <- fit2$Covariance[9:11]
diag(COR_lca) <- fit2$Covariance[8]
cov2cor(COR_lca)

##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.8082929 0.8286790
## [2,] 0.8082929 1.0000000 0.7943085
## [3,] 0.8286790 0.7943085 1.0000000

Sigma_r2 <- COR_lca

# Correlation among `cam` for the `lmg` response.
COR_lmg <- matrix(NA, 3, 3)
COR_lmg[lower.tri(COR_lmg)] <- fit2$Covariance[13:15]
COR_lmg[upper.tri(COR_lmg)] <- fit2$Covariance[13:15]
diag(COR_lmg) <- fit2$Covariance[12]
cov2cor(COR_lmg)

##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.4039425 0.4442434
## [2,] 0.4039425 1.0000000 0.7645949
## [3,] 0.4442434 0.7645949 1.0000000

Sigma_r3 <- COR_lmg

# Joint variance covariance matrix
part1 <- bdiag(t(chol(Sigma_r1)),t(chol(Sigma_r2)),t(chol(Sigma_r3)))
image(part1)

I <- Diagonal(3,1)
part2 <- kronecker(Sigma_b,I)
image(part2)

C <- as.matrix(part1%*%part2%*%t(part1))
corrplot(cov2cor(C), 
         tl.pos = "d",
         outline = TRUE,
         method = "ellipse")

4 References

5 Session information

# devtools::session_info()
Sys.time()

## [1] "2017-07-26 19:31:42 BRT"

cbind(Sys.info())

##                [,1]                                          
## sysname        "Linux"                                       
## release        "4.4.0-87-generic"                            
## version        "#110-Ubuntu SMP Tue Jul 18 12:55:35 UTC 2017"
## nodename       "class"                                       
## machine        "x86_64"                                      
## login          "unknown"                                     
## user           "walmes"                                      
## effective_user "walmes"

sessionInfo()

## R version 3.4.1 (2017-06-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
##  [1] Matrix_1.2-10       mcglm_0.4.0         gridExtra_2.2.1    
##  [4] corrplot_0.77       reshape2_1.4.2      car_2.1-4          
##  [7] latticeExtra_0.6-28 RColorBrewer_1.1-2  lattice_0.20-35    
## [10] knitr_1.15.1        rmarkdown_1.3      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.9        compiler_3.4.1     nloptr_1.0.4      
##  [4] plyr_1.8.4         methods_3.4.1      tools_3.4.1       
##  [7] digest_0.6.12      lme4_1.1-12        evaluate_0.10     
## [10] nlme_3.1-131       gtable_0.2.0       mgcv_1.8-17       
## [13] yaml_2.1.14        parallel_3.4.1     SparseM_1.74      
## [16] stringr_1.2.0      MatrixModels_0.4-1 rprojroot_1.2     
## [19] nnet_7.3-12        minqa_1.2.4        magrittr_1.5      
## [22] backports_1.0.5    htmltools_0.3.5    MASS_7.3-47       
## [25] splines_3.4.1      assertthat_0.2.0   pbkrtest_0.4-6    
## [28] quantreg_5.29      stringi_1.1.2      doBy_4.5-15

Multivariate Covariance Generalized Linear Models for the Analysis of Experimental Data

`github.com/leg-ufpr/mcglm4aed`

Soil chemical variables in samples on a Tectona grandis plantation

Wagner H. Bonat Walmes M. Zeviani

`wbonat@ufpr.br` `walmes@ufpr.br`

LEG/DEST/UFPR LEG/DEST/UFPR

62^a RBras & 17^o SEAGRO
July 24–28, 2017
UFLA, Lavras/MG

1 Data description and goals

2 Multivariate linear model (MLM)

2.1 Loading the dataset

2.2 Exploratory data analysis

2.3 Repeated measures analysis

2.4 Testing linear hypotheses on the responses

2.5 Univariate specification

3 Modelling the covariance structure using the `mcglm` package

3.1 Package instalation from GitHub

3.2 Covariance structures for the repeated measures

3.3 Model fitting

3.4 Comparing the models

3.5 Hotelling-Lawley multivariate hypotheses tests for `McGLM`

3.6 Estimated correlations

4 References

5 Session information

Multivariate Covariance Generalized Linear Models for the Analysis of Experimental Data

github.com/leg-ufpr/mcglm4aed

Soil chemical variables in samples on a Tectona grandis plantation

Wagner H. Bonat Walmes M. Zeviani wbonat@ufpr.br walmes@ufpr.br LEG/DEST/UFPR LEG/DEST/UFPR

62a RBras & 17o SEAGRO July 24–28, 2017 UFLA, Lavras/MG

1 Data description and goals

2 Multivariate linear model (MLM)

2.1 Loading the dataset

2.2 Exploratory data analysis

2.3 Repeated measures analysis

2.4 Testing linear hypotheses on the responses

2.5 Univariate specification

3 Modelling the covariance structure using the mcglm package

3.1 Package instalation from GitHub

3.2 Covariance structures for the repeated measures

3.3 Model fitting

3.4 Comparing the models

3.5 Hotelling-Lawley multivariate hypotheses tests for McGLM

3.6 Estimated correlations

4 References

5 Session information

`github.com/leg-ufpr/mcglm4aed`

Wagner H. Bonat Walmes M. Zeviani

`wbonat@ufpr.br` `walmes@ufpr.br`

LEG/DEST/UFPR LEG/DEST/UFPR

62^a RBras & 17^o SEAGRO
July 24–28, 2017
UFLA, Lavras/MG

3 Modelling the covariance structure using the `mcglm` package

3.5 Hotelling-Lawley multivariate hypotheses tests for `McGLM`