| Wagner H. Bonat | Walmes M. Zeviani |
|---|---|
| wbonat@ufpr.br | walmes@ufpr.br |
| LEG/DEST/UFPR | LEG/DEST/UFPR |

62ª RBras & 17º SEAGRO
July 24–28, 2017
UFLA, Lavras/MG
Often, two situations occur.
The results are practically the same for all responses.
Table of means followed by letters from a pairwise multiple comparisons test for a 3-level factor in an RBD.
| Wtr | yield | tg | w100 | Kconc |
|---|---|---|---|---|
| 37.5 | 23.93 a | 163.40 a | 14.86 a | 17.72 a |
| 50.0 | 28.69 b | 209.40 b | 13.80 a | 19.01 a |
| 62.5 | 29.83 b | 215.60 b | 13.83 a | 18.32 a |
There is not sufficient evidence in each response analysed separately.
ANOVA tables for 4 responses in an RBD.
```
Response yield :
            Df  Sum Sq Mean Sq F value  Pr(>F)
blk          4  4.7999  1.2000  0.3937 0.80798
Wtr          2 19.2544  9.6272  3.1585 0.09749
Residuals    8 24.3842  3.0480

Response tg :
            Df  Sum Sq Mean Sq F value  Pr(>F)
blk          4   456.4  114.10  0.7591 0.57990
Wtr          2  1462.9  731.47  4.8667 0.04142
Residuals    8  1202.4  150.30

Response w100 :
            Df  Sum Sq Mean Sq F value Pr(>F)
blk          4  2.6981 0.67452  1.3088 0.3447
Wtr          2  0.2942 0.14709  0.2854 0.7591
Residuals    8  4.1230 0.51538

Response Kconc :
            Df  Sum Sq Mean Sq F value  Pr(>F)
blk          4  2.5354  0.6339  0.2837 0.88053
Wtr          2 14.5229  7.2615  3.2506 0.09263
Residuals    8 17.8711  2.2339
```
It is difficult to know what to conclude when each response is analysed separately.
Probably, someone has asked themselves this kind of question before, in studies such as:
- `soybean`: water and potassium in plant production
- `cotton`: impact of defoliation and growth stage
- `wrc`: soil management in the water retention curve
- `teak`: concentration of cations in the soil layers

The model for each observation \(i = 1, \ldots, n\), is \[ \begin{aligned} y_i &\sim \text{Normal}(\mu_i, \sigma^2)\\ \mu_i &= \mathbf{X}_{i.}\boldsymbol{\beta}, \end{aligned} \]
where \(y_i\) is the response for unit \(i\), \(\mathbf{X}_{i.}\) is the \(i\)-th row of the \(n \times k\) design matrix \(\mathbf{X}\), and \(\boldsymbol{\beta}\) is the \(k \times 1\) vector of regression coefficients.
The model written for the response (column) vector is \[ \begin{aligned} \mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \boldsymbol{\epsilon} &\sim \text{Normal}_n(\mathbf{0}, \sigma^2 \mathbf{I}) \Rightarrow \epsilon_i \overset{iid}{\sim} \text{Normal}(0, \sigma^2), \end{aligned} \]
where \(\mathbf{y}\) is the \(n \times 1\) response vector and \(\boldsymbol{\epsilon}\) is the vector of errors.
The likelihood function for \(\boldsymbol{\beta}\) and \(\sigma^2\) is
\[ \begin{aligned} L(\boldsymbol{\beta}, \sigma^2) &= \prod_{i=1}^n \left[ (2\pi\sigma^2)^{-1/2} \exp\left\{-\frac{ (y_i - \mathbf{X}_{i.}\boldsymbol{\beta})^2}{2\sigma^2} \right\} \right ]\\ &= (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{ (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})' (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})}{2\sigma^2} \right\}\\ &= (2\pi)^{-n/2} |\sigma^2 \mathbf{I}|^{-1/2} \exp\left\{-\frac{1}{2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})' (\sigma^2 \mathbf{I})^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right\}. \end{aligned} \]
The maximum likelihood estimators are
\[ \begin{aligned} \boldsymbol{\hat\beta} &= (\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}' \mathbf{y}\\ \hat{\sigma}^2 &= (\mathbf{y} - \mathbf{\hat y})' (\mathbf{y} - \mathbf{\hat y})/n, \quad \mathbf{\hat y} = \mathbf{X} \boldsymbol{\hat\beta},\\ \end{aligned} \]
The maximized likelihood is
\[ \begin{aligned} \max_{\beta, \sigma^2} L(\boldsymbol{\beta}, \sigma^2) &= L(\boldsymbol{\hat\beta}, \hat{\sigma}^2) \\ &= (2\pi)^{-n/2} (\hat{\sigma}^2)^{-n/2} \exp\left\{-n/2 \right\}. \end{aligned} \]
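As a quick numerical check, the closed-form estimators above can be compared with `lm()`; a minimal R sketch on simulated data (all names and values are illustrative assumptions):

```r
## Closed-form MLEs for a univariate LM, checked against lm()/logLik().
set.seed(1)
n <- 30
X <- cbind(1, runif(n))                        # design matrix with intercept
y <- drop(X %*% c(2, 1.5) + rnorm(n, sd = 0.5))
beta_hat   <- solve(t(X) %*% X, t(X) %*% y)    # (X'X)^{-1} X'y
sigma2_hat <- sum((y - X %*% beta_hat)^2)/n    # ML divisor n, not n - k
## Maximized log-likelihood: -(n/2) (log(2 pi) + log(sigma2_hat) + 1).
ll <- -n/2 * (log(2 * pi) + log(sigma2_hat) + 1)
c(ll, logLik(lm(y ~ X[, 2])))                  # the two values agree
```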
The `soybean` dataset comes from a 3 × 5 factorial experiment in a randomized block design (RBD). We analyse the yield of grains (`yield`). To make things simpler for now, we subset the data to potassium level 0, so the model corresponds to a one-way ANOVA with blocks.
The model, in terms of the sources of variation (effects), is \[ \text{mean}_{ij} = \text{intercept} + \text{BLK}_i + \text{WTR}_j, \] where \(i = 1, \ldots, 5\) and \(j = 1, 2, 3\).
In Greek-letter notation, \[ \mu_{ij} = \mu_0 + \gamma_i + \eta_j. \]
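A hypothetical sketch of this fit in R. The soybean data are not reproduced here, so we simulate an RBD with the same structure (5 blocks, 3 water levels); the names `blk`, `Wtr` and `yield`, and the effect sizes, are assumptions mirroring the text:

```r
## Simulated stand-in for the potassium-0 subset of the soybean data.
set.seed(10)
soy <- expand.grid(blk = gl(5, 1), Wtr = gl(3, 1))
soy$yield <- rnorm(nrow(soy), mean = c(24, 29, 30)[soy$Wtr])
m0 <- lm(yield ~ blk + Wtr, data = soy)
anova(m0)   # sequential ANOVA: blocks, then water levels
coef(m0)    # mu_0, gamma_2..gamma_5 (blocks), eta_2, eta_3 (water)
```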
The \(F\) distribution plays a central role in hypotheses tests for univariate Gaussian models.
It can be defined as the distribution of the ratio \[ \frac{\chi^2_{a}/a}{\chi^2_{b}/b} \sim F_{a;b}, \] where \(\chi^2_{a}\) and \(\chi^2_{b}\) are independent chi-square variables with \(a\) and \(b\) degrees of freedom.
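The definition is easy to check by simulation; a small sketch (seed and sample size are arbitrary):

```r
## The ratio of independent scaled chi-squares is F distributed.
set.seed(3)
a <- 3; b <- 8
x <- (rchisq(1e5, df = a)/a) / (rchisq(1e5, df = b)/b)
c(simulated   = unname(quantile(x, 0.95)),
  theoretical = qf(0.95, df1 = a, df2 = b))
```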
A linear hypothesis can be expressed using different notations:
A linear hypothesis on \(\boldsymbol{\beta}\) can be defined in terms of matrices \[ H_0: \mathbf{A}\boldsymbol{\beta} = \mathbf{c} \quad \text{vs} \quad H_a: \mathbf{A}\boldsymbol{\beta} \neq \mathbf{c}, \] where \(\mathbf{A}\) is a known matrix of order \(h \times k\) (rank \(h \leq k\)) and \(\mathbf{c}\) is a vector of \(h\) elements.
For example, let us construct the matrix \(\mathbf{A}\) to test the effect of treatments (4 levels) in a randomized block design (3 blocks). The expected mean for each \(ij\) condition is
\[ \mu_{ij} = \mu_0 + \gamma_i + \eta_j, \] where \(i = 1, \ldots, 3\) and \(j = 1, \ldots, 4\). To obtain a full-rank design matrix, \(\gamma_1\) and \(\eta_1\) are set to zero.
The \(H_0\) states that there is no effect of treatments and can be expressed as
\[ \begin{bmatrix} 0 & 0 & 0 & \textcolor{red}{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & \textcolor{red}{1} & 0 \\ 0 & 0 & 0 & 0 & 0 & \textcolor{red}{1} \\ \end{bmatrix} \begin{bmatrix} \mu_0 \\ \gamma_{2} \\ \gamma_{3} \\ \textcolor{red}{\eta_{2}} \\ \textcolor{red}{\eta_{3}} \\ \textcolor{red}{\eta_{4}} \\ \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \end{bmatrix} \quad \Rightarrow \quad\begin{bmatrix} \textcolor{red}{\eta_{2}} \\ \textcolor{red}{\eta_{3}} \\ \textcolor{red}{\eta_{4}} \\ \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \end{bmatrix} .\]
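This \(\mathbf{A}\) is easy to assemble and use directly in R. A minimal sketch on simulated data; the design, effect sizes, and the use of `car::linearHypothesis()` are illustrative assumptions:

```r
## RBD with 3 blocks and 4 treatments; A selects eta_2, eta_3, eta_4.
set.seed(2)
d <- expand.grid(blk = gl(3, 1), trt = gl(4, 1))
d$y <- rnorm(nrow(d), mean = c(10, 10, 12, 13)[d$trt])
fit <- lm(y ~ blk + trt, data = d)   # coefs: mu_0, gamma_2:3, eta_2:4
A <- cbind(matrix(0, 3, 3), diag(3))
car::linearHypothesis(fit, hypothesis.matrix = A, rhs = rep(0, 3))
anova(fit)                           # the same F test for trt
```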
This hypothesis, in particular, can be tested in several equivalent ways, developed below: via the sequential ANOVA \(F\) test, the likelihood ratio, or a Wald statistic.
Consider the ANOVA table based on the reduction in the sum of squares approach (sequential ANOVA).
Define \(\text{RSS}_0\) as the residual sum of squares of the reduced model, \[ \text{RSS}_0 = \text{RSS} + \text{HSS}, \] where \(\text{HSS}\) is the treatment sum of squares, that is, the sum of squares of the term under the hypothesis \(H_0\).
|  | DF | SS | MS | F |
|---|---|---|---|---|
| Blocks | \(l\) | \(R(\gamma \mid \mu_0)\) |  |  |
| Treatments | \(h\) | \(\text{HSS} = R(\eta \mid \gamma, \mu_0)\) | \(\text{HSS}/h\) | \(\frac{\text{HSS}/h}{\text{RSS}/(n - k)}\) |
| Residuals | \(n - k\) | \(\text{RSS}\) | \(\text{RSS}/(n - k)\) |  |
\[ F = \frac{\text{HSS}}{\text{RSS}} \cdot \frac{n - k}{h} = \frac{\text{RSS}_0 - \text{RSS}}{\text{RSS}} \cdot \frac{n - k}{h} \sim F_{h; n - k}. \]
Note that the same \(F\) statistic can be obtained from the likelihood ratio, developed next.
The maximized likelihood for a univariate LM is \[ \begin{aligned} \max_{\beta, \sigma^2} L(\boldsymbol{\beta}, \sigma^2) &= L(\boldsymbol{\hat\beta}, \hat{\sigma}^2) \\ &= (2\pi)^{-n/2} (\hat{\sigma}^2)^{-n/2} \exp\left\{-n/2 \right\}. \end{aligned} \]
The likelihood ratio for two nested models is \[ \begin{aligned} \Lambda &= \frac{ \max_{H_1} L(\textit{full model})}{ \max_{H_0} L(\textit{reduced model})}\\ &= \frac{ L(\boldsymbol{\hat\beta}, \hat{\sigma}^2)}{ L(\boldsymbol{\hat\beta}_0, \hat{\sigma}_0^2)}\\ &= \left(\frac{\hat{\sigma}^2}{\hat{\sigma}_0^2} \right)^{-n/2}\\ &= \left(\frac{n\hat{\sigma}^2 + n(\hat{\sigma}_0^2 - \hat{\sigma}^2)}{ n\hat{\sigma}^2}\right)^{n/2}\\ &= \left(1 + \frac{\text{RSS}_0 - \text{RSS}}{\text{RSS}}\right)^{n/2}, \end{aligned} \] a monotone increasing function of \((\text{RSS}_0 - \text{RSS})/\text{RSS}\).
The deviance \[ n \log \left(\frac{\hat{\sigma}_0^2}{\hat{\sigma}^2} \right) \] is asymptotically \(\chi^2_{k - q}\) under \(H_0\), where \(q < k\) is the number of parameters of the reduced model.
It can be demonstrated that the quantities \[ \begin{aligned} \frac{n(\hat{\sigma}_0^2 - \hat{\sigma}^2)}{\sigma^2} &\sim \chi^2_{h}\\ \frac{n\hat{\sigma}^2}{\sigma^2} &\sim \chi^2_{n - k}\\ \end{aligned} \] are independent and \[ F = \frac{n(\hat{\sigma}_0^2 - \hat{\sigma}^2)}{ n\hat{\sigma}^2}\cdot \frac{n - k}{h} = \frac{\text{RSS}_0 - \text{RSS}}{\text{RSS}} \cdot \frac{n - k}{h} \sim F_{h;n - k} \]
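This identity can be verified numerically, reusing the simulated RBD fit `fit` and data `d` from the sketch above:

```r
## F computed from the residual sums of squares of the nested fits.
fit0 <- update(fit, . ~ blk)                   # reduced model under H_0
RSS  <- deviance(fit); RSS0 <- deviance(fit0)  # deviance() = RSS for lm fits
n <- nrow(d); k <- length(coef(fit)); h <- k - length(coef(fit0))
(RSS0 - RSS)/RSS * (n - k)/h                   # equals the F in anova(fit0, fit)
anova(fit0, fit)
```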
Under \(H_0\), \[ \frac{1}{\sigma^2} (\mathbf{A}\boldsymbol{\hat\beta}- \mathbf{c})' [\mathbf{A} (\mathbf{X}'\mathbf{X})^{-1} \mathbf{A}']^{-1} (\mathbf{A}\boldsymbol{\hat\beta}- \mathbf{c}) \sim \chi^2_h. \] It follows that, under \(H_0\), the statistic \[ \begin{aligned} F &= \frac{(\mathbf{A}\boldsymbol{\hat\beta}- \mathbf{c})' [\mathbf{A} (\mathbf{X}'\mathbf{X})^{-1} \mathbf{A}']^{-1} (\mathbf{A}\boldsymbol{\hat\beta}- \mathbf{c})}{h\hat{\sigma}^2}\\ &= \frac{(\mathbf{A}\boldsymbol{\hat\beta}- \mathbf{c})' [\mathbf{A} \hat{\Sigma}_{\beta} \mathbf{A}']^{-1} (\mathbf{A}\boldsymbol{\hat\beta}- \mathbf{c})}{h},\\ \end{aligned} \] has the \(F\) distribution with \(h\) and \(n-k\) degrees of freedom, where \(\hat{\sigma}^2 = \text{RSS}/(n-k)\) is the unbiased variance estimator and \(\hat{\Sigma}_{\beta} = \hat{\sigma}^2 (\mathbf{X}'\mathbf{X})^{-1}\).
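The Wald form gives the same number, since `vcov()` of an `lm` fit is exactly \(\hat{\Sigma}_{\beta}\) with the unbiased \(\hat{\sigma}^2\); continuing the sketch above:

```r
## Wald F statistic from the fitted coefficients and their covariance.
b  <- coef(fit)
Ab <- A %*% b   # A beta-hat (here c = 0)
drop(t(Ab) %*% solve(A %*% vcov(fit) %*% t(A)) %*% Ab) / 3   # h = 3
```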
We now extend the previous notation to consider a vector of \(r\) responses measured on each sample unit.
The model for each sample unit \(i = 1, \ldots, n\), is \[ \begin{aligned} \mathbf{Y}_{i.} &\sim \text{Normal}_r(\boldsymbol{\mu}_i, \Sigma)\\ \boldsymbol{\mu}_i &= \mathbf{X}_{i.}\boldsymbol{B}, \end{aligned} \]
where \(\mathbf{Y}_{i.}\) is the \(1 \times r\) vector of responses for unit \(i\), \(\mathbf{X}_{i.}\) is the \(i\)-th row of the design matrix, \(\boldsymbol{B}\) is the \(k \times r\) matrix of regression coefficients, and \(\Sigma\) is the \(r \times r\) covariance matrix.
The model written for the response matrix is \[ \begin{aligned} \mathbf{Y} &= \mathbf{X}\boldsymbol{B} + \boldsymbol{E} \\ \boldsymbol{E}_{i.} &\sim \text{Normal}_r(\mathbf{0}, \Sigma), \end{aligned} \] where \(\mathbf{Y}\) is the \(n \times r\) response matrix, \(\mathbf{X}\) the \(n \times k\) design matrix, and \(\boldsymbol{E}\) the \(n \times r\) error matrix with independent rows.
The likelihood for one sample unit is \[ \begin{aligned} L(\boldsymbol{B},\Sigma,\mathbf{Y}_{i.}) = (2\pi)^{-r/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (\mathbf{Y}_{i.} - \mathbf{X}_{i.}\boldsymbol{B}) \Sigma^{-1} (\mathbf{Y}_{i.} - \mathbf{X}_{i.}\boldsymbol{B})' \right\} \end{aligned} \]
For all sample units, the likelihood is
\[ \begin{aligned} L(\boldsymbol{B},\Sigma,\mathbf{Y}) &= \prod_{i = 1}^n (2\pi)^{-r/2} |\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (\mathbf{Y}_{i.} - \mathbf{X}_{i.}\boldsymbol{B}) \Sigma^{-1} (\mathbf{Y}_{i.} - \mathbf{X}_{i.}\boldsymbol{B})' \right\}\\ &= (2\pi)^{-nr/2} |\Sigma|^{-n/2} \exp\left\{ -\frac{1}{2} \sum_{i = 1}^n (\mathbf{Y}_{i.} - \mathbf{X}_{i.}\boldsymbol{B}) \Sigma^{-1} (\mathbf{Y}_{i.} - \mathbf{X}_{i.}\boldsymbol{B})' \right\}. \end{aligned} \]
Using the vectorization operator (\(\text{vec}\)) and the Kronecker product (\(\otimes\)), the observed data can be reshaped from wide to long (stacked) format.
The same model is now represented by \[ \mathcal{Y} \sim \text{Normal}_{rn}(\mathcal{X}\boldsymbol{\beta}, \Omega), \] where \(\mathcal{Y} = \text{vec}(\mathbf{Y})\), \(\mathcal{X} = \mathbf{I}_r \otimes \mathbf{X}\), \(\boldsymbol{\beta} = \text{vec}(\boldsymbol{B})\), and \(\Omega = \Sigma \otimes \mathbf{I}_n\).
Also, it can be written as \[ \mathcal{Y} = \mathcal{X}\boldsymbol{\beta} + \mathcal{E}. \]
In the stacked data format, the likelihood function is defined as \[ L(\boldsymbol{\beta}, \Omega) = (2\pi)^{ -\frac{rn}{2} } |\Omega|^{-\frac{1}{2}} \exp \left\{ -\frac{1}{2} (\mathcal{Y} - \mathcal{X}\boldsymbol{\beta})^{'} \Omega^{-1} (\mathcal{Y} - \mathcal{X}\boldsymbol{\beta}) \right\}. \]
The maximized likelihood is \[ \begin{aligned} L(\boldsymbol{\hat\beta}, \hat\Omega) &= (2\pi)^{-rn/2} |\hat\Sigma \otimes \mathbf{I}|^{-1/2} \exp \left\{ -\frac{rn}{2} \right\}\\ &= (2\pi)^{-rn/2} |\hat\Sigma|^{-n/2} \exp \left\{ -\frac{rn}{2} \right\}. \end{aligned} \]
The maximum likelihood estimators, column by column and in stacked form, together with the covariance of \(\boldsymbol{\hat\beta}\), are
\[ \begin{aligned} {\boldsymbol{\hat B}} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\\ {\boldsymbol{\hat B}}_{.j} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}_{.j}\\ {\boldsymbol{\hat \beta}} &= (\mathcal{X}'\mathcal{X})^{-1}\mathcal{X}'\mathcal{Y}\\ \text{Cov}({\boldsymbol{\hat\beta}}) &= {\hat\Sigma} \otimes (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \hat{\sigma}_{11}(\mathbf{X}'\mathbf{X})^{-1} & \cdots & \hat{\sigma}_{1r}(\mathbf{X}'\mathbf{X})^{-1} \\ \vdots & \ddots & \vdots \\ \hat{\sigma}_{r1}(\mathbf{X}'\mathbf{X})^{-1} & \cdots & \hat{\sigma}_{rr}(\mathbf{X}'\mathbf{X})^{-1} \end{bmatrix} \end{aligned} \]
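In R, a matrix response on the left-hand side of `lm()` fits exactly this model; a small simulated sketch (names and coefficients are illustrative) showing that \(\boldsymbol{\hat B}\) is column-wise OLS:

```r
## Multivariate LM: coef() returns B-hat, one column per response.
set.seed(4)
n <- 20
x <- runif(n)
Y <- cbind(y1 = 1 + 2 * x + rnorm(n),
           y2 = -1 + 0.5 * x + rnorm(n))
fit_m <- lm(Y ~ x)
coef(fit_m)                                              # B-hat
all.equal(coef(fit_m)[, "y1"], coef(lm(Y[, "y1"] ~ x)))  # column-wise OLS
```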
A linear hypothesis can be defined in two equivalent ways.
\[ H_0: \mathbf{A}\boldsymbol{B}\mathbf{M} = \mathbf{C} \quad \textrm{vs} \quad H_1: \mathbf{A}\boldsymbol{B}\mathbf{M} \neq \mathbf{C} \]
where \(\mathbf{A}\) (\(h \times k\), rank \(h\)) defines contrasts among the regression coefficients (rows of \(\boldsymbol{B}\)), \(\mathbf{M}\) (\(r \times v\), rank \(v\)) defines combinations of the responses (columns of \(\boldsymbol{B}\)), and \(\mathbf{C}\) is an \(h \times v\) matrix of constants.
Example of \(\mathbf{A}\boldsymbol{B}\mathbf{M} = \mathbf{C}\) to test \(H_0: \eta_{mj} = 0\) for all \(m = 1, 2\) and \(j = 2, \ldots, 4\). The matrices are
\[ \begin{aligned} \begin{bmatrix} 0 & 0 & 0 & \textcolor{red}{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & \textcolor{red}{1} & 0 \\ 0 & 0 & 0 & 0 & 0 & \textcolor{red}{1} \\ \end{bmatrix} \begin{bmatrix} \mu_{10} & \mu_{20} \\ \gamma_{12} & \gamma_{22} \\ \gamma_{13} & \gamma_{23} \\ \textcolor{red}{\eta_{12}} & \textcolor{red}{\eta_{22}} \\ \textcolor{red}{\eta_{13}} & \textcolor{red}{\eta_{23}} \\ \textcolor{red}{\eta_{14}} & \textcolor{red}{\eta_{24}} \\ \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} &= \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ \end{bmatrix} \\\Rightarrow\begin{bmatrix} \textcolor{red}{\eta_{12}} & \textcolor{red}{\eta_{22}} \\ \textcolor{red}{\eta_{13}} & \textcolor{red}{\eta_{23}} \\ \textcolor{red}{\eta_{14}} & \textcolor{red}{\eta_{24}} \\ \end{bmatrix} &= \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ \end{bmatrix} .\end{aligned} \]
The same linear hypothesis can be represented using the vectorized form of \(\boldsymbol{B}\), \(\boldsymbol{\beta} = \text{vec}(\boldsymbol{B})\):
\[ H_0: \mathbf{L}\boldsymbol{\beta} = \mathbf{c} \quad \textrm{vs} \quad H_1: \mathbf{L}\boldsymbol{\beta} \neq \mathbf{c} \]
where \(\mathbf{L} = \mathbf{M}' \otimes \mathbf{A}\) and \(\mathbf{c} = \text{vec}(\mathbf{C})\); for the example above,
\[ \begin{bmatrix} 0 & 0 & 0 & \textcolor{red}{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \textcolor{red}{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \textcolor{red}{1} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \textcolor{red}{1} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \textcolor{red}{1} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \textcolor{red}{1} \\ \end{bmatrix} \begin{bmatrix} \mu_{10} \\ \gamma_{12} \\ \gamma_{13} \\ \textcolor{red}{\eta_{12}} \\ \textcolor{red}{\eta_{13}} \\ \textcolor{red}{\eta_{14}} \\ \mu_{20} \\ \gamma_{22} \\ \gamma_{23} \\ \textcolor{red}{\eta_{22}} \\ \textcolor{red}{\eta_{23}} \\ \textcolor{red}{\eta_{24}} \\ \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix} .\]
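With \(\mathbf{M} = \mathbf{I}_2\), this \(\mathbf{L}\) is simply \(\mathbf{I}_2 \otimes \mathbf{A}\); a one-line check in R, with `A` as in the RBD example:

```r
## L = M' %x% A with M = I_2: the block-diagonal 6 x 12 matrix above.
A <- cbind(matrix(0, 3, 3), diag(3))
L <- diag(2) %x% A   # %x% is kronecker()
L
```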
Let \(\mathbf{R}\) be the residual sum of squares and cross-products (SSP) matrix of the full model \[ \mathbf{R} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\hat{B}})' (\mathbf{Y} - \mathbf{X}\boldsymbol{\hat{B}}). \]
Define \(\mathbf{H}\) as the hypothesis SSP matrix, \[ \begin{aligned} \mathbf{H} &= n(\hat{\Sigma}_0 - \hat{\Sigma})\\ &= (\mathbf{A}\boldsymbol{\hat{B}}\mathbf{M} - \mathbf{C})' [\mathbf{A} (\mathbf{X}'\mathbf{X})^{-1} \mathbf{A}']^{-1} (\mathbf{A}\boldsymbol{\hat{B}}\mathbf{M} - \mathbf{C}). \end{aligned} \]
The statistic \[ \text{tr} (\mathbf{H}\mathbf{R}^{-1}) = \sum_{i = 1}^{s} \lambda_i \] is known as the Hotelling-Lawley trace. The values \(\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_s\) are the nonzero eigenvalues of \(\mathbf{H}\mathbf{R}^{-1}\).
Berndt and Savin (1977) show that the same value is obtained as the result of a Wald test using the vectorized version of the components:
\[ \text{tr} (\mathbf{H}\mathbf{R}^{-1}) = (\mathbf{L} \boldsymbol{\hat\beta} - \mathbf{c})' [\mathbf{L} ( (\mathbf{M}' \mathbf{R} \mathbf{M}) \otimes (\mathbf{X}'\mathbf{X})^{-1}) \mathbf{L}']^{-1} (\mathbf{L} \boldsymbol{\hat\beta} - \mathbf{c}). \]
Under \(H_0\), the statistic \[ n \text{tr} (\mathbf{H}\mathbf{R}^{-1}) \] has a limiting \(\chi^2\) distribution with \(vh\) degrees of freedom.
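In R, `anova()` on an `mlm` object reports this statistic; a sketch reusing `fit_m` from above, with \(\mathbf{H}\) and \(\mathbf{R}\) also assembled by hand:

```r
## Hotelling-Lawley trace for the effect of x, computed two ways.
anova(fit_m, test = "Hotelling-Lawley")
R <- crossprod(residuals(fit_m))                      # residual SSP (full)
H <- crossprod(residuals(update(fit_m, . ~ 1))) - R   # hypothesis SSP
sum(diag(H %*% solve(R)))                             # tr(H R^{-1})
```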
A test based on the likelihood ratio is written in terms of generalized variances \[ \begin{aligned} \Lambda &= \frac{\max_{H_0} L(\beta, \Sigma)}{\max L(\beta, \Sigma)} \\ &= \left(\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}|}\right)^{-n/2},\\ \end{aligned} \] and the Wilks’ statistic is \[ \Lambda^{2/n} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}. \]
Under \(H_0\), \(n\hat{\Sigma} \sim W_{r;n-k}\) independently of \(n(\hat{\Sigma}_0 - \hat{\Sigma}) \sim W_{r;h}\). The likelihood ratio test rejects \(H_0\) for large values of the deviance \[ -2 \log \Lambda = -n \log \left( \frac{|n\hat{\Sigma}|}{ |n\hat{\Sigma} + n(\hat{\Sigma}_0 - \hat{\Sigma})|} \right). \]
Under \(H_0\), the above statistic has a limiting \(\chi^2\) distribution with \(vh\) degrees of freedom.
For the same hypothesis, there are four multivariate tests.
\[ \begin{aligned} \text{Wilks' lambda} &= \frac{|\mathbf{R}|}{|\mathbf{R} + \mathbf{H}|} = \prod_{i=1}^s \frac{1}{1+\lambda_i}\\ \text{Hotelling-Lawley trace} &= \text{tr}(\mathbf{H}\mathbf{R}^{-1}) = \sum_{i=1}^s \lambda_i\\ \text{Pillai's trace} &= \text{tr}(\mathbf{H}(\mathbf{H} + \mathbf{R})^{-1}) = \sum_{i=1}^s \frac{\lambda_i}{1+\lambda_i}\\ \text{Roy's greatest root} &= \max(\lambda_1,\ldots,\lambda_s) = \lambda_1 \end{aligned} \]
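All four statistics follow from the eigenvalues of \(\mathbf{H}\mathbf{R}^{-1}\); continuing the sketch, with `H` and `R` as computed above:

```r
## The four classical statistics from the eigenvalues of H R^{-1}.
lam <- Re(eigen(H %*% solve(R))$values)   # zero eigenvalues contribute trivially
c(Wilks     = prod(1/(1 + lam)),
  Hotelling = sum(lam),
  Pillai    = sum(lam / (1 + lam)),
  Roy       = max(lam))
```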
We did a small simulation study to assess the limiting \(\chi^2\) and \(F\) approximations for each of these tests.
Berndt, Ernst R., and N. Eugene Savin. 1977. “Conflict Among Criteria for Testing Hypotheses in the Multivariate Linear Regression Model.” Econometrica 45 (5): 1263–1277. doi:10.2307/1914072.