Chapter 4 Models specified by second-moment assumptions

In Chapter 2, we presented four statistical models for count data, and in Chapter 3 the method of maximum likelihood was introduced to estimate the models’ parameters. As discussed in Chapter 3, the method of maximum likelihood assumes that the true distribution of the count random variable \(Y\) is known apart from the values of a finite number of unknown parameters. In this chapter, we present a different approach to model specification, estimation and inference based only on second-moment assumptions.

4.1 Extended Poisson-Tweedie model

The Poisson-Tweedie distribution presented in subsection 2.3 provides a very flexible family of count distributions. However, this family has two main drawbacks: it cannot deal with underdispersed count data, and its probability mass function is given by an intractable integral, which makes estimation based on the maximum likelihood method computationally demanding for practical data analysis.

In spite of these issues, Jørgensen and Kokonendji (2015) showed, using the factorial cumulant function, that for \(Y \sim \mathrm{PTw}_p(\mu, \phi)\), \(\mathrm{E}(Y) = \mu\) and \(\mathrm{var}(Y) = \mu + \phi \mu^p\). This fact motivated Bonat et al. (2016) to specify a model using only second-moment assumptions, i.e. mean and variance.

Thus, consider a cross-sectional dataset, \((y_i, \boldsymbol{x}_i)\), \(i = 1, \ldots, n\), where the \(y_i\)’s are i.i.d. realizations of \(Y_i\) according to an unspecified distribution whose expectation and variance are given by \[\begin{align} \mathrm{E}(Y_i) &= \mu_i = g^{-1}(\boldsymbol{x}_i^{\top} \boldsymbol{\beta}) \nonumber \\ \mathrm{var}(Y_i) &= C_i = \mu_i + \phi \mu_i^p, \tag{4.1} \end{align}\]

where, as before, \(\boldsymbol{x}_i\) and \(\boldsymbol{\beta}\) are \((p \times 1)\) vectors of known covariates and unknown regression parameters, respectively, and \(g\) is the logarithmic link function. The regression model specified in (4.1) is parametrized by \(\boldsymbol{\theta} = (\boldsymbol{\beta}^\top, \boldsymbol{\lambda}^\top )^\top\), where \(\boldsymbol{\lambda} = (\phi, p)\).

Note that, based on second-moment assumptions, the only restriction for a proper model is that \(\mathrm{var}(Y_i) > 0\), thus \[\phi > - \mu^{(1-p)}_i,\] which shows that, to some extent at least, negative values for the dispersion parameter are allowed. Consequently, the Poisson-Tweedie model can be extended to deal with underdispersed count data; in doing so, however, the associated probability mass functions do not exist. In a regression modelling framework, as discussed in this material, we are in general interested in the effects of the regression coefficients, so this issue does not imply any loss of interpretation or applicability. The formulation of the extended Poisson-Tweedie model is exactly the same as that of the quasi-binomial and quasi-Poisson models popular in the context of generalized linear models; see Wedderburn (1974) and Nelder and Wedderburn (1972) for details. Furthermore, note that for \(p = 1\) the extended Poisson-Tweedie regression model corresponds to a reparametrization of the popular quasi-Poisson regression model.
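To make the mean and variance relationship and the lower bound for \(\phi\) concrete, the following R snippet is a minimal sketch (the function names are illustrative, not taken from any package) that evaluates \(\mathrm{var}(Y) = \mu + \phi \mu^p\) and the bound \(\phi > -\mu^{1-p}\) for a few values of \(\mu\).

```r
## Minimal sketch of the extended Poisson-Tweedie mean-variance relationship
## var(Y) = mu + phi * mu^p and the lower bound phi > -mu^(1 - p).
## Function names are illustrative only.
ptw_variance <- function(mu, phi, p) mu + phi * mu^p
phi_lower_bound <- function(mu, p) -mu^(1 - p)

mu <- c(0.5, 1, 2, 5, 10)
ptw_variance(mu, phi = 0.5, p = 2)    # overdispersion: var(Y) > mu
ptw_variance(mu, phi = -0.5, p = 1)   # underdispersion: var(Y) = 0.5 * mu
phi_lower_bound(mu, p = 1)            # bound is -1 for all mu when p = 1
```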

It is also important to highlight that, for \(p = 1\), the variance is proportional to the mean with proportionality constant \(1 + \phi\), a mean-variance relationship similar to that of the Gamma-Count and COM-Poisson distributions. Thus, for \(\phi < 0\) and \(p = 1\), we expect the extended Poisson-Tweedie model to give results, in terms of regression coefficients, very similar to those from the Gamma-Count and COM-Poisson regression models.

4.2 Estimation and Inference

Since the model presented in (4.1) is based only on second-moment assumptions, the method of maximum likelihood cannot be employed. Bonat et al. (2016), based on ideas of Jørgensen and Knudsen (2004) and Bonat and Jørgensen (2016), proposed an estimating function approach for estimation and inference in the extended Poisson-Tweedie regression model, combining the quasi-score and Pearson estimating functions for the estimation of the regression and dispersion parameters, respectively. Following Bonat et al. (2016), the quasi-score function for \(\boldsymbol{\beta}\) has the following form,

\[\begin{equation*} \psi_{\boldsymbol{\beta}}(\boldsymbol{\beta}, \boldsymbol{\lambda}) = \left (\sum_{i=1}^n \frac{\partial \mu_i}{\partial \beta_1}C^{-1}_i(Y_i - \mu_i), \ldots, \sum_{i=1}^n \frac{\partial \mu_i}{\partial \beta_p}C^{-1}_i(Y_i - \mu_i) \right )^\top, \end{equation*}\]

where \(\partial \mu_i/\partial \beta_j = \mu_i x_{ij}\) for \(j = 1, \ldots, p\). The sensitivity matrix is defined as the expectation of the first derivative of the estimating function with respect to the model parameters. Thus, the entry \((j,k)\) of the \(p \times p\) sensitivity matrix for \(\psi_{\boldsymbol{\beta}}\) is given by

\[\begin{equation} \mathrm{S}_{\boldsymbol{\beta}_{jk}} = \mathrm{E}\left ( \frac{\partial}{\partial \beta_k} \psi_{\boldsymbol{\beta}_j}(\boldsymbol{\beta}, \boldsymbol{\lambda}) \right ) = -\sum_{i=1}^n \mu_i x_{ij} C^{-1}_i x_{ik} \mu_i. \tag{4.2} \end{equation}\] In a similar way, the variability matrix is defined as the variance of the estimating function. In particular, for the quasi-score function the entry \((j,k)\) of the \(p \times p\) variability matrix is given by \[\begin{equation*} \label{Vbeta} \mathrm{V}_{\boldsymbol{\beta}_{jk}} = \mathrm{Cov}(\psi_{\boldsymbol{\beta}_j}(\boldsymbol{\beta}, \boldsymbol{\lambda}),\psi_{\boldsymbol{\beta}_k}(\boldsymbol{\beta}, \boldsymbol{\lambda})) = \sum_{i=1}^n \mu_i x_{ij} C^{-1}_i x_{ik} \mu_i. \end{equation*}\]
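As a concrete illustration, the following R sketch computes the quasi-score function, \(\mathrm{S}_{\boldsymbol{\beta}}\) and \(\mathrm{V}_{\boldsymbol{\beta}}\) under the log link; the function quasi_score_beta() and its arguments are our own illustrative names, not part of any package.

```r
## Illustrative implementation of the quasi-score function for beta and
## the matrices S_beta and V_beta, following Equations (4.1) and (4.2).
quasi_score_beta <- function(beta, phi, p, y, X) {
  mu <- exp(drop(X %*% beta))        # log link: mu_i = exp(x_i' beta)
  C  <- mu + phi * mu^p              # variance function C_i
  D  <- mu * X                       # derivatives d mu_i / d beta_j = mu_i * x_ij
  psi <- crossprod(D, (y - mu) / C)  # quasi-score vector
  S   <- -crossprod(D, D / C)        # sensitivity matrix, Equation (4.2)
  list(psi = drop(psi), S = S, V = -S)  # variability matrix V_beta = -S_beta
}
```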

The Pearson estimating function for the dispersion parameters has the following form,

\[\begin{equation*} \label{Pearson} \psi_{\boldsymbol{\lambda}}(\boldsymbol{\lambda}, \boldsymbol{\beta}) = \left (-\sum_{i=1}^n \frac{\partial C^{-1}_i}{\partial \phi} \left [ (Y_i - \mu_i)^2 - C_i \right ], -\sum_{i=1}^n \frac{\partial C^{-1}_i}{\partial p} \left [ (Y_i - \mu_i)^2 - C_i \right ] \right )^\top. \end{equation*}\]

Note that the Pearson estimating functions are unbiased estimating functions for \(\boldsymbol{\lambda}\), based on the squared residuals \((Y_i - \mu_i)^2\), whose expected value is \(C_i\).

The entry \((j,k)\) of the \(2 \times 2\) sensitivity matrix for the dispersion parameters is given by \[\begin{equation} \mathrm{S}_{\boldsymbol{\lambda}_{jk}} = \mathrm{E}\left ( \frac{\partial}{\partial \lambda_k}\psi_{\boldsymbol{\lambda}_j}(\boldsymbol{\lambda}, \boldsymbol{\beta}) \right ) = -\sum_{i=1}^n \frac{\partial C^{-1}_i}{\partial \lambda_j} C_i \frac{\partial C^{-1}_i}{\partial \lambda_k}C_i, \tag{4.3} \end{equation}\]

where \(\lambda_1\) and \(\lambda_2\) denote either \(\phi\) or \(p\).
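The following R sketch (again with our own illustrative names) evaluates the Pearson estimating function and the sensitivity matrix in (4.3), using the derivatives \(\partial C_i/\partial \phi = \mu_i^p\) and \(\partial C_i/\partial p = \phi \mu_i^p \log(\mu_i)\).

```r
## Illustrative implementation of the Pearson estimating function for
## lambda = (phi, p) and its 2 x 2 sensitivity matrix, Equation (4.3).
pearson_lambda <- function(beta, phi, p, y, X) {
  mu <- exp(drop(X %*% beta))
  C  <- mu + phi * mu^p
  dC    <- cbind(phi = mu^p, p = phi * mu^p * log(mu))  # dC_i / d lambda_j
  dCinv <- -dC / C^2                                    # dC_i^{-1} / d lambda_j
  psi <- -colSums(dCinv * ((y - mu)^2 - C))  # Pearson estimating function
  S   <- -crossprod(dCinv * C)               # sensitivity matrix, Equation (4.3)
  list(psi = psi, S = S)
}
```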

Similarly, the cross entries of the sensitivity matrix are given by \[\begin{equation} \mathrm{S}_{\boldsymbol{\beta}_j \boldsymbol{\lambda}_k} = \mathrm{E}\left ( \frac{\partial}{\partial \lambda_k}\psi_{\boldsymbol{\beta}_j}(\boldsymbol{\beta}, \boldsymbol{\lambda}) \right ) = 0 \tag{4.4} \end{equation}\] and \[\begin{equation} \mathrm{S}_{\boldsymbol{\lambda}_j \boldsymbol{\beta}_k} = \mathrm{E}\left ( \frac{\partial}{\partial \beta_k}\psi_{\boldsymbol{\lambda}_j}(\boldsymbol{\lambda}, \boldsymbol{\beta}) \right ) = -\sum_{i=1}^n \frac{\partial C_i^{-1}}{\partial \lambda_j} C_i \frac{\partial C_i^{-1}}{\partial \beta_k} C_i. \tag{4.5} \end{equation}\] Finally, the joint sensitivity matrix for the parameter vector \(\boldsymbol{\theta}\) is given by \[\begin{equation*} \mathrm{S}_{\boldsymbol{\theta}} = \begin{pmatrix} \mathrm{S}_{\boldsymbol{\beta}} & \boldsymbol{0} \\ \mathrm{S}_{\boldsymbol{\lambda}\boldsymbol{\beta}} & \mathrm{S}_{\boldsymbol{\lambda}} \end{pmatrix}, \end{equation*}\]

whose entries are defined by equations (4.2), (4.3), (4.4) and (4.5).

We now calculate the asymptotic variance of the estimating function estimators, denoted by \(\boldsymbol{\hat{\theta}}\), as obtained from the inverse Godambe information matrix, whose general form for a parameter vector \(\boldsymbol{\theta}\) is \(\mathrm{J}^{-1}_{\boldsymbol{\theta}} = \mathrm{S}^{-1}_{\boldsymbol{\theta}} \mathrm{V}_{\boldsymbol{\theta}} \mathrm{S}^{-\top}_{\boldsymbol{\theta}}\), where \(-\top\) denotes inverse transpose. The variability matrix for \(\boldsymbol{\theta}\) has the form \[\begin{equation} \mathrm{V}_{\boldsymbol{\theta}} = \begin{pmatrix} \mathrm{V}_{\boldsymbol{\beta}} & \mathrm{V}_{\boldsymbol{\beta}\boldsymbol{\lambda}} \\ \mathrm{V}_{\boldsymbol{\lambda}\boldsymbol{\beta}} & \mathrm{V}_{\boldsymbol{\lambda}} \end{pmatrix}, \tag{4.6} \end{equation}\]

where \(\mathrm{V}_{\boldsymbol{\lambda}\boldsymbol{\beta}} = \mathrm{V}^{\top}_{\boldsymbol{\beta}\boldsymbol{\lambda}}\) and \(\mathrm{V}_{\boldsymbol{\lambda}}\) depend on the third and fourth moments of \(Y_i\), respectively. In order to avoid this dependence on higher-order moments, we use the empirical versions of \(\mathrm{V}_{\boldsymbol{\lambda}}\) and \(\mathrm{V}_{\boldsymbol{\lambda}\boldsymbol{\beta}}\) as given by

\[\begin{equation*} \tilde{\mathrm{V}}_{\boldsymbol{\lambda}_{jk}} = \sum_{i=1}^n \psi_{\boldsymbol{\lambda}_j}(\boldsymbol{\lambda}, \boldsymbol{\beta})_i\psi_{\boldsymbol{\lambda}_k}(\boldsymbol{\lambda}, \boldsymbol{\beta})_i \quad \text{and} \quad \tilde{\mathrm{V}}_{\boldsymbol{\lambda}_j \boldsymbol{\beta}_k} = \sum_{i=1}^n \psi_{\boldsymbol{\lambda}_j}(\boldsymbol{\lambda}, \boldsymbol{\beta})_i \psi_{\boldsymbol{\beta}_k}(\boldsymbol{\lambda}, \boldsymbol{\beta})_i. \end{equation*}\] Finally, the well-known asymptotic distribution of \(\boldsymbol{\hat{\theta}}\) (Jørgensen and Knudsen 2004) is given by \[\begin{equation*} \boldsymbol{\hat{\theta}} \sim \mathrm{N}(\boldsymbol{\theta}, \mathrm{J}_{\boldsymbol{\theta}}^{-1}), \quad \text{where} \quad \mathrm{J}^{-1}_{\boldsymbol{\theta}} = \mathrm{S}^{-1}_{\boldsymbol{\theta}} \mathrm{V}_{\boldsymbol{\theta}} \mathrm{S}^{-\top}_{\boldsymbol{\theta}}. \end{equation*}\] To solve the system of equations \(\psi_{\boldsymbol{\beta}} = \boldsymbol{0}\) and \(\psi_{\boldsymbol{\lambda}} = \boldsymbol{0}\), Jørgensen and Knudsen (2004) proposed the modified chaser algorithm, defined by \[\begin{eqnarray*} \label{chaser} \boldsymbol{\beta}^{(i+1)} &=& \boldsymbol{\beta}^{(i)} - \mathrm{S}_{\boldsymbol{\beta}}^{-1} \psi_{\boldsymbol{\beta}}(\boldsymbol{\beta}^{(i)}, \boldsymbol{\lambda}^{(i)}) \nonumber \\ \boldsymbol{\lambda}^{(i+1)} &=& \boldsymbol{\lambda}^{(i)} - \alpha \mathrm{S}_{\boldsymbol{\lambda}}^{-1} \psi_{\boldsymbol{\lambda}}(\boldsymbol{\beta}^{(i+1)}, \boldsymbol{\lambda}^{(i)}). \end{eqnarray*}\]
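A sketch of this iteration, built on the quasi_score_beta() and pearson_lambda() helpers above, is given below; the starting values, step length \(\alpha\) and tolerance are illustrative, and standard errors would subsequently be obtained from the Godambe information matrix (4.6) with the empirical variability matrices.

```r
## Sketch of the modified chaser algorithm for the extended
## Poisson-Tweedie model; not a definitive implementation.
fit_ext_ptw <- function(y, X, beta, phi = 0.1, p = 1.5,
                        alpha = 0.5, tol = 1e-6, max_iter = 100) {
  for (iter in seq_len(max_iter)) {
    qb <- quasi_score_beta(beta, phi, p, y, X)
    beta_new <- beta - solve(qb$S, qb$psi)                  # update for beta
    pl <- pearson_lambda(beta_new, phi, p, y, X)
    lambda_new <- c(phi, p) - alpha * solve(pl$S, pl$psi)   # update for lambda
    converged <- max(abs(c(beta_new - beta, lambda_new - c(phi, p)))) < tol
    beta <- beta_new; phi <- lambda_new[1]; p <- lambda_new[2]
    if (converged) break
  }
  list(beta = beta, phi = phi, p = p, iterations = iter)
}
```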

The modified chaser algorithm uses the insensitivity property (4.4), which allows us to use two separate equations to update \(\boldsymbol{\beta}\) and \(\boldsymbol{\lambda}\). We introduce the tuning constant \(\alpha\) to control the step length. This algorithm is a special case of the flexible algorithm presented by Bonat and Jørgensen (2016) in the context of multivariate covariance generalized linear models. Hence, estimation for the extended Poisson-Tweedie model is easily carried out in R through the mcglm package (Bonat 2016), as illustrated below.
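The snippet below is a minimal usage sketch, assuming the mcglm() interface with a log link and the Poisson-Tweedie variance function; the data frame mydata and the linear predictor y ~ x1 + x2 are hypothetical placeholders.

```r
## Minimal usage sketch of the mcglm package for the extended
## Poisson-Tweedie regression model (hypothetical data and formula).
library(mcglm)
fit <- mcglm(linear_pred = c(y ~ x1 + x2),
             matrix_pred = list(mc_id(mydata)),
             link = "log",
             variance = "poisson_tweedie",
             power_fixed = FALSE,   # estimate the power parameter p
             data = mydata)
summary(fit)
```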