Appendix 5 Theory on Regression Models for Proportions
The approach for fitting generalized linear models is based on work by Nelder and Wedderburn (1) and McCullagh and Nelder (2), which provides a powerful synthesis of regression methodology, unifying various techniques that had previously been treated separately. The elements of a generalized linear model that must be identified to obtain maximum likelihood estimates are the distribution of the response and the relationship between the mean, μ, and the linear predictor, η, which introduces the regressor variables into the analysis. These elements are then used to derive the information needed for statistical inference on the model parameters.
Distribution for Binary Responses
As we saw in Chapter 3, the total number of responders, y, in a group of n individuals with a common probability of the outcome, π, has a binomial distribution in which

$$\Pr(Y = y) = \binom{n}{y}\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, \ldots, n.$$
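As a quick numerical check, the binomial probability can be computed directly with the standard library; the counts and probability below are hypothetical, chosen only for illustration.

```python
from math import comb

def binom_pmf(y, n, pi):
    """Probability of exactly y responders among n subjects,
    each responding independently with probability pi."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

# hypothetical example: 3 responders out of 10 when pi = 0.3
prob = binom_pmf(3, 10, 0.3)
```

Summing the probabilities over y = 0, …, n returns 1, which is a useful sanity check on any such implementation.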
A final term that is often used when formulating likelihood-based inference for binary responses is the deviance, which is defined as twice the logarithm of the ratio of the likelihood evaluated at the observed response to that evaluated at the expected response. For the ith subject this becomes

$$d_i = 2\left[y_i \log\frac{y_i}{\hat{y}_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - \hat{y}_i}\right],$$

where $\hat{y}_i = n_i \hat{\pi}_i$ is the fitted number of responders.
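A minimal sketch of the per-subject deviance contribution for grouped binomial data, using the convention 0·log 0 = 0; the counts and fitted mean below are hypothetical.

```python
from math import log

def unit_deviance(y, n, mu):
    """Deviance contribution for one binomial observation: twice the log
    likelihood ratio of the observed count y to the fitted mean mu = n*pi_hat."""
    def term(a, b):
        return 0.0 if a == 0 else a * log(a / b)  # 0*log(0) taken as 0
    return 2.0 * (term(y, mu) + term(n - y, n - mu))

# the deviance is zero when the fitted mean equals the observed count,
# and positive when they disagree
d0 = unit_deviance(3, 10, 3.0)
d1 = unit_deviance(3, 10, 5.0)
```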
Functions of the Linear Predictor
The relationship between the proportion responding and the linear predictor is defined by the link function, η = g(π), which gives the transformation of the probability of the response that leads to the linear portion of the model. It is also necessary to be able to reverse the process by determining π as a function of the linear predictor, π = g^{−1}(η), called the inverse link. Some software also requires that we specify the derivative of the linear predictor with respect to π,

$$\frac{d\eta}{d\pi} = g'(\pi).$$
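For the logit link, the three elements just described (link, inverse link, and derivative) take the familiar forms sketched below; this is only an illustration of how the pieces fit together.

```python
from math import exp, log

def g(pi):
    """Logit link: eta = log odds of the response probability pi."""
    return log(pi / (1 - pi))

def g_inv(eta):
    """Inverse link: response probability from the linear predictor eta."""
    return exp(eta) / (1 + exp(eta))

def deta_dpi(pi):
    """Derivative of the linear predictor with respect to pi."""
    return 1.0 / (pi * (1 - pi))

# round trip: the inverse link undoes the link
pi = g_inv(g(0.2))
```

A finite-difference check of `deta_dpi` against `g` is a quick way to verify that the derivative supplied to software actually matches the link.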
Example A5–1. Suppose we are interested in fitting a model in which the odds for disease is a linear function of the regression parameters. Let π represent the probability of disease.
Table A5–1. Commonly used link functions, inverse links, and derivatives for binary response data

Model | Link function g(π) | Inverse link g^{−1}(η) | Derivative dη/dπ
Logit | log[π/(1 − π)] | exp(η)/[1 + exp(η)] | 1/[π(1 − π)]
Complementary log–log | log[−log(1 − π)] | 1 − exp[−exp(η)] | −1/[(1 − π)log(1 − π)]
Probit | Φ^{−1}(π) | Φ(η) | 1/φ[Φ^{−1}(π)]
Linear odds | π/(1 − π) | η/(1 + η) | 1/(1 − π)^2
Power odds (γ ≠ 0; γ = 0, see Logit) | {[π/(1 − π)]^γ − 1}/γ | (1 + γη)^{1/γ}/[1 + (1 + γη)^{1/γ}] | [π/(1 − π)]^{γ−1}/(1 − π)^2

Note: Φ denotes the standard normal cumulative distribution function and φ its density.
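Each inverse link should undo its link, which provides a simple numerical check on four of the entries above; the standard normal functions for the probit come from Python's statistics module.

```python
from math import exp, log
from statistics import NormalDist

nd = NormalDist()  # standard normal, for the probit row

# (link, inverse link) pairs for four common binary-response models
links = {
    "logit":       (lambda p: log(p / (1 - p)),  lambda e: exp(e) / (1 + exp(e))),
    "cloglog":     (lambda p: log(-log(1 - p)),  lambda e: 1 - exp(-exp(e))),
    "probit":      (nd.inv_cdf,                  nd.cdf),
    "linear_odds": (lambda p: p / (1 - p),       lambda e: e / (1 + e)),
}

def round_trip(name, p):
    """Apply a link and then its inverse; should recover p."""
    g, g_inv = links[name]
    return g_inv(g(p))
```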
Using Results to Conduct Inference
Typical output from statistical software includes computations involving the functions defined here, which are evaluated at various values of the underlying model parameters. Let us now consider how these elements are typically used in conducting statistical inference.
First, we are given the maximum likelihood estimates of the model parameters, β, along with the estimated covariance matrix, calculated as the inverse of the information matrix,

$$\widehat{\operatorname{Var}}(\hat{\beta}) = I(\hat{\beta})^{-1}.$$
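A sketch of how these quantities arise, using a Newton–Raphson fit of a logistic model to hypothetical grouped binomial data: at convergence the covariance estimate is the inverse of the information matrix. The design matrix, group sizes, and counts are all invented for illustration.

```python
import numpy as np

# hypothetical grouped data: four dose groups of 50 subjects each
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept, dose
n = np.array([50.0, 50.0, 50.0, 50.0])  # subjects per group
y = np.array([5.0, 12.0, 25.0, 38.0])   # responders per group

beta = np.zeros(2)
for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = n * pi * (1.0 - pi)            # binomial weights
    score = X.T @ (y - n * pi)         # score vector U(beta)
    info = X.T @ (W[:, None] * X)      # information matrix I(beta)
    beta = beta + np.linalg.solve(info, score)

cov = np.linalg.inv(info)              # estimated covariance of beta-hat
se = np.sqrt(np.diag(cov))             # standard errors
```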
An alternative approach for constructing a significance test is to use the log likelihood or the scaled deviance to construct a likelihood ratio test of whether a set of regression parameters has no effect, H_{0}: β = 0, which yields the fitted mean under the null, $\hat{\mu}_0$. Typically, we are given the log likelihood, so that the test is given by

$$G = 2\left[\ell(\hat{\beta}) - \ell(\hat{\beta}_0)\right],$$

which is compared to a chi-square distribution with degrees of freedom equal to the number of parameters set to zero under the null.
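The likelihood ratio statistic is simple to compute from the two fitted log likelihoods; the values below are hypothetical, and for a single tested parameter the chi-square tail probability has a closed form via `erfc`.

```python
from math import erfc, sqrt

# hypothetical log likelihoods from the full model and the model under H0
loglik_full = -112.4
loglik_null = -118.9

G = 2.0 * (loglik_full - loglik_null)  # refer to chi-square, df = params dropped

# for df = 1, the chi-square upper-tail probability is erfc(sqrt(G/2))
p_value = erfc(sqrt(G / 2.0))
```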
Finally, we consider the construction of interval estimates of the model parameters. The simplest approach is to make use of the parameter estimates and their standard errors, so that the 100(1 − α)% confidence interval for the pth parameter is

$$\hat{\beta}_p \pm z_{1-\alpha/2}\,\widehat{\operatorname{SE}}(\hat{\beta}_p).$$
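A minimal sketch of this Wald interval, with a hypothetical estimate and standard error; the normal quantile comes from the standard library.

```python
from statistics import NormalDist

# hypothetical parameter estimate and standard error
beta_hat, se = 0.85, 0.32
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile, ~1.96
lower, upper = beta_hat - z * se, beta_hat + z * se
```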
A model is always proposed as a possible candidate for describing a set of data; hence, it is important that we be sure that it provides an adequate description of the data. One way in which this can be accomplished is to conduct a goodness-of-fit test. When the number of subjects in each of the I independent samples is reasonably large, say about 10 or more, then we already noted that the deviance can be used for this purpose, comparing it to a chi-square distribution with I − p df, where p is the number of parameters estimated. Alternatively, we can use the Pearson chi-square statistic:

$$X^2 = \sum_{i=1}^{I} \frac{(y_i - n_i\hat{\pi}_i)^2}{n_i\hat{\pi}_i(1-\hat{\pi}_i)}.$$
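The Pearson statistic is a direct sum over the groups; the observed counts and fitted probabilities below are hypothetical.

```python
# Pearson chi-square statistic for I binomial groups, comparing observed
# counts y_i with fitted means n_i * pi_hat_i (all values hypothetical)
y = [5, 12, 25, 38]
n = [50, 50, 50, 50]
pi_hat = [0.11, 0.24, 0.49, 0.76]   # fitted probabilities from some model

X2 = sum((yi - ni * pi) ** 2 / (ni * pi * (1 - pi))
         for yi, ni, pi in zip(y, n, pi_hat))
# refer X2 to a chi-square distribution with I - p degrees of freedom
```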
While this approach will indeed determine the maximum likelihood estimates of the parameters, the variance estimators provided by the software will not take into account the fact that γ was estimated from the data and is not a known constant. Hence, we will need to correct the covariance matrix for the parameters by considering the resulting information matrix, which can be partitioned as

$$I(\beta,\gamma) = \begin{pmatrix} I_{\beta\beta} & I_{\beta\gamma} \\ I_{\gamma\beta} & I_{\gamma\gamma} \end{pmatrix},$$

so that the corrected covariance matrix for $\hat{\beta}$ is the corresponding block of $I(\beta,\gamma)^{-1}$, namely $\left(I_{\beta\beta} - I_{\beta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\beta}\right)^{-1}$.
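A sketch of the correction with numpy: invert the full partitioned information matrix and take the β block, which equals the Schur-complement form $(I_{\beta\beta} - I_{\beta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\beta})^{-1}$. All matrix entries here are hypothetical.

```python
import numpy as np

# hypothetical blocks of the partitioned information matrix
I_bb = np.array([[40.0, 12.0], [12.0, 30.0]])  # beta-beta block
I_bg = np.array([[4.0], [6.0]])                # beta-gamma block
I_gg = np.array([[8.0]])                       # gamma-gamma block

naive_cov = np.linalg.inv(I_bb)  # pretends gamma is a known constant
corrected_cov = np.linalg.inv(I_bb - I_bg @ np.linalg.inv(I_gg) @ I_bg.T)
```

The corrected variances are never smaller than the naive ones, reflecting the extra uncertainty from estimating γ.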