Implementation of Logistic GWR

As described in the overview of logistic GWR, the goal of this statistical tool is to parameterize a local, non-linear relationship between one or more independent variables and the probability that a binary dependent variable is coded as 1 rather than 0 (e.g., the data value is yes rather than no, or diseased rather than without disease). The tool uses the logit as its link function. The model is linear on the log-odds scale, logit(p) = ln[p / (1 - p)], and the inverse of the logit constrains the predicted probabilities (predictions of y) to the range 0 to 1.
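The logit and its inverse can be sketched in a few lines of Python. This is an illustration only: the function names and the example coefficients b0 and b1 are hypothetical and do not come from SpaceStat. The sketch shows that any value of the linear predictor, however large or small, is mapped to a probability between 0 and 1.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p, 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Map a linear predictor eta (any real number) back to a probability."""
    # Two branches avoid overflow in math.exp() for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients for one predictor x: eta = b0 + b1 * x
b0, b1 = -1.0, 0.5
for x in (-20.0, 0.0, 2.0, 20.0):
    eta = b0 + b1 * x
    print(f"x = {x:6.1f}   eta = {eta:6.2f}   p = {inv_logit(eta):.4f}")
```

At x = 2 the linear predictor is 0 and p is exactly 0.5; toward the extremes p approaches 0 or 1 but, mathematically, never reaches either.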

Obtaining the parameter estimates

For logistic regression in SpaceStat, the regression coefficients are estimated by maximum likelihood (L). For a binary outcome, a "likelihood" is a probability (and therefore lies in the range 0 to 1): here, the probability of observing the recorded values of the dependent variable, given the independent variables and a particular set of coefficients. As indicated in the equation below, the maximum likelihood estimator uses a Bernoulli distribution to build a joint probability distribution from the individual dependent-variable observations. In the following equations, the brackets around beta, which symbolizes the regression coefficients, indicate that two or more regression coefficients are being estimated.

$$L(\{\beta\}) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}, \qquad p_i = \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{k} \beta_k x_{ik}\right)\right]}$$

where y_i is the observed 0/1 value for observation i and p_i is its predicted probability of being coded 1.
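A minimal Python sketch of the Bernoulli likelihood, assuming a single predictor and hypothetical data and coefficients (none of these names come from SpaceStat). It computes both L and lnL; in practice lnL is accumulated as a sum, because the product of many probabilities quickly underflows to zero in floating-point arithmetic.

```python
import math


def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def bernoulli_likelihood(beta, X, y):
    """Return (L, lnL) for coefficients beta = [b0, b1, ..., bk].

    X holds one row of k predictor values per observation (no intercept
    column; b0 is added here) and y holds the observed 0/1 outcomes.
    """
    L, lnL = 1.0, 0.0
    for xi, yi in zip(X, y):
        p = inv_logit(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        # Bernoulli probability of what was observed: p if y = 1, 1 - p if y = 0,
        # i.e. p**y * (1 - p)**(1 - y).
        prob = p if yi == 1 else 1.0 - p
        L *= prob                # product over observations
        lnL += math.log(prob)    # the same quantity on the log scale
    return L, lnL


# Hypothetical data: one predictor, four observations.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

L0, lnL0 = bernoulli_likelihood([0.0, 0.0], X, y)   # every p_i = 0.5
L1, lnL1 = bernoulli_likelihood([-3.0, 2.0], X, y)  # coefficients that fit better
print(L0, lnL0)
print(L1, lnL1)
```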

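The goal of maximum likelihood estimation is to maximize the log-likelihood (lnL), which lies between negative infinity and 0 (it is negative because it is the logarithm of a value less than 1). Maximum likelihood estimation is an iterative process. Recall from the overview that both geographic and non-geographic weighting factors (w) must be accounted for in the estimation. The weighted log-likelihood for logistic regression is obtained by raising each observation's probability to the power of its weight, taking the product over observations, and then taking the logarithm; for regression point j this reduces to a weighted sum:

$$\ln L_j(\{\beta\}) = \ln \prod_{i=1}^{n} \left[p_i^{\,y_i}(1 - p_i)^{1 - y_i}\right]^{w_{ij}} = \sum_{i=1}^{n} w_{ij}\left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right]$$

where w_ij combines the geographic and non-geographic weights of observation i at regression point j.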
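The weighted log-likelihood at one regression point might be computed as follows. This sketch assumes a fixed Gaussian kernel for the geographic weight and multiplies it by a per-observation non-geographic weight; the kernel choice, the bandwidth, and all names and data are hypothetical, and SpaceStat's own weighting scheme is the one described in the overview. The comment in the loop notes why the product-then-log form reduces to a weighted sum.

```python
import math


def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def gwr_weights(coords, target, bandwidth, obs_weights):
    """Combined weight of each observation at one regression point.

    Assumed (hypothetical) scheme: a fixed Gaussian kernel on the distance to
    the regression point, multiplied by a non-geographic weight per observation.
    """
    weights = []
    for (cx, cy), v in zip(coords, obs_weights):
        d = math.hypot(cx - target[0], cy - target[1])
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2) * v)
    return weights


def weighted_loglik(beta, X, y, w):
    """Weighted lnL: sum_i w_i * [y_i ln p_i + (1 - y_i) ln(1 - p_i)]."""
    total = 0.0
    for xi, yi, wi in zip(X, y, w):
        p = inv_logit(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        prob = p if yi == 1 else 1.0 - p
        # ln(prob ** wi) == wi * ln(prob), so the weighted product becomes a weighted sum
        total += wi * math.log(prob)
    return total


# Hypothetical data: locations, one predictor, 0/1 outcomes, non-geographic weights.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
X = [[0.5], [1.5], [2.5], [3.5]]
y = [0, 1, 0, 1]
obs_weights = [1.0, 1.0, 2.0, 1.0]

w = gwr_weights(coords, target=(0.0, 0.0), bandwidth=1.5, obs_weights=obs_weights)
print([round(wi, 3) for wi in w])
print(weighted_loglik([-1.0, 0.6], X, y, w))
```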

To estimate the regression coefficients, SpaceStat uses a Taylor expansion of the weighted log-likelihood; from it, the maximum likelihood algorithm determines the direction and size of the changes in the regression coefficients that will increase lnL. Starting from an arbitrary set of coefficient estimates, the algorithm evaluates the function and the residuals, adjusts the coefficient values, and generates a new set of residuals, which are compared with the previous ones. This process repeats until lnL changes very little between iterations. The process may fail to converge because of what is called a "ridge effect": the log-likelihood remains constant as the coefficients are varied, so there is no unique direction in which to improve it.
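One common Taylor-expansion-based scheme is Newton-Raphson (equivalently, iteratively reweighted least squares for the logit link). The Python sketch below applies it to a weighted logistic regression at one regression point; we are not claiming that it reproduces SpaceStat's exact algorithm. It starts from all-zero coefficients, uses the residuals y - p to form the gradient, stops when lnL changes by less than a tolerance, and raises an error when the information matrix is singular, which is how a perfectly flat ridge in lnL shows up numerically. The data, weights and function names are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small, dense A)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular information matrix: lnL is flat (a 'ridge')")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_weighted_logit(X, y, w, tol=1e-10, max_iter=50):
    """Newton-Raphson for weighted logistic regression at one regression point.

    Rows of X must start with 1.0 (intercept column). Returns (beta, lnL, iterations).
    """
    k = len(X[0])
    beta = [0.0] * k                          # arbitrary starting values
    prev_lnL = -math.inf
    for it in range(max_iter):
        grad = [0.0] * k                      # first derivatives of lnL
        info = [[0.0] * k for _ in range(k)]  # minus the second derivatives
        lnL = 0.0
        for xi, yi, wi in zip(X, y, w):
            p = inv_logit(sum(b * x for b, x in zip(beta, xi)))
            lnL += wi * (math.log(p) if yi == 1 else math.log(1.0 - p))
            resid = yi - p                    # residual on the probability scale
            for a in range(k):
                grad[a] += wi * resid * xi[a]
                for c in range(k):
                    info[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        if abs(lnL - prev_lnL) < tol:         # little change in lnL: stop
            return beta, lnL, it
        prev_lnL = lnL
        step = solve(info, grad)              # Newton step from the second-order Taylor expansion
        beta = [b + s for b, s in zip(beta, step)]
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical data (intercept column + one predictor) and hypothetical GWR weights.
X = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
y = [0, 0, 1, 0, 1, 1]
w = [1.0, 0.9, 0.8, 0.8, 0.9, 1.0]

beta, lnL, iterations = fit_weighted_logit(X, y, w)
print(beta, lnL, iterations)
```

Newton-Raphson is used here because the logistic log-likelihood is concave, so from a reasonable start it typically converges in a handful of iterations; duplicating a predictor (for example, a column equal to twice another) makes the information matrix singular and triggers the ridge error.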

Evaluating the global model

Overall model quality for logistic regression models is assessed using negative 2 times the natural log of the likelihood function (-2lnL); in general, as the model fit improves, -2lnL decreases. Other names for -2lnL include the likelihood ratio, the deviance chi-square, and the model's goodness-of-fit measure. Differences in -2lnL between nested models approximately follow a chi-square distribution, so this distribution is used for significance testing of the overall model. In effect, -2lnL plays the same role as the residual sum of squares (RSS) in linear regression: it reflects the unexplained variation between predicted and observed values. The -2lnL value forms the basis of the likelihood ratio test (described below), which, as the name suggests, is a significance test based on the difference in -2lnL between two nested forms of a model. Likelihood ratios are not calculated for local models. We recommend starting with the aspatial version of a logistic model (in part because the GWR versions often do not converge) before working with GWR models.
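The Python sketch below illustrates these quantities under stated assumptions. It computes -2lnL from observed outcomes and predicted probabilities, shows the RSS analogy by summing squared deviance residuals (which equals -2lnL for 0/1 data), computes -2lnL for the intercept-only model in closed form, and converts a likelihood ratio statistic into a p-value with a chi-square survival function built on the standard incomplete-gamma series. The full-model -2lnL value in the usage lines is hypothetical, chosen only for illustration, and all function names are our own.

```python
import math


def neg2lnL(y, p):
    """-2lnL for observed 0/1 outcomes y and predicted probabilities p."""
    return -2.0 * sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                      for yi, pi in zip(y, p))


def deviance_residuals(y, p):
    """Signed deviance residuals; their squares sum to -2lnL (the RSS analogue)."""
    out = []
    for yi, pi in zip(y, p):
        prob = pi if yi == 1 else 1.0 - pi
        out.append(math.copysign(math.sqrt(-2.0 * math.log(prob)), yi - pi))
    return out


def null_neg2lnL(y):
    """-2lnL of the intercept-only model; its MLE predicts the sample proportion of 1s."""
    p_bar = sum(y) / len(y)
    return neg2lnL(y, [p_bar] * len(y))


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), via the incomplete-gamma series."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(neg2lnL_reduced, neg2lnL_full, df):
    """LR statistic (drop in -2lnL) and its approximate chi-square p-value."""
    stat = neg2lnL_reduced - neg2lnL_full
    return stat, chi2_sf(stat, df)


# Hypothetical outcomes; d_full stands in for a fitted model with one predictor.
y = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
d_null = null_neg2lnL(y)
d_full = 9.0  # hypothetical -2lnL of the fitted model, for illustration only
print(likelihood_ratio_test(d_null, d_full, df=1))
```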

Measures of global model fit and significance

For aspatial logistic regression, SpaceStat presents two calculations that play the role of R-squared in linear regression, as well as a full-model chi-square test; these are described in the documentation for aspatial logistic regression. Because we suggest working with an aspatial logistic model first, it will likely be useful to review that material.

Significance of individual terms in the local regression model

For logistic GWR, SpaceStat presents the parameter estimates, their standard errors, and p-values (based on a chi-square distribution). As described above for evaluating the significance of the entire aspatial logistic regression model, likelihood ratio tests are used to evaluate the significance of individual parameters in the model. The idea is the same as for the full-model test, except that here the test is based on the difference in -2lnL between the full model and a nested model from which one independent variable has been dropped. If the test for a particular parameter is not significant, the coefficient for that variable is not significantly different from zero, and the variable can be dropped from the model without a significant loss of fit. Note that this approach cannot be used to compare two non-nested models.
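For a single dropped variable the likelihood ratio statistic has one degree of freedom, so its chi-square p-value can be computed with the complementary error function from Python's standard library. The sketch also shows a Wald chi-square, (estimate / standard error) squared, which is one common way a chi-square p-value is attached to an estimate and its standard error; we are assuming, not confirming, that this is how SpaceStat computes the p-values it reports for local estimates. The function names and input values are hypothetical.

```python
import math


def chi2_sf_df1(stat):
    """P(chi-square with 1 degree of freedom > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0)) if stat > 0 else 1.0


def drop_one_lr_test(neg2lnL_nested, neg2lnL_full):
    """LR test for one dropped variable: the increase in -2lnL, tested on 1 df."""
    stat = neg2lnL_nested - neg2lnL_full
    return stat, chi2_sf_df1(stat)


def wald_chi2_test(estimate, std_error):
    """Wald chi-square (estimate / SE)**2 on 1 df (an assumed, common convention)."""
    stat = (estimate / std_error) ** 2
    return stat, chi2_sf_df1(stat)


# Hypothetical inputs, for illustration only.
print(drop_one_lr_test(neg2lnL_nested=52.3, neg2lnL_full=47.1))
print(wald_chi2_test(estimate=0.84, std_error=0.37))
```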