About Logistic Geographically Weighted Regression

Logistic regression is a type of generalized linear model developed specifically for predicting the value of a binary dependent variable (a categorical variable with two possible values), such as yes/no, male/female, presence/absence, or diseased/healthy. Independent variables in a logistic regression can be continuous and/or categorical, and you can also include interaction effects and weights in your model. Note, however, that some combinations of categorical variables lead to overspecified models, so read the page on overspecified models to learn what to avoid when defining your model. Logistic regression models assume (1) that observations are independent and (2) that the independent variables are linearly related to the log odds (logit) of the dependent variable.

Geographically weighted logistic regression can be thought of as the local version of aspatial logistic regression: instead of a single global regression equation for the entire dataset, this technique generates parameter estimates for each neighborhood within your spatial dataset. Calculating local relationships requires two choices: a spatial weighting factor, which determines how strongly values measured at nearby locations influence the calculation of the regression equation, and a definition of neighborhood, which determines how many other points are used to estimate each local regression equation. In aspatial logistic regression, the weighting factor is set to one for all values in the dataset and the neighborhood extends across the entire geography, so values from all locations contribute equally to the regression equation.
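To make the two choices above concrete, here is a minimal sketch of how spatial weights might be computed for one regression location. It assumes planar (x, y) coordinates and two kernels that are common in the GWR literature: a fixed Gaussian kernel, where a bandwidth distance controls how fast influence decays, and an adaptive bisquare kernel, where the neighborhood is the k nearest points. The function names are hypothetical, and SpaceStat may use different kernels or bandwidth rules.

```python
import math


def distances(points, i):
    """Euclidean distances from point i to every point (including itself)."""
    xi, yi = points[i]
    return [math.hypot(x - xi, y - yi) for x, y in points]


def gaussian_weights(points, i, bandwidth):
    """Fixed Gaussian kernel: weight decays smoothly with distance, never reaching 0."""
    return [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances(points, i)]


def adaptive_bisquare_weights(points, i, k):
    """Adaptive bisquare kernel: only the k nearest points (counting point i
    itself, absent ties) receive a positive weight; all others get 0."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    d = distances(points, i)
    bandwidth = sorted(d)[k]  # sorted(d)[0] is point i itself at distance 0
    return [(1 - (dj / bandwidth) ** 2) ** 2 if dj < bandwidth else 0.0 for dj in d]


def global_weights(points):
    """Aspatial (global) regression: every observation gets weight 1."""
    return [1.0] * len(points)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(adaptive_bisquare_weights(points, 0, k=2))  # [1.0, 0.5625, 0.0, 0.0]
print(global_weights(points))                     # [1.0, 1.0, 1.0, 1.0]
```

The fixed kernel keeps the neighborhood size constant in distance, while the adaptive kernel keeps it constant in number of points, which tends to suit data whose density varies across the study area.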

To perform logistic regression, the dependent variable must be coded as 0 or 1 (numeric, not string). The sample mean of this variable equals the proportion of 1s in the sample, which is also the probability that a case drawn at random from your data is labeled 1. Examples of applications include a model that predicts the probability of disease incidence (yes or no for each individual) with groundwater contamination level as an independent variable, or a model that predicts the sex of birds (for species in which it is not readily apparent) from measurements of weight, wing length, and bill length.
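As an illustration of the coding requirement, the short sketch below converts text labels to numeric 0/1 values and confirms that the mean of the coded variable is the proportion of 1s. The helper name `to_binary` and the yes/no labels are hypothetical; it raises an error on any unexpected label so that typos or missing values are not silently coded as 0.

```python
def to_binary(values, positive, negative):
    """Code each response as 1 (positive label) or 0 (negative label), as numbers."""
    coded = []
    for v in values:
        if v == positive:
            coded.append(1)
        elif v == negative:
            coded.append(0)
        else:
            raise ValueError(f"unexpected response label: {v!r}")
    return coded


responses = ["yes", "no", "yes", "yes", "no"]
y = to_binary(responses, positive="yes", negative="no")
proportion_of_ones = sum(y) / len(y)  # sample mean == proportion of 1s
print(y, proportion_of_ones)  # [1, 0, 1, 1, 0] 0.6
```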

As in simple linear regression, the goal is to model the relationship between the independent (predictor) variable(s) and the dependent (response) variable. More specifically, the goal of logistic regression is to predict the category of the dependent variable using the set of independent variables that provides the best balance between model performance and model complexity. However, rather than directly estimating the category of the response variable at a given value of the predictor, logistic regression estimates the probability that an observation has a code of 1. Specifically, this tool predicts the log odds that an observation will be a 1 (probability P) rather than a 0 (probability 1 - P), that is, ln(P / (1 - P)). Because all of the y values in your dataset are either 0 or 1, scatterplots such as the one shown in the description of linear regression are not very useful for binary data. Instead, you might plot the proportion of 1s within successive intervals of an independent variable, histogram style; a curve fit to such a plot would be expected to have the S shape of a logistic curve.
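The quantities in this paragraph are easy to compute directly. The sketch below converts between a probability P and its log odds, and computes the proportion of 1s within intervals of a predictor, which is what the histogram-style plot described above would display. The function names and the example data are hypothetical.

```python
import math


def logit(p):
    """Log odds of a 1: ln(P / (1 - P)), defined for 0 < P < 1."""
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Probability of a 1 recovered from the log odds (written to avoid overflow)."""
    if eta >= 0:
        return 1 / (1 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1 + z)


def binned_proportions(x, y, edges):
    """Proportion of 1s in y within each interval [edges[j], edges[j + 1]) of x.

    Returns None for an interval that contains no observations.
    """
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        result.append(sum(in_bin) / len(in_bin) if in_bin else None)
    return result


print(logit(0.5), inverse_logit(0.0))  # 0.0 0.5
print(binned_proportions([1, 2, 3, 4, 5, 6], [0, 0, 1, 0, 1, 1], [0, 3, 7]))  # [0.0, 0.75]
```

With many observations and narrow intervals, these binned proportions trace out the S-shaped curve that the logistic model is designed to fit.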

A general equation for logistic regression
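In standard notation, with P the probability that an observation is coded 1, x_1, ..., x_k the independent variables, and β_0, ..., β_k the regression coefficients, the logistic model can be written as follows. These symbols are conventional notation rather than SpaceStat's own. The second line is the equivalent log odds (logit) form, and the third is the geographically weighted version as it is usually written in the GWR literature, in which each coefficient is estimated separately at location i with coordinates (u_i, v_i).

$$
P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
$$

$$
\ln\!\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
$$

$$
\ln\!\left(\frac{P_i}{1 - P_i}\right) = \beta_0(u_i, v_i) + \sum_{j=1}^{k} \beta_j(u_i, v_i)\, x_{ij}
$$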

In SpaceStat's logistic regression, the relationship between the independent variable(s) and the probability that an observation of your dependent variable is coded 1 is described by a logistic curve. Note that the regression coefficients, which are the parameters you are trying to estimate during the regression procedure, appear in the exponent of the logistic function and therefore control the shape of the logistic curve. SpaceStat then uses maximum likelihood estimation to fit the model on the logit scale, that is, through the natural log of the odds that the dependent variable is coded 1, rather than on the raw 1s and 0s.
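This section does not describe which algorithm SpaceStat uses to maximize the likelihood. The sketch below uses Newton-Raphson (equivalent to iteratively reweighted least squares), a standard way to compute logistic regression estimates, in plain Python. The optional `weights` argument is an assumption about how spatial weighting enters a GWR fit: each observation's contribution to the log-likelihood is multiplied by its weight, which is the usual formulation of geographically weighted logistic regression. All weights equal to 1 gives the aspatial fit. The function names are hypothetical, and there is no safeguard for perfectly separated data, where the maximum likelihood estimate does not exist.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(X, y, weights=None, iterations=25, tol=1e-10):
    """Weighted maximum likelihood fit of a logistic regression by Newton-Raphson.

    X: rows of predictor values (an intercept column is added here).
    y: responses coded 0/1.
    weights: per-observation weights; None means all 1.0 (aspatial fit).
    Returns the coefficients [b0, b1, ..., bk] on the log odds scale.
    """
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    w = weights if weights is not None else [1.0] * n
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p                   # weighted score vector
        info = [[0.0] * p for _ in range(p)]  # weighted information matrix
        for xi, yi, wi in zip(rows, y, w):
            eta = sum(b * v for b, v in zip(beta, xi))  # linear predictor (log odds)
            prob = 1 / (1 + math.exp(-eta))
            for j in range(p):
                grad[j] += wi * (yi - prob) * xi[j]
                for k in range(p):
                    info[j][k] += wi * prob * (1 - prob) * xi[j] * xi[k]
        step = solve(info, grad)  # Newton step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_logistic([[v] for v in x], y))  # aspatial fit: all weights 1
```

For a local fit at one location, you would pass that location's kernel weights (for example, from a Gaussian or bisquare kernel) as `weights` and repeat the fit at every location.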

Output for logistic regression

Similar to aspatial logistic regression, SpaceStat output for GWR logistic regression includes parameter estimates and significance values (p-values) for individual terms in the model. The output is described in more detail on a separate page, including examples from the SpaceStat log view.
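This section does not say which test SpaceStat uses to compute p-values for individual terms. A common choice for logistic regression is the Wald test, sketched below under that assumption: the estimate divided by its standard error gives a z statistic, and the two-sided p-value comes from the standard normal distribution. In a GWR model the same calculation would apply to each local estimate. The function name and the input numbers are made up for illustration.

```python
import math


def wald_test(estimate, std_error):
    """Wald z statistic and two-sided p-value for H0: coefficient = 0."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up estimate and standard error, for illustration only.
z, p = wald_test(0.8, 0.4)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```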
