About Aspatial Logistic Regression
Logistic regression is a form of generalized linear model developed specifically for predicting the value of a dependent variable that is binary (categorical with two possible values), such as a yes/no, male/female, presence/absence, or diseased/healthy response variable. Independent variables in a logistic regression can be continuous and/or categorical, and you can also include interaction effects and weights in your model. Note, however, that some combinations of categorical variables lead to overspecified models, so read the categorical variables page so that you know what to avoid when defining your model. Logistic regression models assume that (1) the observations are independent, and (2) the independent variables are linearly related to the log odds (logit) of the dependent variable.
To perform logistic regression, the dependent variable must be coded as 0 or 1 (as a numeric type, not a string). The mean of the sample is then the proportion of 1s in the sample, which is also the probability of drawing a case coded 1 at random from your data. Examples of applications include a model to predict the probability of disease incidence (yes or no on a per-individual basis) with groundwater contamination level as an independent variable, or a model to predict the sex of birds (for species where it is not readily apparent) from measurements of weight, wing length, and bill length.
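Purely as an illustration of this coding, here is a minimal sketch in plain Python using the pandas and statsmodels libraries (not SpaceStat syntax; the data and the column names disease and contamination are hypothetical). It shows that the mean of a 0/1 variable is the proportion of 1s, and that a logistic model can then relate that variable to a continuous predictor:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical per-individual data: disease coded 0/1 (numeric, not string)
    df = pd.DataFrame({
        "disease":       [0, 0, 1, 0, 1, 1, 0, 1],
        "contamination": [0.2, 1.5, 0.8, 0.4, 2.3, 1.6, 1.9, 2.9],
    })

    # The mean of a 0/1 variable equals the proportion of 1s, which is also the
    # probability of drawing a case coded 1 at random from the data.
    print(df["disease"].mean())            # 0.5 for this hypothetical sample

    # Logistic regression of disease incidence on groundwater contamination level;
    # the fitted coefficients are on the log-odds (logit) scale.
    fit = smf.logit("disease ~ contamination", data=df).fit()
    print(fit.summary())

The same kind of model would be specified through SpaceStat's own logistic regression interface (see the how-to link at the end of this topic).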
As is true for simple linear regression, the goal is to model the relationship between the independent (predictor) variable(s) and the dependent (response) variable. More specifically, the goal of logistic regression is to predict the category of the dependent variable from the suite of independent variables that provides the best balance between model performance and model complexity. However, rather than directly estimating the category of the response variable at a given value of the predictors, in logistic regression you estimate the probability that an observation is coded 1. Specifically, this tool predicts the log odds that an observation will be a 1 (probability = P) rather than a 0 (probability = 1 - P). Because all of the y values in your dataset are either 0 or 1, scatterplots such as the one shown in the description of linear regression are not very useful for binary data. Instead, you might plot a histogram of the frequency of 1s in your dataset; a curve fit to that graph would have an expected shape similar to the logistic curve shown below.
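Written out (standard notation, not specific to SpaceStat), the odds and log odds for a probability P are

    \text{odds} = \frac{P}{1 - P}, \qquad \text{log odds} = \operatorname{logit}(P) = \ln\!\left(\frac{P}{1 - P}\right)

so, for example, P = 0.8 corresponds to odds of 4 and a log odds of ln(4) ≈ 1.39, while P = 0.5 corresponds to odds of 1 and a log odds of 0.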
A general equation for logistic regression
In SpaceStat's logistic regression, the relationship between the independent variable(s) and the probability that observations of your dependent variable will be coded 1 is described with a logistic curve of the form shown above. Note that the regression coefficients, which are the parameters that you are trying to estimate during the regression procedure, appear in an exponential term (a power of e) that controls the shape of the logistic curve. SpaceStat uses maximum likelihood estimation to work with the transformed variable (the 1s and 0s are transformed to a logit variable, the natural log of the odds of the dependent variable being coded 1).
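For reference, the standard form of the logistic model, with independent variables x_1, ..., x_k and regression coefficients \beta_0, \beta_1, ..., \beta_k, is

    P = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}

or, equivalently, on the logit scale,

    \operatorname{logit}(P) = \ln\!\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k

Maximum likelihood estimation then chooses the coefficients that maximize the likelihood \prod_i P_i^{y_i} (1 - P_i)^{1 - y_i} of the observed 0/1 responses y_i.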
Click here to compare this formula to the one for linear regression.
Output for logistic regression
Similar to linear regression, SpaceStat output for logistic regression includes two pseudo R-squared values that describe the strength of the model as a whole, along with significance values (p-values) for the whole model and for the individual terms in the model. The output is described in more detail here, including examples from the SpaceStat log view.
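For orientation only (the two measures SpaceStat reports are described in the linked output documentation and may be defined differently), one widely used pseudo R-squared is McFadden's, computed from the log-likelihoods of the fitted model and of an intercept-only (null) model:

    R^2_{\text{McFadden}} = 1 - \frac{\ln L_{\text{model}}}{\ln L_{\text{null}}}

It is 0 when the model fits no better than the null model and approaches 1 as the fitted model's likelihood improves.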
To find out more about how logistic regression is implemented in SpaceStat, click here.
To skip the details and learn about how to perform aspatial logistic regression, click here.