Regression in Vesta
Regression methods are a set of tools for assessing variation in one variable (the dependent variable, y) at set levels of another variable or variables (the independent, or x, variables). Unlike measures of correlation, such as those that accompany the scatter plots in Vesta, these tools assume that there is a functional dependence of the values of the dependent variable on the level of the independent variable(s).
Currently, Vesta is limited to fitting aspatial linear regression models with continuous independent variables and to assessing their relative importance in predicting y. Aspatial regression means that the geographical coordinates of the observations are ignored in the analysis.
Aspatial Linear Regression
In traditional linear regression, a statistical model is fit to a set of N observations such that a dependent variable y can be expressed in terms of one or more independent variables, and a residual, or error, term. Assumptions of linear regression models include:
- Independent observations
- Normally distributed variables, and
- Homoscedasticity, or similar variances in the dependent variable across different values or levels of the independent variable(s)
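In practice, the normality and homoscedasticity assumptions are often probed by examining the residuals of a fitted model. The short Python sketch below is a minimal illustration of such checks on made-up data; it is not part of Vesta, and the dataset, tests, and library calls are assumptions chosen for the example.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

# Fit an ordinary least squares model
X = sm.add_constant(x)            # adds the intercept column
results = sm.OLS(y, X).fit()

# Normality of the residuals (Shapiro-Wilk test)
shapiro_stat, shapiro_p = stats.shapiro(results.resid)

# Homoscedasticity (Breusch-Pagan test against the design matrix)
bp_stat, bp_p, _, _ = het_breuschpagan(results.resid, X)

print(f"Shapiro-Wilk p-value:  {shapiro_p:.3f}")
print(f"Breusch-Pagan p-value: {bp_p:.3f}")
```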
The figure below shows a dataset of N = 23 points, plotted so that y, the variable we would like to predict, is shown on the vertical axis, and a single independent variable (x) is shown on the horizontal axis. The goal of the linear regression modeling exercise is to find the linear function that provides the best prediction of y; that is, the one that produces the smallest total error, measured as the sum of the squared differences between each observed value of y and the value of y on the regression line at the same value of x.
Note that the term "linear" refers to the linear combination of parameters in the model, and that the graph that results from a linear regression model does not have to be a straight line.
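As a small illustration of this fitting step (outside Vesta), the Python sketch below fits a straight line to a made-up single-variable dataset by minimizing the sum of squared errors; the data values are hypothetical.

```python
import numpy as np

# Hypothetical (x, y) observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.9, 8.2])

# np.polyfit chooses the slope and intercept that minimize the
# sum of squared vertical distances between y and the line
slope, intercept = np.polyfit(x, y, deg=1)

predicted = intercept + slope * x
sum_sq_error = np.sum((y - predicted) ** 2)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f}*x  (SSE = {sum_sq_error:.3f})")
```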
Regression Equation
In the linear regression model, the independent variable observations are regarded as fixed, and all other variables (y and the error term) are considered random. One way of expressing a linear regression model with an unspecified number of independent variables is shown below; the x part of the βjxij component drops out when j = 0, leaving the y-intercept, β0.

yi = Σj βjxij + εi

where:
- i = 1 ... N (N = number of observations)
- j = 0 ... M (M = number of independent variables)
- yi = the dependent variable at observation i
- βj = the regression coefficients
- xij = the jth independent variable at observation i
- εi = the residual (error) term at observation i
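To make the notation concrete, the sketch below builds the design matrix with a leading column of ones (so the j = 0 term reduces to the intercept β0) and solves for the coefficients by least squares. It is an illustrative example on simulated data, not Vesta's internal code, and the coefficient values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 2                                   # N observations, M independent variables

# Hypothetical data generated from known coefficients
x = rng.normal(size=(N, M))                    # x_ij for j = 1 ... M
true_beta = np.array([1.5, -2.0, 0.75])        # beta_0, beta_1, beta_2
X = np.column_stack([np.ones(N), x])           # leading column of ones gives the intercept
y = X @ true_beta + rng.normal(scale=0.5, size=N)   # y_i = sum_j beta_j x_ij + e_i

# Least-squares estimate of the beta_j
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat                   # estimated e_i

print("estimated coefficients:", np.round(beta_hat, 3))
```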
Regression Process
- Select "Regression" in the Methods panel
- Select "Start" to open the regression dialog
- In the "Dataset" field, select the desired dataset from the Data panel you wish to visualize
- In the "Dependent Variable" field, select the variable you wish to predict
- In the "Potentially Independent variable" field, left click to highlight the variable you want to use as predictor
- Left click on the + icon to add this variable to the list of “Independent variables”
- Repeat for each variable you want to include in the regression model
- Select "Run" to run the regression analysis
- Select “Switch Method” to choose another method
- If you selected “Run”, check the Data panel to find the Regression Folder with the results (estimated mean, standard error, and report).
Regression Output
Vesta output from linear regression is saved in a Regression Folder under the Data panel and includes:
- Two new variables (estimated mean and standard error for each observation).
- A report that lists the R-squared value and the estimated value of each regression coefficient, including the intercept.
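For readers who want to see how quantities of this kind relate to a standard least-squares fit, the Python sketch below computes analogous per-observation estimated means and standard errors, together with R-squared and the coefficient estimates, using statsmodels on made-up data. It is an illustrative analogue, not Vesta's own computation.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical dataset with two independent variables
rng = np.random.default_rng(2)
x = rng.normal(size=(40, 2))
y = 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=0.4, size=40)

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

prediction = results.get_prediction(X)
estimated_mean = prediction.predicted_mean     # per-observation estimated mean
standard_error = prediction.se_mean            # per-observation standard error

print("R-squared:", round(results.rsquared, 3))
print("intercept and coefficients:", np.round(results.params, 3))
```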
Note that for datasets with both spatial (geographic) and temporal (time) components, a separate regression model is fitted for each time interval, and the corresponding regression coefficients and R-squared values are reported.
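As a rough analogue of that per-interval behaviour (an assumption for illustration, not Vesta's actual implementation), the sketch below fits a separate ordinary least squares model for each time interval of a hypothetical pandas DataFrame and prints the coefficients and R-squared for each interval.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical space-time dataset: one row per observation
rng = np.random.default_rng(3)
frame = pd.DataFrame({
    "interval": np.repeat(["t1", "t2", "t3"], 30),
    "x": rng.normal(size=90),
})
frame["y"] = 2.0 + 1.2 * frame["x"] + rng.normal(scale=0.5, size=90)

# Fit one model per time interval and report its coefficients and R-squared
for interval, group in frame.groupby("interval"):
    X = sm.add_constant(group["x"])
    results = sm.OLS(group["y"], X).fit()
    print(interval, np.round(results.params.values, 3), round(results.rsquared, 3))
```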