Regression in Vesta
Regression methods are a set of tools for assessing variation in one variable (the dependent variable, y) at set levels of another variable or variables (the independent, or x, variables). Unlike measures of correlation, such as those that accompany the scatter plots in Vesta, these tools assume that there is a functional dependence of the values of the dependent variable on the level of the independent variable(s).
Currently, Vesta is limited to fitting aspatial linear regression models with continuous independent variables and to assessing their relative importance in predicting y. Aspatial regression means that the geographical coordinates of the observations are ignored in the analysis.
Aspatial Linear Regression
In traditional linear regression, a statistical model is fit to a set of N observations such that a dependent variable y can be expressed in terms of one or more independent variables, and a residual, or error, term. Assumptions of linear regression models include:
- Independent observations
- Normally distributed variables, and
- Homoscedasticity, or similar variances in the dependent variable across different values or levels of the independent variable(s)
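In practice, the normality and homoscedasticity assumptions are often probed by examining the residuals of a fitted model. The short Python sketch below is a minimal illustration of such checks on made-up data; it is not part of Vesta, and the dataset, tests, and library calls are assumptions chosen for the example.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

# Fit an ordinary least squares model
X = sm.add_constant(x)            # adds the intercept column
results = sm.OLS(y, X).fit()

# Normality of the residuals (Shapiro-Wilk test)
shapiro_stat, shapiro_p = stats.shapiro(results.resid)

# Homoscedasticity (Breusch-Pagan test against the design matrix)
bp_stat, bp_p, _, _ = het_breuschpagan(results.resid, X)

print(f"Shapiro-Wilk p-value:  {shapiro_p:.3f}")
print(f"Breusch-Pagan p-value: {bp_p:.3f}")
```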
The figure below shows a dataset of N = 23 points, plotted so that y, the variable we would like to predict, is shown on the vertical axis, and a single independent variable (x) is shown on the horizontal axis. The goal of the linear regression modeling exercise is to find the linear function that provides the best prediction of y; that is, the one that produces the smallest total error, measured as the sum of the squared differences between each observed value of y and the value of y on the regression line at the same value of x.
Note that the term "linear" refers to the linear combination of parameters in the model, and that the graph that results from a linear regression model does not have to be a straight line.
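As a small illustration of this fitting step (outside Vesta), the Python sketch below fits a straight line to a made-up single-variable dataset by minimizing the sum of squared errors; the data values are hypothetical.

```python
import numpy as np

# Hypothetical (x, y) observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.9, 8.2])

# np.polyfit chooses the slope and intercept that minimize the
# sum of squared vertical distances between y and the line
slope, intercept = np.polyfit(x, y, deg=1)

predicted = intercept + slope * x
sum_sq_error = np.sum((y - predicted) ** 2)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f}*x  (SSE = {sum_sq_error:.3f})")
```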
Regression Equation
In the linear regression model, the independent variable observations are regarded as fixed, and all other variables (y and the error term) are considered random. One way of expressing a linear regression model with an unspecified number of independent variables is shown below; the x part of the βjxij component drops out when j = 0, leaving the y-intercept, β0.

yi = Σj βjxij + εi

where:
- i = 1 ... N (N = number of observations)
- j = 0 ... M (M = number of independent variables)
- yi = the dependent variable at observation i
- βj = the regression coefficients
- xij = the jth independent variable at observation i
- εi = the residual (error) term at observation i
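To make the notation concrete, the sketch below builds the design matrix with a leading column of ones (so the j = 0 term reduces to the intercept β0) and solves for the coefficients by least squares. It is an illustrative example on simulated data, not Vesta's internal code, and the coefficient values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 2                                   # N observations, M independent variables

# Hypothetical data generated from known coefficients
x = rng.normal(size=(N, M))                    # x_ij for j = 1 ... M
true_beta = np.array([1.5, -2.0, 0.75])        # beta_0, beta_1, beta_2
X = np.column_stack([np.ones(N), x])           # leading column of ones gives the intercept
y = X @ true_beta + rng.normal(scale=0.5, size=N)   # y_i = sum_j beta_j x_ij + e_i

# Least-squares estimate of the beta_j
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat                   # estimated e_i

print("estimated coefficients:", np.round(beta_hat, 3))
```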
Regression Process
- Select "Regression" in the Methods panel
- Select "Start" to open the regression dialog
- In the "Dataset" field, select the desired dataset from the Data panel you wish to visualize
- In the "Dependent Variable" field, select the variable you wish to predict
- In the "Potentially Independent variable" field, left click to highlight the variable you want to use as predictor
- Left click on the + icon to add this variable to the list of “Independent variables”
- Repeat for each variable you want to include in the regression model
- Select "Run" to run the regression analysis
- Select “Switch Method” to choose another method
- If you selected “Run”, check the Data panel to find the Regression Folder with the results (estimated mean, standard error, and report).
Regression Output
Vesta output from linear regression is saved in a Regression Folder under the Data panel and includes:
- Two new variables (estimated mean and standard error for each observation).
- A report that lists the R-squared value and the estimated value of each regression coefficient, including the intercept.
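For readers who want to see how quantities of this kind relate to a standard least-squares fit, the Python sketch below computes analogous per-observation estimated means and standard errors, together with R-squared and the coefficient estimates, using statsmodels on made-up data. It is an illustrative analogue, not Vesta's own computation.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical dataset with two independent variables
rng = np.random.default_rng(2)
x = rng.normal(size=(40, 2))
y = 1.0 + 0.8 * x[:, 0] - 0.3 * x[:, 1] + rng.normal(scale=0.4, size=40)

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

prediction = results.get_prediction(X)
estimated_mean = prediction.predicted_mean     # per-observation estimated mean
standard_error = prediction.se_mean            # per-observation standard error

print("R-squared:", round(results.rsquared, 3))
print("intercept and coefficients:", np.round(results.params, 3))
```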
Note that for datasets with both spatial (geographic) and temporal (time) components, a separate regression model is fitted for each time interval, and the corresponding regression coefficients and R-squared values are reported.
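As a rough analogue of that per-interval behaviour (an assumption for illustration, not Vesta's actual implementation), the sketch below fits a separate ordinary least squares model for each time interval of a hypothetical pandas DataFrame and prints the coefficients and R-squared for each interval.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical space-time dataset: one row per observation
rng = np.random.default_rng(3)
frame = pd.DataFrame({
    "interval": np.repeat(["t1", "t2", "t3"], 30),
    "x": rng.normal(size=90),
})
frame["y"] = 2.0 + 1.2 * frame["x"] + rng.normal(scale=0.5, size=90)

# Fit one model per time interval and report its coefficients and R-squared
for interval, group in frame.groupby("interval"):
    X = sm.add_constant(group["x"])
    results = sm.OLS(group["y"], X).fit()
    print(interval, np.round(results.params.values, 3), round(results.rsquared, 3))
```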