
Saturday, May 30, 2009

ESRI Spatial Statistics - Six tests to perform on OLS results

I've been quite busy lately - so I haven't had time to post. I hope that this review of spatial statistics will make up for my recent absenteeism!

Earlier this week, I attended a Spatial Statistics seminar hosted by ESRI. What I expected to be a veiled attempt to get users to buy more ESRI extensions was really a review of functions already readily available in ArcView. I was pleasantly surprised and felt that the seminar was worthwhile.

Dr. Lauren Scott was the speaker. She works for ESRI Redlands and has a passion for spatial statistics. It's always refreshing to hear a speaker who has a passion for what she does.

The core spatial tools that were presented were the standard deviational ellipse, hot spot analysis and regression. Most of the seminar focused on spatial regression, including ordinary least squares (OLS) and geographically weighted regression (GWR).

Dr. Scott went over six quick checks for deciding whether your OLS model is properly specified; that is, whether the model explains the dependent variable as effectively as possible. Once you run OLS on a variable, these are the six things to examine.

a) Coefficients have the expected sign. If you expect a positive relationship between an explanatory variable and the dependent variable, the coefficient for that explanatory variable should be positive in the final results table. If it is not, the results should be checked.

b) All explanatory variables are statistically significant (check both the probability and the robust probability). Variables that are not statistically significant should be removed from the analysis because they are not helping to explain the dependent variable. Also check the Koenker (BP) test: when it is statistically significant, the relationships being modeled are not consistent across the study area, so rely on the robust probabilities. A significant result also suggests that the data are a good candidate for geographically weighted regression.

c) The variance inflation factor (VIF) for each explanatory variable should be below 7.5. A VIF above 7.5 means that some of the explanatory variables are redundant, so the same information may be double counted in the analysis. The lower the VIF, the better.

d) The adjusted R2 should be high (the closer to one, the better) and the AIC should be low. The AIC lets you compare multiple models that share the same dependent variable: if you have two models for the same dependent variable, the one with the lower AIC is the better model.

e) The Jarque-Bera test should NOT be statistically significant. If it is statistically significant, the residuals are not normally distributed, which often means that a key explanatory variable is missing from the analysis.

f) The residuals from the regression should be randomly distributed in space. The more clustered the residuals, the poorer the model. Clustered residuals point to explanatory variables that are missing from the analysis.
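
For readers who want to see what a few of these checks actually compute, here is a minimal Python sketch. It is not ESRI's implementation and was not shown at the seminar. The function names are made up for illustration, the VIF shortcut only holds for a model with exactly two explanatory variables, the Jarque-Bera p-value uses the large-sample chi-square approximation with 2 degrees of freedom, and the Moran's I example assumes a simple binary neighbour matrix.

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_variables(x1, x2):
    """VIF for a model with exactly two explanatory variables: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((e - m) ** 2 for e in residuals) / n
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

def jarque_bera_p(jb):
    """Approximate p-value: chi-square survival function with 2 df is exp(-x/2)."""
    return math.exp(-jb / 2)

def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Hypothetical residuals at six locations along a line;
# each location's neighbours are the locations directly beside it.
n = 6
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
clustered = [1, 1, 1, -1, -1, -1]
alternating = [1, -1, 1, -1, 1, -1]
print(morans_i(clustered, w))
print(morans_i(alternating, w))
```

In this toy example the clustered residuals give a Moran's I of about 0.6 and the alternating pattern gives about -1. For check f) you are hoping for a value close to zero, which suggests no spatial pattern in the residuals. A real analysis would also need a significance test for Moran's I, which this sketch leaves out.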