[tech-spec] multicollinearity

  • From: tim hesselsweet <tim_hess1@xxxxxxxxx>
  • To: tech-spec@xxxxxxxxxxxxx
  • Date: Thu, 24 Feb 2005 21:56:39 -0800 (PST)

pairwise plots of variables are good for assessing the
relationship between the response and each potential
predictor, as well as pairwise correlations among
predictors.  but only expert chart readers (sogi, tufte)
can spot multivariate collinearity from them.  splus
computes everything you need to assess collinearity
problems, but there's no single function you can call.
these two approaches fit the bill:

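VIF (variance inflation factor) regresses each
explanatory variable on the other explanatory variables
and computes 1/(1 - R^2) from each fit.  VIF values > 4
indicate a problem.  the argument to the function is the
data frame of predictors.  the version below is written
for two predictors (minh and minat); for additional
variables you would have to add a regression for each
one.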

> InflVIF <- function(dataf) {
+ # regress each predictor on the other; VIF = 1/(1 - R^2)
+ fit1 <- lm(minh ~ minat, data = dataf)
+ sum1 <- summary(fit1)
+ r1 <- sum1$r.squared
+ vif1 <- 1/(1-r1)
+ fit2 <- lm(minat ~ minh, data = dataf)
+ sum2 <- summary(fit2)
+ r2 <- sum2$r.squared
+ vif2 <- 1/(1-r2)
+ # return the result instead of printing it, so it shows once
+ vif <- data.frame(vif1,vif2)
+ return(vif)
+ }

> InflVIF(q1.df)
     vif1    vif2 
1 1.03781 1.03781
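
for more than two predictors, the same idea can be
looped over the columns.  the following is only a sketch,
written and checked in R syntax rather than splus;
vif.all and the simulated columns x1, x2, x3 are made-up
names for illustration, not part of the original code.
it assumes every column of the data frame is a numeric
predictor with a plain (syntactic) name.

```r
# hypothetical helper: VIF for every column of a data frame of predictors
vif.all <- function(dataf) {
  nm <- names(dataf)
  out <- numeric(length(nm))
  names(out) <- nm
  for (v in nm) {
    # regress predictor v on all the remaining predictors
    others <- nm[nm != v]
    f <- as.formula(paste(v, "~", paste(others, collapse = " + ")))
    r2 <- summary(lm(f, data = dataf))$r.squared
    out[v] <- 1 / (1 - r2)
  }
  out
}

# simulated example: x3 is nearly the sum of x1 and x2
set.seed(1)
demo.df <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
demo.df$x3 <- demo.df$x1 + demo.df$x2 + rnorm(50, sd = 0.1)
vif.all(demo.df)
```

as a cross-check, the same values should appear on the
diagonal of the inverse correlation matrix,
diag(solve(cor(demo.df))).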

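condition number greater than 15 indicates a problem and
greater than 30 a big problem.  the argument to the
function is the set of predictors.  note that these
cutoffs are usually quoted for the square root of the
ratio of the largest to the smallest eigenvalue of the
correlation matrix, while the function below returns the
ratio itself, so take sqrt() of the result before
comparing.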

> InflConditionNum <- function(dataf) {
+ # eigenvalues of the predictor correlation matrix
+ r <- cor(dataf)
+ eigenx <- eigen(r)
+ eigenval <- eigenx$values
+ lambdamax <- max(eigenval)
+ lambdamin <- min(eigenval)
+ # ratio of largest to smallest eigenvalue
+ conditionnum <- lambdamax/lambdamin
+ return(conditionnum)
+ }

> new.df <- data.frame(minh,minat)
> InflConditionNum(new.df)
[1] 1.471798
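
a sketch of the square-root form of the condition number,
which is the scale the 15 and 30 cutoffs are commonly
stated on.  cond.num is a made-up name, and the code is
written in R syntax (the only.values argument to eigen
is assumed from R, not checked in splus).  the second
half uses the ratio reported above to back out the
correlation between the two predictors and the VIF it
implies.

```r
# hypothetical helper: sqrt(lambda_max / lambda_min) of the correlation matrix
cond.num <- function(dataf) {
  ev <- eigen(cor(dataf), symmetric = TRUE, only.values = TRUE)$values
  sqrt(max(ev) / min(ev))
}

# cross-check for the two-predictor case, using the ratio printed above
k <- 1.471798
sqrt(k)                      # value to compare against 15 and 30
r <- (k - 1) / (k + 1)       # eigenvalues are 1 + |r| and 1 - |r|, so k = (1 + |r|)/(1 - |r|)
vif.check <- 1 / (1 - r^2)   # VIF implied by that correlation
vif.check
```

vif.check comes out at about 1.0378, in line with the VIF
output earlier, so the two diagnostics agree for this
pair of predictors.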

tim