This term, I’ve been sitting in on Rene Carmona’s course on Modern Regression and Time Series Analysis. Much of the material on regression covered in the course was familiar to me already, but I’ve never felt that I had a real command of times series analysis methods.

When Carmona defined the AR(p) model in class a few weeks ago, it struck me that, though I’d seen the defining equation several times before, I’d never realized earlier that the AR(p) model subsumes all possible linear recurrence relations. Also, the AR(p) model has the nice property that, if you already know the correct value of p, fitting the AR(p) model can be done with an ordinary least squares regression.

With these observations in mind, I decided to see how well I could derive the formulas for simple recurrence relations from a small data set. The results I got on my 2.4 GHz Intel Core 2 Duo MacBook are a useful case study in the dangers of naively using the default methods for fitting AR(p) models, as well as a particularly clear example of the inevitable inaccuracies in floating point arithmetic.

I hope John D. Cook will forgive me for using the Fibonacci sequence as my example. While I totally agree with John that the Fibonacci sequence is not the ideal object to study if you’re interested in day-to-day programming tasks, its simplicity makes it perfect for understanding how recurrence relations work.

The workhouse for fitting an AR(p) model in R is, predictably, the `ar`

function. To see how well it would work for my purposes, I stored the first 15 Fibonacci terms in a vector and ran `ar`

using all of its defaults settings. Here’s the results:

1 2 3 4 5 6 7 8 9 10 11 12 | fibs <- c(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610) ar(fibs) #Call: #ar(x = fibs) # #Coefficients: # 1 #0.5922 # #Order selected 1 sigma^2 estimated as 21590 |

These results are pretty terrible: the order for the model is chosen to be 1, which is clearly wrong. Given the wrong order, it’s no surprise that the estimated coefficient is off, though it’s strange that the result is so far off from the ideal coefficient for an order 1 model, which is `(1 + sqrt(5)) / 2`

, or 1.618034. Thankfully, you can force `ar`

to use the order you want by overriding some of the defaults:

1 2 3 4 5 6 7 8 9 10 | ar(fibs, order.max = 2, aic = FALSE) #Call: #ar(x = fibs, aic = FALSE, order.max = 2) # #Coefficients: # 1 2 # 0.6108 -0.0315 # #Order selected 2 sigma^2 estimated as 23366 |

You choose your preferred order using `order.max`

, but this will only be an upper bound if you allow the function to use AIC scores to determine the order of the AR(p) model.

To figure out what was going wrong, I decided to use `lm`

instead of `ar`

. To do that, I needed subsets of my input data:

1 2 3 | fibs.1 <- fibs[1:(length(fibs) - 2)] fibs.2 <- fibs[2:(length(fibs) - 1)] fibs.3 <- fibs[3:length(fibs)] |

Given these subset inputs, the call to `lm`

is simple:

1 2 3 4 5 6 7 8 | lm(fibs.3 ~ fibs.1 + fibs.2) #Call: #lm(formula = fibs.3 ~ fibs.1 + fibs.2) # #Coefficients: #(Intercept) fibs.1 fibs.2 # 2.365e-14 1.000e+00 1.000e+00 |

As you can see, `lm`

gets the right results, more or less. The non-zero intercept value is unfortunate, but suggests how easily floating point errors slip into these calculations.

Having gotten good results with `lm`

, I decided to review `ar`

a bit more: this led me to the conclusion that I should try using the `method = 'ols'`

setting instead of `method = 'yule-walker'`

.

1 2 3 4 5 6 7 8 9 10 11 12 | ar(fibs, order.max = 2, method = 'ols') #Call: #ar(x = fibs, order.max = 2, method = "ols") # #Coefficients: #1 2 #1 1 # #Intercept: 106.4 (6.715e-07) # #Order selected 2 sigma^2 estimated as 6.276e-17 |

This clearly works, though I find the output line about the intercept term confusing. I have to say that I’m a little surprised that the Yule-Walker method gives such bad results in this example: I’m not sure yet whether this is caused by the small sample size, a data set that can be fit without any error, intrinsic problems with the method, or something else I can’t even conceive of.

Knowing that `ar`

could work if the OLS method was enforced, I decided to try letting the AIC have its way again after reintroducing this method level constraint:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | ar(fibs, method = 'ols') #Call: #ar(x = fibs, method = "ols") # #Coefficients: #1 2 #1 1 # #Intercept: 106.4 (6.715e-07) # #Order selected 2 sigma^2 estimated as 6.276e-17 #Warning message: #In ar.ols(x, aic = aic, order.max = order.max, na.action = na.action, : # model order: 3singularities in the computation of the projection # matrixresults are only valid up to model order2 |

As you can see, this also works, though there’s an error that I assume is a result of having input data that can be perfectly fit. In short, I think the take away lesson is that you can easily find the formula for recurrence relations using `ar`

as long as you make sure you use ordinary least squares for fitting the various possible models.

Just to confirm that the OLS method would also find the ideal coefficient if forced to use an order 1 model, I ran `ar`

one last time:

1 2 3 4 5 6 7 8 9 10 11 12 | ar(fibs, order.max = 1, method = 'ols') #Call: #ar(x = fibs, order.max = 1, method = "ols") # #Coefficients: # 1 #1.6182 # #Intercept: 65.74 (0.05852) # #Order selected 1 sigma^2 estimated as 0.04309 |

That result is satisfying and further confirms that the problems I had at the start are entirely attributable to using the Yule-Walker method with this data set.

[EDIT 12.11.2009: Replaced unfortunate term ‘sparse’ with non-overloaded word ‘small’.]

I forgive you. :-)

You’re putting the Fibonacci sequence to good use.

The problem is that the recurrence relation is non-stationary, but the yw method assumes stationarity. So the coefficients can’t take the correct values. But using lm, you have no stationarity condition imposed and the coefficients are to take the correct values.

The intercept issue arises because ar() fits the model to (y-m) where m is the mean of the data. As the mean of fibs is 106.4, that is added back in as the intercept when method ols is used. See the help file on ar().

Thanks for such a complete explanation, Rob. I had foolishly assumed that the sequence would be demeaned and then manipulated to induce stationarity as preprocessing steps.

Now, what are the odds that this post was a project for one of my tutorees last year?

The intercept is a bit tricky in R. Other issues to pay attention when fitting time series model can be found here,

> http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm