/* Introduction to Time Series Analysis, DG ECFIN
   Laura Mayoral
   November 2021 */

/******************Regression with autocorrelated errors**************
We're going to use the "Orange juice dataset", see Stock and Watson, Chapter 15.
You can find this data in Stata format here: http://fmwww.bc.edu/ec-p/data/stockwatson/

*Goal: Analyze the effect of the number of freezing days in Florida on the price of orange juice
***********************************************************************/

*Replace by your own path
cd "/Users/lauramayoral/Dropbox/docum_dropbox/clases/ECFIN_time series/STATA/Lecture 2"

*Load the data
use oj.dta, clear

*Declare the data to be a time series. The data are monthly
tsset date, monthly

/*Three variables:
finished: overall price index;
frozen:   price index of orange juice;
frzdys:   number of freezing days in Florida, each month

The orange juice price data are the frozen orange juice component of the
processed foods and feeds group of the producer price index (PPI), collected
by the U.S. Bureau of Labor Statistics (BLS series wpu02420301). The orange
juice price series was divided by the overall PPI for finished goods to adjust
for general price inflation. The freezing degree days series was constructed
from daily minimum temperatures recorded at Orlando area airports, obtained
from the National Oceanic and Atmospheric Administration (NOAA) of the
U.S. Department of Commerce.
*/

*Generate the adjusted price index
gen price=frozen/finished

*Step 1: plot the data. ALL VARIABLES in the regression should be stationary!
tsline price
ac price
pac price
*The price series doesn't look stationary.
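
*A possible formal check (a sketch, not part of the original lecture code):
*the augmented Dickey-Fuller test, whose null hypothesis is that the series
*has a unit root. The choice of 12 lags is an assumption made here for
*monthly data; other lag lengths may be more appropriate.
dfuller price, lags(12)
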
*Let's compute the percentage change in price over that month:
*%change price=100*(1-L)log(price)
gen change_price=100*(log(price)-log(l.price))

*Plot the variable and its autocorrelation and partial autocorrelation functions
tsline change_price
ac change_price
pac change_price

*Now, we plot the other variable: looks fine
tsline frzdys
ac frzdys
pac frzdys

*Simple regression (default OLS standard errors)
reg change_price frzdys
/*
      Source |       SS           df       MS      Number of obs   =       641
-------------+----------------------------------   F(1, 639)       =     59.17
       Model |  1366.56524         1  1366.56524   Prob > F        =    0.0000
    Residual |  14758.5978       639  23.0963971   R-squared       =    0.0847
-------------+----------------------------------   Adj R-squared   =    0.0833
       Total |   16125.163       640  25.1955672   Root MSE        =    4.8059

------------------------------------------------------------------------------
change_price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      frzdys |   .4434407   .0576491     7.69   0.000     .3302361    .5566453
       _cons |  -.3996379   .1931161    -2.07   0.039    -.7788568    -.020419
------------------------------------------------------------------------------
*/

*Same regression, now with standard errors robust to heteroskedasticity
reg change_price frzdys, r
/*
Linear regression                               Number of obs     =        641
                                                F(1, 639)         =      11.30
                                                Prob > F          =     0.0008
                                                R-squared         =     0.0847
                                                Root MSE          =     4.8059

------------------------------------------------------------------------------
             |               Robust
change_price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      frzdys |   .4434407   .1319075     3.36   0.001     .1844162    .7024652
       _cons |  -.3996379   .1841249    -2.17   0.030    -.7612009   -.0380748
------------------------------------------------------------------------------

**Interpretation: an additional freezing degree day increases the price of
orange juice concentrate over that month by 0.44%
*/

*Is this enough?
*Take a look at the residuals
predict e, res
*The residuals look autocorrelated!
ac e

*We can test for this: Ljung-Box Q test
wntestq e
/*H0: the first 40 autocorrelations of e are jointly zero (e is white noise).
We reject the hypothesis that e is white noise:

Portmanteau test for white noise
---------------------------------------
 Portmanteau (Q) statistic =    64.9873
 Prob > chi2(40)           =     0.0075
*/

*Implications:
*1. OLS is still consistent (provided frzdys is exogenous!)
*2. The standard errors in the previous table are incorrect.

*Newey-West Standard errors**************
newey change_price frzdys, lag(14)
*NOTE: these standard errors are robust to autocorrelation AND heteroskedasticity
/*
------------------------------------------------------------------------------
             |             Newey–West
change_price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      frzdys |   .4434407   .1323339     3.35   0.001     .1835789    .7033026
       _cons |  -.3996379    .194574    -2.05   0.040    -.7817195   -.0175562
------------------------------------------------------------------------------
*/

**What if we include lags in the regression?
newey change_price frzdys l.frzdys, lag(14)
/*
Regression with Newey–West standard errors      Number of obs     =        641
Maximum lag = 14                                F(  2,       638) =      10.86
                                                Prob > F          =     0.0000

------------------------------------------------------------------------------
             |             Newey–West
change_price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      frzdys |
         --. |   .4399711   .1332496     3.30   0.001     .1783103    .7016319
         L1. |   .1389032   .0799698     1.74   0.083    -.0181327     .295939
             |
       _cons |  -.4830954     .19776    -2.44   0.015    -.8714346   -.0947562
------------------------------------------------------------------------------
*/

*We still need HAC standard errors:
predict e2, res
wntestq e2
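
*A possible follow-up (a sketch, not part of the original lecture code):
*inspect the autocorrelations of the new residuals and check how sensitive
*the Newey-West standard errors are to the truncation lag.
*The lag values 7, 14 and 21 are illustrative assumptions, not recommendations.
ac e2
foreach m of numlist 7 14 21 {
    newey change_price frzdys l.frzdys, lag(`m')
}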