Exemplo de dados da produção mensal média por vaca nos EUA, 01/1994 - 12/2005.
data(milk, package='TSA')
par(mfrow=c(1,1), mar=c(3,3,3,1), mgp=c(1.6,.6,0), pch=19)
plot(milk,xlab='Anos',main='Produção mensal média por vaca nos EUA entre 1994 e 2005',
ylab='Mililitro (mL)',type='l')
points(milk, cex = 0.5)
grid()
par(mfrow=c(1,1), mar=c(4,4,3,1), mgp=c(1.6,.6,0), pch=19)
plot(acf(c(milk),plot=F)[1:22],xlab='Defasagem',main=' ',ylab='Autocorrelação')
grid()
Observamos a evidente interferencia da tendência crescente desta série.
par(mfrow=c(1,1), mar=c(3,3,3,1), mgp=c(1.6,.6,0), pch=19)
plot(milk,xlab='Anos',main='Produção mensal média por vaca nos EUA entre 1994 e 2005',
ylab='Mililitro (mL)',type='l',ylim=c(1200,1750))
points(milk, cex = 0.5)
grid()
tempo=time(milk)
tendencia=ts(fitted(loess(milk~tempo)), start=c(1994,1),frequency=12)
lines(tendencia, col='turquoise', lwd=2)
leite=milk-tendencia
par(mfrow=c(1,1), mar=c(3,3,3,1), mgp=c(1.6,.6,0), pch=19)
plot(leite,xlab='Anos',main='Produção ajustada mensal média por vaca nos EUA entre 1994 e 2005',
ylab='Observado-TendĂŞncia (mL)',type='l')
points(leite, cex = 0.5)
abline(h = mean(leite), col = "red", lwd = 4)
grid()
par(mfrow=c(1,1), mar=c(3,3,3,1), mgp=c(1.6,.6,0), pch=19)
plot(acf(c(leite),plot=F)[1:22],xlab='Defasagem',main=' ',ylab='Autocorrelação')
grid()
De uma outra maneira.
tempo = time(milk)
modelo1 = lm(milk ~ tempo)
seg.milk = segmented::segmented(milk, seg.Z = ~ tempo, psi = c(1997,2001,2005))
summary(seg.milk)
##
## ***Regression Model with Segmented Relationship(s)***
##
## Call:
## segmented.numeric(obj = obj, seg.Z = seg.Z, psi = psi, npsi = npsi,
## fixed.psi = fixed.psi, control = control, model = model,
## keep.class = FALSE)
##
## Estimated Break-Point(s):
## Est. St.Err
## psi1.tempo 1997.833 1.372
## psi2.tempo 2000.025 1.061
## psi3.tempo 2004.750 0.902
##
## Coefficients of the linear terms:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41009.269 15578.844 -2.632 0.00946 **
## tempo 21.247 7.806 2.722 0.00734 **
## U1.tempo 20.479 19.039 1.076 NA
## U2.tempo -25.649 18.264 -1.404 NA
## U3.tempo 40.403 46.939 0.861 NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.57 on 136 degrees of freedom
## Multiple R-Squared: 0.7054, Adjusted R-squared: 0.6902
##
## Boot restarting based on 10 samples. Last fit:
## Convergence attained in 3 iterations (rel. change 2.458e-06)
segmented::slope(seg.milk)
## $tempo
## Est. St.Err. t value CI(95%).l CI(95%).u
## slope1 21.247 7.8055 2.7220 5.8108 36.683
## slope2 41.725 17.3660 2.4028 7.3840 76.067
## slope3 16.077 5.6584 2.8412 4.8870 27.266
## slope4 56.479 46.5970 1.2121 -35.6680 148.630
segmented::intercept(seg.milk)
## $tempo
## Est.
## intercept1 -41009
## intercept2 -81922
## intercept3 -30624
## intercept4 -111620
par(mar=c(3,3,1,1)+.5, mgp=c(1.6,.6,0))
segmented::plot.segmented(seg.milk, lwd=4, xlab = "", col = "red",ylim=c(1200,1750))
points(milk, cex = 0.5)
lines(milk)
grid()
Retirando a tendência paramétrica:
leite = milk - seg.milk$fitted.values
forecast::auto.arima(leite)
## Series: leite
## ARIMA(1,0,2)(2,1,2)[12]
##
## Coefficients:
## ar1 ma1 ma2 sar1 sar2 sma1 sma2
## 0.5612 0.1324 0.2841 0.0821 -0.3762 -1.1067 0.5658
## s.e. 0.1510 0.1640 0.1344 0.1929 0.1231 0.2181 0.1999
##
## sigma^2 = 119.5: log likelihood = -510.14
## AIC=1036.27 AICc=1037.44 BIC=1059.33
A função auto.arima() do pacote forecast no
R automatiza a seleção de parâmetros para modelos ARIMA (SARIMA),
escolhendo o melhor modelo baseado no critério AIC, AICc ou BIC. Ela
seleciona automaticamente a diferenciação e ajusta para sazonalidade,
sendo ideal para modelagem rápida e previsões de séries temporais
univariadas.
Escolhido automaticamente o modelo vamos verificar a qualidade do ajuste.
astsa::sarima(leite,1,0,2,2,1,2,12)
## initial value 3.126022
## iter 2 value 2.809170
## iter 3 value 2.538438
## iter 4 value 2.477215
## iter 5 value 2.467607
## iter 6 value 2.453530
## iter 7 value 2.451350
## iter 8 value 2.450344
## iter 9 value 2.449855
## iter 10 value 2.449791
## iter 11 value 2.449455
## iter 12 value 2.449014
## iter 13 value 2.448766
## iter 14 value 2.448526
## iter 15 value 2.448497
## iter 16 value 2.448390
## iter 17 value 2.448335
## iter 18 value 2.448307
## iter 19 value 2.448305
## iter 20 value 2.448305
## iter 21 value 2.448305
## iter 22 value 2.448304
## iter 23 value 2.448304
## iter 23 value 2.448304
## iter 23 value 2.448304
## final value 2.448304
## converged
## initial value 2.480765
## iter 2 value 2.476010
## iter 3 value 2.467003
## iter 4 value 2.463668
## iter 5 value 2.459197
## iter 6 value 2.452679
## iter 7 value 2.446282
## iter 8 value 2.444528
## iter 9 value 2.444438
## iter 10 value 2.444155
## iter 11 value 2.444145
## iter 12 value 2.444143
## iter 13 value 2.444142
## iter 14 value 2.444140
## iter 15 value 2.444139
## iter 16 value 2.444139
## iter 16 value 2.444139
## iter 16 value 2.444139
## final value 2.444139
## converged
## <><><><><><><><><><><><><><>
##
## Coefficients:
## Estimate SE t.value p.value
## ar1 0.5578 0.1521 3.6677 0.0004
## ma1 0.1320 0.1648 0.8010 0.4247
## ma2 0.2820 0.1348 2.0924 0.0384
## sar1 0.0750 0.1904 0.3940 0.6943
## sar2 -0.3821 0.1226 -3.1159 0.0023
## sma1 -1.0962 0.2138 -5.1282 0.0000
## sma2 0.5638 0.1998 2.8215 0.0056
## constant 0.0597 0.0935 0.6377 0.5248
##
## sigma^2 estimated as 112.8529 on 124 degrees of freedom
##
## AIC = 7.862518 AICc = 7.871387 BIC = 8.059073
##
Uma outra forma de veriicarmos os resĂduos Ă© observando a função original:
modelo = arima(leite, order = c(1,0,2), seasonal = list(order = c(2,1,2), period = 12))
par(mar=c(3,3,1,1)+.5, mgp=c(1.6,.6,0))
tsdiag(modelo)
prev.milk = astsa::sarima.for(leite, n.ahead = 12, p = 1, d = 0, q = 2, P = 2, D = 1, Q = 2,
S = 12, no.constant = TRUE, pch =19)
grid()
Nesta situação consideramos que o comportamento futuro deve permanecer similar ao observado até o momento, sendo assim:
prev.milk.org = ts(c(leite,prev.milk$pred), start = c(1994,1), frequency = 12)
novo.tempo = time(prev.milk.org)
tendencia = c(segmented::intercept(seg.milk)$tempo[1]
+ segmented::slope(seg.milk)$tempo[1]*novo.tempo[novo.tempo < seg.milk$psi[1,2]],
segmented::intercept(seg.milk)$tempo[2]
+ segmented::slope(seg.milk)$tempo[2]*novo.tempo[seg.milk$psi[1,2] < novo.tempo
& novo.tempo < seg.milk$psi[2,2]],
segmented::intercept(seg.milk)$tempo[3]
+ segmented::slope(seg.milk)$tempo[3]*novo.tempo[seg.milk$psi[2,2] < novo.tempo
& novo.tempo < seg.milk$psi[3,2]],
segmented::intercept(seg.milk)$tempo[4]
+ segmented::slope(seg.milk)$tempo[4]*novo.tempo[seg.milk$psi[3,2] < novo.tempo])
tendencia.prev = ts(tendencia, start = c(1994,1), frequency = 12)
milk.prev = ts(c(milk,prev.milk$pred+tendencia.prev[novo.tempo>=2006]), start = c(1994,1), frequency = 12)
par(mfrow=c(1,1), mar=c(3,3,3,2)+.5, mgp=c(1.6,.6,0))
plot(milk.prev,
main = "Produção ajustada mensal média por vaca nos EUA entre 1994 e 2005 \n valores previstos para 2006",
ylab = "Mililitro (mL)", xlab = "", type = "l")
lines(tendencia.prev, col ="red")
U = milk.prev[novo.tempo>=2006]+qnorm(0.975)*prev.milk$se
L = milk.prev[novo.tempo>=2006]+qnorm(0.025)*prev.milk$se
xx = c(time(U), rev(time(U))); yy = c(L, rev(U))
polygon(xx, yy, border = 8, col = gray(.8, alpha = .2))
points(milk.prev, cex = 0.5, pch=19)
grid()
Valores preditos:
milk.prev[novo.tempo>=2006]
## [1] 1707.733 1590.783 1766.701 1733.380 1789.173 1710.839 1715.028 1697.718
## [9] 1636.824 1686.118 1645.335 1723.194
Nesta situação consideramos que a tendência deve permanecer similar ao ano de 2005, sendo assim:
prev.milk.org = ts(c(leite,prev.milk$pred), start = c(1994,1), frequency = 12)
novo.tempo = time(prev.milk.org)
tendencia = c(segmented::intercept(seg.milk)$tempo[1]
+ segmented::slope(seg.milk)$tempo[1]*novo.tempo[novo.tempo < seg.milk$psi[1,2]],
segmented::intercept(seg.milk)$tempo[2]
+ segmented::slope(seg.milk)$tempo[2]*novo.tempo[seg.milk$psi[1,2] < novo.tempo
& novo.tempo < seg.milk$psi[2,2]],
segmented::intercept(seg.milk)$tempo[3]
+ segmented::slope(seg.milk)$tempo[3]*novo.tempo[seg.milk$psi[2,2] < novo.tempo
& novo.tempo < seg.milk$psi[3,2]],
segmented::intercept(seg.milk)$tempo[4]
+ segmented::slope(seg.milk)$tempo[4]*novo.tempo[seg.milk$psi[3,2] < novo.tempo
& novo.tempo < 2006],
rep(segmented::intercept(seg.milk)$tempo[4]
+ segmented::slope(seg.milk)$tempo[4]*novo.tempo[novo.tempo == 2006],
length(novo.tempo>=2006)))
tendencia.prev = ts(tendencia, start = c(1994,1), frequency = 12)
milk.prev = ts(c(milk,prev.milk$pred+tendencia.prev[novo.tempo>=2006]), start = c(1994,1), frequency = 12)
par(mfrow=c(1,1), mar=c(3,3,3,2)+.5, mgp=c(1.6,.6,0))
plot(milk.prev,
main = "Produção ajustada mensal média por vaca nos EUA entre 1994 e 2005 \n valores previstos para 2006",
ylab = "Mililitro (mL)", xlab = "", type = "l")
lines(tendencia.prev, col ="red")
U = milk.prev[novo.tempo>=2006]+qnorm(0.975)*prev.milk$se
L = milk.prev[novo.tempo>=2006]+qnorm(0.025)*prev.milk$se
xx = c(time(U), rev(time(U))); yy = c(L, rev(U))
polygon(xx, yy, border = 8, col = gray(.8, alpha = .2))
points(milk.prev, cex = 0.5, pch=19)
grid()
Valores preditos:
milk.prev[novo.tempo>=2006]
## [1] 1707.733 1586.076 1757.288 1719.260 1770.347 1687.306 1686.788 1664.772
## [9] 1599.171 1643.758 1598.269 1671.422