R Language for Mixed Frequency Data Analysis--Example Analysis

Contents

Example:

R Code Implementation

Loading packages

Generate the simulated random data

Weight Assignment: Exponential Almon Polynomial Constraints

Low-Frequency Series Simulation (e.g., Yearly)

MIDAS Regression Example: Monthly and Quarterly Data at a Common Frequency

Linear model based on least squares

Unconstrained Mixed-Frequency Regression

Nonlinear Estimation Based on midas_r

Convergence Test

Other Weighting Forms

Adequacy Test of the Constraints

Optimal Model Selection

Select model manually

Example:

Based on the functions and data requirements described above, a mixed-frequency (MIDAS) regression example is implemented in R. The code is as follows:

R Code Implementation

Loading packages

library(midasr)

Generate the simulated random data

set.seed(1001)      # set the random number seed
n <- 250            # low-frequency sample size
trend <- c(1:n)     # linear trend
x <- rnorm(4 * n)   # high-frequency explanatory variable, e.g. quarterly data
z <- rnorm(12 * n)  # high-frequency explanatory variable, e.g. monthly data

Weight Assignment: Exponential Almon Polynomial Constraints

Common forms of weighting functions:

# Weight assignment: exponential Almon lag polynomial (nealmon)
fn_x <- nealmon(p = c(1, -0.5), d = 8)
fn_z <- nealmon(p = c(2, 0.5, -0.1), d = 17)
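
For intuition, these weights can be reproduced by hand. The sketch below assumes the standard normalized exponential Almon form w_i = p1 * exp(p2*i + ... + pk*i^(k-1)) / sum_j exp(p2*j + ... + pk*j^(k-1)), i = 1, ..., d; it is written from that formula rather than taken from the package source, so treat the equality check as the verification. nealmon_sketch is a hypothetical helper name.

# Hand-rolled exponential Almon weights (sketch under the assumed form)
nealmon_sketch <- function(p, d) {
  i <- 1:d
  expo <- sapply(i, function(j) sum(p[-1] * j^(1:(length(p) - 1))))
  p[1] * exp(expo) / sum(exp(expo))
}
all.equal(nealmon_sketch(c(1, -0.5), 8), as.numeric(fn_x))  # TRUE if the assumed form matches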

Low-Frequency Series Simulation (e.g., Yearly)

y <- 2 + 0.1 * trend + mls(x, 0:7, 4) %*% fn_x + mls(z, 0:16, 12) %*% fn_z + rnorm(n)
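
The frequency alignment performed by mls can be seen on a toy series: mls(x, k, m) stacks, for each low-frequency period, the high-frequency values at lags k counted back from that period's last high-frequency observation (lags reaching before the sample start come out as NA). A small illustration:

xx <- 1:12               # 4 low-frequency periods of a frequency-3 toy series
mls(xx, k = 0:2, m = 3)  # row t should read (xx[3t], xx[3t-1], xx[3t-2])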

Plotting the weight coefficients

plot(fn_z, col = "red")
points(fn_x, col = "blue")

MIDAS Regression Example: Monthly and Quarterly Data at a Common Frequency

Linear model based on least squares

eq_u1 <- lm(y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16, m = 12))
summary(eq_u1)

eq_u1 results:

Call:
lm(formula = y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16, 
    m = 12))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2651 -0.6489  0.1073  0.6780  2.7707 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     1.9694327  0.1261210  15.615  < 2e-16 ***
trend                           0.1000072  0.0008769 114.047  < 2e-16 ***
mls(x, k = 0:7, m = 4)X.0/m     0.5268124  0.0643322   8.189 2.07e-14 ***
mls(x, k = 0:7, m = 4)X.1/m     0.3782006  0.0641497   5.896 1.38e-08 ***
mls(x, k = 0:7, m = 4)X.2/m     0.1879689  0.0680465   2.762 0.006219 ** 
mls(x, k = 0:7, m = 4)X.3/m    -0.0052409  0.0658730  -0.080 0.936658    
mls(x, k = 0:7, m = 4)X.4/m     0.1504419  0.0627623   2.397 0.017358 *  
mls(x, k = 0:7, m = 4)X.5/m     0.0104345  0.0655386   0.159 0.873647    
mls(x, k = 0:7, m = 4)X.6/m     0.0698753  0.0692803   1.009 0.314270    
mls(x, k = 0:7, m = 4)X.7/m     0.1463317  0.0650285   2.250 0.025412 *  
mls(z, k = 0:16, m = 12)X.0/m   0.3671055  0.0618960   5.931 1.14e-08 ***
mls(z, k = 0:16, m = 12)X.1/m   0.3502401  0.0599615   5.841 1.83e-08 ***
mls(z, k = 0:16, m = 12)X.2/m   0.4514656  0.0659546   6.845 7.37e-11 ***
mls(z, k = 0:16, m = 12)X.3/m   0.3733747  0.0577702   6.463 6.42e-10 ***
mls(z, k = 0:16, m = 12)X.4/m   0.3609667  0.0700954   5.150 5.75e-07 ***
mls(z, k = 0:16, m = 12)X.5/m   0.2155748  0.0632036   3.411 0.000769 ***
mls(z, k = 0:16, m = 12)X.6/m   0.0648163  0.0630194   1.029 0.304828    
mls(z, k = 0:16, m = 12)X.7/m   0.0665581  0.0673284   0.989 0.323955    
mls(z, k = 0:16, m = 12)X.8/m  -0.0014853  0.0624569  -0.024 0.981048    
mls(z, k = 0:16, m = 12)X.9/m   0.0466486  0.0675116   0.691 0.490305    
mls(z, k = 0:16, m = 12)X.10/m  0.0384882  0.0664136   0.580 0.562824    
mls(z, k = 0:16, m = 12)X.11/m -0.0077722  0.0591409  -0.131 0.895564    
mls(z, k = 0:16, m = 12)X.12/m -0.0283221  0.0620632  -0.456 0.648589    
mls(z, k = 0:16, m = 12)X.13/m -0.0375062  0.0608348  -0.617 0.538179    
mls(z, k = 0:16, m = 12)X.14/m  0.0297271  0.0652273   0.456 0.649018    
mls(z, k = 0:16, m = 12)X.15/m  0.0184075  0.0588059   0.313 0.754558    
mls(z, k = 0:16, m = 12)X.16/m -0.0546460  0.0677214  -0.807 0.420574    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9383 on 222 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9855,	Adjusted R-squared:  0.9838 
F-statistic: 579.3 on 26 and 222 DF,  p-value: < 2.2e-16

Unconstrained Mixed-Frequency Regression

eq_u2 <- midas_u(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12))
summary(eq_u2)

eq_u2 results:

Call:
lm(formula = y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12), data = ee)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2651 -0.6489  0.1073  0.6780  2.7707 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)             1.9694327  0.1261210  15.615  < 2e-16 ***
trend                   0.1000072  0.0008769 114.047  < 2e-16 ***
mls(x, 0:7, 4)X.0/m     0.5268124  0.0643322   8.189 2.07e-14 ***
mls(x, 0:7, 4)X.1/m     0.3782006  0.0641497   5.896 1.38e-08 ***
mls(x, 0:7, 4)X.2/m     0.1879689  0.0680465   2.762 0.006219 ** 
mls(x, 0:7, 4)X.3/m    -0.0052409  0.0658730  -0.080 0.936658    
mls(x, 0:7, 4)X.4/m     0.1504419  0.0627623   2.397 0.017358 *  
mls(x, 0:7, 4)X.5/m     0.0104345  0.0655386   0.159 0.873647    
mls(x, 0:7, 4)X.6/m     0.0698753  0.0692803   1.009 0.314270    
mls(x, 0:7, 4)X.7/m     0.1463317  0.0650285   2.250 0.025412 *  
mls(z, 0:16, 12)X.0/m   0.3671055  0.0618960   5.931 1.14e-08 ***
mls(z, 0:16, 12)X.1/m   0.3502401  0.0599615   5.841 1.83e-08 ***
mls(z, 0:16, 12)X.2/m   0.4514656  0.0659546   6.845 7.37e-11 ***
mls(z, 0:16, 12)X.3/m   0.3733747  0.0577702   6.463 6.42e-10 ***
mls(z, 0:16, 12)X.4/m   0.3609667  0.0700954   5.150 5.75e-07 ***
mls(z, 0:16, 12)X.5/m   0.2155748  0.0632036   3.411 0.000769 ***
mls(z, 0:16, 12)X.6/m   0.0648163  0.0630194   1.029 0.304828    
mls(z, 0:16, 12)X.7/m   0.0665581  0.0673284   0.989 0.323955    
mls(z, 0:16, 12)X.8/m  -0.0014853  0.0624569  -0.024 0.981048    
mls(z, 0:16, 12)X.9/m   0.0466486  0.0675116   0.691 0.490305    
mls(z, 0:16, 12)X.10/m  0.0384882  0.0664136   0.580 0.562824    
mls(z, 0:16, 12)X.11/m -0.0077722  0.0591409  -0.131 0.895564    
mls(z, 0:16, 12)X.12/m -0.0283221  0.0620632  -0.456 0.648589    
mls(z, 0:16, 12)X.13/m -0.0375062  0.0608348  -0.617 0.538179    
mls(z, 0:16, 12)X.14/m  0.0297271  0.0652273   0.456 0.649018    
mls(z, 0:16, 12)X.15/m  0.0184075  0.0588059   0.313 0.754558    
mls(z, 0:16, 12)X.16/m -0.0546460  0.0677214  -0.807 0.420574    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9383 on 222 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9855,	Adjusted R-squared:  0.9838 
F-statistic: 579.3 on 26 and 222 DF,  p-value: < 2.2e-16

The results of eq_u1 (least squares) and eq_u2 (unconstrained MIDAS regression at a common frequency) are identical!
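
This can be checked directly; the coefficient names differ between the two fits, so the values are compared unnamed:

all.equal(unname(coef(eq_u1)), unname(coef(eq_u2)))  # expected: TRUE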

Nonlinear Estimation Based on midas_r

eq_r <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r)

Convergence Test

The gradient and Hessian of the objective function are computed; the Euclidean norm of the gradient and the eigenvalues of the Hessian are then used to test whether the conditions for a local optimum hold: the gradient norm should be close to zero and all eigenvalues should be positive.

deriv_tests(eq_r, tol = 1e-06)

Result:

$`first`
[1] FALSE

$second
[1] TRUE

$gradient
[1]  0.005542527  0.173490055 -0.005754945  0.028655981 -0.030150807  0.019224017  0.053829528

$eigenval
[1] 1.047988e+07 5.891393e+04 3.664513e+02 1.221231e+02 8.116275e+01 5.148844e+01 4.598507e+01
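
Reading the output (a hedged interpretation of the returned components): first should be the check that the gradient's Euclidean norm falls below tol, and second the check that all Hessian eigenvalues are positive. Both can be reproduced from the returned pieces:

dt <- deriv_tests(eq_r, tol = 1e-06)
sqrt(sum(dt$gradient^2)) < 1e-06  # first-order condition; FALSE here (the norm is about 0.19)
all(dt$eigenval > 0)              # second-order condition; TRUE here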

Coefficient Output

coef(eq_r)  # returns the estimated hyperparameters of the weight functions

Result:

(Intercept)       trend          x1          x2          z1          z2          z3 
 1.98820762  0.09988275  1.35336157 -0.50747744  2.26312231  0.40927986 -0.07293300 

 

coef(eq_r, midas = TRUE)  # returns the full set of implied MIDAS lag coefficients

Result:

 (Intercept)        trend           x1           x2           x3           x4           x5           x6           x7 
1.988208e+00 9.988275e-02 5.480768e-01 3.299490e-01 1.986333e-01 1.195797e-01 7.198845e-02 4.333793e-02 2.608997e-02 
          x8           z1           z2           z3           z4           z5           z6           z7           z8 
1.570648e-02 3.346806e-01 4.049071e-01 4.233810e-01 3.826120e-01 2.988388e-01 2.017282e-01 1.176921e-01 5.934435e-02 
          z9          z10          z11          z12          z13          z14          z15          z16          z17 
2.586203e-02 9.740853e-03 3.170901e-03 8.921121e-04 2.169239e-04 4.558760e-05 8.280130e-06 1.299807e-06 1.763484e-07
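
As a consistency check (assuming the implied coefficients are exactly the weight function evaluated at the fitted hyperparameters), the x block of the full coefficient vector should match nealmon at the estimates:

all.equal(as.numeric(coef(eq_r, midas = TRUE)[3:10]),
          as.numeric(nealmon(p = coef(eq_r)[c("x1", "x2")], d = 8)))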

Building on eq_r, the function amweights can be used to form several standard periodic weighting constraints; the two calls below juxtapose the type "C" aggregate weights with the within-period nealmon weights:

amweights(p = c(1, -0.5), d = 8, m = 4, weight = nealmon, type = "C")
nealmon(p = c(1, -0.5), d = 4)

eq_r1 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C") + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r1)

Other Weighting Forms

fn <- gompertzp  # weights from a probability density function (Gompertz)
eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, fn), start = list(x = c(1, -0.5), z = c(1, 0.5, 0.1)))
summary(eq_r2)

# z constrained with nealmon, x left unconstrained
eq_r3 <- midas_r(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12, nealmon), start = list(z = c(1, -0.5)))
summary(eq_r3)

# Two unconstrained low-frequency autoregressive lags of y
eq_r4 <- midas_r(y ~ trend + mls(y, 1:2, 1) + mls(x, 0:7, 4, nealmon), start = list(x = c(1, -0.5)))
summary(eq_r4)

# "*" requests the multiplicative (common-factor) treatment of the autoregressive lags
eq_r5 <- midas_r(y ~ trend + mls(y, 1:2, 1, "*") + mls(x, 0:7, 4, nealmon), start = list(x = c(1, -0.5)))
summary(eq_r5)

# The autoregressive lags of y constrained with nealmon as well
eq_r6 <- midas_r(y ~ trend + mls(y, 1:4, 1, nealmon) + mls(x, 0:7, 4, nealmon), start = list(y = c(1, -0.5), x = c(1, -0.5)))
summary(eq_r6)

# Aggregates-based schemes via amweights: types "B", "C" and "A"
eq_r7 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "B"), start = list(x = c(1, 1, -0.5)))
summary(eq_r7)

eq_r8 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C"), start = list(x = c(1, -0.5)))
summary(eq_r8)

eq_r9 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "A"), start = list(x = c(1, 1, 1, -0.5)))
summary(eq_r9)

# User-defined weight function: w_i = p1 * i^p2, i = 1, ..., d
fn <- function(p, d) {
  p[1] * c(1:d)^p[2]
}
# Two parameters suffice even for 102 high-frequency lags
eq_r10 <- midas_r(y ~ trend + mls(x, 0:101, 4, fn), start = list(x = rep(0, 2)))
summary(eq_r10)
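
The payoff of the functional constraint is parsimony: an unconstrained counterpart would need 102 coefficients for x, while fn gets by with two. Assuming coef(eq_r10, midas = TRUE) follows the same layout as above (intercept, trend, then the x block), the implied weights can be inspected:

coef(eq_r10, midas = TRUE)[3:10]  # first few of the 102 implied x coefficients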

Adequacy Test of the Constraints

As long as the errors are independent and identically distributed, hAh_test can be used; hAhr_test is its HAC-robust version. When no significant HAC structure is observed in the residuals, hAh_test is the more accurate choice in small samples.

eq_r12 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r12)
# adequacy test of the constraints
hAh_test(eq_r12)
hAhr_test(eq_r12)

Result:

hAh restriction test

data:  
hAh = 16.552, df = 20, p-value = 0.6818

hAh restriction test (robust version)

data:  
hAhr = 14.854, df = 20, p-value = 0.7847

Contrast: tighten the constraint on z by shortening its lag range from 0:16 to 0:12 and reducing its nealmon constraint to two parameters:

eq_r13 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:12, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, -0.1)))                                                                                                
hAh_test(eq_r13)
hAhr_test(eq_r13)
summary(eq_r13)

Result:

	hAh restriction test

data:  
hAh = 36.892, df = 17, p-value = 0.00348


	hAh restriction test (robust version)

data:  
hAhr = 32.879, df = 17, p-value = 0.01168

The null hypothesis that the constraint is adequate is rejected: this tighter specification is insufficient.

Optimal Model Selection

Step 1: the function expand_weights_lags defines the set of potential specifications for each variable

set_x <- expand_weights_lags(weights = c("nealmon", "almonp"), from = 0, to = c(5, 10), m = 1, start = list(nealmon = rep(1, 2), almonp = rep(1, 3)))


set_z <- expand_weights_lags(weights = c("nealmon", "nbeta"), from = 1, to = c(2, 3), m = 1, start = list(nealmon = rep(0, 2), nbeta = rep(0.5, 3)))

Step 2: estimate all candidate models

The function midas_r_ic_table returns a summary table of all models, together with the corresponding values of common information criteria and the empirical size of the parameter-constraint adequacy test (hAh_test by default). The results of the derivative tests and the convergence status of the optimization function are also reported.

eqs.ic <- midas_r_ic_table(y ~ trend + mls(x, 0, m = 4) + fmls(z, 0, m = 12), table = list(z = set_z, x = set_x), start = c(`(Intercept)` = 0, trend = 0))

mod <- modsel(eqs.ic, IC = "AIC", type = "restricted")
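
modsel is expected to return the preferred fit itself, so it can be used like any other midas_r object:

summary(mod)  # inspect the selected specification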

Step 3: inspect a candidate model, fine-tune its optimization for convergence, and update the table:

eqs.ic$candlist[[5]] <- update(eqs.ic$candlist[[5]], Ofunction = "nls")

eqs.ic <- update(eqs.ic)

Select model manually

Step 1: split the data into a training set (for in-sample prediction) and a test set (for out-of-sample prediction)

datasplit <- split_data(list(y = y, x = x, z = z, trend = trend), insample = 1:200, outsample = 201:250)
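
A quick look at the split (a hedged reading: the object should carry the in-sample and out-of-sample pieces that the fits below rely on via datasplit$indata):

str(datasplit, max.level = 1)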

Step 2: fit the candidate models on the training data

mod1 <- midas_r(y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z, 12:22, 12, nealmon), data = datasplit$indata, start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))

mod2 <- midas_r(y ~ trend + mls(x, 4:20, 4, nealmon) + mls(z, 12:25, 12, nealmon), data = datasplit$indata, start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))

Step 3: compute the forecasts and accuracy measures

avgf <- average_forecast(list(mod1, mod2), data = list(y = y, x = x, z = z, trend = trend), insample = 1:200, outsample = 201:250, type = "fixed", measures = c("MSE", "MAPE", "MASE"), fweights = c("EW", "BICW", "MSFE", "DMSFE"))

In-sample and out-of-sample prediction accuracy output:

avgf$accuracy

Result comparison:

$`individual`
                                                              Model MSE.out.of.sample MAPE.out.of.sample
1 y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z, 12:22, 12, nealmon)          1.848474           4.474242
2 y ~ trend + mls(x, 4:20, 4, nealmon) + mls(z, 12:25, 12, nealmon)          1.763432           4.391594
  MASE.out.of.sample MSE.in.sample MAPE.in.sample MASE.in.sample
1          0.8155477      1.768720       14.05880      0.6992288
2          0.7976764      1.726365       13.56287      0.6897242

$average
  Scheme      MSE     MAPE      MASE
1     EW 1.803640 4.432918 0.8066121
2   BICW 1.763433 4.391595 0.7976766
3   MSFE 1.802640 4.431945 0.8064017
4  DMSFE 1.801787 4.431112 0.8062216

mod1 and mod2 forecast output:

avgf$forecast

Result comparison:

          [,1]     [,2]
 [1,] 22.24057 22.38300
 [2,] 22.12774 22.13778
 [3,] 22.22678 22.21942
 [4,] 22.44535 22.61872
 [5,] 22.36448 22.46157
 [6,] 22.49847 22.66663
 [7,] 22.70192 22.70818
 [8,] 22.49411 22.53326
 [9,] 22.47978 22.54235
[10,] 22.43439 22.46522
[11,] 22.61563 22.57788
[12,] 22.87686 22.88271
[13,] 22.59329 22.64370
[14,] 23.39675 23.47606
[15,] 23.25864 23.27932
[16,] 22.97588 23.09488
[17,] 23.13196 23.21956
[18,] 24.05489 24.04334
[19,] 24.11690 24.16742
[20,] 23.79328 23.91429
[21,] 24.58047 24.58925
[22,] 24.33794 24.45825
[23,] 23.96658 24.03107
[24,] 24.00510 24.12939
[25,] 23.96473 23.99645
[26,] 24.02198 24.04416
[27,] 25.01521 24.91966
[28,] 24.47063 24.59918
[29,] 24.58748 24.51964
[30,] 24.69662 24.72768
[31,] 25.19092 25.22659
[32,] 25.57452 25.72345
[33,] 25.06312 25.17678
[34,] 24.99302 25.09359
[35,] 25.13005 25.16293
[36,] 25.85870 25.85633
[37,] 25.43593 25.46734
[38,] 25.80690 25.88224
[39,] 26.80900 26.82394
[40,] 26.44056 26.63095
[41,] 26.04786 26.17996
[42,] 26.09545 26.29647
[43,] 25.99309 25.98112
[44,] 25.89755 26.03643
[45,] 25.99814 26.18442
[46,] 26.25727 26.38797
[47,] 26.79563 26.92593
[48,] 26.57034 26.68217
[49,] 26.53808 26.51001
[50,] 26.72772 26.76183

Combined forecast output:

avgf$avgforecast

Result comparison:

Here EW is the equal-weights combination; BICW weights by the BIC information criterion; MSFE weights by the mean squared forecast error; and DMSFE is the discounted MSFE weighting.

         EW     BICW     MSFE    DMSFE
 [1,] 22.31178 22.38299 22.31346 22.31489
 [2,] 22.13276 22.13778 22.13288 22.13298
 [3,] 22.22310 22.21942 22.22301 22.22294
 [4,] 22.53204 22.61872 22.53408 22.53582
 [5,] 22.41303 22.46157 22.41417 22.41515
 [6,] 22.58255 22.66663 22.58453 22.58622
 [7,] 22.70505 22.70818 22.70512 22.70519
 [8,] 22.51369 22.53326 22.51415 22.51454
 [9,] 22.51107 22.54235 22.51180 22.51244
[10,] 22.44980 22.46522 22.45017 22.45048
[11,] 22.59675 22.57788 22.59631 22.59593
[12,] 22.87979 22.88271 22.87986 22.87991
[13,] 22.61849 22.64370 22.61909 22.61960
[14,] 23.43641 23.47606 23.43734 23.43814
[15,] 23.26898 23.27932 23.26922 23.26943
[16,] 23.03538 23.09488 23.03678 23.03798
[17,] 23.17576 23.21956 23.17679 23.17767
[18,] 24.04912 24.04334 24.04898 24.04887
[19,] 24.14216 24.16742 24.14276 24.14326
[20,] 23.85379 23.91429 23.85521 23.85643
[21,] 24.58486 24.58925 24.58497 24.58505
[22,] 24.39809 24.45825 24.39951 24.40072
[23,] 23.99882 24.03107 23.99958 24.00023
[24,] 24.06724 24.12938 24.06870 24.06996
[25,] 23.98059 23.99645 23.98096 23.98128
[26,] 24.03307 24.04416 24.03334 24.03356
[27,] 24.96744 24.91967 24.96631 24.96535
[28,] 24.53491 24.59918 24.53642 24.53771
[29,] 24.55356 24.51964 24.55277 24.55208
[30,] 24.71215 24.72768 24.71252 24.71283
[31,] 25.20876 25.22659 25.20918 25.20954
[32,] 25.64898 25.72345 25.65074 25.65224
[33,] 25.11995 25.17678 25.12129 25.12244
[34,] 25.04331 25.09359 25.04449 25.04550
[35,] 25.14649 25.16293 25.14688 25.14721
[36,] 25.85751 25.85633 25.85749 25.85746
[37,] 25.45163 25.46734 25.45200 25.45232
[38,] 25.84457 25.88224 25.84546 25.84622
[39,] 26.81647 26.82394 26.81665 26.81680
[40,] 26.53575 26.63095 26.53799 26.53991
[41,] 26.11391 26.17996 26.11546 26.11679
[42,] 26.19596 26.29647 26.19833 26.20035
[43,] 25.98711 25.98112 25.98696 25.98684
[44,] 25.96699 26.03642 25.96862 25.97002
[45,] 26.09128 26.18442 26.09348 26.09535
[46,] 26.32262 26.38797 26.32416 26.32548
[47,] 26.86078 26.92593 26.86232 26.86363
[48,] 26.62626 26.68217 26.62757 26.62870
[49,] 26.52404 26.51001 26.52371 26.52343
[50,] 26.74478 26.76183 26.74518 26.74552

 
