Post-Doctoral Fellow, Indian Council of Social Science Research
Department of Commerce,
Aligarh Muslim University, Aligarh.
Econometrics is one of the emerging branches of economic analysis. It is a conglomeration of mathematical economics, statistics and economic theory. Researchers use macroeconomic variables extensively in a wide range of analyses, and ordinary least squares (OLS) regression is widely used for such work. However, a large section of researchers generally ignores the classical assumptions of regression. In the present study, selected macroeconomic variables are investigated empirically in terms of the classical OLS assumptions. The objective is to understand the internal dynamics of the time series and to argue that the classical assumptions are a prerequisite for any time series analysis in econometrics. The study may be used to understand the default properties of macroeconomic time series variables.
Keywords: Macroeconomic variables, Classical Assumptions, Gauss-Markov Theorem, Econometrics
Econometrics as a branch of study is a conglomeration of mathematical economics, economic theory and statistics. Its development is credited to Ragnar Frisch in the 1930s. Today, econometrics is widely used as a data analysis tool in most areas related to economics, such as international trade and behavioural economics. Among the classifications of data, the most popular are time series data, panel data and cross-sectional data. Although regression (Ordinary Least Squares, OLS) is the basis of time series analysis, regression is also used in techniques for panel and cross-sectional data. Thus, OLS regression stands as a widely used tool in econometric analysis. The existing body of knowledge contains many studies that have applied OLS to macroeconomic variables of India without discussing the finite sample properties under the classical assumptions. There is therefore a gap to be filled regarding finite sample properties and macroeconomic variables of India such as exports, imports and GDP (gross domestic product). The objective of the present study is to highlight the prerequisites of OLS and to evaluate empirically the position of macroeconomic variables with respect to these properties. The study is divided into seven sections. Section 2 discusses the approach followed to achieve the objective of the study, highlighting the conceptual framework. Section 3 reviews past studies relevant to the study. Sections 4 and 5 deal with the econometric models and the data specification, respectively. Finally, Sections 6 and 7 present the results and the conclusion of the study, respectively.
The question posed in the study is “whether the macroeconomic variables of India for the sample period adhere to the finite sample properties of OLS under the classical assumptions”. Answering it involves two primary steps: first, identifying the macroeconomic variables, and second, defining the scope of the finite sample properties along with their connotations. For ease of understanding, these are taken in order. The Oxford Dictionary of Economics (Black, Hashimzade & Myles, 2012), in explaining macroeconometrics, notes that it deals with macroeconomic data. What is meant by macroeconomic data is what common parlance refers to as macroeconomic variables, such as exports, imports, GDP and inflation. In the present study, the exports, imports, current account balance, foreign direct investment and GDP of India are used.
OLS regression needs to be understood along with its classical assumptions, under which the OLS estimator has its finite sample properties; in this study the assumptions themselves are referred to as finite sample properties. They vary with the type of data, and here the discussion is developed for time series data. Opinions have differed on the number of finite sample properties, possibly because of differences between theoretical and applied econometrics. Whatever the reason, the standard set today comprises six assumptions (Wooldridge, 2009). Each assumption is discussed below.
Finite Sample Property (henceforth FSP) 1: Linearity in Parameters
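The random process $\{(x_{t1}, x_{t2}, \ldots, x_{tk}, y_t) : t = 1, 2, \ldots, n\}$ follows the linear model
$$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + u_t,$$
where $\{u_t : t = 1, 2, \ldots, n\}$ is the sequence of errors or disturbances. Here, n is the number of observations (time periods).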
FSP 2: No perfect collinearity
In the sample (and therefore in the underlying time series process), no independent variable is constant or a perfect linear combination of the others. The assumption allows the explanatory variables to be correlated, but it rules out perfect correlation in the sample.
FSP 3: Zero conditional mean
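For each t, the expected value of $u_t$, given the explanatory variables for all time periods, is zero. Symbolically,
$$E(u_t \mid \mathbf{X}) = 0, \quad t = 1, 2, \ldots, n,$$
where $\mathbf{X}$ denotes the explanatory variables for all time periods.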
It means that the error term must not be correlated with the explanatory variables in any time period. More precisely, $u_t$ is uncorrelated with each contemporaneous regressor $x_{tj}$ and also with past and future values of each $x_j$; that is, the regressors are strictly exogenous.
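FSP 4: Homoscedasticity
Conditional on $\mathbf{X}$, the variance of $u_t$ is the same for all t. Symbolically,
$$\mathrm{Var}(u_t \mid \mathbf{X}) = \mathrm{Var}(u_t) = \sigma^2, \quad t = 1, 2, \ldots, n.$$
This requires that $\mathrm{Var}(u_t \mid \mathbf{X})$ not depend on $\mathbf{X}$ (it is sufficient that $u_t$ and $\mathbf{X}$ are independent) and that $\mathrm{Var}(u_t)$ be constant over time.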
FSP 5: No serial correlation/ autocorrelation
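Conditional on $\mathbf{X}$, the errors in two different time periods are uncorrelated. Symbolically,
$$\mathrm{Corr}(u_t, u_s \mid \mathbf{X}) = 0 \quad \text{for all } t \neq s,$$
or, in simpler form, $\mathrm{Corr}(u_t, u_s) = 0$ for all $t \neq s$.
FSP 6: Normality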
The errors $u_t$ are independent of $\mathbf{X}$ and are independently and identically distributed as $\mathrm{Normal}(0, \sigma^2)$.
In the present study, the discussion revolves around FSP 1 to FSP 6, and the objective is to verify these assumptions on raw data.
The literature can be reviewed in two ways. One is to examine studies that use regression on macroeconomic variables and note their comments or observations on the FSPs. The other is to search specifically for studies on the FSPs themselves: how many there are, what their status is, and which are necessary and which sufficient. The former was impractical because of limited time, cost and access to research papers. It should also be noted that, as far as the researcher's search and access extend, no study unequivocally addresses the issue of FSPs with respect to the macroeconomic variables of India. Nonetheless, the available body of knowledge contains ample discussion of FSPs and of how they are ignored. This also supports the study, as it justifies its objective. A review of past studies indicates that some researchers have given due emphasis to checking the assumptions before using linear regression, with only minor variations (e.g. Colenutt, 1968; Johnston, 1963; Campillo, 1993; Osborne & Waters, 2002). This is not the case, however, with a large section of published studies. In several fields, researchers applying linear regression have been found either not to check the FSPs or not to report the results relating to them (which, to the reader, amounts to the same thing). For example, a study based on a sample of psychology researchers found that FSPs were rarely checked and that the researchers' knowledge of them was poor (Hoekstra, Kiers & Johnson, 2012). An important study relevant to this discussion is that of Ottenbacher, Ottenbacher, Tooth & Ostir (2004), which reviewed research papers published in two journals, the American Journal of Public Health and the American Journal of Epidemiology.
Of the 348 articles published between 1970 and 1998, it was found that the researchers had not checked the FSPs of the regression. Of the 99 studies selected, only 17% discussed FSP 2 (multicollinearity). Of the 36 articles using logistic regression, 29 (81%) provided no information on the FSPs. These results raised concern about the use of regression analysis while ignoring the FSPs. Similarly, in geography, it was observed in the 1970s that geographers had largely ignored and skipped the discussion of the assumptions, i.e. the FSPs. Historically, as early as 1968, two researchers, J. B. Cole and C. A. M. King, warned against using regression without checking the FSPs (Poole & O’Farrell, 1971).
Researchers are unanimous on the importance of FSPs and of adhering to them for the stability of the model, although a few FSPs can be relaxed depending on the objective. For example, if the objective of the model is prediction, FSP 2 (multicollinearity) can be relaxed, but if the objective is quantification of the parameters (point estimation), FSP 2 cannot be relaxed. It is therefore fitting to develop a discussion of FSPs with respect to selected variables. The present study attempts to identify the FSPs of India's macroeconomic variables so that future researchers may benefit from them and may assume the default nature of these variables.
In this section, the models are specified and the estimation methods for FSP-1 to FSP-6 are elaborated.
FSP-1: Linearity in parameters
This property concerns linearity in the parameters. The question is how to estimate and check whether a univariate or multivariate series is linear in parameters. As Williams, Grajales and Kurkiewicz (2013) report, “it is not possible to investigate these (FSPs) assumptions without estimating the actual regression model”; hence, on grounds of parsimony, simple default models are used to check the assumptions. In the present study, five variables are used: Current Account Balance (CAB), Exports (EXT), Foreign Direct Investment Inflows (FDI), Imports (IMT) and Gross Domestic Product (GDP), with GDP as the explained variable and the others as explanatory variables (Appendix II). This choice rests on theoretical considerations and empirical justifications (Narayan & Prasad, 2008; Iqbal, Ahmad, Haider & Anwar, 2014).
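Symbolically, the baseline model (model 4.1) is
$$GDP_t = \beta_0 + \beta_1 CAB_t + \beta_2 EXT_t + \beta_3 FDI_t + \beta_4 IMT_t + u_t \qquad (4.1)$$
where $\beta_1, \ldots, \beta_4$ correspond to C(1) to C(4) in the EViews output (Appendix III) and $\beta_0$ is the intercept (reported as C).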
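For readers who want to replicate an estimation of this form outside EViews, the following is a minimal sketch in Python with NumPy (an assumption; the study itself used EViews 9.5). It uses simulated placeholder data, not the UNCTAD series of Appendix I, with the columns ordered CAB, EXT, FDI, IMT and constant as in Appendix III. Because the simulated GDP has no noise, OLS should recover the chosen coefficients exactly, which makes the sketch easy to check.

```python
import numpy as np

def ols(y, X):
    """Fit y = X b + u by ordinary least squares.

    X must already contain a column of ones if an intercept is wanted.
    Returns (coefficients, residuals).
    """
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b, resid

# Illustrative data only (NOT the study's data): four regressors standing in
# for CAB, EXT, FDI and IMT, followed by a constant, as ordered in Appendix III.
rng = np.random.default_rng(0)
n = 34
regressors = rng.normal(size=(n, 4))
X = np.column_stack([regressors, np.ones(n)])
true_b = np.array([1.5, -2.0, 0.5, 3.0, 10.0])
y = X @ true_b                     # noise-free, so OLS should return true_b

b_hat, resid = ols(y, X)
print(np.round(b_hat, 6))
```

With the real series, `y` would be GDP and the four regressor columns CAB, EXT, FDI and IMT from Appendix I.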
Linearity restrictions on the parameters are tested using the Wald test and interpreted on the basis of the F-statistic and the chi-square statistic. Of the four explanatory variables, CAB takes mostly negative values (the exceptions are 2001 to 2004), while the others are positive throughout. To test the linearity assumption, the study takes the opposite route: instead of checking for linearity, the researcher checks for a non-linearity condition. If that condition is fulfilled, the parameters are truly non-linear. At this stage, any non-linear relation among the parameters suffices for inference. The following conditions are checked simultaneously, using a single p-value.
Condition 1: $C(1) = C(2)/C(3)$
Condition 2: $C(2) = C(3)/C(4)$
Condition 3: $C(3) = C(4)/C(1)$
Condition 4: $C(4) = C(1)/C(2)$
The null hypothesis is “the parameters are non-linear”, i.e. that the four conditions hold jointly.
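The four conditions can be evaluated directly from the coefficients reported in Appendix III. The sketch below (Python with NumPy, an assumption) reproduces the Value column of the restriction summary in Table 1, and adds a generic delta-method Wald statistic, W = g'(GVG')^(-1)g, with the F form taken as W divided by the number of restrictions. Because the paper reports only the coefficient variances and not the full covariance matrix V, the statistics in Table 1 cannot be recomputed from the printed output; the hypothetical helper `wald` is therefore exercised only on a toy input.

```python
import numpy as np

# Coefficients C(1)..C(4) of model 4.1 as printed in Appendix III.
b_hat = np.array([6.368638, -2.369604, -3.340036, 5.928138])

def restrictions(b):
    """The four normalized restrictions (= 0) summarized under Table 1."""
    return np.array([
        b[0] - b[1] / b[2],    # C(1) - C(2)/C(3)
        b[1] - b[2] / b[3],    # C(2) - C(3)/C(4)
        b[2] - b[3] / b[0],    # C(3) - C(4)/C(1)
        -b[0] / b[1] + b[3],   # -C(1)/C(2) + C(4)
    ])

def wald(g, b, V, eps=1e-6):
    """Delta-method Wald test of g(b) = 0.

    g   : function returning the q restriction values as a 1-D array
    b   : estimated coefficients
    V   : estimated covariance matrix of b (full matrix, not just variances)
    The Jacobian G of g is approximated by central finite differences.
    Returns (W, W / q): W is asymptotically chi-square(q); W / q is the F form.
    """
    b = np.asarray(b, dtype=float)
    g0 = g(b)
    G = np.empty((g0.size, b.size))
    for j in range(b.size):
        step = np.zeros_like(b)
        step[j] = eps
        G[:, j] = (g(b + step) - g(b - step)) / (2.0 * eps)
    W = float(g0 @ np.linalg.solve(G @ V @ G.T, g0))
    return W, W / g0.size

print(restrictions(b_hat))   # compare with the Value column in Table 1
```

Given the full 5 by 5 coefficient covariance matrix from EViews, `wald(restrictions, b_hat, V)` (using the 4 by 4 block for C(1) to C(4)) would be the corresponding call.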
FSP-2: No perfect multicollinearity
According to the Oxford Dictionary of Economics, “perfect multicollinearity occurs when some of the explanatory variables are perfectly correlated” (Black, Hashimzade & Myles, 2012). Several tools are available to identify multicollinearity among independent variables; the most commonly used is the Variance Inflation Factor (VIF), defined as
$$VIF_k = \frac{1}{1 - R_k^2},$$
where $R_k^2$ is the coefficient of determination obtained by regressing the kth predictor on the remaining predictors. As a rule of thumb, a VIF below 10 is considered acceptable, meaning there is no major problem of multicollinearity. This is a liberal view; a more conservative view sets the threshold at 4. However, O’Brien (2007) objects to such rules of thumb, arguing that further examination of the model is required to identify the problem, since in certain cases even VIF values of 10, 20 or 40 have no implications for inference. In the interest of parsimony, and following the liberal approach, the rule-of-thumb threshold of 10 is used here to judge the magnitude of multicollinearity.
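The VIF formula can be computed by running the auxiliary regressions explicitly. This is a hedged NumPy sketch of the centered VIF (each auxiliary regression includes a constant and R_k^2 is measured around the mean), which is the version read from Table 2. In the two-column toy example the regressors have a correlation of 0.8, so each VIF should equal 1/(1 - 0.64).

```python
import numpy as np

def centered_vif(X):
    """Centered VIF for each column of X (regressors only, no constant column).

    VIF_k = 1 / (1 - R_k^2), where R_k^2 is the R-squared from regressing
    column k on a constant and all the other columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        tss = np.sum((target - target.mean()) ** 2)   # centered total sum of squares
        r2 = 1.0 - (resid @ resid) / tss
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Toy regressors with correlation 0.8 (illustrative, not the study's data).
X = np.array([[1, 1], [2, 3], [3, 2], [4, 4]])
print(centered_vif(X))
print(centered_vif(X) > 10)   # liberal rule-of-thumb flag used in the study
```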
FSP-3: Zero conditional mean of the error term
This property concerns the conditional mean of the error term in a given model. Using model 4.1, a residual series is generated, and from it the conditional mean of the series is calculated with reference to the mean dependent variance. The command used here deserves mention, as it is seldom used conventionally.
Command code: where y is the name of series and x is the mean conditional variance of y.
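The EViews command referred to above is not reproduced in this text, so the following is only an illustrative NumPy sketch on simulated data, not the study's code. It computes the sample mean of OLS residuals from a model with an intercept. One property of OLS is worth keeping in mind when reading this quantity: the normal equations make the residuals orthogonal to every regressor, including the constant, so their sample mean is zero by construction apart from floating-point rounding.

```python
import numpy as np

# Simulated data (NOT the study's series): a constant plus two regressors.
rng = np.random.default_rng(1)
n = 34
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# The normal equations X'u = 0 force the residuals to be orthogonal to every
# column of X, including the constant, so their mean is zero up to rounding.
print(resid.mean())
print(X.T @ resid)
```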
FSP-4: Homoscedasticity/ No Heteroscedasticity
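To test this assumption, White’s test (White, 1980) is employed for its popularity and simplicity. Its null hypothesis is “no heteroscedasticity”, tested through an auxiliary regression in which the squared residuals are regressed on the regressors, their squares and all their cross products. For baseline model 4.1, White’s auxiliary regression is specified as
$$\hat{u}_t^2 = \alpha_0 + \sum_{j=1}^{4}\alpha_j x_{tj} + \sum_{j=1}^{4}\alpha_{jj} x_{tj}^2 + \sum_{j=1}^{3}\sum_{l=j+1}^{4}\alpha_{jl}\, x_{tj} x_{tl} + v_t,$$
where $x_{t1}, \ldots, x_{t4}$ denote $CAB_t$, $EXT_t$, $FDI_t$ and $IMT_t$, and $\hat{u}_t$ are the OLS residuals of model 4.1. The null hypothesis is that all 14 slope coefficients are zero.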
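White's auxiliary regression can be sketched as follows (Python with NumPy; an illustration under stated assumptions, not the EViews implementation). `white_test` builds the constant, levels, squares and pairwise cross products, which gives 14 auxiliary regressors for four explanatory variables, matching the degrees of freedom in Table 3. The helper `chi2_sf_even_df` is a hypothetical name for the closed-form chi-square tail probability, valid only for even degrees of freedom; applied to the reported Obs*R-squared of 28.38579 with 14 degrees of freedom it should give roughly the 0.0126 shown in Table 3. The data in the usage lines are simulated.

```python
import itertools
import numpy as np

def white_test(resid, X):
    """White (1980) LM statistic, auxiliary regression with cross terms.

    resid : OLS residuals of the baseline model, length n
    X     : baseline regressors WITHOUT the constant column, shape (n, k)
    Regresses resid**2 on a constant, the levels, the squares and all
    pairwise cross products of X. Returns (n * R^2, number of slope terms).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(k)]                          # levels
    cols += [X[:, j] ** 2 for j in range(k)]                     # squares
    cols += [X[:, i] * X[:, j]
             for i, j in itertools.combinations(range(k), 2)]    # cross products
    Z = np.column_stack(cols)
    u2 = np.asarray(resid, dtype=float) ** 2
    coef, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ coef
    r2 = 1.0 - (e @ e) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2, Z.shape[1] - 1

def chi2_sf_even_df(x, df):
    """P(chi-square(df) > x) for EVEN df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    lam = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= lam / i
        total += term
    return float(np.exp(-lam) * total)

# Usage on simulated stand-ins (NOT the study's data).
rng = np.random.default_rng(2)
X = rng.normal(size=(34, 4))     # plays the role of CAB, EXT, FDI, IMT
u = rng.normal(size=34)          # plays the role of the OLS residuals
lm, df = white_test(u, X)
print(lm, df, chi2_sf_even_df(lm, df))

# Reported Obs*R-squared from Table 3 with 14 degrees of freedom.
print(round(chi2_sf_even_df(28.38579, 14), 4))
```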
FSP-5: No serial correlation/ autocorrelation
To test for autocorrelation, the Breusch-Godfrey serial correlation LM test is used because of its parsimony and good performance. The error specification underlying the B-G test is
$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \cdots + \rho_h u_{t-h} + \varepsilon_t.$$
The null hypothesis is $H_0: \rho_1 = \rho_2 = \cdots = \rho_h = 0$, read as “no serial correlation up to order h”. The order h is specified at the time of analysis.
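A minimal sketch of the Breusch-Godfrey auxiliary regression in NumPy follows (an illustration, not the study's code). It regresses the residuals on the original regressors and h lagged residuals, and returns n times the R-squared. Setting pre-sample lagged residuals to zero, so that all n observations are kept, is an assumption of this sketch; it is consistent with the F(2,27) degrees of freedom in Table 4 for 34 observations, five coefficients and h = 2. For h = 2 the chi-square tail probability is exactly exp(-LM/2), so the reported Obs*R-squared of 4.568888 should map to about 0.1018.

```python
import numpy as np

def breusch_godfrey(resid, X, h):
    """Breusch-Godfrey LM statistic for serial correlation up to order h.

    resid : OLS residuals of the baseline model, length n
    X     : baseline regressors INCLUDING the constant column, shape (n, k)
    Auxiliary regression: u_t on X_t and u_{t-1}, ..., u_{t-h}; pre-sample
    lags are set to zero (an assumption of this sketch) so all n rows are kept.
    Returns (n * R^2, h); the statistic is asymptotically chi-square(h).
    """
    u = np.asarray(resid, dtype=float)
    n = u.size
    lags = np.column_stack([np.concatenate([np.zeros(i), u[:-i]])
                            for i in range(1, h + 1)])
    Z = np.column_stack([X, lags])
    coef, *_ = np.linalg.lstsq(Z, u, rcond=None)
    e = u - Z @ coef
    r2 = 1.0 - (e @ e) / np.sum((u - u.mean()) ** 2)
    return n * r2, h

# Usage on simulated data (NOT the study's series): constant + 4 regressors.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(34), rng.normal(size=(34, 4))])
y = X @ np.ones(5) + rng.normal(size=34)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
lm, df = breusch_godfrey(y - X @ b, X, h=2)
p_value = float(np.exp(-lm / 2.0))   # chi-square(2) tail probability, exact for 2 df

# Reported Obs*R-squared from Table 4 (h = 2).
print(round(float(np.exp(-4.568888 / 2.0)), 4))
```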
FSP-6: Normality
Normality of the error term is a widely checked property, but often with a deviation: researchers sometimes check the normality of the variables instead of the normality of the error terms (residuals). Normality of the error term can be checked through Q-Q plots, or simply with a histogram and the Jarque-Bera statistic. The study uses the Jarque-Bera statistic.
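The Jarque-Bera statistic is JB = (n/6)[S^2 + (K - 3)^2/4], where S is the skewness and K the kurtosis of the residuals; under normality it is asymptotically chi-square with 2 degrees of freedom, whose tail probability is exactly exp(-JB/2). The NumPy sketch below (an illustration, not the study's code) implements this and shows that the reported statistic of 3.104849 maps to the probability of about 0.2117 in Table 5.

```python
import numpy as np

def jarque_bera(resid):
    """Jarque-Bera normality statistic and its chi-square(2) p-value.

    Skewness S and kurtosis K use central moments divided by n.
    JB = n/6 * (S**2 + (K - 3)**2 / 4); for 2 df the tail is exactly exp(-JB/2).
    """
    u = np.asarray(resid, dtype=float)
    n = u.size
    d = u - u.mean()
    m2 = np.mean(d ** 2)
    s = np.mean(d ** 3) / m2 ** 1.5
    k = np.mean(d ** 4) / m2 ** 2
    jb = n / 6.0 * (s ** 2 + (k - 3.0) ** 2 / 4.0)
    return jb, float(np.exp(-jb / 2.0))

# Toy input: a symmetric two-point sample has S = 0 and K = 1, so JB = 4/6.
jb_toy, p_toy = jarque_bera([-1.0, 1.0, -1.0, 1.0])

# The probability in Table 5 follows from the reported statistic.
p_table5 = float(np.exp(-3.104849 / 2.0))
print(jb_toy, p_toy, round(p_table5, 4))
```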
The study uses five macroeconomic variables of India, expressed in US$ millions and taken from the UNCTAD database: Current Account Balance (CAB), Exports (EXT), Imports (IMT), Gross Domestic Product (GDP) and Foreign Direct Investment Inflows (FDI). The data cover 1980 to 2013. The UNCTAD database had not been updated for 2014 and 2015 for one or more of the variables in the study, so, to keep the series consistent, data up to 2013 are used for inference. The data set is given in Appendix I.
The analysis examines FSP-1 to FSP-6 on the basis of baseline model 4.1. The output of the OLS regression for model 4.1 is shown in Appendix III. On the basis of that model, the individual results pertaining to the finite sample properties are presented below.
FSP-1 result: The Wald test is used to test linearity in the parameters, with the null hypothesis of non-linear parameters. The output is presented in Table 1. When all four conditions on the coefficients (parameters) are tested jointly, the null hypothesis of non-linearity is rejected, as the probability values of both the F-statistic and the chi-square statistic are below 0.05 (0.0000 and 0.0000).
Table 1. Wald test results
Test Statistic | Value | df | Probability
F-statistic | 3961.712 | (4, 29) | 0.0000
Chi-square | 15846.85 | 4 | 0.0000
Null Hypothesis: C(1)=C(2)/C(3), C(2)=C(3)/C(4), C(3)=C(4)/C(1), C(4)=C(1)/C(2)
Null Hypothesis Summary:
Normalized Restriction (= 0) | Value | Std. Err.
C(1) - C(2)/C(3) | 5.659183 | 2.557029
C(2) - C(3)/C(4) | -1.806183 | 3.976922
C(3) - C(4)/C(1) | -4.270869 | 2.197172
-C(1)/C(2) + C(4) | 8.615776 | 0.775793
On the basis of Table 1, the Wald test supports the conclusion that the macroeconomic variables of India are, by default, linear in parameters.
FSP-2 result: As explained in Section 4, a liberal VIF threshold is used to check for “no perfect multicollinearity”. Accordingly, VIF values are calculated and presented in Table 2.
Table 2. Variance inflation factors (VIF)
Variable | Coefficient Variance | Uncentered VIF | Centered VIF
CAB | 10.64757 | 92.24232 | 69.35394
EXT | 14.92668 | 6818.624 | 4173.886
FDI | 4.674395 | 16.50489 | 11.42620
IMT | 12.57888 | 8815.163 | 5432.172
C | 1.55E+08 | 2.267666 | NA
In Table 2, the centered VIF is the relevant measure because the baseline model contains an intercept. The liberal approach accepts a VIF of up to 10 as indicating no problem of multicollinearity. All the explanatory variables have a centered VIF above 10, ranging from 11.43 (FDI) to 5432.17 (IMT). Therefore, by the liberal criterion, there is a problem of multicollinearity among the macroeconomic variables.
FSP-3 result: This assumption concerns the zero conditional mean of the error term. The conditional mean of the disturbance term is calculated with the command given in the econometric estimation methods section. The resulting mean of the residuals is -3.42E-11 rather than exactly zero, and on this basis the study treats FSP 3 as not fulfilled. The residual series is given in Appendix IV.
FSP-4 result: To check the assumption of no heteroscedasticity (homoscedasticity), White’s (1980) test is used; its output is shown in Table 3.
Table 3. White heteroscedasticity test
F-statistic | 6.861793 | Prob. F(14,19) | 0.0001
Obs*R-squared | 28.38579 | Prob. Chi-Square(14) | 0.0126
Scaled explained SS | 33.86424 | Prob. Chi-Square(14) | 0.0022
In White’s test, the null hypothesis is “there is homoscedasticity”. Since all three probability values are below 0.05 (0.0001, 0.0126 and 0.0022), the null hypothesis is rejected: the macroeconomic variables show no homoscedasticity (there is heteroscedasticity). Thus, FSP-4 (homoscedasticity) is rejected for the macroeconomic variables of India.
FSP-5 result: The fifth finite sample property concerns the absence of autocorrelation in the specified model. The output of the Breusch-Godfrey test is shown in Table 4.
Table 4. Breusch-Godfrey serial correlation LM test
F-statistic | 2.095741 | Prob. F(2,27) | 0.1425
Obs*R-squared | 4.568888 | Prob. Chi-Square(2) | 0.1018
The null hypothesis of the B-G test is “there is no serial correlation”. As the probability values of both the F-statistic and the chi-square statistic exceed 0.05 (0.1425 and 0.1018), the null hypothesis of no serial correlation cannot be rejected. This means that the macroeconomic variables show no problem of autocorrelation.
FSP-6 result: To check the normality of the error term, the residual series is generated and a decision on normality is taken using the histogram and the Jarque-Bera statistic. If the probability value of the Jarque-Bera statistic exceeds 0.05, the null hypothesis of normality cannot be rejected. The histogram is shown in Figure 1 and the Jarque-Bera statistic in Table 5.
Figure 1. Histogram of residuals from the specified model
Table 5. Residual statistics and Jarque-Bera test
Mean | -3.42E-11
Median | 5563.326
Jarque-Bera | 3.104849
Probability | 0.211734
Observations | 34
The probability value of the Jarque-Bera statistic is 0.2117, which exceeds 0.05, so the null hypothesis that the residuals (error term) are normally distributed cannot be rejected. This supports the assumption. The summarized results are shown in Table 6.
Table 6. Summary of results
S.No. | Finite Sample Property/ Assumption | Symbol | Status
1 | Linearity in parameters | FSP-1 | Accepted
2 | No perfect collinearity | FSP-2 | Rejected
3 | Zero conditional mean | FSP-3 | Rejected
4 | Homoscedasticity | FSP-4 | Rejected
5 | No serial correlation/ autocorrelation | FSP-5 | Accepted
6 | Normality | FSP-6 | Accepted
For the sample of macroeconomic variables of India, the study concludes that three of the finite sample properties are accepted (FSP-1, FSP-5 and FSP-6) while three are rejected (FSP-2, FSP-3 and FSP-4). This outcome indicates that the macroeconomic variables of India are linear in parameters, show no autocorrelation and yield normally distributed residuals, whereas the other finite sample properties are not satisfied. In this sense, the study is conclusive; at the same time, it remains inconclusive to the extent of the limitations of the baseline model. It should nonetheless caution researchers against applying regression without checking the finite sample properties, and its results can be used by researchers as default characteristics of the macroeconomic variables of India.
Black, J., Hashimzade, N., & Myles, G. (Eds.). (2012). A dictionary of economics. Oxford: Oxford University Press.
Campillo, C. (1993). Standardizing criteria for logistic regression models. Annals of Internal Medicine, 119(6), 540.
Colenutt, R. J. (1968). Building linear predictive models for urban planning. Regional Studies, 2(1), 139-143. https://doi.org/10.1080/09595236800185111
Hoekstra, R., Kiers, H., & Johnson, A. (2012). Are assumptions of well-known statistical techniques checked, and why (not)? Frontiers in Psychology, 3, 137. https://doi.org/10.3389/fpsyg.2012.00137
Iqbal, N., Ahmad, N., Haider, Z., & Anwar, S. (2014). Impact of foreign direct investment (FDI) on GDP: A case study from Pakistan. International Letters of Social and Humanistic Sciences, 5, 73-80.
Johnston, J. (1963). Econometric methods. New York: McGraw-Hill Book Company.
Narayan, P. K., & Prasad, A. (2008). Electricity consumption–real GDP causality nexus: Evidence from a bootstrapped causality test for 30 OECD countries. Energy Policy, 36(2), 910-918.
O’Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41(5), 673-690. https://doi.org/10.1007/s11135-006-9018-6
Osborne, J., & Waters, E. (2002). Four assumptions of multiple regression that researchers should always test. Practical Assessment, Research & Evaluation, 8(2), 1-9.
Ottenbacher, K. J., Ottenbacher, H. R., Tooth, L., & Ostir, G. V. (2004). A review of two journals found that articles using multivariable logistic regression frequently did not report commonly recommended assumptions. Journal of Clinical Epidemiology, 57(11), 1147-1152. https://doi.org/10.1016/j.jclinepi.2003.05.003
Poole, M. A., & O'Farrell, P. N. (1971). The assumptions of the linear regression model. Transactions of the Institute of British Geographers, 145-158. https://doi.org/10.2307/621706
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817-838. https://doi.org/10.2307/1912934
Williams, M. N., Grajales, C. A. G., & Kurkiewicz, D. (2013). Assumptions of multiple regression: Correcting two misconceptions. Practical Assessment, Research & Evaluation, 18(11), 2.
Wooldridge, J. M. (2009). Introductory econometrics: A modern approach. Nelson Education.
Appendix I. Data set (US$ millions)
Year | FDI | EXT | IMT | GDP | CAB
1980 | 79.16 | 11274.4 | 16927.95 | 181116.2 | -1785.13
1981 | 91.92 | 11234.71 | 17397.43 | 193190.8 | -2698.33
1982 | 72.08 | 12159.03 | 17517.74 | 197258.9 | -2523.54
1983 | 5.64 | 13059.98 | 17572.63 | 215224.9 | -1936.94
1984 | 19.24 | 13423.63 | 17857.8 | 213177.6 | -2311.07
1985 | 106.09 | 12849.2 | 18984.13 | 221993.5 | -4140.58
1986 | 117.73 | 13476.23 | 19631.83 | 243226.2 | -4567.7
1987 | 212.32 | 15247.4 | 22290.08 | 269161.7 | -5171.17
1988 | 91.25 | 17301.08 | 25412.6 | 297762.5 | -7143.23
1989 | 252.1 | 20283.7 | 28127.95 | 294788.2 | -6812.77
1990 | 236.69 | 22911.06 | 29526.65 | 320349.7 | -7035.65
1991 | 75 | 23020.36 | 27031.88 | 283967.7 | -4291.73
1992 | 252 | 24953.49 | 29665.6 | 285176.4 | -4485.22
1993 | 532 | 27122.92 | 30604.96 | 278384 | -1875.8
1994 | 974 | 31560.65 | 37872.37 | 318925.1 | -1676.28
1995 | 2151 | 38013.22 | 48225.1 | 361957.2 | -5563.23
1996 | 2525 | 40975.69 | 54960 | 381492.8 | -5956.14
1997 | 3619 | 44812.71 | 58172.8 | 414237.5 | -2965.2
1998 | 2633 | 45766.8 | 59367.9 | 416885.4 | -6903.11
1999 | 2168 | 51386.3 | 62827.5 | 444434.8 | -3228.02
2000 | 3587.99 | 59931.7 | 73075.2 | 458561.1 | -4601.25
2001 | 5477.638 | 62130.2 | 71311.2 | 473441.7 | 1410.18
2002 | 5629.671 | 70619.3 | 75741.5 | 494986.7 | 7059.5
2003 | 4321.076 | 84795 | 92959.1 | 579668.7 | 8772.51
2004 | 5777.807 | 116219.6 | 131179.9 | 701347.4 | 780.196
2005 | 7621.769 | 154703.3 | 181978.5 | 820980 | -10283.5
2006 | 20327.76 | 193498.1 | 225268.1 | 929215.2 | -9299.06
2007 | 25349.89 | 240712.9 | 279416.3 | 1182321 | -8075.69
2008 | 47102.42 | 305729 | 380088.5 | 1268588 | -30972
2009 | 35633.94 | 260847.5 | 328257.5 | 1311852 | -26186.4
2010 | 27417.08 | 348035 | 439059 | 1668768 | -54515.9
2011 | 36190.46 | 446375 | 553062 | 1892420 | -62517.6
2012 | 24195.77 | 443629.5 | 579405.919 | 1869210 | -91471.2
2013 | 28199.45 | 464187.7 | 559767.3941 | 1936088 | -49226
Appendix II. Description of variables
Name | Measurement | Symbol
Current Account Balance | US$ millions | CAB
Exports | US$ millions | EXT
Imports | US$ millions | IMT
Gross Domestic Product | US$ millions | GDP
Foreign Direct Investment Flows | US$ millions | FDI
Appendix III. OLS output for model 4.1 (dependent variable: GDP)
Variable | Coefficient | Std. Error | t-Statistic | Prob.
CAB | 6.368638 | 3.263062 | 1.951737 | 0.0607
EXT | -2.369604 | 3.863507 | -0.613330 | 0.5444
FDI | -3.340036 | 2.162035 | -1.544857 | 0.1332
IMT | 5.928138 | 3.546671 | 1.671465 | 0.1054
C | 192934.4 | 12451.83 | 15.49446 | 0.0000

R-squared | 0.993055 | Mean dependent var | 630004.7
Adjusted R-squared | 0.992097 | S.D. dependent var | 542359.5
S.E. of regression | 48215.12 | Akaike info criterion | 24.53979
Sum squared resid | 6.74E+10 | Schwarz criterion | 24.76425
Log likelihood | -412.1764 | Hannan-Quinn criter. | 24.61634
F-statistic | 1036.657 | Durbin-Watson stat | 1.275324
Prob(F-statistic) | 0.000000

Source: Prepared by the researcher using EViews 9.5.