Beyond In-Sample Significance: Using k-fold cross-validation
Monday, 13 October 2014: 4:30 PM
Studies of the gravity model of bilateral trade have typically used in-sample techniques to examine the significance and impact of variables of interest. However, this approach overlooks the question implicit in most such studies: whether a variable is a significant predictor of bilateral trade. Examining the predictive significance of a variable requires out-of-sample techniques. Furthermore, many studies impose regularities on the model, for example by assuming that the expected impact of a currency union is uniform across all countries. The purpose of my paper is twofold. First, I give a side-by-side comparison of in-sample and out-of-sample techniques, specifically k-fold cross-validation, to show the benefits of out-of-sample techniques when examining the gravity model of bilateral trade. This shifts the focus from sample uncertainty, which is limited in bilateral trade data, to model uncertainty, which poses a larger potential problem in this context. Second, I begin to address the implicit regularities that are often imposed on the gravity model by examining possible interaction terms and various non-linear specifications with the same k-fold cross-validation technique. My results indicate that k-fold cross-validation provides more robust results and prevents over-fitting the model with practically and statistically insignificant variables. Moreover, I find strong evidence that the log specification of GDP and GDP per capita in the gravity model needs to be relaxed in order to give the best predictive model and to help avoid omitted variable bias. This change reduces the estimated trade-increasing effect of a currency union by almost fifty percent, suggesting that the previous specification was substantially biased.
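As a rough illustration of the out-of-sample comparison described above, the following sketch applies k-fold cross-validation to two competing specifications of a log-linear gravity equation. It is not the paper's data or specification: the data are synthetic, and the variable names (`log_gdp`, `log_dist`, `cu`), the coefficients, and the helper `kfold_rmse` are assumptions made only for this example. The squared log-GDP term stands in for one possible way of relaxing the log specification.

```python
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Average out-of-sample RMSE of an OLS fit across k folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds only.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Score on the held-out fold, which the fit never saw.
        resid = y[test] - X[test] @ beta
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))

# Synthetic country-pair data (assumed values, for illustration only).
rng = np.random.default_rng(42)
n = 2000
log_gdp = rng.normal(10.0, 1.5, n)            # log of the product of the two GDPs
log_dist = rng.normal(8.0, 0.8, n)            # log of bilateral distance
cu = (rng.random(n) < 0.1).astype(float)      # currency-union dummy
log_trade = (1.0 + 0.8 * log_gdp + 0.15 * (log_gdp - 10.0) ** 2
             - 1.0 * log_dist + 0.5 * cu + rng.normal(0.0, 1.0, n))

ones = np.ones(n)
# Specification A: the usual log-linear form.
X_linear = np.column_stack([ones, log_gdp, log_dist, cu])
# Specification B: adds a squared log-GDP term to relax the log restriction.
X_flex = np.column_stack([ones, log_gdp, log_gdp ** 2, log_dist, cu])

rmse_linear = kfold_rmse(X_linear, log_trade)
rmse_flex = kfold_rmse(X_flex, log_trade)
print(f"log-linear CV RMSE: {rmse_linear:.3f}")
print(f"flexible   CV RMSE: {rmse_flex:.3f}")
```

Both specifications are scored on the same folds, so the comparison isolates the effect of the functional form. A candidate term earns its place only if it lowers the held-out error, which is a stricter standard than a significant in-sample coefficient.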