π§ Notebook 4 β Model Interpretability and Economic Drivers of Hotel Demand (2015 β 2026)ΒΆ
This notebook extends the forecasting framework by translating model outputs into economic insights.
Building on the predictions and feature sets from Notebook 3, it focuses on explainability β identifying which macroeconomic and behavioral factors drive hotel demand across EU countries.
It integrates machine-learning interpretability (SHAP) and econometric panel regressions to bridge data-driven forecasts with transparent, theory-based reasoning.
π― ObjectivesΒΆ
- Apply SHAP (Shapley Additive Explanations) to interpret feature contributions in XGBoost and LightGBM models.
- Quantify economic relationships between model forecasts and macro indicators using panel regressions with fixed effects.
- Visualize and compare the magnitude and direction of macroeconomic drivers across all model families (Naive, ARIMAX, SARIMAX, XGBoost, LightGBM).
- Export interpretable results and visual summaries for downstream reporting and policy applications.
Structure OverviewΒΆ
- Environment Setup
- Load Processed Data and Model Predictions
- Data Readiness and Alignment
- SHAP Explainability (XGBoost + LightGBM)
- Panel Regression Analysis (Model Explainability)
- Visualize Macroeconomic Drivers Across Models
- Export Regression Results and Visualizations
- Summary and Handoff
Inputs
π ../data/processed/hotel_predictions.csv
π ../outputs/models/pipe_xgb.pkl
π ../outputs/models/pipe_lgbm.pkl
π ../outputs/models/X_train_shap.parquet
π ../outputs/models/X_train_columns.json
Outputs
π ../outputs/figures/shap_summary_xgb.png
π ../outputs/figures/shap_summary_lgbm.png
π ../outputs/figures/shap_bar_xgb.png
π ../outputs/figures/shap_bar_lgbm.png
π ../outputs/figures/shap_dependence_xgb_log_gdp_lag1.png
π ../outputs/figures/shap_dependence_lgbm_log_gdp_lag1.png
π ../outputs/reports/driver_regression_summary.csv
Interpretation Insights β Model Explainability and Economic Drivers
This notebook connects the predictive power of machine-learning models with transparent economic interpretation.
By combining SHAP explainability (for XGBoost & LightGBM) and panel regressions with fixed effects, it uncovers how macroeconomic indicators shape hotel-demand forecasts across the EU.SHAP-based insights
- Both XGBoost and LightGBM identify
log_gdpandturnover_indexas the dominant positive drivers of hotel nights.- Lagged GDP and turnover variables capture persistence in demand dynamics.
weighted_stringency_indexshows a mild negative impact, consistent with post-pandemic recovery.- Currency and regional effects (e.g.,
eurusd,region_LU) have secondary but interpretable roles.Panel-regression validation
- Across Naive, ARIMAX, SARIMAX, and ML forecasts, GDP lag 1 remains a significant long-run determinant.
- Turnover growth explains short-run cyclical variation, while policy stringency remains mildly contractionary.
- Unemployment effects are weak once GDP and turnover are controlled for.
The convergence between SHAP patterns and regression coefficients confirms that machine-learning models internalize economically meaningful structures, not mere statistical artifacts.
This interpretability layer ensures theoretical consistency and strengthens the credibility of forecasts used in Notebook 5 β Scenario Forecasting and Policy Simulations (2025 β 2026).
Purpose:
To provide an integrated interpretability layer connecting machine-learning forecasts with macroeconomic theory, paving the way for Notebook 5 β Scenario Forecasting and Policy Simulations (2025 β 2026).
# %% ===============================================================
# STEP 0 β ENVIRONMENT SETUP
# ===============================================================
# ruff: noqa: E402
import sys
from pathlib import Path
# --- Project path setup ---
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
sys.path.append(str(project_root))
# --- Imports ---
from utils.explainability import (
compute_shap_values,
summarize_shap,
dependence_plot,
run_panel_regression,
)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import shap # noqa: F401
import joblib
import json
import statsmodels.api as sm # noqa: F401
import warnings
# --- Visualization & warnings ---
plt.style.use("seaborn-v0_8-whitegrid")
sns.set_palette("viridis")
warnings.filterwarnings("ignore")
# --- Directories ---
BASE_DIR = Path("..")
DATA_PROCESSED = BASE_DIR / "data" / "processed"
OUTPUTS = BASE_DIR / "outputs"
FIGURES = OUTPUTS / "figures"
MODELS = OUTPUTS / "models"
REPORTS = OUTPUTS / "reports"
for path in [DATA_PROCESSED, FIGURES, MODELS, REPORTS]:
path.mkdir(parents=True, exist_ok=True)
print("β
Environment setup complete.")
β Environment setup complete.
# %% ===============================================================
# STEP 1 β LOAD PROCESSED DATA AND MODEL PREDICTIONS
# Purpose: Import the unified dataset and all model outputs needed for interpretability.
# ===============================================================
# type: ignore[name-defined]
# --- Load feature-engineered dataset ---
FEATURE_PATH = DATA_PROCESSED / "hotel_features.csv"
df_features = pd.read_csv(FEATURE_PATH, parse_dates=["month"])
print(f"β
Loaded feature dataset: {df_features.shape} | {FEATURE_PATH.name}")
# --- Load combined predictions (includes yhat_naive, yhat_arimax, yhat_sarimax, yhat_xgb, yhat_lgbm) ---
PRED_PATH = DATA_PROCESSED / "hotel_predictions.csv"
df = pd.read_csv(PRED_PATH, parse_dates=["month"])
print(f"β
Loaded master predictions: {df.shape} | {PRED_PATH.name}")
# --- Quick structure check ---
pred_cols = [c for c in df.columns if c.startswith("yhat_")]
print(f"[INFO] Available prediction columns: {pred_cols}")
# --- Merge features with predictions (ensuring consistent keys) ---
if not {"region", "month"}.issubset(df.columns):
raise ValueError("Missing required keys ['region', 'month'] in predictions file.")
df_full = (
df_features.merge(df, on=["region", "month"], how="left", suffixes=("", "_pred"))
)
print(f"β
Final merged dataset ready for interpretability: {df_full.shape}")
df_full.head()
β Loaded feature dataset: (3328, 30) | hotel_features.csv β Loaded master predictions: (3328, 35) | hotel_predictions.csv [INFO] Available prediction columns: ['yhat_naive', 'yhat_arimax', 'yhat_sarimax', 'yhat_xgb', 'yhat_lgbm'] β Final merged dataset ready for interpretability: (3328, 63)
| region | month | year | log_nights_spent | log_gdp_lag1 | log_gdp_lag2 | log_gdp_lag3 | unemployment_rate_lag1 | unemployment_rate_lag2 | unemployment_rate_lag3 | ... | unemployment_rate_pred | turnover_index_pred | weighted_stringency_index_pred | eurusd_pred | eurgbp_pred | yhat_naive | yhat_arimax | yhat_sarimax | yhat_xgb | yhat_lgbm | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AT | 2015-01-01 | 2015 | 14.421982 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 5.5 | 31.0 | 44.671611 | 1.128796 | 0.748800 | NaN | NaN | NaN | NaN | NaN |
| 1 | AT | 2015-02-01 | 2015 | 14.578970 | 11.021584 | NaN | NaN | 5.5 | NaN | NaN | ... | 5.9 | 31.0 | 44.671611 | 1.119796 | 0.726058 | 14.421982 | NaN | NaN | NaN | NaN |
| 2 | AT | 2015-03-01 | 2015 | 14.475429 | 11.028797 | 11.021584 | NaN | 5.9 | 5.5 | NaN | ... | 5.4 | 31.0 | 44.671611 | 1.083025 | 0.731200 | 14.578970 | NaN | NaN | NaN | NaN |
| 3 | AT | 2015-04-01 | 2015 | 14.199757 | 11.035960 | 11.028797 | 11.021584 | 5.4 | 5.9 | 5.5 | ... | 5.6 | 31.0 | 44.671611 | 1.111432 | 0.720400 | 14.475429 | NaN | NaN | NaN | NaN |
| 4 | AT | 2015-05-01 | 2015 | 14.399386 | 11.043071 | 11.035960 | 11.028797 | 5.6 | 5.4 | 5.9 | ... | 5.8 | 31.0 | 44.671611 | 1.096035 | 0.715400 | 14.199757 | NaN | NaN | NaN | NaN |
5 rows Γ 63 columns
# %% ===============================================================
# STEP 2 β DATA READINESS & ALIGNMENT
# Purpose: Prepare unified dataset for interpretability and driver analysis.
# ===============================================================
# type: ignore[name-defined]
# --- Copy main dataset for safety ---
df = df_full.copy()
# --- Check and enforce datetime consistency ---
df["month"] = pd.to_datetime(df["month"])
df["year"] = df["month"].dt.year
df["quarter"] = df["month"].dt.to_period("Q").astype(str)
# --- Drop rows with missing key identifiers ---
df = df.dropna(subset=["region", "month"])
print(f"β
Records after cleaning identifiers: {df.shape}")
# --- Identify available prediction columns ---
pred_cols = [c for c in df.columns if c.startswith("yhat_")]
print(f"[INFO] Found {len(pred_cols)} model prediction columns: {pred_cols}")
# --- Ensure numeric columns have proper dtype ---
num_cols = df.select_dtypes(include=[np.number]).columns.tolist()
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")
# --- Verify alignment between features and predictions ---
required_cols = ["region", "month", "log_nights_spent"]
missing_cols = [c for c in required_cols if c not in df.columns]
if missing_cols:
raise ValueError(f"Missing required columns: {missing_cols}")
print(f"β
Dataset aligned and ready β {df.shape[0]} rows Γ {df.shape[1]} columns")
# --- Optional quick check: one sample per region ---
df.groupby("region")["month"].min().head()
β Records after cleaning identifiers: (3328, 64) [INFO] Found 5 model prediction columns: ['yhat_naive', 'yhat_arimax', 'yhat_sarimax', 'yhat_xgb', 'yhat_lgbm'] β Dataset aligned and ready β 3328 rows Γ 64 columns
region AT 2015-01-01 BE 2015-01-01 BG 2015-01-01 CY 2015-01-01 CZ 2015-01-01 Name: month, dtype: datetime64[ns]
# %% ===============================================================
# STEP 3 β SHAP EXPLAINABILITY (XGBoost + LightGBM)
# ===============================================================
# Purpose:
# Compute SHAP values using the training matrix saved in Notebook 3,
# visualize feature importance (beeswarm + bar plots), and summarize
# key drivers of hotel demand.
# Inputs:
# - outputs/models/pipe_xgb.pkl
# - outputs/models/pipe_lgbm.pkl
# - outputs/models/X_train_shap.parquet
# - outputs/models/X_train_columns.json
# Outputs:
# - SHAP plots (XGB & LGBM)
# - feature importance tables
# ===============================================================
# type: ignore[name-defined]
MODELS = Path("../outputs/models")
# --- Load trained pipelines ---
pipe_xgb = joblib.load(MODELS / "pipe_xgb.pkl")
pipe_lgbm = joblib.load(MODELS / "pipe_lgbm.pkl")
# --- Load SHAP-ready matrix ---
X_train_shap = pd.read_parquet(MODELS / "X_train_shap.parquet")
with open(MODELS / "X_train_columns.json") as f:
feature_names = json.load(f)
X_train_shap = X_train_shap[feature_names] # ensure correct order
print(f"β
Data loaded: {X_train_shap.shape[0]} rows Γ {X_train_shap.shape[1]} features")
# --- Compute SHAP values using utility function ---
explainer_xgb, shap_values_xgb = compute_shap_values(pipe_xgb, X_train_shap)
explainer_lgbm, shap_values_lgbm = compute_shap_values(pipe_lgbm, X_train_shap)
# --- Summarize and visualize SHAP results ---
summarize_shap(shap_values_xgb, model_name="XGBoost")
summarize_shap(shap_values_lgbm, model_name="LightGBM")
# === Save SHAP plots ===
FIGURES = Path("../outputs/figures")
FIGURES.mkdir(parents=True, exist_ok=True)
# --- Align shapes (important if background sample used) ---
X_shap_plot = getattr(shap_values_xgb, "data", X_train_shap)
n_rows = min(len(X_shap_plot), len(X_train_shap))
X_shap_plot = X_train_shap.iloc[:n_rows, :]
# XGBoost
plt.figure()
shap.summary_plot(shap_values_xgb, X_shap_plot, max_display=15, show=False)
plt.title("XGBoost β SHAP Beeswarm Summary")
plt.tight_layout()
plt.savefig(FIGURES / "shap_summary_xgb.png", dpi=300, bbox_inches="tight")
plt.close()
plt.figure()
shap.summary_plot(shap_values_xgb, X_shap_plot, plot_type="bar", max_display=15, show=False)
plt.title("XGBoost β Global Feature Importance")
plt.tight_layout()
plt.savefig(FIGURES / "shap_bar_xgb.png", dpi=300, bbox_inches="tight")
plt.close()
# LightGBM
plt.figure()
shap.summary_plot(shap_values_lgbm, X_shap_plot, max_display=15, show=False)
plt.title("LightGBM β SHAP Beeswarm Summary")
plt.tight_layout()
plt.savefig(FIGURES / "shap_summary_lgbm.png", dpi=300, bbox_inches="tight")
plt.close()
plt.figure()
shap.summary_plot(shap_values_lgbm, X_shap_plot, plot_type="bar", max_display=15, show=False)
plt.title("LightGBM β Global Feature Importance")
plt.tight_layout()
plt.savefig(FIGURES / "shap_bar_lgbm.png", dpi=300, bbox_inches="tight")
plt.close()
# --- Feature-level dependence example (optional) ---
dependence_plot(shap_values_xgb, X_train_shap, "log_gdp_lag1", model_name="XGBoost")
dependence_plot(shap_values_lgbm, X_train_shap, "log_gdp_lag1", model_name="LightGBM")
# === Save dependence plots ===
# Align sample sizes between SHAP values and features
X_dep_xgb = getattr(shap_values_xgb, "data", X_train_shap)
n_xgb = min(len(X_dep_xgb), len(X_train_shap))
X_dep_xgb = X_train_shap.iloc[:n_xgb, :]
X_dep_lgb = getattr(shap_values_lgbm, "data", X_train_shap)
n_lgb = min(len(X_dep_lgb), len(X_train_shap))
X_dep_lgb = X_train_shap.iloc[:n_lgb, :]
plt.figure()
shap.dependence_plot("log_gdp_lag1", shap_values_xgb.values[:n_xgb, :], X_dep_xgb, show=False)
plt.title("XGBoost β SHAP Dependence: log_gdp_lag1")
plt.tight_layout()
plt.savefig(FIGURES / "shap_dependence_xgb_log_gdp_lag1.png", dpi=300, bbox_inches="tight")
plt.close()
plt.figure()
shap.dependence_plot("log_gdp_lag1", shap_values_lgbm.values[:n_lgb, :], X_dep_lgb, show=False)
plt.title("LightGBM β SHAP Dependence: log_gdp_lag1")
plt.tight_layout()
plt.savefig(FIGURES / "shap_dependence_lgbm_log_gdp_lag1.png", dpi=300, bbox_inches="tight")
plt.close()
# --- Compute and display feature ranking tables ---
def shap_feature_ranking(shap_values, X, model_name):
vals = np.abs(shap_values.values).mean(0)
importance = pd.DataFrame({"Feature": X.columns, "Mean(|SHAP|)": vals})
importance = importance.sort_values("Mean(|SHAP|)", ascending=False)
print(f"\nπ Top 10 features β {model_name}:")
display(importance.head(10))
return importance
fi_xgb = shap_feature_ranking(shap_values_xgb, X_train_shap, "XGBoost")
fi_lgb = shap_feature_ranking(shap_values_lgbm, X_train_shap, "LightGBM")
print("\nβ
SHAP analysis complete for both models.")
β Data loaded: 2808 rows Γ 53 features β οΈ TreeExplainer failed: could not convert string to float: '[1.3403345E1]' β Using fallback (PermutationExplainer, safe mode).
PermutationExplainer explainer: 501it [00:31, 11.79it/s]
β SHAP values computed successfully (safe mode). β TreeExplainer succeeded for LGBMRegressor. πΉ Plotting XGBoost SHAP summaries...
πΉ Plotting LightGBM SHAP summaries...
π Plotting SHAP dependence for 'log_gdp_lag1' β XGBoost
π Plotting SHAP dependence for 'log_gdp_lag1' β LightGBM
π Top 10 features β XGBoost:
| Feature | Mean(|SHAP|) | |
|---|---|---|
| 47 | log_gdp | 1.155631 |
| 49 | turnover_index | 0.238093 |
| 27 | log_gdp_lag1 | 0.165288 |
| 26 | year | 0.094964 |
| 16 | region_LU | 0.088025 |
| 28 | log_gdp_lag2 | 0.064996 |
| 45 | log_gdp_mom | 0.055016 |
| 51 | eurusd | 0.053878 |
| 36 | weighted_stringency_index_lag1 | 0.053622 |
| 33 | turnover_index_lag1 | 0.047180 |
π Top 10 features β LightGBM:
| Feature | Mean(|SHAP|) | |
|---|---|---|
| 47 | log_gdp | 1.258319 |
| 49 | turnover_index | 0.206234 |
| 16 | region_LU | 0.131491 |
| 27 | log_gdp_lag1 | 0.116105 |
| 26 | year | 0.084515 |
| 51 | eurusd | 0.053934 |
| 45 | log_gdp_mom | 0.048543 |
| 36 | weighted_stringency_index_lag1 | 0.040350 |
| 33 | turnover_index_lag1 | 0.037127 |
| 50 | weighted_stringency_index | 0.036592 |
β SHAP analysis complete for both models.
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
# %% ===============================================================
# STEP 4 β PANEL REGRESSION ANALYSIS (MODEL EXPLAINABILITY)
# ===============================================================
# Purpose:
# Quantify how key macroeconomic indicators explain the model predictions
# (e.g., log GDP growth, turnover, unemployment). Fixed effects (by region)
# control for unobserved heterogeneity.
# Inputs:
# - data/processed/hotel_predictions.csv (merged with macro features)
# Outputs:
# - Panel regression summaries (clustered by region)
# ===============================================================
# type: ignore[name-defined]
# --- Select evaluation subset (2024 only) ---
df_eval = df.query("month >= '2024-01-01' and month < '2025-01-01'").copy()
print(f"π
Evaluation subset: {df_eval.shape[0]} rows")
# --- Dependent (predicted) and explanatory variables ---
pred_vars = [c for c in df_eval.columns if c.startswith("yhat_")]
exog_vars = [
"log_gdp_mom", "turnover_index_mom", "log_gdp_lag1",
"unemployment_rate_lag1", "weighted_stringency_index_lag1"
]
print(f"[INFO] Predictions available: {pred_vars}")
print(f"[INFO] Explanatory variables: {exog_vars}")
# --- Run panel regressions using utility function ---
panel_results = {}
for y_var in pred_vars:
try:
model = run_panel_regression(df_eval, y_var, exog_vars)
panel_results[y_var] = model
print(f"β
Regression completed: {y_var}")
except Exception as e:
print(f"β οΈ {y_var} failed ({e})")
# --- Display key regression outputs ---
for name, model in panel_results.items():
print(f"\nπ Model: {name}")
display(model.summary().tables[1]) # Coefficients table only
π Evaluation subset: 312 rows [INFO] Predictions available: ['yhat_naive', 'yhat_arimax', 'yhat_sarimax', 'yhat_xgb', 'yhat_lgbm'] [INFO] Explanatory variables: ['log_gdp_mom', 'turnover_index_mom', 'log_gdp_lag1', 'unemployment_rate_lag1', 'weighted_stringency_index_lag1'] β Regression completed: yhat_naive β Regression completed: yhat_arimax β Regression completed: yhat_sarimax β Regression completed: yhat_xgb β Regression completed: yhat_lgbm π Model: yhat_naive
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | 3.6463 | 0.341 | 10.689 | 0.000 | 2.975 | 4.317 |
| log_gdp_mom | 0.1055 | 0.149 | 0.708 | 0.480 | -0.188 | 0.399 |
| turnover_index_mom | -0.0020 | 0.003 | -0.712 | 0.477 | -0.008 | 0.004 |
| log_gdp_lag1 | 1.0394 | 0.037 | 28.155 | 0.000 | 0.967 | 1.112 |
| unemployment_rate_lag1 | 0.0178 | 0.020 | 0.879 | 0.380 | -0.022 | 0.058 |
| weighted_stringency_index_lag1 | -0.0198 | 0.009 | -2.206 | 0.028 | -0.037 | -0.002 |
π Model: yhat_arimax
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | 3.7826 | 0.348 | 10.873 | 0.000 | 3.098 | 4.467 |
| log_gdp_mom | 0.1830 | 0.152 | 1.203 | 0.230 | -0.116 | 0.482 |
| turnover_index_mom | 0.0063 | 0.003 | 2.201 | 0.029 | 0.001 | 0.012 |
| log_gdp_lag1 | 1.0293 | 0.038 | 27.341 | 0.000 | 0.955 | 1.103 |
| unemployment_rate_lag1 | 0.0189 | 0.021 | 0.913 | 0.362 | -0.022 | 0.060 |
| weighted_stringency_index_lag1 | -0.0193 | 0.009 | -2.118 | 0.035 | -0.037 | -0.001 |
π Model: yhat_sarimax
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | 3.7530 | 0.358 | 10.494 | 0.000 | 3.049 | 4.457 |
| log_gdp_mom | 0.1497 | 0.156 | 0.957 | 0.339 | -0.158 | 0.457 |
| turnover_index_mom | 0.0036 | 0.003 | 1.226 | 0.221 | -0.002 | 0.009 |
| log_gdp_lag1 | 1.0362 | 0.039 | 26.771 | 0.000 | 0.960 | 1.112 |
| unemployment_rate_lag1 | 0.0128 | 0.021 | 0.604 | 0.547 | -0.029 | 0.055 |
| weighted_stringency_index_lag1 | -0.0203 | 0.009 | -2.164 | 0.031 | -0.039 | -0.002 |
π Model: yhat_xgb
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | 3.7330 | 0.345 | 10.817 | 0.000 | 3.054 | 4.412 |
| log_gdp_mom | 0.2098 | 0.151 | 1.391 | 0.165 | -0.087 | 0.507 |
| turnover_index_mom | 0.0042 | 0.003 | 1.481 | 0.140 | -0.001 | 0.010 |
| log_gdp_lag1 | 1.0321 | 0.037 | 27.635 | 0.000 | 0.959 | 1.106 |
| unemployment_rate_lag1 | 0.0183 | 0.021 | 0.892 | 0.373 | -0.022 | 0.059 |
| weighted_stringency_index_lag1 | -0.0181 | 0.009 | -1.997 | 0.047 | -0.036 | -0.000 |
π Model: yhat_lgbm
| coef | std err | t | P>|t| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| const | 3.6632 | 0.339 | 10.792 | 0.000 | 2.995 | 4.331 |
| log_gdp_mom | 0.2459 | 0.148 | 1.657 | 0.098 | -0.046 | 0.538 |
| turnover_index_mom | 0.0047 | 0.003 | 1.673 | 0.095 | -0.001 | 0.010 |
| log_gdp_lag1 | 1.0340 | 0.037 | 28.148 | 0.000 | 0.962 | 1.106 |
| unemployment_rate_lag1 | 0.0223 | 0.020 | 1.108 | 0.269 | -0.017 | 0.062 |
| weighted_stringency_index_lag1 | -0.0175 | 0.009 | -1.969 | 0.050 | -0.035 | -1.51e-05 |
# %% ===============================================================
# STEP 5 β VISUALIZE ECONOMETRIC DRIVERS ACROSS MODELS
# ===============================================================
# Purpose:
# Compare coefficient magnitudes and directions for key macroeconomic
# drivers obtained from panel regressions (STEP 4).
# Inputs:
# - panel_results (dictionary of fitted regression models)
# Outputs:
# - Combined coefficient plot across models (excluding intercept)
# ===============================================================
# type: ignore[name-defined]
# --- Collect coefficients & confidence intervals ---
coef_list = []
for name, model in panel_results.items():
coefs = model.params.dropna()
conf = model.conf_int()
conf.columns = ["ci_lower", "ci_upper"]
df_coef = pd.concat([coefs, conf], axis=1).reset_index()
df_coef.columns = ["variable", "coef", "ci_lower", "ci_upper"]
df_coef["model"] = name
coef_list.append(df_coef)
# --- Combine all models ---
coef_all = pd.concat(coef_list, ignore_index=True)
# --- Remove region fixed effects ---
coef_filtered = coef_all[~coef_all["variable"].str.startswith("C(region)")].copy()
# --- Clean variable names for readability ---
pretty_names = {
"log_gdp_mom": "Log of GDP (MoM)",
"turnover_index_mom": "Turnover (MoM)",
"log_gdp_lag1": "Log of GDP Lag 1",
"unemployment_rate_lag1": "Unemployment Lag 1",
"weighted_stringency_index_lag1": "Stringency Lag 1"
}
coef_filtered["variable"] = coef_filtered["variable"].replace(pretty_names)
# --- Exclude intercept before plotting ---
coef_no_intercept = coef_filtered[~coef_filtered["variable"].isin(["Intercept", "const"])]
# --- Plot coefficient comparison ---
plt.figure(figsize=(10, 6))
sns.barplot(
data=coef_no_intercept,
x="variable",
y="coef",
hue="model",
palette="viridis"
)
plt.axhline(0, color="black", lw=1)
plt.title("Macroeconomic Drivers of Forecasted Hotel Demand", fontsize=12, pad=15)
plt.xlabel("Explanatory Variable")
plt.ylabel("Estimated Coefficient (Ξ²)")
plt.xticks(rotation=30, ha="right")
plt.legend(title="Model", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
# %% ===============================================================
# STEP 6 β EXPORT REGRESSION RESULTS & VISUALIZATIONS
# ===============================================================
# Purpose:
# Save coefficient tables and plots from STEP 5 for reporting.
# Outputs:
# - driver_regression_summary.csv
# - driver_coefficients.png
# ===============================================================
# type: ignore[name-defined]
REG_SUMMARY_PATH = REPORTS / "driver_regression_summary.csv"
REG_PLOT_PATH = FIGURES / "driver_coefficients.png"
# --- Export coefficient summary ---
coef_filtered.to_csv(REG_SUMMARY_PATH, index=False)
print(f"πΎ Regression summary saved β {REG_SUMMARY_PATH.resolve()}")
print(f"[INFO] {coef_filtered.shape[0]} rows Γ {coef_filtered.shape[1]} columns")
# --- Save coefficient plot (intercept excluded) ---
plt.figure(figsize=(10, 6))
sns.barplot(
data=coef_no_intercept,
x="variable",
y="coef",
hue="model",
palette="viridis"
)
plt.axhline(0, color="black", lw=1)
plt.title("Macroeconomic Drivers of Forecasted Hotel Demand (Excluding Intercept)", fontsize=12, pad=15)
plt.xlabel("Explanatory Variable")
plt.ylabel("Estimated Coefficient (Ξ²)")
plt.xticks(rotation=30, ha="right")
plt.legend(title="Model", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.savefig(REG_PLOT_PATH, dpi=300, bbox_inches="tight")
plt.close()
print(f"πΌοΈ Plot saved β {REG_PLOT_PATH.resolve()} (Intercept excluded)")
πΎ Regression summary saved β /Users/golibsanaev/Library/CloudStorage/Dropbox/GitHub_gsanaev/forecasting-explaining-hotel-demand-in-eu/outputs/reports/driver_regression_summary.csv [INFO] 30 rows Γ 5 columns πΌοΈ Plot saved β /Users/golibsanaev/Library/CloudStorage/Dropbox/GitHub_gsanaev/forecasting-explaining-hotel-demand-in-eu/outputs/figures/driver_coefficients.png (Intercept excluded)
π§Ύ Summary β Model Interpretability and Economic Drivers (Notebook 4)ΒΆ
This notebook combined machine-learning explainability (SHAP) and econometric panel regressions to uncover how macroeconomic fundamentals drive EU hotel-demand forecasts (2015 β 2026).
It confirmed that predictive accuracy from modern ML methods is anchored in economically interpretable mechanisms, not opaque correlations.
π Comparative Summary β Macroeconomic Drivers of Forecasted Hotel Demand (Panel Regression, 2024)ΒΆ
| Variable | Expected Effect | Naive | ARIMAX | SARIMAX | XGBoost | LightGBM | Interpretation |
|---|---|---|---|---|---|---|---|
| Log of GDP (MoM) | β | β οΈ (0.51) | β οΈ (0.26) | β οΈ (0.38) | β οΈ (0.16) | β οΈ (0.10) | Short-term GDP growth moderately lifts demand |
| Turnover (MoM) | β | β οΈ (0.48) | β (0.03) | β οΈ (0.19) | β οΈ (0.15) | β οΈ (0.12) | Sector turnover reinforces cyclical movements |
| Log of GDP Lag 1 | β | β (0.00) | β (0.00) | β (0.00) | β (0.00) | β (0.00) | Lagged GDP drives long-term demand persistence |
| Unemployment Lag 1 | β | β οΈ (0.38) | β οΈ (0.36) | β οΈ (0.55) | β οΈ (0.29) | β οΈ (0.28) | Higher unemployment weakly depresses demand |
| Stringency Lag 1 | β | β (0.03) | β (0.04) | β (0.03) | β (0.03) | β οΈ (0.05) | Policy restrictions remain mildly negative |
Legend: β = significant (p < 0.05)βββ οΈ = borderline (0.05 β€ p < 0.10)
π¬ Key TakeawaysΒΆ
- GDP (current + lagged) dominates as the principal driver across all models.
- Turnover explains short-run cyclical movements.
- Stringency retains a small negative impact, confirming recovery patterns.
- Unemployment adds limited explanatory power once GDP is controlled for.
Econometric and ML evidence align closely, strengthening confidence that data-driven forecasts reflect real economic dynamics rather than statistical artifacts.
π Next β Notebook 5ΒΆ
Use these estimated elasticities to design macroeconomic scenarios (Β± GDP or policy shocks) and simulate 2025β2026 trajectories under alternative policy conditions.