The `shap_analysis` module provides comprehensive SHAP (SHapley Additive exPlanations) value calculation and visualization tools for XGBoost, LightGBM, and CatBoost models. SHAP values explain individual predictions by quantifying each feature's contribution.
## Core Functions

### get_full_model_shaps

**Parameters:**

- Trained prediction model (must be one of the supported tree-based models)
- Training set features (independent variables)
- Test set features
- Sales data features
- Universe (full population) features
- Verbose flag: print detailed progress information during SHAP calculation

**Returns:** Dictionary of `shap.Explanation` objects with keys:

- `"train"`: SHAP values for training data
- `"test"`: SHAP values for test data
- `"sales"`: SHAP values for sales data
- `"univ"`: SHAP values for universe data

**Model-specific behavior:**

- XGBoost: uses approximate mode by default for speed
- LightGBM: uses the exact native `pred_contrib=True` method
- CatBoost: uses the native `get_feature_importance(type="ShapValues")` with approximate mode
### make_shap_table

**Parameters:**

- SHAP Explanation object (output from `get_full_model_shaps` or a tree explainer)
- Primary keys in the same row order as the explained data (e.g., parcel IDs)
- Feature names in canonical training order
- Optional transaction keys (for sales data)
- `include_pred`: include a column reconstructing the model prediction as `base_value + sum(SHAP values)`

**Returns:** DataFrame with columns:

- `key`: primary identifier
- `key_sale`: transaction identifier (if provided)
- `base_value`: model's base prediction value
- One column per feature with SHAP contribution values
- `contribution_sum`: reconstructed prediction (if `include_pred=True`)

Column order: `[key, key_sale?, base_value, feature_1, feature_2, ..., feature_n, contribution_sum?]`
### plot_full_beeswarm

**Parameters:**

- SHAP Explanation object to visualize
- Plot title
- Optional file path to save the figure (e.g., `"plots/shap_beeswarm.png"`); format inferred from the extension (.png, .pdf, .svg)
- Additional keyword arguments for `plt.savefig()` (e.g., `{"dpi": 300, "bbox_inches": "tight"}`)
- Maximum character width for feature name wrapping

**Features:**

- Automatic figure sizing based on feature count
- Wrapped feature names for readability
- Color-coded by feature value (red = high, blue = low)
- Sorted by mean absolute SHAP value
## Private Helper Functions

### _xgboost_shap

- Uses `tree_path_dependent` perturbation for categorical splits
- Enables categorical DMatrix properties automatically

### _lightgbm_shap

- With categorical features: uses `tree_path_dependent` mode without background data
- Without categorical features: uses `interventional` mode with background samples

### _catboost_shap

- Uses `tree_path_dependent` mode (required for categorical splits)
- Tags the explainer with a `_cb_model` attribute for special handling

### _shap_explain

- CatBoost: uses the native `get_feature_importance(type="ShapValues")` with "Approximate" mode
- LightGBM: uses the native `booster.predict(pred_contrib=True)` for speed
- XGBoost: uses the standard TreeExplainer with categorical support
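The per-model routing described above can be sketched with stub classes. This is a simplified illustration of the dispatch pattern only: the stub class names and the returned backend labels are assumptions, not the module's actual internals.

```python
# Stub model classes standing in for the real library types (assumption:
# the actual module inspects real XGBoost/LightGBM/CatBoost objects).
class CatBoostModel: ...
class LightGBMBooster: ...
class XGBoostModel: ...

def shap_explain(model):
    """Route SHAP computation to the fastest backend for each model type."""
    if isinstance(model, CatBoostModel):
        # Native CatBoost path: get_feature_importance(type="ShapValues")
        return "catboost: get_feature_importance, Approximate mode"
    if isinstance(model, LightGBMBooster):
        # Native LightGBM path: booster.predict(pred_contrib=True)
        return "lightgbm: pred_contrib=True"
    if isinstance(model, XGBoostModel):
        # Generic path: shap.TreeExplainer with categorical support
        return "xgboost: TreeExplainer"
    raise TypeError(f"Unsupported model type: {type(model).__name__}")
```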
## Usage Examples
### Example 1: Calculate SHAP Values for All Datasets
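Running `get_full_model_shaps` requires a trained model and real parcel data, so the sketch below mocks the documented return structure instead: a dictionary of Explanation-like objects keyed `"train"`/`"test"`/`"sales"`/`"univ"`. The feature names and row counts are illustrative, and `MockExplanation` is a stand-in for `shap.Explanation`.

```python
import numpy as np

class MockExplanation:
    """Stand-in for shap.Explanation (values, base_values, feature_names)."""
    def __init__(self, values, base_values, feature_names):
        self.values = values              # (n_rows, n_features) SHAP matrix
        self.base_values = base_values    # (n_rows,) base predictions
        self.feature_names = feature_names

rng = np.random.default_rng(0)
features = ["land_area", "bldg_area", "year_built"]  # illustrative names

# In real use: shaps = get_full_model_shaps(model, ...) on a trained
# XGBoost/LightGBM/CatBoost model (exact parameter names not shown here).
shaps = {
    split: MockExplanation(
        values=rng.normal(size=(n, 3)),
        base_values=np.full(n, 12.0),
        feature_names=features,
    )
    for split, n in [("train", 100), ("test", 25), ("sales", 40), ("univ", 200)]
}

# Each row's reconstructed prediction is base_value + sum of SHAP values.
pred_test = shaps["test"].base_values + shaps["test"].values.sum(axis=1)
print(pred_test.shape)  # (25,)
```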
### Example 2: Create SHAP Contribution Table
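A self-contained sketch of the documented table layout, assembled by hand with pandas. The keys, feature names, and SHAP values are illustrative; `make_shap_table` produces this shape from a real Explanation object.

```python
import numpy as np
import pandas as pd

# Toy SHAP matrix: 3 parcels x 2 features (illustrative data).
keys = ["parcel_001", "parcel_002", "parcel_003"]
feature_names = ["land_area", "bldg_area"]
base_value = 12.0
shap_values = np.array([[0.30, -0.10],
                        [0.05,  0.20],
                        [-0.40, 0.15]])

# Documented column order:
# [key, base_value, feature_1, ..., feature_n, contribution_sum]
table = pd.DataFrame(shap_values, columns=feature_names)
table.insert(0, "base_value", base_value)
table.insert(0, "key", keys)
# contribution_sum reconstructs the prediction (include_pred=True behavior).
table["contribution_sum"] = table["base_value"] + shap_values.sum(axis=1)

print(table.columns.tolist())
# ['key', 'base_value', 'land_area', 'bldg_area', 'contribution_sum']
```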
### Example 3: Compare Feature Importance Across Subsets
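Mean absolute SHAP value per feature is the standard global-importance summary; comparing it across subsets (e.g., `"sales"` vs. `"univ"`) shows whether a feature matters more for sold parcels than for the full universe. The arrays below are synthetic stand-ins for `explanation.values` from two subsets.

```python
import numpy as np

rng = np.random.default_rng(1)
feature_names = ["land_area", "bldg_area", "year_built"]  # illustrative

# Synthetic SHAP matrices for two subsets (rows x features).
shap_sales = rng.normal(scale=[0.5, 0.2, 0.1], size=(40, 3))
shap_univ = rng.normal(scale=[0.3, 0.3, 0.1], size=(200, 3))

def importance(values):
    """Mean absolute SHAP value per feature."""
    return dict(zip(feature_names, np.abs(values).mean(axis=0)))

imp_sales = importance(shap_sales)
imp_univ = importance(shap_univ)
for name in feature_names:
    print(f"{name}: sales={imp_sales[name]:.3f} univ={imp_univ[name]:.3f}")
```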
### Example 4: Individual Prediction Explanation
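To explain a single row, rank its features by absolute SHAP value and reconstruct the prediction from the base value plus contributions. The contributions below are illustrative numbers, not real model output.

```python
import numpy as np

feature_names = ["land_area", "bldg_area", "year_built"]  # illustrative
base_value = 12.0
row_shap = np.array([0.35, -0.12, 0.04])  # one row of explanation.values

# Strongest influence first: sort by absolute contribution, descending.
order = np.argsort(-np.abs(row_shap))
for i in order:
    sign = "+" if row_shap[i] >= 0 else "-"
    print(f"{feature_names[i]}: {sign}{abs(row_shap[i]):.2f}")

# Additivity: the prediction is exactly base_value + sum(SHAP values).
prediction = base_value + row_shap.sum()
print(f"prediction = {prediction:.2f}")  # 12.00 + 0.35 - 0.12 + 0.04 = 12.27
```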
## Understanding SHAP Values

### What SHAP Values Represent

- Base Value: average model prediction across training data
- SHAP Value: change in prediction (on the log scale for log models) attributable to that feature
- Prediction: `base_value + sum(all SHAP values)`
### Interpretation

- Positive SHAP value: the feature increases the predicted value
- Negative SHAP value: the feature decreases the predicted value
- Magnitude: larger absolute values = stronger influence
- Additivity: SHAP values sum exactly to the prediction
### For Log-Scale Models

If your model predicts `log(price)`, SHAP values are also on the log scale: contributions that are additive in log space are multiplicative in price space, so a SHAP value of `s` scales the predicted price by `exp(s)`.
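The log-to-price conversion can be checked numerically. The base price and SHAP values below are illustrative.

```python
import numpy as np

# Base prediction of $200,000, expressed in log space (illustrative).
base_log = np.log(200_000.0)
shap_log = np.array([0.10, -0.05])  # log-scale contributions

# Reconstruct the price and the per-feature multiplicative effects:
# exp(s) > 1 raises the price, exp(s) < 1 lowers it.
price = np.exp(base_log + shap_log.sum())
factors = np.exp(shap_log)

print(f"price = {price:,.0f}")
print(f"factors = {np.round(factors, 3)}")
```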
## Model Support Matrix
| Model Type | Categorical Support | Approximate Mode | Native SHAP |
|---|---|---|---|
| XGBoost | ✓ | ✓ | via TreeExplainer |
| LightGBM | ✓ | ✗ | ✓ (pred_contrib) |
| CatBoost | ✓ | ✓ | ✓ (get_feature_importance) |
- All models support categorical features through appropriate handling
- LightGBM’s native method is exact and fast
- CatBoost’s “Approximate” mode provides significant speed gains