No importance ranking can be simultaneously faithful, stable, and complete under symmetry — at any level of a model, from input features to internal circuits — and we prove it in Lean 4.
If you have ever retrained an XGBoost model and noticed the "most important feature" changed, this paper proves that is not a bug — it is a mathematical inevitability. The same instability appears inside neural networks: 10 independently trained transformers on TinyStories agree at only ρ=0.54 on which circuit component matters most. The instability is not in SHAP, not in activation patching — it is in the mathematics of attribution itself.
Input-level: On Breast Cancer Wisconsin, 50 retrains produce 24 distinct top-3 rankings (4.2% agreement). Under standard regularization, 45% of German Credit applicants receive a different "most important feature" across equivalent models. 68% of 77 public datasets show instability.
Component-level: A G-invariant projection (orbit averaging over the architectural symmetry group) lifts circuit agreement from ρ=0.54 to ρ=0.98, with 14/14 theory-derived predictions confirmed across two architectural scales. A fine-tuned GPT-2 confirms the boundary: without Rashomon diversity, instability vanishes (ρ=0.993).
For binary attribution questions (SHAP sign, feature selection, circuit membership), faithful + stable alone is impossible (the bilemma). The design space has exactly two families; DASH (orbit averaging) is Pareto-optimal. A 7-line nonparametric diagnostic outperforms the Gaussian formula by 2×. Everything is machine-verified in Lean 4: 357 theorems across 58 files with 6 axioms and 0 sorry.
NeurIPS 2026 submission (10 pages + 81-page supplement). Monograph: 82 pages. arXiv preprint forthcoming.
# Full setup (Lean + Python)
make setup
# Build Lean proofs (~5 min cached)
make lean
# Compile all 4 paper versions
make paper
# Verify counts match papers
make verify
# Run key validation experiments (~5 min)
make validateAttribution rankings flip when you retrain a model — at every level.
Input-level: SHAP feature rankings flip for collinear features with similar importance. The ranking is literally a coin flip — 50% of the time, the "most important" feature changes. In a survey of 77 public datasets, 68% exhibit this instability. This applies to XGBoost, LightGBM, CatBoost, Lasso, neural networks, and permutation importance alike.
Component-level: Activation patching importance rankings flip for architecturally symmetric components (heads within a layer). 10 independently trained transformers on TinyStories agree at only ρ=0.54. All 4 heads in layer 0 appear as "most important" across 10 seeds — full S₄ symmetry realized.
The root cause is the Rashomon property: when components are interchangeable (collinear features, or architecturally symmetric heads), multiple configurations exist that rank them in opposite orders. The fix is the same at both levels: orbit averaging — aggregate across the symmetry to recover what is shared. For features, this is DASH (ensemble averaging). For circuits, this is the G-invariant projection.
No importance ranking — of input features or internal components — can be simultaneously faithful (reflect what the model computed), stable (robust to retraining), and complete (decide every pair). Faithful, Stable, Complete: Pick Two.
The Attribution Trilemma
Faithful ────────── IMPOSSIBLE ────────── Stable
│ X │
│ (Theorem 1) │
│ │
Family A Family B
(single model) (DASH ensemble)
+ Faithful, Complete + Faithful, Stable
- Unstable (50% flips) - Within-group ties
- Lean:
attribution_impossibilityinTrilemma.lean - Axiom dependencies: Zero. The proof uses only the Rashomon property as a hypothesis.
- Proof: A 4-line contradiction. If a ranking is faithful, it must agree with each model. But collinear features admit models ranking them in opposite orders (Rashomon). So the ranking cannot agree with both, violating stability. If stable, it cannot be faithful to all models. Completeness forces the choice.
This is the paper's center of gravity. Everything else quantifies, extends, or resolves this result.
The impossibility is not a mild nuisance. The attribution ratio — how much the dominant feature is overweighted relative to the true contribution — diverges or explodes depending on the model class:
| Model Class | Ratio | Behavior | Lean File |
|---|---|---|---|
| GBDT | 1/(1-rho^2) | Diverges to infinity as rho approaches 1 | Ratio.lean, SplitGap.lean |
| Lasso | Infinity | One feature gets everything, the other gets zero | Lasso.lean |
| Neural networks | Conditional | Depends on which feature the network "captures" | NeuralNet.lean |
| Random forests | O(1/sqrt(T)) | Converges (bounded violations) — the contrast case | RandomForest.lean |
Key theorems:
ratio_tendsto_atTopinRatio.lean— the attribution ratio tends to infinity as correlation approaches 1split_gap_exactinSplitGap.lean— the exact split gap formula (pure algebra)lasso_impossibilityinLasso.lean— Lasso attribution ratio is infinitenn_impossibilityinNeuralNet.lean— neural net impossibility conditional on captured feature
The GBDT ratio 1/(1-rho^2) means that at rho=0.9, the dominant feature gets approximately 5.3x its fair share. At rho=0.99, it gets approximately 50x. The flip rate for binary collinear groups is exactly 1/2 — a literal coin flip (binary_group_flip_rate in FlipRate.lean).
The impossibility at both levels admits the same resolution: orbit averaging.
Input-level: DASH (Diversified Aggregation for Stable Hypotheses): average |SHAP| values across M independently trained models. Provably the minimum-variance unbiased estimator via Rao-Blackwell (Lean-verified).
Component-level: G-invariant projection. For a network with symmetry group G = S_k^L, average activation patching importance within orbits (heads in the same layer). Lifts agreement from ρ=0.54 to ρ=0.98 on TinyStories. No retraining needed — just a post-processing step on a single model.
consensus_equityinCorollary.lean— DASH achieves equity (zero unfaithfulness) for collinear feature pairs in balanced ensemblessum_squares_ge_inv_MinEnsembleBound.lean— Cauchy-Schwarz optimality via Titu's lemma (the sum of squares of weights is minimized by equal weights)- Variance decreases as 1/M, with tight ensemble size formula: M_min = ceil(2.71 * sigma^2 / Delta^2)
DASH is Pareto-optimal: no attribution method achieves better stability without sacrificing more faithfulness. The proof is in DesignSpace.lean. The tradeoff: DASH produces ties for within-group features (it cannot distinguish features that are genuinely interchangeable), but ranks between-group features faithfully and stably.
The achievable set of attribution methods under symmetry has exactly two families among deterministic methods — and nothing else:
The Attribution Design Space
┌──────────────────────────────────────────────────────┐
│ │
│ Family A (single-model methods) │
│ + Faithful + Complete │
│ - Unstable: 50% flip rate for within-group pairs │
│ Examples: TreeSHAP, KernelSHAP, permutation imp. │
│ │
├──────────────────────────────────────────────────────┤
│ │
│ Family B (DASH ensemble methods) │
│ + Faithful + Stable (variance -> 0 as M -> inf) │
│ - Incomplete: within-group ties │
│ Example: DASH with M >= M_min models │
│ │
└──────────────────────────────────────────────────────┘
No third family exists. Every attribution method falls into A or B.
design_space_theoreminDesignSpace.lean— the design space characterizationfamily_a_or_family_binDesignSpaceFull.lean— exhaustiveness proof
Both relaxation paths (drop completeness, drop stability) converge to the same solution: DASH. This convergence is itself proved (relaxation_paths_converge in PathConvergence.lean) and contrasts with Arrow's theorem, where relaxation paths diverge.
The impossibility is an instance of a general pattern: any symmetric decision problem admits exactly two strategy families — the faithful-but-unstable individual strategy and the stable-but-tied aggregate strategy. This is the Symmetric Bayes Dichotomy (SBD), a reusable proof technique from invariant decision theory.
symmetric_bayes_dichotomyinSymmetricBayes.lean— the general theorem with orbit bounds viaMulAction
Three verified instances, each with a different symmetry group:
| Instance | Symmetry Group | Impossibility | Lean File |
|---|---|---|---|
| Feature attribution | S_2 transpositions | Cannot rank collinear features | Trilemma.lean |
| Model selection | S_2 permutations | Cannot select among Rashomon-equivalent models | ModelSelection.lean |
| Causal discovery | CPDAG automorphisms | Cannot orient edges in Markov equivalence class | CausalDiscovery.lean |
Each extension is a self-contained theorem in its own Lean file:
| Extension | Key Result | Lean File |
|---|---|---|
| Conditional SHAP | Cannot escape when beta_j = beta_k; escapes when Delta-beta > ~0.2 | ConditionalImpossibility.lean |
| Fairness audits | SHAP-based proxy audits are coin flips; (1/2)^K intersectional compounding | FairnessAudit.lean |
| Intersectional fairness | Multi-attribute fairness impossibility | IntersectionalFairness.lean |
| Fisher information | Independent proof path via FIM; Rashomon ellipsoid | FIMImpossibility.lean |
| Query complexity | Omega(sigma^2/Delta^2) queries needed; Le Cam lower bound | QueryComplexity.lean |
| Query complexity (parametric) | Parametric query complexity bounds | QueryComplexityParametric.lean |
| Rashomon inevitability | Impossibility is inescapable for standard ML under symmetry | RashomonInevitability.lean |
| Local vs global | Local instability >= global instability | LocalGlobal.lean |
| Flip rate | Exact GBDT flip rate; binary group = coin flip | FlipRate.lean |
| Efficiency | SHAP efficiency amplification factor m/(m-1) | Efficiency.lean |
| Alpha-faithfulness | Bound on alpha-faithful attribution | AlphaFaithful.lean |
| Unfaithfulness bound | Unfaithfulness >= 1/2 for complete rankings; ties are optimal | UnfaithfulBound.lean |
| Rashomon universality | Rashomon property from symmetry via feature swap | RashomonUniversality.lean |
| Gaussian flip rate | Standard normal CDF, phi(0)=1/2, flip rate from Gaussian noise | GaussianFlipRate.lean |
| Mutual information | Information-theoretic impossibility bounds | MutualInformation.lean |
| Robustness (Lipschitz) | Lipschitz robustness bounds for attribution | RobustnessLipschitz.lean |
| Local sufficiency | Local sufficiency conditions for attribution | LocalSufficiency.lean |
| Stump proportionality | Decision stump proportionality derivation | StumpProportionality.lean |
| Model selection design space | Design space for model selection (SBD instance) | ModelSelectionDesignSpace.lean |
dash-impossibility-lean/
│
├── PROJECT_REGISTRY.md # Complete catalogue: every proof, experiment, claim
├── DASHImpossibility/ # 58 Lean 4 files, 357 theorems, 6 axioms, 0 sorry
│ │
│ │ ── Level 0: Pure Logic ──
│ ├── Trilemma.lean # attribution_impossibility (zero axiom deps, 4-line proof)
│ │
│ │ ── Level 1: Framework ──
│ ├── Iterative.lean # IterativeOptimizer connects to Rashomon property
│ │
│ │ ── Level 2: Model Instantiation ──
│ ├── General.lean # GBDT instance, gbdt_impossibility, gbdtOptimizer
│ ├── Lasso.lean # lasso_impossibility (ratio = infinity)
│ ├── NeuralNet.lean # nn_impossibility (conditional on captured feature)
│ │
│ │ ── Level 3: Quantitative Bounds ──
│ ├── SplitGap.lean # split_gap_exact, split_gap_ge_half (pure algebra)
│ ├── Ratio.lean # attribution_ratio = 1/(1-rho^2), ratio_tendsto_atTop
│ │
│ │ ── Level 4: Spearman ──
│ ├── SpearmanDef.lean # Spearman from midranks, spearman_instability_bound (derived)
│ │
│ │ ── Level 5: Resolution ──
│ ├── Corollary.lean # DASH consensus equity, variance convergence
│ ├── Impossibility.lean # Combined: equity violation + stability bound
│ │
│ │ ── Level 6: Design Space ──
│ ├── DesignSpace.lean # design_space_theorem, DASH ties, Pareto structure
│ ├── DesignSpaceFull.lean # family_a_or_family_b (exhaustiveness)
│ │
│ │ ── Level 7: Derivation ──
│ ├── SymmetryDerive.lean # attribution_sum_symmetric (DERIVED from axioms)
│ │
│ │ ── Level 8: Generalization ──
│ ├── SymmetricBayes.lean # General SBD: orbit bounds, trichotomy, exhaustiveness
│ │
│ │ ── Level 9: SBD Instances ──
│ ├── ModelSelection.lean # Model selection impossibility
│ ├── ModelSelectionDesignSpace.lean # Model selection design space
│ ├── CausalDiscovery.lean # Causal discovery impossibility (Markov equivalence)
│ ├── SBDInstances.lean # SBD instances + abstract aggregation
│ │
│ │ ── Level 10: Extensions ──
│ ├── ConditionalImpossibility.lean # Conditional SHAP impossibility + escape condition
│ ├── FairnessAudit.lean # Fairness audit impossibility (coin-flip audits)
│ ├── IntersectionalFairness.lean # Multi-attribute fairness impossibility
│ ├── FlipRate.lean # Exact GBDT flip rate, binary group = coin flip
│ │
│ │ ── Level 11: Bounds ──
│ ├── EnsembleBound.lean # DASH variance optimality + ensemble size (Titu's lemma)
│ ├── Efficiency.lean # SHAP efficiency amplification m/(m-1)
│ ├── AlphaFaithful.lean # alpha-faithfulness and approximate tradeoff
│ ├── UnfaithfulBound.lean # Unfaithfulness >= 1/2, ties optimal
│ ├── UnfaithfulQuantitative.lean # Pr(unfaithfulness) = 1/2 under DGP symmetry
│ ├── PathConvergence.lean # Relaxation path convergence
│ ├── QueryComplexity.lean # Query complexity Omega(sigma^2/Delta^2), Le Cam
│ ├── QueryComplexityParametric.lean # Parametric query complexity bounds
│ ├── QueryComplexityDerived.lean # Chebyshev-derived query complexity lower bound
│ ├── MutualInformation.lean # Information-theoretic impossibility bounds
│ ├── RobustnessLipschitz.lean # Lipschitz robustness bounds
│ ├── LocalSufficiency.lean # Local sufficiency conditions
│ │
│ │ ── Level 12: Universality ──
│ ├── RashomonUniversality.lean # Rashomon from symmetry via feature swap
│ ├── RashomonInevitability.lean # Impossibility is inescapable for standard ML
│ ├── LocalGlobal.lean # Local instability >= global instability
│ │
│ │ ── Level 13: Universal Framework ──
│ ├── ExplanationSystem.lean # Abstract explanation system (Theta -> Y -> H)
│ ├── Bilemma.lean # Strengthened impossibility for binary explanations
│ ├── BinaryQuantizer.lean # Binary quantizer capture fraction (decision stump)
│ ├── MechInterp.lean # Mechanistic interpretability impossibility
│ ├── BeyondBinary.lean # Extensions beyond binary explanation problems
│ │
│ │ ── Level 14: Optimality ──
│ ├── ParetoOptimality.lean # DASH Pareto-optimal over ALL methods
│ ├── BayesOptimalTie.lean # Bayes-optimality of ties for symmetric features
│ ├── LossPreservation.lean # Loss preservation for Rashomon-from-symmetry
│ ├── VarianceDerivation.lean # Derive Var(consensus) = Var(phi)/M from independence
│ │
│ │ ── Strengthening ──
│ ├── ProportionalityLocal.lean # Impossibility from per-model c only
│ ├── Qualitative.lean # Impossibility from 2 axioms: dominance + surjectivity
│ ├── ApproximateEquity.lean # Rashomon from bounded proportionality
│ ├── StumpProportionality.lean # Decision stump proportionality derivation
│ ├── Setup.lean # GBDTSetup structure bundling all axioms
│ │
│ │ ── Infrastructure ──
│ ├── Defs.lean # FeatureSpace, 6 axioms, stability/equity defs, Mathlib
│ ├── MeasureHypotheses.lean # Measure-theoretic definitions for probabilistic claims
│ ├── Consistency.lean # Axiom system consistency (Fin 4 construction)
│ ├── GaussianFlipRate.lean # Standard normal CDF, flip rate formula
│ ├── FIMImpossibility.lean # Gaussian FIM impossibility, Rashomon ellipsoid
│ ├── RandomForest.lean # Contrast case (documentation, no formal proofs)
│ └── Basic.lean # Import hub (all 58 files)
│
├── paper/
│ ├── main_definitive.tex # 82-page monograph (source of truth)
│ ├── main_jmlr.tex # 59-page JMLR submission (after NeurIPS)
│ ├── main.tex # 10-page NeurIPS version
│ ├── supplement.tex # 81-page NeurIPS supplement
│ ├── references.bib # 49 references
│ ├── FINDINGS_MAP.md # Complete 109-finding reference with tier classification
│ ├── scripts/ # 51 experiment scripts + utilities
│ │ ├── requirements.txt # Pinned Python dependencies
│ │ ├── axiom_consistency_model.py # Constructs Fin 4 model, 16/6 axioms PASS
│ │ ├── f1_f5_validation.py # multi-model Z-test + single-model screen diagnostics
│ │ ├── monte_carlo_flip_rate.py # 1K-trial Monte Carlo flip rate validation
│ │ ├── prevalence_survey.py # 77-dataset prevalence (68% unstable)
│ │ ├── prevalence_survey_openml.py # OpenML prevalence survey
│ │ ├── snr_calibration.py # SNR predicts flip rate (R^2=0.94 for SNR>=0.5)
│ │ ├── cross_implementation_validation.py # XGBoost/LightGBM/CatBoost comparison
│ │ ├── dash_vs_alternatives.py # DASH vs SAGE, PDP, permutation importance
│ │ ├── nn_shap_validation.py # Neural network SHAP instability (87%)
│ │ ├── llm_attention_instability.py # LLM attention instability under fine-tuning
│ │ ├── conditional_shap_threshold.py # Conditional escape threshold
│ │ ├── generate_figures.py # All figure generation
│ │ └── ... # 37 more scripts (see paper/scripts/)
│ ├── figures/ # 12 PDF figures
│ └── results_*.json # 33 JSON result files
│
├── docs/
│ ├── co-author-guide.md # Plain English onboarding for collaborators
│ ├── verification-audit.md # 32-item ranked human verification checklist
│ ├── self-verification-report.md # Machine-verified #print axioms results
│ └── ... # Assessment, audit, and planning documents
│
├── lakefile.toml # Lean build configuration (Mathlib dependency)
├── lean-toolchain # leanprover/lean4:v4.29.0-rc8
├── CLAUDE.md # AI development instructions and conventions
└── README.md # This file
| Paper | File | Pages | Target | Status |
|---|---|---|---|---|
| NeurIPS 2026 (primary submission) | paper/main.tex |
10 | NeurIPS 2026 | Submission-ready |
| NeurIPS supplement | paper/supplement.tex |
81 | NeurIPS supplement | Submission-ready |
| Monograph (source of truth) | paper/main_definitive.tex |
82 | arXiv | arXiv-ready |
| JMLR (expanded) | paper/main_jmlr.tex |
59 | JMLR (after NeurIPS) | Ready |
| arXiv preprint | paper/main_preprint.tex |
10 | arXiv (NeurIPS format) | Ready |
The NeurIPS paper (10 pages + 81-page supplement) covers both input-level and component-level attribution. The monograph (82 pages, including new TinyStories and mean ablation sections) is the definitive reference. The arXiv tarball is at paper/arxiv_monograph.tar.gz.
Edit flow: monograph → NeurIPS → JMLR. Always update the monograph first.
357 theorems. 6 axioms. 0 sorry. 58 files. 15 abstraction levels. 139 multi-step proofs (>=5 tactic lines).
The Lean formalization caught 2 logical inconsistencies and 1 type mismatch that survived informal review. The axiom consistency proof (a Fin 4 construction in Consistency.lean) demonstrates the axiom system is non-vacuous — there exists a concrete model satisfying all 6 axioms.
| Level | What It Proves | Files |
|---|---|---|
| 0 (pure logic) | attribution_impossibility — zero axiom deps |
Trilemma.lean |
| 1 (framework) | IterativeOptimizer connects to Rashomon | Iterative.lean |
| 2 (instantiation) | Model-specific impossibilities | General.lean, Lasso.lean, NeuralNet.lean |
| 3 (quantitative) | 1/(1-rho^2) divergence (pure algebra) | SplitGap.lean, Ratio.lean |
| 4 (Spearman) | Spearman from midranks, stability bounds (derived) | SpearmanDef.lean |
| 5 (resolution) | DASH equity, combined results | Corollary.lean, Impossibility.lean |
| 6 (design space) | Complete design space characterization | DesignSpace.lean, DesignSpaceFull.lean |
| 7 (derivation) | attribution_sum_symmetric derived from axioms |
SymmetryDerive.lean |
| 8 (generalization) | General SBD theorem | SymmetricBayes.lean |
| 9 (instances) | SBD applied to three domains | ModelSelection.lean, ModelSelectionDesignSpace.lean, CausalDiscovery.lean, SBDInstances.lean |
| 10 (extensions) | Conditional SHAP, fairness, intersectional fairness, flip rates | ConditionalImpossibility.lean, FairnessAudit.lean, IntersectionalFairness.lean, FlipRate.lean |
| 11 (bounds) | DASH optimality, efficiency, alpha-faithfulness, query complexity, mutual info, robustness, local sufficiency | EnsembleBound.lean, Efficiency.lean, AlphaFaithful.lean, UnfaithfulBound.lean, PathConvergence.lean, QueryComplexity.lean, QueryComplexityParametric.lean, MutualInformation.lean, RobustnessLipschitz.lean, LocalSufficiency.lean |
| 12 (universality) | Rashomon is inescapable, local >= global | RashomonUniversality.lean, RashomonInevitability.lean, LocalGlobal.lean |
| 13 (universal framework) | Abstract explanation system, bilemma for binary explanations, mechanistic interp impossibility | ExplanationSystem.lean, Bilemma.lean, BinaryQuantizer.lean, MechInterp.lean, BeyondBinary.lean |
| 14 (optimality) | Pareto-optimality of DASH, Bayes-optimal ties, loss preservation, variance derivation | ParetoOptimality.lean, BayesOptimalTie.lean, LossPreservation.lean, VarianceDerivation.lean |
| Strengthening | Per-model impossibility, qualitative impossibility, approximate equity, stump proportionality, bundled setup | ProportionalityLocal.lean, Qualitative.lean, ApproximateEquity.lean, StumpProportionality.lean, Setup.lean |
| Contrast | Bounded violations (documentation only) | RandomForest.lean |
The paper distinguishes four levels of verification:
- Proved: Lean theorem with zero domain-axiom dependencies (e.g.,
attribution_impossibility) - Derived: Lean theorem conditional on the axiom system (e.g.,
gbdt_impossibility) - Argued: Supplement proof, not formalized in Lean (e.g., random forest convergence rate)
- Empirical: Experiment with reproducible script (e.g., 48% flip rate on Breast Cancer)
6 axioms. Reduced from 16 by defining splitCount, attribution, testing_constant, and modelMeasurableSpace as definitions with derived properties. The core impossibility (Level 0) uses none of these — only the Rashomon property as a hypothesis.
| # | Lean Name | Category | Plain English |
|---|---|---|---|
| 1 | Model |
Type | "There exist trained models" |
| 2 | firstMover |
Function | "Each model has a dominant feature" |
| 3 | firstMover_surjective |
Domain | "Every feature can be the first-mover in some model" |
| 4 | crossGroupBaselineCore |
Domain | "Cross-group split counts depend only on group indices" |
| 5 | proportionalityConstant |
Domain | "Attribution = c * split count, with c > 0" |
| 6 | modelMeasure |
Measure | "Models have a probability distribution (training distribution)" |
Formerly axiomatized, now defined or derived (10 eliminated):
| Former axiom | Now | How |
|---|---|---|
splitCount |
def |
If-then-else on firstMover group membership |
splitCount_firstMover |
theorem | Derived from splitCount def |
splitCount_nonFirstMover |
theorem | Derived from splitCount def |
splitCount_crossGroup_symmetric |
theorem | Same crossGroupBaselineCore for same-group features |
splitCount_crossGroup_stable |
theorem | crossGroupBaselineCore depends only on group indices |
numTrees + numTrees_pos |
fields of FeatureSpace |
Bundled into structure (T, hT) |
attribution + proportionality_global |
def + theorem |
attribution := c * splitCount |
modelMeasurableSpace |
instance := ⊤ |
Discrete sigma-algebra (works for any type) |
testing_constant + testing_constant_pos |
def := 1/8 + norm_num |
Le Cam's value from Tsybakov 2009 |
Axiom stratification (verified by #print axioms):
| Theorem | Axioms Used |
|---|---|
attribution_impossibility (core) |
0 (only Rashomon as hypothesis) |
impossibility_qualitative |
0 (dominance + surjectivity as hypotheses) |
attribution_impossibility_bundled |
0 (fully parametric via GBDTSetup) |
gbdt_impossibility_local |
3 (firstMover, surjective, crossGroupBaselineCore) |
consensus_equity (DASH) |
5 (+ proportionalityConstant) |
| Full system (variance + query complexity) | 6 |
51 experiment scripts produce 33 JSON result files and 12 figures. 109 distinct findings are catalogued in paper/FINDINGS_MAP.md with tier classification.
| Experiment | Key Finding | Script |
|---|---|---|
| Synthetic Gaussian | 50% flip rate for rho=0.9, matches theory | generate_figures.py |
| Monte Carlo flip rate | 1K trials, all 8 rho values validated | monte_carlo_flip_rate.py |
| Breast Cancer (Wisconsin) | 48% flip rate for correlated features | f1_f5_validation.py |
| 11 real datasets | Z-test |r| > 0.89 across all datasets | real_world_validation.py |
| XGBoost/LightGBM/CatBoost | Instability is implementation-independent | cross_implementation_validation.py |
| Permutation importance | 91% of correlated pairs unstable | permutation_importance_validation.py |
| Neural networks | 87% unstable; KernelSHAP noise 11% vs model instability 87% | nn_shap_validation.py |
| P=500 scalability | Runs in 2.5 minutes, |r|=0.876 | high_dimensional_validation.py |
| DASH vs alternatives | DASH dominates SAGE, PDP, permutation importance | dash_vs_alternatives.py |
| Prevalence survey | 77 datasets, 68% exhibit instability | prevalence_survey.py |
| German Credit, Taiwan CC, Lending Club | Positive control: correlated financial features flip | financial_case_study.py |
| LLM attention | Instability under fine-tuning | llm_attention_instability.py |
| TinyStories Config A | ρ=0.565→0.972, 7/7 PASS, Cohen's d=5.4 | docs/tinystories-results-reference.json |
| TinyStories Config B | ρ=0.540→0.982, 7/7 PASS, Cohen's d=11.9 | docs/tinystories-results-reference.json |
| GPT-2 boundary | ρ=0.993, within-flip=0.043 (no Rashomon) | docs/tinystories-results-reference.json |
| Mean ablation robustness | G-inv ρ≈0.97-0.99, cross-method ρ≈0.5-0.6 | docs/mean-ablation-results-reference.json |
| Conditional SHAP sweep | Delta-beta threshold for escape | conditional_shap_threshold.py |
| SNR calibration | R^2=0.94 for SNR>=0.5 | snr_calibration.py |
| Causal DAG | Causal discovery instability | causal_dag_experiment.py |
| Longitudinal retraining | Time-series attribution stability | longitudinal_retraining.py |
All scripts use fixed random seeds and run on a standard laptop. Quick validation: make validate (~5 min). Full suite: make experiments.
- Toolchain:
leanprover/lean4:v4.29.0-rc8 - Build:
lake buildormake lean(~5 min cached, ~20 min first build) - Expected: 2886 jobs, 0 errors, some unused-variable linter warnings
- Requires TeX Live (jmlr.cls bundled in paper/)
- Build all:
make paper - Individual:
make jmlr,make neurips,make definitive,make supplement
- Python 3.9+
pip install -r paper/scripts/requirements.txt- Optional:
pip install lightgbm torch openml(for cross-impl, LLM, OpenML experiments)
4 GitHub Actions jobs on every push:
- Lean 4 Build —
lake build(2886 jobs) - Verify Theorem/Axiom Counts — grep-based count check
- Paper Compilation Check — pdflatex+bibtex for all 3 submission papers
- Axiom Consistency Model — Python numerical verification of all 6 axioms
Data scientist using SHAP. Your feature rankings for correlated features are unreliable. The instability is not noise or a software bug — it is a provable consequence of how gradient boosting interacts with collinearity. The fix is DASH: average SHAP values from multiple independently trained models. See the dash-shap companion package and the stability API in PR #255 for the single-model screen to Z-test (multi-model validation) to DASH (ensemble consensus) workflow.
Researcher in XAI or ML theory. This is a formally verified impossibility theorem with 357 Lean proofs and 0 sorry. The Symmetric Bayes Dichotomy (Section 6 of the definitive paper) is a general proof technique from invariant decision theory that applies to any symmetric decision problem — we demonstrate it on feature attribution, model selection, and causal discovery under Markov equivalence. The Design Space Theorem characterizes the full achievable set.
Regulator or model risk officer. Single-model SHAP explanations are provably unreliable under collinearity. In a survey of 77 public datasets, 68% exhibit attribution instability. This affects EU AI Act Art. 13(3)(b)(ii) requirements for disclosing "known and foreseeable circumstances" affecting accuracy, and SR 11-7 model risk management compliance. The paper provides disclosure templates and a diagnostic workflow.
Lean or Mathlib community. 357 theorems across 58 files, 15 abstraction levels, using MulAction for orbit bounds, ProbabilityTheory.cdf for the Gaussian flip rate, and Analysis.Calculus for the FIM impossibility. The Gaussian CDF symmetry proofs (phi(0)=1/2, phi(-x)=1-phi(x)) via NoAtoms + prob_compl_eq_one_sub may be of independent interest. The axiom consistency proof constructs a Fin 4 model satisfying all 6 axioms.
- No
sorrywithout a-- TODO:comment - No
autoImplicit true— all variables explicit - Verify counts before committing:
make verify - Keep all 4 papers synchronized (monograph first)
- Do NOT commit paper changes without running the verification block
- Write the theorem in the appropriate .lean file
lake buildto verify- Add to the monograph (main_definitive.tex) — cite by
\texttt{name} - Add to the cross-reference table in the Lean section
- Update paper/FINDINGS_MAP.md
- Propagate to JMLR and NeurIPS papers as appropriate
- Write script in paper/scripts/
- Run and save results to paper/results_*.json
- Add subsection to monograph with key numbers
- Add to experiment summary table
- Update paper/FINDINGS_MAP.md
- Propagate to JMLR paper
Theorems+lemmas: 357
Axioms: 6
Sorry: 0
Files: 58
Verify: make registry-check (30 automated checks) or python3 paper/scripts/verify_registry.py.
@article{caraker2026attribution,
title={The Attribution Impossibility: No Feature Ranking Is Faithful, Stable,
and Complete Under Collinearity},
author={Caraker, Drake and Arnold, Bryan and Rhoads, David},
year={2026},
doi={10.5281/zenodo.19468379},
url={https://doi.org/10.5281/zenodo.19468379},
note={Lean 4 formalization: \url{https://github.com/DrakeCaraker/dash-impossibility-lean}}
}What is SHAP? SHAP (SHapley Additive exPlanations) assigns each feature an importance score for a model's prediction, based on cooperative game theory. It is the most widely used feature attribution method. See Lundberg & Lee, 2017.
What is Lean 4? A programming language and interactive theorem prover. You write mathematical proofs as code, and the computer checks every step. Think of it as a compiler for math. See leanprover.github.io.
What does "sorry" mean in Lean? It is a placeholder that says "trust me, this is true" without providing a proof — the Lean equivalent of a TODO. This project has 0 sorry: every claim is machine-verified.
Is DASH just averaging? Yes — but with provable optimality guarantees. DASH computes the mean |SHAP| across M independently trained models. The paper proves this is the minimum-variance unbiased linear estimator (via Cauchy-Schwarz / Titu's lemma; Lean-verified) and gives the tight ensemble size formula M_min = ceil(2.71 * sigma^2 / Delta^2). "Just averaging" is like saying OLS is "just matrix inversion." The value is in understanding exactly when and why it is optimal.
Does this apply to my dataset?
Check two things: (1) Do you have features with |correlation| > 0.5? (2) Do those features have similar importance? If both yes, your SHAP rankings for those features are unreliable. Adapt paper/scripts/f1_f5_validation.py to your data to check.
What about conditional SHAP?
It does not help when features have equal causal effects (beta_j = beta_k). The impossibility extends to conditional attributions in that case. When causal effects differ (Delta-beta > ~0.2), conditional SHAP can resolve the instability. See ConditionalImpossibility.lean.
How does this relate to Arrow's theorem? Same structure: Arrow proves no voting system can satisfy IIA + Pareto + non-dictatorship simultaneously. We prove no ranking can satisfy faithfulness + stability + completeness simultaneously. Both are resolved by allowing ties (partial orders). Our relaxation paths converge (unlike Arrow's, which diverge) — reflecting the DGP symmetry that Arrow's heterogeneous voters lack.
What is the bilemma?
For binary explanation problems (SHAP sign positive/negative, feature selected/not, circuit A or B), the impossibility is strictly stronger than the trilemma: faithful + stable alone is impossible. There is no neutral element (like a tie) to serve as a resolution. The only fix is to enrich the explanation space with equivalence classes. See Bilemma.lean.
What is the mechanistic interpretability instance?
We prove that no circuit explanation of a neural network can be simultaneously faithful and stable (the bilemma applies). Meloux et al. (ICLR 2025) found 85 valid circuits for a simple XOR task — the Rashomon property holds with extreme force for circuits. The resolution: report circuit equivalence classes, not individual decompositions. See MechInterp.lean.
What is the universal framework?
The ExplanationSystem abstraction (ExplanationSystem.lean) generalizes beyond feature attribution. Any system with configurations (Theta), observables (Y), and explanations (H) faces the same trilemma when the Rashomon property holds. The attribution impossibility is Instance 1; mechanistic interpretability is another instance. The NeurIPS universal paper (neurips_universal.tex) presents the full framework.
What is the relationship to dash-shap?
This repository contains the theory (Lean proofs + paper). dash-shap is the Python implementation: DASH pipeline, experiments, diagnostics. The stability API (PR #255) provides screen(), validate(), consensus(), report().
- CLAUDE.md — AI agent conventions, project rules, axiom inventory, "Do NOT" list
- paper/FINDINGS_MAP.md — Complete 109-finding reference with tier classification
- docs/co-author-guide.md — Co-author onboarding, verification checklist, timeline
- Companion code — dash-shap Python package (stability API in PR #255)
- Bilodeau et al. (2024): Proved completeness + linearity cannot coexist; we address stability instead and provide a constructive resolution.
- Chouldechova (2017): Proved calibration + balance + equal FPR cannot coexist when base rates differ; our trilemma is the explainability analogue.
- Arrow (1951): Proved IIA + Pareto + non-dictatorship cannot coexist; same structural template, different domain.
- Nipkow (2009): Formalized Arrow's theorem in Isabelle/HOL; we extend this tradition to explainable AI in Lean 4.
- Zhang et al. (2026): Formalized statistical learning theory in Lean 4; complementary Lean formalization work.
Drake Caraker — conceptualization, Lean formalization, experiments, software Bryan Arnold — [role] David Rhoads — [role]
Independent researchers. Contact: drakecaraker@gmail.com
Paper: "The Attribution Impossibility: No Importance Ranking Is Faithful, Stable, and Complete Under Symmetry"
Primary target: NeurIPS 2026 (abstract May 4, paper May 6) | Expanded: JMLR (after NeurIPS decision)
arXiv: Monograph at paper/arxiv_monograph.tar.gz (82 pages). Categories: cs.LG, cs.AI, stat.ML, cs.LO.