The Attribution Impossibility

No importance ranking can be simultaneously faithful, stable, and complete under symmetry — at any level of a model, from input features to internal circuits — and we prove it in Lean 4.

If you have ever retrained an XGBoost model and noticed the "most important feature" changed, this paper proves that is not a bug — it is a mathematical inevitability. The same instability appears inside neural networks: 10 independently trained transformers on TinyStories agree at only ρ=0.54 on which circuit component matters most. The instability is not in SHAP, not in activation patching — it is in the mathematics of attribution itself.

Input-level: On Breast Cancer Wisconsin, 50 retrains produce 24 distinct top-3 rankings (4.2% agreement). Under standard regularization, 45% of German Credit applicants receive a different "most important feature" across equivalent models. 68% of 77 public datasets show instability.

Component-level: A G-invariant projection (orbit averaging over the architectural symmetry group) lifts circuit agreement from ρ=0.54 to ρ=0.98, with 14/14 theory-derived predictions confirmed across two architectural scales. A fine-tuned GPT-2 confirms the boundary: without Rashomon diversity, instability vanishes (ρ=0.993).

For binary attribution questions (SHAP sign, feature selection, circuit membership), faithful + stable alone is impossible (the bilemma). The design space has exactly two families; DASH (orbit averaging) is Pareto-optimal. A 7-line nonparametric diagnostic outperforms the Gaussian formula by 2×. Everything is machine-verified in Lean 4: 357 theorems across 58 files with 6 axioms and 0 sorry.

NeurIPS 2026 submission (10 pages + 81-page supplement). Monograph: 82 pages. arXiv preprint forthcoming.

Quick Start

# Full setup (Lean + Python)
make setup

# Build Lean proofs (~5 min cached)
make lean

# Compile all 4 paper versions
make paper

# Verify counts match papers
make verify

# Run key validation experiments (~5 min)
make validate

What This Proves

1. The Problem

Attribution rankings flip when you retrain a model — at every level.

Input-level: SHAP feature rankings flip for collinear features with similar importance. The ranking is literally a coin flip — 50% of the time, the "most important" feature changes. In a survey of 77 public datasets, 68% exhibit this instability. This applies to XGBoost, LightGBM, CatBoost, Lasso, neural networks, and permutation importance alike.

Component-level: Activation patching importance rankings flip for architecturally symmetric components (heads within a layer). 10 independently trained transformers on TinyStories agree at only ρ=0.54. All 4 heads in layer 0 appear as "most important" across 10 seeds — full S₄ symmetry realized.

The root cause is the Rashomon property: when components are interchangeable (collinear features, or architecturally symmetric heads), multiple configurations exist that rank them in opposite orders. The fix is the same at both levels: orbit averaging — aggregate across the symmetry to recover what is shared. For features, this is DASH (ensemble averaging). For circuits, this is the G-invariant projection.

2. The Impossibility (Theorem 1)

No importance ranking — of input features or internal components — can be simultaneously faithful (reflect what the model computed), stable (robust to retraining), and complete (decide every pair). Faithful, Stable, Complete: Pick Two.

                       The Attribution Trilemma

       Faithful ────────── IMPOSSIBLE ────────── Stable
           │                    X                    │
           │              (Theorem 1)                │
           │                                         │
       Family A                                 Family B
       (single model)                           (DASH ensemble)
       + Faithful, Complete                     + Faithful, Stable
       - Unstable (50% flips)                   - Within-group ties

Lean: attribution_impossibility in Trilemma.lean
Axiom dependencies: Zero. The proof uses only the Rashomon property as a hypothesis.
Proof: A 4-line contradiction. If a ranking is faithful, it must agree with each model. But collinear features admit models ranking them in opposite orders (Rashomon). So the ranking cannot agree with both, violating stability. If stable, it cannot be faithful to all models. Completeness forces the choice.

This is the paper's center of gravity. Everything else quantifies, extends, or resolves this result.

3. How Bad Is It? (Quantitative Bounds)

The impossibility is not a mild nuisance. The attribution ratio — how much the dominant feature is overweighted relative to the true contribution — diverges or explodes depending on the model class:

Model Class	Ratio	Behavior	Lean File
GBDT	1/(1-rho^2)	Diverges to infinity as rho approaches 1	`Ratio.lean`, `SplitGap.lean`
Lasso	Infinity	One feature gets everything, the other gets zero	`Lasso.lean`
Neural networks	Conditional	Depends on which feature the network "captures"	`NeuralNet.lean`
Random forests	O(1/sqrt(T))	Converges (bounded violations) — the contrast case	`RandomForest.lean`

Key theorems:

ratio_tendsto_atTop in Ratio.lean — the attribution ratio tends to infinity as correlation approaches 1
split_gap_exact in SplitGap.lean — the exact split gap formula (pure algebra)
lasso_impossibility in Lasso.lean — Lasso attribution ratio is infinite
nn_impossibility in NeuralNet.lean — neural net impossibility conditional on captured feature

The GBDT ratio 1/(1-rho^2) means that at rho=0.9, the dominant feature gets approximately 5.3x its fair share. At rho=0.99, it gets approximately 50x. The flip rate for binary collinear groups is exactly 1/2 — a literal coin flip (binary_group_flip_rate in FlipRate.lean).

4. The Fix: DASH

The impossibility at both levels admits the same resolution: orbit averaging.

Input-level: DASH (Diversified Aggregation for Stable Hypotheses): average |SHAP| values across M independently trained models. Provably the minimum-variance unbiased estimator via Rao-Blackwell (Lean-verified).

Component-level: G-invariant projection. For a network with symmetry group G = S_k^L, average activation patching importance within orbits (heads in the same layer). Lifts agreement from ρ=0.54 to ρ=0.98 on TinyStories. No retraining needed — just a post-processing step on a single model.

consensus_equity in Corollary.lean — DASH achieves equity (zero unfaithfulness) for collinear feature pairs in balanced ensembles
sum_squares_ge_inv_M in EnsembleBound.lean — Cauchy-Schwarz optimality via Titu's lemma (the sum of squares of weights is minimized by equal weights)
Variance decreases as 1/M, with tight ensemble size formula: M_min = ceil(2.71 * sigma^2 / Delta^2)

DASH is Pareto-optimal: no attribution method achieves better stability without sacrificing more faithfulness. The proof is in DesignSpace.lean. The tradeoff: DASH produces ties for within-group features (it cannot distinguish features that are genuinely interchangeable), but ranks between-group features faithfully and stably.

5. The Complete Map (Design Space Theorem)

The achievable set of attribution methods under symmetry has exactly two families among deterministic methods — and nothing else:

                    The Attribution Design Space

    ┌──────────────────────────────────────────────────────┐
    │                                                      │
    │   Family A (single-model methods)                    │
    │   + Faithful + Complete                              │
    │   - Unstable: 50% flip rate for within-group pairs   │
    │   Examples: TreeSHAP, KernelSHAP, permutation imp.  │
    │                                                      │
    ├──────────────────────────────────────────────────────┤
    │                                                      │
    │   Family B (DASH ensemble methods)                   │
    │   + Faithful + Stable (variance -> 0 as M -> inf)    │
    │   - Incomplete: within-group ties                    │
    │   Example: DASH with M >= M_min models               │
    │                                                      │
    └──────────────────────────────────────────────────────┘

    No third family exists. Every attribution method falls into A or B.

design_space_theorem in DesignSpace.lean — the design space characterization
family_a_or_family_b in DesignSpaceFull.lean — exhaustiveness proof

Both relaxation paths (drop completeness, drop stability) converge to the same solution: DASH. This convergence is itself proved (relaxation_paths_converge in PathConvergence.lean) and contrasts with Arrow's theorem, where relaxation paths diverge.

6. It Is Not Just SHAP (Symmetric Bayes Dichotomy)

The impossibility is an instance of a general pattern: any symmetric decision problem admits exactly two strategy families — the faithful-but-unstable individual strategy and the stable-but-tied aggregate strategy. This is the Symmetric Bayes Dichotomy (SBD), a reusable proof technique from invariant decision theory.

symmetric_bayes_dichotomy in SymmetricBayes.lean — the general theorem with orbit bounds via MulAction

Three verified instances, each with a different symmetry group:

Instance	Symmetry Group	Impossibility	Lean File
Feature attribution	S_2 transpositions	Cannot rank collinear features	`Trilemma.lean`
Model selection	S_2 permutations	Cannot select among Rashomon-equivalent models	`ModelSelection.lean`
Causal discovery	CPDAG automorphisms	Cannot orient edges in Markov equivalence class	`CausalDiscovery.lean`

7. Extensions

Each extension is a self-contained theorem in its own Lean file:

Extension	Key Result	Lean File
Conditional SHAP	Cannot escape when beta_j = beta_k; escapes when Delta-beta > ~0.2	`ConditionalImpossibility.lean`
Fairness audits	SHAP-based proxy audits are coin flips; (1/2)^K intersectional compounding	`FairnessAudit.lean`
Intersectional fairness	Multi-attribute fairness impossibility	`IntersectionalFairness.lean`
Fisher information	Independent proof path via FIM; Rashomon ellipsoid	`FIMImpossibility.lean`
Query complexity	Omega(sigma^2/Delta^2) queries needed; Le Cam lower bound	`QueryComplexity.lean`
Query complexity (parametric)	Parametric query complexity bounds	`QueryComplexityParametric.lean`
Rashomon inevitability	Impossibility is inescapable for standard ML under symmetry	`RashomonInevitability.lean`
Local vs global	Local instability >= global instability	`LocalGlobal.lean`
Flip rate	Exact GBDT flip rate; binary group = coin flip	`FlipRate.lean`
Efficiency	SHAP efficiency amplification factor m/(m-1)	`Efficiency.lean`
Alpha-faithfulness	Bound on alpha-faithful attribution	`AlphaFaithful.lean`
Unfaithfulness bound	Unfaithfulness >= 1/2 for complete rankings; ties are optimal	`UnfaithfulBound.lean`
Rashomon universality	Rashomon property from symmetry via feature swap	`RashomonUniversality.lean`
Gaussian flip rate	Standard normal CDF, phi(0)=1/2, flip rate from Gaussian noise	`GaussianFlipRate.lean`
Mutual information	Information-theoretic impossibility bounds	`MutualInformation.lean`
Robustness (Lipschitz)	Lipschitz robustness bounds for attribution	`RobustnessLipschitz.lean`
Local sufficiency	Local sufficiency conditions for attribution	`LocalSufficiency.lean`
Stump proportionality	Decision stump proportionality derivation	`StumpProportionality.lean`
Model selection design space	Design space for model selection (SBD instance)	`ModelSelectionDesignSpace.lean`

Repository Structure

dash-impossibility-lean/
│
├── PROJECT_REGISTRY.md                  # Complete catalogue: every proof, experiment, claim
├── DASHImpossibility/                    # 58 Lean 4 files, 357 theorems, 6 axioms, 0 sorry
│   │
│   │  ── Level 0: Pure Logic ──
│   ├── Trilemma.lean                     # attribution_impossibility (zero axiom deps, 4-line proof)
│   │
│   │  ── Level 1: Framework ──
│   ├── Iterative.lean                    # IterativeOptimizer connects to Rashomon property
│   │
│   │  ── Level 2: Model Instantiation ──
│   ├── General.lean                      # GBDT instance, gbdt_impossibility, gbdtOptimizer
│   ├── Lasso.lean                        # lasso_impossibility (ratio = infinity)
│   ├── NeuralNet.lean                    # nn_impossibility (conditional on captured feature)
│   │
│   │  ── Level 3: Quantitative Bounds ──
│   ├── SplitGap.lean                     # split_gap_exact, split_gap_ge_half (pure algebra)
│   ├── Ratio.lean                        # attribution_ratio = 1/(1-rho^2), ratio_tendsto_atTop
│   │
│   │  ── Level 4: Spearman ──
│   ├── SpearmanDef.lean                  # Spearman from midranks, spearman_instability_bound (derived)
│   │
│   │  ── Level 5: Resolution ──
│   ├── Corollary.lean                    # DASH consensus equity, variance convergence
│   ├── Impossibility.lean                # Combined: equity violation + stability bound
│   │
│   │  ── Level 6: Design Space ──
│   ├── DesignSpace.lean                  # design_space_theorem, DASH ties, Pareto structure
│   ├── DesignSpaceFull.lean              # family_a_or_family_b (exhaustiveness)
│   │
│   │  ── Level 7: Derivation ──
│   ├── SymmetryDerive.lean               # attribution_sum_symmetric (DERIVED from axioms)
│   │
│   │  ── Level 8: Generalization ──
│   ├── SymmetricBayes.lean               # General SBD: orbit bounds, trichotomy, exhaustiveness
│   │
│   │  ── Level 9: SBD Instances ──
│   ├── ModelSelection.lean               # Model selection impossibility
│   ├── ModelSelectionDesignSpace.lean     # Model selection design space
│   ├── CausalDiscovery.lean              # Causal discovery impossibility (Markov equivalence)
│   ├── SBDInstances.lean                 # SBD instances + abstract aggregation
│   │
│   │  ── Level 10: Extensions ──
│   ├── ConditionalImpossibility.lean     # Conditional SHAP impossibility + escape condition
│   ├── FairnessAudit.lean                # Fairness audit impossibility (coin-flip audits)
│   ├── IntersectionalFairness.lean       # Multi-attribute fairness impossibility
│   ├── FlipRate.lean                     # Exact GBDT flip rate, binary group = coin flip
│   │
│   │  ── Level 11: Bounds ──
│   ├── EnsembleBound.lean                # DASH variance optimality + ensemble size (Titu's lemma)
│   ├── Efficiency.lean                   # SHAP efficiency amplification m/(m-1)
│   ├── AlphaFaithful.lean               # alpha-faithfulness and approximate tradeoff
│   ├── UnfaithfulBound.lean              # Unfaithfulness >= 1/2, ties optimal
│   ├── UnfaithfulQuantitative.lean       # Pr(unfaithfulness) = 1/2 under DGP symmetry
│   ├── PathConvergence.lean              # Relaxation path convergence
│   ├── QueryComplexity.lean              # Query complexity Omega(sigma^2/Delta^2), Le Cam
│   ├── QueryComplexityParametric.lean    # Parametric query complexity bounds
│   ├── QueryComplexityDerived.lean       # Chebyshev-derived query complexity lower bound
│   ├── MutualInformation.lean            # Information-theoretic impossibility bounds
│   ├── RobustnessLipschitz.lean          # Lipschitz robustness bounds
│   ├── LocalSufficiency.lean             # Local sufficiency conditions
│   │
│   │  ── Level 12: Universality ──
│   ├── RashomonUniversality.lean         # Rashomon from symmetry via feature swap
│   ├── RashomonInevitability.lean        # Impossibility is inescapable for standard ML
│   ├── LocalGlobal.lean                  # Local instability >= global instability
│   │
│   │  ── Level 13: Universal Framework ──
│   ├── ExplanationSystem.lean            # Abstract explanation system (Theta -> Y -> H)
│   ├── Bilemma.lean                      # Strengthened impossibility for binary explanations
│   ├── BinaryQuantizer.lean              # Binary quantizer capture fraction (decision stump)
│   ├── MechInterp.lean                   # Mechanistic interpretability impossibility
│   ├── BeyondBinary.lean                 # Extensions beyond binary explanation problems
│   │
│   │  ── Level 14: Optimality ──
│   ├── ParetoOptimality.lean             # DASH Pareto-optimal over ALL methods
│   ├── BayesOptimalTie.lean              # Bayes-optimality of ties for symmetric features
│   ├── LossPreservation.lean             # Loss preservation for Rashomon-from-symmetry
│   ├── VarianceDerivation.lean           # Derive Var(consensus) = Var(phi)/M from independence
│   │
│   │  ── Strengthening ──
│   ├── ProportionalityLocal.lean         # Impossibility from per-model c only
│   ├── Qualitative.lean                  # Impossibility from 2 axioms: dominance + surjectivity
│   ├── ApproximateEquity.lean            # Rashomon from bounded proportionality
│   ├── StumpProportionality.lean         # Decision stump proportionality derivation
│   ├── Setup.lean                        # GBDTSetup structure bundling all axioms
│   │
│   │  ── Infrastructure ──
│   ├── Defs.lean                         # FeatureSpace, 6 axioms, stability/equity defs, Mathlib
│   ├── MeasureHypotheses.lean            # Measure-theoretic definitions for probabilistic claims
│   ├── Consistency.lean                  # Axiom system consistency (Fin 4 construction)
│   ├── GaussianFlipRate.lean             # Standard normal CDF, flip rate formula
│   ├── FIMImpossibility.lean             # Gaussian FIM impossibility, Rashomon ellipsoid
│   ├── RandomForest.lean                 # Contrast case (documentation, no formal proofs)
│   └── Basic.lean                        # Import hub (all 58 files)
│
├── paper/
│   ├── main_definitive.tex               # 82-page monograph (source of truth)
│   ├── main_jmlr.tex                     # 59-page JMLR submission (after NeurIPS)
│   ├── main.tex                          # 10-page NeurIPS version
│   ├── supplement.tex                    # 81-page NeurIPS supplement
│   ├── references.bib                    # 49 references
│   ├── FINDINGS_MAP.md                   # Complete 109-finding reference with tier classification
│   ├── scripts/                          # 51 experiment scripts + utilities
│   │   ├── requirements.txt              # Pinned Python dependencies
│   │   ├── axiom_consistency_model.py    # Constructs Fin 4 model, 16/6 axioms PASS
│   │   ├── f1_f5_validation.py           # multi-model Z-test + single-model screen diagnostics
│   │   ├── monte_carlo_flip_rate.py      # 1K-trial Monte Carlo flip rate validation
│   │   ├── prevalence_survey.py          # 77-dataset prevalence (68% unstable)
│   │   ├── prevalence_survey_openml.py   # OpenML prevalence survey
│   │   ├── snr_calibration.py            # SNR predicts flip rate (R^2=0.94 for SNR>=0.5)
│   │   ├── cross_implementation_validation.py  # XGBoost/LightGBM/CatBoost comparison
│   │   ├── dash_vs_alternatives.py       # DASH vs SAGE, PDP, permutation importance
│   │   ├── nn_shap_validation.py         # Neural network SHAP instability (87%)
│   │   ├── llm_attention_instability.py  # LLM attention instability under fine-tuning
│   │   ├── conditional_shap_threshold.py # Conditional escape threshold
│   │   ├── generate_figures.py           # All figure generation
│   │   └── ...                           # 37 more scripts (see paper/scripts/)
│   ├── figures/                          # 12 PDF figures
│   └── results_*.json                    # 33 JSON result files
│
├── docs/
│   ├── co-author-guide.md                # Plain English onboarding for collaborators
│   ├── verification-audit.md             # 32-item ranked human verification checklist
│   ├── self-verification-report.md       # Machine-verified #print axioms results
│   └── ...                               # Assessment, audit, and planning documents
│
├── lakefile.toml                         # Lean build configuration (Mathlib dependency)
├── lean-toolchain                        # leanprover/lean4:v4.29.0-rc8
├── CLAUDE.md                             # AI development instructions and conventions
└── README.md                             # This file

Paper Versions

Paper	File	Pages	Target	Status
NeurIPS 2026 (primary submission)	`paper/main.tex`	10	NeurIPS 2026	Submission-ready
NeurIPS supplement	`paper/supplement.tex`	81	NeurIPS supplement	Submission-ready
Monograph (source of truth)	`paper/main_definitive.tex`	82	arXiv	arXiv-ready
JMLR (expanded)	`paper/main_jmlr.tex`	59	JMLR (after NeurIPS)	Ready
arXiv preprint	`paper/main_preprint.tex`	10	arXiv (NeurIPS format)	Ready

The NeurIPS paper (10 pages + 81-page supplement) covers both input-level and component-level attribution. The monograph (82 pages, including new TinyStories and mean ablation sections) is the definitive reference. The arXiv tarball is at paper/arxiv_monograph.tar.gz.

Edit flow: monograph → NeurIPS → JMLR. Always update the monograph first.

Proof Architecture

357 theorems. 6 axioms. 0 sorry. 58 files. 15 abstraction levels. 139 multi-step proofs (>=5 tactic lines).

The Lean formalization caught 2 logical inconsistencies and 1 type mismatch that survived informal review. The axiom consistency proof (a Fin 4 construction in Consistency.lean) demonstrates the axiom system is non-vacuous — there exists a concrete model satisfying all 6 axioms.

Level	What It Proves	Files
0 (pure logic)	`attribution_impossibility` — zero axiom deps	`Trilemma.lean`
1 (framework)	IterativeOptimizer connects to Rashomon	`Iterative.lean`
2 (instantiation)	Model-specific impossibilities	`General.lean`, `Lasso.lean`, `NeuralNet.lean`
3 (quantitative)	1/(1-rho^2) divergence (pure algebra)	`SplitGap.lean`, `Ratio.lean`
4 (Spearman)	Spearman from midranks, stability bounds (derived)	`SpearmanDef.lean`
5 (resolution)	DASH equity, combined results	`Corollary.lean`, `Impossibility.lean`
6 (design space)	Complete design space characterization	`DesignSpace.lean`, `DesignSpaceFull.lean`
7 (derivation)	`attribution_sum_symmetric` derived from axioms	`SymmetryDerive.lean`
8 (generalization)	General SBD theorem	`SymmetricBayes.lean`
9 (instances)	SBD applied to three domains	`ModelSelection.lean`, `ModelSelectionDesignSpace.lean`, `CausalDiscovery.lean`, `SBDInstances.lean`
10 (extensions)	Conditional SHAP, fairness, intersectional fairness, flip rates	`ConditionalImpossibility.lean`, `FairnessAudit.lean`, `IntersectionalFairness.lean`, `FlipRate.lean`
11 (bounds)	DASH optimality, efficiency, alpha-faithfulness, query complexity, mutual info, robustness, local sufficiency	`EnsembleBound.lean`, `Efficiency.lean`, `AlphaFaithful.lean`, `UnfaithfulBound.lean`, `PathConvergence.lean`, `QueryComplexity.lean`, `QueryComplexityParametric.lean`, `MutualInformation.lean`, `RobustnessLipschitz.lean`, `LocalSufficiency.lean`
12 (universality)	Rashomon is inescapable, local >= global	`RashomonUniversality.lean`, `RashomonInevitability.lean`, `LocalGlobal.lean`
13 (universal framework)	Abstract explanation system, bilemma for binary explanations, mechanistic interp impossibility	`ExplanationSystem.lean`, `Bilemma.lean`, `BinaryQuantizer.lean`, `MechInterp.lean`, `BeyondBinary.lean`
14 (optimality)	Pareto-optimality of DASH, Bayes-optimal ties, loss preservation, variance derivation	`ParetoOptimality.lean`, `BayesOptimalTie.lean`, `LossPreservation.lean`, `VarianceDerivation.lean`
Strengthening	Per-model impossibility, qualitative impossibility, approximate equity, stump proportionality, bundled setup	`ProportionalityLocal.lean`, `Qualitative.lean`, `ApproximateEquity.lean`, `StumpProportionality.lean`, `Setup.lean`
Contrast	Bounded violations (documentation only)	`RandomForest.lean`

Proof Status Transparency

The paper distinguishes four levels of verification:

Proved: Lean theorem with zero domain-axiom dependencies (e.g., attribution_impossibility)
Derived: Lean theorem conditional on the axiom system (e.g., gbdt_impossibility)
Argued: Supplement proof, not formalized in Lean (e.g., random forest convergence rate)
Empirical: Experiment with reproducible script (e.g., 48% flip rate on Breast Cancer)

Axiom System

6 axioms. Reduced from 16 by defining splitCount, attribution, testing_constant, and modelMeasurableSpace as definitions with derived properties. The core impossibility (Level 0) uses none of these — only the Rashomon property as a hypothesis.

#	Lean Name	Category	Plain English
1	`Model`	Type	"There exist trained models"
2	`firstMover`	Function	"Each model has a dominant feature"
3	`firstMover_surjective`	Domain	"Every feature can be the first-mover in some model"
4	`crossGroupBaselineCore`	Domain	"Cross-group split counts depend only on group indices"
5	`proportionalityConstant`	Domain	"Attribution = c * split count, with c > 0"
6	`modelMeasure`	Measure	"Models have a probability distribution (training distribution)"

Formerly axiomatized, now defined or derived (10 eliminated):

Former axiom	Now	How
`splitCount`	`def`	If-then-else on firstMover group membership
`splitCount_firstMover`	theorem	Derived from splitCount def
`splitCount_nonFirstMover`	theorem	Derived from splitCount def
`splitCount_crossGroup_symmetric`	theorem	Same crossGroupBaselineCore for same-group features
`splitCount_crossGroup_stable`	theorem	crossGroupBaselineCore depends only on group indices
`numTrees` + `numTrees_pos`	fields of `FeatureSpace`	Bundled into structure (T, hT)
`attribution` + `proportionality_global`	`def` + theorem	`attribution := c * splitCount`
`modelMeasurableSpace`	`instance := ⊤`	Discrete sigma-algebra (works for any type)
`testing_constant` + `testing_constant_pos`	`def := 1/8` + `norm_num`	Le Cam's value from Tsybakov 2009

Axiom stratification (verified by #print axioms):

Theorem	Axioms Used
`attribution_impossibility` (core)	0 (only Rashomon as hypothesis)
`impossibility_qualitative`	0 (dominance + surjectivity as hypotheses)
`attribution_impossibility_bundled`	0 (fully parametric via GBDTSetup)
`gbdt_impossibility_local`	3 (firstMover, surjective, crossGroupBaselineCore)
`consensus_equity` (DASH)	5 (+ proportionalityConstant)
Full system (variance + query complexity)	6

Experiments & Results

51 experiment scripts produce 33 JSON result files and 12 figures. 109 distinct findings are catalogued in paper/FINDINGS_MAP.md with tier classification.

Experiment	Key Finding	Script
Synthetic Gaussian	50% flip rate for rho=0.9, matches theory	`generate_figures.py`
Monte Carlo flip rate	1K trials, all 8 rho values validated	`monte_carlo_flip_rate.py`
Breast Cancer (Wisconsin)	48% flip rate for correlated features	`f1_f5_validation.py`
11 real datasets	Z-test \|r\| > 0.89 across all datasets	`real_world_validation.py`
XGBoost/LightGBM/CatBoost	Instability is implementation-independent	`cross_implementation_validation.py`
Permutation importance	91% of correlated pairs unstable	`permutation_importance_validation.py`
Neural networks	87% unstable; KernelSHAP noise 11% vs model instability 87%	`nn_shap_validation.py`
P=500 scalability	Runs in 2.5 minutes, \|r\|=0.876	`high_dimensional_validation.py`
DASH vs alternatives	DASH dominates SAGE, PDP, permutation importance	`dash_vs_alternatives.py`
Prevalence survey	77 datasets, 68% exhibit instability	`prevalence_survey.py`
German Credit, Taiwan CC, Lending Club	Positive control: correlated financial features flip	`financial_case_study.py`
LLM attention	Instability under fine-tuning	`llm_attention_instability.py`
TinyStories Config A	ρ=0.565→0.972, 7/7 PASS, Cohen's d=5.4	docs/tinystories-results-reference.json
TinyStories Config B	ρ=0.540→0.982, 7/7 PASS, Cohen's d=11.9	docs/tinystories-results-reference.json
GPT-2 boundary	ρ=0.993, within-flip=0.043 (no Rashomon)	docs/tinystories-results-reference.json
Mean ablation robustness	G-inv ρ≈0.97-0.99, cross-method ρ≈0.5-0.6	docs/mean-ablation-results-reference.json
Conditional SHAP sweep	Delta-beta threshold for escape	`conditional_shap_threshold.py`
SNR calibration	R^2=0.94 for SNR>=0.5	`snr_calibration.py`
Causal DAG	Causal discovery instability	`causal_dag_experiment.py`
Longitudinal retraining	Time-series attribution stability	`longitudinal_retraining.py`

All scripts use fixed random seeds and run on a standard laptop. Quick validation: make validate (~5 min). Full suite: make experiments.

Building

Lean 4

Toolchain: leanprover/lean4:v4.29.0-rc8
Build: lake build or make lean (~5 min cached, ~20 min first build)
Expected: 2886 jobs, 0 errors, some unused-variable linter warnings

LaTeX

Requires TeX Live (jmlr.cls bundled in paper/)
Build all: make paper
Individual: make jmlr, make neurips, make definitive, make supplement

Python

Python 3.9+
pip install -r paper/scripts/requirements.txt
Optional: pip install lightgbm torch openml (for cross-impl, LLM, OpenML experiments)

CI Pipeline

4 GitHub Actions jobs on every push:

Lean 4 Build — lake build (2886 jobs)
Verify Theorem/Axiom Counts — grep-based count check
Paper Compilation Check — pdflatex+bibtex for all 3 submission papers
Axiom Consistency Model — Python numerical verification of all 6 axioms

Who Is This For?

Data scientist using SHAP. Your feature rankings for correlated features are unreliable. The instability is not noise or a software bug — it is a provable consequence of how gradient boosting interacts with collinearity. The fix is DASH: average SHAP values from multiple independently trained models. See the dash-shap companion package and the stability API in PR #255 for the single-model screen to Z-test (multi-model validation) to DASH (ensemble consensus) workflow.

Researcher in XAI or ML theory. This is a formally verified impossibility theorem with 357 Lean proofs and 0 sorry. The Symmetric Bayes Dichotomy (Section 6 of the definitive paper) is a general proof technique from invariant decision theory that applies to any symmetric decision problem — we demonstrate it on feature attribution, model selection, and causal discovery under Markov equivalence. The Design Space Theorem characterizes the full achievable set.

Regulator or model risk officer. Single-model SHAP explanations are provably unreliable under collinearity. In a survey of 77 public datasets, 68% exhibit attribution instability. This affects EU AI Act Art. 13(3)(b)(ii) requirements for disclosing "known and foreseeable circumstances" affecting accuracy, and SR 11-7 model risk management compliance. The paper provides disclosure templates and a diagnostic workflow.

Lean or Mathlib community. 357 theorems across 58 files, 15 abstraction levels, using MulAction for orbit bounds, ProbabilityTheory.cdf for the Gaussian flip rate, and Analysis.Calculus for the FIM impossibility. The Gaussian CDF symmetry proofs (phi(0)=1/2, phi(-x)=1-phi(x)) via NoAtoms + prob_compl_eq_one_sub may be of independent interest. The axiom consistency proof constructs a Fin 4 model satisfying all 6 axioms.

Contributing

Conventions (full list in CLAUDE.md)

No sorry without a -- TODO: comment
No autoImplicit true — all variables explicit
Verify counts before committing: make verify
Keep all 4 papers synchronized (monograph first)
Do NOT commit paper changes without running the verification block

Adding a New Theorem

Write the theorem in the appropriate .lean file
lake build to verify
Add to the monograph (main_definitive.tex) — cite by \texttt{name}
Add to the cross-reference table in the Lean section
Update paper/FINDINGS_MAP.md
Propagate to JMLR and NeurIPS papers as appropriate

Adding a New Experiment

Write script in paper/scripts/
Run and save results to paper/results_*.json
Add subsection to monograph with key numbers
Add to experiment summary table
Update paper/FINDINGS_MAP.md
Propagate to JMLR paper

Current State (verified 2026-04-25)

Theorems+lemmas: 357
Axioms:          6
Sorry:           0
Files:           58

Verify: make registry-check (30 automated checks) or python3 paper/scripts/verify_registry.py.

Citation

@article{caraker2026attribution,
  title={The Attribution Impossibility: No Feature Ranking Is Faithful, Stable,
         and Complete Under Collinearity},
  author={Caraker, Drake and Arnold, Bryan and Rhoads, David},
  year={2026},
  doi={10.5281/zenodo.19468379},
  url={https://doi.org/10.5281/zenodo.19468379},
  note={Lean 4 formalization: \url{https://github.com/DrakeCaraker/dash-impossibility-lean}}
}

FAQ

What is SHAP? SHAP (SHapley Additive exPlanations) assigns each feature an importance score for a model's prediction, based on cooperative game theory. It is the most widely used feature attribution method. See Lundberg & Lee, 2017.

What is Lean 4? A programming language and interactive theorem prover. You write mathematical proofs as code, and the computer checks every step. Think of it as a compiler for math. See leanprover.github.io.

What does "sorry" mean in Lean? It is a placeholder that says "trust me, this is true" without providing a proof — the Lean equivalent of a TODO. This project has 0 sorry: every claim is machine-verified.

Is DASH just averaging? Yes — but with provable optimality guarantees. DASH computes the mean |SHAP| across M independently trained models. The paper proves this is the minimum-variance unbiased linear estimator (via Cauchy-Schwarz / Titu's lemma; Lean-verified) and gives the tight ensemble size formula M_min = ceil(2.71 * sigma^2 / Delta^2). "Just averaging" is like saying OLS is "just matrix inversion." The value is in understanding exactly when and why it is optimal.

Does this apply to my dataset? Check two things: (1) Do you have features with |correlation| > 0.5? (2) Do those features have similar importance? If both yes, your SHAP rankings for those features are unreliable. Adapt paper/scripts/f1_f5_validation.py to your data to check.

What about conditional SHAP? It does not help when features have equal causal effects (beta_j = beta_k). The impossibility extends to conditional attributions in that case. When causal effects differ (Delta-beta > ~0.2), conditional SHAP can resolve the instability. See ConditionalImpossibility.lean.

How does this relate to Arrow's theorem? Same structure: Arrow proves no voting system can satisfy IIA + Pareto + non-dictatorship simultaneously. We prove no ranking can satisfy faithfulness + stability + completeness simultaneously. Both are resolved by allowing ties (partial orders). Our relaxation paths converge (unlike Arrow's, which diverge) — reflecting the DGP symmetry that Arrow's heterogeneous voters lack.

What is the bilemma? For binary explanation problems (SHAP sign positive/negative, feature selected/not, circuit A or B), the impossibility is strictly stronger than the trilemma: faithful + stable alone is impossible. There is no neutral element (like a tie) to serve as a resolution. The only fix is to enrich the explanation space with equivalence classes. See Bilemma.lean.

What is the mechanistic interpretability instance? We prove that no circuit explanation of a neural network can be simultaneously faithful and stable (the bilemma applies). Meloux et al. (ICLR 2025) found 85 valid circuits for a simple XOR task — the Rashomon property holds with extreme force for circuits. The resolution: report circuit equivalence classes, not individual decompositions. See MechInterp.lean.

What is the universal framework? The ExplanationSystem abstraction (ExplanationSystem.lean) generalizes beyond feature attribution. Any system with configurations (Theta), observables (Y), and explanations (H) faces the same trilemma when the Rashomon property holds. The attribution impossibility is Instance 1; mechanistic interpretability is another instance. The NeurIPS universal paper (neurips_universal.tex) presents the full framework.

What is the relationship to dash-shap? This repository contains the theory (Lean proofs + paper). dash-shap is the Python implementation: DASH pipeline, experiments, diagnostics. The stability API (PR #255) provides screen(), validate(), consensus(), report().

References

CLAUDE.md — AI agent conventions, project rules, axiom inventory, "Do NOT" list
paper/FINDINGS_MAP.md — Complete 109-finding reference with tier classification
docs/co-author-guide.md — Co-author onboarding, verification checklist, timeline
Companion code — dash-shap Python package (stability API in PR #255)

Related Work

Bilodeau et al. (2024): Proved completeness + linearity cannot coexist; we address stability instead and provide a constructive resolution.
Chouldechova (2017): Proved calibration + balance + equal FPR cannot coexist when base rates differ; our trilemma is the explainability analogue.
Arrow (1951): Proved IIA + Pareto + non-dictatorship cannot coexist; same structural template, different domain.
Nipkow (2009): Formalized Arrow's theorem in Isabelle/HOL; we extend this tradition to explainable AI in Lean 4.
Zhang et al. (2026): Formalized statistical learning theory in Lean 4; complementary Lean formalization work.

Authors

Drake Caraker — conceptualization, Lean formalization, experiments, software Bryan Arnold — [role] David Rhoads — [role]

Independent researchers. Contact: drakecaraker@gmail.com

Paper: "The Attribution Impossibility: No Importance Ranking Is Faithful, Stable, and Complete Under Symmetry" Primary target: NeurIPS 2026 (abstract May 4, paper May 6) | Expanded: JMLR (after NeurIPS decision) arXiv: Monograph at paper/arxiv_monograph.tar.gz (82 pages). Categories: cs.LG, cs.AI, stat.ML, cs.LO.

Name		Name	Last commit message	Last commit date
Latest commit History 360 Commits
.claude		.claude
.githooks		.githooks
.github/workflows		.github/workflows
DASHImpossibility		DASHImpossibility
docs		docs
experiments		experiments
paper		paper
.gitattributes		.gitattributes
.gitignore		.gitignore
.zenodo.json		.zenodo.json
CITATION.cff		CITATION.cff
CLAUDE.md		CLAUDE.md
DASHImpossibility.lean		DASHImpossibility.lean
Makefile		Makefile
OVERVIEW.md		OVERVIEW.md
PROJECT_REGISTRY.md		PROJECT_REGISTRY.md
README.md		README.md
lake-manifest.json		lake-manifest.json
lakefile.toml		lakefile.toml
lean-toolchain		lean-toolchain

Folders and files

Latest commit

History

Repository files navigation

The Attribution Impossibility

Quick Start

What This Proves

1. The Problem

2. The Impossibility (Theorem 1)

3. How Bad Is It? (Quantitative Bounds)

4. The Fix: DASH

5. The Complete Map (Design Space Theorem)

6. It Is Not Just SHAP (Symmetric Bayes Dichotomy)

7. Extensions

Repository Structure

Paper Versions

Proof Architecture

Proof Status Transparency

Axiom System

Experiments & Results

Building

Lean 4

LaTeX

Python

CI Pipeline

Who Is This For?

Contributing

Conventions (full list in CLAUDE.md)

Adding a New Theorem

Adding a New Experiment

Current State (verified 2026-04-25)

Citation

FAQ

References

Related Work

Authors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages