If a bank runs a credit model on your encrypted data and sends you back a SHAP explanation of the decision, the explanation can often be reverse-engineered back into your original input. The textbook fix — adding differential-privacy noise — does not work at the privacy budgets anyone actually uses. This draft explains why, by how much, and what to publish instead.
A bank holds a model. A client wants a prediction. The client does not want the bank to see their financial details. The fix is fully homomorphic encryption. The new problem is what happens after the prediction — specifically, what happens when the client asks "why?".
A bank, hospital, or insurer holds a machine-learning model — say, a credit-default classifier. A client sends an encrypted feature vector x: age, income, debt-to-income, number of open accounts. Fully homomorphic encryption (FHE) lets the server evaluate the model over the ciphertext and return an encrypted prediction f(x). The server never sees the plaintext. Good.
Modern deployments go further and also return a SHAP explanation — a per-feature score φ₁, φ₂, …, φ_d telling the client which of their features drove the decision. This is often a regulatory requirement: the client has to be able to audit, dispute, or appeal the decision. The server runs the explanation under FHE, the client decrypts.
Here is the problem. If the client publishes the explanation — shows it to a regulator, attaches it to a complaint, uploads it to a portal — an attacker who sees it can often reconstruct most of x, because φ is a function of x. Published SHAP vectors leak. The standard response is to add differential-privacy noise before releasing φ. This paper shows that fix has no good operating point.
Differential privacy works by adding just enough noise to hide whether the input was x or a neighbour x'. The cost depends on how much the output can change when one feature changes. For SHAP, that quantity grows with the square root of K — the number of coalition queries SHAP runs internally.
SHAP does not score x with a single function call. It builds K coalitions, each a random subset S ⊆ {1…d} of features, evaluates the model on each masked input x_S (features in S kept, the rest replaced with a baseline b), and solves a weighted regression on the results to allocate credit to each feature.
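The pipeline above (sample coalitions, mask, query, regress) can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's implementation: it uses uniform regression weights in place of the Shapley kernel and omits the intercept constraint, which does not change the allocation for the additive toy model used here.

```python
import numpy as np

def kernel_shap_sketch(f, x, baseline, K, rng):
    """Minimal kernel-SHAP sketch: sample K coalitions, mask, regress."""
    d = len(x)
    Z = rng.integers(0, 2, size=(K, d)).astype(float)  # coalition indicator rows
    masked = np.where(Z == 1, x, baseline)             # keep S, replace rest with b
    y = np.array([f(row) for row in masked])           # one model query per coalition
    # uniform weights stand in for the Shapley kernel (simplifying assumption)
    R = np.linalg.pinv(Z.T @ Z) @ Z.T                  # regression operator R
    return R @ y                                       # per-feature credits phi

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5, 0.0])
f = lambda v: float(v @ w_true)                        # additive toy model
phi = kernel_shap_sketch(f, x=np.ones(4), baseline=np.zeros(4), K=200, rng=rng)
```

For this additive model with baseline 0, any full-rank weighted regression on the coalition outputs recovers the per-feature contributions exactly, so φ lands on w_true regardless of the weighting.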
Changing one feature of x by 1 changes each masked input x_{S_k} by 1 only in the coalitions that include that feature, roughly K/2 of them. A single-feature change therefore perturbs Θ(K) entries of the coalition-output vector y(x), each by a bounded amount; stacking those perturbations into a vector and taking its L₂ norm gives Θ(√K) growth. The regression operator R = (ZᵀWZ)⁻¹ZᵀW redistributes that mass across the d output coordinates but does not shrink the norm, so Gaussian-mechanism DP noise must absorb the entire Θ(√K/ε) budget; for K ≈ 400 that is about 20× larger than a typical SHAP value. The consequence: under the Gaussian mechanism with global or smooth sensitivity, there is no (ε, δ) with ε ≤ 5 and SNR ≥ 1 for linear, tree-ensemble, or shallow-MLP models at any K ≥ 100. The noise overwhelms the signal at every privacy budget tight enough to resist the reconstruction attack.
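The √K growth is easy to check numerically. The setup below mirrors the argument (one feature shifted by 1, the same coalitions evaluated for both inputs) on a linear toy model with baseline 0; the model, dimensions, and seeds are illustrative choices, not the paper's experiment.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
w = rng.normal(size=d)
x = rng.normal(size=d)
x2 = x.copy()
x2[0] += 1.0                            # neighbour: one feature shifted by 1
b = np.zeros(d)

norms = {}
for K in (100, 400, 1600):
    Z = np.random.default_rng(7).integers(0, 2, size=(K, d))
    y1 = np.where(Z == 1, x, b) @ w     # coalition outputs for x
    y2 = np.where(Z == 1, x2, b) @ w    # same coalitions, neighbouring input
    norms[K] = np.linalg.norm(y2 - y1)  # ~ |w_0| * sqrt(K/2)
```

Quadrupling K should roughly double the L₂ norm of the perturbation, matching the Θ(√K) rate.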
Plugging this sensitivity into the Gaussian mechanism and sweeping the privacy budget gives the following.
| Privacy budget ε | Model class | Gaussian σ (global sens.) | Median |φ| | SNR | Verdict |
|---|---|---|---|---|---|
| 1 | logistic | ~3.2 | ~0.04 | 0.01 | explanation destroyed |
| 5 | logistic | ~0.7 | ~0.04 | 0.06 | explanation destroyed |
| 10 | tree-ensemble | ~1.4 | ~0.08 | 0.06 | explanation destroyed |
| 20 | logistic | ~0.35 | ~0.04 | 0.12 | nominal DP only |
| 20 | shallow-MLP | ~0.5 | ~0.05 | 0.10 | nominal DP only |
SNR = median SHAP magnitude / Gaussian σ. A useful explanation needs SNR ≥ 1. Every row above falls short by one to two orders of magnitude, and these are the favourable rows, using smooth sensitivity and aggressive clipping.
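The SNR calculation behind the table can be reproduced schematically. The per-query clipping bound and median SHAP magnitude below are illustrative stand-ins of the same order as the table, and the classical Gaussian calibration used here is strictly valid only for ε < 1, so treat the larger budgets as optimistic estimates.

```python
import numpy as np

def gaussian_sigma(sensitivity, eps, delta=1e-5):
    # classical (eps, delta) calibration: sigma = Delta * sqrt(2 ln(1.25/delta)) / eps
    # strictly valid only for eps < 1; an optimistic estimate at larger budgets
    return sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps

K = 400
per_query = 0.5                    # assumed clipped per-coalition output bound
sens = per_query * np.sqrt(K)      # Theta(sqrt(K)) L2 sensitivity of y
median_phi = 0.04                  # typical SHAP magnitude, order of the table
snr = {eps: median_phi / gaussian_sigma(sens, eps) for eps in (1, 5, 20)}
```

Even at ε = 20 the noise scale sits far above the median SHAP magnitude, so the SNR stays well below 1 at every budget in the sweep.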
Sanity check: at the budgets where the explanation is still useful, an attacker really can recover the input. At the budgets where the attack fails, the explanation is already noise.
We trained ridge, kernel-ridge, and 2-layer MLP attackers to predict the standardised input x from the released φ. Data: UCI Adult (d ≈ 87 after one-hot), German Credit (d = 24). K = 200 coalitions. Reported: MSE in standardised-feature units, lower is better; random baseline is 1.0.
| Privacy budget ε | Attacker | Adult MSE | German MSE | Interpretation |
|---|---|---|---|---|
| ∞ (no DP) | ridge | 0.12 | 0.15 | strong recovery |
| ∞ (no DP) | MLP | 0.09 | 0.13 | strong recovery |
| 10 | ridge | 0.62 | 0.68 | partial recovery |
| 5 | ridge | 0.91 | 0.94 | marginal · explanation also gone |
| 1 | ridge | ~1.0 | ~1.0 | no leak · no signal either |
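A minimal version of the ridge attacker, on synthetic data rather than Adult or German Credit. For a linear model with baseline 0, φ_j = w_j·x_j, so the attack reduces to a ridge regression from the released (noised) φ back to x. The dimensions, noise scales, and in-sample evaluation are simplifications for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 24                          # German-Credit-like dimensionality
w = rng.normal(size=d)
X = rng.normal(size=(n, d))              # standardised inputs
Phi = X * w                              # linear-model SHAP: phi_j = w_j * x_j

def ridge_attack_mse(noise_scale, lam=1e-2):
    """Fit a ridge map from noised phi back to x; return in-sample MSE.

    MSE is in standardised-feature units; the random baseline is 1.0."""
    P = Phi + rng.normal(scale=noise_scale, size=Phi.shape)
    A = np.linalg.solve(P.T @ P + lam * np.eye(d), P.T @ X)
    return float(np.mean((P @ A - X) ** 2))

mse_weak_noise = ridge_attack_mse(0.05)  # mild noise: strong recovery
mse_heavy_noise = ridge_attack_mse(3.0)  # DP-scale noise: recovery fails
```

The two regimes mirror the table: with mild noise the attacker reconstructs most of x, and with noise at the scale DP demands the reconstruction MSE climbs toward the random baseline.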
If releasing cardinal SHAP values under DP is impossible, what can you release? Three constructions, all with tighter sensitivity than the raw vector.
1. Top-1 identity. Release the top-ranked feature via report-noisy-max, plus its DP-noised magnitude as a single scalar. The sensitivity depends on the gap between the top and second SHAP values, not on the absolute magnitudes, and only one noisy scalar is released rather than a full vector: the √K amplification is gone, making this roughly √K cheaper than cardinal release.
2. Thresholded release. Release only the features whose |φ| exceeds a threshold, under propose-test-release. For models where most features contribute near zero, the effective sensitivity is that of the small surviving subset, not of the full d-dimensional vector.
3. Top-t ranking. Release the top-t ranking via the exponential mechanism over permutations. Useful as a baseline, but the required budgets for t > 1 are still large at realistic SHAP gaps; this is more a research direction than a deployment recipe.
If you have to release explanations from FHE inference under DP, release ranks, not values. Release top-1, not top-d. Use DP alongside access control, not in place of it.
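A sketch of the top-1 release via report-noisy-max. The Laplace calibration Lap(2Δ/ε) is one standard choice for this mechanism; the toy φ vector and the gap-based sensitivity value are assumptions for illustration.

```python
import numpy as np

def report_noisy_max(phi, eps, sensitivity, rng):
    """Top-1 identity via report-noisy-max: add Lap(2*sensitivity/eps)
    to each |phi_j| and release only the argmax index."""
    noise = rng.laplace(scale=2 * sensitivity / eps, size=len(phi))
    return int(np.argmax(np.abs(phi) + noise))

rng = np.random.default_rng(0)
phi = np.array([0.02, -0.60, 0.05, 0.01])   # toy SHAP vector (assumption)
winner = report_noisy_max(phi, eps=5.0, sensitivity=0.05, rng=rng)
```

Because only an index leaves the mechanism, the noise scale tracks the per-coordinate sensitivity and the top-vs-second gap, not the full vector's √K-amplified L₂ sensitivity.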
Upper bounds tell you the Gaussian mechanism can't do better than Θ(√K/ε). Lower bounds would tell you no mechanism can. We have a sketch, not a theorem — with one gap resolved and one gap that now looks like it invalidates the √K part of the bound entirely.
The sketch adapts Bun–Ullman–Vadhan's fingerprinting lower bounds. Build a packing {x_u} indexed by sign vectors u ∈ {±1}^d such that Hamming-adjacent packing points have ‖φ(x_u) − φ(x_u')‖₂ = Ω(L₀ √(K/d)). Then:
A. Choose 2^d points so that adjacent ones differ in exactly one coordinate, with SHAP-vector separation growing as √(K/d) per coordinate.
B. Any mechanism with error α ≪ L₀√(K/d) lets a codebook decoder recover u from φ̃. Accurate release ⇒ recoverable identity.
C. DP's group-privacy blowup at distance d bounds the probability of correct recovery; Steinke–Ullman's tracing argument supplies the contradiction.
Summing the d coordinates of u gives the √d factor. Combined with the √K from the packing separation, the sketch would yield Ω(√(Kd)/ε) — if the packing separation survived.
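Assembled, the conjectured rate factors as follows (a restatement of the sketch, conditional on the √K separation surviving the regression step, which the cancellation analysis later in this draft disputes):

```latex
\underbrace{\Omega\!\big(\sqrt{K}\big)}_{\substack{\text{packing separation}\\ \text{(Step A)}}}
\;\times\;
\underbrace{\Omega\!\big(\sqrt{d}/\varepsilon\big)}_{\substack{\text{group privacy over the}\\ d\ \text{coordinates of } u}}
\;=\;
\Omega\!\big(\sqrt{Kd}\,/\,\varepsilon\big)
```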
Step B needs the coalition matrix Z to be well-conditioned, so that changes in y translate into changes in φ without cancellation. We prove a matrix-Chernoff bound: for K ≥ 32 d ln(d/η), the smallest singular value satisfies σ_d(Z)² ≥ K/8 with probability ≥ 1 − η. The key ingredient is that a uniform coalition row z has 𝔼[zzᵀ] = ¼I + ¼11ᵀ, whose smallest eigenvalue restricted to 1^⊥ is 1/4.
At our experimental scale (d = 87, K = 200) the Chernoff threshold is K* ≈ 18,900, so the quantitative bound does not bind. We verify empirically instead: σ_d(Z) measured over 100 draws has median 2.53, min 2.22, max 2.76, comfortably bounded away from zero. Full rank fails only with probability ≤ 8.4·10⁻³³, by a coupon-collector argument.
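The empirical conditioning check is easy to replicate, here at a smaller scale (d = 24, 50 draws, for speed) than the paper's d = 87 / 100-draw measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, draws = 24, 200, 50              # smaller than the paper's scale, for speed
sigma_min = []
for _ in range(draws):
    Z = rng.integers(0, 2, size=(K, d)).astype(float)   # uniform coalition matrix
    sigma_min.append(np.linalg.svd(Z, compute_uv=False)[-1])
sigma_min = np.array(sigma_min)
# Z is full column rank iff the smallest singular value is bounded away from zero
```

Across draws the smallest singular value stays well clear of zero, which is all Step B needs at this scale.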
√K cancellation: an existence proof for a better mechanism

This is the positive result hiding inside a "failed" lower bound. The argument assumed ‖RΔy‖₂ inherits the √K growth from ‖Δy‖₂. Closer analysis shows the opposite: for additive and threshold-stump models, ‖RΔy‖₂ = Θ(1), independent of K.
Why this matters: it is a constructive proof that a smarter mechanism than Gaussian exists in principle. The Gaussian mechanism pays the √K upfront because it noises y before the regression. But the SHAP output itself does not have √K sensitivity; the regression step erases it. A mechanism that noises after the regression, or exploits the algebraic structure of R, could avoid the amplification entirely. We have not built that mechanism yet, but we now know the search space is non-empty.
The reason is algebraic: R = (ZᵀWZ)⁻¹ZᵀW applied to a column of Z returns a standard basis vector, since RZ = I. For an additive model, a single-feature perturbation Δy lies along a column of Z, so ‖RΔy‖₂ is constant and the √K growth cancels exactly.
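The cancellation is easy to verify numerically: for an additive model, bumping feature j by c perturbs the coalition outputs along column j of Z, and R maps that column back to c·e_j exactly because RZ = I (this holds for any positive-definite W; uniform weights stand in for the Shapley kernel here).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
for K in (100, 400, 1600):
    Z = rng.integers(0, 2, size=(K, d)).astype(float)
    W = np.eye(K)                        # uniform weights stand in for the Shapley kernel
    R = np.linalg.solve(Z.T @ W @ Z, Z.T @ W)
    # additive model, feature j bumped by c: Delta_y lies along column j of Z
    c, j = 1.0, 0
    dy = c * Z[:, j]
    print(K, np.linalg.norm(dy), np.linalg.norm(R @ dy))
```

‖Δy‖₂ grows like √(K/2) while ‖RΔy‖₂ stays pinned at c for every K, exactly the Θ(1) behaviour the lemma claims.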
The √K factor in the conjectured lower bound was expected to survive the regression step. It does not, at least for additive and stump models. The true lower bound for cardinal kernel SHAP under DP may sit substantially below the Gaussian-mechanism upper bound, leaving room for a better mechanism. Where that leaves the theory:

- Upper bound: the Gaussian mechanism must use σ = Θ(√K/ε). This is tight for that mechanism class, hence the impossibility result under Gaussian noise.
- Lower bound against all DP mechanisms: only Ω(√d/ε) is established. The conjectured Ω(√(Kd)/ε) is provably unreachable via the sensitivity route.
- Constructive consequence: the column-cancellation lemma is itself evidence that a DP mechanism with accuracy o(√K/ε) exists for cardinal kernel SHAP. The √K amplification is an artefact of the Gaussian mechanism's "noise-y-first" structure, not of the SHAP-release problem itself. Finding an explicit construction, e.g. a mechanism that noises in the post-regression basis or exploits R's structure, is the main open constructive question this paper leaves on the table.
Two practical notes. K ≤ 20 is sometimes feasible if you are willing to accept coarser explanations; the √K factor is then 4–5, not 20. Large budgets (ε ≳ 20) should be paired with orthogonal defences: rate limits, access audit, query logging. DP alone at these budgets is symbolic.

The short version: when a client runs encrypted SHAP and wants to publish the result, the right design choice is almost never "slap Gaussian noise on it." It is one of three alternatives, and the formal "impossibility" is narrower than earlier versions of this draft claimed.
Under global or smooth sensitivity, adding DP noise to a full-vector cardinal SHAP release is not a standalone mitigation at conventional ε. The √K sensitivity of the kernel-SHAP regression forces noise orders of magnitude larger than the SHAP values themselves at any privacy budget worth claiming. Clipping and smooth sensitivity shrink the constants by up to 5× but preserve the rate.
FHE already protects the input from the server. If the client consumes the explanation internally — e.g. displays it in a local dashboard, never exports it — no further privacy machinery is needed. Most deployments should start here.
Report-noisy-max on per-coordinate sensitivity is roughly √K cheaper than cardinal DP-SHAP but still needs ε in the tens at realistic gaps (ε ≈ 70 at gap 0.1 on logistic Adult under the Shapley-weighted constant c_W ≈ 148). Ranking is the right axis — not a small-ε rescue.
Aggressive clipping at ε = 10–20 makes reconstruction harder without qualifying as DP in any formal sense. Combine it with rate-limiting, access control, and FHE to raise the overall bar. Don't claim formal DP at these budgets.
Ω(√(Kd)/ε) is provably unreachable via the sensitivity route; that is itself evidence a DP mechanism with accuracy o(√K/ε) exists. Constructing one is the headline open problem.

Don't publish cardinal SHAP under DP. Publish rankings, or nothing at all, and use DP, if at all, as one layer of a defence-in-depth stack, not as the headline guarantee.