"Understanding Black-box Predictions via Influence Functions"
This paper won the ICML 2017 best paper award. Its motivation is laid out in the abstract: explaining the predictions of black-box models.
By "black-box prediction" we mean the following: in deep learning, a deeper network usually achieves better predictive performance and generalization. On the application side, practitioners improve a network through all kinds of means, such as changing the architecture, tuning hyperparameters, redesigning activation functions, and applying training tricks, but a theoretical explanation of why the model works is still lacking.
This paper uses influence functions, a classic technique from robust statistics, to trace the effect of the training data on the model's predictions through the learning algorithm, thereby identifying the training points most responsible for a given (test) prediction.
How can we explain the predictions of a black-box model? In this paper, we use influence functions — a classic technique from robust statistics — to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction.
As the abstract indicates, the paper takes a data-point view and studies how individual training points affect predictions on the test set.
Chapter 1, the Introduction, sets up the background on black-box machine learning systems:
A key question often asked of machine learning systems is “Why did the system make this prediction?”
Chapter 2, Approach, introduces the method.
Let $\theta$ be the parameters to be learned. In linear regression, for example, $\hat{y}_{i}=x_{i1}\theta_{1}+x_{i2}\theta_{2}+\dots+x_{im}\theta_{m}$.
That is, the prediction $\hat{y}_{i}$ for each training point $x_{i}$ is the sum of the $m$ features of $x_{i}$ multiplied by the $m$ parameters $\theta$; the loss against the true value $y_{i}$ is then computed by a loss function $loss\_func(\hat{y}_{i}, y_{i})$. In a neural network the expression for $\hat{y}_{i}$ is usually much more complex, but it is still computed from the parameters $\theta$ and the data point $z$, so the paper writes the loss on a single training point with the generic notation $L(z, \theta)$.
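As a concrete (hypothetical) illustration of this notation, a squared-error per-example loss and the corresponding empirical risk might look like the following numpy sketch; the function names are mine, not the paper's:

```python
import numpy as np

def loss(z, theta):
    """Per-example loss L(z, theta): squared error for linear regression."""
    x, y = z                      # a training point z = (x, y), x has m features
    y_hat = x @ theta             # y_hat = x_1*theta_1 + ... + x_m*theta_m
    return 0.5 * (y_hat - y) ** 2

def empirical_risk(data, theta):
    """Training objective: (1/n) * sum_i L(z_i, theta)."""
    return np.mean([loss(z, theta) for z in data])
```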
Upweighting a training point
Following the training-point definitions above, when a training point $z$ is removed from the training set, the parameters change from $\hat{\theta}$ to $\hat{\theta}_{-z}$, so the parameter change is $\hat{\theta}_{-z}-\hat{\theta}$, where $\hat{\theta}_{-z}$ is defined as
$$\hat{\theta}_{-z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{z_{i}\ne z} L(z_{i},\theta)$$
That is, remove the training point, retrain, and take $\hat{\theta}_{-z}$ as the parameters at which the training loss converges.
Fortunately, influence functions give us an efficient approximation.
The idea is to measure how a small change in the weight of $z$ affects $\theta$: if $z$ is upweighted by a small amount $\epsilon$, the new parameters $\hat{\theta}_{\epsilon, z}$ are defined as
$$\hat{\theta}_{\epsilon, z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_{i},\theta) + \epsilon L(z, \theta)$$
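As a small sketch (reusing the hypothetical `loss` and `empirical_risk` helpers from above), the $\epsilon$-upweighted objective is simply the empirical risk plus an extra $\epsilon$-weighted copy of the loss on $z$:

```python
def upweighted_risk(data, theta, z, eps):
    """(1/n) * sum_i L(z_i, theta) + eps * L(z, theta).
    At eps = -1/n this objective has the same minimizer as training without z."""
    return empirical_risk(data, theta) + eps * loss(z, theta)
```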
A 1982 result from the statistics literature gives a closed form for the effect of this upweighting on the parameters $\hat{\theta}$:
A classic result (Cook & Weisberg, 1982) tells us that the influence of upweighting $z$ on the parameters $\hat{\theta}$ is given by
$$I_{up,params}(z) = \frac{\mathrm{d} \hat{\theta}_{\epsilon, z}}{\mathrm{d} \epsilon}\Big|_{\epsilon = 0} = -H^{-1}_{\hat{\theta}}\nabla_{\theta}L(z,\hat{\theta})$$
where $H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla^{2}_{\theta}L(z_{i}, \hat{\theta})$ is the Hessian of the empirical risk, assumed to be positive definite.
Since removing a point $z$ is the same as upweighting it by $\epsilon = -\frac{1}{n}$, we can linearly approximate the parameter change due to removing $z$ by computing $\hat{\theta}_{-z}-\hat{\theta} \approx -\frac{1}{n}I_{up,params}(z)$, without retraining the model.
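To make this concrete, here is a minimal self-contained numpy sketch on a toy ridge-regression problem (my own setup, not the paper's code) comparing the influence estimate $-\frac{1}{n}I_{up,params}(z)$ against exact leave-one-out retraining:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 200, 5, 0.1
X = rng.normal(size=(n, m))
y = X @ rng.normal(size=m) + 0.1 * rng.normal(size=n)

def fit(X, y):
    """Empirical risk minimizer for the per-example loss
    L(z, theta) = 0.5*(x @ theta - y)**2 + 0.5*lam*||theta||^2 (ridge regression)."""
    return np.linalg.solve(X.T @ X / len(y) + lam * np.eye(X.shape[1]),
                           X.T @ y / len(y))

theta_hat = fit(X, y)

# Hessian of the empirical risk: H = (1/n) * sum_i grad^2_theta L(z_i, theta_hat)
H = X.T @ X / n + lam * np.eye(m)

def I_up_params(x, y_i):
    """I_up,params(z) = -H^{-1} grad_theta L(z, theta_hat) for z = (x, y_i)."""
    grad = x * (x @ theta_hat - y_i) + lam * theta_hat
    return -np.linalg.solve(H, grad)

# Effect of removing training point 0: influence estimate vs. exact retraining
i = 0
approx = -I_up_params(X[i], y[i]) / n                      # -1/n * I_up,params(z)
exact = fit(np.delete(X, i, axis=0), np.delete(y, i)) - theta_hat
print("influence estimate:", approx)
print("exact LOO change:  ", exact)
```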
Next, building on the above, the authors ask how much the loss at a test point changes when a training point $z$ is upweighted. This also has a closed form:
$$\begin{aligned} I_{up,loss}(z, z_{test}) & = \frac{\mathrm{d} L(z_{test}, \hat{\theta}_{\epsilon, z})}{\mathrm{d} \epsilon}\Big|_{\epsilon = 0} \\ & = \nabla_{\theta}L(z_{test},\hat{\theta})^{T} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z}}{\mathrm{d} \epsilon}\Big|_{\epsilon = 0} \\ & = -\nabla_{\theta}L(z_{test},\hat{\theta})^{T}H^{-1}_{\hat{\theta}}\nabla_{\theta}L(z, \hat{\theta}) \end{aligned}$$
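Given the gradients and the Hessian at $\hat{\theta}$, $I_{up,loss}$ is a single linear solve; a generic sketch (the helper names are my assumptions, not the paper's API):

```python
import numpy as np

def influence_on_test_loss(grad_train, grad_test, hessian):
    """I_up,loss(z, z_test) = -grad_theta L(z_test, theta_hat)^T
                               H^{-1} grad_theta L(z, theta_hat).
    Returns the first-order change in the test loss per unit increase in the
    weight of z; multiply by -1/n to estimate the effect of removing z."""
    return -grad_test @ np.linalg.solve(hessian, grad_train)
```

For the ridge sketch above, `grad_train` would be $x(x^{T}\hat{\theta} - y) + \lambda\hat{\theta}$ and `hessian` would be $X^{T}X/n + \lambda I$; the paper itself avoids forming $H^{-1}$ explicitly and instead approximates Hessian-vector products with conjugate gradients or stochastic estimation.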
Perturbing a training input
The authors then develop a finer-grained notion of influence by studying a different counterfactual: how would the model's predictions change if a training input were modified?
For a training point $z=(x, y)$, define
$$z_{\delta}=(x+\delta, y)$$
i.e., a perturbation $\delta$ is applied to the sample point, taking $z \to z_{\delta}$.
Let $\hat{\theta}_{z_{\delta},-z}$ be the empirical risk minimizer obtained after replacing the training point $z$ with $z_{\delta}$, i.e., the parameters at which the training loss converges.
In other words, replace $z$ with $z_{\delta}$, retrain, and let $\hat{\theta}_{z_{\delta},-z}$ denote the retrained parameters.
The parameter change is then
$$\hat{\theta}_{z_{\delta},-z} - \hat{\theta}$$
To approximate $\hat{\theta}_{z_{\delta},-z} - \hat{\theta}$, define, for the move $z \to z_{\delta}$,
$$\hat{\theta}_{\epsilon, z_{\delta},-z} = \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_{i},\theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta)$$
This yields:
$$\begin{aligned} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z_{\delta},-z}}{\mathrm{d} \epsilon}\Big|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H^{-1}_{\hat{\theta}}\left(\nabla_{\theta}L(z_{\delta},\hat{\theta}) - \nabla_{\theta}L(z,\hat{\theta})\right) \end{aligned}$$
As before, since replacing $z$ with $z_{\delta}$ corresponds to $\epsilon = \frac{1}{n}$, we can linearly approximate $\hat{\theta}_{z_{\delta},-z} - \hat{\theta} \approx \frac{1}{n}\left(I_{up,params}(z_{\delta}) - I_{up,params}(z)\right)$, which gives a closed-form estimate of the effect of $z \to z_{\delta}$ on the parameters.
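A direct transcription of this approximation as a numpy sketch (argument names are mine):

```python
import numpy as np

def param_change_from_replacement(grad_z, grad_z_delta, hessian, n):
    """Approximate theta_hat_{z_delta,-z} - theta_hat, the parameter change
    from replacing training point z with z_delta:
        (1/n) * (I_up,params(z_delta) - I_up,params(z))
      = -(1/n) * H^{-1} (grad_theta L(z_delta) - grad_theta L(z)) at theta_hat."""
    return -np.linalg.solve(hessian, grad_z_delta - grad_z) / n
```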
In the example above, $\delta$ is applied to the input $x$, i.e., $(x, y) \to (x+\delta, y)$; analogous equations also apply for changes in $y$, i.e., $(x, y) \to (x, y+\delta)$.
While influence functions might appear to only work for infinitesimal (therefore continuous) perturbations, it is important to note that this approximation holds for arbitrary $\delta$: the $\epsilon$-upweighting scheme allows us to smoothly interpolate between $z$ and $z_\delta$. This is particularly useful for working with discrete data (e.g., in NLP) or with discrete label changes.
If $x$ is continuous and $\delta$ is small, the derivative above can be approximated further.
Assume the input domain $\mathcal{X} \subseteq \mathbb{R}^{d}$, the parameter space $\Theta \subseteq \mathbb{R}^{p}$, and that $L$ is differentiable in $\theta$ and $x$.
Then, as $\left\| \delta \right\| \to 0$,
$$\nabla_{\theta}L(z_{\delta},\hat{\theta}) - \nabla_{\theta}L(z,\hat{\theta}) \approx \left(\nabla_{x}\nabla_{\theta}L(z,\hat{\theta})\right)\delta$$
Substituting this into the expression above gives
$$\begin{aligned} \frac{\mathrm{d} \hat{\theta}_{\epsilon, z_{\delta},-z}}{\mathrm{d} \epsilon}\Big|_{\epsilon=0} &= I_{up,params}(z_{\delta}) - I_{up,params}(z) \\ &= -H^{-1}_{\hat{\theta}}\left(\nabla_{\theta}L(z_{\delta},\hat{\theta}) - \nabla_{\theta}L(z,\hat{\theta})\right) \\ &\approx -H^{-1}_{\hat{\theta}}\left(\nabla_{x}\nabla_{\theta}L(z,\hat{\theta})\right)\delta \end{aligned}$$
and therefore $\hat{\theta}_{z_{\delta},-z} - \hat{\theta} \approx -\frac{1}{n} H^{-1}_{\hat{\theta}}\left(\nabla_{x}\nabla_{\theta}L(z,\hat{\theta})\right)\delta$.
Differentiating the test loss with respect to $\delta$ via the chain rule gives:
$$\begin{aligned} I_{pert,loss}(z, z_{test})^{T} & = \nabla_{\delta}L(z_{test},\hat{\theta}_{z_{\delta},-z})^{T}\Big|_{\delta = 0} \\ & = -\nabla_{\theta}L(z_{test},\hat{\theta})^{T}H^{-1}_{\hat{\theta}}\nabla_{x}\nabla_{\theta}L(z,\hat{\theta}) \end{aligned}$$
$I_{pert,loss}(z, z_{test})^{T}\delta$ then gives an efficient approximation of the effect of $z \to z+\delta$ on the loss at $z_{test}$.
By setting $\delta$ in the direction of $I_{pert,loss}(z, z_{test})$, we can construct the local perturbation of a training point that maximally increases the loss on the test point.
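A sketch of how one might compute $I_{pert,loss}$ and pick such a $\delta$ (the function names and the fixed step size are my assumptions, not the paper's):

```python
import numpy as np

def influence_perturbation_loss(grad_test, hessian, grad_x_grad_theta):
    """I_pert,loss(z, z_test)^T
       = -grad_theta L(z_test, theta_hat)^T H^{-1} grad_x grad_theta L(z, theta_hat).

    grad_test:         (p,)   gradient of the test loss at theta_hat
    hessian:           (p, p) Hessian of the empirical risk at theta_hat
    grad_x_grad_theta: (p, d) mixed second derivative of the training loss
    Returns a (d,) vector: the sensitivity of the test loss to a small
    perturbation delta of the training input x."""
    return -(np.linalg.solve(hessian, grad_x_grad_theta).T @ grad_test)

def max_influence_delta(grad_test, hessian, grad_x_grad_theta, step=1e-2):
    """Pick delta along I_pert,loss to locally maximize the test loss."""
    g = influence_perturbation_loss(grad_test, hessian, grad_x_grad_theta)
    return step * g / np.linalg.norm(g)
```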