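These notes cover three works that explore causal relationships (causality) among the objects deep learning deals with (images and text).
Article 0: From Dependence to Causation
This is a book on applications of causal relationships across several fields. It introduces the two papers below, so following the author's line of thought should help with this problem.
Article 1: Discovering Causal Signals in Images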
Causal features: cause the presence of the object in the scene
Anticausal features: caused by the presence of the object in the scene
Object features: are mostly activated inside the bounding box of the object of interest
Context features: are mostly activated outside the bounding box of the object of interest
Hypothesis 1. Image datasets carry an observable statistical signal revealing the asymmetric relationship between object categories that results from their causal dispositions.
Hypothesis 2:
There exists an observable statistical dependence between object features and anticausal features. The statistical dependence between context features and causal features is nonexistent or much weaker.
The authors want to show that anticausal features usually lie inside the bounding box (that is, they are object features).
The authors train an NCC (Neural Causation Coefficient), a classifier that estimates the causal relation between two variables from samples. Given a bag of samples {(x_i, y_i)} drawn from a distribution P(X, Y), NCC outputs the causal direction between X and Y.
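The experiment rests on one assumption: "The features computed by the final layers of a convolutional neural network (CNN) [14, 21, 8] often indicate the presence of a well localized object-like feature in the scene depicted by the image under study." In other words, every feature just before the fully connected layer is assumed to correspond to an object in the image (a very shaky assumption).
The authors then build a 20-class classification network. The 512-dimensional vector just before the fully connected layer is taken as the feature scores, and the 20-dimensional output of the classifier as the object scores. Using these scores over all images, they compute the causal relation between every feature and every object.
For each object category, the authors then select the top 1% causal features and the top 1% anticausal features.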
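The selection step can be sketched roughly as follows. This is only an illustration, not the paper's code: it assumes a hypothetical score in [0, 1] per feature, where values near 1 mean "feature causes object" and values near 0 mean "object causes feature". The function name `top_features` and the score convention are my own assumptions.

```python
def top_features(scores, fraction=0.01):
    """Pick the most causal and most anticausal features for one object category.

    scores[j] is a hypothetical causal score for feature j (assumed convention:
    near 1 = feature causes object, near 0 = object causes feature).
    Returns two lists of feature indices: (causal, anticausal).
    """
    k = max(1, int(len(scores) * fraction))  # 512 features at 1% gives 5
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    anticausal = order[:k]        # lowest scores first
    causal = order[-k:][::-1]     # highest scores first
    return causal, anticausal


# Toy scores for 512 features, increasing with the feature index.
toy_scores = [j / 511 for j in range(512)]
causal, anticausal = top_features(toy_scores)
print(causal, anticausal)
```

With 512 features, 1% rounds down to 5 features on each side.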
How, then, to show that anticausal features lie inside the bounding box? The authors make another fairly shaky assumption: if the anticausal feature score is imputable more to the bounding boxes of objects of that category than the causal feature score is, then there is a statistical dependence between object features and anticausal features.
In Figure 4 the blue bars are always higher than the green ones, meaning that the top anticausal features are consistently more affected by the content of the bounding box than the top causal features.
Article 2: Causal Discovery Using Proxy Variables
Earlier methods estimate the causal relation between two random entities X and Y (this includes Article 1, which studies the causal relation between a feature at a particular position in the network and an object). This paper focuses instead on discovering the causal relation between two static entities, that is, two fixed objects such as two specific images or two specific words. For instance, one art masterpiece and its fraudulent copy, one translated document and its original version, or one pair of causally linked words in natural language, such as “virus” and “death”.
1. Basic concepts of causality
Principle 1 (Principle of common cause). If two random variables X and Y are statistically dependent (X ̸⊥ Y ), then one of the following causal explanations must hold:
i) X causes Y (X → Y), or ii) Y causes X (X ← Y), or
iii) there exists a random variable Z that is the common cause of both X and Y (X←Z→Y).
The authors' causal detection builds on the additive noise model (ANM).
In ANM, one assumes that the causal model has the form Y = F(X) + N, where X ⊥ N. It turns out that, under some assumptions, the reverse ANM X = G(Y) + E will not satisfy the independence assumption Y ⊥ E (Fig. 1).
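The asymmetry described above can be seen in a small experiment. The following is a minimal sketch, not the paper's implementation. It assumes a linear F with uniform (non-Gaussian) noise, fits a straight line in each direction, and uses the correlation between the squared centered input and the squared residual as a crude stand-in for a proper independence test such as HSIC. The function names (`fit_line`, `dependence_score`, `anm_direction`) are made up for illustration.

```python
import random


def mean(v):
    return sum(v) / len(v)


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(u, v):
    """Pearson correlation coefficient of two equally long lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def dependence_score(cause, effect):
    """Regress effect on cause, then measure how strongly the residual
    still depends on the cause (0 = no dependence detected)."""
    slope, intercept = fit_line(cause, effect)
    residual = [e - (slope * c + intercept) for c, e in zip(cause, effect)]
    mc = mean(cause)
    return abs(pearson([(c - mc) ** 2 for c in cause],
                       [r * r for r in residual]))


def anm_direction(x, y):
    """Prefer the direction whose residual looks more independent of the input."""
    forward = dependence_score(x, y)    # model Y = F(X) + N
    backward = dependence_score(y, x)   # model X = G(Y) + E
    return "X->Y" if forward < backward else "Y->X"


# Synthetic data generated as Y = X + N with X and N independent and uniform.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y = [xi + rng.uniform(-1, 1) for xi in x]
print(anm_direction(x, y))
```

In the forward direction the residual is essentially the noise N, so its dependence on X is close to zero. In the backward direction the residual is uncorrelated with Y but clearly not independent of it, which is what the score picks up. A nonlinear F would need a nonlinear regressor in place of `fit_line`.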
ANM and RCC are both basic methods for inferring the causal relation between two variables.
2. The framework of proxy variables to estimate the causal relation between static entities
However, neither ANM nor RCC applies to static entities. Both need many i.i.d. samples, whereas a pair of static entities gives only a single observation, so the causal relation between them cannot be estimated directly: both ANM and RCC based methods need n ≫ 1 samples from P(X, Y) to classify the causal relation between the random variables X and Y.
The authors therefore propose proxy variables and proxy projections.
First, a proxy random variable W is a random variable taking values in some set W, which can be understood as a random source of information related to x and y. This definition is on purpose rather vague and will be illustrated through several examples in the following sections.
Second, a proxy projection is a function π : W × S → R. Using a proxy variable and projection, we can construct a pair of scalar random variables A = π(W, x) and B = π(W,y). A proxy variable and projection are causal if the pair of random entities (A, B) share the same causal footprint as the pair of static entities (x, y).
So the approach is: draw i.i.d. samples of the proxy variable W, and use the same proxy projection to map x and y to two scalar random variables A and B. If A → B, conclude that x → y.
3. Causal discovery in image and video
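For images, the authors define W as a mask of the same size as the image, in which only a 10×10 square is 1 and everything else is 0. Many such masks can be generated.
The proxy projection is the inner product of the two matrices (the sum of the elementwise products). As in Figure 3, a_i = π(w_i, x) is therefore the sum of the pixels of x inside the region where the mask w_i is 1, and likewise b_i = π(w_i, y) for y. The authors then run ANM on the samples {(a_i, b_i)} to estimate causality. If A → B, then x → y.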
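The mask-and-projection step might look roughly like this. It is a sketch under my own assumptions, not the authors' code: images are plain nested lists of pixel values, the patch position is drawn uniformly at random, and the names `random_patch`, `project`, and `proxy_samples` are hypothetical.

```python
import random


def random_patch(height, width, size, rng):
    """Top-left corner of a size x size patch that fits inside the image."""
    return rng.randrange(height - size + 1), rng.randrange(width - size + 1)


def project(image, top, left, size):
    """Inner product of the image with a 0/1 mask that is 1 only on the patch,
    which equals the sum of the pixels under the patch."""
    return sum(image[r][c]
               for r in range(top, top + size)
               for c in range(left, left + size))


def proxy_samples(x, y, n_masks=500, size=10, seed=0):
    """Apply the same random masks to both images and return pairs (a_i, b_i)."""
    rng = random.Random(seed)
    height, width = len(x), len(x[0])
    samples = []
    for _ in range(n_masks):
        top, left = random_patch(height, width, size, rng)
        samples.append((project(x, top, left, size),
                        project(y, top, left, size)))
    return samples


# Toy 32x32 images: x is a horizontal gradient, y is a brightened copy of x.
x_img = [[c for c in range(32)] for _ in range(32)]
y_img = [[2 * c + 1 for c in range(32)] for _ in range(32)]
samples = proxy_samples(x_img, y_img)
print(len(samples), samples[0])
```

The resulting list of pairs plays the role of n ≫ 1 samples from a joint distribution of (A, B), which is what a sample-based method such as ANM needs as input.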
The authors use the same method to infer the causal relation between the earlier and later frames of 8-frame video segments.
4. Causal discovery in NLP
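For NLP, the authors take as the proxy variable 10,000 distinct words selected from a corpus.
The proxy projection is defined from quantities such as the distance between a corpus word and the target word; the authors try several choices.
The authors also collect a dataset of 10,000 word pairs that have a causal relation.
They then train RCC and use it to classify causality.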
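Summary:
These two papers gave me a very rough picture of what causality research looks at. Both try to uncover causal relations in images and NLP. The focus of Article 1, however, is a network for classifying causal direction, which can discover causal relations between the nodes of a neural classification network.
I prefer Article 2: the logic is sound and the argument is detailed. It proposes a way to uncover causal relations between static entities, which can handle the causal relation between two images. Of course, whether this approach works depends heavily on the choice of proxy variable and proxy projection, and those choices will certainly differ from task to task.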
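As for future applications, my rough idea is to uncover the causal relations between attributes and objects: which attributes are direct effects of the object?
The discovery step could use the method of Article 1 to mine causal relations between network nodes (since the nodes represent the corresponding attributes and objects).
The strongly related attributes found this way could then be used to improve a zero-shot network?
This idea still has many open problems.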