Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

  1. The data will inevitably be long-tailed.
    For example, if we aim to collect more images of a tail class like “remote controller”, every newly added image simultaneously brings in more head-class instances like “sofa” and “TV”.

  2. The paradoxical effects of the long tail:
    “bad” bias: it is bad because the classification is severely biased towards the data-rich head classes.
    “good” bias: it is good because the long-tailed distribution essentially encodes the natural inter-dependencies among classes; for example, “TV” is indeed a good context for “remote controller”. Any disrespect of it will hurt feature representation learning; e.g., re-weighting or re-sampling inevitably causes under-fitting to the head or over-fitting to the tail.

  3. Causal graph
    [Figure 1: the causal graph, with momentum M, feature X, projection D, and prediction Y]
    Revisit the “bad” bias in a causal view:

  1. The backdoor path X ← M → D → Y causes a spurious correlation even if X has nothing to do with the prediction Y, e.g., misclassifying a tail sample into a head class (rich data).
  2. The mediation path X → D → Y mixes up the pure contribution made by X → Y.
    Summary: what we want is X → Y, but it is mixed up with X ← M → D → Y and X → D → Y.
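A minimal numerical illustration (not from the paper) of how a backdoor path creates spurious correlation: in this toy linear model the confounder M drives both X and D, Y depends only on D, and X has no direct effect on Y at all, yet X and Y are strongly correlated until M is controlled for. All coefficients here are arbitrary.

```python
import numpy as np

# Toy structural causal model matching the graph in the notes:
# M -> X, M -> D, D -> Y, and deliberately NO X -> Y edge.
rng = np.random.default_rng(0)
m = rng.normal(size=10_000)                   # confounder M (momentum)
x = 2.0 * m + rng.normal(size=10_000)         # X <- M
d = 1.5 * m + rng.normal(size=10_000)         # D <- M
y = 3.0 * d + rng.normal(size=10_000)         # Y <- D only

spurious = np.corrcoef(x, y)[0, 1]            # via the path X <- M -> D -> Y
adjusted = np.corrcoef(x - 2.0 * m, y)[0, 1]  # near zero once M is removed from X
print(spurious, adjusted)
```

The first correlation is strong purely because of the backdoor path; once the confounder's contribution is subtracted out of X, the correlation with Y essentially vanishes.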

Revisit the “good” bias in a causal view:
X → D → Y respects the inter-relationships of the semantic concepts in classification; that is, head-class knowledge contributes reliable evidence that filters out wrong predictions. For example, if a rare sample is close to the head classes “TV” and “sofa”, it is more likely to be a living-room object (e.g., “remote controller”) than an outdoor one (e.g., “car”).

  1. A principled solution for long-tailed classification
    Pursue the direct causal effect along X → Y by removing the momentum effect (cutting off the backdoor path X ← M → D → Y).
    Keep the “good” (retain the mediation path X → D → Y) while removing the “bad”.
    Use the direct causal effect of X → Y as the final prediction logits.

  2. Decomposition of the feature vector
    X → D → Y and X → Y: these links indicate that the effect of X can be disentangled into an indirect (mediation) effect and a direct effect. Accordingly, the feature vector can be decomposed along the unit momentum direction d̂ as x = x_∥ + x_⊥, where x_∥ = (‖x‖ cos(x, d̂)) d̂ and x_⊥ = x − x_∥.
    The indirect effect is carried by x_∥, the component along d̂, while the direct effect is carried by the remaining component x_⊥.
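The decomposition above can be sketched in a few lines of numpy; here d̂ is taken as the normalized moving-average feature `x_bar` (an assumption consistent with the momentum direction described in these notes):

```python
import numpy as np

def decompose(x, x_bar):
    """Split feature x into a component along d_hat = x_bar / ||x_bar||
    and the orthogonal remainder."""
    d_hat = x_bar / np.linalg.norm(x_bar)
    x_along = np.dot(x, d_hat) * d_hat  # carries the indirect (mediation) effect
    x_rest = x - x_along                # carries the direct effect
    return x_along, x_rest

x_along, x_rest = decompose(np.array([3.0, 4.0]), np.array([1.0, 0.0]))
print(x_along, x_rest)  # [3. 0.] [0. 4.]
```

By construction the two parts sum back to x and are orthogonal, which is what lets the direct effect be measured separately from the mediation effect.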

  3. The proposed solution
    Before we calculate the final Total Direct Effect (TDE), we first perform de-confounded training to estimate the parameters of the “modified” causal graph (the graph with the link M → X removed).
    [Figure 2: de-confounded training and TDE inference]
    The do-operator denotes the causal intervention. It removes the “bad” confounder bias while keeping the “good” mediator bias, because it retains the mediation path X → D → Y and removes only the path M → X.
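The TDE inference step can be sketched as follows. This is a simplified single-head version: the paper uses a multi-head normalized classifier, `x_bar` stands for a moving average of training features (updated momentum-style during training), and the hyperparameter names `tau`, `alpha`, `gamma` follow the paper's notation but their values here are illustrative, not authoritative.

```python
import numpy as np

def tde_logits(x, W, x_bar, tau=16.0, alpha=1.5, gamma=1.0 / 32):
    """Total Direct Effect inference (single-head sketch).
    Subtract the counterfactual logits produced by the component of x
    lying along the momentum direction d_hat from the factual logits."""
    d_hat = x_bar / np.linalg.norm(x_bar)
    w_norm = np.linalg.norm(W, axis=1)             # per-class weight norms
    cos_xd = np.dot(x, d_hat) / np.linalg.norm(x)  # cos(x, d_hat)
    factual = (W @ x) / ((np.linalg.norm(x) + gamma) * w_norm)
    # counterfactual: keep only the part of x along d_hat (||d_hat|| = 1)
    counterfactual = cos_xd * (W @ d_hat) / ((1.0 + gamma) * w_norm)
    return tau * (factual - alpha * counterfactual)

rng = np.random.default_rng(1)
logits = tde_logits(rng.normal(size=8), rng.normal(size=(5, 8)),
                    rng.normal(size=8))
print(logits.shape)  # (5,)
```

Subtracting the counterfactual term removes the “bad” head-ward pull of the momentum direction at test time, while the model was still trained with the full feature, so the “good” mediation learned through X → D → Y is preserved.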
