Skim-reading notes: Self-taught Learning: Transfer Learning from Unlabeled Data

Self-taught Learning: Transfer Learning from Unlabeled Data

ICML 2007

Problem

  • Semi-supervised learning typically makes the additional assumption that the unlabeled data can be labeled with the same labels as the classification task, and that these labels are merely unobserved (Nigam et al., 2000)
  • Transfer learning typically requires further labeled data from a different but related task, and at its heart typically transfers knowledge from one supervised learning task to another; thus it requires additional labeled (and therefore often expensive-to-obtain) data, rather than unlabeled data, for these other supervised learning tasks (Thrun, 1996; Caruana, 1997; Ando & Zhang, 2005)
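  • In other words, semi-supervised learning can use unlabeled data, but that data must belong to the same classes as the labeled data
  • Transfer learning relies on labeled data from the related tasks
  • The self-taught learning proposed in this paper places very few restrictions on the unlabeled data, so it is simpler to apply than either semi-supervised learning or transfer learning
  • In this algorithm the number of basis vectors can be much larger than the input dimension, which PCA cannot do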
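The PCA comparison can be checked numerically. PCA's basis consists of the eigenvectors of the n×n covariance matrix, so it has at most n vectors. A sparse-coding basis is just an n×s matrix, and s is free to exceed n. The minimal NumPy sketch below uses arbitrary sizes (n = 4, s = 10) chosen only for illustration. It also shows why a sparsity penalty becomes necessary once s > n: the representation of an input is then no longer unique.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 4, 10  # hypothetical sizes: input dimension n, number of basis vectors s > n

# PCA: eigenvectors of the n x n covariance matrix, so at most n orthonormal directions
X = rng.normal(size=(200, n))
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvecs.shape)  # (4, 4)

# Sparse coding: the basis is any n x s matrix (unit-norm columns here), s > n is allowed
B = rng.normal(size=(n, s))
B /= np.linalg.norm(B, axis=0)
print(np.linalg.matrix_rank(B))  # 4, fewer than s = 10, so the columns are linearly dependent

# rank < s means B has a null space: two different codes reconstruct the same x,
# so an extra criterion (here, the L1 sparsity penalty) is needed to pick one of them
a1 = rng.normal(size=s)
null_vec = np.linalg.svd(B)[2][-1]  # last right-singular vector, satisfies B @ null_vec ~ 0
a2 = a1 + null_vec
print(np.allclose(B @ a1, B @ a2))  # True
```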

Method

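  • Inspiration: the authors observe that many images downloaded at random from the web contain the same basic visual patterns (such as edges) found in images of elephants and rhinos
  • Learning higher-level representations: learn the basis vectors b from the unlabeled data by solving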
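    • $\min_{b,a} \sum_i \|x_u^{(i)} - \sum_j a_j^{(i)} b_j\|_2^2 + \beta \|a^{(i)}\|_1$, subject to $\|b_j\|_2 \le 1$ for all $j$
    • The second term is an L1 penalty on the activations $a$ that makes them sparse. The norm constraint on each $b_j$ is needed because, without it, the penalty could be made arbitrarily small by scaling $b$ up and $a$ down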
  • Unsupervised feature construction: represent each labeled example in the learned basis, giving the training set $\hat{T} = \{(\hat{a}(x_l^{(i)}), y^{(i)})\}_{i=1}^m$
    • $\hat{a}(x_l^{(i)}) = \arg\min_{a^{(i)}} \|x_l^{(i)} - \sum_j a_j^{(i)} b_j\|_2^2 + \beta \|a^{(i)}\|_1$, with the basis $b$ fixed to the one learned above
  • Finally, train a supervised learning algorithm (e.g. an SVM) on $\hat{T}$ to obtain the classifier
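The three steps above can be sketched end to end in NumPy. This is a minimal illustration under several assumptions, not the paper's implementation:

- The data are synthetic. Both the "unlabeled" and "labeled" sets are sparse combinations of one hidden dictionary `D`.
- The codes are computed with ISTA (iterative soft-thresholding).
- The basis update is an unconstrained least-squares solve followed by rescaling each column to norm at most 1. This is a simple heuristic, not an exact solver for the constrained problem.
- The SVM is a primal linear SVM trained by subgradient descent.

All helper names (`soft_threshold`, `sparse_codes`, `learn_basis`, `train_linear_svm`, `sample`) are hypothetical.

```python
import numpy as np

# Sketch only: ISTA codes + a heuristic basis update + a subgradient linear SVM,
# not the optimizers used in the paper. All helper names are hypothetical.
rng = np.random.default_rng(0)

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1: exact zeros wherever |z| <= t
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, B, beta, n_iter=200):
    # ISTA for min_a ||x - B a||_2^2 + beta * ||a||_1, one row of X per example
    L = 2.0 * np.linalg.norm(B, 2) ** 2        # Lipschitz constant of the smooth term's gradient
    A = np.zeros((X.shape[0], B.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * (A @ B.T - X) @ B         # gradient of the squared reconstruction error
        A = soft_threshold(A - grad / L, beta / L)
    return A

def learn_basis(X_u, s, beta, n_outer=30):
    # step 1: alternate codes (ISTA) and basis (least squares, then rescale so ||b_j||_2 <= 1)
    B = rng.normal(size=(X_u.shape[1], s))
    B /= np.linalg.norm(B, axis=0)
    for _ in range(n_outer):
        A = sparse_codes(X_u, B, beta)
        B = np.linalg.solve(A.T @ A + 1e-6 * np.eye(s), A.T @ X_u).T
        B /= np.maximum(np.linalg.norm(B, axis=0), 1.0)
    return B

def train_linear_svm(F, y, lam=1e-2, lr=0.1, n_iter=500):
    # step 3: primal linear SVM (hinge loss + L2 penalty) by subgradient descent, y in {-1, +1}
    w, bias = np.zeros(F.shape[1]), 0.0
    for _ in range(n_iter):
        active = y * (F @ w + bias) < 1        # examples inside the margin or misclassified
        w = w - lr * (lam * w - y[active] @ F[active] / len(y))
        bias = bias + lr * y[active].sum() / len(y)
    return w, bias

# hypothetical toy data: every example is a sparse, positive combination of atoms of a hidden D
n, s = 8, 16                                     # s > n: an overcomplete basis
D = rng.normal(size=(n, s))
D /= np.linalg.norm(D, axis=0)

def sample(m, atoms):
    # each example uses 3 distinct atoms from `atoms`, plus small Gaussian noise
    A = np.zeros((m, s))
    for i in range(m):
        A[i, rng.choice(atoms, size=3, replace=False)] = rng.uniform(0.5, 1.5, size=3)
    return A @ D.T + 0.01 * rng.normal(size=(m, n))

X_u = sample(500, np.arange(s))                  # unlabeled data: any atoms
X_l = np.vstack([sample(50, np.arange(0, 8)),    # class +1 uses atoms 0..7
                 sample(50, np.arange(8, 16))])  # class -1 uses atoms 8..15
y = np.repeat([1.0, -1.0], 50)

beta = 0.2
B = learn_basis(X_u, s, beta)                    # step 1: uses X_u only, never the labels
F = sparse_codes(X_l, B, beta)                   # step 2: features a_hat(x_l), i.e. T_hat = (F, y)
w, bias = train_linear_svm(F, y)                 # step 3: supervised classifier on T_hat
train_acc = np.mean(np.sign(F @ w + bias) == y)
print(B.shape, np.mean(F == 0), train_acc)
```

`learn_basis` never sees `X_l` or the labels. The two datasets are linked only by shared low-level structure, which is the assumption self-taught learning relies on.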

Takeaways

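  1. Sparsity can be built into an objective function by adding an L1 regularization term
  2. The learned basis vectors look somewhat like the edge-like filters learned by the early layers of a CNN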
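The one-dimensional case shows why an L1 term, unlike an L2 term, produces sparsity, because both penalized problems have closed-form solutions there. Minimizing $(x - a)^2 + \beta |a|$ gives soft-thresholding at $\beta/2$, which sets small coefficients exactly to zero. Minimizing $(x - a)^2 + \beta a^2$ gives $a = x / (1 + \beta)$, which only rescales every coefficient. In the small NumPy sketch below, `lasso_1d` and `ridge_1d` are hypothetical helper names, and the input vector is arbitrary.

```python
import numpy as np

def lasso_1d(x, beta):
    # argmin_a (x - a)^2 + beta * |a|  ->  soft-threshold at beta / 2
    return np.sign(x) * np.maximum(np.abs(x) - beta / 2, 0.0)

def ridge_1d(x, beta):
    # argmin_a (x - a)^2 + beta * a^2  ->  uniform shrinkage by 1 / (1 + beta)
    return x / (1 + beta)

x = np.array([3.0, 0.4, -0.2, -2.0])
print(np.count_nonzero(lasso_1d(x, 1.0)))  # 2: the two small entries become exactly zero
print(np.count_nonzero(ridge_1d(x, 1.0)))  # 4: every entry shrinks but none reaches zero
```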

References

http://ai.stanford.edu/~hllee/icml07-selftaughtlearning.pdf
