Traditional data mining algorithms work well under a strict assumption: the training data and test data are drawn from the same distribution, in the same feature space, with the same set of class labels. In many real-world problems, however, we do not have sufficient training data that satisfies this assumption. To apply traditional data mining algorithms, we may need to label a large amount of training data, which is prohibitively expensive. It would be extremely useful if we could transfer knowledge from other available data to the intended task while avoiding the labeling effort. In this workshop, we call for papers on the topic of transfer mining: transfer learning in data mining.

Several challenges must be met to transfer knowledge successfully between different tasks. The first is to judge the relatedness between tasks and avoid negative transfer. The second is to decide, given related tasks, what to transfer: tasks may share hyper-parameters, features, or instances, and it is nontrivial to decide which kind of knowledge should be transferred. Finally, how to transfer knowledge efficiently and effectively is another important issue.

Transfer mining, which aims at transferring knowledge between different domains and tasks in data mining, has emerged as one of the most active areas in data mining, and there is a strong need to boost research on knowledge transfer in the data mining community. Unlike ICML/NIPS venues, the workshop invites papers that address knowledge transfer from a data mining perspective. We welcome theoretical and applied contributions that (1) present novel knowledge transfer methodologies, frameworks, and KDD processes for transfer mining; (2) investigate effective (automated or human-machine cooperative) principles and techniques for acquiring, representing, modeling, and engaging transfer mining in real-world data mining; or (3) identify trends and directions of transfer mining in both theory and applications.

The workshop on Transfer Mining will bring active researchers and industry practitioners together toward developing next-generation KDD theories. It will also benefit the deployment of knowledge discovery in real-world applications and reduce the gaps between data mining and machine learning and between research and practice.


Topics of interest include, but are not limited to, the following:

Knowledge transfer on relational and heterogeneous data.

Transfer mining for different types of data mining algorithms, including association rules, decision trees, KNN, K-means, and so on.

Feature selection, extraction and construction in transfer mining.

Transferring among multiple related but different data sources.

Theory and algorithms to help avoid negative transfer.

Transfer mining on very large-scale data.

Transfer mining in novel applications, such as Web, social networks, sensor networks and bioinformatics.

Unsupervised and semi-supervised transfer mining.


Transfer Learning




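  In the traditional machine learning framework, the task is to learn a classification model from sufficient training data and then use the learned model to classify and predict test documents. However, machine learning algorithms face a key problem in current Web mining research: in newly emerging domains, large amounts of training data are very hard to obtain. Web application domains develop very quickly, and new ones keep emerging: from traditional news, to web pages, to images, and on to blogs, podcasts, and so on. Traditional machine learning requires labeling a large amount of training data for each domain, which consumes a great deal of manpower and resources. Without large amounts of labeled data, much learning-related research and many applications cannot proceed. Second, traditional machine learning assumes that training data and test data follow the same distribution. In many cases, however, this identical-distribution assumption does not hold; a common example is training data that has become outdated. This often forces us to relabel a large amount of training data, but labeling new data is very expensive. From another perspective, if we already have a large amount of training data drawn from a different distribution, discarding it entirely is also wasteful. How to make reasonable use of such data is the main problem that transfer learning addresses. Transfer learning can transfer knowledge from existing data to help future learning. Its goal is to use knowledge learned in one environment to help the learning task in a new environment. Therefore, transfer learning does not make the identical-distribution assumption that traditional machine learning does.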






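  The basic idea of instance-based transfer learning is that, although the auxiliary training data differ more or less from the source training data, some part of the auxiliary data should still be suitable for training an effective classification model that fits the test data. Our goal is therefore to find the instances in the auxiliary training data that suit the test data and to transfer those instances into learning on the source training data. For instance-based transfer learning, we generalized the traditional AdaBoost algorithm and proposed a boosting algorithm with transfer ability, TrAdaBoost [9], which makes maximal use of the auxiliary training data to help classification on the target. Our key idea is to use boosting to filter out the auxiliary data that are least like the source training data. Here boosting serves as a mechanism for automatically adjusting weights: the weights of important auxiliary training instances increase, and the weights of unimportant ones decrease. After reweighting, the weighted auxiliary training data serve as additional training data and are used together with the source training data to improve the reliability of the classification model.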
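The weight-adjustment mechanism described above can be sketched for a single boosting round as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tradaboost_reweight`, its parameters, and the clipping of the error rate are hypothetical choices for this example, and the update factors follow the commonly cited form of TrAdaBoost for binary classification. "Source" here follows the terminology of the paragraph above, meaning the small labeled set that matches the test data.

```python
import numpy as np

def tradaboost_reweight(w_aux, w_src, miss_aux, miss_src, n_rounds):
    """One round of TrAdaBoost-style reweighting (illustrative sketch).

    w_aux, w_src       : current weights of auxiliary / source instances
    miss_aux, miss_src : 1 where the current weak classifier is wrong, else 0
    n_rounds           : total number of boosting rounds planned
    """
    w_aux = np.asarray(w_aux, dtype=float)
    w_src = np.asarray(w_src, dtype=float)
    miss_aux = np.asarray(miss_aux, dtype=float)
    miss_src = np.asarray(miss_src, dtype=float)

    # The weighted error is measured only on the source training data.
    eps = np.sum(w_src * miss_src) / np.sum(w_src)
    # Assumption: clip the error so the factor below stays finite and below 1.
    eps = np.clip(eps, 1e-10, 0.499)

    beta_src = eps / (1.0 - eps)  # AdaBoost-style factor for source data
    beta_aux = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_aux)) / n_rounds))

    # Misclassified auxiliary instances are down-weighted (filtered out over time).
    new_aux = w_aux * beta_aux ** miss_aux
    # Misclassified source instances are up-weighted, as in AdaBoost.
    new_src = w_src * beta_src ** (-miss_src)
    return new_aux, new_src
```

In this sketch, correctly classified auxiliary instances keep their weight, so their share grows once the weights are normalized, while auxiliary instances that keep disagreeing with the source data fade out. A full learner would call this once per round after fitting a weak classifier on the combined, normalized weights.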








  2.1 Feature-based supervised transfer learning




  2.2 Feature-based unsupervised transfer learning: self-taught clustering






  3 Transfer learning across heterogeneous spaces: translated learning




