[A data-free training method for one-shot heterogeneous FL] A PRACTICAL DATA-FREE APPROACH TO ONE-SHOT FL WITH HETEROGENEITY

This post introduces a data-free approach to one-shot federated learning under heterogeneity.
Unlike prior methods, the approach achieves the following advantages:

(1) FedSyn requires no additional information (beyond the model parameters) to be transferred between clients and the server (practical);
(2) FedSyn does not require any auxiliary dataset for training (data-free);
(3) FedSyn is the first to consider both model and statistical heterogeneity in FL, i.e., the clients' data are non-iid and different clients may have different model architectures (heterogeneity).

Contents

  • Why one-shot FL?
  • Homogeneous or heterogeneous?
  • FedSyn

Why one-shot FL?

Moreover, frequently sharing information carries a high risk of being attacked. Beyond that, malicious attackers (those who actively mount poisoning attacks) can do greater damage over multiple training rounds.
One-shot FL therefore protects privacy better and saves a large amount of communication cost, which makes it more practical in real-world applications (e.g., a model market).

One-shot FL has a central problem, however: the global model converges with difficulty, and it is hard to achieve satisfactory performance, especially when the data on the clients are non-iid.

Homogeneous or heterogeneous?

Homogeneous networks are more vulnerable to privacy-inference attacks, especially for neural network models: when an attacker does not know the target model's architecture, less information is available and inference becomes harder.

Prior work on one-shot FL has focused mainly on homogeneous frameworks, but the heterogeneous setting is more common: different clients have different model architectures [18], which occurs frequently in practical scenarios.

The paper's method is introduced next.

FedSyn

The method has two stages:
In the first stage, the ensemble of local models uploaded by the clients is used to train a generator that can produce high-quality unlabeled data (this solves the data-source problem of model distillation, making the method data-free) for training in the second stage.
In the second stage, the knowledge of the ensemble models is distilled into the (server's) global model.

As shown in the figure below:

[Figure 1: the FedSyn framework]
The algorithm for the whole pipeline:
[Figure 2: the FedSyn training algorithm]
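The stage-2 distillation step can be illustrated with a toy, stdlib-only sketch: the temperature-softened predictions of the client models are averaged into an ensemble "teacher" target, and the global (student) model is trained to match it on synthetic data. All logits and the temperature `T` below are invented values for illustration, not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits from two client models and the global (student) model
# on one synthetic sample.
client_logits = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
student_logits = [1.0, 0.2, -0.8]

T = 3.0  # assumed distillation temperature
client_probs = [softmax(z, T) for z in client_logits]
# Ensemble teacher: average the client probability vectors class-wise.
teacher = [sum(col) / len(client_probs) for col in zip(*client_probs)]
student = softmax(student_logits, T)

# Stage-2 objective: make the student match the ensemble on synthetic data.
distill_loss = kl(teacher, student)
```

In a real implementation this loss would be backpropagated through the student network for each batch of generator output; here it is just a single scalar computed by hand.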

Why use a generator to synthesize data?
The goal is to train a generator that produces high-quality data similar to the clients' training data, i.e., data that share the same distribution as the clients' training data.
Why not use a GAN to produce the data?
Recent work [21] generated data with a pre-trained generative adversarial network (GAN). However, such a method cannot generate high-quality data, because the pre-trained GAN is trained on public datasets, whose distribution likely differs from that of the clients' training data. So GANs have their limitations here.
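This summary does not reproduce FedSyn's exact generator losses, but a common data-free heuristic for "samples that look like the training distribution" is to reward synthetic samples on which the ensemble predicts confidently, i.e., with low prediction entropy. A toy, stdlib-only illustration (all logits invented):

```python
import math

def softmax(logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy of a discrete distribution (in nats)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical ensemble predictions on two generated samples: one the
# ensemble is confident about (plausibly in-distribution), one it is not.
confident = softmax([4.0, 0.0, -2.0])
uncertain = softmax([0.1, 0.0, -0.1])

# A generator minimizing ensemble-prediction entropy would prefer
# producing samples like the first one.
low_h, high_h = entropy(confident), entropy(uncertain)
```

The sign convention (minimize entropy of the teacher's prediction on generated data) is an assumption for illustration, not the paper's formula.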

The paper also raises an interesting issue with the distillation step:
[Figure 3: decision boundaries of the ensemble models and the global model]
In the left panel, the generated data may all fall in the red region; but as the right panel shows, the black-circled region also becomes necessary. For this reason, the paper additionally proposes a boundary support loss, which urges the generator to generate more synthetic data between the decision boundaries of the ensemble models and the global model.
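The boundary-support idea can be illustrated numerically: a generator term that rewards disagreement between the ensemble (teacher) and the global model (student) pushes synthetic samples toward the region between their decision boundaries, where the two models predict differently. The sketch below uses invented logits, and the choice of divergence and sign is an assumption for illustration, not the paper's exact loss.

```python
import math

def softmax(logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical predictions on two synthetic samples: on sample A the
# ensemble and the global model agree; on sample B they disagree, i.e.,
# B lies between their decision boundaries.
teacher_a, student_a = softmax([2.0, -1.0]), softmax([1.9, -0.9])
teacher_b, student_b = softmax([2.0, -1.0]), softmax([-1.0, 2.0])

# Boundary-support term for the generator: reward disagreement by
# minimizing the NEGATIVE teacher-student divergence.
loss_a = -kl(teacher_a, student_a)
loss_b = -kl(teacher_b, student_b)
```

Sample B, on which the two models disagree, receives the lower generator loss, so the generator is steered toward producing more such boundary samples.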
