Federated Learning: Strategies for Improving Communication Efficiency

Problem

Communication efficiency is critical in federated learning: a naive implementation requires every client to send a full model update back to the server each round. For large models this is the bottleneck, because client network connections are often asymmetric or unreliable, with upload speeds far lower than download speeds.

Assumptions

The paper considers an idealized synchronized algorithm for federated learning (a minimal sketch of one round follows the list):
A subset of existing clients is selected, each of which downloads the current model.
Each client in the subset computes an updated model based on their local data.
The model updates are sent from the selected clients to the server.
The server aggregates these updates (typically by averaging) to construct an improved global model.
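
A minimal numpy sketch of one such round, assuming the model is a flat parameter vector and each client object exposes a hypothetical local_update method (a placeholder for local SGD, not an API from the paper):

```python
import numpy as np

def federated_round(global_model, clients, client_fraction=0.1, seed=0):
    """One synchronized round: select clients, train locally, average the updates."""
    rng = np.random.default_rng(seed)

    # 1. Select a random subset of the available clients.
    k = max(1, int(client_fraction * len(clients)))
    selected = rng.choice(len(clients), size=k, replace=False)

    # 2./3. Each selected client downloads the current model and computes an
    #       update H_i from its local data (local_update is a hypothetical
    #       stand-in for a few epochs of local training).
    updates = [clients[i].local_update(global_model) for i in selected]

    # 4. The server aggregates the updates (here by simple averaging) to form
    #    the improved global model.
    return global_model + np.mean(updates, axis=0)
```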

Innovations

Structured Updates

Low rank: express the update H as a product H = AB, where A is a fixed random matrix that can be represented by a random seed (so it never needs to be transmitted) and only the trained B is sent. A is generated afresh in each round and for each client independently.
B acts as a projection matrix and A as a reconstruction matrix.
Given a random reconstruction matrix, what projection recovers the most information?
Random mask
Restrict the update H to a sparse matrix following a predefined random sparsity pattern (the pattern, too, can be regenerated from a seed). This resembles dropout: entries are dropped at random with some probability, and only the remaining entries are trained and transmitted.
Comparison: the low-rank constraint zeroes out whole directions of the update, throwing part of the dimensionality away, which hurts accuracy considerably. The paper reports that the random mask performs better, but does not explain why. (A small sketch of both parameterizations follows.)
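
A minimal sketch of the two structured parameterizations, assuming the update is a d1×d2 matrix; the function names and the use of numpy are illustrative, not from the paper:

```python
import numpy as np

def low_rank_parameterization(d1, d2, rank, seed):
    """Low-rank structured update H = A @ B: A is random and reproducible from a
    seed (so only the seed, not A, is communicated); B is the trainable part."""
    A = np.random.default_rng(seed).standard_normal((d1, rank))
    B = np.zeros((rank, d2))  # optimized on the client, uploaded to the server
    return A, B

def random_mask_parameterization(d1, d2, keep_fraction, seed):
    """Random-mask structured update: a fixed sparsity pattern drawn from a seed;
    only the entries where mask is True are trained and uploaded."""
    mask = np.random.default_rng(seed).random((d1, d2)) < keep_fraction
    H = np.zeros((d1, d2))
    return H, mask

# The server reconstructs the update from (seed, B) as A @ B, or from
# (seed, non-zero values) by filling the masked positions.
```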

Sketched Updates

Subsampling
Only communicate a matrix H′ formed from a random subset of the values of H^i_t; the server then averages the subsampled updates it receives.
(This is similar to the random mask, but here the full update is trained and some parameters are thrown away afterwards, whereas the random mask trains only the sparse entries from the start. A sketch of the subsampling step follows.)
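
A minimal sketch of the subsampling step, assuming the update H is a numpy array; the 1/keep_fraction scaling is the standard trick for keeping the estimate unbiased and is an assumption of this sketch rather than a detail quoted from the paper:

```python
import numpy as np

def subsample_update(H, keep_fraction, seed):
    """Keep a random subset of the entries of the full update H, scaled by
    1/keep_fraction so that the expectation of the sketch equals H."""
    rng = np.random.default_rng(seed)
    mask = rng.random(H.shape) < keep_fraction
    return np.where(mask, H / keep_fraction, 0.0)

# The server averages the sketched updates from the selected clients;
# in expectation this average equals the average of the full updates.
```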
Probabilistic quantization
Model quantization was originally proposed as a way to compress model parameters. For example, Song Han's ICLR 2016 best-paper work first proposed quantizing parameters with k-means clustering: nearby values are assigned to the same cluster centroid and reuse a single value, so that many weights are represented by a small set of distinct numbers. Here, each value of the update is instead quantized probabilistically to a small number of bits, and applying a random rotation before quantization makes federated learning more stable.
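
A minimal sketch of 1-bit probabilistic quantization as described above (the paper additionally applies a structured random rotation before quantizing, which is omitted here): each entry is rounded to h_min or h_max with probabilities chosen so the quantized value is unbiased.

```python
import numpy as np

def binary_stochastic_quantize(h, seed=0):
    """1-bit probabilistic quantization: each entry of the update h becomes
    h_min or h_max, with the probability of h_max chosen so that E[q_j] = h_j."""
    rng = np.random.default_rng(seed)
    h_min, h_max = h.min(), h.max()
    if h_max == h_min:                       # constant vector: nothing to quantize
        return h.copy()
    p_up = (h - h_min) / (h_max - h_min)     # probability of rounding up to h_max
    up = rng.random(h.shape) < p_up
    # Only h_min, h_max and one bit per entry need to be transmitted.
    return np.where(up, h_max, h_min)
```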

Contributions

Two classes of strategies are proposed to reduce the uplink cost. Structured updates: learn the update directly in a restricted, lower-dimensional parameterization with fewer variables, e.g. low rank or random mask. Sketched updates: learn a full model update, then compress it with quantization, random rotations, and subsampling before sending it to the server.
Experiments with both convolutional and recurrent networks show that these methods can reduce the communication cost by two orders of magnitude.

Experiments

For low-rank updates, 'mode = 25%' means the rank of the update is set to 1/4 of the rank of the full layer transformation; for random mask or sketching, it means all but 25% of the parameters are zeroed out.
Low rank vs. random mask comparison on CIFAR-10:
The random mask performs much better than the sketched update (with subsampling and rotation), although the sketched update gains accuracy faster. Because the sketched update throws away part of the parameters, the random mask ends up with better final accuracy. (This may be part of why dropout works so well.)
