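Address: OpenI community (启智社区): https://openi.pcl.ac.cn/
The 云燧T20 is a second-generation AI training accelerator card for data centers, built on the 邃思2.0 chip. It offers broad model coverage, strong performance, and an open software ecosystem, and supports a wide range of AI training scenarios. It also scales flexibly and provides an industry-leading AI compute cluster solution.
Advantages and features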
https://openi.pcl.ac.cn/Enflame/GCU_PaddlePaddle_Example
ResNet + imagenet_raw
Single card, single epoch
{
    "model": "ResNet50",
    "local_rank": 0,
    "batch_size": 64,
    "epochs": 1,
    "best_acc1": 0.05368589743589743,
    "device": "gcu",
    "skip_steps": 5,
    "early_stop_steps": -1,
    "train_fps_mean": 181.94580085847983,
    "train_fps_min": 171.20650785663634,
    "train_fps_max": 185.50593755138325,
    "training_time": "0:12:37"
}
fps_mean: 181.95
Best acc: 0.05368589743589743
8 cards, single epoch
{
    "model": "ResNet50",
    "local_rank": 0,
    "batch_size": 64,
    "epochs": 1,
    "best_acc1": 0.03766025641025641,
    "device": "gcu",
    "skip_steps": 10,
    "early_stop_steps": -1,
    "train_fps_mean": 132.09731651456303,
    "train_fps_min": 124.26364291218985,
    "train_fps_max": 154.88106976141714,
    "training_time": "0:08:09"
}
fps_mean: 132.09731651456303
Best acc: 0.03766025641025641
Linearity: 72.6% (132.10 / 181.95, the 8-card fps_mean relative to the single-card fps_mean)
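The linearity figure can be reproduced from the two logged fps_mean values. The following is a minimal sketch, assuming that linearity here means the ratio of the 8-card train_fps_mean to the single-card train_fps_mean; the source does not state the formula, but this ratio matches the reported 72.6%. The function name `linearity` is hypothetical.

```python
# fps_mean values copied from the two single-epoch logs above
single_card_fps = 181.94580085847983
eight_card_fps = 132.09731651456303


def linearity(multi_card_fps: float, single_card_fps: float) -> float:
    """Assumed definition: multi-card fps_mean divided by single-card fps_mean."""
    return multi_card_fps / single_card_fps


ratio = linearity(eight_card_fps, single_card_fps)
print(f"linearity: {ratio:.1%}")  # prints "linearity: 72.6%"
```

A value of 100% would mean the logged fps_mean did not drop at all when going from 1 card to 8.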
8 cards, 50 epochs
{
    "model": "ResNet50",
    "local_rank": 0,
    "batch_size": 64,
    "epochs": 50,
    "best_acc1": 0.7596153846153846,
    "device": "gcu",
    "skip_steps": 10,
    "early_stop_steps": -1,
    "train_fps_mean": 136.83746977332163,
    "train_fps_min": 57.52560204784666,
    "train_fps_max": 184.07473928475426,
    "training_time": "1:02:08"
}
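fps_mean: 136.83746977332163
Best acc: 0.7596153846153846
Takeaways
Running these tests shows that with a single epoch, or only a few epochs, multiple cards offer no clear advantage; the benefit of multiple cards only shows with a larger dataset and more epochs.
Overall, PaddlePaddle running resnet+imagenet_raw on GCU feels slower than torch on GCU.
With the same batch_size (64), PaddlePaddle on GCU needed 1 h 02 min (the 50-epoch run above), whereas PyTorch needed only 1 h 18 min 22 s for 100 epochs, so PaddlePaddle on GCU probably still has room for optimization.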
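To make these comparisons easier to read, the reported wall-clock times can be normalized to seconds per epoch. This is a rough sketch: the training_time values come from the logs above, the PyTorch figure is the one quoted in the text (its card count is not stated), and the helper names `parse_hms` and `seconds_per_epoch` are hypothetical. Wall-clock time may include startup and evaluation, so treat the results as approximate.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string such as '1:02:08'."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def seconds_per_epoch(training_time: str, epochs: int) -> float:
    """Average wall-clock seconds spent per epoch."""
    return parse_hms(training_time).total_seconds() / epochs


# (training_time, epochs) for each run mentioned in this post
runs = {
    "paddle, 1 card, 1 epoch": ("0:12:37", 1),
    "paddle, 8 cards, 1 epoch": ("0:08:09", 1),
    "paddle, 8 cards, 50 epochs": ("1:02:08", 50),
    "pytorch, 100 epochs": ("1:18:22", 100),
}

per_epoch = {name: seconds_per_epoch(t, e) for name, (t, e) in runs.items()}
for name, secs in per_epoch.items():
    print(f"{name}: {secs:.1f} s/epoch")
```

On these numbers, the 50-epoch PaddlePaddle run averages about 74.6 s per epoch against about 47.0 s for the PyTorch run, while the single-epoch 8-card run takes 489 s, which is consistent with fixed overhead dominating short runs.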
Suggestions