QEMU live migration tuning, part 1 (compress and xbzrle)

QEMU itself offers a rich set of tuning options for live migration, which can be listed from the QEMU monitor:

(qemu) info migrate_capabilities
xbzrle: off
rdma-pin-all: off
auto-converge: off
zero-blocks: off
compress: off
events: off
postcopy-ram: off
x-colo: off
release-ram: off
block: off
return-path: off
pause-before-switchover: off
x-multifd: off
dirty-bitmaps: off
postcopy-blocktime: off
late-block-activate: off

Some of these options, such as postcopy-ram and compress, are fairly well known. After searching for documentation on the others, I found that some have almost no material available at all, so the only way to understand them is to read the source code. In the posts that follow I will try these options one by one, study how they work, and measure how much they actually improve migration.


1. compress

With the compress capability enabled, the source host compresses the RAM data before sending it. In my tests the compression ratio varied widely, from 10% to 80%; when bandwidth is the bottleneck, this noticeably improves migration efficiency. Compression, however, costs extra CPU and takes time of its own, so when bandwidth is plentiful, compressing before transfer can actually make the live migration take longer.

Both compression and decompression can run multi-threaded to speed things up; the defaults are 8 compression threads and 2 decompression threads. Because the algorithm used is zlib, whose decompression is roughly four times faster than its compression, a reasonable rule of thumb when tuning the thread counts is to give compression about four times as many threads as decompression.

The compression level trades ratio against speed: level 0 means no compression, level 1 is the fastest but compresses the least, and level 9 compresses the most but is correspondingly the slowest.

To simulate memory usage inside the guest, I used this script:

[root@localhost ~]# cat make_cache_2.2G.sh
#!/bin/bash -x
mount -t ramfs ramfs z/
cp 800m_file z/1
cp 800m_file z/2
[root@localhost ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           4.9G        180M        2.5G         11M        2.2G        4.0G
Swap:          4.0G          0B        4.0G

After copying two 800 MB files into a ramfs, buff/cache in the memory report rises to 2.2G.


Configuration in the QEMU monitor:

On the source host:

(qemu) migrate_set_capability compress on
(qemu) migrate_set_parameter compress-threads 1
(qemu) migrate_set_parameter compress-level 1

On the destination host:

(qemu) migrate_set_capability compress on
(qemu) migrate_set_parameter decompress-threads 2

Migration results (no compression vs. compress on, 8 compression threads, 2 decompression threads, level 1):

                        no compress    compressed
total time (msec):      33172          37697     (13.6% longer)
downtime (msec):        131            17        (87% shorter)
transferred ram (kB):   3728602        3464038   (7.1% less)
throughput (mbps):      921.04         752.96    (18.2% lower)
total ram (kB):         5374400        5374400

N.B. On this fast link, compression reduced downtime and the amount of data transferred, but increased the total migration time.


2. xbzrle

Xor Based Zero Run Length Encoding

In short, this technique uses XOR to find the parts of a memory page that have changed (without xbzrle, the entire page is retransmitted every time it is dirtied), encodes the difference compactly, and sends only that to the destination. This reduces the amount of data transferred, letting the dirty page count fall quickly to the point where the VM can be paused and the remainder migrated in one go. It helps enormously with memory-write-intensive workloads: without this option, frequent guest memory writes keep the number of dirty pages hovering at the same order of magnitude, so the migration loops forever and never reaches the phase where the VM is paused and the remaining dirty pages are copied across.

To find the changed parts, the source host keeps the previously sent version of each page in a local cache for comparison. The size of this cache affects the cache hit rate; I have not studied that in depth yet.

For this test I wrote a small program that reads and writes memory.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    long int buf_length;
    long int buf_num;
    char c[100];
    buf_length = 4096;
    int compare_result;


    printf("Enter memory you want to test: ");
    fgets(c, sizeof(c), stdin);   /* gets() is unsafe and removed in C11 */
    c[strcspn(c, "\n")] = '\0';   /* strip the trailing newline */
    if (strcmp(c,"128m")==0)
        compare_result = 1;
    else if(strcmp(c,"256m")==0)
        compare_result = 2;
    else if(strcmp(c,"512m")==0)
        compare_result = 3;
    else if(strcmp(c,"1g")==0)
        compare_result = 4;
    else if(strcmp(c,"2g")==0)
        compare_result = 5;
    else
        compare_result = 0;

    switch(compare_result)
        {
        case 1:
                buf_num = 32768;
                printf("start generate 128M memory r/w load, ctrl+c to quit. \n");
                break;
        case 2:
                buf_num = 65536;
                printf("start generate 256M memory r/w load, ctrl+c to quit. \n");
                break;
        case 3:
                buf_num = 131072;
                printf("start generate 512M memory r/w load, ctrl+c to quit. \n");
                break;
        case 4:
                buf_num = 262144;
                printf("start generate 1G memory r/w load, ctrl+c to quit. \n");
                break;
        case 5:
                buf_num = 524288;
                printf("start generate 2G memory r/w load, ctrl+c to quit. \n");
                break;
        default:
                buf_num = 0;
                printf("please input one of the following sizes: 128m, 256m, 512m, 1g, 2g. ");
                break;
        }

    if(buf_num!=0){
        printf("use free -h to monitor ");
        char *buf = (char *) calloc(buf_num, buf_length);
        while (1) {
                long int i;
                for (i = 0; i < buf_num * 4 ; i++) {
                 buf[i * buf_length / 4 ]++;
                }
                printf(".");
        }
    }
    else{
        printf("please input correct size. \n");
    }
}

Depending on the input, it allocates between 128 MB and 2 GB of memory and continuously reads and writes it.

Configuration in the QEMU monitor:

On the source host:

(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 256m

The default cache size is 64 MB. It is best to configure a cache size larger than the memory being written; otherwise the xbzrle cache miss rate will be very high and the migration may never converge.

Without xbzrle, I started the 128 MB memory workload in the guest and then began the live migration. 11 GB of memory had already been transferred and the migration still had not finished:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: off
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
Migration status: active
total time: 99833 milliseconds
expected downtime: 4872 milliseconds
setup: 4 milliseconds
transferred ram: 11202151 kbytes
throughput: 891.65 mbps
remaining ram: 240188 kbytes
total ram: 5374400 kbytes
duplicate: 1258520 pages
skipped: 0 pages
normal: 2792316 pages
normal bytes: 11169264 kbytes
dirty sync count: 23
page size: 4 kbytes
multifd bytes: 0 kbytes
dirty pages rate: 31494 pages

With xbzrle enabled, the migration completes very quickly:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: off
capabilities: xbzrle: on rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
Migration status: completed
total time: 6405 milliseconds
downtime: 16 milliseconds
setup: 4 milliseconds
transferred ram: 601813 kbytes
throughput: 770.35 mbps
remaining ram: 0 kbytes
total ram: 5374400 kbytes
duplicate: 1246173 pages
skipped: 0 pages
normal: 146870 pages
normal bytes: 587480 kbytes
dirty sync count: 8
page size: 4 kbytes
multifd bytes: 0 kbytes
cache size: 268435456 bytes
xbzrle transferred: 2232 kbytes
xbzrle pages: 87614 pages
xbzrle cache miss: 46297
xbzrle cache miss rate: 15.33
xbzrle overflow : 0

Note that info migrate now reports several extra xbzrle-related fields; their meanings are self-explanatory from the names, so I will not go through them here.

