Balancing PG distribution with python-crush

Author: Yang Honggang

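When Ceph creates a pool with default settings, its PGs are often spread quite unevenly across the OSDs. Some OSDs end up busy while others sit nearly idle, so the cluster cannot deliver its full aggregate performance.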

This article uses the EC pool rgwecpool as an example to show how to spread a pool's PGs evenly across the OSDs. The example runs on Jewel (v10.2.2).

On upstream master, the mgr balancer plugin already adjusts the PG distribution automatically (https://www.spinics.net/lists/ceph-devel/msg37730.html).

1. Install the python-crush tool

$ git clone https://github.com/yanghonggang/python-crush.git
$ cd python-crush
$ git checkout -b v1.0.38
$ python setup.py bdist_wheel
$ pip install dist/crush-1.0.38.dev4-cp27-cp27mu-linux_x86_64.whl

2. Evaluate rgwecpool

2.1 Find the pool id

  # ceph osd pool ls detail --cluster xtao
  ...
  pool 185 'testpool' replicated size 3 min_size 1 crush_ruleset 6 object_hash rjenkins pg_num 256 pgp_num 256 last_change 4996 flags hashpspool stripe_width 0
  pool 186 'rgwecpool' erasure size 4 min_size 3 crush_ruleset 4 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 5001 flags hashpspool stripe_width 4128
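The parameters passed to `crush analyze` in 2.2 come straight from this line: `--pool` is the pool id, `--pg-num`/`--pgp-num` are pg_num/pgp_num, and `--replication-count` is the pool size (k+m for an EC pool, 4 here). Below is a minimal Python sketch of that mapping, assuming the jewel line format shown above; `parse_pool_line` and `analyze_args` are hypothetical helpers, not part of python-crush. The CRUSH rule name does not appear in this line (only `crush_ruleset 4`), so it is passed in by hand; `ceph osd crush rule dump` lists rule names.

```python
import re

# Hypothetical helper: extract the fields `crush analyze` needs from one line
# of `ceph osd pool ls detail` (jewel-style output, as shown above).
POOL_RE = re.compile(
    r"^pool (?P<id>\d+) '(?P<name>[^']+)' (?P<type>\w+) "
    r"size (?P<size>\d+) .*?crush_ruleset (?P<ruleset>\d+) .*?"
    r"pg_num (?P<pg_num>\d+) pgp_num (?P<pgp_num>\d+)"
)


def parse_pool_line(line):
    """Return a dict of pool fields, or None if the line is not a pool entry."""
    m = POOL_RE.search(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    for key in ("id", "size", "ruleset", "pg_num", "pgp_num"):
        info[key] = int(info[key])
    return info


def analyze_args(info, rule_name, crushmap):
    """Build the argv for `crush analyze`; the rule *name* is supplied by the caller."""
    return [
        "crush", "analyze",
        "--rule", rule_name,
        "--type", "device",
        "--replication-count", str(info["size"]),  # EC pool: size = k + m
        "--pool", str(info["id"]),
        "--pg-num", str(info["pg_num"]),
        "--pgp-num", str(info["pgp_num"]),
        "--crushmap", crushmap,
    ]


line = ("pool 186 'rgwecpool' erasure size 4 min_size 3 crush_ruleset 4 "
        "object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 5001 "
        "flags hashpspool stripe_width 4128")
info = parse_pool_line(line)
print(" ".join(analyze_args(info, "rgwecpool", "crushmap-hehe.json")))
# crush analyze --rule rgwecpool --type device --replication-count 4 --pool 186 --pg-num 1024 --pgp-num 1024 --crushmap crushmap-hehe.json
```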

2.2 Evaluate the PG distribution

  # ceph osd crush dump --cluster xtao > crushmap-hehe.json
  # crush analyze --rule rgwecpool --type device  --replication-count 4 --pool 186 --pg-num 1024 --pgp-num 1024 --crushmap crushmap-hehe.json
          ~id~  ~weight~  ~PGs~  ~over/under filled %~
  ~name~                                              
  osd.3      3       6.0    101                  23.29
  osd.45    45       6.0     95                  15.97
  osd.57    57       6.0     94                  14.75
  osd.16    16       6.0     92                  12.30
  osd.30    30       6.0     92                  12.30
  osd.32    32       6.0     91                  11.08
  osd.33    33       6.0     90                   9.86
  osd.41    41       6.0     90                   9.86
  osd.42    42       6.0     88                   7.42
  osd.19    19       6.0     87                   6.20
  osd.35    35       6.0     87                   6.20
  osd.44    44       6.0     86                   4.98
  osd.23    23       6.0     86                   4.98
  osd.21    21       6.0     86                   4.98
  osd.50    50       6.0     85                   3.76
  osd.5      5       6.0     85                   3.76
  osd.31    31       6.0     85                   3.76
  osd.58    58       6.0     85                   3.76
  osd.34    34       6.0     84                   2.54
  osd.10    10       6.0     83                   1.32
  osd.4      4       6.0     83                   1.32
  osd.51    51       6.0     83                   1.32
  osd.43    43       6.0     82                   0.10
  osd.40    40       6.0     82                   0.10
  osd.39    39       6.0     82                   0.10
  osd.38    38       6.0     82                   0.10
  osd.20    20       6.0     82                   0.10
  osd.2      2       6.0     81                  -1.12
  osd.22    22       6.0     81                  -1.12
  osd.56    56       6.0     81                  -1.12
  osd.7      7       6.0     80                  -2.34
  osd.11    11       6.0     80                  -2.34
  osd.28    28       6.0     80                  -2.34
  osd.18    18       6.0     79                  -3.56
  osd.29    29       6.0     78                  -4.79
  osd.8      8       6.0     78                  -4.79
  osd.14    14       6.0     77                  -6.01
  osd.47    47       6.0     77                  -6.01
  osd.55    55       6.0     76                  -7.23
  osd.46    46       6.0     76                  -7.23
  osd.27    27       6.0     76                  -7.23
  osd.6      6       6.0     75                  -8.45
  osd.53    53       6.0     74                  -9.67
  osd.9      9       6.0     74                  -9.67
  osd.26    26       6.0     73                 -10.89
  osd.17    17       6.0     73                 -10.89
  osd.54    54       6.0     72                 -12.11
  osd.52    52       6.0     71                 -13.33
  osd.15    15       6.0     69                 -15.77
  osd.59    59       6.0     67                 -18.21
  
  Worst case scenario if a host fails:
  
          ~over filled %~
  ~type~                 
  device            23.29
  host               2.54
  root               0.00
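The `~over/under filled %~` column can be reproduced by comparing each OSD's PG count with its weight-proportional share of all PG placements. Here 1024 PGs × 4 shards = 4096 placements are spread over the 50 OSDs of weight 6.0 listed above (total weight 300), so each OSD should receive 81.92. This sketch is an inference from the numbers in the table, not code taken from python-crush; the function names are hypothetical.

```python
def expected_pgs(weight, total_weight, pg_num, replication_count):
    """Expected PG placements on one device, proportional to its CRUSH weight."""
    return pg_num * replication_count * weight / total_weight


def fill_percent(pgs, expected):
    """Percentage by which a device is over (+) or under (-) its expected share."""
    return (pgs - expected) / expected * 100


# Values from the analysis above: 50 OSDs of weight 6.0, 1024 PGs, 4 shards each.
exp = expected_pgs(weight=6.0, total_weight=50 * 6.0, pg_num=1024, replication_count=4)
print(round(exp, 2))                      # 81.92
print(round(fill_percent(101, exp), 2))   # 23.29  (osd.3)
print(round(fill_percent(67, exp), 2))    # -18.21 (osd.59)
```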

2.3 Check the actual PG distribution in the cluster

  # ceph osd df --cluster xtao  // matches the crush analyze results
  ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS 
  ... 
  50 6.00000  1.00000 5581G   151M 5581G 0.00 0.83  85 
  51 6.00000  1.00000 5581G   135M 5581G 0.00 0.74  83 
  52 6.00000  1.00000 5581G   139M 5581G 0.00 0.76  71 
  53 6.00000  1.00000 5581G   136M 5581G 0.00 0.75  74 
  54 6.00000  1.00000 5581G   132M 5581G 0.00 0.72  72 
  55 6.00000  1.00000 5581G   134M 5581G 0.00 0.73  76 
  56 6.00000  1.00000 5581G   138M 5581G 0.00 0.76  81 
  57 6.00000  1.00000 5581G   144M 5581G 0.00 0.79  94 
  58 6.00000  1.00000 5581G   139M 5581G 0.00 0.76  85 
  59 6.00000  1.00000 5581G   133M 5581G 0.00 0.73  67 
  38 6.00000  1.00000 5581G   129M 5581G 0.00 0.71  82 
  39 6.00000  1.00000 5581G   146M 5581G 0.00 0.80  82 
  40 6.00000  1.00000 5581G   133M 5581G 0.00 0.73  82 
  41 6.00000  1.00000 5581G   135M 5581G 0.00 0.74  90 
  42 6.00000  1.00000 5581G   130M 5581G 0.00 0.71  88 
  43 6.00000  1.00000 5581G   140M 5581G 0.00 0.76  82 
  44 6.00000  1.00000 5581G   142M 5581G 0.00 0.78  86 
  45 6.00000  1.00000 5581G   158M 5581G 0.00 0.86  95 
  46 6.00000  1.00000 5581G   142M 5581G 0.00 0.78  76 
  47 6.00000  1.00000 5581G   139M 5581G 0.00 0.76  77 
  26 6.00000  1.00000 5581G   134M 5581G 0.00 0.74  73 
  27 6.00000  1.00000 5581G   125M 5581G 0.00 0.68  76 
  28 6.00000  1.00000 5581G   137M 5581G 0.00 0.75  80 
  29 6.00000  1.00000 5581G   133M 5581G 0.00 0.73  78 
  30 6.00000  1.00000 5581G   145M 5581G 0.00 0.80  92 
  31 6.00000  1.00000 5581G   134M 5581G 0.00 0.73  85 
  32 6.00000  1.00000 5581G   140M 5581G 0.00 0.76  91 
  33 6.00000  1.00000 5581G   145M 5581G 0.00 0.79  90 
  34 6.00000  1.00000 5581G   136M 5581G 0.00 0.74  84 
  35 6.00000  1.00000 5581G   128M 5581G 0.00 0.70  87 
  14 6.00000  1.00000 5581G   141M 5581G 0.00 0.77  77 
  15 6.00000  1.00000 5581G   135M 5581G 0.00 0.74  69 
  16 6.00000  1.00000 5581G   136M 5581G 0.00 0.74  92 
  17 6.00000  1.00000 5581G   136M 5581G 0.00 0.74  73 
  18 6.00000  1.00000 5581G   127M 5581G 0.00 0.70  79 
  19 6.00000  1.00000 5581G   140M 5581G 0.00 0.77  87 
  20 6.00000  1.00000 5581G   148M 5581G 0.00 0.81  82 
  21 6.00000  1.00000 5581G   151M 5581G 0.00 0.83  86 
  22 6.00000  1.00000 5581G   144M 5581G 0.00 0.79  81 
  23 6.00000  1.00000 5581G   128M 5581G 0.00 0.70  86 
   2 6.00000  1.00000 5581G   134M 5581G 0.00 0.74  81 
   3 6.00000  1.00000 5581G   131M 5581G 0.00 0.71 101 
   4 6.00000  1.00000 5581G   130M 5581G 0.00 0.71  83 
   5 6.00000  1.00000 5581G   125M 5581G 0.00 0.68  85 
   6 6.00000  1.00000 5581G   136M 5581G 0.00 0.74  75 
   7 6.00000  1.00000 5581G   144M 5581G 0.00 0.79  80 
   8 6.00000  1.00000 5581G   146M 5581G 0.00 0.80  78 
   9 6.00000  1.00000 5581G   136M 5581G 0.00 0.74  74 
  10 6.00000  1.00000 5581G   138M 5581G 0.00 0.75  83 
  11 6.00000  1.00000 5581G   140M 5581G 0.00 0.77  80 
                TOTAL  327T 11008M  327T 0.00          
  MIN/MAX VAR: 0.68/5.04  STDDEV: 0.00
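The real PG counts agree with the `crush analyze` estimate. To check that more rigorously than by eye, the PGS column can be parsed and compared with the analysis table. This is a minimal sketch assuming the nine-column jewel layout shown here (ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS). `parse_osd_df` and `mismatches` are hypothetical helpers, and the sample reuses three rows from the listing above.

```python
def parse_osd_df(text):
    """Map OSD id -> (crush weight, PG count) from plain `ceph osd df` output.

    Assumes the jewel column layout: ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS.
    Header, '...', TOTAL and MIN/MAX VAR lines are skipped.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly nine columns and start with a numeric OSD id.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        result[int(fields[0])] = (float(fields[1]), int(fields[8]))
    return result


def mismatches(actual, predicted):
    """Return {osd_id: (actual_pgs, predicted_pgs)} for OSDs whose counts differ."""
    return {
        osd: (actual[osd][1], pgs)
        for osd, pgs in predicted.items()
        if osd in actual and actual[osd][1] != pgs
    }


sample = """\
ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS
50 6.00000  1.00000 5581G   151M 5581G 0.00 0.83  85
59 6.00000  1.00000 5581G   133M 5581G 0.00 0.73  67
 3 6.00000  1.00000 5581G   131M 5581G 0.00 0.71 101
              TOTAL  327T 11008M  327T 0.00
"""
df = parse_osd_df(sample)
# Predicted counts for the same OSDs, copied from the `crush analyze` table in 2.2.
predicted = {50: 85, 59: 67, 3: 101}
print(mismatches(df, predicted))   # {} : no differences
```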

2.4 Balance the PG distribution by adjusting bucket weights

  // Make sure the cluster is not in HEALTH_ERR state
  # ceph report --cluster xtao > report.json
  #  crush optimize --crushmap report.json --out-path op.crush  --pg-num 1024 --pgp-num 1024 --pool 186 --rule rgwecpool --out-format crush --out-version jewel --type device
  2017-12-08 15:21:36,796 argv = optimize --crushmap report.json --out-path op.crush --pg-num 1024 --pgp-num 1024 --pool 186 --rule rgwecpool --out-format txt --out-version jewel --type device --replication-count=4 --pg-num=1024 --pgp-num=1024 --rule=rgwecpool --out-version=j --no-positions --choose-args=186
  2017-12-08 15:21:36,858 hdd optimizing
  2017-12-08 15:21:44,447 hdd wants to swap 246 PGs
  2017-12-08 15:21:44,487 xt5-hdd optimizing
  2017-12-08 15:21:44,497 xt4-hdd optimizing
  2017-12-08 15:21:44,507 xt3-hdd optimizing
  2017-12-08 15:21:44,521 xt2-hdd optimizing
  2017-12-08 15:21:44,530 xt1-hdd optimizing
  2017-12-08 15:21:52,316 xt1-hdd wants to swap 34 PGs
  2017-12-08 15:21:53,629 xt2-hdd wants to swap 31 PGs
  2017-12-08 15:21:53,862 xt5-hdd wants to swap 43 PGs
  2017-12-08 15:22:00,223 xt3-hdd wants to swap 44 PGs
  2017-12-08 15:23:52,603 xt4-hdd wants to swap 30 PGs
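Adjusting weights helps because CRUSH places PGs pseudo-randomly. Equal weights give equal PG counts only on average, and with about 80 PGs per OSD some OSDs naturally land well above the mean while others fall well below it. Nudging the weights of overfull and underfull devices moves individual PGs until the counts even out. The sketch below is a toy model of that idea and is not python-crush's algorithm. It assumes a single replica, ten equal-capacity devices, SHA-256 in place of CRUSH's rjenkins hash, and a straw2-style draw in which the largest ln(u)/weight wins. The greedy step size and stopping rule are arbitrary choices.

```python
import hashlib
import math


def draw_table(num_pgs, num_devices):
    """Precompute ln(u) for every (pg, device) pair, with u uniform in (0, 1].

    SHA-256 stands in for CRUSH's rjenkins hash (an assumption of this toy).
    """
    table = []
    for pg in range(num_pgs):
        row = []
        for dev in range(num_devices):
            digest = hashlib.sha256(f"{pg}:{dev}".encode()).digest()
            u = (int.from_bytes(digest[:8], "big") + 1) / 2**64
            row.append(math.log(u))
        table.append(row)
    return table


def place(table, weights):
    """straw2-style choice: each PG goes to the device with the largest ln(u)/w."""
    counts = [0] * len(weights)
    for row in table:
        chosen = max(range(len(weights)), key=lambda d: row[d] / weights[d])
        counts[chosen] += 1
    return counts


def worst_deviation(counts, target):
    """Largest |actual - expected| / expected in %, where expected comes from
    the original (target) weights, not the adjusted ones."""
    total_pgs, total_w = sum(counts), sum(target)
    worst = 0.0
    for c, t in zip(counts, target):
        expected = total_pgs * t / total_w
        worst = max(worst, abs(c - expected) / expected * 100)
    return worst


def optimize(table, target, steps=50, delta=0.05):
    """Greedy toy optimizer: scale the most overfilled device's weight by
    (1 - delta) and the most underfilled one's by (1 + delta); keep the change
    only if the worst deviation improves, otherwise halve delta and retry."""
    weights = list(target)
    counts = place(table, weights)
    best = worst_deviation(counts, target)
    for _ in range(steps):
        ratio = [c / t for c, t in zip(counts, target)]
        over, under = ratio.index(max(ratio)), ratio.index(min(ratio))
        trial = list(weights)
        trial[over] *= 1 - delta
        trial[under] *= 1 + delta
        trial_counts = place(table, trial)
        score = worst_deviation(trial_counts, target)
        if score < best:
            weights, counts, best = trial, trial_counts, score
        else:
            delta /= 2
    return weights, counts, best


table = draw_table(num_pgs=1000, num_devices=10)
target = [1.0] * 10                      # ten equal-capacity devices
before = place(table, target)
weights, after, score = optimize(table, target)
print("before:", before, f"worst {worst_deviation(before, target):.1f}%")
print("after: ", after, f"worst {score:.1f}%")
print("weights:", [round(w, 3) for w in weights])
```

As in the cluster above, the balanced result is reached with weights that differ from the devices' real (equal) capacities. The real tool works on the full hierarchy with multiple replicas.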

2.5 Import the modified map

  // Back up the cluster's current crushmap
  # ceph osd getcrushmap --cluster xtao > crushmap.bak
  got crush map from osdmap epoch 5003
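  // If crush optimize wrote a text map (--out-format txt, as in the logged argv in 2.4),
  // compile it first; with --out-format crush, op.crush is already a binary map and can
  // be passed to setcrushmap directly
  # crushtool -c op.crush -o new.bin
  # ceph osd setcrushmap -i new.bin --cluster xtao
  set crush map
  // To roll back, re-import the backup: ceph osd setcrushmap -i crushmap.bak --cluster xtao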

2.6 Check the result

  # ceph -s --cluster xtao
      cluster 4acceaa6-136f-11e7-9e17-ac1f6b1196ad
       health HEALTH_OK
       monmap e3: 3 mons at {xt1=192.168.10.1:6789/0,xt2=192.168.10.2:6789/0,xt3=192.168.10.3:6789/0}
              election epoch 21560, quorum 0,1,2 xt1,xt2,xt3
       osdmap e5007: 60 osds: 60 up, 60 in
              flags sortbitwise
        pgmap v3248856: 1760 pgs, 17 pools, 215 kB data, 3338 objects
              10424 MB used, 327 TB / 327 TB avail
                  1760 active+clean
  # ceph osd df --cluster xtao
  ID WEIGHT  REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS 
  .... 
  50 5.83199  1.00000 5581G   141M 5581G 0.00 0.82  81 
  51 6.00600  1.00000 5581G   126M 5581G 0.00 0.73  82 
  52 6.87099  1.00000 5581G   129M 5581G 0.00 0.74  82 
  53 6.32100  1.00000 5581G   127M 5581G 0.00 0.74  82 
  54 6.37900  1.00000 5581G   121M 5581G 0.00 0.70  82 
  55 6.05899  1.00000 5581G   123M 5581G 0.00 0.71  82 
  56 6.00000  1.00000 5581G   128M 5581G 0.00 0.74  82 
  57 4.90799  1.00000 5581G   134M 5581G 0.00 0.77  82 
  58 5.00699  1.00000 5581G   130M 5581G 0.00 0.75  82 
  59 6.61899  1.00000 5581G   123M 5581G 0.00 0.71  82 
  38 6.17099  1.00000 5581G   119M 5581G 0.00 0.69  82 
  39 6.00000  1.00000 5581G   136M 5581G 0.00 0.78  82 
  40 6.16800  1.00000 5581G   123M 5581G 0.00 0.71  81 
  41 5.49399  1.00000 5581G   125M 5581G 0.00 0.72  81 
  42 6.00000  1.00000 5581G   120M 5581G 0.00 0.69  81 
  43 6.28400  1.00000 5581G   131M 5581G 0.00 0.76  81 
  44 5.93999  1.00000 5581G   133M 5581G 0.00 0.77  82 
  45 5.53699  1.00000 5581G   148M 5581G 0.00 0.85  83 
  46 6.34799  1.00000 5581G   129M 5581G 0.00 0.74  84 
  47 6.05899  1.00000 5581G   129M 5581G 0.00 0.75  82 
  26 6.71700  1.00000 5581G   124M 5581G 0.00 0.72  82 
  27 6.63899  1.00000 5581G   116M 5581G 0.00 0.67  80 
  28 6.21700  1.00000 5581G   127M 5581G 0.00 0.73  82 
  29 6.36800  1.00000 5581G   122M 5581G 0.00 0.71  82 
  30 5.14200  1.00000 5581G   137M 5581G 0.00 0.79  82 
  31 5.97299  1.00000 5581G   124M 5581G 0.00 0.72  81 
  32 5.59200  1.00000 5581G   130M 5581G 0.00 0.75  83 
  33 5.64899  1.00000 5581G   130M 5581G 0.00 0.75  82 
  34 5.93999  1.00000 5581G   126M 5581G 0.00 0.73  83 
  35 5.76399  1.00000 5581G   118M 5581G 0.00 0.68  82 
  14 6.32700  1.00000 5581G   131M 5581G 0.00 0.75  80 
  15 6.63899  1.00000 5581G   126M 5581G 0.00 0.73  82 
  16 5.42599  1.00000 5581G   126M 5581G 0.00 0.73  82 
  17 6.69398  1.00000 5581G   127M 5581G 0.00 0.73  82 
  18 6.22299  1.00000 5581G   118M 5581G 0.00 0.68  82 
  19 5.59200  1.00000 5581G   130M 5581G 0.00 0.75  82 
  20 5.93999  1.00000 5581G   139M 5581G 0.00 0.80  82 
  21 5.31898  1.00000 5581G   141M 5581G 0.00 0.82  82 
  22 6.12599  1.00000 5581G   134M 5581G 0.00 0.78  82 
  23 5.71300  1.00000 5581G   118M 5581G 0.00 0.68  83 
   2 6.60899  1.00000 5581G   124M 5581G 0.00 0.72  82 
   3 4.76199  1.00000 5581G   122M 5581G 0.00 0.70  82 
   4 5.76399  1.00000 5581G   120M 5581G 0.00 0.69  83 
   5 5.88100  1.00000 5581G   116M 5581G 0.00 0.67  82 
   6 6.11499  1.00000 5581G   126M 5581G 0.00 0.73  84 
   7 6.00000  1.00000 5581G   134M 5581G 0.00 0.77  82 
   8 6.20900  1.00000 5581G   131M 5581G 0.00 0.76  81 
   9 6.55099  1.00000 5581G   126M 5581G 0.00 0.73  81 
  10 6.10999  1.00000 5581G   122M 5581G 0.00 0.71  82 
  11 6.00000  1.00000 5581G   130M 5581G 0.00 0.75  81 
                TOTAL  327T 10424M  327T 0.00          
  MIN/MAX VAR: 0.67/5.27  STDDEV: 0.00
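To compare the two `ceph osd df` listings side by side, the sketch below copies the PGS column from each (same OSD order) and summarizes the spread. Worst over- and under-fill are measured against the 81.92 expected placements per OSD derived from the 2.2 analysis (4096 placements over 50 equally weighted OSDs). `summary` is a hypothetical helper.

```python
from statistics import pstdev

# PGS column of `ceph osd df` before (2.3) and after (2.6) importing the new map,
# in listing order: osd.50-59, 38-47, 26-35, 14-23, 2-11.
before = [85, 83, 71, 74, 72, 76, 81, 94, 85, 67,
          82, 82, 82, 90, 88, 82, 86, 95, 76, 77,
          73, 76, 80, 78, 92, 85, 91, 90, 84, 87,
          77, 69, 92, 73, 79, 87, 82, 86, 81, 86,
          81, 101, 83, 85, 75, 80, 78, 74, 83, 80]
after = [81, 82, 82, 82, 82, 82, 82, 82, 82, 82,
         82, 82, 81, 81, 81, 81, 82, 83, 84, 82,
         82, 80, 82, 82, 82, 81, 83, 82, 83, 82,
         80, 82, 82, 82, 82, 82, 82, 82, 82, 83,
         82, 82, 83, 82, 84, 82, 81, 81, 82, 81]

EXPECTED = 1024 * 4 / 50   # 4096 PG shards over 50 equally weighted OSDs = 81.92


def summary(counts):
    """Min, max, population stddev and worst over/under-fill (%) of a PG list."""
    return {
        "min": min(counts),
        "max": max(counts),
        "stddev": round(pstdev(counts), 2),
        "over%": round((max(counts) - EXPECTED) / EXPECTED * 100, 2),
        "under%": round((min(counts) - EXPECTED) / EXPECTED * 100, 2),
    }


print("before:", summary(before))
print("after: ", summary(after))
```

The worst overfill drops from 23.29% (101 PGs on osd.3) to 2.54% (84 PGs on osd.46 and osd.6), and the range narrows from 67-101 to 80-84.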

  # ceph osd tree --cluster xtao
  ID  WEIGHT    TYPE NAME            UP/DOWN REWEIGHT PRIMARY-AFFINITY 
  -43  30.00000 root test                                              
  -44   6.00000     host xt5-test                                      
   49   6.00000         osd.49            up  1.00000          1.00000 
  -45   6.00000     host xt4-test                                      
   37   6.00000         osd.37            up  1.00000          1.00000 
  -46   6.00000     host xt3-test                                      
   25   6.00000         osd.25            up  1.00000          1.00000 
  -47   6.00000     host xt2-test                                      
   13   6.00000         osd.13            up  1.00000          1.00000 
  -48   6.00000     host xt1-test                                      
    1   6.00000         osd.1             up  1.00000          1.00000 
  -37         0 root yhg                                               
  -38         0     host xt5-yhg                                       
  -39         0     host xt4-yhg                                       
  -40         0     host xt3-yhg                                       
  -41         0     host xt2-yhg                                       
  -42         0     host xt1-yhg                                       
  -36  30.00000 root meta                                              
  -35   6.00000     host xt5-meta                                      
   48   6.00000         osd.48            up  1.00000          1.00000 
  -34   6.00000     host xt4-meta                                      
   36   6.00000         osd.36            up  1.00000          1.00000 
  -33   6.00000     host xt3-meta                                      
   24   6.00000         osd.24            up  1.00000          1.00000 
  -32   6.00000     host xt2-meta                                      
   12   6.00000         osd.12            up  1.00000          1.00000 
  -31   6.00000     host xt1-meta                                      
    0   6.00000         osd.0             up  1.00000          1.00000 
  -30         0 root default                                           
  -29         0     host xt7-default                                   
  -28         0     host xt6-default                                   
  -27         0     host xt5-default                                   
  -26         0     host xt4-default                                   
  -25         0     host xt3-default                                   
  -24         0     host xt2-default                                   
  -23         0     host xt1-default                                   
  -22         0     host xt9-default                                   
  -21         0     host xt8-default                                   
  -20         0 root ssd                                               
  -19         0     host xt7-ssd                                       
  -18         0     host xt6-ssd                                       
  -17         0     host xt5-ssd                                       
  -16         0     host xt4-ssd                                       
  -15         0     host xt3-ssd                                       
  -14         0     host xt2-ssd                                       
  -13         0     host xt1-ssd                                       
  -12         0     host xt9-ssd                                       
  -11         0     host xt8-ssd                                       
  -10 299.99997 root hdd                                               
   -9         0     host xt7-hdd                                       
   -8         0     host xt6-hdd                                       
   -7  64.66899     host xt5-hdd                                       
   50   5.83199         osd.50            up  1.00000          1.00000 
   51   6.00600         osd.51            up  1.00000          1.00000 
   52   6.87099         osd.52            up  1.00000          1.00000 
   53   6.32100         osd.53            up  1.00000          1.00000 
   54   6.37900         osd.54            up  1.00000          1.00000 
   55   6.05899         osd.55            up  1.00000          1.00000 
   56   6.00000         osd.56            up  1.00000          1.00000 
   57   4.90799         osd.57            up  1.00000          1.00000 
   58   5.00699         osd.58            up  1.00000          1.00000 
   59   6.61899         osd.59            up  1.00000          1.00000 
   -6  55.80899     host xt4-hdd                                       
   38   6.17099         osd.38            up  1.00000          1.00000 
   39   6.00000         osd.39            up  1.00000          1.00000 
   40   6.16800         osd.40            up  1.00000          1.00000 
   41   5.49399         osd.41            up  1.00000          1.00000 
   42   6.00000         osd.42            up  1.00000          1.00000 
   43   6.28400         osd.43            up  1.00000          1.00000 
   44   5.93999         osd.44            up  1.00000          1.00000 
   45   5.53699         osd.45            up  1.00000          1.00000 
   46   6.34799         osd.46            up  1.00000          1.00000 
   47   6.05899         osd.47            up  1.00000          1.00000 
   -5  55.06900     host xt3-hdd                                       
   26   6.71700         osd.26            up  1.00000          1.00000 
   27   6.63899         osd.27            up  1.00000          1.00000 
   28   6.21700         osd.28            up  1.00000          1.00000 
   29   6.36800         osd.29            up  1.00000          1.00000 
   30   5.14200         osd.30            up  1.00000          1.00000 
   31   5.97299         osd.31            up  1.00000          1.00000 
   32   5.59200         osd.32            up  1.00000          1.00000 
   33   5.64899         osd.33            up  1.00000          1.00000 
   34   5.93999         osd.34            up  1.00000          1.00000 
   35   5.76399         osd.35            up  1.00000          1.00000 
   -4  63.15900     host xt2-hdd                                       
   14   6.32700         osd.14            up  1.00000          1.00000 
   15   6.63899         osd.15            up  1.00000          1.00000 
   16   5.42599         osd.16            up  1.00000          1.00000 
   17   6.69398         osd.17            up  1.00000          1.00000 
   18   6.22299         osd.18            up  1.00000          1.00000 
   19   5.59200         osd.19            up  1.00000          1.00000 
   20   5.93999         osd.20            up  1.00000          1.00000 
   21   5.31898         osd.21            up  1.00000          1.00000 
   22   6.12599         osd.22            up  1.00000          1.00000 
   23   5.71300         osd.23            up  1.00000          1.00000 
   -3  61.29399     host xt1-hdd                                       
    2   6.60899         osd.2             up  1.00000          1.00000 
    3   4.76199         osd.3             up  1.00000          1.00000 
    4   5.76399         osd.4             up  1.00000          1.00000 
    5   5.88100         osd.5             up  1.00000          1.00000 
    6   6.11499         osd.6             up  1.00000          1.00000 
    7   6.00000         osd.7             up  1.00000          1.00000 
    8   6.20900         osd.8             up  1.00000          1.00000 
    9   6.55099         osd.9             up  1.00000          1.00000 
   10   6.10999         osd.10            up  1.00000          1.00000 
   11   6.00000         osd.11            up  1.00000          1.00000 
   -2         0     host xt9-hdd                                       
   -1         0     host xt8-hdd  
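In the new tree, the OSD weights inside each hdd host still add up to about 60. The host weights, however, now range from 55.069 to 64.669 while still summing to about 300 under root hdd, so a host's weight no longer equals the sum of its OSDs' weights. This is consistent with the optimize log in 2.4, which first rebalanced at the hdd (host) level and then inside each host. The small check below uses weights copied from the tree above; `weight_report` is a hypothetical helper.

```python
# Weights copied from the `ceph osd tree` output above (root hdd only).
host_weight = {
    "xt5-hdd": 64.66899, "xt4-hdd": 55.80899, "xt3-hdd": 55.06900,
    "xt2-hdd": 63.15900, "xt1-hdd": 61.29399,
}
osd_weights = {
    "xt5-hdd": [5.83199, 6.00600, 6.87099, 6.32100, 6.37900,
                6.05899, 6.00000, 4.90799, 5.00699, 6.61899],
    "xt4-hdd": [6.17099, 6.00000, 6.16800, 5.49399, 6.00000,
                6.28400, 5.93999, 5.53699, 6.34799, 6.05899],
    "xt3-hdd": [6.71700, 6.63899, 6.21700, 6.36800, 5.14200,
                5.97299, 5.59200, 5.64899, 5.93999, 5.76399],
    "xt2-hdd": [6.32700, 6.63899, 5.42599, 6.69398, 6.22299,
                5.59200, 5.93999, 5.31898, 6.12599, 5.71300],
    "xt1-hdd": [6.60899, 4.76199, 5.76399, 5.88100, 6.11499,
                6.00000, 6.20900, 6.55099, 6.10999, 6.00000],
}


def weight_report(host_weight, osd_weights):
    """Return (host, host weight, sum of its OSD weights) for each host."""
    return [(h, w, sum(osd_weights[h])) for h, w in host_weight.items()]


for host, w, osd_sum in weight_report(host_weight, osd_weights):
    print(f"{host}: host weight {w:8.5f}, sum of OSD weights {osd_sum:8.5f}")
print(f"root hdd: sum of host weights {sum(host_weight.values()):.5f}")
```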


