too many PGs per OSD (*** > max 250): tracing the code

The warning looked simple at first glance, so I went straight to the source code. In PGMap.cc:

    // TOO_MANY_PGS
    auto max_pg_per_osd = cct->_conf.get_val<uint64_t>("mon_max_pg_per_osd");
    if (num_in && max_pg_per_osd > 0) {
      auto per = sum_pg_up / num_in;
      if (per > max_pg_per_osd) {
        ostringstream ss;
        ss << "too many PGs per OSD (" << per
           << " > max " << max_pg_per_osd << ")";
        checks->add("TOO_MANY_PGS", HEALTH_WARN, ss.str(),
                    per - max_pg_per_osd);
      }
    }
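
To make the arithmetic behind this check concrete, here is a small standalone sketch. The cluster numbers in it (8 pools of 512 PGs at replica size 3 on 40 "in" OSDs) are made up for illustration and are not from the cluster in this post; sum_pg_up is roughly the total number of PG replicas (pg_num × size summed over all pools), and the check simply averages that over the in OSDs:

    // Illustration only: made-up cluster numbers, not from a real cluster.
    #include <cstdint>
    #include <iostream>

    int main() {
      // Roughly pg_num * size summed over all pools:
      // 8 pools * 512 PGs * 3 replicas = 12288 PG instances.
      uint64_t sum_pg_up = 8 * 512 * 3;
      uint64_t num_in = 40;               // OSDs marked "in"
      uint64_t max_pg_per_osd = 250;      // default mon_max_pg_per_osd

      uint64_t per = sum_pg_up / num_in;  // same division as in PGMap.cc
      std::cout << "PGs per OSD: " << per << "\n";
      if (per > max_pg_per_osd) {
        std::cout << "too many PGs per OSD (" << per
                  << " > max " << max_pg_per_osd << ")\n";
      }
      return 0;
    }

With those numbers, per works out to 307, which is above the default of 250, so the warning fires.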

Naturally, mon_max_pg_per_osd looked like the knob to turn, so I changed it to 1000:

    [mon]
    mon_max_pg_per_osd = 1000

Strangely, it did not take effect. Checking the config:

    # ceph --show-config |grep mon_max_pg
    mon_max_pg_per_osd = 250

Still 250.

Back to the source: in options.cc I found

  Option("mon_max_pg_per_osd", Option::TYPE_UINT, Option::LEVEL_ADVANCED)
 .set_min(1)
 .set_default(250)
 .add_service("mgr")
 .set_description("Max number of PGs per OSD the cluster will allow")
 .set_long_description("If the number of PGs per OSD exceeds this, a "
    "health warning will be visible in `ceph status`.  This is also used "
    "in automated PG management, as the threshold at which some pools' "
    "pg_num may be shrunk in order to enable increasing the pg_num of "
    "others."),

It also turns out the setting has to go under [global] rather than [mon]; see the snippet below.
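
For reference, a minimal ceph.conf fragment for the fix, reusing the target value of 1000 from above (pick whatever limit your cluster actually needs):

    [global]
    mon_max_pg_per_osd = 1000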

The add_service("mgr") line shows that this check is now handled by ceph-mgr; why the option still carries the mon_ prefix is a bit puzzling to me.

After moving the setting and restarting the mgr service, the warning cleared and the problem was solved.

    # ceph --show-config |grep mon_max_pg
    mon_max_pg_per_osd = 1000
