Fixing an OSD that fails to start because of a corrupted pglog

Background

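Because of an unexpected power loss, two OSDs went down in each of the three failure domains that hold the three replicas. The situation was critical and had to be repaired on site to prevent data loss.
Ceph version: 0.94.10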

[Screenshots: pgdown.png, image.png]

The state shown above is the result of two OSDs being down in each of the three failure domains.

Cause:

  In certain cases, the ceph-osd Peering process can run into problems, preventing a PG from becoming active and usable.

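In other words, peering was blocked.
Peering is a very complex process in its own right, and it deserves a separate write-up later if needed.
Note:
Apart from the PGs in the failure domains above, all other PGs went from down back to normal during data recovery. The PGs we need to care about are 2.9a4 and 2.eac. (These two PGs were actually stale right after the power loss; they only went to the down state after our operations staff accidentally brought one OSD back up.)
So the problem we really have to solve is the stale PGs, which means bringing the relevant OSDs back up.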

Solution

Finding the OSD

Commands:
ceph-objectstore-tool --data-path xxx --journal-path xxx --op list-ops
ceph tell query
ceph pg query

In our case the problem was osd.514.

Startup failure log of osd.514:

    -3> 2019-07-09 22:13:40.493486 7f97a6d81880 20 read_log coll 2.9a4_head log_oid 2/9a4//head
    -2> 2019-07-09 22:13:40.493565 7f97a6d81880 20 read_log 231404'10412575 (231404'10412574) modify   2/2d7009a4/rbd_data.d0dbe05a5f008d.0000000000004532/head by client.156138418.0:1761183652 2019-07-04 03:44:14.804189
    -1> 2019-07-09 22:13:40.493582 7f97a6d81880 20 read_log 233319'10412574 (100460'2793103) modify   2/c03f99a4/rbd_data.c2639d7eb9c025.000000000000a802/head by client.232205693.0:3303 2019-07-05 07:43:03.907875
     0> 2019-07-09 22:13:40.495397 7f97a6d81880 -1 osd/PGLog.cc: In function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, const pg_info_t&, std::map&, PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, std::set >*)' thread 7f97a6d81880 time 2019-07-09 22:13:40.493592
osd/PGLog.cc: 911: FAILED assert(last_e.version.version < e.version.version)

 ceph version 0.94.10.1 (c5ce8260cade179b7dd358a340351c4029e239c1)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbdf665]
 2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map, std::allocator > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream, std::allocator >&, std::set, std::allocator >*)+0x1a38) [0x7751e8]
 3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x34f) [0x7f852f]
 4: (OSD::load_pgs()+0xa99) [0x6bd539]
 5: (OSD::init()+0x181a) [0x6c10da]
 6: (main()+0x29dd) [0x64854d]
 7: (__libc_start_main()+0xf5) [0x7f97a411daf5]
 8: /usr/bin/ceph-osd() [0x661f39]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

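The failure occurs in read_log, so we looked at that part of the Ceph code.

[Screenshot: image.png]

This is where the assertion fails. The two read_log lines just before the crash show why: an entry with version counter 10412575 is followed by one with 10412574, so the check last_e.version.version < e.version.version cannot hold.
Root cause:
During the unexpected power loss, the OSD did not write the relevant pglog entries correctly (the spinning disks lost power abruptly, which corrupted part of the pglog).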
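The ordering check behind the failed assertion can be illustrated with a small standalone sketch. It does not use Ceph's own types: `Version`, `first_out_of_order` and `log_from_osd514_is_corrupt` are hypothetical names introduced here, and only the comparison on the version counter mirrors the assert shown in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for an epoch'version pair such as 231404'10412575.
// This is NOT Ceph's eversion_t, only an illustration.
struct Version {
    uint32_t epoch;
    uint64_t version;
};

// Returns the index of the first entry whose version counter is not strictly
// greater than its predecessor's, or -1 if the whole sequence is ordered.
long first_out_of_order(const std::vector<Version>& log) {
    for (std::size_t i = 1; i < log.size(); ++i) {
        if (!(log[i - 1].version < log[i].version)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}

// Feeds in the two versions printed just before the crash of osd.514.
bool log_from_osd514_is_corrupt() {
    std::vector<Version> log = {{231404, 10412575}, {233319, 10412574}};
    return first_out_of_order(log) == 1;
}
```

With the two entries from the log, the second entry is reported as out of order, which matches the point where the OSD aborts.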

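Fix:

Delete the entries left behind by the corrupted pglog

1. Use ceph-kvstore-tool.

The build of this tool that ships with this release has no rm subcommand (it is only supported from the L release, Luminous, onward). Since the tool is backward compatible, we ported the L-release build: copy ceph-kvstore-tool to /usr/bin/ on the target machine, and copy /usr/lib64/ceph/libceph-common.so and /usr/lib64/ceph/libceph-common.so.0 to the required directory.

2. Use ceph-objectstore-tool

Deleting a log entry requires its prefix and key, so we added code to the read-log path used by ceph-objectstore-tool to print that information. The patch is as follows: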

diff --git a/src/os/DBObjectMap.cc b/src/os/DBObjectMap.cc
index b856849..7946763 100644
--- a/src/os/DBObjectMap.cc
+++ b/src/os/DBObjectMap.cc
@@ -250,12 +250,14 @@ int DBObjectMap::DBObjectMapIteratorImpl::init()
 }

 ObjectMap::ObjectMapIterator DBObjectMap::get_iterator(
-  const ghobject_t &oid)
+  const ghobject_t &oid, std::ostream *out)
 {
   MapHeaderLock hl(this, oid);
   Header header = lookup_map_header(hl, oid);
   if (!header)
     return ObjectMapIterator(new EmptyIteratorImpl());
+  if (out)
+    *out << "header seq: " << header_key(header->seq) << std::endl;
   DBObjectMapIterator iter = _get_iterator(header);
   iter->hlock.swap(hl);
   return iter;
diff --git a/src/os/DBObjectMap.h b/src/os/DBObjectMap.h
index de80d6f..7ec43b0 100644
--- a/src/os/DBObjectMap.h
+++ b/src/os/DBObjectMap.h
@@ -219,7 +219,7 @@ public:
  int list_objects(vector<ghobject_t> *objs ///< [out] objects
     );

-  ObjectMapIterator get_iterator(const ghobject_t &oid);
+  ObjectMapIterator get_iterator(const ghobject_t &oid, std::ostream *out = NULL);

   static const string USER_PREFIX;
   static const string XATTR_PREFIX;
diff --git a/src/os/FileStore.cc b/src/os/FileStore.cc
index e0afbd0..bce7d6d 100644
--- a/src/os/FileStore.cc
+++ b/src/os/FileStore.cc
@@ -4747,7 +4747,10 @@ ObjectMap::ObjectMapIterator FileStore::get_omap_iterator(coll_t c,
     if (r < 0)
       return ObjectMap::ObjectMapIterator();
   }
-  return object_map->get_iterator(hoid);
+  ostringstream oss;
+  ObjectMap::ObjectMapIterator oiter = object_map->get_iterator(hoid, &oss);
+  dout(0) << "nyao: " << " " << oss.str() << dendl;
+  return oiter;
 }

 int FileStore::_collection_hint_expected_num_objs(coll_t c, uint32_t pg_num,
diff --git a/src/os/ObjectMap.h b/src/os/ObjectMap.h
index 86f9e3e..27de1ad 100644
--- a/src/os/ObjectMap.h
+++ b/src/os/ObjectMap.h
@@ -150,7 +150,7 @@ public:
     virtual ~ObjectMapIteratorImpl() {}
   };
  typedef ceph::shared_ptr<ObjectMapIteratorImpl> ObjectMapIterator;
-  virtual ObjectMapIterator get_iterator(const ghobject_t &oid) {
+  virtual ObjectMapIterator get_iterator(const ghobject_t &oid, std::ostream *oss = NULL) {
     return ObjectMapIterator();
   }

diff --git a/src/osd/PGLog.cc b/src/osd/PGLog.cc
index a34903f..11fd555 100644
--- a/src/osd/PGLog.cc
+++ b/src/osd/PGLog.cc
@@ -19,7 +19,8 @@
 #include "PG.h"
 #include "SnapMapper.h"
 #include "../include/unordered_map.h"
-
+//#include "os/DBObjectMap.h"
+//#include "os/KeyValueDB.h"
 #define dout_subsys ceph_subsys_osd

 static coll_t META_COLL("meta");
@@ -906,8 +907,10 @@ void PGLog::read_log(ObjectStore *store, coll_t pg_coll,
        pg_log_entry_t e;
        e.decode_with_checksum(bp);
        dout(20) << "read_log " << e << dendl;
-        oss<

 
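This is how we determined the prefix and key. The rebuilt binary was then copied to the target machine. Note that the tool depends on some additional libraries, which have to be copied over as well.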
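To make "prefix" and "key" more concrete, here is a rough sketch of how the two strings might be composed. The layout is an assumption recalled from the Hammer-era source (a `_USER_` prefix wrapped around a zero-padded header sequence number, and a log key of the form epoch.version); it is not taken from this article, so check it against your own source tree and against the output of the patched tool before relying on it. All function names are hypothetical.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// ASSUMPTION: the header key is the sequence number as a 16-digit,
// zero-padded decimal string.
std::string header_key_sketch(uint64_t seq) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.*" PRIu64, 16, seq);
    return std::string(buf);
}

// ASSUMPTION: the omap prefix wraps the header key in "_USER_" markers.
std::string user_prefix_sketch(uint64_t seq) {
    return "_USER_" + header_key_sketch(seq) + "_USER_";
}

// ASSUMPTION: a pglog entry key is "<epoch, 10 digits>.<version, 20 digits>".
std::string log_entry_key_sketch(uint32_t epoch, uint64_t version) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%010" PRIu32 ".%020" PRIu64, epoch, version);
    return std::string(buf);
}
```

Under these assumptions, the entry 233319'10412574 from the log would map to the key 0000233319.00000000000010412574. Which entry actually has to be removed is something only the tool output on the affected OSD can tell you.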
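3. Delete the entries

[Screenshot: image.png]

Dump the relevant keys with the get command beforehand, in case the wrong entry gets deleted.
After the deletion, osd.514 started normally.
4. Query the affected PGs

[Screenshot: image.png]

Following the hint in the query output, mark the relevant OSD as lost:
ceph osd lost  --yes-i-really-mean-it
Restart osd.514, and the cluster returns to normal.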
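Review: reproducing the PG states
pg stale:
stale - The placement group status has not been updated by a ceph-osd, indicating that all nodes storing this placement group may be down.
Prepare a simple three-node Ceph environment.
Step 1: create an rbd pool and write some data into it with fio.
Step 2: stop the relevant OSDs and use ceph-objectstore-tool to remove one of the PGs; this has to be done on all three OSDs.
Step 3: ceph pg deep-scrub  
pg down:
1. Prepare a simple three-node Ceph environment.
Step 1: take one node down.
Step 2: write data to the cluster with fio.
Step 3: take the remaining two nodes down.
Step 4: bring back the node that went down first.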
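Why these four steps end in pg down can be sketched with a deliberately simplified model of peering: a PG cannot go active if some past interval that may have accepted writes has none of its OSDs up now. The `Interval` struct and `pg_can_peer` function are hypothetical. The field name `maybe_went_rw` is borrowed from Ceph's interval bookkeeping, but the real logic considers much more than this.

```cpp
#include <cassert>
#include <set>
#include <vector>

// One past interval of a PG's history (simplified).
struct Interval {
    std::set<int> acting;  // OSDs that served the PG during this interval
    bool maybe_went_rw;    // true if writes may have been accepted
};

// Simplified rule: every interval that may have taken writes needs at least
// one of its OSDs up, otherwise the newest data cannot be located.
bool pg_can_peer(const std::vector<Interval>& past, const std::set<int>& up_now) {
    for (const Interval& iv : past) {
        if (!iv.maybe_went_rw) {
            continue;  // nothing could have been written, safe to ignore
        }
        bool any_up = false;
        for (int osd : iv.acting) {
            if (up_now.count(osd) > 0) {
                any_up = true;
                break;
            }
        }
        if (!any_up) {
            return false;  // data from this interval is unreachable
        }
    }
    return true;
}
```

In the scenario above, with one OSD per node numbered 0 to 2, the history is {0,1,2} followed by {1,2} (node 0 down while fio writes). Bringing back only OSD 0 leaves the second interval without a live member, so the model reports that the PG cannot peer. Bringing back OSD 1 or 2 as well resolves it.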
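2. Clock skew can also cause pg down.
Step 1: prepare an environment whose osd tree looks like the one below.

[Screenshot: image.png]

Step 2: set the noout flag: ceph osd set noout
Step 3: take osd.0, osd.2 and osd.3 down.
Step 4: set the clock of host n2 back to 2014: date -s 2014-14-12
Step 5: start osd.0 and osd.2.
Result:

[Screenshot: image.png]

If the cluster recovers once osd.3 is finally started, this shows that the PG can reach agreement as long as two replicas have the correct time.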
