radosgw cache

The rgw cache caches metadata on the radosgw instance side (S3 object data is not cached) to reduce interaction with the backend Ceph cluster.


Early rgw cache

The rgw cache was first added to rgw in this commit:

    commit 3e62d8a21609bb422cec68551ca2198f2325769b                                                                                                                                                                    
    Author: Yehuda Sadeh <[email protected]>                                                       
    Date:   Tue Feb 8 17:19:39 2011 -0800                                                               
                                                                                                        
        rgw: add a cache layer for the backend                                                          
                                                                                                        
    commit 866b161b0ceaf856e79aadfa3e6dcda20cb235cb 

To understand the cache design more clearly, we start from the simplest version and extract the
cache-related code the author added.

    # get the changes
    $ git diff 866b161b0ceaf856e79aadfa3e6dcda20cb235cb 3e62d8a21609bb422cec68551ca2198f2325769b > /tmp/p.patch

Class definitions

    class ObjectCache {
      std::map<string, bufferlist> cache_map;

    public:
      ObjectCache() {}
      int get(std::string& name, bufferlist& bl);
      void put(std::string& name, bufferlist& bl);
    };

    template <class T>
    class RGWCache  : public T
    {
      ObjectCache cache;

      string normal_name(std::string& space, std::string& bucket, std::string& oid) {
        // room for the three parts, two '+' separators and the trailing NUL
        char buf[space.size() + 1 + bucket.size() + 1 + oid.size() + 1];
        // buf must be the first argument; the commit as quoted omitted it:
        //   sprintf("%s+%s+%s", space.c_str(), bucket.c_str(), oid.c_str());
        sprintf(buf, "%s+%s+%s", space.c_str(), bucket.c_str(), oid.c_str());
        return string(buf);
      }

    public:
      RGWCache() {}

      int put_obj_data(std::string& id, std::string& bucket, std::string& obj, const char *data,
                  off_t ofs, size_t len, time_t *mtime);

      int get_obj(void **handle, std::string& bucket, std::string& oid,
                char **data, off_t ofs, off_t end);

      int obj_stat(std::string& bucket, std::string& obj, size_t *psize, time_t *pmtime);
    };

- In class ObjectCache, cache_map stores the cached data

    // name (string) is the key that indexes the contents of bl (bufferlist)
    std::map<string, bufferlist> cache_map;
- ObjectCache has two methods that operate on cache_map

put adds a cache entry; get retrieves one.

- Entry key format

space+bucket+oid
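
The early design amounts to a string-keyed map with get/put plus a helper that builds the key. The following self-contained sketch assumes std::string in place of Ceph's bufferlist. Returning -ENOENT on a miss mirrors the 0.94.5 get() analyzed later; it is not taken from the 2011 commit.

```cpp
#include <cerrno>
#include <map>
#include <string>

// Sketch of the early ObjectCache. Assumption: std::string stands in for
// Ceph's bufferlist so the example builds without Ceph headers.
class ObjectCache {
  std::map<std::string, std::string> cache_map;  // key -> cached payload

public:
  // Copies the cached value into bl and returns 0 on a hit, -ENOENT on a miss.
  int get(const std::string& name, std::string& bl) const {
    auto it = cache_map.find(name);
    if (it == cache_map.end())
      return -ENOENT;
    bl = it->second;
    return 0;
  }

  // Inserts the entry, or overwrites an existing one.
  void put(const std::string& name, const std::string& bl) {
    cache_map[name] = bl;
  }
};

// Builds the key "space+bucket+oid", which is what normal_name() in the commit intends.
std::string normal_name(const std::string& space, const std::string& bucket,
                        const std::string& oid) {
  return space + "+" + bucket + "+" + oid;
}
```

There is no eviction at this stage, so the map grows without bound. The LRU described below addresses that.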

Cache initialization

The rest of this article analyzes the rgw cache in 0.94.5, starting with its initialization.

    // main -> RGWStoreManager::get_storage
    //      -> RGWStoreManager::init_storage_provider
    //      -> RGWRados::initialize // rgw/rgw_rados.h
    //      -> RGWRados::initialize // rgw/rgw_rados.cc
    //      -> RGWCache<RGWRados>::init_rados      
    //         9003     store = new RGWCache<RGWRados>; // create an RGWRados with the cache layer on top

    // rgw/rgw_cache.h 200   int init_rados() { 

Data structure relationships

[Figure 1: relationships between the radosgw cache data structures]

This figure is taken from `RGW Cache class analysis <http://static.oschina.net/uploads/space/2016/0420/112052_Hub7_206258.png>`_

init_rados analysis

    // RGWCache::init_rados -> ObjectCache::set_ctx 
    // 
    // Stores the CephContext used when talking to the backend Ceph cluster, and sets
    // ObjectCache::lru_window from 'rgw cache lru size' in the config (half of that value).
    202     cache.set_ctx(T::cct);

    // RGWRados::init_rados 
    // 
    // Initializes the rados handle to the Ceph cluster and sets up RGWMetadataManager and RGWDataChangesLog.
    203     ret = T::init_rados();
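
The mechanism here is that RGWCache<T> is a class template deriving from the store it wraps. `new RGWCache<RGWRados>` therefore yields an RGWRados whose init_rados first configures the cache and then delegates to the base class. The sketch below models only that ordering. Config, SimpleCache, Backend and CachedStore are hypothetical stand-ins for the CephContext config, ObjectCache, RGWRados and RGWCache, and 10000 is only a sample value.

```cpp
#include <string>

// Hypothetical stand-ins: Config for the CephContext config, SimpleCache for
// ObjectCache, Backend for RGWRados. Only the initialization order is modeled.
struct Config {
  unsigned long rgw_cache_lru_size = 10000;  // sample value
};

class SimpleCache {
public:
  unsigned long lru_window = 0;
  // Like ObjectCache::set_ctx: derive the promotion window from the config.
  void set_ctx(const Config& conf) { lru_window = conf.rgw_cache_lru_size / 2; }
};

class Backend {
public:
  Config conf;
  bool rados_ready = false;
  virtual ~Backend() = default;
  // Stand-in for RGWRados::init_rados (connecting to the cluster, etc.).
  virtual int init_rados() {
    rados_ready = true;
    return 0;
  }
};

// Like RGWCache<T>: inherit from the store and wrap init_rados so the cache
// is configured before the base store initializes itself.
template <class T>
class CachedStore : public T {
public:
  SimpleCache cache;
  int init_rados() override {
    cache.set_ctx(T::conf);
    return T::init_rados();
  }
};
```

Because init_rados is virtual in this sketch, the wrapped version also runs when the object is used through a `Backend*`.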

cache lru

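To give cache_map LRU behavior, ObjectCache adds an lru list and a few related fields. These ensure that the
most recently accessed entries are the ones kept in cache_map.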

ObjectCache has several LRU-related members:

    140 class ObjectCache {                                                                                 
    141   std::map<string, ObjectCacheEntry> cache_map;                                                     
    142   std::list<string> lru; // LRU list of cache_map keys
    143   unsigned long lru_size; // number of entries in the lru list
    144   unsigned long lru_counter; // counts touch_lru calls
    145   unsigned long lru_window; // 1/2*(rgw cache lru size)

touch_lru

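Adds the most recently accessed entry to the tail of the lru list, or moves it there if it is already present. Entries therefore age toward the head, where they are evicted. The function also increments the touch_lru counter.
It evicts old, unaccessed entries from both the lru list and cache_map.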

It is called when a new entry is inserted, or when a key that is already in the list is accessed again.

    // rgw/rgw_cache.cc
    215 void ObjectCache::touch_lru(string& name, ObjectCacheEntry& entry, std::list<string>::iterator& lru_iter)

    // shrink cache_map starting from the head of the lru list until its size is back within rgw cache lru size
    217   while (lru_size > (size_t)cct->_conf->rgw_cache_lru_size) {                                       
    218     list<string>::iterator iter = lru.begin();                                                      
    219     if ((*iter).compare(name) == 0) {                                                               
    220       /*                                                                                            
    221        * if the entry we're touching happens to be at the lru end, don't remove it,                 
    222        * lru shrinking can wait for next time                                                       
    223        */                                                                                           
    224       break;                                                                                        
    225     }                                                                                               
    226     map<string, ObjectCacheEntry>::iterator map_iter = cache_map.find(*iter);                       
    227     ldout(cct, 10) << "removing entry: name=" << *iter << " from cache LRU" << dendl;              
            // evict the least recently touched entry from the cache
    228     if (map_iter != cache_map.end())                                                                
    229       cache_map.erase(map_iter);                                                                    
    230     lru.pop_front();                                                                                
    231     lru_size--;                                                                                     
    232   }   
            // insert the new key into the lru
    234   if (lru_iter == lru.end()) {                                                                      
    235     lru.push_back(name);                                                                            
    236     lru_size++;                                                                                     
    237     lru_iter--;                                                                                     
    238     ldout(cct, 10) << "adding " << name << " to cache LRU end" << dendl;                            
    239   } else {     
            // move the accessed key to the tail of the lru
    240     ldout(cct, 10) << "moving " << name << " to cache LRU end" << dendl;                            
    241     lru.erase(lru_iter);                                                                            
    242     lru.push_back(name);                                                                            
    243     lru_iter = lru.end();                                                                           
    244     --lru_iter;                                                                                     
    245   }  
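          // incremented on every touch_lru call and used to decide when touch_lru is needed:
          // calling touch_lru on every put/get would be too expensive, so get only does so
          // once the key has drifted into the older half of the lru list
          // see ObjectCache::get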
    247   lru_counter++;        
          // record the lru_counter value at the entry's latest promotion
    248   entry.lru_promotion_ts = lru_counter; 
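
The bookkeeping in touch_lru can be reproduced in a few dozen lines. The following simplified model is a sketch under these assumptions: string values, no locking, a class named LruCache, and lru.size() used in place of the separate lru_size counter. It keeps the original's order of operations, shrinking first and inserting afterward, and it promotes on get only when the entry is more than lru_window touches old.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <string>

// Simplified model of ObjectCache's LRU bookkeeping (no locking, string values).
class LruCache {
  struct Entry {
    std::string value;
    std::list<std::string>::iterator lru_iter;  // position in lru, or lru.end()
    unsigned long lru_promotion_ts = 0;         // lru_counter at the last touch
  };

  std::map<std::string, Entry> cache_map;
  std::list<std::string> lru;  // head = least recently touched
  std::size_t max_size;        // plays the role of rgw_cache_lru_size
  unsigned long lru_window;    // max_size / 2, as in set_ctx
  unsigned long lru_counter = 0;

  void touch_lru(const std::string& name, Entry& entry) {
    // Shrink from the head first; never evict the entry being touched.
    while (lru.size() > max_size) {
      if (lru.front() == name)
        break;
      cache_map.erase(lru.front());
      lru.pop_front();
    }
    if (entry.lru_iter != lru.end())
      lru.erase(entry.lru_iter);  // existing key: unlink before re-adding
    lru.push_back(name);          // new or existing key goes to the tail
    entry.lru_iter = std::prev(lru.end());
    entry.lru_promotion_ts = ++lru_counter;
  }

public:
  explicit LruCache(std::size_t max) : max_size(max), lru_window(max / 2) {}

  void put(const std::string& name, const std::string& value) {
    auto it = cache_map.find(name);
    if (it == cache_map.end()) {
      it = cache_map.emplace(name, Entry{}).first;
      it->second.lru_iter = lru.end();  // not in the lru list yet
    }
    touch_lru(name, it->second);        // may evict other entries, never this one
    it->second.value = value;
  }

  // Returns true on a hit; promotes only entries older than lru_window touches.
  bool get(const std::string& name, std::string& out) {
    auto it = cache_map.find(name);
    if (it == cache_map.end())
      return false;
    if (lru_counter - it->second.lru_promotion_ts > lru_window)
      touch_lru(name, it->second);
    out = it->second.value;
    return true;
  }

  bool contains(const std::string& name) const { return cache_map.count(name) != 0; }
};
```

As in the original, the list can exceed the limit by one entry until the next touch. A recently promoted entry survives the next eviction round, while a stale one at the head is dropped.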

ObjectCache::get

Looks up the entry for name in cache_map and checks its type flags. When needed, it promotes the entry in the LRU, which may also shrink the cache.

    12 int ObjectCache::get(string& name, ObjectCacheInfo& info, uint32_t mask, rgw_cache_entry_info *cache_info)

    //   look up name in the cache; if it is absent, return -ENOENT: this is a 'miss'
    20   map<string, ObjectCacheEntry>::iterator iter = cache_map.find(name); 

    //   if this entry has drifted into the older half of the lru list, promote it (touch_lru also evicts long-unused entries)
    27   ObjectCacheEntry *entry = &iter->second;                                                          
    28                                                                                                     
    29   if (lru_counter - entry->lru_promotion_ts > lru_window) {                                         
    30     ldout(cct, 20) << "cache get: touching lru, lru_counter=" << lru_counter << " promotion_ts=" << entry->lru_promotion_ts << dendl;
    31     lock.unlock();                                                                                  
    32     lock.get_write(); /* promote lock to writer */                                                  
    33                                                                                                     
    34     /* need to redo this because entry might have dropped off the cache */                          
    35     iter = cache_map.find(name);                                                                    
    36     if (iter == cache_map.end()) {                                                                  
    37       ldout(cct, 10) << "lost race! cache get: name=" << name << " : miss" << dendl;                
    38       if(perfcounter) perfcounter->inc(l_rgw_cache_miss);                                           
    39       return -ENOENT;                                                                               
    40     }                                                                                               
    41                                                                                                     
    42     entry = &iter->second;                                                                          
    43     /* check again, we might have lost a race here */                                               
    44     if (lru_counter - entry->lru_promotion_ts > lru_window) {                                       
    45       touch_lru(name, *entry, iter->second.lru_iter);                                               
    46     }                                                                                               
    47   }   

    //   check that the cached ObjectCacheInfo has all requested flags; if not, this is a 'type miss'
    49   ObjectCacheInfo& src = iter->second.info;                                                         
    50   if ((src.flags & mask) != mask) {                                                                 
    51     ldout(cct, 10) << "cache get: name=" << name << " : type miss (requested=" << mask << ", cached=" << src.flags << ")" << dendl;
    52     if(perfcounter) perfcounter->inc(l_rgw_cache_miss);                                             
    53     return -ENOENT;                                                                                 
    54   }                                                                                                 
    55   ldout(cct, 10) << "cache get: name=" << name << " : hit" << dendl; 
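
The lock handling in get follows a common pattern. The read lock cannot be upgraded in place, so get drops it, takes the write lock, and repeats the lookup, because another thread may have evicted or promoted the entry in the meantime. Below is a sketch under stated assumptions: std::shared_timed_mutex (C++14) stands in for Ceph's RWLock, WindowedCache is a hypothetical name, and promote() only records the timestamp instead of calling the full touch_lru.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Models the read-then-upgrade locking of ObjectCache::get.
class WindowedCache {
  struct Entry {
    std::string value;
    unsigned long promotion_ts = 0;
  };
  std::map<std::string, Entry> cache_map;
  mutable std::shared_timed_mutex lock;
  unsigned long lru_counter = 0;
  unsigned long lru_window;

  // Stand-in for touch_lru; the caller must hold the write lock.
  void promote(Entry& e) { e.promotion_ts = ++lru_counter; }

public:
  explicit WindowedCache(unsigned long window) : lru_window(window) {}

  void put(const std::string& name, const std::string& value) {
    std::unique_lock<std::shared_timed_mutex> wl(lock);
    Entry& e = cache_map[name];
    e.value = value;
    promote(e);
  }

  bool get(const std::string& name, std::string& out) {
    std::shared_lock<std::shared_timed_mutex> rl(lock);
    auto it = cache_map.find(name);
    if (it == cache_map.end())
      return false;                                        // miss
    if (lru_counter - it->second.promotion_ts > lru_window) {
      rl.unlock();
      std::unique_lock<std::shared_timed_mutex> wl(lock);  // "promote" to writer
      it = cache_map.find(name);                           // entry may be gone now
      if (it == cache_map.end())
        return false;                                      // lost the race: miss
      if (lru_counter - it->second.promotion_ts > lru_window)
        promote(it->second);                               // re-check before promoting
      out = it->second.value;
      return true;
    }
    out = it->second.value;
    return true;
  }

  unsigned long promotion_ts(const std::string& name) const {
    std::shared_lock<std::shared_timed_mutex> rl(lock);
    auto it = cache_map.find(name);
    return it == cache_map.end() ? 0 : it->second.promotion_ts;
  }
};
```

Most hits stay on the shared-lock path. Only entries older than the window pay for the exclusive lock.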

Flag types:

    // rgw/rgw_cache.h
    20 #define CACHE_FLAG_DATA           0x01                                                              
    21 #define CACHE_FLAG_XATTRS         0x02                                                              
    22 #define CACHE_FLAG_META           0x04                                                              
    23 #define CACHE_FLAG_MODIFY_XATTRS  0x08                                                              
    24 #define CACHE_FLAG_OBJV           0x10 

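.. Note:: There are two kinds of cache miss: 'miss' (no entry for the key) and 'type miss' (an entry exists but lacks some of the requested flags).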
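
The miss / type-miss distinction reduces to one bit test, `(flags & mask) != mask`. The flag values below are copied from rgw/rgw_cache.h as quoted above. The Lookup enum and the classify() helper are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Flag values as quoted from rgw/rgw_cache.h (0.94.5).
constexpr uint32_t CACHE_FLAG_DATA          = 0x01;
constexpr uint32_t CACHE_FLAG_XATTRS        = 0x02;
constexpr uint32_t CACHE_FLAG_META          = 0x04;
constexpr uint32_t CACHE_FLAG_MODIFY_XATTRS = 0x08;
constexpr uint32_t CACHE_FLAG_OBJV          = 0x10;

enum class Lookup { Hit, Miss, TypeMiss };

// Classifies a lookup the way ObjectCache::get does: no entry is a 'miss';
// an entry lacking any requested flag is a 'type miss'.
Lookup classify(bool found, uint32_t cached_flags, uint32_t mask) {
  if (!found)
    return Lookup::Miss;
  if ((cached_flags & mask) != mask)
    return Lookup::TypeMiss;
  return Lookup::Hit;
}
```

For example, an entry cached with only CACHE_FLAG_XATTRS can serve an xattrs request, but a request for DATA | XATTRS is a type miss.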

ObjectCache::put

Finds the entry associated with name, inserting a new one if none exists, and then updates the entry's contents.

.. Note:: The chained-cache code is ignored here for now.

    114 void ObjectCache::put(string& name, ObjectCacheInfo& info, rgw_cache_entry_info *cache_info)

    // find the entry associated with name, or insert a new one
    122   ldout(cct, 10) << "cache put: name=" << name << dendl;                                            
    123   map<string, ObjectCacheEntry>::iterator iter = cache_map.find(name);                               
    124   if (iter == cache_map.end()) {    
            // not found: insert a new entry
    125     ObjectCacheEntry entry;                                                                         
    126     entry.lru_iter = lru.end();                                                                     
    127     cache_map.insert(pair<string, ObjectCacheEntry>(name, entry));                                  
    128     iter = cache_map.find(name);                                                                    
    129   }                      
    // the entry associated with name
    130   ObjectCacheEntry& entry = iter->second;                                                           
    131   ObjectCacheInfo& target = entry.info;  

    // shrink the cache if needed and move the entry to the tail of the lru list (evictions come from the head)
    142   touch_lru(name, entry, entry.lru_iter); 

    // update the target ObjectCacheInfo
    144   target.status = info.status;
    ...
    158   target.flags |= info.flags; 
    // update the target's xattrs/data/version/meta according to info.flags
    160   if (info.flags & CACHE_FLAG_META)                                                                 
    161     target.meta = info.meta;                                                                        
    162   else if (!(info.flags & CACHE_FLAG_MODIFY_XATTRS))                                                
    163     target.flags &= ~CACHE_FLAG_META; // non-meta change should reset meta                          
    164                                                                                                      
    165   if (info.flags & CACHE_FLAG_XATTRS) {                                                             
    166     target.xattrs = info.xattrs;                                                                    
    167     map<string, bufferlist>::iterator iter;                                                         
    168     for (iter = target.xattrs.begin(); iter != target.xattrs.end(); ++iter) {                       
    169       ldout(cct, 10) << "updating xattr: name=" << iter->first << " bl.length()=" << iter->second.length() << dendl;
    170     }                                                                                                
    171   } else if (info.flags & CACHE_FLAG_MODIFY_XATTRS) {                                               
    172     map<string, bufferlist>::iterator iter;                                                         
    173     for (iter = info.rm_xattrs.begin(); iter != info.rm_xattrs.end(); ++iter) {                     
    174       ldout(cct, 10) << "removing xattr: name=" << iter->first << dendl;                            
    175       target.xattrs.erase(iter->first);                                                             
    176     }                                                                                                
    177     for (iter = info.xattrs.begin(); iter != info.xattrs.end(); ++iter) {                           
    178       ldout(cct, 10) << "appending xattr: name=" << iter->first << " bl.length()=" << iter->second.length() << dendl;
    179       target.xattrs[iter->first] = iter->second;                                                    
    180     }                                                                                               
    181   }                                                                                                 
    182                                                                                                      
    183   if (info.flags & CACHE_FLAG_DATA)                                                                 
    184     target.data = info.data;                                                                        
    185                                                                                                     
    186   if (info.flags & CACHE_FLAG_OBJV)                                                                 
    187     target.version = info.version; 
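
The merge rules in put are easy to misread, so here they are as a standalone function. This sketch is built on these assumptions: CacheInfo is a simplified, hypothetical ObjectCacheInfo in which std::string replaces bufferlist and the meta/version types, rm_xattrs is a set of names rather than a map, and the log statements are dropped. The order of the updates follows the code above.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

constexpr uint32_t CACHE_FLAG_DATA          = 0x01;
constexpr uint32_t CACHE_FLAG_XATTRS        = 0x02;
constexpr uint32_t CACHE_FLAG_META          = 0x04;
constexpr uint32_t CACHE_FLAG_MODIFY_XATTRS = 0x08;
constexpr uint32_t CACHE_FLAG_OBJV          = 0x10;

// Hypothetical simplified ObjectCacheInfo.
struct CacheInfo {
  uint32_t flags = 0;
  std::string data, meta, version;
  std::map<std::string, std::string> xattrs;
  std::set<std::string> rm_xattrs;
};

// Merges an update into the cached target, following ObjectCache::put.
void merge_into(CacheInfo& target, const CacheInfo& info) {
  target.flags |= info.flags;

  if (info.flags & CACHE_FLAG_META)
    target.meta = info.meta;
  else if (!(info.flags & CACHE_FLAG_MODIFY_XATTRS))
    target.flags &= ~CACHE_FLAG_META;  // a non-meta change invalidates cached meta

  if (info.flags & CACHE_FLAG_XATTRS) {
    target.xattrs = info.xattrs;       // full replacement
  } else if (info.flags & CACHE_FLAG_MODIFY_XATTRS) {
    for (const auto& name : info.rm_xattrs)
      target.xattrs.erase(name);       // removals first ...
    for (const auto& kv : info.xattrs)
      target.xattrs[kv.first] = kv.second;  // ... then additions/overwrites
  }

  if (info.flags & CACHE_FLAG_DATA)
    target.data = info.data;
  if (info.flags & CACHE_FLAG_OBJV)
    target.version = info.version;
}
```

An xattr-only modification keeps the cached meta valid. A data update without META clears the META flag, so the next meta read becomes a type miss.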

Cache synchronization

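The caches of different rgw instances are kept in sync with the watch/notify mechanism provided by librados.
When an entry is updated in one instance's cache, that instance notifies all watchers with the new data.
The other rgw instances apply the notify message to their local caches in the watcher callback (watch_cb).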
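
The propagation pattern can be modeled in-process. In the sketch below, NotifyBus is a hypothetical stand-in for librados watch/notify and is not the librados API. Gateway is a toy rgw instance that watches one control object (the hypothetical "notify.0"), updates its own cache on a local write, and then notifies. Messages are plain "key=value" strings, not RGWCacheNotifyInfo.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// In-process stand-in for watch/notify: every callback registered on an
// object id is invoked for each notify on that id.
class NotifyBus {
  std::map<std::string, std::vector<std::function<void(const std::string&)>>> watchers;

public:
  void watch(const std::string& oid, std::function<void(const std::string&)> cb) {
    watchers[oid].push_back(std::move(cb));
  }
  void notify(const std::string& oid, const std::string& payload) {
    for (auto& cb : watchers[oid])
      cb(payload);
  }
};

// Toy gateway instance with a local cache.
class Gateway {
  NotifyBus& bus;

public:
  std::map<std::string, std::string> cache;

  explicit Gateway(NotifyBus& b) : bus(b) {
    bus.watch("notify.0", [this](const std::string& msg) { watch_cb(msg); });
  }
  Gateway(const Gateway&) = delete;  // the callback captures `this`
  Gateway& operator=(const Gateway&) = delete;

  // Local write: update our own cache, then tell every watcher.
  void put(const std::string& key, const std::string& value) {
    cache[key] = value;
    bus.notify("notify.0", key + "=" + value);
  }

  // Applies a "key=value" message (including our own) to the local cache.
  void watch_cb(const std::string& msg) {
    auto pos = msg.find('=');
    if (pos == std::string::npos)
      return;  // malformed message: ignore
    cache[msg.substr(0, pos)] = msg.substr(pos + 1);
  }
};
```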

RGWWatcher initialization

When a radosgw instance starts, init_watch() sets up the notify objects used for cache synchronization.

    // rgw/rgw_rados.cc
    1763 int RGWRados::init_watch()
    ...
           // read the configured number of notify objects used for cache sync between rgw instances
    1780   num_watchers = cct->_conf->rgw_num_control_oids;                                                  
    1781                                                                                                     
    1782   bool compat_oid = (num_watchers == 0);                                                            
    1783                                                                                                     
    1784   if (num_watchers <= 0)                                                                            
    1785     num_watchers = 1;                                                                               
    1786                                                                                                     
    1787   notify_oids = new string[num_watchers];           
           // RGWWatcher derives from librados::WatchCtx2; for watch/notify usage see [2]
    1788   watchers = new RGWWatcher *[num_watchers]; 

    1790   for (int i=0; i < num_watchers; i++) {                                                            
    1791     string& notify_oid = notify_oids[i];                                                            
    1792     notify_oid = notify_oid_prefix;                                                                
    ... 
             // create notify.XXX
    1798     r = control_pool_ctx.create(notify_oid, false);                                                
    ... 
             // watch notify.XXX (create the watcher)
    1802     RGWWatcher *watcher = new RGWWatcher(this, i, notify_oid);                                      
    1803     watchers[i] = watcher;                                                                          
    1804             
             // register the watch and record it in RGWRados's 'std::set<int> watchers_set;'
             // the log shows: 'add_watcher() i=XXX'
             //                'all XXX watchers are set, enabling cache'
    1805     r = watcher->register_watch();                                                                  
    1810   watch_initialized = true;                                                                         
    1811                                                                                                     
    1812   set_cache_enabled(true);   
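
The counting logic in init_watch() is shown above, but the lines that build each object name are elided. The sketch below fills that gap with an assumption: objects are named "<prefix>.<i>", and in compat mode (rgw_num_control_oids == 0) a single object named just "<prefix>" is used. This matches the notify.XXX names in the comments but is not confirmed by the excerpt. control_oids is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Hypothetical helper deriving control-object names from the configured count.
// Assumption: names are "<prefix>.<i>", or just "<prefix>" in compat mode.
std::vector<std::string> control_oids(const std::string& prefix, int num_control_oids) {
  bool compat_oid = (num_control_oids == 0);
  int num_watchers = num_control_oids <= 0 ? 1 : num_control_oids;
  std::vector<std::string> oids;
  for (int i = 0; i < num_watchers; i++) {
    std::string oid = prefix;
    if (!compat_oid)
      oid += "." + std::to_string(i);  // assumed suffix format
    oids.push_back(oid);
  }
  return oids;
}
```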

The watcher's handle_notify function

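When a notify arrives on a watched object, the handle_notify() method of every watcher registered on that object
is called. The notified instance uses it to update its own local cache.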

    RGWWatcher(RGWRados *r, int i, const string& o) : rados(r), index(i), oid(o), watch_handle(0) {}
    void handle_notify(uint64_t notify_id,
                       uint64_t cookie,
                       uint64_t notifier_id,
                       bufferlist& bl) {
      ldout(rados->ctx(), 10) << "RGWWatcher::handle_notify() "
                              << " notify_id " << notify_id
                              << " cookie " << cookie
                              << " notifier " << notifier_id
                              << " bl.length()=" << bl.length() << dendl;
      // call the rados-level callback, RGWCache<RGWRados>::watch_cb
      rados->watch_cb(notify_id, cookie, notifier_id, bl);

      bufferlist reply_bl; // empty reply payload
      // acknowledge the notify back to the notifier
      rados->control_pool_ctx.notify_ack(oid, notify_id, cookie, reply_bl);
    }

Next, RGWCache<RGWRados>::watch_cb:

    578 template <class T>                                                                                  
    579 int RGWCache<T>::watch_cb(uint64_t notify_id,                                                       
    580                           uint64_t cookie,                                                          
    581                           uint64_t notifier_id,                                                     
    582                           bufferlist& bl)                                                           
    583 {                                                                                                   
    584   RGWCacheNotifyInfo info;                                                                          
    585                                                                                                     
    586   try {                                                                                             
    587     bufferlist::iterator iter = bl.begin();                                                        
            // decode the RGWCacheNotifyInfo sent by the notifier
    588     ::decode(info, iter);                                                                          
    ... 
    596                                                                                                     
    597   rgw_bucket bucket;                                                                                
    598   string oid;                                                                                       
    599   normalize_bucket_and_obj(info.obj.bucket, info.obj.get_object(), bucket, oid);        
          // build the key from bucket/oid, in the form '${bucket}+${oid}'
    600   string name = normal_name(bucket, oid);                                                           
    601                                                                                                     
    602   switch (info.op) {        
          // the cache entry for name was updated
    603   case UPDATE_OBJ:               
            // refresh the local cache with the new info
    604     cache.put(name, info.obj_info, NULL);                                                           
    605     break;                   
          // the cache entry for name was removed
    606   case REMOVE_OBJ:              
            // drop the local entry for name
    607     cache.remove(name);                                                                             
    608     break;                                                                                          
    609   default:                                                                                          
    610     mydout(0) << "WARNING: got unknown notification op: " << info.op << dendl;                      
    611     return -EINVAL;                                                                                 
    612   }                                                                                                 
    613                                                                                                     
    614   return 0;                                                                                         
    615 } 
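
The receiving side boils down to a switch on the op code. The following sketch makes several assumptions. NotifyInfo is a hypothetical simplified message carrying the op, the already-normalized key, and the new value, whereas the real RGWCacheNotifyInfo carries an rgw_obj and an ObjectCacheInfo. The cache is a plain map, and decoding is omitted.

```cpp
#include <cerrno>
#include <map>
#include <string>

enum { UPDATE_OBJ, REMOVE_OBJ };

// Hypothetical simplified notify message.
struct NotifyInfo {
  int op;
  std::string name;   // normalized key, e.g. "bucket+oid"
  std::string value;  // new cached value for UPDATE_OBJ
};

// Applies one notification to the local cache, like RGWCache::watch_cb.
int watch_cb(std::map<std::string, std::string>& cache, const NotifyInfo& info) {
  switch (info.op) {
  case UPDATE_OBJ:
    cache[info.name] = info.value;  // refresh the local copy
    break;
  case REMOVE_OBJ:
    cache.erase(info.name);         // drop the local copy
    break;
  default:
    return -EINVAL;                 // unknown op
  }
  return 0;
}
```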

Sending notifications

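When an rgw instance performs a modification, it updates its local cache. The updater is responsible for notifying
the other rgw instances of the change so that their caches stay in sync.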

The following update operations call 'int RGWCache<T>::distribute_cache' to send the change
notification:

    // rgw/rgw_cache.h
    int RGWCache<T>::delete_system_obj
    int RGWCache<T>::set_attr
    int RGWCache<T>::set_attrs
    int RGWCache<T>::put_system_obj_impl
    int RGWCache<T>::put_obj_data // metadata objects, not S3 object data

distribute_cache wraps the update in an RGWCacheNotifyInfo, encodes it into a bufferlist,
and sends it to the watchers.

    563 template <class T>                                                                                  
    564 int RGWCache<T>::distribute_cache(const string& normal_name, rgw_obj& obj, ObjectCacheInfo& obj_info, int op)
    565 {                                                                                                   
    566   RGWCacheNotifyInfo info;                                                                          
    567                                                                                                     
    568   info.op = op;                                                                                     
    569                                                                                                     
    570   info.obj_info = obj_info;                                                                         
    571   info.obj = obj;                                                                                   
    572   bufferlist bl;                                                                                    
    573   ::encode(info, bl);       
          // for RGWRados this is RGWRados::distribute
    574   int ret = T::distribute(normal_name, bl);                                                         
    575   return ret;                                                                                       
    576 }   


RGWRados::distribute is straightforward: it sends a notify2 on one of the notify objects.

    7659 int RGWRados::distribute(const string& key, bufferlist& bl)                                         
    7660 {                                                                                                   
    7661   /*                                                                                                 
    7662    * we were called before watch was initialized. This can only happen if we're updating some system
    7663    * config object (e.g., zone info) during init. Don't try to distribute the cache info for these  
    7664    * objects, they're currently only read on startup anyway.                                        
    7665    */                                                                                               
    7666   if (!watch_initialized)                                                                           
    7667     return 0;                                                                                       
    7668                                                                                                     
    7669   string notify_oid;   
           // pick one of the notify.XXX objects to carry the change notification,
           // chosen by a simple hash of the key
    7670   pick_control_oid(key, notify_oid);                                                                
    7671                                                                                                     
    7672   ldout(cct, 10) << "distributing notification oid=" << notify_oid << " bl.length()=" << bl.length() << dendl;
    7673   int r = control_pool_ctx.notify2(notify_oid, bl, 0, NULL);                                        
    7674   return r;                                                                                         
    7675 }  
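
Because the choice of control object is a hash of the key modulo the number of objects, every notification for a given key goes through the same notify object, and different keys are spread across the objects. Below is a sketch of that scheme as a hypothetical free function. std::hash is used as a stand-in for whatever string hash RGWRados uses, so concrete indices will differ from a real gateway.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Chooses the control object for a key: hash(key) modulo the object count.
// Assumption: std::hash replaces the hash function RGWRados actually uses.
const std::string& pick_control_oid(const std::string& key,
                                    const std::vector<std::string>& notify_oids) {
  std::size_t h = std::hash<std::string>{}(key);
  return notify_oids[h % notify_oids.size()];
}
```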

chained cache

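To make ObjectCache easier to build on, it was extended with chained caches. The user info chained cache looks up
a user_info_entry by email/swift_name/access_key, and the bucket info chained cache looks up a bucket_info_entry
by bucket name.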

    commit ab764f38c1cfb7860a657f3c3557419e8e96f4c2                                                     
    Author: Yehuda Sadeh <[email protected]>                                                           
    Date:   Wed Mar 19 16:34:21 2014 -0700                                                              
                                                                                                        
        rgw: an infrastructure for hooking into the raw cache                                           
                                                                                                              
        Extend the RGWCache so that we can chain other caches to it so that when                        
        data is invalidated it notifies them.                                                           
                                                                                                        
        Signed-off-by: Yehuda Sadeh <[email protected]>                                                
                                                                                                        
    commit 7fb6a3d68f4745b277507d9737aaf397b2272ffa 

How chained caches relate to ObjectCache

The relationship has two aspects: data and control.

- Data relationship

A chained cache is populated from data read through ObjectCache. The data field of a chained-cache Entry
holds a pointer to a bucket_info_entry or user_info_entry, whereas the values in ObjectCache's cache_map
are ObjectCacheEntry structures.

- Control relationship

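Filling and evicting chained cache entries both depend on ObjectCache. When an entry in ObjectCache is updated, ObjectCache calls back every chained cache recorded in that ObjectCacheEntry's chained_entries list. Each callback removes the associated entry from that chained cache.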
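The control relationship can be illustrated with a small, single-threaded model. This is a sketch under simplifying assumptions, not Ceph code:

- SimpleObjectCache, SimpleObjectCacheEntry and SimpleChainedCache are hypothetical names.
- Values are plain strings.
- Locking, LRU and gen handling are left out.

What it keeps from the text is the core mechanism. Each object cache entry remembers the (chained cache, key) pairs derived from it, and an update calls invalidate on each of them.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a chained cache: derived values keyed by
// email/access key/bucket name, which the object cache can drop.
struct SimpleChainedCache {
  std::map<std::string, std::string> entries;

  void invalidate(const std::string& key) { entries.erase(key); }
};

// Hypothetical stand-in for ObjectCacheEntry: the raw object data plus the
// chained-cache entries that were derived from it.
struct SimpleObjectCacheEntry {
  std::string data;
  std::list<std::pair<SimpleChainedCache*, std::string>> chained_entries;
};

struct SimpleObjectCache {
  std::map<std::string, SimpleObjectCacheEntry> cache_map;

  // Updating an object first invalidates every chained entry that depends
  // on it, then forgets those links and stores the new data.
  void put(const std::string& name, const std::string& data) {
    SimpleObjectCacheEntry& entry = cache_map[name];
    for (auto& [cache, key] : entry.chained_entries) {
      cache->invalidate(key);
    }
    entry.chained_entries.clear();
    entry.data = data;
  }
};
```

For example, suppose the chained key "AKID" is linked to the object "user.obj". A second put("user.obj", ...) then removes "AKID" from the chained cache and leaves chained_entries empty, much like the loop in ObjectCache::put.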

ObjectCache::chain_cache_entry is used to fill a chained cache.

.. Note:: Chained cache hits and misses are not reflected in the cache perf counters.

Chained caches in rgw

Currently RGW has only two chained caches:

    (gdb) print cache.chained_cache
    $31 = std::list = {
      [0] = 0x110e3c0 <binfo_cache>,
      [1] = 0x110d220 <uinfo_cache>
    }

The implementation of binfo_cache/uinfo_cache:

    template <class T>
    class RGWChainedCacheImpl : public RGWChainedCache {
      RWLock lock;

      map<string, T> entries;
      ...
    };

    // user info cache
    struct user_info_entry {
      RGWUserInfo info;
      RGWObjVersionTracker objv_tracker;
      time_t mtime;
    };

    static RGWChainedCacheImpl<user_info_entry> uinfo_cache;
    ...
    if (uinfo_cache.find(key, &e)) { ... }
    uinfo_cache.put(store, key, &e, cache_info_entries);
    uinfo_cache.init(store);

    // bucket info cache
    struct bucket_info_entry {
      RGWBucketInfo info;
      time_t mtime;
      map<string, bufferlist> attrs;
    };

    static RGWChainedCacheImpl<bucket_info_entry> binfo_cache;
    ...
    binfo_cache.init(this);
    if (binfo_cache.find(bucket_name, &e)) { ... }
    if (!binfo_cache.put(this, bucket_name, &e, cache_info_entries)) { ... }
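To see how the pieces of RGWChainedCacheImpl<T> fit together, here is a standalone sketch in the same spirit. It is an approximation, not the Ceph class:

- ChainedCache and ChainedCacheImpl are hypothetical names.
- std::shared_mutex stands in for RWLock.
- The put() path through RGWRados is left out.
- It assumes that chain_cb() copies the value pointed to by data into entries. This matters because in the usage example below the caller's user_info_entry is a local variable.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Mirrors the three pure virtual methods of RGWChainedCache shown in this article.
class ChainedCache {
 public:
  virtual ~ChainedCache() = default;
  virtual void chain_cb(const std::string& key, void* data) = 0;
  virtual void invalidate(const std::string& key) = 0;
  virtual void invalidate_all() = 0;
};

// Hypothetical typed chained cache; std::shared_mutex replaces RWLock.
template <class T>
class ChainedCacheImpl : public ChainedCache {
  mutable std::shared_mutex lock;
  std::map<std::string, T> entries;

 public:
  // Copies the cached value out under a shared (read) lock.
  bool find(const std::string& key, T* entry) const {
    std::shared_lock<std::shared_mutex> rl(lock);
    auto iter = entries.find(key);
    if (iter == entries.end()) {
      return false;
    }
    *entry = iter->second;
    return true;
  }

  // Assumed to be called by the object cache after validation; data points
  // at a T owned by the caller, so the value is copied into the map.
  void chain_cb(const std::string& key, void* data) override {
    T* entry = static_cast<T*>(data);
    std::unique_lock<std::shared_mutex> wl(lock);
    entries[key] = *entry;
  }

  // Drops one derived entry, e.g. when its source object was updated.
  void invalidate(const std::string& key) override {
    std::unique_lock<std::shared_mutex> wl(lock);
    entries.erase(key);
  }

  void invalidate_all() override {
    std::unique_lock<std::shared_mutex> wl(lock);
    entries.clear();
  }
};
```

In this sketch, find() returns a copy. A caller therefore never holds a reference into entries that a concurrent invalidate() could destroy.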

Chained cache data structures

    struct ObjectCacheEntry {
      ObjectCacheInfo info;
      std::list<string>::iterator lru_iter;
      uint64_t lru_promotion_ts;
      uint64_t gen;
      // Records which chained caches this entry is linked into; the string
      // is the key field of the corresponding RGWChainedCache::Entry
      std::list<pair<RGWChainedCache *, string> > chained_entries;
      ...
    };

    class ObjectCache {
      std::map<string, ObjectCacheEntry> cache_map;
      // The chained caches registered with this ObjectCache;
      // currently binfo_cache and uinfo_cache
      list<RGWChainedCache *> chained_cache;
      ...
    };

    class RGWChainedCache {
    public:
      virtual ~RGWChainedCache() {}
      virtual void chain_cb(const string& key, void *data) = 0;
      virtual void invalidate(const string& key) = 0;
      virtual void invalidate_all() = 0;

      struct Entry {
        RGWChainedCache *cache; // the chained cache this entry belongs to
        const string& key;      // email/swift_name/access_key/bucket name
        void *data;             // points to a bucket_info_entry/user_info_entry

        Entry(RGWChainedCache *_c, const string& _k, void *_d) : cache(_c), key(_k), data(_d) {}
      };
    };

An example RGWChainedCache::Entry:

    (gdb) print *chained_entry
    $38 = {
      cache = 0x110d220 <uinfo_cache>, 
      key = "0AA3R7LRW9H2H5LFTGZQ", // the user's access key in this example
      data = 0x7fd6e17f78e0
    }

ObjectCache::chain_cache_entry

This function adds chained_entry to its chained cache. It then links the entry to the ObjectCacheEntry objects referred to by cache_info_entries.

    // cache_info_entries: the ObjectCacheEntry objects to link the new entry to
    // chained_entry: the entry to insert
    bool ObjectCache::chain_cache_entry(list<rgw_cache_entry_info *>& cache_info_entries, RGWChainedCache::Entry *chained_entry)

    // RGWChainedCacheImpl<T>::put
    bool put(RGWRados *store, const string& key, T *entry, list<rgw_cache_entry_info *>& cache_info_entries) {
      Entry chain_entry(this, key, entry);

      /* we need the store cache to call us under its lock to maintain lock ordering */
      return store->chain_cache_entry(cache_info_entries, &chain_entry);
    }

    // rgw/rgw_user.cc
    rgw_cache_entry_info cache_info;

    bufferlist::iterator iter = bl.begin();
    try {
      ::decode(uid, iter);
      int ret = rgw_get_user_info_by_uid(store, uid.user_id, e.info, &e.objv_tracker, NULL, &cache_info);
      ...
    list<rgw_cache_entry_info *> cache_info_entries;
    cache_info_entries.push_back(&cache_info);

    uinfo_cache.put(store, key, &e, cache_info_entries);

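- First, verify that all entries are still valid

An entry is valid if two conditions hold. First, "cache_map.find(cache_info->cache_locator)" must find the corresponding entry in the cache. Second, that entry's gen must still match the value recorded when the data was read. gen is the entry's version number and is incremented on every update. Valid entries are added to cache_entry_list.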

- Add the new entry to its chained cache

    chained_entry->cache->chain_cb(chained_entry->key, chained_entry->data);

- Link the chained entry to every valid ObjectCacheEntry

    for (liter = cache_entry_list.begin(); liter != cache_entry_list.end(); ++liter) {
      ObjectCacheEntry *entry = *liter;

      entry->chained_entries.push_back(make_pair<RGWChainedCache *, string>(chained_entry->cache, chained_entry->key));
    }
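The three steps can be combined in a compact sketch. It makes the following assumptions:

- CacheEntryInfo, CacheEntry and ChainedCacheBase are hypothetical stand-ins for rgw_cache_entry_info, ObjectCacheEntry and RGWChainedCache.
- The lock that ObjectCache holds during the call is omitted.
- A single stale or missing source entry makes the whole call fail without chaining anything.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>
#include <string>
#include <utility>

// Minimal callback interface; only chain_cb is needed here.
struct ChainedCacheBase {
  virtual ~ChainedCacheBase() = default;
  virtual void chain_cb(const std::string& key, void* data) = 0;
};

// Hypothetical counterpart of rgw_cache_entry_info: which raw object the
// value was decoded from, and that object's gen at read time.
struct CacheEntryInfo {
  std::string cache_locator;
  uint64_t gen;
};

// Hypothetical counterpart of ObjectCacheEntry (data and LRU fields omitted).
struct CacheEntry {
  uint64_t gen = 0;  // bumped on every update of the raw object
  std::list<std::pair<ChainedCacheBase*, std::string>> chained_entries;
};

// Sketch of the three steps described above; locking is omitted.
bool chain_cache_entry(std::map<std::string, CacheEntry>& cache_map,
                       const std::list<CacheEntryInfo*>& cache_info_entries,
                       ChainedCacheBase* cache, const std::string& key,
                       void* data) {
  // Step 1: every source entry must still be cached at the same gen.
  std::list<CacheEntry*> cache_entry_list;
  for (const CacheEntryInfo* info : cache_info_entries) {
    auto iter = cache_map.find(info->cache_locator);
    if (iter == cache_map.end() || iter->second.gen != info->gen) {
      return false;  // source evicted or changed since it was read
    }
    cache_entry_list.push_back(&iter->second);
  }
  // Step 2: insert the derived value into the chained cache.
  cache->chain_cb(key, data);
  // Step 3: record the dependency so a later update can invalidate it.
  for (CacheEntry* entry : cache_entry_list) {
    entry->chained_entries.emplace_back(cache, key);
  }
  return true;
}
```

In this sketch the gen check closes a race. Suppose an update bumps gen after the data was read but before it is chained. The stale value is then simply not cached, rather than being cached with no way to invalidate it.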

ObjectCache::put

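When ObjectCache performs a put (update), it calls the invalidate method of every chained cache in entry.chained_entries. This removes the chained cache entries associated with this ObjectCacheEntry from the entries map of the corresponding chained cache.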

    for (list<pair<RGWChainedCache *, string> >::iterator iiter = entry.chained_entries.begin();
         iiter != entry.chained_entries.end(); ++iiter) {
      RGWChainedCache *chained_cache = iiter->first;
      chained_cache->invalidate(iiter->second);
    }

    entry.chained_entries.clear();
    entry.gen++; // bump the entry's version

Chained cache usage example

This example looks up user info in the chained cache by key. The key can be an email, a swift_name or an access_key.

    int rgw_get_user_info_from_index(RGWRados *store, string& key, rgw_bucket& bucket, RGWUserInfo& info,
                                     RGWObjVersionTracker *objv_tracker, time_t *pmtime)
    {
      user_info_entry e;
      // Look up the user info in uinfo_cache (the chained cache) by key;
      // if it is found, return it directly
      if (uinfo_cache.find(key, &e)) {
        info = e.info;
        if (objv_tracker)
          *objv_tracker = e.objv_tracker;
        if (pmtime)
          *pmtime = e.mtime;
        return 0;
      }

      // Not found in the chained cache: read the index object for key
      // (which stores the uid), then load the user info by uid
      bufferlist bl;
      RGWUID uid;
      RGWObjectCtx obj_ctx(store);

      // rgw_get_system_obj ->
      // RGWRados::SystemObject::Read::read ->
      // RGWCache<RGWRados>::get_system_obj ->
      // ObjectCache::get  // ObjectCache is consulted first here
      int ret = rgw_get_system_obj(store, obj_ctx, bucket, key, bl, NULL, &e.mtime);
      if (ret < 0)
        return ret;

      rgw_cache_entry_info cache_info;

      bufferlist::iterator iter = bl.begin();
      try {
        ::decode(uid, iter);
        int ret = rgw_get_user_info_by_uid(store, uid.user_id, e.info, &e.objv_tracker, NULL, &cache_info);
        if (ret < 0) {
          return ret;
        }
      } catch (buffer::error& err) {
        ldout(store->ctx(), 0) << "ERROR: failed to decode user info, caught buffer::error" << dendl;
        return -EIO;
      }

      // Insert the user info just read into the chained cache uinfo_cache
      list<rgw_cache_entry_info *> cache_info_entries;
      cache_info_entries.push_back(&cache_info);

      uinfo_cache.put(store, key, &e, cache_info_entries);

      info = e.info;
      if (objv_tracker)
        *objv_tracker = e.objv_tracker;
      if (pmtime)
        *pmtime = e.mtime;

      return 0;
    }
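Stripped of RGW types, the function above is a read-through lookup. The sketch below uses a hypothetical UserLookup type with these simplifications:

- A std::map plays the backend.
- Another std::map plays uinfo_cache.
- The ObjectCache layer in between is omitted.
- All invalidation is omitted.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for RGWUserInfo.
struct UserInfo {
  std::string uid;
  std::string display_name;
};

struct UserLookup {
  std::map<std::string, UserInfo> backend;  // index key -> user info (plays RADOS)
  std::map<std::string, UserInfo> chained;  // plays the role of uinfo_cache
  int backend_reads = 0;

  // Same shape as rgw_get_user_info_from_index: try the chained cache,
  // fall back to the backend, then populate the chained cache.
  std::optional<UserInfo> get(const std::string& key) {
    if (auto hit = chained.find(key); hit != chained.end()) {
      return hit->second;  // chained-cache hit: the backend is not touched
    }
    ++backend_reads;
    auto stored = backend.find(key);
    if (stored == backend.end()) {
      return std::nullopt;  // comparable to returning an error code
    }
    chained[key] = stored->second;  // comparable to uinfo_cache.put(...)
    return stored->second;
  }
};
```

The first lookup of a key reads the backend and fills the cache. Repeated lookups are served from the chained cache. In RGW terms, those repeated lookups return before rgw_get_system_obj is called, so they never reach ObjectCache.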

Checking cache hit statistics

    [root@yhg-2 ~]# ceph daemon --cluster yhg  /var/run/ceph/yhg-client.radosgw.yhg-yhg-yhg-2.asok perf dump| grep cache
            "cache_hit": 82,
            "cache_miss": 727,
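From these counters, the ObjectCache hit ratio is 82 / (82 + 727) = 82 / 809 ≈ 0.101, or about 10%. Per the note above, lookups answered by uinfo_cache/binfo_cache return before ObjectCache is consulted, so they count toward neither number. A trivial helper for this arithmetic (hit_ratio is a hypothetical name) could look like this:

```cpp
#include <cassert>
#include <cstdint>

// Fraction of ObjectCache lookups served from memory; 0 when nothing was looked up.
double hit_ratio(uint64_t hits, uint64_t misses) {
  uint64_t total = hits + misses;
  return total == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(total);
}
```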

