Ceph: The RGW Pub-Sub Module

Overview

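As its name suggests, the Pub-Sub module provides a publish-subscribe mechanism for object storage change events. The publish-subscribe architecture is widely used, for example in the Pub/Sub services of public clouds such as Google Cloud and AWS, and in Redis's publish/subscribe mechanism. It offers many-to-many asynchronous messaging that decouples senders from receivers.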

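Events are published to predefined topics. A topic can be subscribed to, and events can also be pulled from it. Once an event is acknowledged it is removed from the subscription history; events that are not acknowledged explicitly are acknowledged automatically after events_retention_days (7 days by default).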

The Pub-Sub module is still under development, and the most recent full backport to Nautilus has not been released yet (https://github.com/ceph/ceph/pull/30579 is not included in the latest release, Nautilus 14.2.4).

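The Pub-Sub module has four basic concepts:

  • Topic: a topic is associated with specific buckets (the association is made through a notification). A bucket can be associated with several topics, and each topic holds a list of subscriptions.
  • Notification: created for a given topic and bucket; it publishes the events of that bucket to the associated topic. A notification does not specify an endpoint (the push endpoint is specified on the topic). The notification API comes in two variants: S3-compatible (bucket notification, an operation under the bucket) and non-S3-compatible.
  • Subscription: created for a given topic; it receives the events pushed for the subscribed topic, and events on that topic can also be pulled through it. A subscription specifies an endpoint, which is used for later event pushes.
  • Event: an event occurs whenever a bucket or an object in it changes, for example ObjectCreated or ObjectRemoved. Depending on whether a plain subscription or a notification is involved, the event is either stored and pushed, or only pushed as an event notification (pushing requires a configured endpoint).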
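The relationship between the four concepts can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the PubSub, Subscription, and Event types below are hypothetical and are not Ceph classes, and storage, push endpoints, and retention are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical illustration types; none of them exist in Ceph.
struct Event {
  std::string id;
  std::string type;    // e.g. "OBJECT_CREATE"
  std::string bucket;
  std::string key;
};

struct Subscription {
  std::string topic;
  std::deque<Event> pending;  // events kept until they are acknowledged
};

class PubSub {
  // notification: bucket -> topics that receive the bucket's events
  std::map<std::string, std::vector<std::string>> bucket_topics_;
  std::map<std::string, Subscription> subs_;
  std::size_t next_id_ = 0;

public:
  void create_notification(const std::string& bucket, const std::string& topic) {
    bucket_topics_[bucket].push_back(topic);
  }
  void create_subscription(const std::string& name, const std::string& topic) {
    subs_[name].topic = topic;
  }
  // Deliver a bucket event to every subscription of every topic bound to the bucket.
  void publish(const std::string& bucket, const std::string& key, const std::string& type) {
    auto it = bucket_topics_.find(bucket);
    if (it == bucket_topics_.end()) return;  // no notification on this bucket
    for (const auto& topic : it->second) {
      for (auto& entry : subs_) {
        if (entry.second.topic == topic) {
          entry.second.pending.push_back({std::to_string(next_id_++), type, bucket, key});
        }
      }
    }
  }
  // Return the events that have not been acknowledged yet.
  std::vector<Event> pull(const std::string& sub) const {
    auto it = subs_.find(sub);
    if (it == subs_.end()) return {};
    return std::vector<Event>(it->second.pending.begin(), it->second.pending.end());
  }
  // Acknowledging an event removes it from the subscription history.
  bool ack(const std::string& sub, const std::string& event_id) {
    auto it = subs_.find(sub);
    if (it == subs_.end()) return false;
    auto& q = it->second.pending;
    for (auto e = q.begin(); e != q.end(); ++e) {
      if (e->id == event_id) { q.erase(e); return true; }
    }
    return false;
  }
};
```

With this model, a notification on bucket test1 for topic1 plus a subscription sub1 on topic1 means that publishing an event for test1 makes it visible through pull("sub1") until ack() is called for it.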

Usage

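The pub-sub sync module is still under development and incomplete: the pub-sub radosgw-admin commands have no CLI documentation, and pulling events through the CLI triggers a core dump (see Q&A).

Configuring multisite

All sync modules are built on the multisite framework. Multisite links several zones, each containing one or more RGWs, and synchronizes data or metadata according to the sync module in use.

What is usually called multisite is the default sync module, which synchronizes both data and metadata. As one kind of sync module, the pub-sub module likewise needs a multisite setup built from zones that sync with each other, and then performs its own data synchronization on top of it.

For more detail on sync modules and multisite, see: RGW Sync Module, RGW Multisite.

The following uses two zones to illustrate the multisite configuration for the pub-sub module.

Create the pubsub zone and configure tier-type=pubsub and tier-config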

bin/radosgw-admin -c ceph.conf realm create --rgw-realm=default --default --master

bin/radosgw-admin -c ceph.conf zonegroup modify --rgw-realm=default --rgw-zonegroup=default --default --master --endpoints="http://192.168.180.138:8000"

bin/radosgw-admin -c ceph.conf zone modify --rgw-realm=default --rgw-zonegroup=default --rgw-zone=default --access-key=bl_deliver --secret-key=bl_deliver --bl-deliver --default --master --endpoints="http://192.168.180.138:8000"

bin/radosgw-admin -c ceph.conf zone modify --rgw-realm=default --rgw-zonegroup=default --rgw-zone=default --access-key=ms_sync --secret-key=ms_sync --system

bin/radosgw-admin -c ceph.conf zone create --rgw-zone pubsub --rgw-zonegroup default --rgw-realm default --tier-type=pubsub --tier-config=uid=user1,data_bucket_prefix=pubsub,data_oid_prefix=pubsub-,events_retention_days=1 --sync-from-all=false --sync-from=default --endpoints="http://192.168.180.138:8001"
bin/radosgw-admin -c ceph.conf zone modify --rgw-realm default --rgw-zonegroup default --rgw-zone pubsub --access-key=ms_sync --secret-key=ms_sync --system

bin/radosgw-admin -c ceph.conf period update --commit

Configure one RGW as the pubsub RGW

[client.rgw.8001]
        rgw zone = pubsub

Check the sync status

[root@stor14 build]# bin/radosgw-admin sync status -c ceph.conf
          realm 3528d7ca-aac2-4161-ab7b-e16d63e7faaa (default)
      zonegroup 8a290331-13a5-4822-81c7-b840ef228312 (default)
           zone 49c1fd93-060c-441b-9a64-7b9e10efc7f6 (default)
  metadata sync no sync (zone is master)
      data sync source: daff70b4-6df3-4d21-ae20-9673d06e89db (pubsub)
                        not syncing from zone
[root@stor14 build]# bin/radosgw-admin sync status -c ceph.conf --rgw-zone pubsub
          realm 3528d7ca-aac2-4161-ab7b-e16d63e7faaa (default)
      zonegroup 8a290331-13a5-4822-81c7-b840ef228312 (default)
           zone daff70b4-6df3-4d21-ae20-9673d06e89db (pubsub)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 49c1fd93-060c-441b-9a64-7b9e10efc7f6 (default)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
[root@stor14 build]# bin/radosgw-admin bucket list -c ceph.conf
[
    "test1"
]
[root@stor14 build]# bin/radosgw-admin bucket list -c ceph.conf --rgw-zone pubsub
[
    "test1"
]

Synchronization is working: bucket test1 has been synced to the pubsub zone.

pub-sub config:

Create a topic
[root@stor12 build]# radosgw-admin pubsub topic create --uid user1 --topic topic1 --rgw-zone pubsub --endpoints="http://192.168.180.136:8000" # the push target is set to node 136
List topics
[root@stor12 build]# radosgw-admin pubsub topics list --uid user1 --topic topic1 --rgw-zone pubsub
{
    "topics": [
        {
            "topic": {
                "user": "user1",
                "name": "topic1",
                "dest": {
                    "bucket_name": "",
                    "oid_prefix": "",
                    "push_endpoint": "",
                    "push_endpoint_args": "",
                    "push_endpoint_topic": ""
                },
                "arn": ""
            },
            "subs": []
        }
    ]
}
Get a topic
[root@stor12 build]# radosgw-admin pubsub topic get --uid user1 --topic topic1 --rgw-zone pubsub
{
    "topic": {
        "user": "user1",
        "name": "topic1",
        "dest": {
            "bucket_name": "",
            "oid_prefix": "",
            "push_endpoint": "",
            "push_endpoint_args": "",
            "push_endpoint_topic": ""
        },
        "arn": ""
    },
    "subs": []
}
Create a notification for a bucket and an associated topic; the notification then publishes that bucket's events to the associated topic
[root@stor12 build]# radosgw-admin pubsub notification create --uid user1 --topic topic1 --rgw-zone pubsub --bucket test1
Create a subscription
[root@stor12 build]# radosgw-admin pubsub sub create --sub-name sub1 --uid user1 --topic topic1 --rgw-zone pubsub --sub-push-endpoint="http://192.168.180.136:8000" --sub-dest-bucket test1
Get a subscription
[root@stor12 build]# radosgw-admin pubsub sub get --uid user1 --sub-name sub1 --rgw-zone pubsub
{
    "user": "user1",
    "name": "sub1",
    "topic": "topic1",
    "dest": {
        "bucket_name": "pubsubuser1-topic1",
        "oid_prefix": "pubsub-",
        "push_endpoint": "",
        "push_endpoint_args": "",
        "push_endpoint_topic": ""
    },
    "s3_id": ""
}
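The dest values in the output above line up with the tier-config used when the pubsub zone was created (uid=user1, data_bucket_prefix=pubsub, data_oid_prefix=pubsub-). The sketch below parses such a tier-config string and rebuilds the bucket name. The naming rule (prefix + uid + "-" + topic) is an assumption inferred from this one output, and parse_tier_config and sub_data_bucket are hypothetical helpers, not Ceph functions.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Split "k1=v1,k2=v2,..." into a key/value map (hypothetical helper).
std::map<std::string, std::string> parse_tier_config(const std::string& s) {
  std::map<std::string, std::string> out;
  std::istringstream in(s);
  std::string item;
  while (std::getline(in, item, ',')) {
    const auto eq = item.find('=');
    if (eq == std::string::npos) continue;  // skip entries without '='
    out[item.substr(0, eq)] = item.substr(eq + 1);
  }
  return out;
}

// Assumed naming rule, inferred from the "pubsubuser1-topic1" output above.
std::string sub_data_bucket(const std::string& data_bucket_prefix,
                            const std::string& uid,
                            const std::string& topic) {
  return data_bucket_prefix + uid + "-" + topic;
}
```

For the configuration used in this walkthrough, sub_data_bucket("pubsub", "user1", "topic1") yields the bucket_name shown above, and the parsed data_oid_prefix matches the reported oid_prefix.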
Pull events
[root@stor12 build]# radosgw-admin pubsub sub pull --sub-name sub1 --uid user1 --topic topic1 --rgw-zone pubsub
{
    "next_marker": "",
    "is_truncated": "false",
    "events": [
        {
            "id": "1576056050.369301.c7191f0d",
            "event": "OBJECT_CREATE",
            "timestamp": "2019-12-11T09:20:50.369301Z",
            "info": {
                "attrs": {
                    "mtime": "2019-12-11T09:20:48.036424Z"
                },
                "bucket": {
                    "bucket_id": "96733e46-48af-4b89-a3dc-3f3b39ffb2a3.4539.1",
                    "name": "test1",
                    "tenant": ""
                },
                "key": {
                    "instance": "",
                    "name": "f3"
                }
            }
        }
    ]
} 

Pub-Sub test

cd ceph/build
RGW_MULTI_TEST_CONF=./test_multi.conf nosetests -s --verbosity=2  ../src/test/rgw/test_multi.py -m "test_ps*"

Pub-Sub REST API

The Ceph implementation of the pub-sub REST API differs from the AWS standard.

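API comparison: AWS vs. Ceph

Topic
  • AWS: the AWS Simple Notification Service (SNS) API contains many operations
    • GetTopic returns information on all topics
    • CreateTopic does not support specifying an endpoint
  • Ceph: only CreateTopic, DeleteTopic, ListTopics, and GetTopic are implemented
    • GetTopic can return a specified topic or all topics
    • CreateTopic supports specifying an endpoint
  • Notes: Ceph implements only part of the AWS SNS API, but adds some extensions of its own

Subscription
  • AWS: no corresponding API
  • Ceph: has a corresponding Subscription API
  • Notes: in the AWS Pub-Sub service the subscription concept is implicit in the notification; creating or deleting a notification creates or deletes the subscription

Notification
  • AWS: AWS S3 Bucket Notification API
    • GetNotification returns all notifications of the specified bucket
    • Filtering by object prefix or suffix is not supported
    • Sending the same event to different notifications is not supported
    • Deleting a notification: delete the specified notification, delete the bucket, or set the specified notification to empty
  • Ceph: two kinds of API, one compatible with the AWS S3 Bucket Notification API and one not S3-compatible
    • GetNotification can return a specified notification
    • Filtering by object prefix or suffix is supported
    • Sending the same event to different notifications is supported
    • Deleting a notification: it must be deleted explicitly; setting the specified notification to empty does not delete it
  • Notes: Ceph implements both kinds of API, with some extensions

Event
  • AWS: events are obtained through notification pushes
  • Ceph: the returned event differs between the two kinds of API (S3-compatible or not); see the description of the event data structures in the code walkthrough
  • Notes: only some event types are supported, and the level of support differs between bucket notification and the pub-sub module; see #event-types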

Note that the endpoint used to access the pub-sub REST API must be an RGW endpoint in the pub-sub zone.

Topic

CREATE TOPIC

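Creates a topic.

PUT /topics/<topic-name>[?push-endpoint=<endpoint>[&amqp-exchange=<exchange>][&amqp-ack-level=none|broker][&verify-ssl=true|false][&kafka-ack-level=none|broker]]
  • push-endpoint: URI of the endpoint that push notifications are sent to (needed only for notifications; a subscription created directly specifies its own endpoint). Three kinds are supported:
    • HTTP endpoint: http[s]://<fqdn>[:<port>]
    • AMQP endpoint: amqp://[<user>:<password>@]<fqdn>[:<port>][/<vhost>]
    • KAFKA endpoint: kafka://<fqdn>[:<port>]
  • A topic is a resource; in the response it is represented as an ARN in the following format:
    • arn:aws:sns:<zone-group>:<tenant>:<topic>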
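To make the topic ARN layout concrete, here is a minimal sketch that builds and splits a string of the form arn:aws:sns:<zone-group>:<tenant>:<topic>. The TopicArn struct and both helpers are hypothetical names for illustration only; Ceph's own ARN handling lives in rgw_arn.h and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical holder for the three variable fields of a topic ARN.
struct TopicArn {
  std::string zonegroup;
  std::string tenant;
  std::string topic;
};

// Build arn:aws:sns:<zone-group>:<tenant>:<topic>; the tenant may be empty.
std::string make_topic_arn(const std::string& zonegroup,
                           const std::string& tenant,
                           const std::string& topic) {
  return "arn:aws:sns:" + zonegroup + ":" + tenant + ":" + topic;
}

// Split a topic ARN back into its fields; returns false on a malformed input.
bool parse_topic_arn(const std::string& arn, TopicArn* out) {
  const std::string prefix = "arn:aws:sns:";
  if (arn.compare(0, prefix.size(), prefix) != 0) return false;
  const std::size_t zg_end = arn.find(':', prefix.size());
  if (zg_end == std::string::npos) return false;
  const std::size_t tenant_end = arn.find(':', zg_end + 1);
  if (tenant_end == std::string::npos) return false;
  out->zonegroup = arn.substr(prefix.size(), zg_end - prefix.size());
  out->tenant = arn.substr(zg_end + 1, tenant_end - zg_end - 1);
  out->topic = arn.substr(tenant_end + 1);
  return !out->topic.empty();
}
```

For a topic named topic1 in zonegroup default with no tenant, the two colons around the empty tenant field remain in place, giving arn:aws:sns:default::topic1.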

GET TOPIC INFORMATION

Gets topic information.

GET /topics/<topic-name>

The response is a JSON document:

{
    "topic":{
        "user":"",
        "name":"",
        "dest":{
            "bucket_name":"",
            "oid_prefix":"",
            "push_endpoint":"",
            "push_endpoint_args":""
        },
        "arn":""
    },
    "subs":[]
}

DELETE TOPIC

Deletes the specified topic.

DELETE /topics/<topic-name>

LIST TOPICS

GET /topics

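S3-compatible Notification

Notes on notifications:

  • Creating a notification automatically creates a subscription with the same name as the notification Id;
  • Deleting a notification automatically deletes the subscription that was created for it;
  • Deleting a bucket automatically deletes its notifications, but not the subscriptions that belong to them; the events of those subscriptions remain accessible;
  • S3 notifications are operations under the bucket.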

CREATE NOTIFICATION

Creates a notification (publisher) for the specified topic on the specified bucket.

PUT /<bucket>?notification HTTP/1.1

DELETE NOTIFICATION

Deletes a specified notification, or all notifications, on the specified bucket.

DELETE /bucket?notification[=<notification-id>] HTTP/1.1

GET/LIST NOTIFICATION

Gets a specified notification, or lists all notifications on the bucket.

GET /bucket?notification[=<notification-id>] HTTP/1.1

Non-S3-compatible Notification

CREATE A NOTIFICATION

PUT /notifications/bucket/<bucket>?topic=<topic-name>[&events=<event>[,<event>]]

DELETE NOTIFICATION INFORMATION

DELETE /notifications/bucket/<bucket>?topic=<topic-name>

LIST NOTIFICATIONS

Lists all notifications (topics and their associated events) on the specified bucket.

GET /notifications/bucket/<bucket>

Subscription

CREATE A SUBSCRIPTION

Creates a subscription.

PUT /subscriptions/<sub-name>?topic=<topic-name>[&push-endpoint=<endpoint>[&amqp-exchange=<exchange>][&amqp-ack-level=none|broker][&verify-ssl=true|false][&kafka-ack-level=none|broker]]

Request parameters:

  • topic-name: name of the topic

  • push-endpoint: URI of the endpoint that push notifications are sent to; again one of three kinds: http, amqp, or kafka

GET SUBSCRIPTION INFORMATION

Gets information about the specified subscription.

GET /subscriptions/<sub-name>

Response:

{
    "user":"",
    "name":"",
    "topic":"",
    "dest":{
        "bucket_name":"",
        "oid_prefix":"",
        "push_endpoint":"",
        "push_endpoint_args":""
    },
    "s3_id":""
}

DELETE SUBSCRIPTION

Deletes the specified subscription.

DELETE /subscriptions/<sub-name>

Events

PULL EVENTS

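Pulls the events associated with the specified subscription (sub).

GET /subscriptions/<sub-name>?events[&max-entries=<max-entries>][&marker=<marker>]

Request parameters:

  • marker: pagination marker for the event list; if not specified, listing starts from the oldest event

  • max-entries: maximum number of events to return; the default is 100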

ACK EVENT

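Acknowledges an event; an acknowledged event is removed from the subscription history.

POST /subscriptions/<sub-name>?ack&event-id=<event-id>

Request parameters:

  • event-id: ID of the event to acknowledge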
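As a small illustration of the pull and ack requests, the helpers below assemble the request paths from a subscription name, following the same /subscriptions/<sub-name> layout as the GET and DELETE subscription requests. Both function names are hypothetical, values are not URL-encoded, and request signing and the HTTP call itself are out of scope for this sketch.

```cpp
#include <cassert>
#include <string>

// Path for "GET /subscriptions/<sub-name>?events..." (hypothetical helper).
// max_entries defaults to 100, the documented default; marker is optional.
std::string pull_events_path(const std::string& sub,
                             int max_entries = 100,
                             const std::string& marker = "") {
  std::string path = "/subscriptions/" + sub + "?events&max-entries=" +
                     std::to_string(max_entries);
  if (!marker.empty()) {
    path += "&marker=" + marker;  // continue from a previous page
  }
  return path;
}

// Path for "POST /subscriptions/<sub-name>?ack&event-id=..." (hypothetical helper).
std::string ack_event_path(const std::string& sub, const std::string& event_id) {
  return "/subscriptions/" + sub + "?ack&event-id=" + event_id;
}
```

A client would typically pull a page, process each event, and acknowledge it by id, passing the next_marker value from the previous response as marker while is_truncated is true.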

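Implementation of the Pub-Sub Module

Code organization

The Pub-Sub module code can be divided roughly into 10 parts. The submodules are layered as follows:

0. sync service

After RGW was restructured into services, the sync module service was split out.

See the walkthrough of the common sync module code for details.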

services/svc_sync_modules.h
services/svc_sync_modules.cc
 
class RGWSI_SyncModules : public RGWServiceInstance

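1. sync module base classes

Base classes for sync module instantiation, sync module management, and sync processing. They handle registration of the submodules, and every submodule inherits from them: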

rgw_sync_module.h
rgw_sync_module.cc

class RGWDataSyncModule 
class RGWSyncModuleInstance // base class for sync module instances
class RGWSyncModule
class RGWSyncModulesManager
class RGWStatRemoteObjCBCR : public RGWCoroutine
class RGWCallStatRemoteObjCR : public RGWCoroutine
void rgw_register_sync_modules(RGWSyncModulesManager *modules_manager);

2. pubsub submodule instantiation and management

Instantiation and management of the pubsub submodule, which inherits from rgw_sync_module

rgw_sync_module_pubsub.h
rgw_sync_module_pubsub.cc

class RGWPSSyncModule : public RGWSyncModule // obtains the RGWPSSyncModuleInstance instance
class RGWPSSyncModuleInstance : public RGWSyncModuleInstance
class RGWPSDataSyncModule : public RGWDataSyncModule // data sync implementation; in practice it creates various coroutine function objects

struct PSConfig // basic pubsub configuration: S3 user, topics list, subs list, etc.
struct PSTopicConfig // topic configuration: topic name and the related subs list
struct PSSubConfig // sub configuration: information on the endpoint to push to, the related topic, etc.
struct PSNotificationConfig // notification configuration: the related topic
struct objstore_event // object store event definition: id, bucket, obj, mtime, attrs list, etc.
class PSEvent
class RGWSingletonCR
class PSSubscription
class PSManager
class RGWPSFindBucketTopicsCR : public RGWCoroutine
class RGWPSHandleObjEventCR : public RGWCoroutine
class RGWPSHandleRemoteObjCR : public RGWCallStatRemoteObjCR 
class RGWPSGenericObjEventCBCR : public RGWCoroutine


3. pubsub op implementation

pubsub op implementation (before these two files were added, this part was implemented in rgw_sync_module_pubsub_rest)

rgw_rest_pubsub_common.h
rgw_rest_pubsub_common.cc

// create a topic
class RGWPSCreateTopicOp : public RGWDefaultResponseOp
// list all topics
class RGWPSListTopicsOp : public RGWOp
// get topic information
class RGWPSGetTopicOp : public RGWOp
// delete a topic
class RGWPSDeleteTopicOp : public RGWDefaultResponseOp
// create a subscription
class RGWPSCreateSubOp : public RGWDefaultResponseOp
// get subscription information (including push-endpoint if exist)
class RGWPSGetSubOp : public RGWOp
// delete subscription
class RGWPSDeleteSubOp : public RGWDefaultResponseOp
// acking of an event
class RGWPSAckSubEventOp : public RGWDefaultResponseOp
// fetching events from a subscription
// depending on whether the subscription was created via s3 compliant API or not
// the matching events will be returned
class RGWPSPullSubEventsOp : public RGWOp
// notification creation
class RGWPSCreateNotifOp : public RGWDefaultResponseOp
// delete a notification
class RGWPSDeleteNotifOp : public RGWDefaultResponseOp
// get topics/notifications on a bucket
class RGWPSListNotifsOp : public RGWOp

4. pubsub rest api

pubsub REST API, inheriting from the classes in rgw_rest_pubsub_common.h

rgw_sync_module_pubsub_rest.h
rgw_sync_module_pubsub_rest.cc

// command: PUT /topics/<topic-name>[&push-endpoint=<endpoint>[&<arg1>=<value1>]]
class RGWPSCreateTopic_ObjStore : public RGWPSCreateTopicOp
// command: GET /topics
class RGWPSListTopics_ObjStore : public RGWPSListTopicsOp
// command: GET /topics/<topic-name>
class RGWPSGetTopic_ObjStore : public RGWPSGetTopicOp
// command: DELETE /topics/<topic-name>
class RGWPSDeleteTopic_ObjStore : public RGWPSDeleteTopicOp
// ceph specific topics handler factory
class RGWHandler_REST_PSTopic : public RGWHandler_REST_S3
// command: PUT /subscriptions/<sub-name>?topic=<topic-name>[&push-endpoint=<endpoint>[&<arg1>=<value1>]]...
class RGWPSCreateSub_ObjStore : public RGWPSCreateSubOp
// command: GET /subscriptions/<sub-name>
class RGWPSGetSub_ObjStore : public RGWPSGetSubOp
// command: DELETE /subscriptions/<sub-name>
class RGWPSDeleteSub_ObjStore : public RGWPSDeleteSubOp
....

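5. pubsub resource read/write implementation

Create, delete, and query implementation for the pubsub resources: data structure definitions and method definitions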

rgw_pubsub.h
rgw_pubsub.cc

struct rgw_pubsub_event; // event
struct rgw_pubsub_sub_dest; // subscription destination
struct rgw_pubsub_sub_config; // subscription configuration
struct rgw_pubsub_topic // topic
struct rgw_pubsub_topic_subs // subscriptions of a topic
struct rgw_pubsub_bucket_topics // topics of a bucket
struct rgw_pubsub_user_topics // topics of a user

// pubsub method definitions
class RGWUserPubSub {
  class Bucket;
  class Sub;
};

6. pubsub push implementation

pubsub push implementation and endpoint definitions (http, amqp, kafka, ...), used by publish() in pubsub notification:

rgw_pubsub_push.h
rgw_pubsub_push.cc

class RGWPubSubEndpoint // endpoint base class
class RGWPubSubHTTPEndpoint : public RGWPubSubEndpoint // HTTP endpoint implementation
class RGWPubSubAMQPEndpoint : public RGWPubSubEndpoint { // AMQP endpoint implementation
  class NoAckPublishCR : public RGWCoroutine
  class AckPublishCR : public RGWCoroutine, public RGWIOProvider
}
class RGWPubSubKafkaEndpoint : public RGWPubSubEndpoint // Kafka endpoint implementation

7. pubsub: allow pubsub REST API on master

rgw_rest_pubsub.h
rgw_rest_pubsub.cc

class RGWHandler_REST_PSNotifs_S3 : public RGWHandler_REST_S3
class RGWHandler_REST_PSTopic_AWS : public RGWHandler_REST
class RGWPSCreateTopic_ObjStore_AWS : public RGWPSCreateTopicOp
class RGWPSListTopics_ObjStore_AWS : public RGWPSListTopicsOp

8. publish notification

Publish function: int publish(const req_state* s,
                                             const ceph::real_time& mtime,
                                             const std::string& etag,
                                             EventType event_type,
                                             rgw::sal::RGWRadosStore* store);
rgw_notify.h
rgw_notify.cc

rgw_notify_event_types.h
rgw_notify_event_types.cc 
 
// event types
  enum EventType {
    ObjectCreated                        = 0xF,
    ObjectCreatedPut                     = 0x1,
    ObjectCreatedPost                    = 0x2,
    ObjectCreatedCopy                    = 0x4,
    ObjectCreatedCompleteMultipartUpload = 0x8,
    ObjectRemoved                        = 0xF0,
    ObjectRemovedDelete                  = 0x10,
    ObjectRemovedDeleteMarkerCreated     = 0x20,
    UnknownEvent                         = 0x100
  };
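The values in this enum form a bitmask: ObjectCreated (0xF) covers the four ObjectCreated* bits, and ObjectRemoved (0xF0) covers the ObjectRemoved* bits. The sketch below mirrors the enum in a separate namespace and shows how a group filter could be tested against a concrete event. The matches() helper is hypothetical; whether Ceph performs the comparison exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Values copied from the listing above.
enum EventType : uint32_t {
  ObjectCreated                        = 0xF,
  ObjectCreatedPut                     = 0x1,
  ObjectCreatedPost                    = 0x2,
  ObjectCreatedCopy                    = 0x4,
  ObjectCreatedCompleteMultipartUpload = 0x8,
  ObjectRemoved                        = 0xF0,
  ObjectRemovedDelete                  = 0x10,
  ObjectRemovedDeleteMarkerCreated     = 0x20,
  UnknownEvent                         = 0x100
};

// True when `event` shares at least one bit with `filter` (hypothetical helper).
constexpr bool matches(EventType filter, EventType event) {
  return (static_cast<uint32_t>(filter) & static_cast<uint32_t>(event)) != 0;
}

}  // namespace sketch
```

A notification filtered on ObjectCreated would therefore accept ObjectCreatedCopy but not ObjectRemovedDelete, while UnknownEvent (0x100) falls outside both groups.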

rgw_op.cc: at the end of each OP's execution, a request is sent to the notification manager through rgw::notify::publish()
rgw_rest_s3.cc: obtains the pubsub OP

9. Supporting resources

rgw_arn.h: AWS resource namespace, see [Other: ARN]
rgw_amqp.h: amqp resource
rgw_kafka.h: kafka resource

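Publish-subscribe implementation

Setting up publishing and subscribing

The first step is to configure the topic and the subscription, which is relatively simple.

Topics and subscriptions are created mainly through the radosgw-admin CLI and the HTTP API.

Creating a subscription through the CLI is used as the example here; the other paths are similar.

rgw_admin.cc calls directly into rgw_pubsub.cc. As noted earlier, rgw_pubsub.cc implements create, delete, and query for the pubsub resources (topic, subscription, and so on), including the data structure and method definitions.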

  if (opt_cmd == OPT_PUBSUB_SUB_CREATE) {
    if (get_tier_type(store) != "pubsub") {
      cerr << "ERROR: only pubsub tier type supports this command" << std::endl;
      return EINVAL;
    }
    ...

    rgw_pubsub_topic_subs topic;
    int ret = ups.get_topic(topic_name, &topic);
    ...

    rgw_pubsub_sub_dest dest_config;
    dest_config.bucket_name = sub_dest_bucket;
    dest_config.oid_prefix = sub_oid_prefix;
    dest_config.push_endpoint = sub_push_endpoint;

    auto psmodule = static_cast<RGWPSSyncModuleInstance *>(store->getRados()->get_sync_module().get());
    auto conf = psmodule->get_effective_conf();
    ...
    auto sub = ups.get_sub(sub_name);
    ret = sub->subscribe(topic_name, dest_config); // write the configured subscription information
    ...
  }

Handling the subscription configuration

int  RGWUserPubSub::Sub::subscribe( const  string& topic,  const  rgw_pubsub_sub_dest& dest,  const  std::string& s3_id)
{
   RGWObjVersionTracker user_objv_tracker;
   rgw_pubsub_user_topics topics;
   rgw::sal::RGWRadosStore *store = ps->store;
 
   int  ret = ps->read_user_topics(&topics, &user_objv_tracker);
   if  (ret < 0) {
     ldout(store->ctx(), 1) <<  "ERROR: failed to read topics info: ret="  << ret << dendl;
     return  ret != -ENOENT ? ret : -EINVAL;
   }
   auto iter = topics.topics.find(topic);
   ...
   auto& t = iter->second;
   rgw_pubsub_sub_config sub_conf;
   sub_conf.user = ps->user;
   sub_conf.name = sub;
   sub_conf.topic = topic;
   sub_conf.dest = dest;
   sub_conf.s3_id = s3_id;
   t.subs.insert(sub);
   ret = ps->write_user_topics(topics, &user_objv_tracker);
   if  (ret < 0) {
     ldout(store->ctx(), 1) <<  "ERROR: failed to write topics info: ret="  << ret << dendl;
     return  ret;
   }
   // add the configured subscription to the current user's pubsub context
   ret = write_sub(sub_conf, nullptr);
   if  (ret < 0) {
     ldout(store->ctx(), 1) <<  "ERROR: failed to write subscription info: ret="  << ret << dendl;
     return  ret;
   }
   return  0;
}

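Event handling for publishing and subscribing

This continues from the common sync module processing, starting at rgw service initialization:

  1. The sync module service, RGWSI_SyncModules, is instantiated and creates the data handler (data_handler: RGWPSDataSyncModule), which provides init(), start_sync(), sync_object(), remove_object(), and create_delete_marker().
  2. The data sync thread, which is already running, starts the RGWDataSyncCR coroutine; the coroutine tries to obtain the data handler.
  3. The data_handler starts synchronizing.
  4. Events under topics->subs are handled: the event object is stored and pushed to the remote end (http, amqp, or kafka endpoint).

1. Instantiating the sync module service and creating the data handler

First, a look at svc_sync_modules.h: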

class  RGWSI_SyncModules :  public  RGWServiceInstance
{
   RGWSyncModulesManager *sync_modules_manager{nullptr};
   RGWSyncModuleInstanceRef sync_module;
 
   struct  Svc {
     RGWSI_Zone *zone{nullptr};
   } svc;
public :
   RGWSI_SyncModules(CephContext *cct): RGWServiceInstance(cct) {}
   ~RGWSI_SyncModules();
   RGWSyncModulesManager *get_manager() {
     return  sync_modules_manager;
   }
   void  init(RGWSI_Zone *zone_svc);
   int  do_start() override;
   RGWSyncModuleInstanceRef& get_sync_module() {  return  sync_module; }
};
 
// once initialization is done, the registered sync_modules_manager is called to create the matching sync_module instance
int  RGWSI_SyncModules::do_start()
{
   auto& zone_public_config = svc.zone->get_zone();
   // create the matching instance
   int  ret = sync_modules_manager->create_instance(cct, zone_public_config.tier_type, svc.zone->get_zone_params().tier_config, &sync_module);
   ...
     return  ret;
   }
...
   return  0;
}

Creating the data_handler

Next, RGWPSSyncModule::create_instance() is called, which simply creates an RGWPSSyncModuleInstance object. This is where the data handler is obtained.

In rgw_sync_module_pubsub.cc:

int RGWPSSyncModule::create_instance(CephContext *cct, const JSONFormattable& config, RGWSyncModuleInstanceRef *instance) {
  instance->reset(new RGWPSSyncModuleInstance(cct, config));
  return 0;
}

// what the RGWPSSyncModuleInstance constructor does
RGWPSSyncModuleInstance::RGWPSSyncModuleInstance(CephContext *cct, const JSONFormattable& config)
{
  // a very important step: create the data_handler
  // later data sync operations are handled mainly by the data_handler: init(), start_sync(), sync_object(), remove_object(), create_delete_marker()
  data_handler = std::unique_ptr<RGWPSDataSyncModule>(new RGWPSDataSyncModule(cct, config));
  string jconf = json_str("conf", *data_handler->get_conf());
  JSONParser p;
  if (!p.parse(jconf.c_str(), jconf.size())) {
    ldout(cct, 1) << "ERROR: failed to parse sync module effective conf: " << jconf << dendl;
    effective_conf = config;
  } else {
    effective_conf.decode_json(&p);
  }
// the following initializes the AMQP or Kafka endpoint manager, depending on the build configuration
#ifdef WITH_RADOSGW_AMQP_ENDPOINT
  if (!rgw::amqp::init(cct)) {
    ldout(cct, 1) << "ERROR: failed to initialize AMQP manager in pubsub sync module" << dendl;
  }
#endif
#ifdef WITH_RADOSGW_KAFKA_ENDPOINT
  if (!rgw::kafka::init(cct)) {
    ldout(cct, 1) << "ERROR: failed to initialize Kafka manager in pubsub sync module" << dendl;
  }
#endif
}

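2. Obtaining the data handler and starting synchronization

How does the handler work? RGWDataSyncCR is likewise a coroutine wrapped in a function object.

For how the RGWDataSyncCR coroutine is started, see the sync module section.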

class RGWDataSyncCR : public RGWCoroutine {
  RGWDataSyncEnv *sync_env;
  uint32_t num_shards;
  rgw_data_sync_status sync_status;
  ...
public:
  RGWDataSyncCR(RGWDataSyncEnv *_sync_env, uint32_t _num_shards, RGWSyncTraceNodeRef& _tn, bool *_reset_backoff);
  ~RGWDataSyncCR() override;

  int operate() override {
    reenter(this) {
      // read the sync status
      yield call(new RGWReadDataSyncStatusCoroutine(sync_env, &sync_status));
      // obtain the data handler
      data_sync_module = sync_env->sync_module->get_data_handler();
      ...
      // initialize the sync status
      if ((rgw_data_sync_info::SyncState)sync_status.sync_info.state == rgw_data_sync_info::StateInit) {
        tn->log(20, SSTR("init"));
        sync_status.sync_info.num_shards = num_shards;
        uint64_t instance_id;
        instance_id = ceph::util::generate_random_number();
        yield call(new RGWInitDataSyncStatusCoroutine(sync_env, num_shards, instance_id, tn, &sync_status));
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: failed to init sync, retcode=" << retcode));
          return set_cr_error(retcode);
        }
        // sets state = StateBuildingFullSyncMaps

        *reset_backoff = true;
      }
      // initialize the data handler
      data_sync_module->init(sync_env, sync_status.sync_info.instance_id);

      // update the sync status for full sync
      if  ((rgw_data_sync_info::SyncState)sync_status.sync_info.state == rgw_data_sync_info::StateBuildingFullSyncMaps) {
        tn->log(10, SSTR("building full sync maps"));
        /* call sync module init here */
        sync_status.sync_info.num_shards = num_shards;
        yield call(data_sync_module->init_sync(sync_env));
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: sync module init_sync() failed, retcode=" << retcode));
          return set_cr_error(retcode);
        }
        /* state: building full sync maps */
        yield call(new RGWListBucketIndexesCR(sync_env, &sync_status));
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: failed to build full sync maps, retcode=" << retcode));
          return set_cr_error(retcode);
        }
        sync_status.sync_info.state = rgw_data_sync_info::StateSync;

        /* update new state */
        yield call(set_sync_info_cr());
        if (retcode < 0) {
          tn->log(0, SSTR("ERROR: failed to write sync status, retcode=" << retcode));
          return set_cr_error(retcode);
        }

        *reset_backoff = true;
      }
      // call the subclass's start_sync()
      yield call(data_sync_module->start_sync(sync_env));

      // per-shard processing while syncing
      yield {
        if  ((rgw_data_sync_info::SyncState)sync_status.sync_info.state == rgw_data_sync_info::StateSync) {
          tn->log(10, SSTR("spawning " << num_shards << " shards sync"));
          for (map<uint32_t, rgw_data_sync_marker>::iterator iter = sync_status.sync_markers.begin();
               iter != sync_status.sync_markers.end(); ++iter) {
            RGWDataSyncShardControlCR *cr = new RGWDataSyncShardControlCR(sync_env, sync_env->store->svc()->zone->get_zone_params().log_pool,
                                                                          iter->first, iter->second, tn);
            ...
          }
        }
      }

      return set_cr_done();
    }
    return 0;
  }
...
};
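The reenter/yield structure of operate() can be pictured as a hand-written state machine: each call resumes after the last yield point, does one step, and returns. The class below is a deliberately simplified sketch of that idea. StepCoroutine and run_to_completion are hypothetical names, the three steps only echo the order seen in RGWDataSyncCR, and none of this is the actual RGWCoroutine or boost::asio::coroutine machinery.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of a resumable function object.
class StepCoroutine {
  int state_ = 0;                    // where to resume on the next call
  std::vector<std::string>* log_;    // records which step ran

public:
  explicit StepCoroutine(std::vector<std::string>* log) : log_(log) {}

  bool done() const { return state_ < 0; }

  // Runs up to the next yield point, then returns to the caller.
  int operate() {
    switch (state_) {
      case 0:
        log_->push_back("read sync status");
        state_ = 1;                  // "yield": resume at case 1 next time
        return 0;
      case 1:
        log_->push_back("init data handler");
        state_ = 2;
        return 0;
      case 2:
        log_->push_back("start sync");
        state_ = -1;                 // finished, nothing left to resume
        return 0;
      default:
        return 0;                    // already done
    }
  }
};

// A trivial scheduler: keep resuming until the coroutine reports completion.
inline int run_to_completion(StepCoroutine& cr) {
  int calls = 0;
  while (!cr.done()) {
    cr.operate();
    ++calls;
  }
  return calls;
}
```

In the real code the scheduler interleaves many such objects, so a yield lets other coroutines run while an I/O step is pending; the sketch only shows the resume-where-you-left-off part.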

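The data handler works internally as follows:

// The constructor mainly does two things:
//  1. assigns env and conf
//  2. initializes env
  RGWPSDataSyncModule(CephContext *cct, const JSONFormattable& config) : env(std::make_shared<PSEnv>()), conf(env->conf) {
    env->init(cct, config);
  }
// PSConfigRef& conf holds the basic configuration of the current pubsub module
// PSEnv is structured as follows:
struct PSEnv {
  PSConfigRef conf;
  shared_ptr<RGWUserInfo> data_user_info;
  PSManagerRef manager; // contains a subscription-related class, GetSubCR, a coroutine usable as a function object
  PSEnv() : conf(make_shared<PSConfig>()),
            data_user_info(make_shared<RGWUserInfo>()) {}
  void init(CephContext *cct, const JSONFormattable& config) {
    conf->init(cct, config); // initialize the PSConfig assigned above
  }
};
 
// Besides the constructor, RGWPSDataSyncModule has several member functions.
// They drive data synchronization; the actual logic lives in the function objects they return.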
void init(RGWDataSyncEnv *sync_env, uint64_t instance_id);
RGWCoroutine *start_sync(RGWDataSyncEnv *sync_env);
RGWCoroutine *sync_object(RGWDataSyncEnv *sync_env, RGWBucketInfo& bucket_info, 
      rgw_obj_key& key, std::optional<uint64_t> versioned_epoch, rgw_zone_set *zones_trace) override {
    ldout(sync_env->cct, 10) << conf->id << ": sync_object: b=" << bucket_info.bucket << 
          " k=" << key << " versioned_epoch=" << versioned_epoch.value_or(0) << dendl;
    return new RGWPSHandleObjCreateCR(sync_env, bucket_info, key, env, versioned_epoch); // returns an RGWPSHandleObjCreateCR function object
  }
RGWCoroutine *remove_object(...);
RGWCoroutine *create_delete_marker(...);
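Before this point, synchronization between the zones has already started.

Taking sync_object() as the example, the following walks through how the returned RGWCoroutine function object executes.
RGWPSHandleObjCreateCR wraps a coroutine; running it means executing that coroutine.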
class RGWPSHandleObjCreateCR : public RGWCoroutine {
  ...
public:
  RGWPSHandleObjCreateCR(RGWDataSyncEnv *_sync_env,...
  ~RGWPSHandleObjCreateCR() override {}
  // operate() is overridden here, using boost::asio::coroutine to implement the RGWPSHandleObjCreateCR coroutine
  int operate() override {
    reenter(this) { // the reenter() block defines the coroutine body
      yield call(new RGWPSFindBucketTopicsCR(sync_env, env, bucket_info.owner, // RGWPSFindBucketTopicsCR is also a function object class wrapping a coroutine; it fetches the bucket topics
                                             bucket_info.bucket, key,
                                             rgw::notify::ObjectCreated,
                                             &topics));
      ...
      yield call(new RGWPSHandleRemoteObjCR(sync_env, bucket_info, key, env, versioned_epoch, topics)); // sync_object() processing once the topics are known
      ...
    }
    return 0;
  }
};
  
// no longer a plain function object; it carries a callback
class RGWPSHandleRemoteObjCR : public RGWCallStatRemoteObjCR {
  PSEnvRef env;
  std::optional<uint64_t> versioned_epoch;
  TopicsRef topics;
public:
  RGWPSHandleRemoteObjCR(RGWDataSyncEnv *_sync_env,
                        RGWBucketInfo& _bucket_info, rgw_obj_key& _key,
                        PSEnvRef _env, std::optional<uint64_t> _versioned_epoch,
                        TopicsRef& _topics) : RGWCallStatRemoteObjCR(_sync_env, _bucket_info, _key),   // base class RGWCallStatRemoteObjCR, which invokes the allocate_callback() callback
                                                           env(_env), versioned_epoch(_versioned_epoch),
                                                           topics(_topics) {
  }
 
  ~RGWPSHandleRemoteObjCR() override {}
  // the callback returns a coroutine wrapped in an RGWPSHandleRemoteObjCBCR function object
  // it overrides the base class virtual function of the same name, RGWCallStatRemoteObjCR::allocate_callback()
  RGWStatRemoteObjCBCR *allocate_callback() override {
    return new RGWPSHandleRemoteObjCBCR(sync_env, bucket_info, key, env, versioned_epoch, topics);
  }
};
 
// coroutine invoked on remote object creation
class RGWPSHandleRemoteObjCBCR : public RGWStatRemoteObjCBCR {
  RGWDataSyncEnv *sync_env;
  PSEnvRef env;
  std::optional<uint64_t> versioned_epoch;
  EventRef event; // ceph event
  EventRef record; // s3 record
  TopicsRef topics;
public:
  RGWPSHandleRemoteObjCBCR(RGWDataSyncEnv *_sync_env,
                          RGWBucketInfo& _bucket_info, rgw_obj_key& _key,
                          PSEnvRef _env, std::optional<uint64_t> _versioned_epoch,
                          TopicsRef& _topics) : RGWStatRemoteObjCBCR(_sync_env, _bucket_info, _key),
                                                                      sync_env(_sync_env),
                                                                      env(_env),
                                                                      versioned_epoch(_versioned_epoch),
                                                                      topics(_topics) {
  }
  int operate() override {
    reenter(this) {
      ldout(sync_env->cct, 20) << ": stat of remote obj: z=" << sync_env->source_zone
                               << " b=" << bucket_info.bucket << " k=" << key << " size=" << size << " mtime=" << mtime
                               << " attrs=" << attrs << dendl;
      {
        std::vector<std::pair<std::string, std::string> > attrs;
        for (auto& attr : attrs) {
          string k = attr.first;
          if (boost::algorithm::starts_with(k, RGW_ATTR_PREFIX)) {
            k = k.substr(sizeof(RGW_ATTR_PREFIX) - 1);
          }
          attrs.push_back(std::make_pair(k, attr.second));
        }
        // at this point we don't know whether we need the ceph event or S3 record
        // this is why both are created here, once we have information about the
        // subscription, we will store/push only the relevant ones
        make_event_ref(sync_env->cct,
                       bucket_info.bucket, key,
                       mtime, &attrs,
                       rgw::notify::ObjectCreated, &event);
        make_s3_record_ref(sync_env->cct,
                       bucket_info.bucket, bucket_info.owner, key,
                       mtime, &attrs,
                       rgw::notify::ObjectCreated, &record);
      }
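      // processing of the individual topics and subscriptions starts here; this is where pubsub events are actually handled
      // depending on the subscription, either the ceph event or the s3 record is stored/pushed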
      yield call(new RGWPSHandleObjEventCR(sync_env, env, bucket_info.owner, event, record, topics));
      if (retcode < 0) {
        return set_cr_error(retcode);
      }
      return set_cr_done();
    }
    return 0;
  }
};

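3. Event handling under topics->subs

RGWPSHandleObjEventCR likewise wraps a coroutine in a function object.

This is the core of the processing:

- Iterate over all topics of the bucket/object.

- Then iterate over all subscriptions of each topic. Depending on whether the subscription is S3-compatible, processing takes one of two paths, each of which:

  • stores the event in the current cluster;
  • pushes the event to the configured endpoint: http endpoint, amqp endpoint, or kafka endpoint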

class RGWPSHandleObjEventCR : public RGWCoroutine {
  ...
public:
  RGWPSHandleObjEventCR(RGWDataSyncEnv* const _sync_env,....
 
  int operate() override {
    reenter(this) {
      ldout(sync_env->cct, 20) << ": handle event: obj: z=" << sync_env->source_zone
                               << " event=" << json_str("event", *event, false)
                               << " owner=" << owner << dendl;
 
      ldout(sync_env->cct, 20) << "pubsub: " << topics->size() << " topics found for path" << dendl;
      
      // outside caller should check that
      ceph_assert(!topics->empty());
 
      if (perfcounter) perfcounter->inc(l_rgw_pubsub_event_triggered);
 
      // loop over all topics related to the bucket/object
      for (titer = topics->begin(); titer != topics->end(); ++titer) {
        ldout(sync_env->cct, 20) << ": notification for " << event->source << ": topic=" <<
          (*titer)->name << ", has " << (*titer)->subs.size() << " subscriptions" << dendl;
        // loop over all subscriptions of the topic
        for (siter = (*titer)->subs.begin(); siter != (*titer)->subs.end(); ++siter) {
          ldout(sync_env->cct, 20) << ": subscription: " << *siter << dendl;
          has_subscriptions = true;
          sub_conf_found = false;
          // try to read subscription configuration from global/user cond
          // configuration is considered missing only if does not exist in either
          for (oiter = owners.begin(); oiter != owners.end(); ++oiter) {
            yield PSManager::call_get_subscription_cr(sync_env, env->manager, this, *oiter, *siter, &sub);
            if (retcode < 0) {
              if (sub_conf_found) {
                // not a real issue, sub conf already found
                retcode = 0;
              }
              last_sub_conf_error = retcode;
              continue;
            }
            sub_conf_found = true;
            // handle the subscription differently depending on whether it was created through the S3-compatible API
            if (sub->sub_conf->s3_id.empty()) {
              // subscription was not made by S3 compatible API
              ldout(sync_env->cct, 20) << "storing event for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
              // store the event: this invokes the RGWObjectSimplePutCR coroutine to store the event object (rgw_cr_tools.h, rgw_cr_rados.h)
              yield call(PSSubscription::store_event_cr(sync_env, sub, event)); // non-S3-compatible rgw_pubsub_event
              if (retcode < 0) {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_fail);
                ldout(sync_env->cct, 1) << "ERROR: failed to store event for subscription=" << *siter << " ret=" << retcode << dendl;
              } else {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_ok);
                event_handled = true;
              }
              if (sub->sub_conf->push_endpoint) {
                ldout(sync_env->cct, 20) << "push event for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
                // push the event to the endpoint configured in the subscription
                yield call(PSSubscription::push_event_cr(sync_env, sub, event)); // non-S3-compatible rgw_pubsub_event
                if (retcode < 0) {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_failed);
                  ldout(sync_env->cct, 1) << "ERROR: failed to push event for subscription=" << *siter << " ret=" << retcode << dendl;
                } else {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_ok);
                  event_handled = true;
                }
              }
            } else {
              // subscription was made by S3 compatible API
              ldout(sync_env->cct, 20) << "storing record for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
              record->configurationId = sub->sub_conf->s3_id;
              yield call(PSSubscription::store_event_cr(sync_env, sub, record)); // S3-compatible rgw_pubsub_s3_record
              if (retcode < 0) {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_fail);
                ldout(sync_env->cct, 1) << "ERROR: failed to store record for subscription=" << *siter << " ret=" << retcode << dendl;
              } else {
                if (perfcounter) perfcounter->inc(l_rgw_pubsub_store_ok);
                event_handled = true;
              }
              if (sub->sub_conf->push_endpoint) {
                ldout(sync_env->cct, 20) << "push record for subscription=" << *siter << " owner=" << *oiter << " ret=" << retcode << dendl;
                yield call(PSSubscription::push_event_cr(sync_env, sub, record)); // S3-compatible rgw_pubsub_s3_record
                if (retcode < 0) {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_failed);
                  ldout(sync_env->cct, 1) << "ERROR: failed to push record for subscription=" << *siter << " ret=" << retcode << dendl;
                } else {
                  if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_ok);
                  event_handled = true;
                }
              }
            }
          }
          if (!sub_conf_found) {
            // could not find conf for subscription at user or global levels
            ...
          }
        }
      }
      ....
      return set_cr_done();
    }
    return 0;
  }
};

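The details of event handling also differ depending on whether the S3-compatible API is used.

rgw_pubsub_s3_record is defined strictly according to the AWS S3 Event Message Structure; the version currently in use is 2.1 (S3 itself no longer uses version 2.0).
The Ceph-specific, non-S3-compatible rgw_pubsub_event is much simpler and records only the essentials: event id, event name, and the event object to be stored (event_obj).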
struct rgw_pubsub_s3_record {
  constexpr static const char* const json_type_single = "Record";
  constexpr static const char* const json_type_plural = "Records";
  // 2.1
  std::string eventVersion;
  // aws:s3
  std::string eventSource;
  // zonegroup
  std::string awsRegion;
  // time of the request
  ceph::real_time eventTime;
  // type of the event
  std::string eventName;
  // user that sent the request (not implemented)
  std::string userIdentity;
  // IP address of source of the request (not implemented)
  std::string sourceIPAddress;
  // request ID (not implemented)
  std::string x_amz_request_id;
  // radosgw that received the request
  std::string x_amz_id_2;
  // 1.0
  std::string s3SchemaVersion;
  // ID received in the notification request
  std::string configurationId;
  // bucket name
  std::string bucket_name;
  // bucket owner (not implemented)
  std::string bucket_ownerIdentity;
  // bucket ARN; see the ARN section at the end of this article
  std::string bucket_arn;
  // object key
  std::string object_key;
  // object size (not implemented)
  uint64_t object_size;
  // object etag
  std::string object_etag;
  // object version id if bucket is versioned
  std::string object_versionId;
  // hexadecimal value used to determine event order for specific key
  std::string object_sequencer;
  // this is an rgw extension (not S3 standard)
  // used to store a globally unique identifier of the event
  // that could be used for acking
  std::string id;
  // this is an rgw extension holding the internal bucket id
  std::string bucket_id;
  // meta data
  std::map<std::string, std::string> x_meta_map;
...
}
The S3-compatible rgw_pubsub_s3_record follows the AWS S3 Event Message Structure standard.
The current version of the standard is v2.2:
{  
   "Records":[  
      {  
         "eventVersion":"2.2",
         "eventSource":"aws:s3",
         "awsRegion":"us-west-2",
         "eventTime":"The time, in ISO-8601 format, for example, 1970-01-01T00:00:00.000Z, when Amazon S3 finished processing the request",
         "eventName":"event-type",
         "userIdentity":{  
            "principalId":"Amazon-customer-ID-of-the-user-who-caused-the-event"
         },
         "requestParameters":{  
            "sourceIPAddress":"ip-address-where-request-came-from"
         },
         "responseElements":{  
            "x-amz-request-id":"Amazon S3 generated request ID",
            "x-amz-id-2":"Amazon S3 host that processed the request"
         },
         "s3":{  
            "s3SchemaVersion":"1.0",
            "configurationId":"ID found in the bucket notification configuration",
            "bucket":{  
               "name":"bucket-name",
               "ownerIdentity":{  
                  "principalId":"Amazon-customer-ID-of-the-bucket-owner"
               },
               "arn":"bucket-ARN" // see the ARN section
            },
            "object":{  
               "key":"object-key",
               "size":object-size,
               "eTag":"object eTag",
               "versionId":"object version if bucket is versioning-enabled, otherwise null",
               "sequencer": "a string representation of a hexadecimal value used to determine event sequence, 
                   only used with PUTs and DELETEs"
            }
         },
         "glacierEventData": {
            "restoreEventData": {
               "lifecycleRestorationExpiryTime": "The time, in ISO-8601 format, for example, 1970-01-01T00:00:00.000Z, of Restore Expiry",
               "lifecycleRestoreStorageClass": "Source storage class for restore"
            }
         }
      }
   ]
}
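Comparing rgw_pubsub_s3_record with the S3 standard: apart from the RGW extensions marked in the struct (id and bucket_id), the fields follow the S3 standard. object_sequencer corresponds to the standard sequencer field, a hexadecimal value used to order events for a given key.
The id extension is a globally unique event identifier that can later be used for acking.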
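
The struct keeps its fields flat (bucket_name, object_key, ...) while the S3 message nests them under s3.bucket and s3.object. The following is a minimal sketch of that mapping, not the RGW encoder: MiniRecord and to_nested_json are hypothetical names, only a subset of fields is covered, and values are written without JSON escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical subset of the flat record fields shown above.
struct MiniRecord {
  std::string eventVersion = "2.1";
  std::string eventSource = "aws:s3";
  std::string eventName;
  std::string configurationId;
  std::string bucket_name;
  std::string bucket_arn;
  std::string object_key;
  std::string object_sequencer;
};

// Render the flat fields into the nested layout of the S3 event message.
// Values are written verbatim; a real encoder must escape JSON strings.
std::string to_nested_json(const MiniRecord& r) {
  std::ostringstream os;
  os << "{\"eventVersion\":\"" << r.eventVersion << "\","
     << "\"eventSource\":\"" << r.eventSource << "\","
     << "\"eventName\":\"" << r.eventName << "\","
     << "\"s3\":{"
     << "\"configurationId\":\"" << r.configurationId << "\","
     << "\"bucket\":{\"name\":\"" << r.bucket_name
     << "\",\"arn\":\"" << r.bucket_arn << "\"},"
     << "\"object\":{\"key\":\"" << r.object_key
     << "\",\"sequencer\":\"" << r.object_sequencer << "\"}"
     << "}}";
  return os.str();
}
```

The flat layout keeps the C++ side simple, and the nesting is introduced only at serialization time.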
 
struct rgw_pubsub_event {
  constexpr static const char* const json_type_single = "event";
  constexpr static const char* const json_type_plural = "events";
  std::string id; // event ID
  std::string event_name; // event name
  std::string source; // bucket + object where the event occurred: bucket.name + "/" + key.name
  ceph::real_time timestamp; // time the event occurred
  JSONFormattable info; // essentially struct objstore_event: bucket, key, mtime, attrs, etc.
}


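- Events are stored by writing them to the rados cluster asynchronously through the rgw_cr_ helpers.

  Events are stored in a dedicated bucket owned by a dedicated user. They cannot be accessed directly, only through the provided API.

  The dedicated user (uid) and the name prefix of the bucket that holds the event objects (data_bucket_prefix) are set in the zone tier-config. data_oid_prefix can additionally be used to set a prefix for the stored event objects.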
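
To make the role of the two prefixes concrete, here is a minimal sketch of how such names could be composed. Everything in it is an assumption for illustration: TierConfig, data_bucket_name and event_oid are hypothetical, the default values are illustrative, and the exact naming scheme used by RGW may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of the pubsub zone tier-config; defaults are illustrative.
struct TierConfig {
  std::string uid = "pubsub";
  std::string data_bucket_prefix = "pubsub-";
  std::string data_oid_prefix;
};

// Assumed naming scheme: one data bucket per (owner, topic) pair.
std::string data_bucket_name(const TierConfig& conf,
                             const std::string& owner,
                             const std::string& topic) {
  return conf.data_bucket_prefix + owner + "-" + topic;
}

// Assumed naming scheme: an event object is named by oid prefix + event id.
std::string event_oid(const TierConfig& conf, const std::string& event_id) {
  return conf.data_oid_prefix + event_id;
}
```

With these assumptions, the bucket prefix controls where the events of a topic land, and the oid prefix only affects the object names inside that bucket.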

class PSSubscription {
  class InitCR;
  friend class InitCR;
  friend class RGWPSHandleObjEventCR;

  RGWDataSyncEnv *sync_env;
  PSEnvRef env;
  PSSubConfigRef sub_conf;
  std::shared_ptr<rgw_get_bucket_info_result> get_bucket_info_result;
  RGWBucketInfo *bucket_info{nullptr};
  RGWDataAccessRef data_access;
  RGWDataAccess::BucketRef bucket; // this bucket is the one that stores the event objects

- Events are pushed out through RGWPubSubEndpoint::send_to_completion_async(). Three kinds of endpoints are currently supported:

  • RGWPubSubHTTPEndpoint
  • RGWPubSubAMQPEndpoint
  • RGWPubSubKafkaEndpoint

// endpoint base class; all endpoint types should derive from it
class RGWPubSubEndpoint {
public:
  RGWPubSubEndpoint() = default;
  // endpoint should not be copied
  RGWPubSubEndpoint(const RGWPubSubEndpoint&) = delete;
  const RGWPubSubEndpoint& operator=(const RGWPubSubEndpoint&) = delete;
 
  typedef std::unique_ptr<RGWPubSubEndpoint> Ptr;
 
  // factory method for the actual notification endpoint
  // derived class specific arguments are passed in http args format
  // may throw a configuration_error if creation fails
  static Ptr create(const std::string& endpoint, const std::string& topic, const RGWHTTPArgs& args, CephContext *cct=nullptr);
  
  // this method is used in order to send notification (Ceph specific) and wait for completion
  // in async manner via a coroutine when invoked in the data sync environment
  virtual RGWCoroutine* send_to_completion_async(const rgw_pubsub_event& event, RGWDataSyncEnv* env) = 0;
 
  // this method is used in order to send notification (S3 compliant) and wait for completion
  // in async manner via a coroutine when invoked in the data sync environment
  virtual RGWCoroutine* send_to_completion_async(const rgw_pubsub_s3_record& record, RGWDataSyncEnv* env) = 0;
 
  // this method is used in order to send notification (S3 compliant) and wait for completion
  // in async manner via a coroutine when invoked in the frontend environment
  virtual int send_to_completion_async(CephContext* cct, const rgw_pubsub_s3_record& record, optional_yield y) = 0;
 
  // present as string
  virtual std::string to_str() const { return ""; }
   
  virtual ~RGWPubSubEndpoint() = default;
   
  // exception object for configuration error
  struct configuration_error : public std::logic_error {
    configuration_error(const std::string& what_arg) :
      std::logic_error("pubsub endpoint configuration error: " + what_arg) {}
  };
};
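
The factory pattern behind create() can be sketched without any RGW dependencies. This is a minimal sketch under stated assumptions: Endpoint, HttpEndpoint, AmqpEndpoint, KafkaEndpoint and has_prefix are hypothetical names, and selecting the type purely from the URL scheme is a simplification of whatever the real factory does.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for RGWPubSubEndpoint; all names here are hypothetical.
class Endpoint {
public:
  using Ptr = std::unique_ptr<Endpoint>;

  Endpoint() = default;
  // endpoints are not copyable, as in the class above
  Endpoint(const Endpoint&) = delete;
  Endpoint& operator=(const Endpoint&) = delete;
  virtual ~Endpoint() = default;

  virtual std::string to_str() const = 0;

  struct configuration_error : public std::logic_error {
    explicit configuration_error(const std::string& what_arg)
      : std::logic_error("pubsub endpoint configuration error: " + what_arg) {}
  };

  // Factory: choose the concrete endpoint type from the URL scheme.
  static Ptr create(const std::string& endpoint);
};

class HttpEndpoint : public Endpoint {
  std::string url;
public:
  explicit HttpEndpoint(std::string u) : url(std::move(u)) {}
  std::string to_str() const override { return "HTTP endpoint: " + url; }
};

class AmqpEndpoint : public Endpoint {
  std::string url;
public:
  explicit AmqpEndpoint(std::string u) : url(std::move(u)) {}
  std::string to_str() const override { return "AMQP endpoint: " + url; }
};

class KafkaEndpoint : public Endpoint {
  std::string url;
public:
  explicit KafkaEndpoint(std::string u) : url(std::move(u)) {}
  std::string to_str() const override { return "Kafka endpoint: " + url; }
};

static bool has_prefix(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

Endpoint::Ptr Endpoint::create(const std::string& endpoint) {
  if (has_prefix(endpoint, "http://") || has_prefix(endpoint, "https://")) {
    return Ptr(new HttpEndpoint(endpoint));
  }
  if (has_prefix(endpoint, "amqp://")) {
    return Ptr(new AmqpEndpoint(endpoint));
  }
  if (has_prefix(endpoint, "kafka://")) {
    return Ptr(new KafkaEndpoint(endpoint));
  }
  throw configuration_error("unknown schema in: " + endpoint);
}
```

Callers hold only an Endpoint::Ptr, so adding a new endpoint type touches the factory and nothing else; an unsupported scheme surfaces as a configuration_error that the caller can catch.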

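Notification implementation

The notification discussed here is Bucket Notification. A bucket notification is essentially event pushing: it is compatible with S3 event notifications and does not store events.

test ref: https://github.com/ceph/ceph/pull/28971

The main implementation is rgw::notify::publish() in rgw_notify.cc. This function is called at the end of the execution of every OP that changes an object, for example RGWPutObj: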

void RGWPutObj::execute()
{
  // send request to notification manager
  const auto ret = rgw::notify::publish(s, mtime, etag, rgw::notify::ObjectCreatedPut, store);
  if (ret < 0) {
    ldpp_dout(this, 5) << "WARNING: publishing notification failed, with error: " << ret << dendl;
    // TODO: we should have conf to make send a blocking coroutine and reply with error in case sending failed
    // this should be global conf (probably returning a different handler)
    // so we don't need to read the configured values before we perform it
  }
}

rgw::notify::publish()

int publish(const req_state* s,
        const ceph::real_time& mtime,
        const std::string& etag,
        EventType event_type,
        rgw::sal::RGWRadosStore* store) {
    RGWUserPubSub ps_user(store, s->user->user_id);
    RGWUserPubSub::Bucket ps_bucket(&ps_user, s->bucket);
    rgw_pubsub_bucket_topics bucket_topics;
    auto rc = ps_bucket.get_topics(&bucket_topics);
    ...
    rgw_pubsub_s3_record record;
    populate_record_from_request(s, mtime, etag, event_type, record);
    bool event_handled = false;
    bool event_should_be_handled = false;
    for (const auto& bucket_topic : bucket_topics.topics) {
        const rgw_pubsub_topic_filter& topic_filter = bucket_topic.second;
        const rgw_pubsub_topic& topic_cfg = topic_filter.topic;
        if (!match(topic_filter, s, event_type)) {
            // topic does not apply to req_state
            continue;
        }
        event_should_be_handled = true;
        record.configurationId = topic_filter.s3_id;
        ...
        try {
            // TODO add endpoint LRU cache
            const auto push_endpoint = RGWPubSubEndpoint::create(topic_cfg.dest.push_endpoint,
                    topic_cfg.dest.arn_topic,
                    RGWHTTPArgs(topic_cfg.dest.push_endpoint_args),
                    s->cct);
            const std::string push_endpoint_str = push_endpoint->to_str();
            ldout(s->cct, 20) << "push endpoint created: " << push_endpoint_str << dendl;
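            auto rc = push_endpoint->send_to_completion_async(s->cct, record, s->yield); // send the event notification to the remote end; same as the path triggered by topic subscriptions, three kinds are currently supported: http, amqp, kafka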
            ...
            if (perfcounter) perfcounter->inc(l_rgw_pubsub_push_ok);
            ldout(s->cct, 20) << "successfull push to endpoint " << push_endpoint_str << dendl;
            event_handled = true;
        } catch (const RGWPubSubEndpoint::configuration_error& e) {
            ...
        }
    }
 
    if (event_should_be_handled) {
        // not counting events with no notifications or events that are filtered
        // counting a single event, regardless of the number of notifications it sends
        if (perfcounter) perfcounter->inc(l_rgw_pubsub_event_triggered);
        if (!event_handled) {
            // all notifications for this event failed
            if (perfcounter) perfcounter->inc(l_rgw_pubsub_event_lost);
        }
    }
 
    return 0;
}
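
The bookkeeping at the end of publish() is easy to misread, so here is a minimal sketch of just the counting rules. Counters and account_event are hypothetical names; the input is one boolean per notification that matched the event (true if the push succeeded), and the push_failed counter is an assumption about the part elided with "..." above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical counters mirroring the perf counters used in publish().
struct Counters {
  int event_triggered = 0;
  int event_lost = 0;
  int push_ok = 0;
  int push_failed = 0;
};

// push_results holds one entry per notification that matched the event.
void account_event(const std::vector<bool>& push_results, Counters& c) {
  bool event_handled = false;
  bool event_should_be_handled = false;
  for (bool ok : push_results) {
    event_should_be_handled = true;
    if (ok) {
      ++c.push_ok;
      event_handled = true;
    } else {
      ++c.push_failed;
    }
  }
  if (event_should_be_handled) {
    // one triggered event, regardless of how many notifications it sends
    ++c.event_triggered;
    if (!event_handled) {
      // every notification for this event failed
      ++c.event_lost;
    }
  }
}
```

An event with no matching notification changes no counter, and an event counts as lost only when all of its pushes fail.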

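Note: the code walkthrough above is based on the latest master branch of the Ceph community (as of 2019/11/27).

Miscellaneous

ARN

A topic ARN has to be specified when creating a topic, and when pulling events, if standard S3 event records are returned, the bucket in them is also identified by a bucket ARN. So what is an ARN?

An ARN (Amazon Resource Name) uniquely identifies an AWS resource. An ARN is required whenever a resource has to be specified unambiguously across all of AWS, for example in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.

ARN format

The general ARN formats are shown below; the specific components and values depend on the AWS service. In RGW, ARNs are defined in rgw_arn.h and rgw_arn.cc.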

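arn:partition:service:region:account-id:resource-id
arn:partition:service:region:account-id:resource-type/resource-id
arn:partition:service:region:account-id:resource-type:resource-id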
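- partition

The partition in which the resource resides. For standard AWS regions the partition is aws. For resources in other partitions the partition is aws-partitionname; for example, the partition for resources in the China (Beijing) region is aws-cn.

The RGW implementation supports the following partitions: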

enum  struct  Partition {
   aws, aws_cn, aws_us_gov, wildcard
   // If we wanted our own ARNs for principal type unique to us
   // (maybe to integrate better with Swift) or for anything else we
   // provide that doesn't map onto S3, we could add an 'rgw'
   // partition type.
};

- service

The service namespace that identifies the AWS product (for example, Amazon S3, IAM, or Amazon RDS).

The RGW implementation supports the following services:

enum  struct  Service {
   apigateway, appstream, artifact, autoscaling, aws_portal, acm,
   cloudformation, cloudfront, cloudhsm, cloudsearch, cloudtrail,
   cloudwatch, events, logs, codebuild, codecommit, codedeploy,
   codepipeline, cognito_idp, cognito_identity, cognito_sync,
   config, datapipeline, dms, devicefarm, directconnect,
   ds, dynamodb, ec2, ecr, ecs, ssm, elasticbeanstalk, elasticfilesystem,
   elasticloadbalancing, elasticmapreduce, elastictranscoder, elasticache,
   es, gamelift, glacier, health, iam, importexport, inspector, iot,
   kms, kinesisanalytics, firehose, kinesis, lambda, lightsail,
   machinelearning, aws_marketplace, aws_marketplace_management,
   mobileanalytics, mobilehub, opsworks, opsworks_cm, polly,
   redshift, rds, route53, route53domains, sts, servicecatalog,
   ses, sns, sqs, s3, swf, sdb, states, storagegateway, support,
   trustedadvisor, waf, workmail, workspaces, wildcard
};
- region

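The region in which the resource resides. Some resources' ARNs do not require a region, so this component may be omitted.

- account-id

The ID of the AWS account that owns the resource, without hyphens, for example 123456789012. Some resources' ARNs do not require an account ID, so this component may be omitted.

- resource-type or resource-id

The content of this part of the ARN varies by service. A resource identifier can be the name or ID of the resource (for example, user/Bob or instance/i-1234567890abcdef0) or a resource path. For example, some resource identifiers include a parent resource (sub-resource-type/parent-resource/sub-resource) or a qualifier such as a version (resource-type:resource-name:qualifier).

The current rgw implementation requires:

Valid resource formats (only resource part):
 * 'resource'
 * 'resourcetype/resource'
 * 'resourcetype/resource/qualifier'
 * 'resourcetype/resource:qualifier'
 * 'resourcetype:resource'
 * 'resourcetype:resource:qualifier'

Note: wildcards are not allowed in 'resourceType'. For example, the following is invalid:
   arn:aws:iam::123456789012:u*

Below are ARNs for an S3 bucket; the second one includes the path /Development/: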

arn:aws:s3:::my_corporate_bucket/*
arn:aws:s3:::my_corporate_bucket/Development/*
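
Splitting an ARN into the components described above can be sketched in a few lines. This is a minimal sketch and not the parser in rgw_arn.cc: Arn and parse_arn are hypothetical names, and no validation of partition, service, or wildcards is attempted. The one detail worth showing is that only the first five colons are separators, because the resource part may itself contain ':' or '/'.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical container for the ARN components described above.
struct Arn {
  std::string partition;
  std::string service;
  std::string region;
  std::string account;
  std::string resource;  // everything after the fifth ':'
};

// Returns true and fills 'out' if 's' has the shape arn:p:s:r:a:resource.
bool parse_arn(const std::string& s, Arn& out) {
  std::array<std::string, 6> parts;
  std::size_t start = 0;
  // Split on the first five ':' only; the resource may contain ':' itself.
  for (int i = 0; i < 5; ++i) {
    const std::size_t pos = s.find(':', start);
    if (pos == std::string::npos) {
      return false;
    }
    parts[i] = s.substr(start, pos - start);
    start = pos + 1;
  }
  parts[5] = s.substr(start);
  if (parts[0] != "arn" || parts[5].empty()) {
    return false;
  }
  out = Arn{parts[1], parts[2], parts[3], parts[4], parts[5]};
  return true;
}
```

For the bucket ARNs above, region and account come back as empty strings, which matches the note that those components may be omitted.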

Resource ARNs

ref: https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazons3.html#amazons3-resources-for-iam-policies

Q&A

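1. Notification has two kinds of API. How do they differ, and why are there two?

A: Notification has two kinds of API: S3-compatible and non-S3-compatible. The S3-compatible API follows the AWS S3 Bucket Notification standard and treats the notification as a property of the bucket; see s3/bucketops/#create-notification. The non-S3-compatible API, e.g. PUT /notifications/bucket/ , associates the notification with a bucket, but the notification is not a subresource of the bucket.

    In the pub-sub module, the non-compatible notification API follows the same style as the topic and subscription APIs, all of which are exposed directly as root resources; that is why the non-compatible notification API exists. The S3-compatible API is mainly used for Ceph's implementation of AWS Bucket Notification (Bucket Notification), so it naturally has to conform to the S3 API standard.

2. Notification deletion logic

A: 1) Deleting a notification directly also deletes its subscription. 2) Deleting a bucket deletes its notifications, but not the subscriptions that correspond to those notifications (they have to be deleted explicitly).

3. How do Subscription and Notification differ, and how are they related?

A: Both subscription-to-topic and notification-to-topic are many-to-one relationships: one topic can have multiple subscriptions, and one topic can also have multiple notifications.

    1) Creating a notification also creates a subscription with the same name as the notification id, and that subscription can be accessed through the subscription APIs. 2) The deletion logic differs: a subscription is deleted by simply calling its delete API, while notification deletion is described in the previous question.

4. Event storage

A: The pub-sub module stores events as objects in a dedicated bucket of a dedicated user (configurable in the zone tier-config). These event objects cannot be accessed directly and have to be accessed through the Pub-Sub REST API. For details, see the event storage part of "3. Event handling under topics->subs" in the publish/subscribe event handling section.

5. Synchronization between the master zone and the pub-sub zone

A: Synchronization between zones takes place once the data sync thread reaches the RGWDataSyncCR coroutine, which processes the sync of each bucket shard. Synchronization is further divided into full sync and incremental sync.

6. Current CLI problems of the pubsub module

A: The pubsub module CLI currently has no command documentation and has a bug: pulling the events list triggers a core dump.

The problem found so far lies in how the sub is obtained when sub pull lists events: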

 if (opt_cmd == OPT_PUBSUB_SUB_PULL) {
    ...
    //auto sub = ups.get_sub(sub_name); 
    auto sub = ups.get_sub_with_events(sub_name);
    ret = sub->list_events(marker, max_entries);
    if (ret < 0) {
      cerr << "ERROR: could not list events: " << cpp_strerror(-ret) << std::endl;
      return -ret;
    }
    encode_json("result", *sub, formatter);
    formatter->flush(cout);
 }
The same change is needed where the subscription is created:
 if (opt_cmd == OPT_PUBSUB_SUB_PULL) {
    ...
    //auto sub = ups.get_sub(sub_name);
    auto sub = ups.get_sub_with_events(sub_name);
    ret = sub->subscribe(topic_name, dest_config);
    ...
 }

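Originally get_sub() was called, which returns the base-class sub, so the subsequent list_events() call also went to the base-class function and triggered the core dump.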
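
The failure mode can be reproduced with a small class hierarchy. This is a minimal sketch, not the radosgw-admin code: Sub, SubWithEvents and the two free functions are hypothetical, and the base implementation throws where, by the description above, the real one ends in a core dump.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical base class: it has no event storage to list from.
class Sub {
public:
  virtual ~Sub() = default;
  virtual int list_events(const std::string& /*marker*/, int /*max_events*/) {
    // Stand-in for the fatal path; this sketch throws instead of aborting.
    throw std::logic_error("list_events() called on base Sub");
  }
};

// Hypothetical derived class that actually knows how to list events.
class SubWithEvents : public Sub {
public:
  int list_events(const std::string& /*marker*/, int /*max_events*/) override {
    return 0;  // pretend the listing succeeded
  }
};

// Returns the base object: calling list_events() on it hits the fatal path.
std::shared_ptr<Sub> get_sub() {
  return std::make_shared<Sub>();
}

// Returns the derived object behind a base pointer: list_events() works.
std::shared_ptr<Sub> get_sub_with_events() {
  return std::make_shared<SubWithEvents>();
}
```

Both functions return the same pointer type, so the mistake is invisible at the call site; only the dynamic type of the object decides which list_events() runs.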

Reference

  • https://github.com/ceph/ceph/pull/23298

  • https://docs.ceph.com/docs/master/radosgw/pubsub-module/

  • https://github.com/ceph/ceph/pull/27091

  • https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazons3.html#amazons3-resources-for-iam-policies

 
