Background on HTTP frontends
-civetweb frontend
--one thread per connection, so many connections require many threads
---QoS or priority queuing would block frontend threads: a queued request holds its thread while it waits
-beast frontend
--boost::beast for HTTP parsing
--boost::asio for async networking/io
--async for accepting connections and reading headers
---good model for QoS: requests can be queued without blocking threads
--synchronous call to process_request()
---still one thread per in-flight request, so we still need lots of threads
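Why async header reading helps QoS, as a standard-library-only sketch: once a request's headers are read, a scheduler can hold it as data (a continuation) in a priority queue instead of parking a thread on it. RequestQueue, PendingRequest and the priority policy are hypothetical illustrations, not beast or radosgw code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical: a request whose headers have already been read asynchronously.
struct PendingRequest {
  int priority;                  // higher value is served first (assumed policy)
  unsigned long seq;             // arrival order, for FIFO among equal priorities
  std::function<void()> resume;  // continuation that processes the request
};

// Comparator for a max-heap: returns true when 'a' should be served after 'b'.
struct ByPriority {
  bool operator()(const PendingRequest& a, const PendingRequest& b) const {
    if (a.priority != b.priority) return a.priority < b.priority;
    return a.seq > b.seq;  // earlier arrival wins ties
  }
};

// Queued requests are plain data: no thread is blocked while they wait.
class RequestQueue {
 public:
  void push(int priority, std::function<void()> resume) {
    q_.push(PendingRequest{priority, next_seq_++, std::move(resume)});
  }
  // Runs up to 'budget' queued requests on the calling thread; returns how many ran.
  std::size_t dispatch(std::size_t budget) {
    std::size_t n = 0;
    while (n < budget && !q_.empty()) {
      PendingRequest req = q_.top();  // copy, since top() returns a const reference
      q_.pop();
      req.resume();
      ++n;
    }
    return n;
  }
  std::size_t size() const { return q_.size(); }

 private:
  std::priority_queue<PendingRequest, std::vector<PendingRequest>, ByPriority> q_;
  unsigned long next_seq_ = 0;
};
```

A real frontend would drive dispatch from its io_service handlers rather than from an explicit dispatch() call; the point is only that waiting costs a queue entry, not a thread.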
goal: scale the number of concurrent requests independently of the number of threads
-why boost::asio
--doesn't impose a threading model: the io_service object is a reactor, and run() can be called from any thread
--mature library, and the basis for the networking library in the C++ Networking TS (std::experimental::net) [1]
--the Extensible Asynchronous Model [2] lets the same asynchronous operation be used with several completion mechanisms (callbacks, futures, coroutines)
--boost::asio::spawn() stackful coroutines: "enables programs to implement asynchronous logic in a synchronous manner" [3]
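The "call run() from any thread" property can be modeled with a toy handler queue. This is a standard-library sketch, not Boost.Asio's implementation: ToyReactor, post() and run() are illustrative names, and unlike io_service::run(), this run() returns as soon as the queue is empty because it does not track outstanding asynchronous operations.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Toy stand-in for an io_service-style handler queue (not Boost's implementation).
class ToyReactor {
 public:
  // Queue a handler; any thread may post.
  void post(std::function<void()> handler) {
    std::lock_guard<std::mutex> lock(mutex_);
    handlers_.push_back(std::move(handler));
  }

  // Runs queued handlers on the calling thread until the queue is empty and
  // returns how many it ran. Several threads may call run() concurrently to
  // share the work; the reactor itself imposes no threading model.
  std::size_t run() {
    std::size_t count = 0;
    for (;;) {
      std::function<void()> h;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (handlers_.empty()) return count;
        h = std::move(handlers_.front());
        handlers_.pop_front();
      }
      h();  // invoked outside the lock, so handlers can post more work
      ++count;
    }
  }

 private:
  std::mutex mutex_;
  std::deque<std::function<void()>> handlers_;
};
```

Because handlers are just queued work, the application alone decides how many threads call run(): one for a single-threaded frontend, or a small pool.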
proposed librados interfaces for asio [4]
--header-only wrapper over the librados C++ API
--conforms to the Extensible Asynchronous Model, so it supports the same primitives (see the unit tests for examples)
--deeper Objecter integration is a work in progress by Adam Emerson [5]
--gives radosgw a unified interface for async operations over HTTP and RADOS
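The Extensible Asynchronous Model's central idea is that the caller's completion token decides how one async operation delivers its result: a callback is invoked, while a future-style token changes the initiating function's return type. A minimal standard-library sketch, assuming a hypothetical async_read_object() and use_future tag (these are not the librados or Boost.Asio APIs), which completes immediately instead of from a reactor:

```cpp
#include <cassert>
#include <future>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tag type, modeled on the idea behind boost::asio::use_future.
struct use_future_t {};
inline constexpr use_future_t use_future{};

// Hypothetical async operation that completes with (error code, data).
// A real implementation would complete later; this sketch completes at once.
template <typename CompletionToken>
auto async_read_object(const std::string& oid, CompletionToken&& token) {
  int ec = oid.empty() ? -2 : 0;  // pretend error (-ENOENT on Linux) for an empty name
  std::string data = (ec == 0) ? "data:" + oid : std::string();
  if constexpr (std::is_same_v<std::decay_t<CompletionToken>, use_future_t>) {
    // Future flavor: the token changes the return type to std::future.
    (void)token;
    std::promise<std::pair<int, std::string>> p;
    p.set_value(std::make_pair(ec, std::move(data)));
    return p.get_future();
  } else {
    // Callback flavor: the handler is invoked with the result; returns void.
    std::forward<CompletionToken>(token)(ec, std::move(data));
  }
}
```

A coroutine token such as a yield_context fits the same pattern: the initiating function suspends the caller and returns the result directly, which is what makes spawn()-based code read synchronously.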
async process_request() [6]
-add an optional yield_context* argument to process_request()
-beast frontend passes one, civetweb passes nullptr
-librados calls use the new async interface when given a yield_context
-requires passing the yield_context* through every function in between
--but we can stash it in req_state to make it available to all ops
-getting started with the easy stuff
--rgw_get_system_obj()
--reading user objects for authentication
--reading bucket/bucket instance objects (common to most s3/swift ops)
-this process leaves a lot of gaps. For example, rgw_get_system_obj() is reached from many call paths that have no access to a yield_context
--(either they run outside process_request(), or they just aren't hooked up yet)
--passing 'nullptr' at those call sites makes the yield_context argument indistinguishable from its 4 other arguments that default to nullptr!
--that makes it impossible to reason about which call paths could run asynchronously
-measurable progress towards full asynchrony
--new vocabulary type 'optional_yield_context', with 'null_yield' as its empty value
--null_yield designates a call site that is definitely synchronous
--makes it easy to audit the code and find the pieces that still need conversion
-fighting regressions once we're close
--have librados calls log warnings when called synchronously from a beast frontend thread (using a thread_local flag)
--scan those logs in teuthology runs to flag failures
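The steps in this section can be pulled together into one standard-library sketch. yield_context is an empty stand-in for boost::asio::yield_context, and req_state, rados_operate(), read_bucket_info(), is_asio_thread and sync_warnings are hypothetical simplifications, not radosgw's actual definitions. It shows the yield stashed in req_state, null_yield as the only way to spell "synchronous" (a bare nullptr no longer converts), and a thread_local flag that counts and logs blocking calls made from a frontend thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <type_traits>

// Stand-in for boost::asio::yield_context; only its presence matters here.
struct yield_context {};

// Sketch of the proposed vocabulary type: a real yield_context converts
// implicitly, but the empty value must be written as null_yield.
class optional_yield_context {
 public:
  struct empty_t {};
  optional_yield_context(yield_context& y) : y_(&y) {}                 // async
  constexpr explicit optional_yield_context(empty_t) : y_(nullptr) {}  // sync
  explicit operator bool() const { return y_ != nullptr; }
  yield_context& get() const { return *y_; }
 private:
  yield_context* y_;
};
inline constexpr optional_yield_context null_yield{optional_yield_context::empty_t{}};

// A bare nullptr can no longer be mistaken for the yield argument.
static_assert(!std::is_convertible_v<std::nullptr_t, optional_yield_context>);

// Hypothetical flag the beast frontend would set on its own threads.
thread_local bool is_asio_thread = false;
std::atomic<int> sync_warnings{0};

// Hypothetical subset of req_state: the frontend's yield is stashed here once.
struct req_state {
  std::string bucket;
  optional_yield_context yield = null_yield;
};

// Hypothetical wrapper in the spirit of rgw_rados_operate(): async path when
// given a yield, otherwise a blocking call that warns on frontend threads.
std::string rados_operate(const std::string& oid, optional_yield_context y) {
  if (y) {
    return "async:" + oid;  // would suspend the coroutine on y.get()
  }
  if (is_asio_thread) {
    ++sync_warnings;
    std::clog << "WARNING: synchronous librados call on asio thread: " << oid << '\n';
  }
  return "sync:" + oid;     // would block the calling thread
}

// Ops reach the yield through req_state instead of an extra parameter.
std::string read_bucket_info(req_state* s) {
  return rados_operate("bucket." + s->bucket, s->yield);
}

// Sketch of process_request(): beast passes its yield, civetweb passes null_yield.
std::string process_request(req_state* s, optional_yield_context y) {
  s->yield = y;
  return read_bucket_info(s);
}
```

Searching for null_yield then lists every known-synchronous call site, and the warning count is the kind of signal a teuthology log scan could flag.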
and then?
-vastly reduce the number of frontend threads for beast
-consolidate other background threads
remaining work:
- RGWGetObj waits on AioCompletions; use the AioThrottle from PutObj instead
- replace librados IoCtx::operate() calls with rgw_rados_operate() and optional_yield_context
- thread optional_yield_context all the way from beast frontend to rgw_rados_operate() calls
- some cls client calls use IoCtx::operate() directly
- block_while_resharding() sleeps on a condition variable
- no async interface for pool object listings with IoCtx::nobjects_begin()
- libcurl HTTP requests for auth (Keystone and OPA)
[1] "C++ Technical Specification - Extensions for Networking"
http://cplusplus.github.io/networking-ts/draft.pdf
[2] "Library Foundations for Asynchronous Operations"
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3896.pdf
[3] "Reference: boost::asio::spawn"
http://www.boost.org/doc/libs/1_65_1/doc/html/boost_asio/reference/spawn.html
[4] "librados: add async interfaces for use with boost::asio"
https://github.com/ceph/ceph/pull/19054
[5] "osdc/Objecter: Boost.Asio (I object!)"
https://github.com/ceph/ceph/pull/16715
[6] "wip-rgw-async-process-171120" (work-in-progress branch)
https://github.com/cbodley/ceph/commits/wip-rgw-async-process-171120