taskflow is an oslo library that gives OpenStack projects (and other Python projects) a highly available, easy-to-understand, declarative way to execute jobs, tasks and flows, making task execution easy, consistent and reliable. This article examines taskflow's implementation in detail, along with how to use it.
Among the oslo projects, taskflow has one of the more complex implementations, and understanding it first requires familiarity with its core concepts. This article therefore begins by summarizing the basic concepts commonly used in taskflow, which mainly include the following:
As noted in the introduction to taskflow's basic concepts, the Flow class comes in three types; this section describes oslo's three custom Flow types in detail. They all live in the taskflow.patterns package, which contains three modules: graph_flow, linear_flow and unordered_flow, each defining its own Flow class. The three types are introduced in turn below:
from taskflow import flow
from taskflow.types import graph as gr


class Flow(flow.Flow):

    # Sentinel used to denote that no last item has been assigned yet.
    _no_last_item = object()

    def __init__(self, name, retry=None):
        super(Flow, self).__init__(name, retry)
        self._graph = gr.OrderedDiGraph(name=name)
        self._last_item = self._no_last_item

    def add(self, *items):
        """Adds a given task/tasks/flow/flows to this flow."""
        for item in items:
            if not self._graph.has_node(item):
                self._graph.add_node(item)
                if self._last_item is not self._no_last_item:
                    self._graph.add_edge(self._last_item, item,
                                         attr_dict={flow.LINK_INVARIANT: True})
                self._last_item = item
        return self

    def __len__(self):
        return len(self._graph)

    def __iter__(self):
        for item in self._graph.nodes_iter():
            yield item

    @property
    def requires(self):
        requires = set()
        prior_provides = set()
        if self._retry is not None:
            requires.update(self._retry.requires)
            prior_provides.update(self._retry.provides)
        for item in self:
            requires.update(item.requires - prior_provides)
            prior_provides.update(item.provides)
        return frozenset(requires)

    def iter_nodes(self):
        for (n, n_data) in self._graph.nodes_iter(data=True):
            yield (n, n_data)

    def iter_links(self):
        for (u, v, e_data) in self._graph.edges_iter(data=True):
            yield (u, v, e_data)
The code above is the concrete implementation of linear_flow. On initialization, an OrderedDiGraph object is assigned to the _graph attribute. When tasks/flows are added via add(*items), the method iterates over the given items; each item not already present in the graph is added as a node and linked from the previously added item by an invariant edge, so insertion order fixes execution order. The iter_nodes() and iter_links() methods simply iterate over all nodes or all edges of the graph. Also, because linear_flow executes and reverts in insertion order, computing the requires and provides properties needs only a single ordered pass over the items rather than a graph traversal; a graph_flow, by contrast, must take the requires/provides relationships between nodes into account when traversing the graph.
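The insertion-order logic, and the way requires subtracts values that earlier items already provide, can be illustrated with a small self-contained sketch. The Item and ToyLinearFlow classes below are hypothetical stand-ins for illustration only, not taskflow classes:

```python
class Item:
    """Hypothetical stand-in for a taskflow Atom (task/flow)."""
    def __init__(self, name, requires=(), provides=()):
        self.name = name
        self.requires = frozenset(requires)
        self.provides = frozenset(provides)


class ToyLinearFlow:
    """Mimics linear_flow.Flow: items run in insertion order."""
    def __init__(self, name):
        self.name = name
        self._items = []  # insertion order == execution order

    def add(self, *items):
        for item in items:
            if item not in self._items:  # like _graph.has_node()
                self._items.append(item)
        return self

    @property
    def requires(self):
        # A value is an external requirement only if no earlier item
        # in the flow already provides it.
        requires = set()
        prior_provides = set()
        for item in self._items:
            requires.update(item.requires - prior_provides)
            prior_provides.update(item.provides)
        return frozenset(requires)


toy = ToyLinearFlow("demo")
toy.add(Item("extract", requires={"raw"}, provides={"spec"}),
        Item("schedule", requires={"spec", "ctx"}))
# "spec" is provided by the first item, so only "raw" and "ctx"
# remain as external requirements of the whole flow.
print(sorted(toy.requires))  # → ['ctx', 'raw']
```

This is also why the flow as a whole only asks its caller for "raw" and "ctx": internal hand-offs between items are netted out by the ordered pass.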
As introduced in section 1.1, if an error occurs while taskflow is executing, a Retry object can be used to retry. Retry is an abstract class inheriting from Atom, so Retry subclasses can override the execute() and revert() methods. In addition, Retry defines an on_failure(history, *args, **kwargs) method, which is invoked when a Task/Flow fails during execution or reversion; it receives the information about previous failures as a History object (if that failure history is unavailable or was not saved, the History object passed in is empty). History is a helper class that simplifies interaction with the retry history; it has two important attributes: _failure, the failure that occurred, and _contents, the retry contents associated with those failures. When a taskflow execution fails, on_failure() consults the History object passed in to decide on a retry strategy.
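To make this interplay concrete, here is a small illustrative sketch in plain Python. ToyHistory and TimesRetry are hypothetical names, not the real taskflow classes, and the string return values merely stand in for the values of taskflow's Decision enum:

```python
class ToyHistory:
    """Hypothetical stand-in for taskflow's History helper."""
    def __init__(self):
        self._failure = None   # the most recent failure
        self._contents = []    # (failure, associated retry data) pairs

    def record(self, failure, contents=None):
        self._failure = failure
        self._contents.append((failure, contents))

    def __len__(self):
        return len(self._contents)


class TimesRetry:
    """Retry-like policy: retry up to `attempts` times, then revert."""
    def __init__(self, attempts=3):
        self._attempts = attempts

    def on_failure(self, history, *args, **kwargs):
        # Consult the failures recorded in the history to pick a
        # strategy (the real taskflow returns a retry.Decision value).
        if len(history) <= self._attempts:
            return "RETRY"
        return "REVERT"


history = ToyHistory()
retry = TimesRetry(attempts=2)
decisions = []
for boom in (RuntimeError("a"), RuntimeError("b"), RuntimeError("c")):
    history.record(boom)
    decisions.append(retry.on_failure(history))
print(decisions)  # → ['RETRY', 'RETRY', 'REVERT']
```

The engine would act on each returned decision: run the failed work again, or start reverting what has executed so far.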
As for retry strategies, taskflow defines three of them through an enum type called Decision:
As mentioned above, to implement Task/Flow management, taskflow first defines an Engine abstract class that every engine implementation must inherit from. This abstract class defines the following important attributes and methods:
from oslo_log import log as logging
from oslo_utils import excutils
import taskflow.engines
from taskflow.patterns import linear_flow
from cinder import exception
from cinder import flow_utils
from cinder.message import api as message_api
from cinder.message import message_field
from cinder import rpc
from cinder import utils
from cinder.volume.flows import common
LOG = logging.getLogger(__name__)
ACTION = 'volume:create'
class ExtractSchedulerSpecTask(flow_utils.CinderTask):
    """Extracts a spec object from a partial and/or incomplete request spec.

    Reversion strategy: N/A
    """

    default_provides = set(['request_spec'])

    def __init__(self, **kwargs):
        super(ExtractSchedulerSpecTask, self).__init__(addons=[ACTION],
                                                       **kwargs)

    def _populate_request_spec(self, volume, snapshot_id, image_id, backup_id):
        # Create the full request spec using the volume object.
        #
        # NOTE(dulek): At this point, a volume can be deleted before it gets
        # scheduled. If a delete API call is made, the volume gets instantly
        # deleted and scheduling will fail when it tries to update the DB
        # entry (with the host) in ScheduleCreateVolumeTask below.
        volume_type_id = volume.volume_type_id
        vol_type = volume.volume_type
        return {
            'volume_id': volume.id,
            'snapshot_id': snapshot_id,
            'image_id': image_id,
            'backup_id': backup_id,
            'volume_properties': {
                'size': utils.as_int(volume.size, quiet=False),
                'availability_zone': volume.availability_zone,
                'volume_type_id': volume_type_id,
            },
            'volume_type': list(dict(vol_type).items()),
        }

    def execute(self, context, request_spec, volume, snapshot_id,
                image_id, backup_id):
        # For RPC version < 1.2 backward compatibility
        if request_spec is None:
            request_spec = self._populate_request_spec(volume,
                                                       snapshot_id, image_id,
                                                       backup_id)
        return {
            'request_spec': request_spec,
        }
class ScheduleCreateVolumeTask(flow_utils.CinderTask):
    """Activates a scheduler driver and handles any subsequent failures.

    Notification strategy: on failure the scheduler rpc notifier will be
    activated and a notification will be emitted indicating what errored,
    the reason, and the request (and misc. other data) that caused the error
    to be triggered.

    Reversion strategy: N/A
    """

    FAILURE_TOPIC = "scheduler.create_volume"

    def __init__(self, driver_api, **kwargs):
        super(ScheduleCreateVolumeTask, self).__init__(addons=[ACTION],
                                                       **kwargs)
        self.driver_api = driver_api
        self.message_api = message_api.API()

    def _handle_failure(self, context, request_spec, cause):
        try:
            self._notify_failure(context, request_spec, cause)
        finally:
            LOG.error("Failed to run task %(name)s: %(cause)s",
                      {'cause': cause, 'name': self.name})

    @utils.if_notifications_enabled
    def _notify_failure(self, context, request_spec, cause):
        """When scheduling fails send out an event that it failed."""
        payload = {
            'request_spec': request_spec,
            'volume_properties': request_spec.get('volume_properties', {}),
            'volume_id': request_spec['volume_id'],
            'state': 'error',
            'method': 'create_volume',
            'reason': cause,
        }
        try:
            rpc.get_notifier('scheduler').error(context, self.FAILURE_TOPIC,
                                                payload)
        except exception.CinderException:
            LOG.exception("Failed notifying on %(topic)s "
                          "payload %(payload)s",
                          {'topic': self.FAILURE_TOPIC, 'payload': payload})

    def execute(self, context, request_spec, filter_properties, volume):
        try:
            self.driver_api.schedule_create_volume(context, request_spec,
                                                   filter_properties)
        except Exception as e:
            self.message_api.create(
                context,
                message_field.Action.SCHEDULE_ALLOCATE_VOLUME,
                resource_uuid=request_spec['volume_id'],
                exception=e)
            # An error happened, notify on the scheduler queue and log that
            # this happened and set the volume to errored out and reraise the
            # error *if* exception caught isn't NoValidBackend. Otherwise *do
            # not* reraise (since what's the point?)
            with excutils.save_and_reraise_exception(
                    reraise=not isinstance(e, exception.NoValidBackend)):
                try:
                    self._handle_failure(context, request_spec, e)
                finally:
                    common.error_out(volume, reason=e)
def get_flow(context, driver_api, request_spec=None,
             filter_properties=None,
             volume=None, snapshot_id=None, image_id=None, backup_id=None):
    """Constructs and returns the scheduler entrypoint flow.

    This flow will do the following:

    1. Inject keys & values for dependent tasks.
    2. Extract a scheduler specification from the provided inputs.
    3. Use provided scheduler driver to select host and pass volume creation
       request further.
    """
    create_what = {
        'context': context,
        'raw_request_spec': request_spec,
        'filter_properties': filter_properties,
        'volume': volume,
        'snapshot_id': snapshot_id,
        'image_id': image_id,
        'backup_id': backup_id,
    }

    flow_name = ACTION.replace(":", "_") + "_scheduler"
    scheduler_flow = linear_flow.Flow(flow_name)

    # This will extract and clean the spec from the starting values.
    scheduler_flow.add(ExtractSchedulerSpecTask(
        rebind={'request_spec': 'raw_request_spec'}))

    # This will activate the desired scheduler driver (and handle any
    # driver related failures appropriately).
    scheduler_flow.add(ScheduleCreateVolumeTask(driver_api))

    # Now load (but do not run) the flow using the provided initial data.
    return taskflow.engines.load(scheduler_flow, store=create_what)
Because creating a volume in cinder involves many steps, is complex, and is error-prone, both the api and scheduler services use taskflow to manage the tasks involved in volume creation. The code above shows the two Task classes defined for volume creation in the scheduler service: ExtractSchedulerSpecTask and ScheduleCreateVolumeTask. The cinder-scheduler service then defines a get_flow() method that returns an Engine object: it first creates a Flow of the linear_flow type, adds the two tasks to it with the Flow's add() method, and finally loads an Engine object from the Flow to drive the actual workflow.
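The mechanics of the store injection and the rebind mapping used above can be mimicked with a tiny stand-alone runner. This is an illustrative sketch only; run_linear, the toy tasks and the "host1@backend" value are hypothetical and do not reflect the real taskflow engine API:

```python
import inspect


def run_linear(tasks, store):
    """Toy engine: run callables in order, feeding each the values its
    signature names and merging returned dicts back into the storage
    (roughly what engines.load(flow, store=...).run() arranges)."""
    storage = dict(store)
    for task, rebind in tasks:
        params = inspect.signature(task).parameters
        # rebind maps a parameter name to a differently named store key.
        kwargs = {name: storage[rebind.get(name, name)] for name in params}
        result = task(**kwargs) or {}
        storage.update(result)  # what the task "provides" flows back in
    return storage


def extract_spec(request_spec, volume):
    # Build a spec when the caller did not pass one
    # (cf. ExtractSchedulerSpecTask.execute()).
    if request_spec is None:
        request_spec = {"volume_id": volume["id"]}
    return {"request_spec": request_spec}


def schedule(request_spec):
    return {"scheduled_to": "host1@backend"}  # hypothetical placement


store = {"raw_request_spec": None, "volume": {"id": "vol-1"}}
result = run_linear(
    [(extract_spec, {"request_spec": "raw_request_spec"}),  # like rebind=
     (schedule, {})],
    store)
print(result["request_spec"])  # → {'volume_id': 'vol-1'}
```

The rebind mapping is why get_flow() stores the caller's spec under 'raw_request_spec': the first task reads the raw value, while its cleaned 'request_spec' output becomes the value later tasks consume.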
@objects.Volume.set_workers
def create_volume(self, context, volume, snapshot_id=None, image_id=None,
                  request_spec=None, filter_properties=None,
                  backup_id=None):
    self._wait_for_scheduler()

    try:
        flow_engine = create_volume.get_flow(context,
                                             self.driver,
                                             request_spec,
                                             filter_properties,
                                             volume,
                                             snapshot_id,
                                             image_id,
                                             backup_id)
    except Exception:
        msg = _("Failed to create scheduler manager volume flow")
        LOG.exception(msg)
        raise exception.CinderException(msg)

    with flow_utils.DynamicLogListener(flow_engine, logger=LOG):
        flow_engine.run()
As this example shows, the cinder-scheduler service calls its get_flow() method to obtain an Engine object, flow_engine, and a single call to flow_engine.run() then executes the entire defined flow.