Concepts
- Workflow: A workflow is a set of activities that carry out some objective, together with logic that coordinates the activities.
- Each workflow runs in an AWS resource called a domain, which controls the workflow's scope.
- Workflow history: The progress of every workflow execution is recorded in its workflow history, which Amazon SWF maintains. The workflow history contains every event that causes the execution state of the workflow execution to change, such as scheduled and completed activities, task timeouts, and signals.
- Each event has the following:
- A type, such as WorkflowExecutionStarted or ActivityTaskCompleted
- A timestamp in Unix time format
- An ID that uniquely identifies the event
- When designing an Amazon SWF workflow, you precisely define each of the required activities. You then register each activity with Amazon SWF as an activity type. When you register the activity, you provide information such as a name and version, and some timeout values based on how long you expect the activity to take.
- An activity worker is a program that receives activity tasks, performs them, and provides results back.
- Activity task: represents one invocation of an activity.
- The coordination logic in a workflow is contained in a software program called a decider. The decider schedules activity tasks, provides input data to the activity workers, processes events that arrive while the workflow is in progress, and ultimately ends (or closes) the workflow when the objective has been completed.
- The role of the Amazon SWF service is to function as a reliable central hub through which data is exchanged between the decider, the activity workers, and other relevant entities such as the person administering the workflow. Amazon SWF also maintains the state of each workflow execution, which saves your application from having to store the state in a durable way.
Key concepts list
Actor
- activity worker <-> perform task
- decider <-> make decision
- workflow starter <-> start workflow
- AWS SWF service <-> central hub, maintains workflow history
What we need to implement:
- decider <- contains workflow logic
- activity worker <- executes tasks
- workflow starter
Event
Everything that changes the state of a workflow execution is called an event. The workflow history contains every event that has happened. Each event has an event type, which is predefined by SWF.
Each type of event has a distinct set of descriptive attributes that are appropriate to that type. For example, the ActivityTaskCompleted event has attributes that contain the IDs for the events that correspond to the time that the activity task was scheduled and when it was started, as well as an attribute that holds result data.
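To make the attribute naming concrete, here is a minimal sketch that follows an ActivityTaskCompleted event back to the ActivityTaskScheduled event it references. It assumes events are plain dicts shaped like the boto3 `poll_for_decision_task` response shown later in these notes, and that the attributes key is the event type with a lowercase first letter plus `EventAttributes` (the pattern followed by the event types used here). The helper names and the sample history are hypothetical.

```python
def attributes_key(event_type):
    """'ActivityTaskCompleted' -> 'activityTaskCompletedEventAttributes'."""
    return event_type[0].lower() + event_type[1:] + "EventAttributes"


def event_attributes(event):
    """Return the type-specific attribute dict of one history event."""
    return event.get(attributes_key(event["eventType"]), {})


def scheduled_activity_name(history, completed_event):
    """Follow scheduledEventId from a completion back to its scheduling event."""
    by_id = {evt["eventId"]: evt for evt in history}
    scheduled = by_id[event_attributes(completed_event)["scheduledEventId"]]
    return event_attributes(scheduled)["activityType"]["name"]


# Hypothetical, trimmed-down history (real events also carry eventTimestamp)
history = [
    {"eventId": 1, "eventType": "WorkflowExecutionStarted",
     "workflowExecutionStartedEventAttributes": {}},
    {"eventId": 5, "eventType": "ActivityTaskScheduled",
     "activityTaskScheduledEventAttributes": {
         "activityType": {"name": "resize-image", "version": "1.0"}}},
    {"eventId": 6, "eventType": "ActivityTaskStarted",
     "activityTaskStartedEventAttributes": {"scheduledEventId": 5}},
    {"eventId": 7, "eventType": "ActivityTaskCompleted",
     "activityTaskCompletedEventAttributes": {
         "scheduledEventId": 5, "startedEventId": 6, "result": "ok"}},
]

completed = history[-1]
print(scheduled_activity_name(history, completed))  # resize-image
print(event_attributes(completed)["result"])  # ok
```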
List of event types:
http://docs.aws.amazon.com/amazonswf/latest/apireference/API_HistoryEvent.html
Note: event types are defined by SWF, whereas workflow types and activity types are defined (registered) by us.
task
Amazon SWF interacts with activity workers and deciders by providing them with work assignments known as tasks.
task list
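You can think of task lists as similar to dynamic queues. When a task is scheduled in Amazon SWF, you can specify a queue (task list) to put it in. Similarly, when you poll Amazon SWF for a task, you say which queue (task list) to get the task from.
- decision task list
- activity task list
** Initially, the workflow starter names the decision task list when it starts an execution (otherwise the workflow type's default task list is used). From then on, Amazon SWF places a decision task on that list whenever the execution's state changes, and the decider places activity tasks on activity task lists by returning ScheduleActivityTask decisions (otherwise the activity type's default task list is used). **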
- Easy to confuse:
- task
- event
- activity
This page contains examples of tasks, events, actions, and activities:
http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dev-workflow-exec-lifecycle.html
Data exchange
Because Amazon SWF maintains the complete execution state of each workflow execution, including the inputs and the results of tasks, all actors can be stateless. As a result, workflow processing is highly scalable. As the load on your system grows, you can simply add more actors to increase capacity.
If a task is available on the specified task list, Amazon SWF returns it immediately in the response. If no task is available, Amazon SWF holds the TCP connection open for up to 60 seconds so that, if a task becomes available during that time, it can be returned in the same connection. If no task becomes available within 60 seconds, it returns an empty response and closes the connection. (An empty response is a Task structure in which the value of taskToken is an empty string.) If this happens, the decider or activity worker should poll again.
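The repoll rule can be captured in a small loop. This sketch assumes only that `poll` is a zero-argument callable returning a dict like the boto3 responses shown later in these notes; `poll_until_task`, `max_polls`, and `pause_seconds` are hypothetical names, and the fake responses stand in for real SWF calls. A missing `taskToken` is treated the same as an empty one, to be safe.

```python
import time


def poll_until_task(poll, max_polls=None, pause_seconds=0.0):
    """Call poll() until it returns a task with a non-empty taskToken.

    poll could be, e.g.,
    lambda: swf.poll_for_activity_task(domain=DOMAIN, taskList={'name': TASKLIST})
    Returns the task, or None if max_polls polls all came back empty.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        task = poll()  # each real call may block for up to ~60 s
        polls += 1
        if task.get("taskToken"):  # empty string or missing key means "no task"
            return task
        if pause_seconds:
            time.sleep(pause_seconds)  # optional pause between empty polls
    return None


# Usage with canned responses instead of a real SWF client
responses = iter([{"taskToken": ""}, {"taskToken": ""},
                  {"taskToken": "abc", "input": "x"}])
task = poll_until_task(lambda: next(responses), max_polls=5)
print(task["taskToken"])  # abc
```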
Workflow Execution
Bringing together the ideas discussed in the preceding sections, here is an overview of the steps to develop and run a workflow in Amazon SWF:
- Write activity workers that implement the processing steps in your workflow.
- Write a decider to implement the coordination logic of your workflow.
- Register your activities and workflow with Amazon SWF.
You can do this step programmatically or by using the AWS Management Console.
- Start your activity workers and decider.
These actors can run on any computing device that can access an Amazon SWF endpoint. For example, you could use compute instances in the cloud, such as Amazon Elastic Compute Cloud (Amazon EC2); servers in your data center; or even a mobile device, to host a decider or activity worker. Once started, the decider and activity workers should start polling Amazon SWF for tasks.
- Start one or more executions of your workflow.
- View workflow executions using the AWS Management Console.
Developing an Activity Worker
http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-develop-activity.html
An activity worker provides the implementation of one or more activity types. An activity worker communicates with Amazon SWF to receive activity tasks and perform them. You can have a fleet of multiple activity workers performing activity tasks of the same activity type.
Amazon SWF makes an activity task available to activity workers when the decider schedules the activity task. When a decider schedules an activity task, it provides the data (which you determine) that the activity worker needs to perform the activity task. Amazon SWF inserts this data into the activity task before sending it to the activity worker.
Activity workers are managed by you. They can be written in any language. A worker can be run anywhere, as long as it can communicate with Amazon SWF through the API. Because Amazon SWF provides all the information needed to perform an activity task, all activity workers can be stateless. Statelessness enables your workflows to be highly scalable; to handle increased capacity requirements, simply add more activity workers.
This section explains how to implement an activity worker. The activity workers should repeatedly do the following.
- Poll Amazon SWF for an activity task.
- Begin performing the task.
- Periodically report a heartbeat to Amazon SWF if the task is long-lived.
- Report that the task completed or failed and return the results to Amazon SWF.
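The four steps above can be sketched as a loop over the boto3 SWF client methods `poll_for_activity_task`, `record_activity_task_heartbeat`, `respond_activity_task_completed`, and `respond_activity_task_failed`. This is a minimal sketch, not a production worker: `run_activity_worker`, the `handler(task, heartbeat)` convention, `max_polls`, and the `FakeSWF` stand-in client are all hypothetical, and the handler is assumed to return a string result or raise on failure. The truncation lengths follow SWF's documented limits for `reason` (256) and `details` (32768).

```python
def run_activity_worker(swf, domain, task_list, handler,
                        identity="worker-1", max_polls=None):
    """Poll for activity tasks, run handler(task, heartbeat), report the outcome."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        task = swf.poll_for_activity_task(
            domain=domain, taskList={"name": task_list}, identity=identity)
        token = task.get("taskToken")
        if not token:
            continue  # empty response: the long poll timed out, so poll again

        def heartbeat(details="", _token=token):
            # Report progress; SWF answers whether cancellation was requested
            resp = swf.record_activity_task_heartbeat(taskToken=_token, details=details)
            return resp.get("cancelRequested", False)

        try:
            result = handler(task, heartbeat)
        except Exception as err:  # any handler error becomes a task failure
            swf.respond_activity_task_failed(
                taskToken=token, reason=type(err).__name__[:256],
                details=str(err)[:32768])
        else:
            swf.respond_activity_task_completed(taskToken=token, result=result)


class FakeSWF:
    """Hypothetical stand-in for boto3.client('swf'), for trying the loop locally."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.calls = []

    def poll_for_activity_task(self, **kwargs):
        return self.tasks.pop(0) if self.tasks else {"taskToken": ""}

    def record_activity_task_heartbeat(self, **kwargs):
        self.calls.append(("heartbeat", kwargs))
        return {"cancelRequested": False}

    def respond_activity_task_completed(self, **kwargs):
        self.calls.append(("completed", kwargs))

    def respond_activity_task_failed(self, **kwargs):
        self.calls.append(("failed", kwargs))


def shout(task, heartbeat):
    heartbeat("halfway")  # a long-running task would call this periodically
    return task["input"].upper()


fake = FakeSWF([{"taskToken": ""},
                {"taskToken": "t1", "input": "hello"},
                {"taskToken": "t2", "input": None}])  # None makes shout() raise
run_activity_worker(fake, "demo-domain", "demo-tasks", shout, max_polls=3)
print([name for name, _ in fake.calls])
# ['heartbeat', 'completed', 'heartbeat', 'failed']
```

Because the worker keeps nothing between tasks except the token of the task in hand, any number of copies can run against the same task list.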
Developing a Decider
http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-dg-dev-deciders.html
A decider is an implementation of the coordination logic of your workflow type that runs during the execution of your workflow. You can run multiple deciders for a single workflow type.
Because the execution state for a workflow execution is stored in its workflow history, deciders can be stateless. Amazon SWF maintains the workflow execution history and provides it to a decider with each decision task. This enables you to dynamically add and remove deciders as necessary, which makes the processing of your workflows highly scalable. As the load on your system grows, you simply add more deciders to handle the increased capacity. Note, however, that there can be only one decision task open at any time for a given workflow execution.
Every time a state change occurs for a workflow execution, Amazon SWF schedules a decision task. Each time a decider receives a decision task, it does the following:
- Interprets the workflow execution history provided with the decision task
- Applies the coordination logic based on the workflow execution history and makes decisions on what to do next. Each decision is represented by a Decision structure
- Completes the decision task and provides a list of decisions to Amazon SWF.
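The three steps above map naturally onto a pure function from history to decisions, which is what lets the decider stay stateless. This sketch assumes a hypothetical one-activity workflow (schedule the activity when the execution starts, complete the workflow when the activity completes, fail it if the activity fails or times out) and assumes the activity type was registered with default timeouts, since the schedule decision here omits them. `decide` and its parameters are illustrative names; the returned list is what would be passed as `decisions` to `respond_decision_task_completed`.

```python
import uuid


def decide(events, activity_name, activity_version, task_list):
    """Return a list of Decision dicts for a one-activity workflow.

    events: the history from a decision task, oldest first (reverseOrder=False).
    """
    # Ignore DecisionTask* bookkeeping events and look at the newest real event
    history = [e for e in events if not e["eventType"].startswith("Decision")]
    last = history[-1]["eventType"]

    if last == "WorkflowExecutionStarted":
        return [{
            "decisionType": "ScheduleActivityTask",
            "scheduleActivityTaskDecisionAttributes": {
                "activityType": {"name": activity_name, "version": activity_version},
                "activityId": "activity-" + str(uuid.uuid4()),
                "taskList": {"name": task_list},
            },
        }]
    if last == "ActivityTaskCompleted":
        attrs = history[-1]["activityTaskCompletedEventAttributes"]
        return [{
            "decisionType": "CompleteWorkflowExecution",
            "completeWorkflowExecutionDecisionAttributes": {"result": attrs.get("result", "")},
        }]
    if last in ("ActivityTaskFailed", "ActivityTaskTimedOut"):
        return [{
            "decisionType": "FailWorkflowExecution",
            "failWorkflowExecutionDecisionAttributes": {"reason": last},
        }]
    return []  # nothing to do yet; an empty decision list keeps the execution open


started = [{"eventId": 1, "eventType": "WorkflowExecutionStarted"},
           {"eventId": 2, "eventType": "DecisionTaskScheduled"},
           {"eventId": 3, "eventType": "DecisionTaskStarted"}]
first = decide(started, "resize-image", "1.0", "demo-tasks")
print(first[0]["decisionType"])  # ScheduleActivityTask
```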
This section describes how to develop a decider, which involves:
- Programming your decider to poll for decision tasks
- Programming your decider to interpret the workflow execution history and make decisions
- Programming your decider to respond to a decision task.
The examples on the linked page show how you might program a decider for the e-commerce example workflow.
You can implement the decider in any language that you like and run it anywhere, as long as it can communicate with Amazon SWF through its service API.
Launch decider
Once launched, your deciders should start polling Amazon SWF for tasks. Until you start workflow executions and Amazon SWF schedules decision tasks, these polls will time out and get empty responses. An empty response is a Task structure in which the value of taskToken is an empty string. Your deciders should simply continue to poll.
Amazon SWF ensures that only one decision task can be active for a workflow execution at any time. This prevents issues such as conflicting decisions. Additionally, Amazon SWF ensures that a single decision task is assigned to a single decider, regardless of the number of deciders that are running.
In order for workflow executions to progress, one or more deciders must be running. You can launch as many deciders as you like. Amazon SWF supports multiple deciders polling on the same task list.
Important actions
import uuid
import boto3
from botocore.exceptions import ClientError
swf = boto3.client('swf')
Initialize
register_domain()
register_activity_type()
register_workflow_type()
swf.register_workflow_type(
domain=DOMAIN,
name=WORKFLOW,
version=VERSION,
description="Test workflow",
defaultExecutionStartToCloseTimeout="250",
defaultTaskStartToCloseTimeout="NONE",
defaultChildPolicy="TERMINATE",
defaultTaskList={"name": TASKLIST}
)
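Registration fails if the domain or type already exists, so a setup script usually has to tolerate reruns. This sketch catches the modeled exceptions that a boto3 SWF client exposes as `client.exceptions.DomainAlreadyExistsFault` and `client.exceptions.TypeAlreadyExistsFault` (catching `ClientError` and checking `err.response['Error']['Code']` is an alternative). `ensure_registered`, the one-day retention period, and the activity timeouts (300 s queued, 10 s running, taken from the Collections section below) are assumptions for illustration; `FakeSWF` is a hypothetical stand-in so the function can be tried without AWS.

```python
from types import SimpleNamespace


def ensure_registered(swf, domain, workflow, activity, version, task_list):
    """Register the domain, activity type and workflow type, skipping existing ones.

    Returns the names of the things that were newly registered.
    """
    created = []
    try:
        swf.register_domain(name=domain, workflowExecutionRetentionPeriodInDays="1")
        created.append("domain")
    except swf.exceptions.DomainAlreadyExistsFault:
        pass
    try:
        swf.register_activity_type(
            domain=domain, name=activity, version=version,
            defaultTaskList={"name": task_list},
            defaultTaskScheduleToStartTimeout="300",
            defaultTaskStartToCloseTimeout="10")
        created.append("activity type")
    except swf.exceptions.TypeAlreadyExistsFault:
        pass
    try:
        swf.register_workflow_type(
            domain=domain, name=workflow, version=version,
            defaultExecutionStartToCloseTimeout="250",
            defaultTaskStartToCloseTimeout="NONE",
            defaultChildPolicy="TERMINATE",
            defaultTaskList={"name": task_list})
        created.append("workflow type")
    except swf.exceptions.TypeAlreadyExistsFault:
        pass
    return created


class DomainAlreadyExistsFault(Exception):
    pass


class TypeAlreadyExistsFault(Exception):
    pass


class FakeSWF:
    """Hypothetical stand-in client that remembers what was registered."""
    exceptions = SimpleNamespace(DomainAlreadyExistsFault=DomainAlreadyExistsFault,
                                 TypeAlreadyExistsFault=TypeAlreadyExistsFault)

    def __init__(self):
        self.seen = set()

    def _add(self, key, exc):
        if key in self.seen:
            raise exc(str(key))
        self.seen.add(key)

    def register_domain(self, name, **_):
        self._add(("domain", name), DomainAlreadyExistsFault)

    def register_activity_type(self, domain, name, version, **_):
        self._add(("activity", domain, name, version), TypeAlreadyExistsFault)

    def register_workflow_type(self, domain, name, version, **_):
        self._add(("workflow", domain, name, version), TypeAlreadyExistsFault)


fake = FakeSWF()
args = ("demo-domain", "demo-workflow", "resize-image", "1.0", "demo-tasks")
first_run = ensure_registered(fake, *args)
second_run = ensure_registered(fake, *args)
print(first_run)   # ['domain', 'activity type', 'workflow type']
print(second_run)  # []
```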
Action for workflow starter
start_workflow_execution()
response = swf.start_workflow_execution(
domain=DOMAIN,
workflowId='test-1001',
workflowType={
"name": WORKFLOW,
"version": VERSION
},
taskList={
'name': TASKLIST
},
input=''
)
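Two practical details of `start_workflow_execution`: the `workflowId` must not match another workflow execution that is still open in the same domain, and `input` is an opaque string, so structured data is commonly JSON-encoded. This sketch generates a unique ID with a hypothetical prefix, encodes a dict as JSON (an assumption; any string format works), and returns the `runId` from the response. `start_execution` and the `FakeSWF` stand-in are hypothetical names.

```python
import json
import uuid


def start_execution(swf, domain, workflow, version, task_list, payload, prefix="order"):
    """Start one workflow execution and return (workflowId, runId)."""
    # workflowId must be unique among open executions in the domain
    workflow_id = "{}-{}".format(prefix, uuid.uuid4())
    response = swf.start_workflow_execution(
        domain=domain,
        workflowId=workflow_id,
        workflowType={"name": workflow, "version": version},
        taskList={"name": task_list},  # decision tasks for this run go here
        input=json.dumps(payload),     # SWF carries input as a plain string
    )
    return workflow_id, response["runId"]


class FakeSWF:
    """Hypothetical stand-in for boto3.client('swf')."""

    def __init__(self):
        self.started = []

    def start_workflow_execution(self, **kwargs):
        self.started.append(kwargs)
        return {"runId": "run-" + str(len(self.started))}


fake = FakeSWF()
wf_id, run_id = start_execution(fake, "demo-domain", "demo-workflow", "1.0",
                                "demo-tasks", {"orderId": 42})
print(run_id)                    # run-1
print(fake.started[0]["input"])  # {"orderId": 42}
```

On the worker or decider side, `json.loads(task['input'])` recovers the dict.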
Action for activity worker
record_activity_task_heartbeat()
poll_for_activity_task()
task = swf.poll_for_activity_task(
domain=DOMAIN,
taskList={'name': TASKLIST},
identity='worker-1')
Return value syntax:
{
'taskToken': 'string',
'activityId': 'string',
'startedEventId': 123,
'workflowExecution': {
'workflowId': 'string',
'runId': 'string'
},
'activityType': {
'name': 'string',
'version': 'string'
},
'input': 'string'
}
respond_activity_task_completed()
swf.respond_activity_task_completed(
taskToken=task['taskToken'],
result='success'
)
Action for decider
poll_for_decision_task()
Request syntax:
response = client.poll_for_decision_task(
domain='string',
taskList={
'name': 'string'
},
identity='string',
nextPageToken='string',
maximumPageSize=123,
reverseOrder=True|False
)
newTask = swf.poll_for_decision_task(
domain=DOMAIN,
taskList={'name': TASKLIST},
identity='decider-1',
reverseOrder=False)
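if not newTask.get('taskToken'):
    # Missing or empty taskToken: the long poll timed out, so poll again
    print("Poll timed out, no new task. Repoll")
elif 'events' in newTask:
    # Drop Decision* bookkeeping events; keep the events that drive the workflow
    eventHistory = [evt for evt in newTask['events']
                    if not evt['eventType'].startswith('Decision')]
    lastEvent = eventHistory[-1]  # newest event, because reverseOrder=False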
Return value syntax:
{
'taskToken': 'string',
'startedEventId': 123,
'workflowExecution': {
'workflowId': 'string',
'runId': 'string'
},
'workflowType': {
'name': 'string',
'version': 'string'
},
'events': [
{
'eventTimestamp': datetime(2015, 1, 1),
'eventType': 'WorkflowExecutionStarted', # or some other event type
'eventId': 123,
'workflowExecutionStartedEventAttributes': {
# detail of event attribute
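},
# OR, for an ActivityTaskCompleted event:
'activityTaskCompletedEventAttributes': {
'result': 'string',
'scheduledEventId': 123,
'startedEventId': 123
},
# (each event carries only the attributes key that matches its eventType)
},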
],
'nextPageToken': 'string',
'previousStartedEventId': 123
}
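`previousStartedEventId` is the ID of the DecisionTaskStarted event of the previous decision task, so every event with a larger ID is new since the decider last acted. Looking only at the last non-decision event (as in the filter above) can miss events when several arrive between decision tasks, for example two activities completing at once. This sketch assumes the task's `events` already hold the full history (all pages) in ascending order; `new_events` and the sample task are hypothetical.

```python
def new_events(decision_task):
    """Events added since the previous decision task for this execution.

    previousStartedEventId is 0 (or absent) when there was no earlier
    decision task, in which case every event is new.
    """
    previous = decision_task.get("previousStartedEventId", 0)
    return [evt for evt in decision_task.get("events", [])
            if evt["eventId"] > previous]


task = {
    "previousStartedEventId": 3,
    "events": [
        {"eventId": 1, "eventType": "WorkflowExecutionStarted"},
        {"eventId": 2, "eventType": "DecisionTaskScheduled"},
        {"eventId": 3, "eventType": "DecisionTaskStarted"},
        {"eventId": 4, "eventType": "DecisionTaskCompleted"},
        {"eventId": 5, "eventType": "ActivityTaskScheduled"},
        {"eventId": 6, "eventType": "ActivityTaskStarted"},
        {"eventId": 7, "eventType": "ActivityTaskCompleted"},
        {"eventId": 8, "eventType": "DecisionTaskScheduled"},
        {"eventId": 9, "eventType": "DecisionTaskStarted"},
    ],
}
print([e["eventType"] for e in new_events(task)
       if not e["eventType"].startswith("Decision")])
# ['ActivityTaskScheduled', 'ActivityTaskStarted', 'ActivityTaskCompleted']
```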
respond_decision_task_completed()
Request syntax:
client.respond_decision_task_completed(
taskToken='string',
decisions=[
{
'decisionType': 'ScheduleActivityTask'|'RequestCancelActivityTask'|'CompleteWorkflowExecution'|'FailWorkflowExecution'|'CancelWorkflowExecution'|'ContinueAsNewWorkflowExecution'|'RecordMarker'|'StartTimer'|'CancelTimer'|'SignalExternalWorkflowExecution'|'RequestCancelExternalWorkflowExecution'|'StartChildWorkflowExecution'|'ScheduleLambdaFunction',
'scheduleActivityTaskDecisionAttributes': {
'activityType': {
'name': 'string',
'version': 'string'
},
'activityId': 'string',
'control': 'string',
'input': 'string',
'scheduleToCloseTimeout': 'string',
'taskList': {
'name': 'string'
},
'taskPriority': 'string',
'scheduleToStartTimeout': 'string',
'startToCloseTimeout': 'string',
'heartbeatTimeout': 'string'
}
],
executionContext='string'
)
swf.respond_decision_task_completed(
taskToken=newTask['taskToken'],
decisions=[
{
'decisionType': 'ScheduleActivityTask', # pre-defined by SWF
'scheduleActivityTaskDecisionAttributes': {
'activityType':{
'name': TASKNAME,
'version': VERSION
},
'activityId': 'activityid-' + str(uuid.uuid4()),
'input': '',
'scheduleToCloseTimeout': 'NONE',
'scheduleToStartTimeout': 'NONE',
'startToCloseTimeout': 'NONE',
'heartbeatTimeout': 'NONE',
'taskList': {'name': TASKLIST},
}
}
]
)
swf.respond_decision_task_completed(
taskToken=newTask['taskToken'],
decisions=[
{
'decisionType': 'CompleteWorkflowExecution',
'completeWorkflowExecutionDecisionAttributes': {
'result': 'success'
}
}
]
)
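Both calls above repeat the same nested structure, and a misspelled key is easy to miss. Small builder functions keep the attribute names in one place; the keys below match the request syntax above, while the helper names are hypothetical. One response can carry several decisions, for instance scheduling two activities in parallel (each `activityId` should be unique within the execution). Timeouts are strings of seconds or "NONE", as in the example call.

```python
def schedule_activity(name, version, task_list, activity_id, input_data="",
                      schedule_to_close="NONE", schedule_to_start="NONE",
                      start_to_close="NONE", heartbeat="NONE"):
    """Build a ScheduleActivityTask decision."""
    return {
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": name, "version": version},
            "activityId": activity_id,
            "input": input_data,
            "taskList": {"name": task_list},
            "scheduleToCloseTimeout": schedule_to_close,
            "scheduleToStartTimeout": schedule_to_start,
            "startToCloseTimeout": start_to_close,
            "heartbeatTimeout": heartbeat,
        },
    }


def complete_workflow(result=""):
    """Build a CompleteWorkflowExecution decision."""
    return {
        "decisionType": "CompleteWorkflowExecution",
        "completeWorkflowExecutionDecisionAttributes": {"result": result},
    }


# Two activities scheduled in parallel by a single decision task response
decisions = [
    schedule_activity("resize-image", "1.0", "demo-tasks", "resize-1"),
    schedule_activity("resize-image", "1.0", "demo-tasks", "resize-2"),
]
# swf.respond_decision_task_completed(taskToken=newTask['taskToken'], decisions=decisions)
print([d["scheduleActivityTaskDecisionAttributes"]["activityId"] for d in decisions])
# ['resize-1', 'resize-2']
```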
Collection of attributes, event types, and other terms mentioned in the examples
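- defaultTaskScheduleToStartTimeoutSeconds specifies how long a task can wait in the activity task list before a worker picks it up; in the example it is set to 300 seconds (5 minutes).
- defaultTaskStartToCloseTimeoutSeconds specifies the maximum time the activity worker can take to perform the task once it has started; in the example it is set to 10 seconds.
(These names come from the AWS Flow Framework's activity registration options. In the low-level API, e.g. boto3's register_activity_type, the corresponding parameters are defaultTaskScheduleToStartTimeout and defaultTaskStartToCloseTimeout, given as strings of seconds.)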
These timeouts ensure that the activity completes its task in a reasonable amount of time. If either timeout is exceeded, the framework generates an error and the workflow worker must decide how to handle the issue. For a discussion of how to handle such errors, see Error Handling.
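The two example values add up: a task may wait up to 300 seconds in the queue and then run for up to 10 seconds, so in the worst case it finishes (or times out) 300 + 10 = 310 seconds after being scheduled. This sketch converts such numbers into the string form the low-level API expects (integer seconds, or "NONE" for unlimited, as in the register_workflow_type and ScheduleActivityTask calls above). `swf_timeout`, `worst_case_seconds`, and the idea of deriving a schedule-to-close default from the sum are illustrative assumptions, not SWF rules.

```python
def swf_timeout(seconds):
    """Convert a number of seconds (or None for unlimited) to SWF's string form."""
    if seconds is None:
        return "NONE"
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 0:
        raise ValueError("timeout must be a non-negative int or None")
    return str(seconds)


def worst_case_seconds(schedule_to_start, start_to_close):
    """Queue wait plus run time, or None if either part is unlimited."""
    if schedule_to_start is None or start_to_close is None:
        return None
    return schedule_to_start + start_to_close


# Could be passed as **activity_defaults to swf.register_activity_type(...)
activity_defaults = {
    "defaultTaskScheduleToStartTimeout": swf_timeout(300),
    "defaultTaskStartToCloseTimeout": swf_timeout(10),
    "defaultTaskScheduleToCloseTimeout": swf_timeout(worst_case_seconds(300, 10)),
}
print(activity_defaults["defaultTaskScheduleToCloseTimeout"])  # 310
```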
client.poll_for_decision_task(**kwargs)
Used by deciders to get a DecisionTask from the specified decision taskList. A decision task may be returned for any open workflow execution that is using the specified task list.
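A long history is returned in pages: when the response contains `nextPageToken`, the same call repeated with that token returns the next page of events for the same decision task. This sketch gathers every page before the decider interprets the history (boto3 also offers a paginator for this operation, via `get_paginator('poll_for_decision_task')`). `poll_decision_task_all_pages` and the `FakeSWF` stand-in, which serves canned pages keyed by token, are hypothetical.

```python
def poll_decision_task_all_pages(swf, domain, task_list, identity="decider-1"):
    """Poll for a decision task and gather its full history across pages.

    Returns the first page's response with 'events' replaced by the complete
    list, or None if the long poll timed out with no task.
    """
    request = {"domain": domain, "taskList": {"name": task_list},
               "identity": identity, "reverseOrder": False}
    task = swf.poll_for_decision_task(**request)
    if not task.get("taskToken"):
        return None  # empty response: poll again later
    events = list(task.get("events", []))
    token = task.get("nextPageToken")
    while token:
        # Same arguments plus the token fetch the next page of this task's history
        page = swf.poll_for_decision_task(nextPageToken=token, **request)
        events.extend(page.get("events", []))
        token = page.get("nextPageToken")
    task["events"] = events
    task.pop("nextPageToken", None)
    return task


class FakeSWF:
    """Hypothetical stand-in: pages keyed by nextPageToken (None = first call)."""

    def __init__(self, pages):
        self.pages = pages

    def poll_for_decision_task(self, nextPageToken=None, **kwargs):
        return self.pages[nextPageToken]


fake = FakeSWF({
    None: {"taskToken": "t1", "events": [{"eventId": 1}, {"eventId": 2}],
           "nextPageToken": "p2"},
    "p2": {"taskToken": "t1", "events": [{"eventId": 3}]},
})
task = poll_decision_task_all_pages(fake, "demo-domain", "demo-decisions")
print([e["eventId"] for e in task["events"]])  # [1, 2, 3]
```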