Checked 10+ workflow engines. Here are my conclusions:

  • It is not designed for high traffic (> 100 per sec) + short lived task case (< 10 sec). At most one requirement can be satisfied
  • However, WE should support many executing tasks, e.g., 100k tasks managed by the workflow engine
  • Evolution lineage
AWS Simple Workflow -> Uber Cadence -> Temporal
		    -> AWS Step Function

Concepts

  • Workflow: similiar to the coordinator in saga. Its code is hosted on the workflow worker, which is your process
    • The communication between workflow worker and cadence service is encapsulated in a decision task (also called workflow task), e.g., when an external event happens to the workflow, a decision task will be created and dispatched to WW
  • Activity: similar to the sub txn component in saga. Its code is hosted on the activity worker, which is your process and often the same process as the workflow worker
    • The communciation between activity worker and cadence service is encapsulated in the activity task, e.g., WW sends a ScheduleActivityTask to cadence, which will dispatch a corresponding activity task to the AW
  • Execution history: persistent log to support exactly-once, all-or-nothing semantics. All task data will be persisted too to support replay during recovery

Architecture

  • Front end: API gateway
  • Matching service: task scheduling, dispatching, and task list
    • backed by task storage
  • History service: workflow state, timer q, and transfer q
    • backed by event, workflow, visibility storage