On Uber's Cadence
Checked 10+ workflow engines. Here are my conclusions:
- A workflow engine (WE) is not designed for the combination of high traffic (> 100 per sec) and short-lived tasks (< 10 sec). At most one of the two requirements can be satisfied
- However, a WE should support many concurrently executing tasks, e.g., 100k tasks managed by the workflow engine
- Evolution lineage:
  - AWS Simple Workflow -> Uber Cadence -> Temporal
  - AWS Simple Workflow -> AWS Step Functions
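
A back-of-the-envelope sketch of the first two conclusions, assuming Little's law (in-flight tasks = start rate x average duration) is a fair model. The function name and the start rate of 10 per second are hypothetical and only for illustration. The 100 per sec, 10 sec, and 100k figures are the ones quoted above.

```python
def in_flight_tasks(start_rate_per_sec: float, avg_duration_sec: float) -> float:
    """Little's law: average number of tasks executing at the same time."""
    return start_rate_per_sec * avg_duration_sec


# High traffic combined with short-lived tasks, using the thresholds above.
short_lived = in_flight_tasks(100, 10)

# 100k in-flight tasks at a modest start rate implies long-lived tasks.
assumed_rate = 10  # starts per second, an assumption for illustration
avg_duration = 100_000 / assumed_rate  # seconds each task stays open
```

Under these assumptions, the 100k-task case corresponds to tasks that stay open for hours. That is the long-running, low-rate kind of workload rather than the high-rate, short-lived one.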
Concepts
- Workflow: similar to the coordinator in a saga. Its code is hosted on the workflow worker (WW), which is your process
  - The communication between the workflow worker and the Cadence service is encapsulated in a decision task (also called a workflow task), e.g., when an external event happens to the workflow, a decision task is created and dispatched to the WW
- Activity: similar to the sub-transaction component in a saga. Its code is hosted on the activity worker (AW), which is your process and often the same process as the workflow worker
  - The communication between the activity worker and the Cadence service is encapsulated in an activity task, e.g., the WW sends a ScheduleActivityTask decision to Cadence, which dispatches a corresponding activity task to the AW
- Execution history: a persistent log that supports exactly-once, all-or-nothing semantics. All task data is persisted as well, to support replay during recovery
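
The replay idea can be sketched with a toy in-memory model. This is not the Cadence client API: `History`, `WorkflowContext`, `execute_activity`, and the order workflow are hypothetical names invented for illustration. The list stands in for the persisted execution history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class History:
    """Append-only log of (activity_name, result); stands in for persisted history."""
    events: List[Tuple[str, str]] = field(default_factory=list)


class WorkflowContext:
    """Hypothetical helper: replays recorded results before running new activities."""

    def __init__(self, history: History, activities: Dict[str, Callable[[str], str]]):
        self.history = history
        self.activities = activities
        self.cursor = 0  # next history position to replay

    def execute_activity(self, name: str, arg: str) -> str:
        if self.cursor < len(self.history.events):
            # Replay: reuse the recorded result instead of re-running the activity.
            recorded_name, result = self.history.events[self.cursor]
            assert recorded_name == name, "workflow code must be deterministic"
        else:
            # First execution: run the activity and persist its result.
            result = self.activities[name](arg)
            self.history.events.append((name, result))
        self.cursor += 1
        return result


def order_workflow(ctx: WorkflowContext, order_id: str) -> str:
    receipt = ctx.execute_activity("charge", order_id)
    return ctx.execute_activity("ship", receipt)


calls: List[str] = []  # records which activities really ran


def charge(order_id: str) -> str:
    calls.append("charge")
    return f"charged:{order_id}"


def ship(receipt: str) -> str:
    calls.append("ship")
    return f"shipped:{receipt}"


def ship_crashing(receipt: str) -> str:
    raise RuntimeError("worker crashed")


history = History()
try:
    # First attempt: the worker dies while running the second activity.
    order_workflow(WorkflowContext(history, {"charge": charge, "ship": ship_crashing}), "o1")
except RuntimeError:
    pass

# Recovery: the workflow code runs again from the top against the same history.
result = order_workflow(WorkflowContext(history, {"charge": charge, "ship": ship}), "o1")
```

In this sketch the recovered run does not charge the order a second time, because the first activity's result is read back from the history. An activity that crashes before its result is recorded is simply run again, so the toy model says nothing about side effects inside a half-finished activity.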
Architecture
- Front end: API gateway
- Matching service: task scheduling, dispatching, and task lists
  - backed by task storage
- History service: workflow state, timer queue, and transfer queue
  - backed by event, workflow, and visibility storage
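
How the three services fit together can be sketched as a toy in-process model. It is a heavy simplification under stated assumptions. All class and method names are hypothetical, storage is plain in-memory collections, the timer queue is left out, and the transfer queue is drained by an explicit call rather than a background processor.

```python
from collections import defaultdict, deque


class MatchingService:
    """Holds task lists and hands tasks to polling workers."""

    def __init__(self):
        self.task_lists = defaultdict(deque)  # stands in for task storage

    def add_task(self, task_list, task):
        self.task_lists[task_list].append(task)

    def poll(self, task_list):
        queue = self.task_lists[task_list]
        return queue.popleft() if queue else None


class HistoryService:
    """Owns workflow events and a transfer queue of tasks still to be dispatched."""

    def __init__(self, matching):
        self.matching = matching
        self.events = defaultdict(list)  # stands in for event storage
        self.transfer_queue = deque()

    def record_event(self, workflow_id, event, task_list):
        # Persist the event, then queue a decision task for later transfer.
        self.events[workflow_id].append(event)
        task = {"workflow_id": workflow_id, "type": "decision"}
        self.transfer_queue.append((task_list, task))

    def drain_transfer_queue(self):
        # Move pending tasks into the matching service's task lists.
        while self.transfer_queue:
            task_list, task = self.transfer_queue.popleft()
            self.matching.add_task(task_list, task)


class FrontEnd:
    """API gateway: routes calls to the history and matching services."""

    def __init__(self, history, matching):
        self.history = history
        self.matching = matching

    def start_workflow(self, workflow_id, task_list):
        self.history.record_event(workflow_id, "WorkflowExecutionStarted", task_list)

    def poll_for_decision_task(self, task_list):
        return self.matching.poll(task_list)


matching = MatchingService()
history = HistoryService(matching)
front_end = FrontEnd(history, matching)

front_end.start_workflow("wf-1", "orders")
before = front_end.poll_for_decision_task("orders")  # nothing transferred yet
history.drain_transfer_queue()
task = front_end.poll_for_decision_task("orders")  # the decision task for wf-1
```

The point of the split in this sketch is that the history service is the only writer of workflow state, while the matching service only deals with queued tasks and pollers.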