Introduction
There are numerous scenarios where you might want to execute tasks in parallel. One common use case involves dividing data into batches, processing each batch in parallel, and combining the results in the end. This approach not only speeds up the overall processing but also makes it easier to detect errors in each smaller task.
On the other hand, setting up parallel tasks, monitoring them, handling errors in each task, and combining the results is not trivial. Thankfully, Google Cloud's Workflows can help. In this post, we will explore how you can use a parent workflow to set up and execute parallel child workflows.
Let’s get started!
Setting up the child workflow
To begin, let’s create a child workflow that will serve as the foundation for our parallel execution.
The child workflow receives arguments from the parent workflow. In our example, we'll use a simple iteration integer, but in real-world scenarios, it could represent a data chunk passed from the parent workflow.
main:
  params: [args]
  steps:
    - init:
        assign:
          - iteration: ${args.iteration}
The child workflow then starts performing some work. In this example, it simply waits 10 seconds to simulate work.
    - wait:
        call: sys.sleep
        args:
          seconds: 10
Afterwards, it returns the result or failure of the work. In this case, it simply uses whether the iteration is even or odd to simulate success and failure:
    - check_iteration_even_or_odd:
        switch:
          - condition: ${iteration % 2 == 0}
            next: raise_error
    - return_message:
        return: ${"Hello world " + string(iteration)}
    - raise_error:
        raise: ${"Error with iteration " + string(iteration)}
You can see the full definition in the workflow-child.yaml file.
Deploy the child workflow:
gcloud workflows deploy workflow-child --source=workflow-child.yaml
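Optionally, you can smoke-test the child workflow on its own before wiring up the parent, by passing the iteration argument as JSON with the --data flag (this standalone test is my own addition, not one of the original post's steps):

gcloud workflows run workflow-child --data='{"iteration": 1}'

An odd iteration such as 1 should return the greeting, while an even one such as 2 should end in the simulated error.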
Setting up the parent workflow
Now, let’s create the parent workflow, which orchestrates the parallel execution of the child workflows. The parent workflow starts by initializing a map to store the results of successful and failed executions.
main:
  steps:
    - init:
        assign:
          - execution_results: {} # results from each execution
          - execution_results.success: {} # successful executions saved under 'success' key
          - execution_results.failure: {} # failed executions saved under 'failure' key
Next, the parent workflow employs a parallel for-loop to execute the child workflows with data chunks. In our example, we pass integers from 1 to 4 to simulate data. Since each iteration is independent, we parallelize them with the parallel keyword. Note that each for-loop iteration spins up its own thread, and the for-loop does not wait for one iteration to finish before starting the next.
    - execute_child_workflows:
        parallel:
          shared: [execution_results]
          for:
            value: iteration
            in: [1, 2, 3, 4]
            steps:
              - iterate:
Within each iteration, the child workflow is executed with the iteration argument. The parent workflow then waits for the success or failure of the child workflow execution and captures the results/failures in the map.
                  try:
                    steps:
                      - execute_child_workflow:
                          call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.run
                          args:
                            workflow_id: workflow-child
                            #location: ...
                            #project_id: ...
                            argument:
                              iteration: ${iteration}
                          result: execution_result
                      - save_successful_execution:
                          assign:
                            - execution_results.success[string(iteration)]: ${execution_result}
                  except:
                    as: e
                    steps:
                      - save_failed_execution:
                          assign:
                            - execution_results.failure[string(iteration)]: ${e}
Finally, the parent workflow returns the results/failures map.
    - return_execution_results:
        return: ${execution_results}
You can see the full definition in the workflow-parent.yaml file.
Deploy the parent workflow:
gcloud workflows deploy workflow-parent --source=workflow-parent.yaml
Execute the workflow
With both workflows deployed, it’s time to execute the parent workflow:
gcloud workflows run workflow-parent
As the parent workflow runs, you will observe four parallel executions of the child workflow. Each execution represents a different batch of data being processed simultaneously.
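If you want to watch the child executions as they spin up, you can also list them from another terminal (you may need to pass --location if you deployed to a non-default region):

gcloud workflows executions list workflow-child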
Since they all run in parallel, after about 10 seconds you should see that two of them succeeded and two failed. The parent workflow returns the results of the successful executions along with the errors of the failed ones.
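For reference, the map returned by the parent is roughly of the following shape. This is a sketch based on the workflow logic above, shape only; the actual failure entries are the full exception maps captured from the connector:

{
  "success": {
    "1": "Hello world 1",
    "3": "Hello world 3"
  },
  "failure": {
    "2": {...},
    "4": {...}
  }
}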
At this point, the parent workflow has the option to retry the failed executions or proceed with the successful ones, depending on your requirements.
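As a minimal sketch of what that retry could look like (the step names and the keys()/int() plumbing here are my own assumptions, not part of the original workflows), the parent could insert a second parallel loop over the failed iterations between execute_child_workflows and return_execution_results:

    - retry_failed_executions:
        parallel:
          shared: [execution_results]
          for:
            value: failed_iteration
            in: ${keys(execution_results.failure)}
            steps:
              - retry_iteration:
                  try:
                    steps:
                      - rerun_child_workflow:
                          # re-run the child workflow with the originally failed iteration
                          call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.run
                          args:
                            workflow_id: workflow-child
                            argument:
                              iteration: ${int(failed_iteration)}
                          result: retry_result
                      - record_retry_success:
                          # move the iteration's result under the 'success' key
                          assign:
                            - execution_results.success[failed_iteration]: ${retry_result}
                  except:
                    as: e
                    steps:
                      - record_retry_failure:
                          # keep the latest error under the 'failure' key
                          assign:
                            - execution_results.failure[failed_iteration]: ${e}

Note that in this demo the even iterations fail deterministically, so they would simply fail again on retry; a retry pass like this only pays off when failures are transient.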
Summary
By dividing data into batches and processing them simultaneously, we can enhance overall processing speed and detect failures more easily in each execution. In this post, we explored how to implement parallel execution of workflows and combine the results using Google Cloud Workflows.
Check out the following video for more information on parallel steps in Workflows:
And as always, feel free to contact me on Twitter @meteatamel for any questions or feedback.
Originally published at https://cloud.google.com/blog/products/application-development/setup-parallel-task-execution-with-parent-and-child-workflows