Workflows executing other parallel workflows: A practical guide


Introduction

There are numerous scenarios where you might want to execute tasks in parallel. One common use case involves dividing data into batches, processing each batch in parallel, and combining the results in the end. This approach not only speeds up the overall processing but also makes errors easier to detect and isolate in the smaller tasks.

On the other hand, setting up parallel tasks, monitoring them, handling errors in each task, and combining the results at the end is not trivial. Thankfully, Google Cloud Workflows can help. In this post, we will explore how you can use a parent workflow to set up and execute parallel child workflows.

https://storage.googleapis.com/gweb-cloudblog-publish/images/0_workflows_executing_workflows_in_paralle.max-1300x1300.png

Let’s get started!

Setting up the child workflow

To begin, let’s create a child workflow that will serve as the foundation for our parallel execution.

The child workflow receives arguments from the parent workflow. In our example, we’ll use a simple iteration integer, but in real-world scenarios, it could represent a data chunk passed from the parent workflow.

main:
  params: [args]
  steps:
    - init:
        assign:
          - iteration: ${args.iteration}

Next, the child workflow performs its work. In this example, it simply waits for 10 seconds to simulate real processing.

- wait:
    call: sys.sleep
    args:
        seconds: 10

Afterwards, it returns the result of the work or raises an error. In this case, it simply uses whether the iteration number is odd or even to simulate success or failure (note that the integer is converted with string() before it is concatenated into a message):

- check_iteration_even_or_odd:
    switch:
      - condition: ${iteration % 2 == 0}
        next: raise_error
- return_message:
    return: ${"Hello world " + string(iteration)}
- raise_error:
    raise: ${"Error with iteration " + string(iteration)}
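
Putting these steps together, the complete workflow-child.yaml looks like this:

main:
  params: [args]
  steps:
    - init:
        assign:
          - iteration: ${args.iteration}
    - wait:
        call: sys.sleep
        args:
            seconds: 10
    - check_iteration_even_or_odd:
        switch:
          - condition: ${iteration % 2 == 0}
            next: raise_error
    - return_message:
        return: ${"Hello world " + string(iteration)}
    - raise_error:
        raise: ${"Error with iteration " + string(iteration)}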

Deploy the child workflow:

gcloud workflows deploy workflow-child --source=workflow-child.yaml
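
Before wiring up the parent, you can smoke-test the child workflow on its own. For example, an odd iteration should succeed and return "Hello world 1":

gcloud workflows run workflow-child --data='{"iteration": 1}'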

Setting up the parent workflow

Now, let’s create the parent workflow, which orchestrates the parallel execution of the child workflows. The parent workflow starts by initializing a map to store the results of successful and failed executions.

main:
  steps:
    - init:
        assign:
          - execution_results: {} # results from each execution
          - execution_results.success: {} # successful executions saved under 'success' key
          - execution_results.failure: {} # failed executions saved under 'failure' key

Next, the parent workflow uses a parallel for-loop to execute the child workflows with data chunks. In our example, we pass the integers 1 to 4 to simulate data. Since the iterations are independent of each other, we parallelize them with the parallel keyword. Note that each for-loop iteration runs in its own thread, so the loop does not wait for one iteration to finish before starting the next.

- execute_child_workflows:
    parallel:
        shared: [execution_results]
        for:
          value: iteration
          in: [1, 2, 3, 4]
          steps:
            - iterate:

Within each iteration, the child workflow is executed with the iteration argument. The parent workflow then waits for the success or failure of the child workflow execution and captures the results/failures in the map.

try:
    steps:
        - execute_child_workflow:
            call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.run
            args:
                workflow_id: workflow-child
                #location: ...
                #project_id: ...
                argument:
                    iteration: ${iteration}
            result: execution_result
        - save_successful_execution:
            assign:
                - execution_results.success[string(iteration)]: ${execution_result}
except:
    as: e
    steps:
        - save_failed_execution:
            assign:
                - execution_results.failure[string(iteration)]: ${e}

Finally, the parent workflow returns the results/failures map.

- return_execution_results:
    return: ${execution_results}
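
Assembled from the snippets above, the complete workflow-parent.yaml looks like this:

main:
  steps:
    - init:
        assign:
          - execution_results: {} # results from each execution
          - execution_results.success: {} # successful executions saved under 'success' key
          - execution_results.failure: {} # failed executions saved under 'failure' key
    - execute_child_workflows:
        parallel:
            shared: [execution_results]
            for:
                value: iteration
                in: [1, 2, 3, 4]
                steps:
                    - iterate:
                        try:
                            steps:
                                - execute_child_workflow:
                                    call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.run
                                    args:
                                        workflow_id: workflow-child
                                        argument:
                                            iteration: ${iteration}
                                    result: execution_result
                                - save_successful_execution:
                                    assign:
                                        - execution_results.success[string(iteration)]: ${execution_result}
                        except:
                            as: e
                            steps:
                                - save_failed_execution:
                                    assign:
                                        - execution_results.failure[string(iteration)]: ${e}
    - return_execution_results:
        return: ${execution_results}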

Deploy the parent workflow:

gcloud workflows deploy workflow-parent --source=workflow-parent.yaml

Execute the workflow

With both workflows deployed, it’s time to execute the parent workflow:

gcloud workflows run workflow-parent

As the parent workflow runs, you will observe four parallel executions of the child workflow. Each child workflow represents a different batch of data being processed simultaneously.
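
You can also list these executions from the command line:

gcloud workflows executions list workflow-child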

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_parallel_executions_in_googlecloud_console.max-600x600.png

Since they all run in parallel, after about 10 seconds you should see that two of them succeeded (the odd iterations) and two failed (the even iterations):

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_parallel_execution_results_in_console.max-700x700.png

The parent workflow displays the results of the successful executions:

https://storage.googleapis.com/gweb-cloudblog-publish/images/3_successful_executions.max-600x600.png

And the errors of the failed executions:

https://storage.googleapis.com/gweb-cloudblog-publish/images/4_failed_executions.max-600x600.png

At this point, the parent workflow has the option to retry the failed executions or proceed with the successful ones, depending on your requirements.
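
As a sketch of what a retry could look like, a follow-up step (hypothetical, not part of workflow-parent.yaml above) could loop over the keys of the failure map and re-run just those iterations:

# Hypothetical retry step: re-run only the failed iterations, sequentially this time.
- retry_failed_executions:
    for:
        value: failed_iteration # a key of the failure map, e.g. "2"
        in: ${keys(execution_results.failure)}
        steps:
            - retry_execution:
                call: googleapis.workflowexecutions.v1.projects.locations.workflows.executions.run
                args:
                    workflow_id: workflow-child
                    argument:
                        iteration: ${int(failed_iteration)}
                result: retry_result

In this simulation the even iterations would simply fail again, but with real data chunks a transient failure could well succeed on retry (you would typically wrap the retry in another try/except as well).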

Summary

By dividing data into batches and executing them simultaneously, we can enhance overall processing speed and detect failures more easily in each execution. In this post, we explored how to execute workflows in parallel and combine the results using Google Cloud Workflows.

Check out the following video for more information on parallel steps in Workflows:

And as always, feel free to contact me on Twitter @meteatamel for any questions or feedback.

