Workflow Specification

YAML is the format AkôFlow uses to define workflows and their activities. Its human-readable structure lets users configure every stage of a workflow, the dependencies between stages, and the resources each stage requires, so that scientific workflows can be executed efficiently. Below, we explain each YAML component, with generic examples to illustrate its usage.

General Structure of the YAML File

In AkôFlow, a workflow YAML file is composed of general metadata, environment configuration, and a list of activities. Each part plays a distinct role in defining and executing the workflow.

1. General Metadata

Metadata provides basic information about the workflow: its name and the Kubernetes namespace in which it will be executed. The name is declared at the top level of the file, while the namespace is declared inside spec, as in the full example below.

name: A unique identifier for the workflow.
namespace: Specifies the Kubernetes namespace where the workflow will be managed.

Example:

name: example-workflow
spec:
  namespace: "example-namespace"
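Kubernetes requires namespace names to be valid RFC 1123 DNS labels: at most 63 characters, made of lowercase letters, digits, or hyphens, and starting and ending with a letter or digit. The Python sketch below checks a namespace against that rule before a workflow is submitted. The helper is_valid_namespace is hypothetical and not part of AkôFlow, and it does not model any additional rules AkôFlow may apply to the workflow name.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics or '-', beginning and ending
# with an alphanumeric character.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LENGTH = 63


def is_valid_namespace(namespace: str) -> bool:
    """Return True if `namespace` is an acceptable Kubernetes namespace name."""
    return (
        len(namespace) <= _MAX_LABEL_LENGTH
        and _DNS1123_LABEL.fullmatch(namespace) is not None
    )


print(is_valid_namespace("example-namespace"))  # True
print(is_valid_namespace("Example_Namespace"))  # False
```

Using fullmatch rather than match with a trailing $ avoids accepting a name that ends in a newline.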

2. Environment Configuration

Defines the technical settings required for the workflow’s execution: the Docker image, the storage, and the storage policy.

image: The Docker image used to run the activities.
storageClassName: The Kubernetes StorageClass used to provision the storage (e.g., hostpath, nfs).
storageSize: The size of the allocated storage (e.g., 10Gi).
storagePolicy: Defines the storage behavior through its type field, such as distributed or standalone.
mountPath: The path where the storage will be mounted in the containers.

Example:

spec:
  image: "example-docker-image:latest"
  storageClassName: "hostpath"
  storageSize: "10Gi"
  storagePolicy:
    type: "distributed"
  mountPath: "/data-path"
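Sizes such as storageSize (and, judging by the examples, the memoryLimit of activities) follow the Kubernetes quantity notation, where binary suffixes (Ki, Mi, Gi, and so on) are powers of 1024 and decimal suffixes (k, M, G, and so on) are powers of 1000. The sketch below converts such a string to bytes, which can help when comparing a requested size with available capacity. The function quantity_to_bytes is a hypothetical helper, not an AkôFlow API, and it deliberately ignores less common forms such as exponent notation and the milli suffix m.

```python
import re
from decimal import Decimal

# Multipliers for the Kubernetes quantity suffixes handled by this sketch.
_MULTIPLIERS = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
# A whole or decimal number followed by an optional suffix.
_QUANTITY = re.compile(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def quantity_to_bytes(quantity: str) -> int:
    """Convert a size string such as "10Gi" or "500M" into bytes (fractions truncated)."""
    match = _QUANTITY.fullmatch(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    # Decimal avoids floating-point rounding for values like "1.5Gi".
    return int(Decimal(number) * _MULTIPLIERS[suffix or ""])


print(quantity_to_bytes("10Gi"))  # 10737418240
print(quantity_to_bytes("500M"))  # 500000000
```

Note the difference between the two families: "10Gi" is 10 × 1024³ bytes, whereas "10G" would be 10 × 1000³ bytes.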

3. List of Activities

Activities represent the individual steps of the workflow and are listed under spec.activities. Each activity is configured with commands, resource limits, and dependencies.

Main Fields:

name: The name of the activity, unique within the workflow.
run: The commands to be executed in the activity's container.
memoryLimit: The memory allocated to the activity (e.g., 1Gi).
cpuLimit: The number of CPUs allocated to the activity.
dependsOn: A list of activity names that must complete before this activity starts.

Example:

activities:
  - name: step-1
    run: |
      echo "Executing step 1"
      some-command --arg1 value1
    memoryLimit: "1Gi"
    cpuLimit: 1

  - name: step-2
    run: |
      echo "Executing step 2"
      another-command --arg2 value2
    memoryLimit: "2Gi"
    cpuLimit: 2
    dependsOn:
      - "step-1"
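For a workflow to be executable, its dependsOn relations must form a directed acyclic graph, and a valid execution order can then be derived with a topological sort. The sketch below applies Kahn's algorithm to activities represented as Python dictionaries, the shape a YAML parser would produce, and rejects unknown dependency names and cycles. The function execution_order is hypothetical; it only illustrates the ordering constraint and is not AkôFlow's scheduler, which may run independent activities in parallel.

```python
from collections import deque


def execution_order(activities: list[dict]) -> list[str]:
    """Return activity names in an order that satisfies every dependsOn entry.

    Raises ValueError for a dependency on an unknown activity or for a cycle.
    Activity names are assumed to be unique.
    """
    names = [activity["name"] for activity in activities]
    known = set(names)
    # Number of prerequisites each activity is still waiting for.
    pending = {name: 0 for name in names}
    # Reverse edges: prerequisite -> activities that wait for it.
    dependents: dict[str, list[str]] = {name: [] for name in names}
    for activity in activities:
        for dep in activity.get("dependsOn", []):
            if dep not in known:
                raise ValueError(f"{activity['name']!r} depends on unknown activity {dep!r}")
            pending[activity["name"]] += 1
            dependents[dep].append(activity["name"])

    # Kahn's algorithm: repeatedly take an activity with no pending prerequisites.
    ready = deque(name for name in names if pending[name] == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for follower in dependents[current]:
            pending[follower] -= 1
            if pending[follower] == 0:
                ready.append(follower)

    # Any activity never reaching zero pending prerequisites is stuck behind a cycle.
    if len(order) != len(names):
        raise ValueError("dependsOn relations contain a cycle")
    return order


activities = [
    {"name": "step-1"},
    {"name": "step-2", "dependsOn": ["step-1"]},
]
print(execution_order(activities))  # ['step-1', 'step-2']
```

The order in which activities appear in the file does not matter: an activity listed before its dependency is still placed after it.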

Full Example

Below is a complete YAML example for a workflow in AkôFlow:

name: example-workflow
spec:
  image: "example-docker-image:latest"
  namespace: "example-namespace"
  storageClassName: "hostpath"
  storageSize: "10Gi"
  storagePolicy:
    type: "distributed"
  mountPath: "/data-path"
  activities:
    - name: step-1
      run: |
        echo "Step 1 is running"
        some-command --arg value
      memoryLimit: "1Gi"
      cpuLimit: 1

    - name: step-2
      run: |
        echo "Step 2 is running"
        another-command --arg value
      memoryLimit: "2Gi"
      cpuLimit: 2
      dependsOn:
        - "step-1"

    - name: step-3
      run: |
        echo "Step 3 is running"
        final-command --arg value
      memoryLimit: "4Gi"
      cpuLimit: 4
      dependsOn:
        - "step-2"
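Before submitting a file like the one above, it can be useful to catch structural mistakes locally. The following sketch works on the full example as a Python dictionary (the structure a YAML loader would return) and reports a missing workflow name, missing spec fields, duplicate activity names, and dependsOn entries that point to activities that do not exist. The function validate_workflow is hypothetical, and the choice of mandatory spec fields (image, namespace, activities) is an assumption of this sketch, not AkôFlow's documented schema.

```python
# The full example above, as a YAML loader would return it.
workflow = {
    "name": "example-workflow",
    "spec": {
        "image": "example-docker-image:latest",
        "namespace": "example-namespace",
        "storageClassName": "hostpath",
        "storageSize": "10Gi",
        "storagePolicy": {"type": "distributed"},
        "mountPath": "/data-path",
        "activities": [
            {"name": "step-1", "run": 'echo "Step 1 is running"\nsome-command --arg value\n',
             "memoryLimit": "1Gi", "cpuLimit": 1},
            {"name": "step-2", "run": 'echo "Step 2 is running"\nanother-command --arg value\n',
             "memoryLimit": "2Gi", "cpuLimit": 2, "dependsOn": ["step-1"]},
            {"name": "step-3", "run": 'echo "Step 3 is running"\nfinal-command --arg value\n',
             "memoryLimit": "4Gi", "cpuLimit": 4, "dependsOn": ["step-2"]},
        ],
    },
}

# Assumption for this sketch: these spec fields must always be present.
REQUIRED_SPEC_FIELDS = ("image", "namespace", "activities")


def validate_workflow(wf: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    if not wf.get("name"):
        problems.append("missing workflow name")
    spec = wf.get("spec", {})
    for field in REQUIRED_SPEC_FIELDS:
        if field not in spec:
            problems.append(f"spec.{field} is missing")

    activities = spec.get("activities", [])
    names = set()
    for activity in activities:
        name = activity.get("name")
        if name in names:
            problems.append(f"duplicate activity name {name!r}")
        names.add(name)

    # Every dependsOn entry must refer to an activity defined in the same workflow.
    for activity in activities:
        for dep in activity.get("dependsOn", []):
            if dep not in names:
                problems.append(f"activity {activity.get('name')!r} depends on unknown {dep!r}")
    return problems


print(validate_workflow(workflow))  # []
```

Collecting every problem into a list, instead of stopping at the first one, lets a single run report all mistakes in the file.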

Workflow Execution Flow

Initial Step (step-1):
- Has no dependencies, so it runs first.
- Requires 1Gi of memory and 1 CPU.
Intermediate Step (step-2):
- Executes after the completion of step-1.
- Requires 2Gi of memory and 2 CPUs.
Final Step (step-3):
- Executes after the completion of step-2.
- Requires 4Gi of memory and 4 CPUs.
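Under the simplifying assumption that each activity holds its memory and CPUs only while it runs, the linear chain above never needs more than its largest single step (4Gi and 4 CPUs) at once, rather than the sum of all three steps (7Gi and 7 CPUs). The sketch below groups activities into dependency levels and estimates peak demand as the largest per-level total. The helpers gib, execution_levels, and peak_demand are hypothetical, memory values are assumed to use the Gi suffix, and the estimate is exact only when each level finishes before the next begins, as in this chain; otherwise activities from different levels can overlap and the real peak may be higher.

```python
def gib(quantity: str) -> float:
    """Parse a memory value such as "4Gi" into GiB (assumes the Gi suffix)."""
    if not quantity.endswith("Gi"):
        raise ValueError(f"expected a Gi quantity, got {quantity!r}")
    return float(quantity[:-2])


def execution_levels(activities: list[dict]) -> list[list[str]]:
    """Group activities by dependency depth: level 0 has no dependencies, and every
    other activity sits one level after its deepest dependency."""
    by_name = {activity["name"]: activity for activity in activities}
    level: dict[str, int] = {}

    def depth(name: str, visiting: tuple = ()) -> int:
        if name in level:
            return level[name]
        if name in visiting:
            raise ValueError(f"dependency cycle through {name!r}")
        deps = by_name[name].get("dependsOn", [])
        level[name] = 1 + max((depth(dep, visiting + (name,)) for dep in deps), default=-1)
        return level[name]

    grouped: dict[int, list[str]] = {}
    for name in by_name:
        grouped.setdefault(depth(name), []).append(name)
    return [grouped[index] for index in sorted(grouped)]


def peak_demand(activities: list[dict]) -> tuple[float, float]:
    """Largest per-level (CPU, memory in GiB) total, assuming activities in the same
    level run together and levels run one after another."""
    by_name = {activity["name"]: activity for activity in activities}
    peak_cpu, peak_mem = 0.0, 0.0
    for names in execution_levels(activities):
        peak_cpu = max(peak_cpu, sum(by_name[n]["cpuLimit"] for n in names))
        peak_mem = max(peak_mem, sum(gib(by_name[n]["memoryLimit"]) for n in names))
    return peak_cpu, peak_mem


activities = [
    {"name": "step-1", "memoryLimit": "1Gi", "cpuLimit": 1},
    {"name": "step-2", "memoryLimit": "2Gi", "cpuLimit": 2, "dependsOn": ["step-1"]},
    {"name": "step-3", "memoryLimit": "4Gi", "cpuLimit": 4, "dependsOn": ["step-2"]},
]
print(execution_levels(activities))  # [['step-1'], ['step-2'], ['step-3']]
print(peak_demand(activities))       # (4, 4.0)
```

For a branching workflow, activities that share a level (for example, two steps that both depend only on step-1) are summed, since they could run at the same time.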

Benefits of the YAML Structure

Readability: The format is easy to understand and edit.
Flexibility: Allows for multiple activities with different resources and dependencies.
Reusability: YAML-defined workflows can be easily adapted to other projects.
Portability: The same specification can run on different Kubernetes clusters, provided the referenced Docker image and storage class are available in each of them.

| Field | Description | Example |
|---|---|---|
| `name` | Unique name of the workflow for identification. | `akf-wf-gawa-distributed` |
| `spec` | Defines the general specifications of the workflow. | |
| `image` | Name of the Docker image used for the activities. | `"ovvesley/akoflow-wf-gawa:latest"` |
| `namespace` | Kubernetes namespace where the workflow will be executed. | `akoflow` |
| `storageClassName` | Type of storage configured in the Kubernetes environment. | `hostpath, premium, standard-rw...` |
| `storageSize` | Size of the storage allocated for the workflow. | `10Gi` |
| `storagePolicy.type` | Type of storage policy, such as distributed or local. | `distributed` |
| `mountPath` | Path where the storage will be mounted in the containers of the activities. | `/data-akoflow` |
| `activities` | List of activities that compose the workflow. | |
| `activities.name` | Unique name of the activity within the workflow. | `gawastep01` |
| `activities.run` | Commands to be executed in the activity. | `Perform setup, process data, execute computations.` |
| `activities.memoryLimit` | Amount of memory allocated to the activity. | `1Gi` |
| `activities.cpuLimit` | Number of CPUs allocated to the activity. | `1` |
| `activities.dependsOn` | Specifies dependencies between activities, indicating that this activity will only run after the listed ones are completed. | `["gawastep01"]` |

name and namespace: Together, they uniquely identify the workflow within the Kubernetes environment.
spec: Centralizes general workflow definitions, such as the Docker image and storage configuration.
activities: Each activity is individually described, including its commands (run), resource limits, and dependencies.
dependsOn: Defines the execution sequence of activities, enabling workflows with interdependent steps.