YAML is the format used in AkôFlow to define workflows and their activities. This flexible and human-readable structure allows users to configure all stages, dependencies, and resource requirements for the efficient execution of scientific workflows. Below, we provide a detailed explanation of each YAML component, accompanied by generic examples to illustrate its usage.
General Structure of YAML
In AkôFlow, a workflow YAML file consists of three parts: general metadata, environment configuration, and the list of activities. Each part plays a role in defining and executing the workflow.
1. General Metadata
Metadata provides basic information about the workflow, such as its name and the namespace in Kubernetes where it will be executed.
- name: A unique identifier for the workflow.
- namespace: Specifies the Kubernetes namespace where the workflow will be managed.
Example:
name: example-workflow
spec:
  namespace: example-namespace
2. Environment Configuration
Defines the technical aspects required for the workflow’s execution, such as the Docker image, storage, and resource policies.
- image: The name of the Docker image used for the activities.
- storageClassName: Specifies the type of storage (e.g., hostpath, nfs).
- storageSize: The size of the allocated storage (e.g., 10Gi).
- storagePolicy: Defines the storage behavior, such as distributed or standalone.
- mountPath: The path where the storage will be mounted in the containers.
Example:
spec:
  image: "example-docker-image:latest"
  storageClassName: "hostpath"
  storageSize: "10Gi"
  storagePolicy:
    type: "distributed"
  mountPath: "/data-path"
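Values such as 10Gi follow the Kubernetes quantity notation, in which Ki, Mi, Gi, and Ti are powers of 1024. The sketch below converts such a value to bytes. It assumes that AkôFlow passes storageSize to Kubernetes unchanged, which this page does not state, and quantity_to_bytes is a hypothetical helper rather than part of AkôFlow.

```python
# Binary suffixes used by Kubernetes resource quantities (powers of 1024).
# Decimal suffixes such as "G" or "M" are deliberately not handled here.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity):
    """Convert a quantity such as '10Gi' to a number of bytes (hypothetical helper)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is already a byte count


print(quantity_to_bytes("10Gi"))  # 10737418240
```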
3. List of Activities
Activities represent the individual steps of the workflow. Each activity is configured with commands, resource limits, and dependencies.
Main Fields:
- name: The name of the activity.
- run: The commands to be executed in the activity.
- memoryLimit: The memory allocated to the activity (e.g., 1Gi).
- cpuLimit: The number of CPUs allocated to the activity.
- dependsOn: A list of activities that must be completed before this one.
Example:
activities:
  - name: step-1
    run: |
      echo "Executing step 1"
      some-command --arg1 value1
    memoryLimit: "1Gi"
    cpuLimit: 1
  - name: step-2
    run: |
      echo "Executing step 2"
      another-command --arg2 value2
    memoryLimit: "2Gi"
    cpuLimit: 2
    dependsOn:
      - "step-1"
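Because dependsOn refers to other activities by name, a misspelled name is an easy mistake to make. The following sketch checks that every dependency names an activity defined in the same list. It works on a plain Python list that mirrors the YAML above. find_unknown_dependencies is a hypothetical helper, not an AkôFlow function, and this page does not say whether AkôFlow performs such a check itself.

```python
def find_unknown_dependencies(activities):
    """Return {activity name: [dependsOn entries that match no activity]}."""
    known = {activity["name"] for activity in activities}
    unknown = {}
    for activity in activities:
        # dependsOn is optional; an activity without it has no prerequisites.
        missing = [dep for dep in activity.get("dependsOn", []) if dep not in known]
        if missing:
            unknown[activity["name"]] = missing
    return unknown


# Mirrors the two activities from the example above.
activities = [
    {"name": "step-1", "memoryLimit": "1Gi", "cpuLimit": 1},
    {"name": "step-2", "memoryLimit": "2Gi", "cpuLimit": 2, "dependsOn": ["step-1"]},
]

print(find_unknown_dependencies(activities))  # {}
```

An empty result means every dependency resolves to a defined activity.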
Full Example
Below is a complete YAML example for a workflow in AkôFlow:
name: example-workflow
spec:
  image: "example-docker-image:latest"
  namespace: "example-namespace"
  storageClassName: "hostpath"
  storageSize: "10Gi"
  storagePolicy:
    type: "distributed"
  mountPath: "/data-path"
  activities:
    - name: step-1
      run: |
        echo "Step 1 is running"
        some-command --arg value
      memoryLimit: "1Gi"
      cpuLimit: 1
    - name: step-2
      run: |
        echo "Step 2 is running"
        another-command --arg value
      memoryLimit: "2Gi"
      cpuLimit: 2
      dependsOn:
        - "step-1"
    - name: step-3
      run: |
        echo "Step 3 is running"
        final-command --arg value
      memoryLimit: "4Gi"
      cpuLimit: 4
      dependsOn:
        - "step-2"
Workflow Execution Flow
- Initial Step (step-1):
  - Executes initial commands without dependencies.
  - Requires 1Gi of memory and 1 CPU.
- Intermediate Step (step-2):
  - Executes after the completion of step-1.
  - Requires 2Gi of memory and 2 CPUs.
- Final Step (step-3):
  - Executes after the completion of step-2.
  - Requires 4Gi of memory and 4 CPUs.
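The ordering described here can be derived mechanically from the dependsOn fields. The sketch below uses Python's standard graphlib module (Python 3.9 or later) to compute an order in which each step appears only after the steps it depends on. It illustrates the idea and is not a description of how AkôFlow schedules activities internally. The depends_on mapping and the execution_order function are names chosen for this example.

```python
from graphlib import TopologicalSorter

# Activity name -> activities it depends on, mirroring the full example.
depends_on = {
    "step-1": [],
    "step-2": ["step-1"],
    "step-3": ["step-2"],
}


def execution_order(depends_on):
    """Return the activity names with every prerequisite before its dependents."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(depends_on).static_order())


print(execution_order(depends_on))  # ['step-1', 'step-2', 'step-3']
```

For a linear chain such as this one there is only one valid order. Workflows whose activities do not depend on each other can have several valid orders.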
Benefits of the YAML Structure
- Readability: The format is easy to understand and edit.
- Flexibility: Allows for multiple activities with different resources and dependencies.
- Reusability: YAML-defined workflows can be easily adapted to other projects.
- Portability: The same YAML specification can be used in different Kubernetes environments, provided the referenced namespace and storage class are available there.
Field | Description | Example |
---|---|---|
name | Unique name of the workflow for identification. | akf-wf-gawa-distributed |
spec | Defines the general specifications of the workflow. | |
image | Name of the Docker image used for the activities. | "ovvesley/akoflow-wf-gawa:latest" |
namespace | Kubernetes namespace where the workflow will be executed. | akoflow |
storageClassName | Type of storage configured in the Kubernetes environment. | hostpath, premium, standard-rw... |
storageSize | Size of the storage allocated for the workflow. | 10Gi |
storagePolicy.type | Type of storage policy, such as distributed or local. | distributed |
mountPath | Path where the storage will be mounted in the containers of the activities. | /data-akoflow |
activities | List of activities that compose the workflow. | |
activities.name | Unique name of the activity within the workflow. | gawastep01 |
activities.run | Commands to be executed in the activity. | Perform setup, process data, execute computations. |
activities.memoryLimit | Amount of memory allocated to the activity. | 1Gi |
activities.cpuLimit | Number of CPUs allocated to the activity. | 1 |
activities.dependsOn | Specifies dependencies between activities, indicating that this activity will only run after the listed ones are completed. | ["gawastep01"] |
- name and namespace: Ensure a unique identification of the workflow within the Kubernetes environment.
- spec: Centralizes general workflow definitions, such as the Docker image and storage configuration.
- activities: Each activity is individually described, including its commands (run), resource limits, and dependencies.
- dependsOn: Defines the execution sequence of activities, enabling workflows with interdependent steps.