Akoflow – A Tool for Executing Scientific Workflows in Kubernetes
Akoflow is a command-line tool for executing scientific workflows in Kubernetes. It leverages the Kubernetes API to create and manage resources such as pods and jobs, enabling distributed and parallel execution of scientific workflows on a Kubernetes cluster.
Installation Guide #
GKE (Google Kubernetes Engine) #
To install Akoflow on a GKE cluster, you’ll need a Kubernetes cluster and admin permissions on it.
With these prerequisites, you can install Akoflow by running the following command:
kubectl apply -f https://raw.githubusercontent.com/ovvesley/akoflow/main/pkg/server/resource/akoflow-gcloud.yaml
This command will install the Akoflow server on your Kubernetes cluster, as well as the Kubernetes metrics-server.
That’s all you need to install Akoflow on your GKE cluster.
The Akoflow installation on GKE is done through a YAML configuration file, which defines the necessary resources for Akoflow to run in the cluster. This file will:
- Create a namespace called akoflow.
- Create a LoadBalancer service for the Akoflow server.
- Deploy the Akoflow server.
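For reference, the manifest defines resources along these lines. This is a simplified sketch: the resource names match the output shown below, but the labels and container image are illustrative assumptions, so treat the file at the URL above as authoritative.
apiVersion: v1
kind: Namespace
metadata:
  name: akoflow
---
apiVersion: v1
kind: Service
metadata:
  name: akoflow-server-service
  namespace: akoflow
spec:
  type: LoadBalancer
  selector:
    app: akoflow-server            # assumed label; the real manifest may differ
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: akoflow-server
  namespace: akoflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: akoflow-server
  template:
    metadata:
      labels:
        app: akoflow-server
    spec:
      containers:
        - name: akoflow-server
          image: ovvesley/akoflow-server:latest   # illustrative image name, not confirmed
          ports:
            - containerPort: 8080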
To check the host and port of the Akoflow server, run the following command:
kubectl get svc -n akoflow
Expected output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
akoflow-server-service LoadBalancer <PRIVATE_IP> <YOUR_HOST> 8080:32191/TCP 5h38m
This command will return the host and port of the Akoflow server. You can access the Akoflow server using the Akoflow client at the specified host and port. The default port is 8080, and the host is the external IP provided by the cloud provider.
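If you prefer to capture the external IP programmatically, a jsonpath query works (this assumes the service name shown above; on some providers the address is reported under hostname instead of ip):
kubectl get svc akoflow-server-service -n akoflow -o jsonpath='{.status.loadBalancer.ingress[0].ip}'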
After installation, you can install the Akoflow client on your local machine. To do this, download the client binary from here.
To execute a simple workflow with Akoflow, run:
akoflow --host <YOUR_HOST> --port <YOUR_PORT> --file <PATH_TO_YAML_FILE.yaml>
Usage Guide #
Workflows accepted by Akoflow are defined in YAML files. Each YAML file defines a workflow, which is a sequence of tasks to be executed in a Kubernetes cluster. Each task includes a name, a Docker image, and a command to run. Akoflow executes tasks in parallel, creating a pod for each one. It also supports defining dependencies between tasks, ensuring that a task is executed only after another task is completed.
Supported Workflow Types #
- Sequential Workflows
  - Multiple tasks are executed one after the other, in sequence.
  - Data dependencies between tasks are sequential.
  - A single persistent disk is created and shared across tasks; after one task completes, the disk is mounted to the next task. Example: sequential-workflow
- Parallel Workflows (coming soon)
- Pegasus Workflow (coming soon)
Workflow File #
The workflow file is a YAML document that defines a workflow. It consists of a list of tasks, with each task specifying a name, a Docker image, and a command to execute. You can also define task dependencies in the file to control execution order.
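As a rough illustration, a sequential two-task workflow might look like the sketch below. The field names (name, spec, tasks, dependencies, and so on) and the /data mount path are assumptions made for illustration only; consult the usage guide and the sequential-workflow example in the repository for the exact schema.
# Hypothetical sketch of an Akoflow workflow file.
# Field names are illustrative; see the usage guide for the real schema.
name: hello-workflow
spec:
  storageSize: 1Gi                 # assumed: size of the shared persistent disk
  tasks:
    - name: generate-data
      image: python:3.12-slim
      command: "python -c \"open('/data/out.txt', 'w').write('hello')\""   # /data is the assumed mount point
    - name: consume-data
      image: alpine:3.20
      command: "cat /data/out.txt"
      dependencies:
        - generate-data            # assumed: runs only after generate-data completes
A file like this would be submitted with the client as shown in the installation guide, for example: akoflow --host <YOUR_HOST> --port <YOUR_PORT> --file workflow.yaml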
For more details, refer to the usage guide.
How It Works #
Akoflow uses the Kubernetes API to create and manage execution resources such as pods, jobs, and persistent volumes. It facilitates the distributed and parallel execution of scientific workflows on a Kubernetes cluster.
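For example, while a workflow is running you can watch the pods and persistent volume claims being created and completed (this assumes the task resources are created in the akoflow namespace; adjust the namespace if your setup differs):
kubectl get pods,pvc -n akoflow --watch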
For additional information, see the how it works section.