Workflows in AkôFlow
In AkôFlow, a workflow is an abstraction used to model complex processes, often represented as a directed acyclic graph (DAG). This graph consists of two main elements:
- Activities (nodes): Represent individual programs or tasks that need to be executed.
- Data dependencies (edges): Indicate the flow of information between activities, defining which data is produced and consumed throughout the process.
Workflows in AkôFlow enable the execution of scientific processes in distributed and containerized environments, optimizing the use of computational resources such as CPU, memory, and disk. This structure facilitates the automation of complex tasks, reducing execution time and promoting portability across different infrastructures like clouds, clusters, and local machines.
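To make the DAG abstraction concrete, here is a minimal Python sketch (not AkôFlow's actual workflow format; the `Workflow` class and activity names are hypothetical) that stores activities as nodes, data dependencies as edges, and checks that the graph is acyclic:

```python
from collections import defaultdict

class Workflow:
    """Minimal DAG sketch: activities are nodes, data dependencies are edges."""

    def __init__(self):
        self.activities = set()        # nodes: individual programs/tasks
        self.edges = defaultdict(set)  # producer -> set of consumers

    def add_activity(self, name):
        self.activities.add(name)

    def add_dependency(self, producer, consumer):
        """Data produced by `producer` is consumed by `consumer`."""
        self.activities.update({producer, consumer})
        self.edges[producer].add(consumer)

    def is_acyclic(self):
        """A workflow must be a DAG: detect cycles with a depth-first search."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {a: WHITE for a in self.activities}

        def visit(node):
            color[node] = GRAY
            for succ in self.edges[node]:
                if color[succ] == GRAY:   # back edge => cycle
                    return False
                if color[succ] == WHITE and not visit(succ):
                    return False
            color[node] = BLACK
            return True

        return all(visit(a) for a in self.activities if color[a] == WHITE)

# Hypothetical three-step pipeline
wf = Workflow()
wf.add_dependency("ingest_images", "calibrate")
wf.add_dependency("calibrate", "stack")
assert wf.is_acyclic()
```

Representing edges as producer → consumers makes it easy to derive which activities become runnable once a given output is available.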
Activities in AkôFlow
An activity in AkôFlow corresponds to a specific step within a workflow. Each activity is associated with a program or operation that needs to be executed with a defined set of input data and parameters. The key characteristics of activities include:
Execution in Containers:
- Each activity is mapped to one or more containers in the Kubernetes environment.
- Containers are configured with specific resources (CPU, memory, disk) as defined by the user; a sketch of this mapping follows below.
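As an illustration of this mapping (AkôFlow's own implementation may differ), the sketch below uses the official Kubernetes Python client to turn a user's per-activity resource configuration into a container spec with matching requests and limits; the activity name, image, and resource values are hypothetical:

```python
from kubernetes import client

def container_for_activity(name: str, image: str,
                           cpu: str, memory: str, disk: str) -> client.V1Container:
    """Build a container spec whose requests/limits mirror the activity's configuration."""
    resources = client.V1ResourceRequirements(
        requests={"cpu": cpu, "memory": memory, "ephemeral-storage": disk},
        limits={"cpu": cpu, "memory": memory, "ephemeral-storage": disk},
    )
    return client.V1Container(name=name, image=image, resources=resources)

# Hypothetical activity: one vCPU, 2 GiB of RAM, 5 GiB of local disk
spec = container_for_activity("calibrate", "example/astro-tools:latest",
                              cpu="1", memory="2Gi", disk="5Gi")
```

Setting requests equal to limits keeps the scheduler's view of the activity's footprint aligned with what the user configured.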
Data Dependencies:
- Activities consume data produced by predecessor activities and generate data for successor activities, creating an ordered execution flow.
- For example, an activity that processes astronomical images may produce outputs that another activity then consumes for image adjustments (see the ordering sketch below).
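As a self-contained illustration of this producer/consumer ordering (the activity names are hypothetical), Python's standard `graphlib` can derive a valid execution order directly from the data dependencies:

```python
from graphlib import TopologicalSorter

# Edges expressed as consumer -> {producers}
dependencies = {
    "calibrate": {"ingest_images"},  # image adjustments need the raw images
    "stack": {"calibrate"},          # stacking needs the calibrated images
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['ingest_images', 'calibrate', 'stack']
```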
Resource Configuration:
- AkôFlow allows users to define specific resources for each activity, ensuring optimized use of vCPUs, memory, and local or shared storage.
Execution Models:
- Activities can be executed under different scheduling models, such as First-Data-First (FDF) and First-Activity-First (FAF); the two are contrasted in the sketch after this list:
  - FDF: an activity is executed as soon as the data it requires has been produced.
  - FAF: all activities associated with a given workflow stage must be completed before the next stage begins.
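The sketch below contrasts the two models on a small hypothetical dependency graph; it is a simplified illustration of the scheduling idea, not AkôFlow's scheduler:

```python
from graphlib import TopologicalSorter

# consumer -> {producers}; hypothetical activity names
deps = {
    "calibrate_a": {"ingest"},
    "calibrate_b": {"ingest"},
    "adjust_a": {"calibrate_a"},  # only needs calibrate_a's output
}

def fdf_order(deps):
    """First-Data-First: dispatch each activity as soon as its input data exists."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    order = []
    queue = list(ts.get_ready())
    while queue:
        activity = queue.pop(0)
        order.append(activity)        # in a real engine this would start right away
        ts.done(activity)
        queue.extend(ts.get_ready())  # anything unlocked by this single completion
    return order

def faf_stages(deps):
    """First-Activity-First: a stage must finish entirely before the next one starts."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    stages = []
    while ts.is_active():
        stage = list(ts.get_ready())  # everything ready at this barrier
        stages.append(stage)
        for activity in stage:
            ts.done(activity)
    return stages

print(fdf_order(deps))   # one valid order, e.g. ['ingest', 'calibrate_a', 'calibrate_b', 'adjust_a']
print(faf_stages(deps))  # e.g. [['ingest'], ['calibrate_a', 'calibrate_b'], ['adjust_a']]
```

The difference shows up at `adjust_a`: under FDF it becomes dispatchable as soon as `calibrate_a` finishes, while under FAF it waits for the whole calibration stage to complete.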
Provenance:
- During execution, AkôFlow captures detailed information about the performance of each activity, including resource usage metrics (CPU, memory) and execution logs. This data is stored in the AkôFlow provenance model for later analysis; an illustrative record structure is sketched below.
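The kind of per-activity record this captures can be sketched as a simple data structure; the field names and values below are illustrative and do not come from AkôFlow's actual provenance schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ActivityProvenance:
    """Illustrative per-activity provenance record (field names are hypothetical)."""
    activity: str
    container: str
    started_at: datetime
    finished_at: datetime
    cpu_millicores_used: float  # CPU usage captured during execution
    memory_mib_used: float      # peak memory usage
    exit_code: int
    log_lines: List[str] = field(default_factory=list)

    @property
    def duration_seconds(self) -> float:
        return (self.finished_at - self.started_at).total_seconds()

# Hypothetical record for one activity execution
record = ActivityProvenance(
    activity="calibrate",
    container="calibrate-pod-0",
    started_at=datetime(2024, 1, 1, 12, 0, 0),
    finished_at=datetime(2024, 1, 1, 12, 5, 30),
    cpu_millicores_used=850.0,
    memory_mib_used=1024.0,
    exit_code=0,
    log_lines=["calibration finished"],
)
print(record.duration_seconds)  # 330.0
```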
By combining these features, AkôFlow provides a flexible and efficient way to manage scientific workflows, enabling researchers to configure and monitor their activities in an intuitive and optimized manner.