Architecture Overview#
What is bfabric-app-runner?#
bfabric-app-runner is a framework for executing computational applications that integrate with B-Fabric, a laboratory information management system. It manages the full lifecycle of running an application: fetching input data, executing processing logic, and registering outputs back into B-Fabric.
Applications are defined declaratively through YAML specification files that describe which commands to run, what inputs to fetch, and how to register outputs. The runner handles orchestration, so application developers can focus on their processing logic.
Workflow Phases#
An application execution proceeds through four phases:
Dispatch – The dispatch command receives a workunit reference and a work directory. It creates one or more “chunk” subdirectories, each containing an
inputs.ymlfile that declares the inputs needed for that chunk. Achunks.ymlfile in the work directory lists all chunk paths. Ifchunks.ymlis not created by the dispatch command, the runner auto-discovers chunks by scanning for subdirectories containinginputs.yml.Inputs – For each chunk, the runner reads the chunk’s
inputs.ymland downloads or prepares the declared input files into the chunk directory. Inputs can come from B-Fabric (resources, datasets, annotations, order FASTAs), from remote files (SSH, local), or be generated from static content.Process – The process command is called with the chunk directory as its argument. This is where the actual computational work happens. The command can be a direct executable, a Docker container, or a managed Python environment.
Outputs / Collect – After processing, an optional collect command generates an
outputs.ymlfile describing results to register. If no collect command is defined, the process command is expected to produceoutputs.ymldirectly. The runner then registers outputs back into B-Fabric (copying resources to storage, saving datasets, creating links).
Key Components#
Runner (bfabric_app_runner.app_runner.runner.Runner) orchestrates all four phases. It holds
a reference to an AppVersion (which defines the commands) and a Bfabric client instance.
AppSpec and AppVersion define the application configuration. An AppSpec contains a
BfabricAppSpec (runner version, optional workflow template) and a list of AppVersion entries.
Each AppVersion specifies a version string and a CommandsSpec with dispatch, process, and
optional collect commands. App specs support Mako templating with variables like ${app.id},
${app.name}, and ${app.version}.
Command executors run the actual commands. Four command types are available:
CommandShell– runs a command string split by spaces (deprecated in favor ofexec)CommandExec– runs a command split byshlex.split, with optional environment variables and PATH prependingCommandDocker– runs a command inside a Docker or Podman container, with configurable mounts, environment, networking, and custom argumentsCommandPythonEnv– provisions a Python virtual environment from a pylock file and runs a command inside it, with optional extra local dependencies and caching
Resolver resolves input specifications into concrete file references. Each input type has a
dedicated resolver. The result is a ResolvedInputs object containing ResolvedFile,
ResolvedStaticFile, and ResolvedDirectory entries that can be prepared (downloaded/written) or
cleaned.
Output registration reads outputs.yml and registers results into B-Fabric. Three output types
are supported: CopyResourceSpec (SCP a file to storage), SaveDatasetSpec (register a CSV/TSV
as a dataset), and SaveLinkSpec (attach a URL link to an entity).
How Chunks Work#
The chunking mechanism allows applications to split work into independent units. During dispatch,
the application creates subdirectories under the work directory – one per chunk. Each chunk
directory contains its own inputs.yml (and after processing, outputs.yml).
The ChunksFile model tracks which chunks exist. It can be written explicitly by the dispatch
command as chunks.yml, or the runner auto-discovers chunks by scanning for subdirectories
that contain an inputs.yml file.
The runner processes chunks sequentially: for each chunk, it prepares inputs, runs the process command, runs the optional collect command, and registers outputs.
Action Types#
The CLI exposes the workflow phases as individual actions:
ActionDispatch– runs only the dispatch phaseActionInputs– prepares inputs for all chunksActionProcess– runs the process command for all chunksActionOutputs– runs collect and registers outputs for all chunksActionRun– runs all phases end-to-end
These actions allow developers to test individual phases during development.
Dispatch Strategies#
The dispatch module provides reusable dispatch strategies for common patterns:
DispatchIndividualResources – creates one chunk per input resource, useful when each resource should be processed independently
DispatchSingleResourceFlow – creates a single chunk for all resources, useful when resources are processed together
DispatchSingleDatasetFlow – creates a single chunk driven by a dataset, useful when a dataset defines the processing parameters
These strategies handle the creation of chunk directories, writing inputs.yml files, and
producing the chunks.yml manifest.
App model#
App runner activity diagram#