Experimental: Workunit Definitions#
This guide covers the experimental workunit definition system for defining and managing B-Fabric workunits using YAML files.
Warning
Experimental features may change or be removed in future versions. Use at your own risk.
Overview#
Workunit definitions provide a structured way to define workunit execution details and registration information. This is particularly useful for:
Developing and testing B-Fabric applications
Defining workflow parameters in a declarative way
Persisting workunit configurations in version-controlled YAML files
Separating workunit logic from execution code
API Reference#
WorkunitExecutionDefinition#
- class bfabric.experimental.workunit_definition.WorkunitExecutionDefinition(*, raw_parameters: dict[str, str | None], dataset: int | None = None, resources: list[int] = [])#
Bases: BaseModel
Defines the execution details of a workunit, i.e. the inputs necessary to compute the results, but not the final details on how to register the results in B-Fabric.
- dataset: int | None#
Input dataset (for dataset-flow applications)
- either_dataset_or_resources() WorkunitExecutionDefinition#
Validates that either dataset or resources are provided.
- classmethod from_workunit(workunit: Workunit) WorkunitExecutionDefinition#
Loads the workunit execution definition from the provided B-Fabric workunit.
- model_config = {}#
Configuration for the model, should be a dictionary conforming to Pydantic's ConfigDict.
- mutually_exclusive_dataset_resources() WorkunitExecutionDefinition#
Validates that dataset and resources are mutually exclusive.
- raw_parameters: dict[str, str | None]#
The parameters passed to the workunit, in their raw form, i.e. everything is a string or None.
- resources: list[int]#
Input resources (for resource-flow applications)
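The two validators above enforce that exactly one input source is set. A minimal sketch of that rule in plain Python (not the actual pydantic implementation, just an illustration of the constraint):

```python
def check_inputs(dataset, resources):
    """Hypothetical sketch of the either/mutually-exclusive validators:
    exactly one of dataset or resources must be provided."""
    if dataset is None and not resources:
        raise ValueError("Either dataset or resources must be provided.")
    if dataset is not None and resources:
        raise ValueError("dataset and resources are mutually exclusive.")

# A dataset-flow workunit passes:
check_inputs(dataset=12345, resources=[])
# A resource-flow workunit passes:
check_inputs(dataset=None, resources=[101, 102])
```

Passing both (or neither) raises a validation error, which in the real class surfaces as a pydantic `ValidationError` at construction time.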
WorkunitRegistrationDefinition#
- class bfabric.experimental.workunit_definition.WorkunitRegistrationDefinition(*, application_id: int, application_name: Annotated[str, BeforeValidator(func=path_safe_name, json_schema_input_type=PydanticUndefined)], workunit_id: int, workunit_name: Annotated[str, BeforeValidator(func=path_safe_name, json_schema_input_type=PydanticUndefined)], container_id: int, container_type: Literal['project', 'order'], storage_id: int, storage_output_folder: Path, user_id: int | None = None)#
Bases: BaseModel
Defines the B-Fabric registration details of a workunit.
- application_id: int#
The ID of the executing application.
- application_name: PathSafeStr#
The name of the executing application.
- container_id: int#
The ID of the container.
- container_type: Literal['project', 'order']#
The type of the container.
- classmethod from_workunit(workunit: Workunit) WorkunitRegistrationDefinition#
Loads the workunit registration definition from the provided B-Fabric workunit.
- model_config = {}#
Configuration for the model, should be a dictionary conforming to Pydantic's ConfigDict.
- storage_id: int#
The ID of the storage.
- storage_output_folder: Path#
The output folder in the storage.
- user_id: int | None#
The ID of the user who created the workunit, if available.
- workunit_id: int#
The ID of the workunit.
- workunit_name: PathSafeStr#
The name of the workunit.
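Both `application_name` and `workunit_name` pass through a `path_safe_name` validator before being stored, so they can safely be used to build storage paths. The exact normalization is not documented here; a hypothetical sketch of what such a sanitizer might do:

```python
import re

def path_safe_name(name: str) -> str:
    """Hypothetical sketch of a path-safe normalization: replace characters
    that are awkward in file system paths with underscores. The real
    validator in bfabric may behave differently."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)

print(path_safe_name("Proteomics Analysis #2"))
```

The point is that the `PathSafeStr` fields you read back from the model may differ from the raw names stored in B-Fabric.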
WorkunitDefinition#
- class bfabric.experimental.workunit_definition.WorkunitDefinition(*, execution: WorkunitExecutionDefinition, registration: WorkunitRegistrationDefinition | None)#
Bases: BaseModel
Defines a workunit, including details on how to execute it and where to register it. This class provides a simple way for developers to persist and run workunit definitions from YAML files, as well as loading the same from B-Fabric workunits. This abstraction ensures easier development and testing of applications.
- execution: WorkunitExecutionDefinition#
Execution details of the workunit.
- classmethod from_ref(workunit: Path | int, client: Bfabric, cache_file: Path | None = None) WorkunitDefinition#
Loads the workunit definition from the provided reference, which can be a path to a YAML file, or a workunit ID.
If the cache file is provided and exists, it will be loaded directly instead of resolving the reference. Otherwise, the result will be cached to the provided file.
Parameters:
workunit – The workunit reference, which can be a path to a YAML file, or a workunit ID.
client – The B-Fabric client to use for resolving the workunit.
cache_file – The path to the cache file, if any.
- classmethod from_workunit(workunit: Workunit) WorkunitDefinition#
Loads the workunit definition from the provided B-Fabric workunit.
- classmethod from_yaml(path: Path) WorkunitDefinition#
Loads the workunit definition from the provided path.
- model_config = {}#
Configuration for the model, should be a dictionary conforming to Pydantic's ConfigDict.
- registration: WorkunitRegistrationDefinition | None#
Registration details of the workunit.
- to_yaml(path: Path) None#
Writes the workunit definition to the provided path.
Loading Workunit Definitions#
From YAML File#
Create a YAML file with workunit definition:
# workunit.yml
execution:
  dataset: 12345
  raw_parameters:
    param1: "value1"
    param2: "value2"
registration:
  application_id: 100
  application_name: "MyApplication"
  workunit_id: 200
  workunit_name: "ProcessingJob"
  container_id: 300
  container_type: "project"
  storage_id: 400
  storage_output_folder: "output/results"
  user_id: 1
Load it in Python:
from pathlib import Path
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition
client = Bfabric.connect()
# Load from YAML file
workunit_def = WorkunitDefinition.from_ref(workunit=Path("workunit.yml"), client=client)
print(f"Workunit: {workunit_def.registration.workunit_name}")
print(f"Application: {workunit_def.registration.application_name}")
print(f"Dataset: {workunit_def.execution.dataset}")
From Workunit ID#
Load definition from a workunit already in B-Fabric:
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition
client = Bfabric.connect()
# Load from workunit ID (resolves from B-Fabric)
workunit_def = WorkunitDefinition.from_ref(workunit=200, client=client)
print(f"Loaded definition for workunit #{workunit_def.registration.workunit_id}")
With Caching#
Cache the loaded definition to avoid repeated queries:
cache_file = Path("cache/workunit_def.yml")
# Loads from cache if available, otherwise queries B-Fabric
workunit_def = WorkunitDefinition.from_ref(
workunit=Path("workunit.yml"), client=client, cache_file=cache_file
)
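The caching behavior can be pictured with a small stdlib-only sketch. This is an illustration, not the real implementation: the actual method resolves against B-Fabric and serializes to YAML, while here a JSON file and a stand-in resolver function are used.

```python
import json
import tempfile
from pathlib import Path

def load_with_cache(resolve, cache_file: Path) -> dict:
    """Sketch of from_ref's caching: return the cached definition if the
    cache file exists, otherwise resolve it and write the cache."""
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    definition = resolve()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(definition))
    return definition

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache" / "workunit_def.json"
    calls = []
    resolve = lambda: calls.append(1) or {"workunit_id": 200}
    first = load_with_cache(resolve, cache)   # resolves and writes the cache
    second = load_with_cache(resolve, cache)  # served from the cache file
    assert first == second == {"workunit_id": 200}
    assert len(calls) == 1  # the resolver ran only once
```

Note that the cache is keyed only by the file path: delete the cache file when you want the definition re-resolved from B-Fabric.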
Exporting Workunit Definitions#
Export a workunit definition to YAML:
from pathlib import Path
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition
client = Bfabric.connect()
# Load from workunit ID
workunit_def = WorkunitDefinition.from_ref(workunit=200, client=client)
# Save to YAML
output_file = Path("definitions/workunit_200.yml")
workunit_def.to_yaml(output_file)
print(f"Saved definition to {output_file}")
Complete Example#
YAML Definition File#
# proteomics_analysis.yml
execution:
  dataset: 12345
  raw_parameters:
    fdr_threshold: "0.01"
    min_peptide_length: "7"
    database: "uniprot_human"
registration:
  application_id: 100
  application_name: "ProteomicsProcessor"
  workunit_id: 200
  workunit_name: "ProteomicsAnalysis"
  container_id: 300
  container_type: "project"
  storage_id: 400
  storage_output_folder: "proteomics/output"
  user_id: 1
Python Usage#
from pathlib import Path
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition
client = Bfabric.connect()
# 1. Load definition from YAML with caching
yaml_file = Path("workunits/proteomics_analysis.yml")
workunit_def = WorkunitDefinition.from_ref(
workunit=yaml_file, client=client, cache_file=Path("cache/workunit_defs.yml")
)
# 2. Access components
execution = workunit_def.execution
registration = workunit_def.registration
print(f"Workunit: {registration.workunit_name}")
print(f"Application: {registration.application_name}")
print(f"Input Dataset: {execution.dataset}")
print(f"Parameters: {execution.raw_parameters}")
# 3. Export definition (e.g., for version control)
output_file = Path("definitions/proteomics_analysis_export.yml")
workunit_def.to_yaml(output_file)
Best Practices#
Version control YAML files: Keep workunit definitions in git
Use caching: Load definitions with cache_file to avoid repeated API calls
Validate early: Catch validation errors before execution
Document parameters: Use descriptive parameter names in raw_parameters
Separate concerns: Keep application logic separate from workunit definitions
Next Steps#
Writing Data - Basic save and delete operations
ResultContainer API - Dataset operations