Experimental: Workunit Definitions#

This guide covers the experimental workunit definition system for defining and managing B-Fabric workunits using YAML files.

Warning

Experimental features may change or be removed in future versions. Use at your own risk.

Overview#

Workunit definitions provide a structured way to define workunit execution details and registration information. This is particularly useful for:

  • Developing and testing B-Fabric applications

  • Defining workflow parameters in a declarative way

  • Persisting workunit configurations in version-controlled YAML files

  • Separating workunit logic from execution code

API Reference#

WorkunitExecutionDefinition#

class bfabric.experimental.workunit_definition.WorkunitExecutionDefinition(*, raw_parameters: dict[str, str | None], dataset: int | None = None, resources: list[int] = [])#

Bases: BaseModel

Defines the execution details of a workunit, i.e. the inputs necessary to compute the results, but not the final details on how to register the results in B-Fabric.

dataset: int | None#

Input dataset (for dataset-flow applications).

either_dataset_or_resources() → WorkunitExecutionDefinition#

Validates that at least one of dataset or resources is provided.

classmethod from_workunit(workunit: Workunit) → WorkunitExecutionDefinition#

Loads the workunit execution definition from the provided B-Fabric workunit.

model_config = {}#

Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.

mutually_exclusive_dataset_resources() → WorkunitExecutionDefinition#

Validates that dataset and resources are mutually exclusive.

raw_parameters: dict[str, str | None]#

The parameters passed to the workunit, in their raw form, i.e. everything is a string or None.

resources: list[int]#

Input resources (for resource-flow applications).
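
As a quick illustration, the following is a minimal sketch of constructing an execution definition directly; the field names come from the reference above and the values are placeholders:

from bfabric.experimental.workunit_definition import WorkunitExecutionDefinition

# Dataset-flow example: one input dataset plus raw string parameters.
execution = WorkunitExecutionDefinition(
    raw_parameters={"param1": "value1", "param2": None},
    dataset=12345,
)

# Passing both `dataset` and `resources` (or neither) is rejected by the
# validators listed above and raises a pydantic.ValidationError.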

WorkunitRegistrationDefinition#

class bfabric.experimental.workunit_definition.WorkunitRegistrationDefinition(*, application_id: int, application_name: Annotated[str, BeforeValidator(func=path_safe_name, json_schema_input_type=PydanticUndefined)], workunit_id: int, workunit_name: Annotated[str, BeforeValidator(func=path_safe_name, json_schema_input_type=PydanticUndefined)], container_id: int, container_type: Literal['project', 'order'], storage_id: int, storage_output_folder: Path, user_id: int | None = None)#

Bases: BaseModel

Defines the B-Fabric registration details of a workunit.

application_id: int#

The ID of the executing application.

application_name: PathSafeStr#

The name of the executing application.

container_id: int#

The ID of the container.

container_type: Literal['project', 'order']#

The type of the container.

classmethod from_workunit(workunit: Workunit) → WorkunitRegistrationDefinition#

Loads the workunit registration definition from the provided B-Fabric workunit.

model_config = {}#

Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.

storage_id: int#

The ID of the storage.

storage_output_folder: Path#

The output folder in the storage.

user_id: int | None#

The ID of the user who created the workunit, if available.

workunit_id: int#

The ID of the workunit.

workunit_name: PathSafeStr#

The name of the workunit.
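
Similarly, a registration definition can be constructed from its documented fields. This is only a placeholder sketch; in practice, from_workunit derives these values from an existing B-Fabric workunit:

from pathlib import Path
from bfabric.experimental.workunit_definition import WorkunitRegistrationDefinition

# Placeholder values mirroring the YAML examples further below.
registration = WorkunitRegistrationDefinition(
    application_id=100,
    application_name="MyApplication",
    workunit_id=200,
    workunit_name="ProcessingJob",
    container_id=300,
    container_type="project",
    storage_id=400,
    storage_output_folder=Path("output/results"),
    user_id=None,
)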

WorkunitDefinition#

class bfabric.experimental.workunit_definition.WorkunitDefinition(*, execution: WorkunitExecutionDefinition, registration: WorkunitRegistrationDefinition | None)#

Bases: BaseModel

Defines a workunit, including details on how to execute it and where to register it. This class provides a simple way for developers to persist and run workunit definitions from YAML files, as well as to load them from existing B-Fabric workunits. This abstraction makes applications easier to develop and test.

execution: WorkunitExecutionDefinition#

Execution details of the workunit.

classmethod from_ref(workunit: Path | int, client: Bfabric, cache_file: Path | None = None) → WorkunitDefinition#

Loads the workunit definition from the provided reference, which can be a path to a YAML file, or a workunit ID.

If the cache file is provided and exists, it will be loaded directly instead of resolving the reference. Otherwise, the result will be cached to the provided file.

Parameters:

  • workunit – The workunit reference, which can be a path to a YAML file, or a workunit ID.

  • client – The B-Fabric client to use for resolving the workunit.

  • cache_file – The path to the cache file, if any.

classmethod from_workunit(workunit: Workunit) → WorkunitDefinition#

Loads the workunit definition from the provided B-Fabric workunit.

classmethod from_yaml(path: Path) → WorkunitDefinition#

Loads the workunit definition from the provided path.

model_config = {}#

Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.

registration: WorkunitRegistrationDefinition | None#

Registration details of the workunit.

to_yaml(path: Path) → None#

Writes the workunit definition to the provided path.
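
The from_yaml and to_yaml methods allow a YAML round trip without a B-Fabric client. A short sketch with placeholder paths:

from pathlib import Path
from bfabric.experimental.workunit_definition import WorkunitDefinition

# Load a definition directly from YAML and write a copy back out.
workunit_def = WorkunitDefinition.from_yaml(Path("workunit.yml"))
workunit_def.to_yaml(Path("workunit_copy.yml"))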

Loading Workunit Definitions#

From YAML File#

Create a YAML file with the workunit definition:

# workunit.yml

execution:
  dataset: 12345
  raw_parameters:
    param1: "value1"
    param2: "value2"

registration:
  application_id: 100
  application_name: "MyApplication"
  workunit_id: 200
  workunit_name: "ProcessingJob"
  container_id: 300
  container_type: "project"
  storage_id: 400
  storage_output_folder: "output/results"
  user_id: 1

Load it in Python:

from pathlib import Path
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition

client = Bfabric.connect()

# Load from YAML file
workunit_def = WorkunitDefinition.from_ref(workunit=Path("workunit.yml"), client=client)

print(f"Workunit: {workunit_def.workunit_name}")
print(f"Application: {workunit_def.registration.application_name}")
print(f"Dataset: {workunit_def.execution.dataset}")

From Workunit ID#

Load definition from a workunit already in B-Fabric:

from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition

client = Bfabric.connect()

# Load from workunit ID (resolves from B-Fabric)
workunit_def = WorkunitDefinition.from_ref(workunit=200, client=client)
print(f"Loaded definition for workunit #{workunit_def.registration.workunit_id}")

With Caching#

Cache the loaded definition to avoid repeated queries:

cache_file = Path("cache/workunit_def.yml")

# Loads from cache if available, otherwise queries B-Fabric
workunit_def = WorkunitDefinition.from_ref(
    workunit=Path("workunit.yml"), client=client, cache_file=cache_file
)

Exporting Workunit Definitions#

Export a workunit definition to YAML:

from pathlib import Path
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition

client = Bfabric.connect()

# Load from workunit ID
workunit_def = WorkunitDefinition.from_ref(workunit=200, client=client)

# Save to YAML
output_file = Path("definitions/workunit_200.yml")
workunit_def.to_yaml(output_file)
print(f"Saved definition to {output_file}")

Complete Example#

YAML Definition File#

# proteomics_analysis.yml

execution:
  dataset: 12345
  raw_parameters:
    fdr_threshold: "0.01"
    min_peptide_length: "7"
    database: "uniprot_human"

registration:
  application_id: 100
  application_name: "ProteomicsProcessor"
  workunit_id: 200
  workunit_name: "ProteomicsAnalysis"
  container_id: 300
  container_type: "project"
  storage_id: 400
  storage_output_folder: "proteomics/output"
  user_id: 1

Python Usage#

from pathlib import Path
from bfabric import Bfabric
from bfabric.experimental.workunit_definition import WorkunitDefinition

client = Bfabric.connect()

# 1. Load definition from YAML with caching
yaml_file = Path("workunits/proteomics_analysis.yml")
workunit_def = WorkunitDefinition.from_ref(
    workunit=yaml_file, client=client, cache_file=Path("cache/workunit_defs.yml")
)

# 2. Access components
execution = workunit_def.execution
registration = workunit_def.registration

print(f"Workunit: {registration.workunit_name}")
print(f"Application: {registration.application_name}")
print(f"Input Dataset: {execution.dataset}")
print(f"Parameters: {execution.raw_parameters}")

# 3. Export definition (e.g., for version control)
output_file = Path("definitions/proteomics_analysis_export.yml")
workunit_def.to_yaml(output_file)

Best Practices#

  1. Version control YAML files: Keep workunit definitions in git

  2. Use caching: Load definitions with cache_file to avoid repeated API calls

  3. Validate early: Catch Pydantic validation errors when loading definitions, before execution (see the sketch after this list)

  4. Document parameters: Use descriptive parameter names in raw_parameters

  5. Separate concerns: Keep application logic separate from workunit definitions
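
For the validate-early practice, a minimal sketch of handling an invalid definition at load time; this assumes invalid content surfaces as a Pydantic ValidationError, and the path is a placeholder:

from pathlib import Path
from pydantic import ValidationError
from bfabric.experimental.workunit_definition import WorkunitDefinition

try:
    workunit_def = WorkunitDefinition.from_yaml(Path("workunit.yml"))
except ValidationError as error:
    # Report which fields are missing or invalid before any execution starts.
    print(f"Invalid workunit definition: {error}")
    raise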

Next Steps#