Working with Inputs#
This guide covers how bfabric-app-runner handles input files: specification, resolution, preparation, and management.
Overview#
The input system follows a two-phase pipeline:
Resolution: Input specifications (from YAML) are converted into standardized resolved types.
Preparation: Resolved inputs are fetched, generated, or linked into the working directory.
This separation keeps the “what to process” logic independent from “how to process it”, making the system extensible and testable.
Input YAML Format#
Inputs are defined in a YAML file (typically inputs.yml) as a list of specifications:
- type: bfabric_resource
  id: 12345
  filename: input_data.raw
  check_checksum: true
- type: static_file
  content: "sample_id,condition\n1,control\n2,treated"
  filename: metadata.csv
- type: bfabric_dataset
  id: 6789
  filename: samples.csv
  separator: ","
Each entry must include a type field that determines which resolver handles it.
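The dispatch on the type field can be pictured as a small registry lookup. The sketch below is illustrative only: `RESOLVERS`, `register`, and `resolve` are hypothetical names and not bfabric-app-runner's actual API, and the resolvers return a plain description string instead of doing any real work.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a spec's `type` value to a resolver function.
# None of these names come from bfabric-app-runner itself.
Resolver = Callable[[Dict[str, Any]], str]
RESOLVERS: Dict[str, Resolver] = {}


def register(type_name: str) -> Callable[[Resolver], Resolver]:
    """Register a resolver for one input type."""
    def decorator(func: Resolver) -> Resolver:
        RESOLVERS[type_name] = func
        return func
    return decorator


@register("bfabric_resource")
def resolve_resource(spec: Dict[str, Any]) -> str:
    return f"download resource {spec['id']} as {spec['filename']}"


@register("static_file")
def resolve_static_file(spec: Dict[str, Any]) -> str:
    return f"write {len(spec['content'])} characters to {spec['filename']}"


def resolve(spec: Dict[str, Any]) -> str:
    """Pick the resolver from the mandatory `type` field."""
    if "type" not in spec:
        raise ValueError("input spec is missing the required 'type' field")
    try:
        resolver = RESOLVERS[spec["type"]]
    except KeyError:
        raise ValueError(f"unknown input type: {spec['type']!r}") from None
    return resolver(spec)


specs = [
    {"type": "bfabric_resource", "id": 12345, "filename": "input_data.raw"},
    {"type": "static_file", "content": "a,b\n1,2", "filename": "metadata.csv"},
]
plans = [resolve(spec) for spec in specs]
```

A registry like this keeps adding a new input type local to one function, which fits the extensibility goal stated in the overview.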
Input Spec Types#
bfabric_resource#
Downloads a resource file from B-Fabric.
- type: bfabric_resource
  id: 12345
  filename: data.raw
  check_checksum: true
id (int): B-Fabric resource ID.
filename (str): Target filename in the working directory.
check_checksum (bool, optional): Verify the file checksum after download.
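As a rough picture of what a post-download checksum check involves, here is a minimal sketch. It is not the runner's implementation: `file_digest` and `verify_checksum` are hypothetical helpers, and the guide does not say which hash algorithm B-Fabric uses, so SHA-256 is only a placeholder assumption.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw files are never fully loaded into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: Path, expected: str, algorithm: str = "sha256") -> None:
    """Raise ValueError if the file's digest differs from the expected hex string."""
    # Assumption: checksums are compared as case-insensitive hex digests.
    actual = file_digest(path, algorithm)
    if actual != expected.lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected}, got {actual}")
```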
bfabric_dataset#
Downloads a dataset from B-Fabric as a tabular file.
- type: bfabric_dataset
  id: 6789
  filename: samples.csv
  separator: ","
  format: csv
id (int): B-Fabric dataset ID.
filename (str): Target filename.
separator (str, optional): Column separator character.
format (str, optional): Output format: "csv" or "parquet".
bfabric_resource_archive#
Downloads a resource and extracts it as an archive.
- type: bfabric_resource_archive
  id: 12345
  filename: extracted_data
  extract: zip
  include_patterns:
    - "*.mzML"
  strip_root: true
  check_checksum: true
id (int): B-Fabric resource ID.
filename (str): Target directory name.
extract (str): Archive format. Currently only "zip" is supported.
include_patterns (list of str, optional): Glob patterns for files to include from the archive.
exclude_patterns (list of str, optional): Glob patterns for files to exclude from the archive.
strip_root (bool, optional): Remove the root directory from extracted paths.
check_checksum (bool, optional): Verify the resource checksum.
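To make the interplay of include_patterns, exclude_patterns, and strip_root concrete, the following sketch extracts a zip archive with the standard library. It is an illustration under assumptions, not the runner's code: `extract_zip` is a hypothetical helper, patterns are assumed to match against the full member path, and strip_root is assumed to drop the first path component.

```python
import zipfile
from fnmatch import fnmatchcase
from pathlib import Path, PurePosixPath


def extract_zip(archive, target_dir, include_patterns=(), exclude_patterns=(), strip_root=False):
    """Extract selected members of a zip archive into target_dir."""
    target_dir = Path(target_dir)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = info.filename
            member = PurePosixPath(name)
            # Refuse members that could escape the target directory.
            if member.is_absolute() or ".." in member.parts:
                raise ValueError(f"unsafe archive member: {name}")
            # Assumption: patterns are matched against the full member path.
            if include_patterns and not any(fnmatchcase(name, p) for p in include_patterns):
                continue
            if any(fnmatchcase(name, p) for p in exclude_patterns):
                continue
            parts = member.parts
            # Assumption: strip_root drops the leading directory component.
            if strip_root and len(parts) > 1:
                parts = parts[1:]
            destination = target_dir.joinpath(*parts)
            destination.parent.mkdir(parents=True, exist_ok=True)
            destination.write_bytes(zf.read(info))
            extracted.append(destination)
    return extracted
```

With an archive containing `root/a.mzML`, `root/b.txt`, and `root/sub/c.mzML`, calling `extract_zip(archive, "extracted_data", include_patterns=["*.mzML"], strip_root=True)` would write `extracted_data/a.mzML` and `extracted_data/sub/c.mzML` under these assumptions.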
bfabric_resource_dataset#
Downloads multiple resources referenced in a dataset column.
- type: bfabric_resource_dataset
  id: 100
  column: resource_id
  filename: "{name}.raw"
  check_checksum: true
  output_dataset_filename: manifest.csv
  output_dataset_file_column: local_path
id (int): B-Fabric dataset ID containing resource references.
column (str): Column name containing resource IDs.
filename (str): Filename template for downloaded resources.
include_patterns / exclude_patterns (list of str, optional): Filter which resources to download.
check_checksum (bool, optional): Verify checksums.
output_dataset_filename (str, optional): Write an output dataset mapping resources to local paths.
output_dataset_file_column (str, optional): Column name for local file paths in the output dataset.
output_dataset_only (bool, optional): Only generate the output dataset without downloading files.
bfabric_order_fasta#
Downloads FASTA data associated with an order or workunit.
- type: bfabric_order_fasta
  id: 500
  entity: workunit
  filename: sequences.fasta
  required: true
id (int): Entity ID.
entity (str): Entity type: "workunit" or "order".
filename (str): Target filename.
required (bool, optional): Whether the FASTA must exist. If true and the FASTA is missing, an error is raised.
bfabric_annotation#
Downloads annotation data linking resources to samples.
- type: bfabric_annotation
  annotation: resource_sample
  filename: annotations.csv
  separator: ","
  resource_ids:
    - 100
    - 200
annotation (str): Annotation type. Currently "resource_sample".
filename (str): Target filename.
separator (str, required): Column separator.
resource_ids (list of int, required): Resource IDs to include.
format (str, optional): Output format.
file#
Copies or links a file from a local or SSH source.
# Local file
- type: file
  source:
    local: /data/reference/genome.fa
  filename: genome.fa
  link: true
# SSH file
- type: file
  source:
    ssh:
      host: server.example.com
      path: /data/reference/genome.fa
  filename: genome.fa
  checksum: abc123...
source (object): Either {local: path} or {ssh: {host, path}}.
filename (str): Target filename.
link (bool, optional): Create a symlink instead of copying (local sources only).
checksum (str, optional): Expected file checksum for verification.
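The difference between linking and copying a local source can be sketched as follows. `place_local_file` is a hypothetical helper written for this guide, not part of bfabric-app-runner, and replacing an existing target is an assumption about the desired behavior.

```python
import shutil
from pathlib import Path


def place_local_file(source, target, link=False):
    """Put a local source file at target, either as a symlink or as a copy."""
    source, target = Path(source), Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: an existing target is replaced rather than kept.
    if target.is_symlink() or target.exists():
        target.unlink()
    if link:
        # A symlink avoids duplicating large reference files on disk.
        target.symlink_to(source.resolve())
    else:
        shutil.copyfile(source, target)
    return target
```

Linking is cheap for large reference files such as genomes, but the working directory then depends on the source path staying in place; copying makes the working directory self-contained.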
static_file#
Creates a file with inline content.
- type: static_file
  content: "key=value\nother=setting"
  filename: config.ini
content (str or bytes): File content to write.
filename (str): Target filename.
static_yaml#
Creates a YAML file from inline structured data.
- type: static_yaml
  data:
    param1: 100
    param2: "hello"
    items:
      - a
      - b
  filename: params.yml
data (dict or list): Data to serialize as YAML.
filename (str): Target filename.
Resolution Pipeline#
When inputs are processed, the resolver converts each spec into one of three resolved types:
ResolvedFile: A file with a source location (local or SSH).
ResolvedStaticFile: In-memory content to be written directly.
ResolvedDirectory: A directory with source location and extraction options.
The resolved inputs are then passed to the preparation phase, which fetches, copies, or writes each one into the working directory.
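One way to picture the hand-off from resolution to preparation is a set of small data classes plus a function that branches on the resolved type. The classes below are simplified, hypothetical stand-ins that reuse the three names above; their fields and the `describe_preparation` helper are assumptions made for illustration, not the library's real definitions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class ResolvedFile:
    filename: str
    source_path: str
    ssh_host: Optional[str] = None  # None means the source is local


@dataclass(frozen=True)
class ResolvedStaticFile:
    filename: str
    content: Union[str, bytes]


@dataclass(frozen=True)
class ResolvedDirectory:
    filename: str
    source_path: str
    strip_root: bool = False


def describe_preparation(item, working_dir: Path) -> str:
    """Return the action the preparation phase would take for one resolved input."""
    target = (working_dir / item.filename).as_posix()
    if isinstance(item, ResolvedStaticFile):
        return f"write {target}"
    if isinstance(item, ResolvedDirectory):
        return f"extract {item.source_path} into {target}"
    if isinstance(item, ResolvedFile):
        origin = f"{item.ssh_host}:{item.source_path}" if item.ssh_host else item.source_path
        return f"fetch {origin} to {target}"
    raise TypeError(f"unsupported resolved input: {type(item).__name__}")


resolved = [
    ResolvedFile("genome.fa", "/data/reference/genome.fa", ssh_host="server.example.com"),
    ResolvedStaticFile("metadata.csv", "sample_id,condition\n1,control"),
    ResolvedDirectory("extracted_data", "/tmp/archive.zip", strip_root=True),
]
actions = [describe_preparation(item, Path("work")) for item in resolved]
```

Because preparation only sees these three shapes, a new input spec type needs a resolver but no new preparation logic, as long as it resolves to one of them.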
CLI Commands#
Prepare inputs#
Download and prepare all input files:
bfabric-app-runner inputs prepare inputs.yml [target_folder]
inputs_yaml: Path to the inputs YAML file.
target_folder: Optional. Working directory for prepared files (defaults to the current directory).
--ssh-user: SSH user for remote file access.
--filter: Only prepare inputs matching the given filename pattern.
List inputs#
Show all defined inputs and their status:
bfabric-app-runner inputs list inputs.yml [target_folder]
--check: Also verify whether each input file exists in the target folder.
Check inputs#
Verify that all inputs are present:
bfabric-app-runner inputs check inputs.yml [target_folder]
Clean inputs#
Remove prepared input files:
bfabric-app-runner inputs clean inputs.yml [target_folder]
--filter: Only clean inputs matching the given filename pattern.
Filtering#
The --filter flag on inputs prepare and inputs clean accepts a filename pattern to selectively process inputs:
# Only prepare a specific file
bfabric-app-runner inputs prepare inputs.yml --filter "genome.fa"
# Clean specific files
bfabric-app-runner inputs clean inputs.yml --filter "*.raw"
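The examples above suggest shell-style glob matching on the target filename. The guide does not spell out the exact matching rules, so the sketch below is only an assumption about the semantics, with `select_inputs` as a hypothetical helper:

```python
from fnmatch import fnmatchcase


def select_inputs(filenames, pattern=None):
    """Return the filenames matching a shell-style pattern, or all of them if no pattern is given."""
    if pattern is None:
        return list(filenames)
    return [name for name in filenames if fnmatchcase(name, pattern)]


filenames = ["genome.fa", "run1.raw", "run2.raw", "metadata.csv"]
raw_only = select_inputs(filenames, "*.raw")
single = select_inputs(filenames, "genome.fa")
```

Under this reading, a pattern without wildcards selects exactly one input, while `*.raw` selects every input whose filename ends in `.raw`.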
Validating Input Specs#
You can validate an inputs YAML file without executing it:
bfabric-app-runner validate inputs-spec inputs.yml