Feeder#

The bfabric-cli feeder command registers files with B-Fabric. Its main operation is creating importresources from files in a storage.

Overview#

bfabric-cli feeder --help

Available subcommands:

Subcommand	Purpose
`create-importresource`	Create importresources for files in a storage

Creating Importresources#

Create importresources for one or more files in a B-Fabric storage.

Basic Usage#

bfabric-cli feeder create-importresource [STORAGE_ID] [FILES]...

Parameters#

Parameter	Required	Description
`storage_id`	Yes	ID of the target storage
`files`	Yes	One or more file paths to create importresources for

Examples#

Create a single importresource:

bfabric-cli feeder create-importresource 1 /path/to/data/file.raw

Create importresources for multiple files:

bfabric-cli feeder create-importresource 1 \
    /path/to/data/file1.raw \
    /path/to/data/file2.raw \
    /path/to/data/file3.raw

Use a glob pattern to match multiple files (the shell expands it before the command runs):

bfabric-cli feeder create-importresource 1 /path/to/data/*.raw

What It Does#

The command:

Validates files - Checks that the specified files exist
Parses file paths - Analyzes the path structure using the storage’s path convention
Computes file metadata - Calculates MD5 checksum, file size, file date
Creates importresources - Creates importresource entities in B-Fabric for each file

Path Convention (CompMS)#

For CompMS (Mass Spectrometry) data, the command uses the PathConventionCompMS parser, which expects files to follow a specific directory structure:

/storage_root/
    ├── application_name/
    │   ├── container_id/
    │   │   ├── sample_id/
    │   │   │   └── file.raw

The parser extracts:

Application name - From the first directory below the storage root
Container ID - From the directory below the application directory
Sample ID - From the directory below the container directory (optional)
Relative path - The file's path relative to the storage root
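
To make the layout concrete, the sketch below splits a path into the same four fields using plain shell parameter expansion. It only illustrates the directory structure shown above; it is not the code of PathConventionCompMS, and `parse_compms_path` is a hypothetical helper name.

```shell
# Illustration of the directory layout only; the real parser may apply
# further rules. parse_compms_path is a hypothetical helper.
# Usage: parse_compms_path STORAGE_ROOT FILE_PATH
parse_compms_path() {
    local root="${1%/}"
    local path="$2"
    local rel application rest container sample
    case "$path" in
        "$root"/*) rel=${path#"$root"/} ;;
        *) echo "not under storage root: $path" >&2; return 1 ;;
    esac
    # At least application/container/file is required.
    case "$rel" in
        */*/*) ;;
        *) echo "path too shallow: $rel" >&2; return 1 ;;
    esac
    application=${rel%%/*}
    rest=${rel#*/}
    container=${rest%%/*}
    rest=${rest#*/}
    # One more directory level, if present, is taken as the sample ID.
    case "$rest" in
        */*) sample=${rest%%/*} ;;
        *) sample="" ;;
    esac
    printf 'application=%s\ncontainer=%s\nsample=%s\nrelative_path=%s\n' \
        "$application" "$container" "$sample" "$rel"
}
```

A file that sits directly in the container directory yields an empty `sample=` line, matching the optional sample level described above.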

Output#

The command provides feedback on:

Successful creations: “Importresource X created for file /path/to/file.raw”
Updates: “Importresource X updated for file /path/to/file.raw” (if an importresource already exists)
Errors: “Application Y not found in B-Fabric. Skipping file /path/to/file.raw”
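
For larger runs it can help to condense this feedback into counts. The sketch below assumes the output was saved to a log file, that the messages use the wording quoted above, and that each message sits on its own line; `summarize_feeder_log` is a hypothetical helper.

```shell
# Hypothetical helper: count outcomes in a saved feeder log.
# Assumes the message wording quoted above, one message per line.
summarize_feeder_log() {
    local log="$1"
    local created updated skipped
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    created=$(grep -c 'Importresource .* created for file' "$log" || true)
    updated=$(grep -c 'Importresource .* updated for file' "$log" || true)
    skipped=$(grep -c 'Skipping file' "$log" || true)
    printf 'created=%s updated=%s skipped=%s\n' "$created" "$updated" "$skipped"
}
```

If the actual wording differs in your version of the tool, adjust the three patterns accordingly.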

Workflow Examples#

Initial Data Ingestion#

# 1. First, verify the storage exists
bfabric-cli api read storage id 1

# 2. Create importresources for new files
bfabric-cli feeder create-importresource 1 /data/2025-01/*.raw

Monitoring File Addition#

#!/bin/bash
# check_new_files.sh: create importresources for every .raw file in the incoming directory
STORAGE_ID=1
DATA_DIR="/data/incoming"

# IFS= and -r keep leading whitespace and backslashes in file names intact
find "$DATA_DIR" -name "*.raw" -type f | while IFS= read -r file; do
    echo "Processing $file..."
    bfabric-cli feeder create-importresource "$STORAGE_ID" "$file"
done
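
The script above submits every file on each run. One possible refinement is to remember which files were already handled. The sketch below keeps a plain-text list of processed paths; `list_unprocessed` and the list file location are hypothetical, and the commented usage assumes that bfabric-cli exits with a non-zero status on failure, which you should verify for your version.

```shell
# Hypothetical helper: print the .raw files under DIR that are not yet
# listed in DONE_FILE (one path per line, as printed by find).
# File names containing newlines are not supported.
list_unprocessed() {
    local dir="$1"
    local done_file="$2"
    [ -f "$done_file" ] || : > "$done_file"
    # -F: fixed strings, -x: whole-line match, -v: keep lines NOT in the list
    find "$dir" -name '*.raw' -type f | sort | grep -vxF -f "$done_file" || true
}

# Possible use (not run here; the list file path is an example):
# list_unprocessed /data/incoming /var/tmp/feeder_done.txt |
# while IFS= read -r file; do
#     bfabric-cli feeder create-importresource 1 "$file" &&
#         echo "$file" >> /var/tmp/feeder_done.txt
# done
```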

Batch Processing Multiple Storages#

# Process files across multiple storages
for storage_id in 1 2 3; do
    bfabric-cli feeder create-importresource $storage_id /data/storage_$storage_id/*.raw
done
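
If one of the directories contains no matching files, the shell passes the unexpanded pattern (for example `/data/storage_2/*.raw`) to the command as a literal file name. A small guard avoids that; `has_raw_files` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if DIR contains at least one *.raw file,
# so that an unmatched glob is never passed on as a literal argument.
has_raw_files() {
    local dir="$1"
    local f
    for f in "$dir"/*.raw; do
        [ -f "$f" ] && return 0
    done
    return 1
}

# Possible use (not run here):
# for storage_id in 1 2 3; do
#     if has_raw_files "/data/storage_$storage_id"; then
#         bfabric-cli feeder create-importresource "$storage_id" "/data/storage_$storage_id"/*.raw
#     else
#         echo "No .raw files for storage $storage_id, skipping" >&2
#     fi
# done
```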

Finding Storage Information#

Before creating importresources, verify your storage configuration:

List All Storages#

bfabric-cli api read storage --limit 20

Show Specific Storage#

bfabric-cli api read storage id 1

Check Storage Path Convention#

The storage information will show:

Storage ID and name
Base URL/path
Path convention type (e.g., CompMS)

Working with Importresources#

After creating importresources, you can work with them:

List Importresources#

# List all importresources
bfabric-cli api read importresource --limit 50

# Filter by storage
bfabric-cli api read importresource storageid 1

# Filter by date
bfabric-cli api read importresource createdafter 2024-12-01 --limit 20

Check Importresource Details#

# Show specific importresource
bfabric-cli api read importresource id 12345

Tips and Best Practices#

Verify Files Before Processing#

# Check files exist before creating importresources
ls -lh /data/incoming/*.raw

# Verify file integrity
md5sum /data/incoming/*.raw

Use Absolute Paths#

# Use absolute paths to avoid ambiguity
bfabric-cli feeder create-importresource 1 /full/path/to/data/file.raw

Process in Batches#

# For large numbers of files, process in batches (here: the first 100 files found)
find /data/incoming -name "*.raw" -type f | head -n 100 | while IFS= read -r file; do
    bfabric-cli feeder create-importresource 1 "$file"
done
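
Because the command accepts several files per call, batches can also be passed to a single invocation instead of one call per file. The sketch below uses `find -print0` with `xargs -0 -n`, which GNU and BSD tools both provide; `run_in_batches` is a hypothetical helper.

```shell
# Hypothetical helper: run a command on the *.raw files under DIR in groups
# of at most SIZE files per invocation.
# Usage: run_in_batches DIR SIZE COMMAND [ARGS...]
run_in_batches() {
    local dir="$1"
    local size="$2"
    shift 2
    # NUL-separated names survive spaces and other unusual characters.
    find "$dir" -name '*.raw' -type f -print0 | xargs -0 -n "$size" "$@"
}

# Possible use (not run here): 100 files per call, storage 1.
# run_in_batches /data/incoming 100 bfabric-cli feeder create-importresource 1
```

Note that GNU xargs runs the command once even when no files are found; adding `-r` to the xargs call suppresses that.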

Monitor for Errors#

# Capture all output (stdout and stderr) for review
bfabric-cli feeder create-importresource 1 /data/*.raw > feeder.log 2>&1

# Review any failures, including skipped files
grep -iE "error|not found|skipping" feeder.log

Test on Small Batch First#

# Test with a few files before processing everything
bfabric-cli feeder create-importresource 1 /data/test/*.raw

# If successful, process the full batch
bfabric-cli feeder create-importresource 1 /data/production/*.raw

Common Issues#

Storage Not Found#

Error: Storage with ID X not found

Solution: Verify the storage exists:

bfabric-cli api read storage id <storage-id>

Files Do Not Exist#

Error: Files /path/to/file1.raw, /path/to/file2.raw do not exist

Solution: Check file paths and permissions:

ls -la /path/to/

Application Not Found#

Error: Application X not found in B-Fabric. Skipping file /path/to/file.raw

Solution: The application derived from the path doesn’t exist in B-Fabric. Options:

Create the application in B-Fabric
Rename the directory to match an existing application
Verify the path convention is correct

# Check available applications
bfabric-cli api read application

Path Convention Mismatch#

Error: Files don’t follow the expected path structure

Solution: Ensure files are organized according to the storage’s path convention:

# Check storage configuration
bfabric-cli api read storage id <storage-id>

# Verify file structure
tree /data/
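
For the CompMS layout shown earlier, a file needs at least an application and a container directory between it and the storage root. The sketch below lists files that sit too close to the root to satisfy that; `find_shallow_files` is a hypothetical helper, and it assumes a find that supports `-mindepth` and `-maxdepth` (GNU and BSD find do).

```shell
# Hypothetical helper: list *.raw files located directly in the storage root
# or directly in an application directory, i.e. without a container level.
find_shallow_files() {
    local root="${1%/}"
    # Depth 1 = root/file, depth 2 = root/application/file; both are too shallow.
    find "$root" -mindepth 1 -maxdepth 2 -type f -name '*.raw' | sort
}
```

An empty result means every `.raw` file is at least three levels below the root. It does not prove that the directory names correspond to existing applications or containers.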

Integration with Data Ingestion Workflows#

The feeder command is typically used as part of a larger data ingestion pipeline:

File Transfer: Data is transferred to the storage location
Validation: File integrity is verified (checksums, sizes)
Importresource Creation: Feeder command creates importresources
Import Process: B-Fabric imports the data based on importresources
Sample Creation: Associated samples are created/updated
Analysis: Data becomes available for analysis
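
The validation step can be as simple as a checksum manifest written before the transfer and checked afterwards. The following is a minimal sketch assuming GNU `md5sum`; the helper names and the manifest file name `MD5SUMS` are hypothetical.

```shell
# Hypothetical helpers for the validation step. Assumes GNU md5sum.

# write_manifest DIR: record the checksums of DIR/*.raw in DIR/MD5SUMS.
write_manifest() {
    ( cd "$1" && md5sum -- *.raw > MD5SUMS )
}

# verify_manifest DIR: succeed only if every listed file still matches.
# --quiet prints nothing for files that verify correctly.
verify_manifest() {
    ( cd "$1" && md5sum --quiet -c MD5SUMS )
}
```

Run `write_manifest` on the source side, transfer the manifest together with the data, and run `verify_manifest` in the storage location before creating importresources.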

Example Ingestion Pipeline#

#!/bin/bash
# ingest_data.sh
# Stop at the first failing command, so files are only moved if the feeder
# step succeeded (assumes bfabric-cli exits non-zero on failure)
set -euo pipefail

STORAGE_ID=1
SOURCE_DIR="/data/incoming"
PROCESSED_DIR="/data/processed"

# 1. Validate files (an unmatched glob stays literal and fails the -f test)
echo "Validating files..."
for file in "$SOURCE_DIR"/*.raw; do
    if [ ! -f "$file" ]; then
        echo "Error: $file does not exist" >&2
        exit 1
    fi
done

# 2. Create importresources
echo "Creating importresources..."
bfabric-cli feeder create-importresource "$STORAGE_ID" "$SOURCE_DIR"/*.raw

# 3. Move to processed directory
echo "Moving files to processed..."
mkdir -p "$PROCESSED_DIR"
mv "$SOURCE_DIR"/*.raw "$PROCESSED_DIR/"

echo "Ingestion complete!"
