NebiusForge
black-forest-labs-flux-1-dev · General · black-forest-labs-flux-1-dev-license

FLUX.1 Dev

Fine-tune FLUX.1 Dev as a LoRA from your own image bucket. Compare measured GPU performance, open a preloaded Serverless Job, choose your dataset and output buckets, and start training in your Nebius account.

Fine-tune workloads: flux-lora
Best throughput
1.54 img/s

Fastest measured GPU: B200.

Parameters
12.0B params
GPUs benchmarked
6

GPU types benchmarked, including failed runs and runs in progress.

Training performance

Per-GPU fine-tune throughput, time per step, and FLOP utilization, measured on Forge GPUs. Pick a target before you start; cells still being measured show as in progress.

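GPU | Region | Workload | Status | Throughput | Time / step | FLOP util.
B200 (fastest) | us-central1 | flux-lora | Measured | 1.54 img/s | 651 ms | 5.0%
B300 | uk-south1 | flux-lora | Measured | 1.49 img/s | 672 ms | 4.4%
H100 | eu-north1 | flux-lora | Measured | 1.21 img/s | 828 ms | 9.0%
H200 | eu-north1 | flux-lora | Measured | 1.29 img/s | 773 ms | 9.6%
H200 | eu-north2 | flux-lora | Measured | 1.27 img/s | 787 ms | 9.5%
H200 | us-central1 | flux-lora | Measured | 1.3 img/s | 767 ms | 9.7%
L40S | eu-north1 | flux-lora | Measured | 0.79 img/s | 1.26 s | 16.2%
RTX6000 | us-central1 | flux-lora | Failed | n/a | n/a | n/a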

Train in your account

Serverless training

Start a training job

Open Nebius with the training image, GPU preset, and command preloaded. Then choose your dataset bucket and output bucket in your account.

  1. Upload data: put your images (and optional same-name .txt captions) in your Object Storage bucket.
  2. Open job: use the preloaded Serverless Job form.
  3. Start training: select your dataset and output buckets, then run the job.
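Before uploading, it can help to check the local folder against the expected format. The sketch below is a hypothetical helper, not part of Forge or the Nebius CLI: `check_dataset_dir` counts top-level .jpg/.jpeg/.png files (case-insensitive) and reports images without a same-name .txt caption, failing only when no images are found. The upload line is left commented, and `NEBIUS_S3_ENDPOINT` is a placeholder for your bucket's S3-compatible endpoint.

```shell
#!/bin/sh
# Hypothetical pre-upload check (not a Forge or Nebius tool).
# Counts top-level .jpg/.jpeg/.png files and reports which lack a caption.
check_dataset_dir() {
  dir="$1"
  images=0
  captioned=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "$f" in
      *.[jJ][pP][gG] | *.[jJ][pP][eE][gG] | *.[pP][nN][gG]) ;;
      *) continue ;;  # skip caption files and anything else
    esac
    images=$((images + 1))
    if [ -f "${f%.*}.txt" ]; then
      captioned=$((captioned + 1))
    else
      echo "no caption (optional): $(basename "$f")" >&2
    fi
  done
  echo "images=$images captioned=$captioned"
  [ "$images" -gt 0 ]  # non-zero exit when there is nothing to train on
}

# Example (upload left commented; the endpoint variable is a placeholder):
# check_dataset_dir ./flux-subject-lora &&
#   aws s3 sync ./flux-subject-lora s3://my-bucket/flux-subject-lora/ \
#     --endpoint-url "$NEBIUS_S3_ENDPOINT"
```

Missing captions are only reported on stderr, because captions are optional for this workload.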

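License: FLUX.1 dev Non-Commercial License. The model has good ecosystem support, but you must accept the model terms on Hugging Face to download the gated weights, and commercial use requires separate commercial terms from Black Forest Labs (BFL).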

Input data

Image folder with .jpg/.jpeg/.png files and optional same-name .txt captions

s3://my-bucket/flux-subject-lora/
  customtoken_001.jpg
  customtoken_001.txt  # "photo of customtoken person, studio portrait, natural skin texture"
  customtoken_002.png
  customtoken_002.txt  # "customtoken person in everyday clothing, outdoor natural light"
  customtoken_003.jpeg

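Captions are optional for image LoRA. If each filename is the custom token followed by a numeric index (for example customtoken_001.jpg), the training command infers the token automatically from the first image in sorted order.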
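To preview which token a given filename yields, the inference can be reproduced locally. This sketch copies the sanitize and strip-index steps from the training command's script; `infer_trigger_word` is a hypothetical wrapper name, and unlike the real command it takes a single filename instead of scanning the dataset for the first image.

```shell
#!/bin/sh
# Same rules as the training command: lowercase, replace runs of characters
# outside [a-z0-9_-] with "-", and trim leading/trailing "-".
sanitize_trigger_word() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9_-]+/-/g; s/^-+|-+$//g'
}

# Hypothetical wrapper: derive the token from one filename.
infer_trigger_word() {
  stem="$(basename "$1")"
  stem="${stem%.*}"                                          # drop the extension
  stem="$(printf '%s' "$stem" | sed -E 's/([_-]?[0-9]+)$//')" # drop a trailing index like _001
  word="$(sanitize_trigger_word "$stem")"
  printf '%s\n' "${word:-subject}"  # the training command also falls back to "subject"
}

infer_trigger_word customtoken_001.jpg
```

Because only a trailing index is stripped, a name such as customtoken_portrait_001.jpg yields customtoken_portrait, not customtoken.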

Advanced details: CLI, image, tracking, agent checks

Accepted inputs

  • Object Storage bucket mounted as /workspace/dataset with image files.
  • Optional same-name .txt captions for each image. Use filenames like customtoken_001.jpg so the trainer can infer the custom trigger token.
  • Optional FORGE_TRAIN_TRIGGER_WORD override when the token should not be inferred from filenames.
  • Optional W&B run tracking through WANDB_* environment variables.

Outputs

  • LoRA adapter weights, usually .safetensors, in your output bucket.
  • Generated sample images at the configured sample interval.
  • Training config and logs; W&B run history when WANDB_* is configured.
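It is worth confirming that these outputs actually landed before creating an endpoint. A minimal sketch, assuming you have synced or mounted the output bucket to a local folder: `verify_outputs` (a hypothetical helper) counts .safetensors files and sample images anywhere under that folder and fails if either is missing. It does not inspect file contents or assume a particular folder layout from the trainer.

```shell
#!/bin/sh
# Hypothetical post-run check on a local copy of the output bucket.
# Counts LoRA weight files and sample images anywhere under the folder.
verify_outputs() {
  out_dir="$1"
  weights=$(find "$out_dir" -type f -name '*.safetensors' | wc -l | tr -d ' ')
  samples=$(find "$out_dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) | wc -l | tr -d ' ')
  echo "safetensors=$weights samples=$samples"
  # Succeed only when weights and at least one sample image both exist.
  [ "$weights" -gt 0 ] && [ "$samples" -gt 0 ]
}

# Example: verify_outputs ./fine-tuned-output
```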

Optional tracking

WANDB_API_KEY (secret): Optional W&B API key. Store it in MysteryBox and pass it with --env-secret.

WANDB_PROJECT: Optional W&B project name for training progress and sample tracking.

WANDB_RUN_NAME: Optional W&B run name, e.g. flux-dev-subject-lora.

FORGE_TRAIN_TRIGGER_WORD: Optional FLUX trigger token override. When unset, the trainer infers it from image filenames such as customtoken_001.jpg.

HF_TOKEN (secret): Required for gated BFL checkpoints after you accept the model terms on Hugging Face.

Readiness checks

job -> output -> endpoint

Serverless job URL

ready

The Nebius Jobs create link is generated with the training image, GPU platform, preset, command, and dataset mount defaults.


Serverless endpoint URL

verify after run

The endpoint create link preloads the serving image and output mount. After training, attach the produced adapter/checkpoint and run a health check plus a representative sample request.


Input data guidance

ready

Dataset format, accepted input methods, and an example are present: Image folder with .jpg/.jpeg/.png files and optional same-name .txt captions.

Agent handoff

ready

Agent steps cover job creation, monitoring, output verification, endpoint smoke test, and user-facing closeout.

Full instructions

  1. This fine-tune runs in YOUR Nebius account on YOUR data. You own the produced weights and you pay for the GPUs; Forge does not run or bill this job.
  2. Authenticate the Nebius CLI to your account and project: run `nebius iam whoami` to confirm, and `nebius iam project list` to find your project/parent ID.
  3. Put your training data in your own bucket. Expected format: an image folder with .jpg/.jpeg/.png files and optional same-name .txt captions. Replace `s3://YOUR-BUCKET/your-training-data/` and `s3://YOUR-BUCKET/fine-tuned-output/` with bucket paths you own; Forge never sees your data or weights.
  4. Submit the training job in your account: run the `command_template` below as a Nebius Jobs workload, e.g. `nebius ai job create --parent-id <YOUR_PROJECT_ID> --image docker.io/ostris/aitoolkit@sha256:220d85e443589c6b52521c594a2d9f052d733afe360966d24bb8a5fe853745f7 --volume <DATASET_S3>:/workspace/dataset:ro --volume <OUTPUT_S3>:/workspace/output:rw ...`. Alternatively, open the Serverless Job console link, which preloads the image, GPU preset, and fine-tune command.
  5. The base model is `black-forest-labs/FLUX.1-dev` (fine-tuned with the `flux-lora` workload). Pick a GPU/preset that matches the fine-tune performance shown for this model in the training catalog.
  6. Secrets: FLUX.1-dev is gated, so accept the model terms on Hugging Face, store your `HF_TOKEN` in MysteryBox, and pass it via `--env-secret`. For optional W&B tracking, store your W&B API key in MysteryBox and pass `WANDB_API_KEY` via `--env-secret`; pass `WANDB_PROJECT` and `WANDB_RUN_NAME` via `--env` if you want named runs.
  7. Monitor the run in your account with `nebius ai job list --parent-id <YOUR_PROJECT_ID>` and `nebius ai job get <JOB_ID>`; the fine-tuned weights and samples land in your output bucket when it completes.
  8. After the job completes, create a Serverless Endpoint from the endpoint console link when available, mount or attach the output adapter/checkpoint, and verify a health check plus one representative sample request before considering the model ready for the user.
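When picking a GPU/preset, a rough training-time estimate is the configured step count times the measured time per step. This sketch assumes your run uses the same settings as the catalog benchmark and the 2000 steps in the default config; it ignores model download, latent caching, sample generation, and checkpoint saves, so treat the result as a lower bound. `estimate_minutes` is a hypothetical helper.

```shell
#!/bin/sh
# Lower-bound training time: steps x time per step (ms), reported in minutes.
estimate_minutes() {
  awk -v steps="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", steps * ms / 60000 }'
}

# Step times from the training performance table; 2000 = train.steps in the config.
for row in "B200 651" "B300 672" "H100 828" "L40S 1260"; do
  set -- $row  # split "GPU ms" into $1 and $2
  echo "$1: at least $(estimate_minutes 2000 "$2") min"
done
```

With these inputs the loop prints 21.7, 22.4, 27.6, and 42.0 minutes respectively.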

Agent instructions

  1. Fetch `/v1/training/models/{slug}` and use `jobs_handoff`; do not infer bucket paths, image refs, or commands from memory.
  2. Open or validate `jobs_handoff.console_url`; it must point to the Nebius Serverless Job create page and preload the training image, GPU preset, and command.
  3. Confirm the user supplied a dataset in their own bucket matching this format: an image folder with .jpg/.jpeg/.png files and optional same-name .txt captions. Replace `s3://YOUR-BUCKET/your-training-data/` and `s3://YOUR-BUCKET/fine-tuned-output/` with user-owned bucket URIs.
  4. Create the Nebius AI Job with image `docker.io/ostris/aitoolkit@sha256:220d85e443589c6b52521c594a2d9f052d733afe360966d24bb8a5fe853745f7`, base model `black-forest-labs/FLUX.1-dev`, and workload `flux-lora`; pass the required `HF_TOKEN` and any optional W&B secrets only through the user's approved secret mechanism.
  5. Monitor `nebius ai job get` and `nebius ai logs` until the run completes; verify that adapter/checkpoint files, samples, and logs exist in the output bucket.
  6. After the job writes weights to the output bucket, open `jobs_handoff.endpoint_console_url`, attach or mount that output, create the Serverless Endpoint in the user's project, and run a health check plus one representative sample request before telling the user it is ready.
  7. Return concise user instructions: where their dataset should live, where outputs were written, the endpoint URL/status, and how to reproduce or tune the run.
Nebius AI Jobs CLI
# Runs in YOUR Nebius account, on YOUR data — you own the weights and
# you pay for the GPUs. Forge does not run this job; this just starts it.
# Uses Nebius AI Jobs CLI (`nebius ai job create`).
# Fill in these customer-owned values before running:
#   FORGE_NEBIUS_PROJECT_ID: your Nebius project / parent ID.
#   FORGE_TRAIN_PLATFORM/FORGE_TRAIN_PRESET: pick GPU resources available in your project.
#   FORGE_TRAIN_DATASET_URI: point this at your image folder, e.g. s3://my-bucket/flux-subject-lora/.
#   FORGE_TRAIN_OUTPUT_URI: bucket path where trained weights are written.
# Verify the command starts a user-data fine-tune, not a benchmark/probe.
# After completion: verify output artifacts, create the Serverless Endpoint,
# then run endpoint health and one representative sample request.
# Training environment (HF_TOKEN is required for the gated FLUX.1-dev weights; the rest are optional):
#   --env-secret WANDB_API_KEY=... Optional W&B API key. Store it in MysteryBox and pass it with --env-secret.
#   --env WANDB_PROJECT=... Optional W&B project name for training progress and sample tracking.
#   --env WANDB_RUN_NAME=... Optional W&B run name, e.g. flux-dev-subject-lora.
#   --env FORGE_TRAIN_TRIGGER_WORD=... Optional FLUX trigger token override. When unset, the trainer infers it from image filenames such as customtoken_001.jpg.
#   --env-secret HF_TOKEN=... Required for gated BFL checkpoints after you accept the model terms on Hugging Face.
export FORGE_NEBIUS_PROJECT_ID="YOUR_PROJECT_ID"
export FORGE_TRAIN_PLATFORM="YOUR_GPU_PLATFORM"
export FORGE_TRAIN_PRESET="YOUR_GPU_PRESET"
export FORGE_TRAIN_JOB_NAME="forge-fine-tune"
export FORGE_TRAIN_DATASET_URI="s3://my-bucket/flux-subject-lora/"
export FORGE_TRAIN_OUTPUT_URI="s3://my-bucket/outputs/"
FORGE_TRAIN_COMMAND='set -eu
mkdir -p /workspace/config /workspace/dataset /workspace/output
# Dataset source: '"$FORGE_TRAIN_DATASET_URI"' (mounted at /workspace/dataset by the Jobs CLI).
# Output destination: '"$FORGE_TRAIN_OUTPUT_URI"' (mounted at /workspace/output by the Jobs CLI).
DATASET_IMAGE_ROOT="/workspace/dataset"
if [ -d /workspace/dataset/target ]; then DATASET_IMAGE_ROOT="/workspace/dataset/target"; fi
sanitize_trigger_word() {
  printf '\''%s'\'' "$1" | tr '\''[:upper:]'\'' '\''[:lower:]'\'' | sed -E '\''s/[^a-z0-9_-]+/-/g; s/^-+|-+$//g'\''
}
TRIGGER_WORD="$(sanitize_trigger_word "${FORGE_TRAIN_TRIGGER_WORD:-}")"
if [ -z "$TRIGGER_WORD" ]; then
  FIRST_IMAGE="$(find "$DATASET_IMAGE_ROOT" -maxdepth 2 -type f \( -iname '\''*.jpg'\'' -o -iname '\''*.jpeg'\'' -o -iname '\''*.png'\'' \) | sort | head -n 1 || true)"
  if [ -n "$FIRST_IMAGE" ]; then
    STEM="$(basename "$FIRST_IMAGE")"
    STEM="${STEM%.*}"
    STEM="$(printf '\''%s'\'' "$STEM" | sed -E '\''s/([_-]?[0-9]+)$//'\'')"
    TRIGGER_WORD="$(sanitize_trigger_word "$STEM")"
  fi
fi
if [ -z "$TRIGGER_WORD" ]; then TRIGGER_WORD="subject"; fi
export FORGE_TRAIN_TRIGGER_WORD="$TRIGGER_WORD"
echo "Using FLUX trigger token: ${FORGE_TRAIN_TRIGGER_WORD}"
for candidate in /app/ai-toolkit /workspace/ai-toolkit /root/ai-toolkit /ai-toolkit /app /workspace; do
  if [ -f "$candidate/run.py" ]; then cd "$candidate"; break; fi
done
test -f run.py
cat > /workspace/config/forge-flux-lora.yaml <<YAML
---
job: extension
config:
  name: "forge_black_forest_labs_flux_1_dev_lora"
  process:
    - type: '\''sd_trainer'\''
      training_folder: "/workspace/output"
      performance_log_every: 50
      device: cuda:0
      trigger_word: "${FORGE_TRAIN_TRIGGER_WORD}"
      network:
        type: "lora"
        linear: 16
        linear_alpha: 16
      save:
        dtype: float16
        save_every: 250
        max_step_saves_to_keep: 4
        push_to_hub: false
      datasets:
        - folder_path: "${DATASET_IMAGE_ROOT}"
          caption_ext: "txt"
          caption_dropout_rate: 0.05
          shuffle_tokens: false
          cache_latents_to_disk: true
          resolution: [512, 768, 1024]
      train:
        batch_size: 1
        steps: 2000
        gradient_accumulation_steps: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: "flowmatch"
        optimizer: "adamw8bit"
        lr: 1e-4
        ema_config:
          use_ema: true
          ema_decay: 0.99
        dtype: bf16
      model:
        name_or_path: "black-forest-labs/FLUX.1-dev"
        is_flux: true
        quantize: true
      sample:
        sampler: "flowmatch"
        sample_every: 250
        width: 1024
        height: 1024
        prompts:
          - "photo of ${FORGE_TRAIN_TRIGGER_WORD} person, studio portrait, natural skin texture"
          - "${FORGE_TRAIN_TRIGGER_WORD} person in everyday clothing, outdoor natural light"
        neg: ""
        seed: 42
        walk_seed: true
        guidance_scale: 4
        sample_steps: 20
meta:
  name: "FLUX.1 dev LoRA"
  version: '\''1.0'\''
YAML
python run.py /workspace/config/forge-flux-lora.yaml'

nebius ai job create \
  --parent-id "$FORGE_NEBIUS_PROJECT_ID" \
  --name "$FORGE_TRAIN_JOB_NAME" \
  --platform "$FORGE_TRAIN_PLATFORM" \
  --preset "$FORGE_TRAIN_PRESET" \
  --image 'docker.io/ostris/aitoolkit@sha256:220d85e443589c6b52521c594a2d9f052d733afe360966d24bb8a5fe853745f7' \
  --volume "$FORGE_TRAIN_DATASET_URI":/workspace/dataset:ro \
  --volume "$FORGE_TRAIN_OUTPUT_URI":/workspace/output:rw \
  --container-command "/bin/sh" \
  --args "-lc \"$FORGE_TRAIN_COMMAND\""
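Before submitting, a quick check can catch values still left as placeholders. A minimal sketch, not part of the Nebius CLI: `preflight` (hypothetical) reads the FORGE_* variables exported above and flags empty values, `YOUR_*` placeholders, and the `s3://my-bucket/` or `s3://YOUR-BUCKET/` examples. As an assumption of this sketch only, it also expects the dataset URI to be a folder ending in `/` rather than a single file.

```shell
#!/bin/sh
# Hypothetical preflight for the FORGE_* variables (not a Nebius tool).
preflight() {
  status=0
  for name in FORGE_NEBIUS_PROJECT_ID FORGE_TRAIN_PLATFORM FORGE_TRAIN_PRESET \
              FORGE_TRAIN_DATASET_URI FORGE_TRAIN_OUTPUT_URI; do
    eval "value=\${$name:-}"  # indirect lookup of the variable named in $name
    case "$value" in
      "" | YOUR_* | s3://my-bucket/* | s3://YOUR-BUCKET/*)
        echo "unset or placeholder: $name" >&2
        status=1 ;;
    esac
  done
  # Image LoRA trains on a folder, so expect a trailing slash (sketch assumption).
  case "${FORGE_TRAIN_DATASET_URI:-}" in
    s3://*/) ;;
    *) echo "FORGE_TRAIN_DATASET_URI should be an s3:// folder ending in /" >&2
       status=1 ;;
  esac
  return "$status"
}

# Example: preflight && nebius ai job create ...
```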
Training image

docker.io/ostris/aitoolkit@sha256:220d85e443589c6b52521c594a2d9f052d733afe360966d24bb8a5fe853745f7

How we measure

Throughput is reported in the model’s relevant unit: tokens/sec for text fine-tunes and images/sec for image LoRA runs. Time/step is the wall-clock time for one optimizer step. FLOP utilization compares the benchmark’s estimated achieved TFLOP/s with the GPU’s peak TFLOP/s for that precision mode. All values are measured on Forge GPUs for the listed workload (LoRA, SFT, FLUX LoRA, …). Each model is onboarded only after a real benchmark run; cells marked In progress are still being measured and show no number until verified. The "fastest" label marks the GPU with the highest measured throughput.
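The throughput and time-per-step columns are related. Assuming one image per optimizer step (batch_size 1 and gradient_accumulation_steps 1, as in the config above; whether the benchmark used exactly this config is an assumption), throughput in img/s is about 1000 divided by the time per step in milliseconds. Every measured row in the table agrees with this to two decimal places. `throughput_from_step_ms` is a hypothetical helper.

```shell
#!/bin/sh
# Throughput implied by time per step, assuming one image per optimizer step.
throughput_from_step_ms() {
  awk -v ms="$1" 'BEGIN { printf "%.2f\n", 1000 / ms }'
}

throughput_from_step_ms 651   # B200: prints 1.54 (table: 1.54 img/s)
throughput_from_step_ms 1260  # L40S: prints 0.79 (table: 0.79 img/s)
```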