Fine-tune OLMo-2 1B Instruct from your own dataset bucket. Compare measured GPU performance, open a preloaded Serverless Job, choose your dataset and output buckets, and start training in your Nebius account.
Fastest measured GPU: H100.
GPU types with results or in progress.
Per-GPU fine-tune throughput and time-per-step, measured on Forge GPUs. Pick a target before you start; cells still being measured show as in progress.
| GPU | Region | Workload | Status | Throughput | Time / step | FLOP util. |
|---|---|---|---|---|---|---|
| H100 (fastest) | eu-north1 | lora | Measured | 8,679 tok/s | 59 ms | — |
| L40S | eu-north1 | lora | Measured | 6,617 tok/s | 77 ms | — |
Throughput is reported in the model's relevant unit: tokens/sec for text fine-tunes and images/sec for image LoRA runs. Time/step is the wall-clock time for one optimizer step. FLOP utilization compares the benchmark's estimated achieved TFLOP/s with the GPU's peak TFLOP/s for the precision mode used. All figures are measured on Forge GPUs for the listed workload (LoRA, SFT, FLUX LoRA, …). Each model is onboarded only after a real benchmark run; cells marked as in progress are still being measured and show no number until verified. The "fastest" label marks the GPU with the highest measured throughput.
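As a rough consistency check on the table, multiplying throughput by time/step gives the approximate number of tokens processed per optimizer step. This assumes both figures come from the same steps, which the page does not state. The sketch below does that arithmetic with awk for the two measured rows. `tokens_per_step`, `speedup`, and `flop_util` are illustrative helper names, not Forge tools, and `flop_util` only restates the ratio defined above, since the page does not publish the achieved or peak TFLOP/s values.

```bash
#!/usr/bin/env bash
# Sketch: simple arithmetic on the measured rows in the table above.
# Assumption: throughput and time/step were measured over the same steps, so
#   tokens per optimizer step ~= throughput (tok/s) * time per step (s).
# Time/step is rounded to whole milliseconds in the table, so results are approximate.

tokens_per_step() {  # $1 = throughput in tok/s, $2 = time per step in ms
  awk -v tps="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", tps * ms / 1000 }'
}

speedup() {  # ratio $1 / $2, printed with two decimals
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

flop_util() {  # $1 = achieved TFLOP/s, $2 = peak TFLOP/s for the precision mode
  awk -v a="$1" -v p="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / p }'
}

# GPU, throughput (tok/s), time/step (ms), copied from the table.
while read -r gpu tps ms; do
  echo "$gpu: ~$(tokens_per_step "$tps" "$ms") tokens/step"
done <<'EOF'
H100 8679 59
L40S 6617 77
EOF

echo "H100 vs L40S throughput: $(speedup 8679 6617)x"
```

Run as-is, it prints about 512 tokens/step for H100, about 510 for L40S, and a 1.31x throughput ratio. The two per-step figures agree within the rounding of time/step, which suggests (but does not confirm) that both GPUs ran the same step configuration.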
Serverless training
Open Nebius with the training image, GPU preset, and command preloaded. Then choose your dataset bucket and output bucket in your account.
1. Upload data: put images or records in your Object Storage bucket.
2. Open job: use the preloaded Serverless Job form.
3. Start training: select your dataset and output buckets, then run the job.
Training data: JSONL (one JSON object per line) in either chat or prompt/completion form.
Example dataset path: s3://my-bucket/llm-lora/train.jsonl
{"messages":[{"role":"user","content":"Summarize this ticket"},{"role":"assistant","content":"Short summary..."}]}
{"prompt":"Classify this support request","completion":"billing"}
Captions are optional for image LoRA. If filenames start with a custom token, the training command can infer it automatically.
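Before uploading, a quick local check can catch lines that match neither of the two JSONL shapes shown above. `check_jsonl_shape` is a hypothetical bash helper, not part of Forge or Nebius. It matches the top-level keys with regular expressions and does not parse JSON, so a line that passes can still be malformed; a real JSON parser such as `jq` is the stricter option.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload helper (not part of Forge or Nebius).
# Heuristic only: it matches top-level keys with regular expressions and does NOT
# parse JSON, so a line that passes can still be malformed.
check_jsonl_shape() {  # $1 = path to a local .jsonl file
  local file="$1" n=0 bad=0 line
  local chat_re='"messages"[[:space:]]*:[[:space:]]*\['   # {"messages":[...]}
  local prompt_re='"prompt"[[:space:]]*:'                 # {"prompt":...,
  local completion_re='"completion"[[:space:]]*:'         #  "completion":...}
  # The "|| [ -n "$line" ]" part also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    [ -z "$line" ] && continue  # skip blank lines
    if [[ $line != '{'*'}' ]]; then
      echo "line $n: not a single {...} object" >&2
      bad=$((bad + 1))
    elif [[ $line =~ $chat_re ]]; then
      :  # chat format
    elif [[ $line =~ $prompt_re && $line =~ $completion_re ]]; then
      :  # prompt/completion format
    else
      echo "line $n: expected \"messages\" or \"prompt\" + \"completion\"" >&2
      bad=$((bad + 1))
    fi
  done < "$file"
  echo "$n lines checked, $bad problem(s)"
  [ "$bad" -eq 0 ]  # exit status: 0 only if every non-blank line looked right
}
```

For example, `check_jsonl_shape train.jsonl` reports each problem line on stderr, prints a summary line on stdout, and returns non-zero if any line failed, so it can gate an upload with `&&`.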
- Serverless job URL (ready): the Nebius Jobs create link is generated with the training image, GPU platform, preset, command, and dataset mount defaults.
- Serverless endpoint URL (verify after run): the endpoint create link preloads the serving image and output mount. After training, attach the produced adapter/checkpoint, then run a health check and one representative sample request.
- Input data guidance (ready): the dataset format, accepted input methods, and an example are provided (JSONL chat or prompt/completion training data).
- Agent handoff (ready): agent steps cover job creation, monitoring, output verification, an endpoint smoke test, and user-facing closeout.
# Runs in YOUR Nebius account, on YOUR data: you own the weights and
# you pay for the GPUs. Forge does not run this job; this script only starts it.
# Uses the Nebius AI Jobs CLI (`nebius ai job create`).
# Fill in these customer-owned values before running:
#   FORGE_NEBIUS_PROJECT_ID: your Nebius project / parent ID.
#   FORGE_TRAIN_PLATFORM / FORGE_TRAIN_PRESET: pick GPU resources available in your project.
#   FORGE_TRAIN_DATASET_URI: point this at your bucket, e.g. s3://my-bucket/train.jsonl.
#   FORGE_TRAIN_OUTPUT_URI: bucket path where trained weights are written.
# Verify the command starts a user-data fine-tune, not a benchmark/probe.
# After completion: verify output artifacts, create the Serverless Endpoint,
# then run an endpoint health check and one representative sample request.
export FORGE_NEBIUS_PROJECT_ID="YOUR_PROJECT_ID"
export FORGE_TRAIN_PLATFORM="YOUR_GPU_PLATFORM"
export FORGE_TRAIN_PRESET="YOUR_GPU_PRESET"
export FORGE_TRAIN_JOB_NAME="forge-fine-tune"
export FORGE_TRAIN_DATASET_URI="s3://my-bucket/train.jsonl"  # <-- point this at YOUR OWN bucket
export FORGE_TRAIN_OUTPUT_URI="s3://my-bucket/outputs/"      # <-- your bucket; you own the weights

# Keep comments out of this quoted string: the job's shell would treat a "#"
# as a comment, swallow the trailing "\", and split the command in two.
FORGE_TRAIN_COMMAND='python -m forge_finetune \
  --base-model allenai/OLMo-2-0425-1B-Instruct \
  --method lora \
  --dataset '"$FORGE_TRAIN_DATASET_URI"' \
  --output '"$FORGE_TRAIN_OUTPUT_URI"

nebius ai job create \
  --parent-id "$FORGE_NEBIUS_PROJECT_ID" \
  --name "$FORGE_TRAIN_JOB_NAME" \
  --platform "$FORGE_TRAIN_PLATFORM" \
  --preset "$FORGE_TRAIN_PRESET" \
  --image 'cr.eu-north1.nebius.cloud/e00h91c5sa606xfwpj/forge-finetune:training-flop-util-74f0a06c@sha256:77640f8f47850193a9cb98678a1fb95056b9e75e46050d5c948c76d6bc14eaa3' \
  --volume "$FORGE_TRAIN_DATASET_URI":/workspace/dataset:ro \
  --volume "$FORGE_TRAIN_OUTPUT_URI":/workspace/output:rw \
  --container-command "/bin/sh" \
  --args "-lc \"$FORGE_TRAIN_COMMAND\""
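It is easy to submit the job with a `YOUR_*` placeholder still in place. The hypothetical `forge_preflight` function below (not part of Forge or the Nebius CLI) checks the variables the job script reads before you call `nebius ai job create`: each must be non-empty and must not still start with `YOUR_`. It also assumes, based only on the example values above, that the dataset and output URIs are written as `s3://` paths.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight guard for the job script (not part of Forge or the Nebius CLI).
forge_preflight() {
  local name value status=0
  # Every variable the job script reads must be set and must not be a YOUR_* placeholder.
  for name in FORGE_NEBIUS_PROJECT_ID FORGE_TRAIN_PLATFORM FORGE_TRAIN_PRESET \
              FORGE_TRAIN_JOB_NAME FORGE_TRAIN_DATASET_URI FORGE_TRAIN_OUTPUT_URI; do
    value="${!name:-}"  # bash indirect expansion: the value of the variable named by $name
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      status=1
    elif [[ $value == YOUR_* ]]; then
      echo "placeholder not replaced: $name=$value" >&2
      status=1
    fi
  done
  # Assumption: both URIs are Object Storage paths written as s3://bucket/...
  for name in FORGE_TRAIN_DATASET_URI FORGE_TRAIN_OUTPUT_URI; do
    value="${!name:-}"
    if [ -n "$value" ] && [[ $value != s3://* ]]; then
      echo "not an s3:// URI: $name=$value" >&2
      status=1
    fi
  done
  return "$status"
}
```

Used as `forge_preflight && nebius ai job create ...` (with the same flags as in the script), the job is only submitted once every check passes; each failing variable is reported on stderr.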