NebiusForge
Healthcare / Life Science · nvidia_nim · NVIDIA AI Product/NIM; NVIDIA Open Model; Llama 3.3 Community License · Added May 25, 2026

Llama 3.1 Nemotron Nano 8B Healthcare Text2SQL

NVIDIA's Llama 3.1 Nemotron Nano 8B Healthcare Text2SQL NIM translates natural-language healthcare analytics questions plus DDL into SQL.

Inputs → outputs: text, schema → sql, text
Tags: healthcare, clinical-analytics, text-to-sql, llm, +5 more
Selected target: H200 in eu-north2. Using your requested verified target. Playground and API docs links stay pinned to it until you choose another GPU or region.
VRAM needed        41.0 GB (observed working set on a supported GPU)
API route          POST /v1/chat/completions
Weights dtype      BF16
Pulled image size  9.6 GB

After the last request a backend stays warm on its GPU for about 15 minutes, then frees the GPU. The next request triggers a fresh cold start.
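
To reason about whether the next request is likely to pay for a cold start, the idle window described above can be sketched as a small client-side check. This is only an approximation: the 15-minute figure is stated as "about" 15 minutes, the helper name `likely_cold` is hypothetical, and the runtime status endpoint remains the authoritative source.

```python
from datetime import datetime, timedelta, timezone

# Approximate idle window taken from the description above; the platform may differ.
WARM_WINDOW = timedelta(minutes=15)


def likely_cold(last_request_at, now=None):
    """Guess whether the backend has probably released its GPU (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    if last_request_at is None:
        # No request has been sent from this client yet, so assume cold.
        return True
    return now - last_request_at >= WARM_WINDOW
```

A client could use this to decide whether to prewarm before latency-sensitive traffic, or simply to allow a longer timeout on the next call.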

Status: cold (not running)
API target

nvidia-healthcare-text2sql version nim-1-15-1-candidate on H200 in eu-north2

Use the model field with the OpenAI-compatible SDK or API docs curl snippets.

API route      /v1/chat/completions
HTTP method    POST
Model field    nvidia-healthcare-text2sql
Version field  model_version: nim-1-15-1-candidate
GPU field      gpu_type: H200
Region field   region: eu-north2
  1. Verify target: Runs the auth guard and selected endpoint/model/routing check.
  2. Validate target: Confirm the selected GPU or region is still verified, or print copyable best-target exports.
  3. Estimate run: Validate warm and first-cold request cost before prewarming or first traffic.
  4. Check runtime: Confirm whether the selected version is warm or starting.
  5. Prewarm target: Start the selected version on its pinned GPU or region before latency-sensitive traffic.
  6. Open docs: Use the selected target snippets for the first request.
One-block API check

Terminal-ready smoke test for this selected target.

set -euo pipefail
# Forge API smoke test
# Forge selected target: route=/v1/chat/completions model=nvidia-healthcare-text2sql version=nim-1-15-1-candidate gpu=H200 region=eu-north2
FORGE_API_BASE=${FORGE_API_BASE:-'https://YOUR_FORGE_HOST'}
export MODEL_OR_FAMILY_SLUG=${MODEL_OR_FAMILY_SLUG:-'nvidia-healthcare-text2sql'}
export FORGE_MODEL_VERSION=${FORGE_MODEL_VERSION:-'nim-1-15-1-candidate'}
export FORGE_GPU_TYPE=${FORGE_GPU_TYPE:-'H200'}
export FORGE_REGION=${FORGE_REGION:-'eu-north2'}
case "${FORGE_API_KEY:-}" in
  ""|replace-with-your-forge-api-key)
    echo 'Set FORGE_API_KEY to a real Forge API key before running this snippet; browser SSO sessions are not sent to copied curl or SDK clients.' >&2
    exit 1
    ;;
esac
forge_api_url() {
  endpoint="$1"
  base="${FORGE_API_BASE%/}"
  case "$base:$endpoint" in
    */v1:/v1|*/v1:/v1/*|*/v1:/v1\?*) printf '%s%s\n' "$base" "${endpoint#/v1}" ;;
    *) printf '%s%s\n' "$base" "$endpoint" ;;
  esac
}
python3 - <<'PY' |
import json
import os

payload = {
    "model": os.environ["MODEL_OR_FAMILY_SLUG"],
    "messages": [
        {"role": "user", "content": "Write a one sentence status update."},
    ],
}
model_version = os.environ.get("FORGE_MODEL_VERSION")
if model_version:
    payload["model_version"] = model_version
gpu_type = os.environ.get("FORGE_GPU_TYPE", "H200")
if gpu_type:
    payload["gpu_type"] = gpu_type
region = os.environ.get("FORGE_REGION", "eu-north2")
if region:
    payload["region"] = region
print(json.dumps(payload))
PY
curl -sS --fail-with-body "$(forge_api_url '/v1/chat/completions')" \
  --max-time "${FORGE_REQUEST_TIMEOUT_SECONDS:-600}" \
  -X POST \
  -H "Authorization: Bearer ${FORGE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @- | \
python3 -c 'import json, sys

data = json.load(sys.stdin)
message = (((data.get("choices") or [{}])[0].get("message") or {}).get("content"))
if message:
    print(message)
else:
    print(json.dumps(data, indent=2))'
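
The `forge_api_url` shell helper in the smoke test avoids a doubled `/v1` when `FORGE_API_BASE` already ends in `/v1`. For clients written in Python, the same joining rule might look like the sketch below. It is a port of the shell logic as read from the snippet, not an official Forge client function.

```python
def forge_api_url(base, endpoint):
    """Join a base URL and an endpoint without repeating the /v1 prefix."""
    # Mirror ${FORGE_API_BASE%/}: drop a single trailing slash.
    if base.endswith("/"):
        base = base[:-1]
    # If the base already ends in /v1, strip the leading /v1 from the endpoint.
    if base.endswith("/v1") and (
        endpoint == "/v1"
        or endpoint.startswith("/v1/")
        or endpoint.startswith("/v1?")
    ):
        endpoint = endpoint[len("/v1"):]
    return base + endpoint


print(forge_api_url("https://YOUR_FORGE_HOST/v1", "/v1/chat/completions"))
```

With either `https://YOUR_FORGE_HOST` or `https://YOUR_FORGE_HOST/v1` as the base, the result is the same request URL.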
Client fit

OpenAI SDK route · Best for chat, completions, embeddings, and clients that already support an OpenAI-compatible base URL.

Routing pinned

Copied snippets include gpu_type and region, so the first request targets this verified GPU and region. Remove those fields to let Forge choose another compatible target.
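
As a rough illustration of the difference between a pinned and an unpinned request, the sketch below builds both request bodies. The field names `model_version`, `gpu_type`, and `region` come from the snippets on this page; the helper `build_payload` is hypothetical.

```python
import json


def build_payload(model, prompt, model_version=None, gpu_type=None, region=None):
    """Build a chat completions body; routing fields are added only when set."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    optional = (
        ("model_version", model_version),
        ("gpu_type", gpu_type),
        ("region", region),
    )
    for key, value in optional:
        if value:
            payload[key] = value
    return payload


# Pinned: the request targets the verified H200 in eu-north2.
pinned = build_payload(
    "nvidia-healthcare-text2sql",
    "Write a one sentence status update.",
    model_version="nim-1-15-1-candidate",
    gpu_type="H200",
    region="eu-north2",
)
# Unpinned: without gpu_type and region, Forge chooses a compatible target.
unpinned = build_payload(
    "nvidia-healthcare-text2sql",
    "Write a one sentence status update.",
    model_version="nim-1-15-1-candidate",
)
print(json.dumps(pinned, indent=2))
```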

Target availability

8 free GPUs · Live capacity for H200 in eu-north2.

Request URL
https://YOUR_FORGE_HOST/v1/chat/completions
Authentication

Client auth: Set FORGE_API_KEY to a real Forge API key before running copied curl, fetch, or SDK snippets. Browser SSO only authenticates this web session.

Authorization: Bearer $FORGE_API_KEY
Pinned setup
export FORGE_API_BASE='https://YOUR_FORGE_HOST'
export FORGE_API_KEY="${FORGE_API_KEY:-replace-with-your-forge-api-key}"
export FORGE_REQUEST_TIMEOUT_SECONDS="${FORGE_REQUEST_TIMEOUT_SECONDS:-600}"
export FORGE_API_ROUTE='/v1/chat/completions'
export FORGE_OPENAI_BASE_URL='https://YOUR_FORGE_HOST/v1'
export MODEL_OR_FAMILY_SLUG='nvidia-healthcare-text2sql'
export FORGE_MODEL_VERSION='nim-1-15-1-candidate'
export FORGE_GPU_TYPE='H200'
export FORGE_REGION='eu-north2'
Project .env

Copy these values into a local .env file when moving the selected target into an app or SDK client.

# Forge selected target: route=/v1/chat/completions model=nvidia-healthcare-text2sql version=nim-1-15-1-candidate gpu=H200 region=eu-north2
FORGE_API_BASE="https://YOUR_FORGE_HOST"
FORGE_API_ROUTE="/v1/chat/completions"
FORGE_OPENAI_BASE_URL="https://YOUR_FORGE_HOST/v1"
FORGE_API_KEY="replace-with-your-forge-api-key"
FORGE_REQUEST_TIMEOUT_SECONDS="600"
MODEL_OR_FAMILY_SLUG="nvidia-healthcare-text2sql"
FORGE_MODEL_VERSION="nim-1-15-1-candidate"
FORGE_GPU_TYPE="H200"
FORGE_REGION="eu-north2"
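
If the app does not already use a dotenv library, a file in this format can be read with a few lines of standard-library Python. The sketch below is a minimal parser for simple `KEY="value"` lines like the ones above. It does not handle multi-line values, escapes, or variable expansion, and the names `parse_env_text` and `apply_env` are hypothetical.

```python
import os


def parse_env_text(text):
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def apply_env(values, environ=os.environ):
    """Copy parsed values into the environment without overriding existing ones."""
    for key, value in values.items():
        environ.setdefault(key, value)
```

Typical use would be `apply_env(parse_env_text(open(".env").read()))` before constructing the client, so that values already exported in the shell take precedence over the file.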
Project .gitignore

Add these rules before replacing the placeholder API key so local Forge secrets stay out of commits while .env.example can remain tracked.

# Forge local API secrets
.env
.env.*
!.env.example
Preflight URLs and commands
Run estimate URL
https://YOUR_FORGE_HOST/v1/models/nvidia-healthcare-text2sql/run-estimate?model_version=nim-1-15-1-candidate&gpu_type=H200&region=eu-north2
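
The run estimate URL above can be assembled programmatically when the target changes. The sketch below reproduces that URL shape with `urllib.parse`. It assumes the base URL has no `/v1` suffix, and the helper name `run_estimate_url` is hypothetical; the response format is not shown on this page, so the sketch stops at building the URL.

```python
from urllib.parse import quote, urlencode


def run_estimate_url(base, model, model_version=None, gpu_type=None, region=None):
    """Build the run-estimate URL for a model and optional target filters."""
    candidates = (
        ("model_version", model_version),
        ("gpu_type", gpu_type),
        ("region", region),
    )
    params = {key: value for key, value in candidates if value}
    url = base.rstrip("/") + "/v1/models/" + quote(model, safe="") + "/run-estimate"
    if params:
        url += "?" + urlencode(params)
    return url


print(
    run_estimate_url(
        "https://YOUR_FORGE_HOST",
        "nvidia-healthcare-text2sql",
        model_version="nim-1-15-1-candidate",
        gpu_type="H200",
        region="eu-north2",
    )
)
```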
Selected target reliability
set -euo pipefail
# Forge selected target: route=/v1/chat/completions model=nvidia-healthcare-text2sql version=nim-1-15-1-candidate gpu=H200 region=eu-north2
FORGE_API_BASE=${FORGE_API_BASE:-'https://YOUR_FORGE_HOST'}
export MODEL_OR_FAMILY_SLUG=${MODEL_OR_FAMILY_SLUG:-'nvidia-healthcare-text2sql'}
export FORGE_MODEL_VERSION=${FORGE_MODEL_VERSION:-'nim-1-15-1-candidate'}
export FORGE_GPU_TYPE=${FORGE_GPU_TYPE:-'H200'}
export FORGE_REGION=${FORGE_REGION:-'eu-north2'}
case "${FORGE_API_KEY:-}" in
  ""|replace-with-your-forge-api-key)
    echo 'Set FORGE_API_KEY to a real Forge API key before running this snippet; browser SSO sessions are not sent to copied curl or SDK clients.' >&2
    exit 1
    ;;
esac
forge_api_url() {
  endpoint="$1"
  base="${FORGE_API_BASE%/}"
  case "$base:$endpoint" in
    */v1:/v1|*/v1:/v1/*|*/v1:/v1\?*) printf '%s%s\n' "$base" "${endpoint#/v1}" ;;
    *) printf '%s%s\n' "$base" "$endpoint" ;;
  esac
}
reliability_path="$(python3 -c 'import os
from urllib.parse import quote, urlencode

model = os.environ.get("MODEL_OR_FAMILY_SLUG", "").strip()
if not model:
    raise SystemExit("Set MODEL_OR_FAMILY_SLUG from search or route finder output before checking reliability.")
params = {}
model_version = os.environ.get("FORGE_MODEL_VERSION", "").strip()
if model_version:
    params["model_version"] = model_version
gpu_type = os.environ.get("FORGE_GPU_TYPE", "").strip()
if gpu_type:
    params["gpu_type"] = gpu_type
region = os.environ.get("FORGE_REGION", "").strip()
if region:
    params["region"] = region
path = "/v1/models/" + quote(model, safe="") + "/reliability"
if params:
    path += "?" + urlencode(params)
print(path)')"
curl -sS --fail-with-body "$(forge_api_url "$reliability_path")" \
  --max-time "${FORGE_REQUEST_TIMEOUT_SECONDS:-600}" \
  -H "Authorization: Bearer ${FORGE_API_KEY}" | \
python3 -c 'import json, shlex, sys

payload = json.load(sys.stdin)
print(
    f"{payload.get('\''slug'\'')} reliability={payload.get('\''reliability_status'\'')} "
    f"supported={payload.get('\''supported_rows'\'', 0)}/{payload.get('\''total_rows'\'', 0)}"
)
filters = payload.get("filters") or {}
if filters:
    print("filters: " + ", ".join(f"{key}={value}" for key, value in filters.items()))

def describe_target(target):
    details = []
    request_ms = target.get("request_ms_p50") or target.get("request_ms")
    if request_ms is not None:
        details.append(f"p50={request_ms}ms")
    warm_cost = target.get("estimated_warm_request_cost_usd")
    if warm_cost is not None:
        details.append(f"warm_cost_usd={warm_cost}")
    elif target.get("cost_per_gpu_hour_usd") is not None:
        details.append(f"gpu_hour_usd={target['\''cost_per_gpu_hour_usd'\'']}")
    success_rate = target.get("observed_success_rate")
    if isinstance(success_rate, (int, float)):
        details.append(f"success={success_rate:.0%}")
    return ", ".join(details) or target.get("status") or "supported"

exports = {}
for label, key in (
    ("fastest supported", "fastest_supported_target"),
    ("lowest-cost supported", "lowest_cost_supported_target"),
):
    target = payload.get(key) or {}
    gpu_type = target.get("gpu_type")
    if not gpu_type:
        continue
    identity = (str(gpu_type), str(target.get("region") or ""))
    exports.setdefault(identity, {"labels": [], "target": target})["labels"].append(label)
if not exports:
    print("No supported GPU/region target returned.", file=sys.stderr)
    print(json.dumps({
        "status_counts": payload.get("status_counts", {}),
        "failure_reason_counts": payload.get("failure_reason_counts", {}),
    }, indent=2))
    raise SystemExit(1)
for (gpu_type, region), entry in exports.items():
    assignments = [f"FORGE_GPU_TYPE={shlex.quote(gpu_type)}"]
    if region:
        assignments.append(f"FORGE_REGION={shlex.quote(region)}")
    labels = " + ".join(entry["labels"])
    details = describe_target(entry["target"])
    print(f"export {'\'' '\''.join(assignments)} # {labels}: {details}")'
Runtime status URL
https://YOUR_FORGE_HOST/v1/model-families/nvidia-healthcare-text2sql/status?version=nim-1-15-1-candidate
Runtime warmup command
set -euo pipefail
# Forge selected target: route=/v1/chat/completions model=nvidia-healthcare-text2sql version=nim-1-15-1-candidate gpu=H200 region=eu-north2
FORGE_API_BASE=${FORGE_API_BASE:-'https://YOUR_FORGE_HOST'}
export MODEL_OR_FAMILY_SLUG=${MODEL_OR_FAMILY_SLUG:-'nvidia-healthcare-text2sql'}
export FORGE_MODEL_VERSION=${FORGE_MODEL_VERSION:-'nim-1-15-1-candidate'}
export FORGE_GPU_TYPE=${FORGE_GPU_TYPE:-'H200'}
export FORGE_REGION=${FORGE_REGION:-'eu-north2'}
export FORGE_KEEP_WARM=${FORGE_KEEP_WARM:-false}
case "${FORGE_API_KEY:-}" in
  ""|replace-with-your-forge-api-key)
    echo 'Set FORGE_API_KEY to a real Forge API key before running this snippet; browser SSO sessions are not sent to copied curl or SDK clients.' >&2
    exit 1
    ;;
esac
forge_api_url() {
  endpoint="$1"
  base="${FORGE_API_BASE%/}"
  case "$base:$endpoint" in
    */v1:/v1|*/v1:/v1/*|*/v1:/v1\?*) printf '%s%s\n' "$base" "${endpoint#/v1}" ;;
    *) printf '%s%s\n' "$base" "$endpoint" ;;
  esac
}
runtime_start_path="$(python3 -c 'import os
from urllib.parse import quote

model = os.environ.get("MODEL_OR_FAMILY_SLUG", "").strip()
if not model:
    raise SystemExit("Set MODEL_OR_FAMILY_SLUG from the model picker output")
print("/v1/model-families/" + quote(model, safe="") + "/start")')"
python3 -c 'import json, os

def env_value(name):
    value = os.environ.get(name, "").strip()
    return value or None

payload = {}
version = env_value("FORGE_MODEL_VERSION")
if version:
    payload["version"] = version
gpu_type = env_value("FORGE_GPU_TYPE")
if gpu_type:
    payload["gpu_type"] = gpu_type
region = env_value("FORGE_REGION")
if region:
    payload["region"] = region
keep_warm = env_value("FORGE_KEEP_WARM")
payload["run_until_stopped"] = (keep_warm or "").lower() in {"1", "true", "yes", "on"}
print(json.dumps(payload))' | \
curl -sS --fail-with-body "$(forge_api_url "$runtime_start_path")" \
  --max-time "${FORGE_REQUEST_TIMEOUT_SECONDS:-600}" \
  -X POST \
  -H "Authorization: Bearer ${FORGE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @- | \
python3 -c 'import json, sys

payload = json.load(sys.stdin)
slug = payload.get("slug") or "runtime"
gpu_type = payload.get("gpu_type") or "scheduler-selected GPU"
region = payload.get("region") or "scheduler-selected region"
startup_ms = payload.get("startup_ms")
state = "cold-started" if payload.get("was_cold_start") else "already warm"
suffix = f"; startup_ms={startup_ms}" if startup_ms is not None else ""
print(f"{slug} {state} on {gpu_type} in {region}{suffix}; keep_warm={payload.get('\''keep_warm'\'')}")'
OpenAI base URL
https://YOUR_FORGE_HOST/v1

GPU performance

Pick a verified target for repeatable runs. Failed or pending details appear on the status hover.

Runs on · NIM 1.15.1
8.0 B params · weights BF16 · floor 80 GB
Target readiness

5 verified targets

5/6 verified · 1 awaiting probe · 0 unavailable
Fastest verified
B300 in uk-south1

Lowest warm model time among verified targets: 163 ms p50 warm model time across 10 samples.

Model time: 163 ms p50 warm (p95 164 ms · 10 samples)
Cold start: 12m 29s
Most affordable
RTX6000 in us-central1

Lowest estimated GPU price among verified targets: $1.80/GPU-hr; 402 ms p50 warm model time across 10 samples.

Model time: 402 ms p50 warm (p95 405 ms · p99 407 ms · 10 samples)
Cold start: 9m 30s
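
The "Fastest verified" and "Most affordable" picks can be reproduced from the per-GPU rows in the GPU performance table on this page. The sketch below assumes the selection is a plain minimum over verified rows (lowest p50 warm model time, lowest estimated GPU price); the page does not state tie-breaking rules. The row values are copied from the table.

```python
# Rows copied from the GPU performance table; H100 has not been probed.
rows = [
    {"gpu": "B200", "region": "us-central1", "status": "works", "p50_ms": 243, "usd_per_gpu_hr": 7.15},
    {"gpu": "B300", "region": "uk-south1", "status": "works", "p50_ms": 163, "usd_per_gpu_hr": 7.85},
    {"gpu": "H100", "region": None, "status": "not probed", "p50_ms": None, "usd_per_gpu_hr": None},
    {"gpu": "H200", "region": "eu-north2", "status": "works", "p50_ms": 211, "usd_per_gpu_hr": 4.50},
    {"gpu": "L40S", "region": "eu-north1", "status": "works", "p50_ms": 506, "usd_per_gpu_hr": 1.82},
    {"gpu": "RTX6000", "region": "us-central1", "status": "works", "p50_ms": 402, "usd_per_gpu_hr": 1.80},
]

# Only verified rows are eligible for either pick.
verified = [row for row in rows if row["status"] == "works"]
fastest = min(verified, key=lambda row: row["p50_ms"])
cheapest = min(verified, key=lambda row: row["usd_per_gpu_hr"])

print(f"fastest: {fastest['gpu']} in {fastest['region']}")
print(f"most affordable: {cheapest['gpu']} in {cheapest['region']}")
```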
GPU      Region       Status      VRAM      Cold start  Model time (p50 warm)  p95     p99     Samples  Relative    Tokens/s  Est. $/GPU-hr
B200     us-central1  works       162.3 GB  10m 40s     243 ms                 -       -       10       67% · -33%  893       $7.15
B300*    uk-south1    works       242.4 GB  12m 29s     163 ms                 164 ms  -       10       100%        1331      $7.85
H100     -            not probed  -         -           -                      -       -       -        -           -         -
H200     eu-north2    works       127.2 GB  3m 28s      211 ms                 532 ms  789 ms  11       77% · -23%  1033      $4.50
L40S     eu-north1    works       41.0 GB   4m 29s      506 ms                 507 ms  -       10       32% · -68%  429       $1.82
RTX6000  us-central1  works       86.8 GB   9m 30s      402 ms                 405 ms  407 ms  10       41% · -59%  540       $1.80
* fastest verified target
How we measure

Model time uses the p50 warm model-reported execution time when available, then falls back to the latest probe time; p95/p99 and sample count appear when there is enough probe history. Cold start excludes the first (uncached) run. VRAM is the peak GPU memory seen during the probe. Relative compares each row's model time to the highlighted baseline (the fastest row by default; hover any row to re-root). The "fastest" chip marks only verified, supported GPU-region rows. Est. $/GPU-hr is the estimated on-demand GPU price (Nebius pay-as-you-go), shown for performance/price comparison. Configured minimum GPU memory: 80 GB.
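
The Relative column appears to be the baseline's p50 model time divided by the row's p50 model time, shown as a percentage together with the difference from 100%. That formula is inferred from the published numbers rather than stated on the page, so treat the sketch below as a consistency check, not a definition.

```python
def relative_to_baseline(row_ms, baseline_ms):
    """Return (percent of baseline speed, percent difference) for one row."""
    share = round(100 * baseline_ms / row_ms)
    return share, share - 100


# Baseline is the fastest verified row: B300 at 163 ms p50 warm model time.
for gpu, p50_ms in (("B200", 243), ("H200", 211), ("L40S", 506), ("RTX6000", 402)):
    share, delta = relative_to_baseline(p50_ms, 163)
    print(f"{gpu}: {share}% · {delta}%")
```

Under this assumption the computed values match the table: 67%, 77%, 32%, and 41% for B200, H200, L40S, and RTX6000 respectively.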

Try it out

Leave GPU on “Any available GPU” to use a warm or verified backend automatically.
Request target: route /v1/chat/completions · model nvidia-healthcare-text2sql · version nim-1-15-1-candidate · GPU automatic · region automatic

Use the API


Snippet target: nvidia-healthcare-text2sql version nim-1-15-1-candidate using scheduler-selected GPU/region.

import os

from openai import OpenAI

api_base = os.environ.get("FORGE_API_BASE", "https://YOUR_FORGE_HOST").rstrip("/")
openai_base = os.environ.get("FORGE_OPENAI_BASE_URL", "").strip().rstrip("/")
if not openai_base:
    openai_base = api_base if api_base.endswith("/v1") else f"{api_base}/v1"
request_timeout_seconds = float(os.environ.get("FORGE_REQUEST_TIMEOUT_SECONDS", "600"))
api_key = os.environ.get("FORGE_API_KEY")
if not api_key or api_key == "replace-with-your-forge-api-key":
    raise SystemExit("Set FORGE_API_KEY to a real Forge API key before running this snippet; browser SSO sessions are not sent to copied curl or SDK clients.")


client = OpenAI(
    api_key=api_key,
    base_url=openai_base,
    timeout=request_timeout_seconds,
)

response = client.chat.completions.create(
    model="nvidia-healthcare-text2sql",
    top_p=1,
    stream=True,
    messages=[
        {
            "role": "system",
            "content": "detailed thinking off. Generate SQL only for the supplied schema. Do not provide medical advice or clinical interpretation."
        },
        {
            "role": "user",
            "content": "Based on DDL statements, instructions, and the current date, generate a SQL query in sqlite to answer the question.\nIf the question cannot be answered using the available tables and columns in the DDL, return only: None.\nToday is 2026-05-25 00:00:00\nDDL statements:\nDROP TABLE IF EXISTS diagnosis;\nCREATE TABLE diagnosis (\n  diagnosisid INT NOT NULL PRIMARY KEY,\n  patientunitstayid INT NOT NULL,\n  diagnosisname VARCHAR(200) NOT NULL,\n  diagnosistime TIMESTAMP NOT NULL,\n  icd9code VARCHAR(100)\n);\nInstructions:\n- Use only the provided schema.\n- Return only the SQL query.\n- Do not provide medical advice or patient-specific interpretation.\nquestion: How many distinct patients have at least one diagnosis recorded?"
        }
    ],
    max_tokens=256,
    temperature=0,
    extra_body={
        "model_version": "nim-1-15-1-candidate"
    },
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
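
The user message in the request above follows a fixed layout: task line, fallback rule, current date, DDL, instructions, then the question. When the schema or question changes, it may be easier to assemble that message with a helper. The sketch below copies the wording from the example request; the function names `build_text2sql_prompt` and `is_unanswerable` are hypothetical, and whether the model requires this exact layout is not confirmed on this page.

```python
from datetime import datetime


def build_text2sql_prompt(ddl, question, today, dialect="sqlite", instructions=()):
    """Assemble the user message in the same layout as the example request."""
    lines = [
        "Based on DDL statements, instructions, and the current date, "
        f"generate a SQL query in {dialect} to answer the question.",
        "If the question cannot be answered using the available tables "
        "and columns in the DDL, return only: None.",
        f"Today is {today:%Y-%m-%d %H:%M:%S}",
        "DDL statements:",
        ddl.strip(),
        "Instructions:",
        *[f"- {item}" for item in instructions],
        f"question: {question}",
    ]
    return "\n".join(lines)


def is_unanswerable(model_output):
    """True when the model returned the None sentinel requested by the prompt."""
    return model_output.strip() == "None"


prompt = build_text2sql_prompt(
    ddl="CREATE TABLE diagnosis (diagnosisid INT NOT NULL PRIMARY KEY, patientunitstayid INT NOT NULL);",
    question="How many distinct patients have at least one diagnosis recorded?",
    today=datetime(2026, 5, 25),
    instructions=("Use only the provided schema.", "Return only the SQL query."),
)
print(prompt)
```

The returned string can be passed as the `content` of the user message; checking `is_unanswerable` before executing the result avoids running the literal text `None` as SQL.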
Setup & .env

Install for OpenAI SDK

Copy setup before the request when moving this snippet into a fresh shell. The default 600 second timeout is intentional for GPU cold starts and can be overridden with FORGE_REQUEST_TIMEOUT_SECONDS.

python3 -m pip install --upgrade openai
export FORGE_API_BASE='https://YOUR_FORGE_HOST'
export FORGE_API_KEY="${FORGE_API_KEY:-replace-with-your-forge-api-key}"
export FORGE_REQUEST_TIMEOUT_SECONDS="${FORGE_REQUEST_TIMEOUT_SECONDS:-600}"

Project .env

Copy these values into a local .env file when moving the selected target into an app or SDK client.

# Forge selected target: route=/v1/chat/completions model=nvidia-healthcare-text2sql version=nim-1-15-1-candidate
FORGE_API_BASE="https://YOUR_FORGE_HOST"
FORGE_API_ROUTE="/v1/chat/completions"
FORGE_OPENAI_BASE_URL="https://YOUR_FORGE_HOST/v1"
FORGE_API_KEY="replace-with-your-forge-api-key"
FORGE_REQUEST_TIMEOUT_SECONDS="600"
MODEL_OR_FAMILY_SLUG="nvidia-healthcare-text2sql"
FORGE_MODEL_VERSION="nim-1-15-1-candidate"

Deploy to Nebius Serverless

Run a dedicated, autoscaling endpoint in your own Nebius account. The endpoint runs under your account and billing — Forge just pre-fills the configuration for you.

Deploy in your Nebius account ↗

Opens the Nebius Console with the image pre-filled for Llama 3.1 Nemotron Nano 8B Healthcare Text2SQL (Forge version NIM 1.15.1).

Prefer to create the endpoint from the CLI, or self-manage the container image? Use the commands below.

The image is hosted on cr.eu-north1.nebius.cloud, Forge’s eu-north1 Nebius Container Registry mirror. If your project can’t pull from that private mirror, supply registry credentials in the Console form or add a registry secret. The CLI commands below include placeholders for the credentials.

nebius CLI
# Runs in YOUR Nebius account (you own + pay for the endpoint).
# platform/preset must exist in your project — list them with:
#   nebius compute platform list
export ENDPOINT_NAME="nvidia-llama-3-1-nemotron-nano-8b-healthcare-text2sql-nim-priva"
export AUTH_TOKEN=$(openssl rand -hex 32)
export SUBNET_ID=$(nebius vpc subnet list --format jsonpath='{.items[0].metadata.id}')
export REGISTRY_USERNAME="YOUR_REGISTRY_USERNAME"
export REGISTRY_PASSWORD="YOUR_REGISTRY_PASSWORD"

# Note: the --image below points at Forge's regional Nebius CR mirror.
#   Serverless AI can pull Container Registry images without credentials
#   only when the image is public or in the same project. For a private
#   mirror in another project, provide pull credentials or a MysteryBox
#   registry secret with REGISTRY_USERNAME and REGISTRY_PASSWORD.

nebius ai endpoint create \
  --name "$ENDPOINT_NAME" \
  --image "cr.eu-north1.nebius.cloud/e00h91c5sa606xfwpj/models/nim-nvidia-llama-3.1-nemotron-nano-8b-healthcare-text2sql-v1.0:1.15.1@sha256:68e8f96f7eebb6e377510269104bf43706fa5f07c0a04003ed79a4b5322f61fd" \
  --registry-username "$REGISTRY_USERNAME" \
  --registry-password "$REGISTRY_PASSWORD" \
  --container-port 8000 \
  --auth token \
  --token "$AUTH_TOKEN" \
  --subnet-id "$SUBNET_ID"

export ENDPOINT_ID=$(nebius ai endpoint get-by-name --name "$ENDPOINT_NAME" --format jsonpath='{.metadata.id}')
nebius ai endpoint get "$ENDPOINT_ID"

Need a throughput- and cost-optimized build tuned for specific Nebius GPUs? Nebius Token Factory is coming soon — contact your Nebius account team for early access.