NVIDIA NIM
The NVIDIA provider connects promptfoo to NVIDIA's hosted inference API at https://integrate.api.nvidia.com/v1. The endpoint is OpenAI-compatible, so any model NVIDIA exposes through it can be used the same way you'd use OpenAI Chat Completions.
Setup
Set your API key as an environment variable:
export NVIDIA_API_KEY=your_api_key_here
Or add it to your .env file:
NVIDIA_API_KEY=your_api_key_here
Getting an API key
- Sign in at build.nvidia.com (free developer account).
- Open any model card (for example, Llama 3.3 70B Instruct).
- Click Get API Key. The key starts with nvapi-.
NVIDIA's developer program currently grants a recurring allowance of free request credits per account, which is usually enough for prompt iteration and small evals before any paid usage is needed. Current credit limits and pricing are documented at build.nvidia.com; check there for what is in effect today rather than assuming the value listed in any blog post.
Configuration
Use the nvidia: prefix followed by the full model id as listed on the model card:
providers:
  - nvidia:meta/llama-3.3-70b-instruct
  - nvidia:qwen/qwen2.5-coder-32b-instruct
  - nvidia:nvidia/llama-3.1-nemotron-70b-instruct
Standard OpenAI-compatible parameters are passed through:
providers:
  - id: nvidia:meta/llama-3.3-70b-instruct
    config:
      temperature: 0.7
      max_tokens: 1024
      top_p: 0.9
      stop: ['END']
To override the base URL (for example, when routing through a corporate proxy or to a self-hosted NIM), set apiBaseUrl. Use apiKeyEnvar to name the environment variable the key is read from:
providers:
  - id: nvidia:meta/llama-3.3-70b-instruct
    config:
      apiBaseUrl: https://your-proxy.example.com/nvidia/v1
      apiKeyEnvar: CUSTOM_NVIDIA_KEY
A few common models
The full list is on build.nvidia.com. Some commonly used ids:
| Model | Provider format |
|---|---|
| Llama 3.3 70B Instruct | nvidia:meta/llama-3.3-70b-instruct |
| Llama 3.1 405B Instruct | nvidia:meta/llama-3.1-405b-instruct |
| Llama 3.2 90B Vision Instruct | nvidia:meta/llama-3.2-90b-vision-instruct |
| Llama 3.1 Nemotron 70B Instruct | nvidia:nvidia/llama-3.1-nemotron-70b-instruct |
| Mistral Large 2 Instruct | nvidia:mistralai/mistral-large-2-instruct |
| Mixtral 8x22B Instruct | nvidia:mistralai/mixtral-8x22b-instruct-v0.1 |
| Qwen 2.5 Coder 32B Instruct | nvidia:qwen/qwen2.5-coder-32b-instruct |
| DeepSeek R1 | nvidia:deepseek-ai/deepseek-r1 |
Example
A minimal eval comparing two NIM-hosted models. It uses deterministic assertions so that the example runs end-to-end with only NVIDIA_API_KEY configured; llm-rubric would otherwise fall back to promptfoo's default OpenAI grader and require a separate OPENAI_API_KEY.
providers:
  - id: nvidia:meta/llama-3.3-70b-instruct
    config:
      temperature: 0.2
      max_tokens: 256
  - id: nvidia:nvidia/llama-3.1-nemotron-70b-instruct
    config:
      temperature: 0.2
      max_tokens: 256
prompts:
  - 'Summarise the following in one sentence: {{passage}}'
tests:
  - vars:
      passage: 'Photosynthesis is the process by which plants convert light energy into chemical energy stored in glucose.'
    assert:
      - type: icontains
        value: plants
      - type: icontains-any
        value: [light, energy, glucose]
If you want a model-graded assertion, point llm-rubric at a NIM-hosted grader so the example stays self-contained:
defaultTest:
  options:
    provider: nvidia:meta/llama-3.3-70b-instruct
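With that grader override in place, a model-graded test might look like the sketch below. It reuses the passage variable from the example above. The rubric wording is illustrative only, and grading quality will depend on the model chosen as grader.

```yaml
tests:
  - vars:
      passage: 'Photosynthesis is the process by which plants convert light energy into chemical energy stored in glucose.'
    assert:
      # Graded by the provider set in defaultTest.options.provider,
      # so no OPENAI_API_KEY should be needed for this assertion.
      - type: llm-rubric
        value: Is a single sentence that accurately summarises the passage
```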
Notes
- Cost calculation is not built in for NVIDIA models. NIM bills against credits rather than per-token public price lists for many models, and the actual cost depends on your account tier. Set both inputCost and outputCost on the provider config if you want to record an estimate in eval output.
- Tool calling and JSON-mode responses follow the same configuration as the OpenAI provider because the API surface is OpenAI-compatible. Streaming responses are not implemented by this provider.
- This provider supports NIM chat-completion models. Retrieval, embedding, reranking, and other NIM APIs require a provider that targets their corresponding endpoint.
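As a rough sketch of the first two notes, a provider entry that records a cost estimate and requests JSON-mode output could look like this. The rates are placeholders, not NVIDIA pricing, and are assumed here to be expressed per token. The response_format block is written as it would be for the OpenAI provider; whether a particular NIM-hosted model honors it is model-dependent and worth checking on its model card.

```yaml
providers:
  - id: nvidia:meta/llama-3.3-70b-instruct
    config:
      # Placeholder rates for illustration only (assumed unit: USD per token).
      # Replace with figures that reflect your own account tier.
      inputCost: 0.0000002
      outputCost: 0.0000006
      # JSON mode, configured the same way as for the OpenAI provider.
      response_format:
        type: json_object
```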