# Nscale
The Nscale provider enables you to use Nscale's Serverless Inference API models with promptfoo. Nscale offers cost-effective AI inference with up to 80% savings compared to other providers, zero rate limits, and no cold starts.
## Setup

Set your Nscale service token as an environment variable:

```sh
export NSCALE_SERVICE_TOKEN=your_service_token_here
```

Alternatively, you can add it to your `.env` file:

```sh
NSCALE_SERVICE_TOKEN=your_service_token_here
```
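With the token in place, a minimal configuration can serve as a connectivity check. This is an illustrative sketch; the model and the assertion value are arbitrary choices:

```yaml
# promptfooconfig.yaml — minimal smoke test for the Nscale provider
providers:
  - nscale:openai/gpt-oss-120b
prompts:
  - 'Reply with the word hello.'
tests:
  - assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: hello
```

Run it with `npx promptfoo@latest eval` to confirm the token is being picked up.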
### Obtaining Credentials

You can obtain a service token by:

1. Signing up at Nscale
2. Navigating to your account settings
3. Opening the "Service Tokens" section
## Configuration

To use Nscale models in your promptfoo configuration, use the `nscale:` prefix followed by the model name:

```yaml
providers:
  - nscale:openai/gpt-oss-120b
  - nscale:meta/llama-3.3-70b-instruct
  - nscale:qwen/qwen-3-235b-a22b-instruct
```
## Model Types
Nscale supports different types of models through specific endpoint formats:
### Chat Completion Models (Default)

For chat completion models, you can use either format:

```yaml
providers:
  - nscale:chat:openai/gpt-oss-120b
  - nscale:openai/gpt-oss-120b # Defaults to chat
```
### Completion Models

For text completion models:

```yaml
providers:
  - nscale:completion:openai/gpt-oss-20b
```
### Embedding Models

For embedding models:

```yaml
providers:
  - nscale:embedding:qwen/qwen3-embedding-8b
  - nscale:embeddings:qwen/qwen3-embedding-8b # Alternative format
```
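Embedding models are typically used for similarity assertions rather than as the main provider. A hedged sketch, assuming the assertion-level `provider` override can point promptfoo's `similar` check at the Nscale embedding endpoint:

```yaml
tests:
  - vars:
      question: What is the capital of France?
    assert:
      # Grade the output by cosine similarity to the reference text,
      # computing embeddings with the Nscale embedding model
      - type: similar
        value: Paris is the capital of France.
        threshold: 0.8
        provider: nscale:embedding:qwen/qwen3-embedding-8b
```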
## Popular Models
Nscale offers a wide range of popular AI models:
### Text Generation Models

| Model | Provider Format | Use Case |
| --- | --- | --- |
| GPT OSS 120B | `nscale:openai/gpt-oss-120b` | General-purpose reasoning and tasks |
| GPT OSS 20B | `nscale:openai/gpt-oss-20b` | Lightweight general-purpose model |
| Qwen 3 235B Instruct | `nscale:qwen/qwen-3-235b-a22b-instruct` | Large-scale language understanding |
| Qwen 3 235B Instruct 2507 | `nscale:qwen/qwen-3-235b-a22b-instruct-2507` | Latest Qwen 3 235B variant |
| Qwen 3 4B Thinking 2507 | `nscale:qwen/qwen-3-4b-thinking-2507` | Reasoning and thinking tasks |
| Qwen 3 8B | `nscale:qwen/qwen-3-8b` | Mid-size general-purpose model |
| Qwen 3 14B | `nscale:qwen/qwen-3-14b` | Enhanced reasoning capabilities |
| Qwen 3 32B | `nscale:qwen/qwen-3-32b` | Large-scale reasoning and analysis |
| Qwen 2.5 Coder 3B Instruct | `nscale:qwen/qwen-2.5-coder-3b-instruct` | Lightweight code generation |
| Qwen 2.5 Coder 7B Instruct | `nscale:qwen/qwen-2.5-coder-7b-instruct` | Code generation and programming |
| Qwen 2.5 Coder 32B Instruct | `nscale:qwen/qwen-2.5-coder-32b-instruct` | Advanced code generation |
| Qwen QwQ 32B | `nscale:qwen/qwq-32b` | Specialized reasoning model |
| Llama 3.3 70B Instruct | `nscale:meta/llama-3.3-70b-instruct` | High-quality instruction following |
| Llama 3.1 8B Instruct | `nscale:meta/llama-3.1-8b-instruct` | Efficient instruction following |
| Llama 4 Scout 17B | `nscale:meta/llama-4-scout-17b-16e-instruct` | Image-text-to-text capabilities |
| DeepSeek R1 Distill Llama 70B | `nscale:deepseek/deepseek-r1-distill-llama-70b` | Efficient reasoning model |
| DeepSeek R1 Distill Llama 8B | `nscale:deepseek/deepseek-r1-distill-llama-8b` | Lightweight reasoning model |
| DeepSeek R1 Distill Qwen 1.5B | `nscale:deepseek/deepseek-r1-distill-qwen-1.5b` | Ultra-lightweight reasoning |
| DeepSeek R1 Distill Qwen 7B | `nscale:deepseek/deepseek-r1-distill-qwen-7b` | Compact reasoning model |
| DeepSeek R1 Distill Qwen 14B | `nscale:deepseek/deepseek-r1-distill-qwen-14b` | Mid-size reasoning model |
| DeepSeek R1 Distill Qwen 32B | `nscale:deepseek/deepseek-r1-distill-qwen-32b` | Large reasoning model |
| Devstral Small 2505 | `nscale:mistral/devstral-small-2505` | Code generation and development |
| Mixtral 8x22B Instruct | `nscale:mistral/mixtral-8x22b-instruct-v0.1` | Large mixture-of-experts model |
### Embedding Models

| Model | Provider Format | Use Case |
| --- | --- | --- |
| Qwen 3 Embedding 8B | `nscale:embedding:Qwen/Qwen3-Embedding-8B` | Text embeddings and similarity |
### Text-to-Image Models

| Model | Provider Format | Use Case |
| --- | --- | --- |
| Flux.1 Schnell | `nscale:image:BlackForestLabs/FLUX.1-schnell` | Fast image generation |
| Stable Diffusion XL | `nscale:image:stabilityai/stable-diffusion-xl-base-1.0` | High-quality image generation |
| SDXL Lightning 4-step | `nscale:image:ByteDance/SDXL-Lightning-4step` | Ultra-fast image generation |
| SDXL Lightning 8-step | `nscale:image:ByteDance/SDXL-Lightning-8step` | Balanced speed and quality |
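Image providers plug into the same configuration shape as text providers. A minimal sketch (the prompt and variable are illustrative; how image outputs are rendered depends on your promptfoo version and viewer):

```yaml
providers:
  - nscale:image:BlackForestLabs/FLUX.1-schnell
prompts:
  - 'A watercolor painting of {{subject}}'
tests:
  - vars:
      subject: a lighthouse at dusk
```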
## Configuration Options
Nscale supports standard OpenAI-compatible parameters:
```yaml
providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      temperature: 0.7
      max_tokens: 1024
      top_p: 0.9
      frequency_penalty: 0.1
      presence_penalty: 0.2
      stop: ['END', 'STOP']
      stream: true
```
### Supported Parameters

- `temperature`: Controls randomness (0.0 to 2.0)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling parameter
- `frequency_penalty`: Reduces repetition based on frequency
- `presence_penalty`: Reduces repetition based on presence
- `stop`: Stop sequences to halt generation
- `stream`: Enable streaming responses
- `seed`: Deterministic sampling seed
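When comparing prompt variants, fixing the sampling parameters keeps runs comparable. A sketch, assuming the backend honors `seed` (determinism is usually best-effort even with a fixed seed):

```yaml
providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      temperature: 0 # Greedy decoding removes most sampling variance
      seed: 42 # Fixed seed for repeatable sampling, where supported
```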
## Example Configuration
Here's a complete example configuration:
```yaml
providers:
  - id: nscale:openai/gpt-oss-120b
    label: nscale-gpt-oss
    config:
      temperature: 0.7
      max_tokens: 512
  - id: nscale:meta/llama-3.3-70b-instruct
    label: nscale-llama
    config:
      temperature: 0.5
      max_tokens: 1024

prompts:
  - 'Explain {{concept}} in simple terms'
  - 'What are the key benefits of {{concept}}?'

tests:
  - vars:
      concept: quantum computing
    assert:
      - type: contains
        value: 'quantum'
      - type: llm-rubric
        value: 'Explanation should be clear and accurate'
```
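With this file saved as `promptfooconfig.yaml`, `npx promptfoo@latest eval` runs both providers against both prompts, and `npx promptfoo@latest view` opens the results in the web viewer.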
## Pricing
Nscale offers highly competitive pricing:
- **Text Generation**: Starting from $0.01 input / $0.03 output per 1M tokens
- **Embeddings**: $0.04 per 1M tokens
- **Image Generation**: Starting from $0.0008 per megapixel
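As a rough worked example at the entry-level text rates, an eval that consumes 2M input tokens and 500K output tokens would cost about 2 × $0.01 + 0.5 × $0.03 = $0.035.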
For the most current pricing information, visit Nscale's pricing page.
## Key Features
- **Cost-Effective**: Up to 80% savings compared to other providers
- **Zero Rate Limits**: No throttling or request limits
- **No Cold Starts**: Instant response times
- **Serverless**: No infrastructure management required
- **OpenAI Compatible**: Standard API interface
- **Global Availability**: Low-latency inference worldwide
## Error Handling
The Nscale provider includes built-in error handling for common issues:
- Network timeouts and retries
- Rate-limit responses (unlikely in practice, given Nscale's zero-rate-limit policy)
- Invalid service token errors
- Model availability issues
## Support
For support with the Nscale provider: