Nscale

The Nscale provider enables you to use Nscale's Serverless Inference API models with promptfoo. Nscale offers cost-effective AI inference with up to 80% savings compared to other providers, zero rate limits, and no cold starts.

Setup

Set your Nscale service token as an environment variable:

export NSCALE_SERVICE_TOKEN=your_service_token_here

Alternatively, you can add it to your .env file:

NSCALE_SERVICE_TOKEN=your_service_token_here
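
You can also set the token directly on a provider. This is a sketch under an assumption: the Nscale provider follows promptfoo's other OpenAI-compatible providers, which accept an inline apiKey override in config.

providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      # Assumption: inline token override, as with other OpenAI-compatible providers
      apiKey: your_service_token_here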

Obtaining Credentials

You can obtain service tokens by:

  1. Signing up at Nscale
  2. Navigating to your account settings
  3. Opening the "Service Tokens" section

Configuration

To use Nscale models in your promptfoo configuration, use the nscale: prefix followed by the model name:

providers:
  - nscale:openai/gpt-oss-120b
  - nscale:meta/llama-3.3-70b-instruct
  - nscale:qwen/qwen-3-235b-a22b-instruct

Model Types

Nscale supports different types of models through specific endpoint formats:

Chat Completion Models (Default)

For chat completion models, you can use either format:

providers:
  - nscale:chat:openai/gpt-oss-120b
  - nscale:openai/gpt-oss-120b # Defaults to chat

Completion Models

For text completion models:

providers:
  - nscale:completion:openai/gpt-oss-20b

Embedding Models

For embedding models:

providers:
  - nscale:embedding:qwen/qwen3-embedding-8b
  - nscale:embeddings:qwen/qwen3-embedding-8b # Alternative format
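
Embedding models are most useful for grading similar assertions. A minimal sketch, assuming promptfoo's standard embedding-provider override applies to Nscale as well:

defaultTest:
  options:
    provider:
      embedding:
        # Assumption: an Nscale embedding model can back the similar assertion
        id: nscale:embedding:qwen/qwen3-embedding-8b

tests:
  - vars:
      question: What is the capital of France?
    assert:
      - type: similar
        value: 'Paris is the capital of France.'
        threshold: 0.8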

Supported Models

Nscale offers a wide range of popular AI models:

Text Generation Models

| Model | Provider Format | Use Case |
| --- | --- | --- |
| GPT OSS 120B | nscale:openai/gpt-oss-120b | General-purpose reasoning and tasks |
| GPT OSS 20B | nscale:openai/gpt-oss-20b | Lightweight general-purpose model |
| Qwen 3 235B Instruct | nscale:qwen/qwen-3-235b-a22b-instruct | Large-scale language understanding |
| Qwen 3 235B Instruct 2507 | nscale:qwen/qwen-3-235b-a22b-instruct-2507 | Latest Qwen 3 235B variant |
| Qwen 3 4B Thinking 2507 | nscale:qwen/qwen-3-4b-thinking-2507 | Reasoning and thinking tasks |
| Qwen 3 8B | nscale:qwen/qwen-3-8b | Mid-size general-purpose model |
| Qwen 3 14B | nscale:qwen/qwen-3-14b | Enhanced reasoning capabilities |
| Qwen 3 32B | nscale:qwen/qwen-3-32b | Large-scale reasoning and analysis |
| Qwen 2.5 Coder 3B Instruct | nscale:qwen/qwen-2.5-coder-3b-instruct | Lightweight code generation |
| Qwen 2.5 Coder 7B Instruct | nscale:qwen/qwen-2.5-coder-7b-instruct | Code generation and programming |
| Qwen 2.5 Coder 32B Instruct | nscale:qwen/qwen-2.5-coder-32b-instruct | Advanced code generation |
| Qwen QwQ 32B | nscale:qwen/qwq-32b | Specialized reasoning model |
| Llama 3.3 70B Instruct | nscale:meta/llama-3.3-70b-instruct | High-quality instruction following |
| Llama 3.1 8B Instruct | nscale:meta/llama-3.1-8b-instruct | Efficient instruction following |
| Llama 4 Scout 17B | nscale:meta/llama-4-scout-17b-16e-instruct | Image-Text-to-Text capabilities |
| DeepSeek R1 Distill Llama 70B | nscale:deepseek/deepseek-r1-distill-llama-70b | Efficient reasoning model |
| DeepSeek R1 Distill Llama 8B | nscale:deepseek/deepseek-r1-distill-llama-8b | Lightweight reasoning model |
| DeepSeek R1 Distill Qwen 1.5B | nscale:deepseek/deepseek-r1-distill-qwen-1.5b | Ultra-lightweight reasoning |
| DeepSeek R1 Distill Qwen 7B | nscale:deepseek/deepseek-r1-distill-qwen-7b | Compact reasoning model |
| DeepSeek R1 Distill Qwen 14B | nscale:deepseek/deepseek-r1-distill-qwen-14b | Mid-size reasoning model |
| DeepSeek R1 Distill Qwen 32B | nscale:deepseek/deepseek-r1-distill-qwen-32b | Large reasoning model |
| Devstral Small 2505 | nscale:mistral/devstral-small-2505 | Code generation and development |
| Mixtral 8x22B Instruct | nscale:mistral/mixtral-8x22b-instruct-v0.1 | Large mixture-of-experts model |
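
For example, to compare several of these models side by side, list them together and run a single eval:

providers:
  - nscale:openai/gpt-oss-120b
  - nscale:meta/llama-3.3-70b-instruct
  - nscale:deepseek/deepseek-r1-distill-llama-70b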

Embedding Models

| Model | Provider Format | Use Case |
| --- | --- | --- |
| Qwen 3 Embedding 8B | nscale:embedding:Qwen/Qwen3-Embedding-8B | Text embeddings and similarity |

Text-to-Image Models

| Model | Provider Format | Use Case |
| --- | --- | --- |
| Flux.1 Schnell | nscale:image:BlackForestLabs/FLUX.1-schnell | Fast image generation |
| Stable Diffusion XL | nscale:image:stabilityai/stable-diffusion-xl-base-1.0 | High-quality image generation |
| SDXL Lightning 4-step | nscale:image:ByteDance/SDXL-Lightning-4step | Ultra-fast image generation |
| SDXL Lightning 8-step | nscale:image:ByteDance/SDXL-Lightning-8step | Balanced speed and quality |
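
A minimal image-generation eval sketch, assuming these image models take a plain text prompt:

providers:
  - nscale:image:stabilityai/stable-diffusion-xl-base-1.0

prompts:
  - 'A watercolor painting of {{subject}}'

tests:
  - vars:
      subject: a lighthouse at dawn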

Configuration Options

Nscale supports standard OpenAI-compatible parameters:

providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      temperature: 0.7
      max_tokens: 1024
      top_p: 0.9
      frequency_penalty: 0.1
      presence_penalty: 0.2
      stop: ['END', 'STOP']
      stream: true

Supported Parameters

  • temperature: Controls randomness (0.0 to 2.0)
  • max_tokens: Maximum number of tokens to generate
  • top_p: Nucleus sampling parameter
  • frequency_penalty: Reduces repetition based on frequency
  • presence_penalty: Reduces repetition based on presence
  • stop: Stop sequences to halt generation
  • stream: Enable streaming responses
  • seed: Deterministic sampling seed (see the sketch below)
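
Because seed is supported, you can make regression runs more repeatable by pinning it alongside a low temperature. A small sketch, assuming the backing model honors the seed (OpenAI-compatible APIs generally treat it as best-effort):

providers:
  - id: nscale:openai/gpt-oss-120b
    config:
      # Best-effort determinism: fixed seed plus zero temperature
      temperature: 0
      seed: 42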

Example Configuration

Here's a complete example configuration:

providers:
  - id: nscale:openai/gpt-oss-120b
    label: nscale-gpt-oss
    config:
      temperature: 0.7
      max_tokens: 512
  - id: nscale:meta/llama-3.3-70b-instruct
    label: nscale-llama
    config:
      temperature: 0.5
      max_tokens: 1024

prompts:
  - 'Explain {{concept}} in simple terms'
  - 'What are the key benefits of {{concept}}?'

tests:
  - vars:
      concept: quantum computing
    assert:
      - type: contains
        value: 'quantum'
      - type: llm-rubric
        value: 'Explanation should be clear and accurate'
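
Run the example with:

npx promptfoo@latest eval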

Pricing

Nscale offers highly competitive pricing:

  • Text Generation: Starting from $0.01 input / $0.03 output per 1M tokens
  • Embeddings: $0.04 per 1M tokens
  • Image Generation: Starting from $0.0008 per mega-pixel
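
As a rough worked example at those starting text rates, an eval of 10,000 test cases averaging 500 input and 300 output tokens each consumes 5M input and 3M output tokens, costing about $0.05 + $0.09 = $0.14.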

For the most current pricing information, visit Nscale's pricing page.

Key Features

  • Cost-Effective: Up to 80% savings compared to other providers
  • Zero Rate Limits: No throttling or request limits
  • No Cold Starts: Instant response times
  • Serverless: No infrastructure management required
  • OpenAI Compatible: Standard API interface
  • Global Availability: Low-latency inference worldwide

Error Handling

The Nscale provider includes built-in error handling for common issues:

  • Network timeouts and retries
  • Rate-limit responses (rare, given Nscale's zero-rate-limit policy)
  • Invalid service token errors
  • Model availability issues

Support

For support with the Nscale provider: