Moonshot (Kimi)

Moonshot AI provides an OpenAI-compatible API for its Kimi models — the Kimi K3 and K2 thinking models and the Moonshot v1 generation models. The Moonshot provider extends the OpenAI provider, so all of its options are supported.

Setup

Get an API key from the Kimi (Moonshot) platform.
Set the MOONSHOT_API_KEY environment variable or specify apiKey in your config.

providers:
  - id: moonshot:kimi-k3

Both moonshot:<model> and moonshot:chat:<model> resolve to the chat completions endpoint. If you omit the model, the provider defaults to kimi-k3.

Available Models

Moonshot's lineup rotates over time — call the list models API (GET https://api.moonshot.ai/v1/models) for the live set. As of writing:

Kimi K3 — flagship thinking model, 1M context: kimi-k3 (released July 2026). Always-on reasoning, native image and video understanding, and a top-level reasoning_effort control (see below).
Kimi K2 — thinking models, 256k context: kimi-k2.6, kimi-k2.5, kimi-k2.7-code, kimi-k2.7-code-highspeed. These reason before answering and emit a separate reasoning stream (see below).
Moonshot v1 — legacy generation models: moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k (context-length variants) and the vision variants moonshot-v1-8k-vision-preview / moonshot-v1-32k-vision-preview / moonshot-v1-128k-vision-preview; existing accounts may also see the auto-router moonshot-v1-auto in the list-models API. The moonshot-v1 series and kimi-k2.5 are closed to newly registered users and sunset platform-wide on August 31, 2026.

The older kimi-k2-0711-preview, kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo, and kimi-latest ids were discontinued in 2026 — Moonshot recommends migrating to kimi-k3.

Configuration

providers:
  - id: moonshot:kimi-k3 # flagship thinking model — leave sampling params unset
  - id: moonshot:moonshot-v1-8k # legacy generation model — accepts arbitrary sampling
    config:
      temperature: 0.2
      max_tokens: 1024

Configuration Options

The provider accepts every option the OpenAI provider supports. Commonly used:

temperature, max_tokens, top_p, presence_penalty, frequency_penalty
stream
response_format (JSON mode), tools / tool_choice (function calling)
showThinking — set to false to drop a thinking model's reasoning from the graded output (default true)
cost, inputCost, outputCost, cacheReadCost — Moonshot ships no built-in price table, so set these to track cost. Every override is in USD per token. Moonshot's official pricing page publishes rates in USD per 1 million tokens, so divide each published rate by 1,000,000 before configuring it. inputCost/outputCost take precedence over the flat cost; cacheReadCost prices cached prompt tokens.

For example, these illustrative per-million rates convert to per-token overrides. Check the official pricing page for the current rates for your model before using them.

providers:
  - id: moonshot:kimi-k3
    config:
      inputCost: 0.000003 # $3.00 per 1M tokens divided by 1,000,000
      cacheReadCost: 0.0000003 # $0.30 per 1M tokens divided by 1,000,000
      outputCost: 0.000015 # $15.00 per 1M tokens divided by 1,000,000
  - id: moonshot:kimi-k2.6
    config:
      inputCost: 0.00000095 # $0.95 per 1M tokens divided by 1,000,000
      cacheReadCost: 0.00000016 # $0.16 per 1M tokens divided by 1,000,000
      outputCost: 0.000004 # $4.00 per 1M tokens divided by 1,000,000

Any other parameter supported by the OpenAI provider is forwarded as-is.

Kimi thinking models

The kimi-k3 and kimi-k2.x models are reasoning models and behave differently from the moonshot-v1 family:

Fixed sampling parameters. Kimi pins temperature (1.0 with thinking on), top_p, n, and the penalties to fixed values and returns a 400 ("invalid temperature: only 1 is allowed for this model") for any other value. The provider therefore does not send promptfoo's default temperature/max_tokens for kimi-* models — leave them unset (recommended) or set temperature: 1. The moonshot-v1 models accept arbitrary sampling values.
Reasoning output. Kimi returns a separate reasoning_content stream that promptfoo surfaces with a Thinking: … prefix. Set showThinking: false when you assert on structured output (for example is-json) so the reasoning doesn't contaminate the parsed result.
Token budget. Reasoning tokens count against the output budget. When you leave the token limit unset the provider lets Moonshot apply its server default (32k for K2.x, 131k for K3); if you set one, leave generous headroom for the answer. Moonshot's canonical field is max_completion_tokens (max_tokens is a deprecated alias) — the provider sends max_completion_tokens for kimi-* models whichever of the two you configure.
Controlling thinking. The two generations use different, mutually exclusive controls:
- kimi-k3 is always thinking and accepts a top-level reasoning_effort field, which the provider forwards from config.reasoning_effort. Currently only max (the default) is accepted; Moonshot plans more levels. Do not send the K2.x thinking parameter to K3 — the API rejects it.
- kimi-k2.6 and kimi-k2.5 support thinking: { type: disabled } (pass it via config.passthrough); kimi-k2.7-code is always thinking. The thinking parameter is K2.x-only.
- Setting config.reasoning_effort on a non-K3 model is a configuration error and the provider fails fast with a clear message instead of sending it.

providers:
  - id: moonshot:kimi-k3
    config:
      showThinking: false
      reasoning_effort: max # optional: currently the only accepted value
  - id: moonshot:kimi-k2.6
    config:
      passthrough:
        thinking: { type: disabled } # K2.x only: turn reasoning off

See the Kimi K3 quickstart and Using Thinking Models for the full behavior matrix.

Vision

The vision models (moonshot-v1-*-vision-preview) and the multimodal Kimi models (kimi-k3, kimi-k2.5, kimi-k2.6, kimi-k2.7-code, kimi-k2.7-code-highspeed) accept base64-encoded image input using the standard OpenAI image_url content format. Moonshot does not accept remote image URLs — embed images as data: URIs or reference files uploaded to the Kimi platform as ms://<file-id>. Video input is supported by kimi-k3, kimi-k2.6, kimi-k2.7-code, and kimi-k2.7-code-highspeed, referenced the same way after uploading via the files API. See Use the Kimi Vision Model.

Example Usage

providers:
  - id: moonshot:kimi-k3
  - id: openai:gpt-4o-mini

prompts:
  - 'Summarize the following in one sentence: {{text}}'

tests:
  - vars:
      text: 'Promptfoo is an open-source tool for testing and evaluating LLM apps.'

A runnable comparison lives in examples/provider-moonshot.

API Details

Base URL: https://api.moonshot.ai/v1 (global). China-mainland keys use https://api.moonshot.cn/v1 — point at it with apiBaseUrl, since the global and China platforms issue region-locked keys.
OpenAI-compatible chat completions API.
Full API documentation.

Setup​

Available Models​

Configuration​

Configuration Options​

Kimi thinking models​

Vision​

Example Usage​

API Details​

See Also​