Moonshot (Kimi)
Moonshot AI provides an OpenAI-compatible API for its Kimi models — the Kimi K2 thinking models and the Moonshot v1 generation models. The Moonshot provider extends the OpenAI provider, so all of its options are supported.
Setup
- Get an API key from the Kimi (Moonshot) platform.
- Set the MOONSHOT_API_KEY environment variable or specify apiKey in your config.
providers:
  - id: moonshot:kimi-k2.6
Both moonshot:<model> and moonshot:chat:<model> resolve to the chat completions endpoint. If you omit the model, the provider defaults to kimi-k2.6.
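As a minimal sketch of the two points above, the config below lists the same model under both ID forms and sets the key inline rather than through the environment. The key value is a placeholder, and listing both forms is only for illustration; in practice pick one.

```yaml
providers:
  # Short form: resolves to the chat completions endpoint
  - id: moonshot:kimi-k2.6
  # Explicit chat form: equivalent to the entry above
  - id: moonshot:chat:kimi-k2.6
    config:
      apiKey: YOUR_MOONSHOT_API_KEY # placeholder; the MOONSHOT_API_KEY environment variable is the alternative
```

Keeping the key in the environment variable is usually the safer choice when the config file is committed to version control.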
Available Models
Moonshot's lineup rotates over time — call the list models API (GET https://api.moonshot.ai/v1/models) for the live set. As of writing:
- Kimi K2 (thinking models, 256k context): kimi-k2.6, kimi-k2.5, kimi-k2.7-code, kimi-k2.7-code-highspeed. These reason before answering and emit a separate reasoning stream (see Kimi K2 thinking models below).
- Moonshot v1 (generation models): moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k (context-length variants), the vision variants moonshot-v1-8k-vision-preview / moonshot-v1-32k-vision-preview / moonshot-v1-128k-vision-preview, and the auto-router moonshot-v1-auto.
The older kimi-k2-0711-preview, kimi-k2-0905-preview, kimi-k2-turbo-preview, kimi-k2-thinking, kimi-k2-thinking-turbo, and kimi-latest ids were discontinued in 2026 — use kimi-k2.6 instead.
Configuration
providers:
  - id: moonshot:kimi-k2.6 # flagship thinking model; leave sampling params unset
  - id: moonshot:moonshot-v1-8k # generation model; accepts arbitrary sampling
    config:
      temperature: 0.2
      max_tokens: 1024
Configuration Options
The provider accepts every option the OpenAI provider supports. Commonly used:
- temperature, max_tokens, top_p, presence_penalty, frequency_penalty
- stream
- response_format (JSON mode), tools / tool_choice (function calling)
- showThinking: set to false to drop a thinking model's reasoning from the graded output (default true)
- cost, inputCost, outputCost, cacheReadCost: Moonshot ships no built-in price table, so set these to track cost. inputCost / outputCost take precedence over the flat cost; cacheReadCost prices cached prompt tokens.
Any other parameter supported by the OpenAI provider is forwarded as-is.
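The options above might be combined as follows. This is a sketch: the cost figures are made-up placeholders rather than Moonshot's real prices, and their units are assumed to match the OpenAI provider's cost options, so check both before relying on the totals.

```yaml
providers:
  - id: moonshot:moonshot-v1-32k
    config:
      temperature: 0.2
      max_tokens: 1024
      response_format:
        type: json_object # JSON mode
      # Placeholder prices, not Moonshot's actual rates
      inputCost: 0.000001
      outputCost: 0.000003
      cacheReadCost: 0.0000002
```

A moonshot-v1 model is used here because it accepts arbitrary sampling values; the same cost options apply to the kimi-* models.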
Kimi K2 thinking models
The kimi-k2.x models are reasoning models and behave differently from the moonshot-v1 family:
- Fixed sampling parameters. Kimi pins temperature (1.0 with thinking on), top_p, n, and the penalties to fixed values and returns a 400 ("invalid temperature: only 1 is allowed for this model") for any other value. The provider therefore does not send promptfoo's default temperature / max_tokens for kimi-* models; leave them unset (recommended) or set temperature: 1. The moonshot-v1 models accept arbitrary sampling values.
- Reasoning output. Kimi returns a separate reasoning_content stream that promptfoo surfaces with a Thinking: … prefix. Set showThinking: false when you assert on structured output (for example is-json) so the reasoning doesn't contaminate the parsed result.
- Token budget. Reasoning tokens count against max_tokens. When you leave max_tokens unset, the provider lets Moonshot apply its 32k default; if you set it, leave generous headroom for the answer.
- Disable thinking. kimi-k2.6 and kimi-k2.5 support thinking: { type: disabled } (pass it via config.passthrough); kimi-k2.7-code is always thinking.
providers:
  - id: moonshot:kimi-k2.6
    config:
      showThinking: false
      passthrough:
        thinking: { type: disabled } # optional: turn reasoning off
See Using Thinking Models for the full behavior matrix.
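To make the reasoning-output and token-budget points concrete, here is a sketch of a test that asserts on JSON returned by a thinking model. The max_tokens value and the prompt wording are illustrative assumptions, not recommended settings.

```yaml
providers:
  - id: moonshot:kimi-k2.6
    config:
      showThinking: false # keep reasoning_content out of the graded output
      max_tokens: 16384 # assumed budget; reasoning tokens count against it
prompts:
  - 'Return a JSON object with a single key "summary" describing: {{text}}'
tests:
  - vars:
      text: 'Promptfoo is an open-source tool for testing and evaluating LLM apps.'
    assert:
      - type: is-json
```

With showThinking left at its default of true, the Thinking: … prefix would precede the answer and the is-json assertion would likely fail.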
Vision
The vision models (moonshot-v1-*-vision-preview) and the multimodal Kimi models (kimi-k2.5 / kimi-k2.6 / kimi-k2.7-code) accept base64-encoded image input using the standard OpenAI image_url content format. Moonshot does not accept remote image URLs — embed images as data: URIs. See Use the Kimi Vision Model.
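For reference, a user message carrying an image in the OpenAI image_url content format looks roughly like the sketch below, written as YAML for readability. The prompt text and MIME type are assumptions, and <BASE64_IMAGE_DATA> stands in for the encoded file contents; how you load the message into a promptfoo prompt depends on your setup.

```yaml
- role: user
  content:
    - type: text
      text: Describe this image.
    - type: image_url
      image_url:
        # data: URI with embedded base64; a remote https:// URL would be rejected
        url: 'data:image/png;base64,<BASE64_IMAGE_DATA>'
```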
Example Usage
providers:
  - id: moonshot:kimi-k2.6
  - id: openai:gpt-4o-mini
prompts:
  - 'Summarize the following in one sentence: {{text}}'
tests:
  - vars:
      text: 'Promptfoo is an open-source tool for testing and evaluating LLM apps.'
A runnable comparison lives in examples/provider-moonshot.
API Details
- Base URL: https://api.moonshot.ai/v1 (global). China-mainland keys use https://api.moonshot.cn/v1; point at it with apiBaseUrl, since the global and China platforms issue region-locked keys.
- OpenAI-compatible chat completions API.
- Full API documentation.
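For a China-mainland key, the base URL override described above might look like this. The model ID is only an example.

```yaml
providers:
  - id: moonshot:kimi-k2.6
    config:
      apiBaseUrl: https://api.moonshot.cn/v1 # China-mainland endpoint
```

Because keys are region-locked, a global key pointed at this endpoint (or the reverse) should be expected to fail authentication.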
See Also
- OpenAI Provider — compatible configuration options
- Kimi model list and pricing