
The cohere provider is an interface to Cohere AI's chat inference API, with models such as Command R that are optimized for RAG and tool usage.


First, set the COHERE_API_KEY environment variable with your Cohere API key.
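For example, in a POSIX shell (the key value shown is a placeholder):

```shell
# Make the Cohere API key available to promptfoo
export COHERE_API_KEY="your-api-key-here"
```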

Next, edit the promptfoo configuration file to point to the Cohere provider.

  • cohere:<model name> - uses the specified Cohere model (e.g., command, command-light).

The following models are confirmed to be supported. For an up-to-date list of supported models, see Cohere Models.

  • command-light
  • command-light-nightly
  • command
  • command-nightly
  • command-r
  • command-r-plus

Here's an example configuration:

```yaml
providers:
  - id: cohere:command
    config:
      temperature: 0.5
      max_tokens: 256
      prompt_truncation: 'AUTO'
      connectors:
        - id: web-search
```

Control over prompting

By default, a regular string prompt will be automatically wrapped in the appropriate chat format and sent to the Cohere API via the message field:

```yaml
prompts:
  - 'Write a tweet about {{topic}}'

providers:
  - cohere:command

tests:
  - vars:
      topic: bananas
```

If desired, your prompt can reference a YAML or JSON file that has a more complex set of API parameters. For example:

```yaml
prompts:
  - file://prompt1.yaml

providers:
  - cohere:command

tests:
  - vars:
      question: What year was he born?
  - vars:
      question: What did he like eating for breakfast?
```

And in prompt1.yaml:

```yaml
chat_history:
  - role: USER
    message: 'Who discovered gravity?'
  - role: CHATBOT
    message: 'Isaac Newton'
message: '{{question}}'
connectors:
  - id: web-search
```

Displaying searches and documents

When the Cohere API is called, the provider can optionally include the search queries and documents in the output. This is controlled by the showSearchQueries and showDocuments config parameters. If set to true, the corresponding content is appended to the output.
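For example, a provider configuration sketch that enables both parameters (shown here alongside a web-search connector, which is what typically produces searches and documents):

```yaml
providers:
  - id: cohere:command-r
    config:
      showSearchQueries: true
      showDocuments: true
      connectors:
        - id: web-search
```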


Cohere parameters

| Parameter | Description |
| --- | --- |
| apiKey | Your Cohere API key, if not using an environment variable. |
| chatHistory | An array of chat history objects with role, message, and optionally user_name and conversation_id. |
| connectors | An array of connector objects for integrating with external systems. |
| documents | An array of document objects for providing reference material to the model. |
| frequency_penalty | Penalizes new tokens based on their frequency in the text so far. |
| k | Controls the diversity of the output via top-k sampling. |
| max_tokens | The maximum length of the generated text. |
| modelName | The model name to use for the chat completion. |
| p | Controls the diversity of the output via nucleus (top-p) sampling. |
| preamble_override | A string to override the default preamble used by the model. |
| presence_penalty | Penalizes new tokens based on their presence in the text so far. |
| prompt_truncation | Controls how prompts are truncated ('AUTO' or 'OFF'). |
| search_queries_only | If true, only search queries are processed. |
| temperature | Controls the randomness of the output. |
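As a sketch of how the documents parameter might be passed (the title and snippet fields shown here are the commonly used keys in Cohere document objects; the content itself is illustrative):

```yaml
providers:
  - id: cohere:command-r
    config:
      documents:
        - title: 'Company FAQ'
          snippet: 'Our returns policy lasts 30 days from the date of purchase.'
```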

Special parameters

| Parameter | Description |
| --- | --- |
| showSearchQueries | If true, includes the search queries used in the output. |
| showDocuments | If true, includes the documents used in the output. |