Using Promptfoo in n8n Workflows
This guide shows how to run Promptfoo evaluations from an n8n workflow so you can:
- schedule nightly or ad‑hoc LLM tests,
- gate downstream steps (Slack/Teams alerts, merge approvals, etc.) on pass‑rates, and
- publish the shareable results links that Promptfoo generates.
Prerequisites
| What | Why |
|---|---|
| Self‑hosted n8n ≥ v1 (Docker or bare‑metal) | Gives access to the “Execute Command” node. |
| Promptfoo CLI available in the container/host | Needed to run promptfoo eval. |
| (Optional) LLM provider API keys set as environment variables or n8n credentials | Example: OPENAI_API_KEY, ANTHROPIC_API_KEY, … |
| (Optional) Slack / email / GitHub nodes in the same workflow | For notifications or comments once the eval finishes. |
Shipping a custom Docker image (recommended)
The easiest way is to bake Promptfoo into your n8n image so every workflow run already has the CLI:
# Dockerfile
FROM n8nio/n8n:latest # or a fixed tag
USER root # gain perms to install packages
RUN npm install -g promptfoo # installs CLI system‑wide
USER node # drop back to non‑root
Update docker‑compose.yml:
services:
  n8n:
    build: .
    env_file: .env          # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data        # prompts & configs live here
If you prefer not to rebuild the image, you can install Promptfoo on the fly inside the Execute Command node (sketched below), but that adds 10-15 s to every execution.
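A minimal sketch of that on-the-fly variant, assuming npx is available in the container (the official n8n image ships with a Node.js runtime, so it normally is); npx downloads the CLI on first use:

# Execute Command node: download-and-run variant (slower, but no image rebuild)
npx -y promptfoo@latest eval \
  -c /data/promptfooconfig.yaml \
  --output /tmp/pf-results.json
cat /tmp/pf-results.json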
Basic “Run & Alert” workflow
Below is the minimal pattern most teams start with:
| # | Node | Purpose |
|---|---|---|
| 1 | Trigger (Cron or Webhook) | Decide when to evaluate (nightly, on Git push webhook …). |
| 2 | Execute Command | Runs Promptfoo and emits raw stdout / stderr. |
| 3 | Code / Set node | Parses the resulting JSON, extracts pass/fail counts & share‑URL. |
| 4 | IF node | Branches on “failures > 0”. |
| 5 | Slack / Email / GitHub | Sends alert or PR comment when the gate fails. |
Execute Command node configuration
promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error
cat /tmp/pf-results.json   # print the results file so the next node can parse stdout
Set the working directory to /data (the volume mounted above) and configure the node to execute once per trigger. The node writes a machine-readable results file and prints it to stdout, so the next node can simply JSON.parse($json["stdout"]).
The Execute Command node that we rely on is only available in self‑hosted n8n. n8n Cloud does not expose it yet.
Sample “Parse & alert” snippet (Code node, TypeScript)
// Input: raw JSON string printed to stdout by the Execute Command node
const output = JSON.parse(items[0].json.stdout as string);

// Promptfoo summarizes every run under results.stats
const { successes, failures } = output.results.stats;

items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl; // populated when --share is set
return items;
An IF node can then route execution (expressions sketched below):
- failures = 0 → take the green path (maybe just archive the results).
- failures > 0 → post to Slack or comment on the pull request.
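A minimal sketch of the two expressions involved, assuming the field names set by the Code node above and a recent n8n version (n8n expressions allow inline JavaScript):

// IF node – Boolean condition; true routes to the red path
{{ $json.failures > 0 }}

// Slack node – Message Text field
❌ Promptfoo: {{ $json.failures }} failing test(s), pass rate {{ Math.round($json.passRate * 100) }}% → {{ $json.shareUrl }}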
Evaluating n8n AI Agent prompts and outputs
If your goal is to test the prompt inside an n8n AI Agent / OpenAI node (not just run Promptfoo from a workflow), treat the n8n node like any other app contract:
- Put the agent prompt in a file,
- Map incoming n8n fields to tests.vars, and
- Assert on the exact JSON or tool-call shape that downstream n8n nodes expect.
This works well when you want to regression-test an agent before wiring it into a larger workflow.
Validate JSON that downstream n8n nodes consume
If your agent is supposed to emit structured data for a Set, Code, Switch, or HTTP Request node, validate the payload directly.
prompts:
  - file://./prompts/n8n-support-router.txt
providers:
  - openai:gpt-5-mini
tests:
  - vars:
      customer_message: 'Customer wants to cancel order #4815 and asks for a refund'
    assert:
      - type: contains-json
        value:
          type: object
          required: [route, priority, reply]
          properties:
            route:
              type: string
              enum: [billing, support, sales]
            priority:
              type: string
              enum: [low, medium, high]
            reply:
              type: string
Use contains-json when the model may wrap JSON in prose or a markdown code block. If your node must return only JSON, use is-json instead.
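For the strict case, a minimal sketch swapping in is-json against the same schema fragment:

assert:
  - type: is-json
    value:
      type: object
      required: [route, priority, reply]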
Validate tool calls for agent workflows
If your n8n setup uses an OpenAI-compatible agent that should call tools before continuing, validate that Promptfoo sees a real tool call and that it matches your schema.
prompts:
  - file://./prompts/n8n-calendar-agent.txt
providers:
  - id: openai:gpt-5-mini
    config:
      tools: file://./tools/calendar-tools.yaml
tests:
  - vars:
      user_request: "Move tomorrow's standup to 3pm and notify the team"
    assert:
      - type: finish-reason
        value: tool_calls
      - type: is-valid-openai-tools-call
That pattern is especially useful when your n8n workflow branches on whether the LLM produced a tool invocation versus a final answer.
Useful building blocks
- /docs/configuration/tools for defining tool schemas
- /docs/guides/evaluate-json for JSON and schema assertions
- examples/openai-tools-call for a concrete OpenAI tool-calling config
- examples/eval-tool-use for finish-reason and tool-use checks across providers
Advanced patterns
Run different configs in parallel
Feed the Execute Command node one item per model ID or config file; n8n runs the command once for each incoming item, so downstream nodes automatically fan out and handle each result independently (see the Code node sketch below).
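A sketch of a Code node that produces those items, assuming your configs live under /data (the file names here are hypothetical); the Execute Command node can then reference {{ $json.config }} in its command:

// Code node: emit one item per Promptfoo config so the workflow fans out
const configs = [
  '/data/promptfooconfig-gpt.yaml', // hypothetical config files
  '/data/promptfooconfig-claude.yaml',
];
return configs.map((config) => ({ json: { config } }));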
Version‑controlled prompts
Mount your prompts directory and config file into the container at
/data. When you commit new prompts to Git, your CI/CD system can call the
n8n REST API or a Webhook trigger to re‑evaluate immediately.
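For example, a CI step can hit the workflow's production Webhook URL after new prompts land on the main branch (the host, path, and GIT_COMMIT variable below are hypothetical):

# CI step: trigger the eval workflow via its Webhook trigger
curl -fsS -X POST "https://n8n.example.com/webhook/promptfoo-eval" \
  -H "Content-Type: application/json" \
  -d "{\"ref\": \"$GIT_COMMIT\"}"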
Auto‑fail the whole workflow
If you run n8n headless (optionally with n8n start --tunnel so webhooks are reachable), CI pipelines (GitHub Actions, GitLab, …) can call this workflow either through its Webhook URL and check the HTTP response code, or with the n8n execute CLI command and check the process exit code; returning exit 1 from the Execute Command node propagates the failure either way.
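The CLI side of that gate might look like this sketch, assuming the CLI exits non-zero when the workflow run fails and with the workflow ID (visible in the editor URL) as a placeholder:

# CI step: run the workflow directly; a failed Execute Command node fails this command
n8n execute --id "$WORKFLOW_ID" || exit 1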
Security & best practices
- Keep API keys secret – store them in the n8n credential store or inject them as environment variables from Docker secrets; never hard-code them in workflows.
- Resource usage – Promptfoo supports caching via PROMPTFOO_CACHE_PATH; mount that directory to persist the cache across runs.
- Timeouts – wrap promptfoo eval with timeout --signal=SIGKILL 15m … (Linux) if you need hard execution limits (combined with the cache setting in the sketch below).
- Logging – route the stderr field of Execute Command to a dedicated log channel so you don't miss stack traces.
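Putting the cache and timeout advice together, the Execute Command script might start like this sketch (the paths are examples):

# Persist the Promptfoo cache on the mounted volume and enforce a hard 15-minute cap
export PROMPTFOO_CACHE_PATH=/data/.promptfoo-cache
timeout --signal=SIGKILL 15m promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --output /tmp/pf-results.json
cat /tmp/pf-results.json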
Troubleshooting
| Symptom | Likely cause / fix |
|---|---|
| Execute Command node not available | You're on n8n Cloud; switch to self-hosted. |
| promptfoo: command not found | Promptfoo is not installed inside the container. Rebuild your Docker image or add an install step. |
| Run fails with ENOENT on config paths | Make sure the prompts/config volume is mounted at the same path you reference in the command. |
| Large evals time out | Increase the node's "Timeout (s)" setting, or chunk your test cases and iterate inside the workflow. |
Next steps
- Combine Promptfoo with the n8n AI Transform node to chain evaluations into multi‑step RAG pipelines.
- Use n8n Insights (self‑hosted EE) to monitor historical pass‑rates and surface regressions.
- Check out the other CI integrations (GitHub Actions, CircleCI, etc.) for inspiration.
Happy automating!