Using Promptfoo in n8n Workflows
This guide shows how to run Promptfoo evaluations from an n8n workflow so you can:
- schedule nightly or ad‑hoc LLM tests,
- gate downstream steps (Slack/Teams alerts, merge approvals, etc.) on pass‑rates, and
- publish the shareable results links that Promptfoo generates.
Prerequisites
| What | Why |
|---|---|
| Self‑hosted n8n ≥ v1 (Docker or bare‑metal) | Gives access to the “Execute Command” node. |
| Promptfoo CLI available in the container/host | Needed to run promptfoo eval. |
| (Optional) LLM provider API keys set as environment variables or n8n credentials | Example: OPENAI_API_KEY, ANTHROPIC_API_KEY, … |
| (Optional) Slack / email / GitHub nodes in the same workflow | For notifications or comments once the eval finishes. |
Shipping a custom Docker image (recommended)
The easiest way is to bake Promptfoo into your n8n image so every workflow run already has the CLI:
# Dockerfile
FROM n8nio/n8n:latest # or a fixed tag
USER root # gain perms to install packages
RUN npm install -g promptfoo # installs CLI system‑wide
USER node # drop back to non‑root
Update docker‑compose.yml:
services:
  n8n:
    build: .
    env_file: .env          # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data        # prompts & configs live here
If you prefer not to rebuild the image, you can install Promptfoo on the fly inside the Execute Command node (sketched below), but that adds 10-15 s to every execution.
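A minimal sketch of that on-the-fly variant, assuming npx is available in the container (the official n8n image ships with a Node.js runtime, so it normally is); npx downloads the CLI on first use:

# Execute Command node: download-and-run variant (slower, but no image rebuild)
npx -y promptfoo@latest eval \
  -c /data/promptfooconfig.yaml \
  --output /tmp/pf-results.json
cat /tmp/pf-results.json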
Basic “Run & Alert” workflow
Below is the minimal pattern most teams start with:
| # | Node | Purpose |
|---|---|---|
| 1 | Trigger (Cron or Webhook) | Decide when to evaluate (nightly, on Git push webhook …). |
| 2 | Execute Command | Runs Promptfoo and emits raw stdout / stderr. |
| 3 | Code / Set node | Parses the resulting JSON, extracts pass/fail counts & share‑URL. |
| 4 | IF node | Branches on “failures > 0”. |
| 5 | Slack / Email / GitHub | Sends alert or PR comment when the gate fails. |
Execute Command node configuration
promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error
cat /tmp/pf-results.json   # print the results file so the next node can parse stdout
Set the working directory to /data (the volume mounted above) and configure the node to execute once per trigger. The node writes a machine-readable results file and prints it to stdout, so the next node can simply JSON.parse($json["stdout"]).
The Execute Command node that we rely on is only available in self‑hosted n8n. n8n Cloud does not expose it yet.
Sample “Parse & alert” snippet (Code node, TypeScript)
// Input: raw JSON string printed to stdout by the Execute Command node
const output = JSON.parse(items[0].json.stdout as string);

// Promptfoo summarizes every run under results.stats
const { successes, failures } = output.results.stats;

items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl; // populated when --share is set
return items;
An IF node can then route execution (expressions sketched below):
- failures = 0 → take the green path (maybe just archive the results).
- failures > 0 → post to Slack or comment on the pull request.
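A minimal sketch of the two expressions involved, assuming the field names set by the Code node above and a recent n8n version (n8n expressions allow inline JavaScript):

// IF node – Boolean condition; true routes to the red path
{{ $json.failures > 0 }}

// Slack node – Message Text field
❌ Promptfoo: {{ $json.failures }} failing test(s), pass rate {{ Math.round($json.passRate * 100) }}% → {{ $json.shareUrl }}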
Evaluating n8n AI Agent prompts and outputs
If your goal is to test the prompt inside an n8n AI Agent / OpenAI node (not just run Promptfoo from a workflow), treat the n8n node like any other app contract:
- Put the agent prompt in a file,
- Map incoming n8n fields to tests.vars, and
- Assert on the exact JSON or tool-call shape that downstream n8n nodes expect.
This works well when you want to regression-test an agent before wiring it into a larger workflow.
Validate JSON that downstream n8n nodes consume
If your agent is supposed to emit structured data for a Set, Code, Switch, or HTTP Request node, validate the payload directly.
prompts:
  - file://./prompts/n8n-support-router.txt
providers:
  - openai:gpt-5-mini
tests:
  - vars:
      customer_message: 'Customer wants to cancel order #4815 and asks for a refund'
    assert:
      - type: contains-json
        value:
          type: object
          required: [route, priority, reply]
          properties:
            route:
              type: string
              enum: [billing, support, sales]
            priority:
              type: string
              enum: [low, medium, high]
            reply:
              type: string
Use contains-json when the model may wrap JSON in prose or a markdown code block. If your node must return only JSON, use is-json instead.
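For the strict case, a minimal sketch swapping in is-json against the same schema fragment:

assert:
  - type: is-json
    value:
      type: object
      required: [route, priority, reply]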
Validate tool calls for agent workflows
If your n8n setup uses an OpenAI-compatible agent that should call tools before continuing, validate that Promptfoo sees a real tool call and that it matches your schema.
prompts:
  - file://./prompts/n8n-calendar-agent.txt
providers:
  - id: openai:gpt-5-mini
    config:
      tools: file://./tools/calendar-tools.yaml
tests:
  - vars:
      user_request: "Move tomorrow's standup to 3pm and notify the team"
    assert:
      - type: finish-reason
        value: tool_calls
      - type: is-valid-openai-tools-call
That pattern is especially useful when your n8n workflow branches on whether the LLM produced a tool invocation versus a final answer.
Useful building blocks
- /docs/configuration/tools for defining tool schemas
- /docs/guides/evaluate-json for JSON and schema assertions
- examples/openai-tools-call for a concrete OpenAI tool-calling config
- examples/eval-tool-use for finish-reason and tool-use checks across providers
Advanced patterns
Run different configs in parallel
Feed the Execute Command node one item per model ID or config file; n8n runs the command once for each incoming item, so downstream nodes automatically fan out and handle each result independently (see the Code node sketch below).
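A sketch of a Code node that produces those items, assuming your configs live under /data (the file names here are hypothetical); the Execute Command node can then reference {{ $json.config }} in its command:

// Code node: emit one item per Promptfoo config so the workflow fans out
const configs = [
  '/data/promptfooconfig-gpt.yaml', // hypothetical config files
  '/data/promptfooconfig-claude.yaml',
];
return configs.map((config) => ({ json: { config } }));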
Version‑controlled prompts
Mount your prompts directory and config file into the container at
/data. When you commit new prompts to Git, your CI/CD system can call the
n8n REST API or a Webhook trigger to re‑evaluate immediately.
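For example, a CI step can hit the workflow's production Webhook URL after new prompts land on the main branch (the host, path, and GIT_COMMIT variable below are hypothetical):

# CI step: trigger the eval workflow via its Webhook trigger
curl -fsS -X POST "https://n8n.example.com/webhook/promptfoo-eval" \
  -H "Content-Type: application/json" \
  -d "{\"ref\": \"$GIT_COMMIT\"}"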
Auto‑fail the whole workflow
If you run n8n headless (optionally with n8n start --tunnel so webhooks are reachable), CI pipelines (GitHub Actions, GitLab, …) can call this workflow either through its Webhook URL and check the HTTP response code, or with the n8n execute CLI command and check the process exit code; returning exit 1 from the Execute Command node propagates the failure either way.
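The CLI side of that gate might look like this sketch, assuming the CLI exits non-zero when the workflow run fails and with the workflow ID (visible in the editor URL) as a placeholder:

# CI step: run the workflow directly; a failed Execute Command node fails this command
n8n execute --id "$WORKFLOW_ID" || exit 1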
Security & best practices
- Keep API keys secret – store them in the n8n credential store or inject them as environment variables from Docker secrets; never hard-code them in workflows.
- Resource usage – Promptfoo supports caching via PROMPTFOO_CACHE_PATH; mount that directory to persist the cache across runs.
- Timeouts – wrap promptfoo eval with timeout --signal=SIGKILL 15m … (Linux) if you need hard execution limits (combined with the cache setting in the sketch below).
- Logging – route the stderr field of Execute Command to a dedicated log channel so you don't miss stack traces.
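Putting the cache and timeout advice together, the Execute Command script might start like this sketch (the paths are examples):

# Persist the Promptfoo cache on the mounted volume and enforce a hard 15-minute cap
export PROMPTFOO_CACHE_PATH=/data/.promptfoo-cache
timeout --signal=SIGKILL 15m promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --output /tmp/pf-results.json
cat /tmp/pf-results.json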
Troubleshooting
| Symptom | Likely cause / fix |
|---|---|
| Execute Command node not available | You're on n8n Cloud; switch to self-hosted. |
| promptfoo: command not found | Promptfoo is not installed inside the container. Rebuild your Docker image or add an install step. |
| Run fails with ENOENT on config paths | Make sure the prompts/config volume is mounted at the same path you reference in the command. |
| Large evals time out | Increase the node's "Timeout (s)" setting, or chunk your test cases and iterate inside the workflow. |
Next steps
- Combine Promptfoo with the n8n AI Transform node to chain evaluations into multi‑step RAG pipelines.
- Use n8n Insights (self‑hosted EE) to monitor historical pass‑rates and surface regressions.
- Check out the other CI integrations (GitHub Actions, CircleCI, etc.) for inspiration.
Happy automating!