# Using Promptfoo in n8n Workflows
This guide shows how to run Promptfoo evaluations from an n8n workflow so you can:
- schedule nightly or ad‑hoc LLM tests,
- gate downstream steps (Slack/Teams alerts, merge approvals, etc.) on pass‑rates, and
- publish rich results links generated by Promptfoo.
## Prerequisites
| What | Why |
| --- | --- |
| Self‑hosted n8n ≥ v1 (Docker or bare‑metal) | Gives access to the “Execute Command” node. |
| Promptfoo CLI available in the container/host | Needed to run `promptfoo eval`. |
| (Optional) LLM provider API keys set as environment variables or n8n credentials | Example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, … |
| (Optional) Slack / email / GitHub nodes in the same workflow | For notifications or comments once the eval finishes. |
## Shipping a custom Docker image (recommended)
The easiest way is to bake Promptfoo into your n8n image so every workflow run already has the CLI:
```dockerfile
# Dockerfile — bake the Promptfoo CLI into the n8n image
FROM n8nio/n8n:latest
# Gain permissions to install packages system-wide
# (Dockerfile comments must sit on their own line; inline comments break FROM/USER)
USER root
RUN npm install -g promptfoo
# Drop back to the non-root user n8n expects
USER node
```
Update `docker-compose.yml`:
```yaml
services:
  n8n:
    build: .
    env_file: .env      # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data    # prompts & configs live here
```
If you prefer not to rebuild the image, you can install Promptfoo on the fly inside the Execute Command node, but that adds roughly 10‑15 seconds to every execution.
## Basic “Run & Alert” workflow
Below is the minimal pattern most teams start with:
| # | Node | Purpose |
| --- | --- | --- |
| 1 | Trigger (Cron or Webhook) | Decides when to evaluate (nightly, on a Git push webhook, …). |
| 2 | Execute Command | Runs Promptfoo and emits raw stdout / stderr. |
| 3 | Code / Set node | Parses the resulting JSON, extracts pass/fail counts & share URL. |
| 4 | IF node | Branches on “failures > 0”. |
| 5 | Slack / Email / GitHub | Sends an alert or PR comment when the gate fails. |
### Execute Command node configuration
```sh
promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error
cat /tmp/pf-results.json
```
Set the working directory to `/data` (mount it with a Docker volume) and set the node to execute once (one run per trigger). The node writes a machine‑readable results file and prints it to stdout, so the next node can simply `JSON.parse($json["stdout"])`.
The Execute Command node that we rely on is only available in self‑hosted n8n. n8n Cloud does not expose it yet.
### Sample “Parse & alert” snippet (Code node, TypeScript)
```typescript
// Input: raw JSON string from the previous node
const output = JSON.parse(items[0].json.stdout as string);

const { successes, failures } = output.results.stats;
items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl;
return items;
```
An IF node can then route execution:
- failures = 0 → take green path (maybe just archive the results).
- failures > 0 → post to Slack or comment on the pull request.
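If you prefer to keep the gate logic in code rather than an IF node, a small helper can decide the branch and pre-build the alert text in one place. This is a minimal sketch; the field names (`passRate`, `failures`, `shareUrl`) are the ones produced by the parse snippet above, and the message wording is an assumption:

```typescript
// Hypothetical gate helper: decides whether the red path fires and
// builds the Slack/PR message for it. Field names follow the parse step.
interface EvalSummary {
  passRate: number;   // successes / (successes + failures)
  failures: number;
  shareUrl?: string;  // only present when `--share` was used
}

function buildAlert(summary: EvalSummary): { failed: boolean; text: string } {
  const failed = summary.failures > 0;
  const pct = (summary.passRate * 100).toFixed(1);
  const link = summary.shareUrl ? ` Results: ${summary.shareUrl}` : "";
  return {
    failed,
    text: failed
      ? `Promptfoo eval failed: ${summary.failures} failing case(s), pass rate ${pct}%.${link}`
      : `Promptfoo eval passed (${pct}%).${link}`,
  };
}

const demo = buildAlert({ passRate: 0.9, failures: 3, shareUrl: "https://example.com/r/abc" });
console.log(demo.text);
```

A Switch or IF node can then branch on the boolean `failed` field instead of re-deriving the condition in every downstream node.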
## Advanced patterns
### Run different configs in parallel
Make the first Execute Command node loop over an array of model IDs or config files and push each run as a separate item. Downstream nodes will automatically fan‑out and handle each result independently.
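A Code node placed before the Execute Command node can do that fan-out by returning one item per config. A minimal sketch, assuming hypothetical config file names under `/data`:

```typescript
// Hypothetical fan-out for an n8n Code node: emit one item per Promptfoo
// config so each downstream Execute Command run handles one config.
// The file paths below are placeholders — list whatever lives in /data.
const configs = [
  "/data/promptfooconfig.gpt4.yaml",
  "/data/promptfooconfig.claude.yaml",
];

// n8n expects an array of { json: ... } objects; each one becomes a
// separate item that downstream nodes process independently.
const out = configs.map((config) => ({ json: { config } }));
// Inside a real Code node you would end with: return out;
console.log(out.length);
```

The Execute Command node can then reference the per-item path with an expression such as `{{ $json.config }}` in its command field.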
### Version‑controlled prompts
Mount your prompts directory and config file into the container at `/data`. When you commit new prompts to Git, your CI/CD system can call the n8n REST API or a Webhook trigger to re‑evaluate immediately.
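The CI-side call can be as small as a POST to the Webhook trigger's production URL. A sketch, assuming a hypothetical webhook path (`promptfoo-eval`) and environment variables (`N8N_BASE_URL`, `GIT_SHA`) that your pipeline would provide:

```typescript
// Hypothetical CI step: hit an n8n Webhook trigger to re-run the eval
// after prompts change. Base URL, path, and env var names are assumptions.
const base = process.env.N8N_BASE_URL ?? "https://n8n.example.com";
const webhookPath = "promptfoo-eval"; // "Path" configured on the Webhook node

// n8n production webhooks are served under /webhook/<path>
const url = `${base}/webhook/${webhookPath}`;
const payload = { ref: process.env.GIT_SHA ?? "main" };

// fetch is built into Node 18+; guard it so the sketch is safe to dry-run.
if (process.env.CI === "true" && process.env.N8N_BASE_URL) {
  fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  })
    .then((res) => console.log(`webhook responded ${res.status}`))
    .catch((err) => console.error("webhook call failed:", err));
}
console.log(url);
```

Passing the commit SHA in the body lets the workflow record which prompt revision each eval ran against.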
### Auto‑fail the whole workflow
If you run n8n headless via `n8n start --tunnel`, you can call this workflow from CI pipelines (GitHub Actions, GitLab, …) with the `n8n execute` CLI command, or via the REST API, and then check the exit or HTTP response code; returning `exit 1` from the Execute Command node will propagate the failure.
## Security & best practices
- Keep API keys secret – store them in the n8n credential store or inject them as environment variables from Docker secrets, not hard‑coded in workflows.
- Resource usage – Promptfoo supports caching via `PROMPTFOO_CACHE_PATH`; mount that directory to persist the cache across runs.
- Timeouts – wrap `promptfoo eval` with `timeout --signal=SIGKILL 15m …` (Linux) if you need hard execution limits.
- Logging – route the `stderr` field of Execute Command to a dedicated log channel so you don’t miss stack traces.
## Troubleshooting
| Symptom | Likely cause / fix |
| --- | --- |
| Execute Command node not available | You’re on n8n Cloud; switch to self‑hosted. |
| `promptfoo: command not found` | Promptfoo is not installed inside the container. Rebuild your Docker image or add an install step. |
| Run fails with `ENOENT` on config paths | Make sure the prompts/config volume is mounted at the same path you reference in the command. |
| Large evals time out | Increase the node’s “Timeout (s)” setting, or chunk your test cases and iterate inside the workflow. |
## Next steps
- Combine Promptfoo with the n8n AI Transform node to chain evaluations into multi‑step RAG pipelines.
- Use n8n Insights (self‑hosted EE) to monitor historical pass‑rates and surface regressions.
- Check out the other CI integrations (GitHub Actions, CircleCI, etc.) for inspiration.
Happy automating!