# Using Promptfoo in n8n Workflows
This guide shows how to run Promptfoo evaluations from an n8n workflow so you can:
- schedule nightly or ad‑hoc LLM tests,
- gate downstream steps (Slack/Teams alerts, merge approvals, etc.) on pass‑rates, and
- publish rich results links generated by Promptfoo.
## Prerequisites
| What | Why |
| --- | --- |
| Self‑hosted n8n ≥ v1 (Docker or bare‑metal) | Gives access to the “Execute Command” node. |
| Promptfoo CLI available in the container/host | Needed to run `promptfoo eval`. |
| (Optional) LLM provider API keys set as environment variables or n8n credentials | Example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, … |
| (Optional) Slack / email / GitHub nodes in the same workflow | For notifications or comments once the eval finishes. |
## Shipping a custom Docker image (recommended)
The easiest way is to bake Promptfoo into your n8n image so every workflow run already has the CLI:
```dockerfile
# Dockerfile — bake the Promptfoo CLI into the n8n image
FROM n8nio/n8n:latest
# Gain permissions to install packages system-wide
# (Dockerfile comments must sit on their own line; inline comments break FROM/USER)
USER root
RUN npm install -g promptfoo
# Drop back to the non-root user n8n expects
USER node
```
Update `docker-compose.yml`:
```yaml
services:
  n8n:
    build: .
    env_file: .env      # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data    # prompts & configs live here
```
If you prefer not to rebuild the image, you can install Promptfoo on the fly inside the Execute Command node, but that adds roughly 10‑15 seconds to every execution.
## Basic “Run & Alert” workflow
Below is the minimal pattern most teams start with:
| # | Node | Purpose |
| --- | --- | --- |
| 1 | Trigger (Cron or Webhook) | Decides when to evaluate (nightly, on a Git push webhook, …). |
| 2 | Execute Command | Runs Promptfoo and emits raw stdout / stderr. |
| 3 | Code / Set node | Parses the resulting JSON, extracts pass/fail counts & share URL. |
| 4 | IF node | Branches on “failures > 0”. |
| 5 | Slack / Email / GitHub | Sends an alert or PR comment when the gate fails. |
### Execute Command node configuration
```sh
promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error
cat /tmp/pf-results.json
```
Set the working directory to `/data` (mount it with a Docker volume) and set the node to execute once (one run per trigger). The node writes a machine‑readable results file and prints it to stdout, so the next node can simply `JSON.parse($json["stdout"])`.
The Execute Command node that we rely on is only available in self‑hosted n8n. n8n Cloud does not expose it yet.
### Sample “Parse & alert” snippet (Code node, TypeScript)
```typescript
// Input: raw JSON string from the previous node
const output = JSON.parse(items[0].json.stdout as string);

const { successes, failures } = output.results.stats;
items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl;
return items;
```
An IF node can then route execution:
- failures = 0 → take green path (maybe just archive the results).
- failures > 0 → post to Slack or comment on the pull request.
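If you prefer to keep the gate logic in code rather than an IF node, a small helper can decide the branch and pre-build the alert text in one place. This is a minimal sketch; the field names (`passRate`, `failures`, `shareUrl`) are the ones produced by the parse snippet above, and the message wording is an assumption:

```typescript
// Hypothetical gate helper: decides whether the red path fires and
// builds the Slack/PR message for it. Field names follow the parse step.
interface EvalSummary {
  passRate: number;   // successes / (successes + failures)
  failures: number;
  shareUrl?: string;  // only present when `--share` was used
}

function buildAlert(summary: EvalSummary): { failed: boolean; text: string } {
  const failed = summary.failures > 0;
  const pct = (summary.passRate * 100).toFixed(1);
  const link = summary.shareUrl ? ` Results: ${summary.shareUrl}` : "";
  return {
    failed,
    text: failed
      ? `Promptfoo eval failed: ${summary.failures} failing case(s), pass rate ${pct}%.${link}`
      : `Promptfoo eval passed (${pct}%).${link}`,
  };
}

const demo = buildAlert({ passRate: 0.9, failures: 3, shareUrl: "https://example.com/r/abc" });
console.log(demo.text);
```

A Switch or IF node can then branch on the boolean `failed` field instead of re-deriving the condition in every downstream node.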
## Advanced patterns
### Run different configs in parallel
Make the first Execute Command node loop over an array of model IDs or config files and push each run as a separate item. Downstream nodes will automatically fan‑out and handle each result independently.
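A Code node placed before the Execute Command node can do that fan-out by returning one item per config. A minimal sketch, assuming hypothetical config file names under `/data`:

```typescript
// Hypothetical fan-out for an n8n Code node: emit one item per Promptfoo
// config so each downstream Execute Command run handles one config.
// The file paths below are placeholders — list whatever lives in /data.
const configs = [
  "/data/promptfooconfig.gpt4.yaml",
  "/data/promptfooconfig.claude.yaml",
];

// n8n expects an array of { json: ... } objects; each one becomes a
// separate item that downstream nodes process independently.
const out = configs.map((config) => ({ json: { config } }));
// Inside a real Code node you would end with: return out;
console.log(out.length);
```

The Execute Command node can then reference the per-item path with an expression such as `{{ $json.config }}` in its command field.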
### Version‑controlled prompts
Mount your prompts directory and config file into the container at `/data`. When you commit new prompts to Git, your CI/CD system can call the n8n REST API or a Webhook trigger to re‑evaluate immediately.
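The CI-side call can be as small as a POST to the Webhook trigger's production URL. A sketch, assuming a hypothetical webhook path (`promptfoo-eval`) and environment variables (`N8N_BASE_URL`, `GIT_SHA`) that your pipeline would provide:

```typescript
// Hypothetical CI step: hit an n8n Webhook trigger to re-run the eval
// after prompts change. Base URL, path, and env var names are assumptions.
const base = process.env.N8N_BASE_URL ?? "https://n8n.example.com";
const webhookPath = "promptfoo-eval"; // "Path" configured on the Webhook node

// n8n production webhooks are served under /webhook/<path>
const url = `${base}/webhook/${webhookPath}`;
const payload = { ref: process.env.GIT_SHA ?? "main" };

// fetch is built into Node 18+; guard it so the sketch is safe to dry-run.
if (process.env.CI === "true" && process.env.N8N_BASE_URL) {
  fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  })
    .then((res) => console.log(`webhook responded ${res.status}`))
    .catch((err) => console.error("webhook call failed:", err));
}
console.log(url);
```

Passing the commit SHA in the body lets the workflow record which prompt revision each eval ran against.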
### Auto‑fail the whole workflow
If you run n8n headless via `n8n start --tunnel`, you can call this workflow from CI pipelines (GitHub Actions, GitLab, …) with the `n8n execute` CLI command, or via the REST API, and then check the exit or HTTP response code; returning `exit 1` from the Execute Command node will propagate the failure.
## Security & best practices
- Keep API keys secret – store them in the n8n credential store or inject them as environment variables from Docker secrets, not hard‑coded in workflows.
- Resource usage – Promptfoo supports caching via `PROMPTFOO_CACHE_PATH`; mount that directory to persist the cache across runs.
- Timeouts – wrap `promptfoo eval` with `timeout --signal=SIGKILL 15m …` (Linux) if you need hard execution limits.
- Logging – route the `stderr` field of Execute Command to a dedicated log channel so you don’t miss stack traces.
## Troubleshooting
| Symptom | Likely cause / fix |
| --- | --- |
| Execute Command node not available | You’re on n8n Cloud; switch to self‑hosted. |
| `promptfoo: command not found` | Promptfoo is not installed inside the container. Rebuild your Docker image or add an install step. |
| Run fails with `ENOENT` on config paths | Make sure the prompts/config volume is mounted at the same path you reference in the command. |
| Large evals time out | Increase the node’s “Timeout (s)” setting, or chunk your test cases and iterate inside the workflow. |
## Next steps
- Combine Promptfoo with the n8n AI Transform node to chain evaluations into multi‑step RAG pipelines.
- Use n8n Insights (self‑hosted EE) to monitor historical pass‑rates and surface regressions.
- Check out the other CI integrations (GitHub Actions, CircleCI, etc.) for inspiration.
Happy automating!