
Using Promptfoo in n8n Workflows

This guide shows how to run Promptfoo evaluations from an n8n workflow so you can:

  • schedule nightly or ad‑hoc LLM tests,
  • gate downstream steps (Slack/Teams alerts, merge approvals, etc.) on pass‑rates, and
  • publish rich results links generated by Promptfoo.

Prerequisites

| What | Why |
| --- | --- |
| Self‑hosted n8n ≥ v1 (Docker or bare‑metal) | Gives access to the “Execute Command” node. |
| Promptfoo CLI available in the container/host | Needed to run `promptfoo eval`. |
| (Optional) LLM provider API keys set as environment variables or n8n credentials | Example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, … |
| (Optional) Slack / email / GitHub nodes in the same workflow | For notifications or comments once the eval finishes. |

The easiest way is to bake Promptfoo into your n8n image so every workflow run already has the CLI:

# Dockerfile
FROM n8nio/n8n:latest # or a fixed tag
USER root # gain perms to install packages
RUN npm install -g promptfoo # installs CLI system‑wide
USER node # drop back to non‑root

Update docker‑compose.yml:

services:
  n8n:
    build: .
    env_file: .env        # where your OPENAI_API_KEY lives
    volumes:
      - ./data:/data      # prompts & configs live here
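After rebuilding, it's worth verifying that the CLI is actually on the path inside the container; a quick check, assuming the service name `n8n` from the compose file above:

```shell
docker compose build
docker compose up -d
# Should print the installed Promptfoo version if the image bake worked.
docker compose exec n8n promptfoo --version
```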

If you prefer not to rebuild the image, you can install Promptfoo on the fly inside the Execute Command node, but that adds 10–15 s to every execution.
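If you do skip the rebuild, the Execute Command node's command can fetch the CLI at run time instead; a sketch using `npx` (the first-use download is where the extra 10–15 s goes):

```shell
# Run Promptfoo via npx so nothing has to be baked into the image.
# npx downloads the package on first use; subsequent runs reuse the npx cache.
npx --yes promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --output /tmp/pf-results.json
```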

Basic “Run & Alert” workflow

Below is the minimal pattern most teams start with:

| # | Node | Purpose |
| --- | --- | --- |
| 1 | Trigger (Cron or Webhook) | Decide when to evaluate (nightly, on a Git push webhook, …). |
| 2 | Execute Command | Runs Promptfoo and emits raw stdout / stderr. |
| 3 | Code / Set node | Parses the resulting JSON, extracts pass/fail counts & share URL. |
| 4 | IF node | Branches on “failures > 0”. |
| 5 | Slack / Email / GitHub | Sends alert or PR comment when the gate fails. |

Execute Command node configuration

promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --prompts "/data/prompts/**/*.json" \
  --output /tmp/pf-results.json \
  --share --fail-on-error
cat /tmp/pf-results.json

Set the node's working directory to /data (the Docker volume mounted above) and enable “Execute Once” so the command runs once per trigger.

The node writes a machine‑readable results file and prints it to stdout, so the next node can simply JSON.parse($json["stdout"]).

info

The Execute Command node that we rely on is only available in self‑hosted n8n. n8n Cloud does not expose it yet.

Sample “Parse & alert” snippet (Code node, TypeScript)

// Input: raw JSON string from previous node
const output = JSON.parse(items[0].json.stdout as string);

const { successes, failures } = output.results.stats;
items[0].json.passRate = successes / (successes + failures);
items[0].json.failures = failures;
items[0].json.shareUrl = output.shareableUrl;

return items;

An IF node can then route execution:

  • failures = 0 → take green path (maybe just archive the results).
  • failures > 0 → post to Slack or comment on the pull request.
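On the red path, a second Code node can shape the alert payload from the fields set by the parse snippet above; a sketch, where the channel name and message wording are illustrative:

```javascript
// Sketch of a Code node that builds the Slack alert for the failing branch.
// Assumes the parse step already set passRate, failures and shareUrl on the item.
function buildAlert(item) {
  const pct = (item.passRate * 100).toFixed(1);
  return {
    channel: '#llm-evals', // placeholder channel
    text: `Promptfoo gate failed: ${item.failures} failing case(s), pass rate ${pct}%\n${item.shareUrl}`,
  };
}

// In n8n you would `return [{ json: buildAlert(items[0].json) }];`
console.log(buildAlert({ passRate: 0.92, failures: 4, shareUrl: 'https://example.com/eval/123' }).text);
```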

Advanced patterns

Run different configs in parallel

Make the first Execute Command node loop over an array of model IDs or config files and push each run as a separate item. Downstream nodes will automatically fan‑out and handle each result independently.
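A sketch of such a fan-out as a Code node, assuming one YAML config per model (the paths are examples, not part of Promptfoo):

```javascript
// Emit one n8n item per Promptfoo config; downstream nodes run once per item,
// so each config is evaluated and alerted on independently.
const configs = [
  '/data/configs/gpt-4o.yaml',
  '/data/configs/claude.yaml', // example paths, adjust to your volume layout
];

const out = configs.map((configPath) => {
  const name = configPath.split('/').pop().replace(/\.yaml$/, '');
  return {
    json: {
      configPath,
      // A following Execute Command node can use {{ $json.command }} as its command.
      command: `promptfoo eval -c ${configPath} --output /tmp/pf-${name}.json`,
    },
  };
});

console.log(out.length);
// In an n8n Code node: return out;
```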

Version‑controlled prompts

Mount your prompts directory and config file into the container at /data. When you commit new prompts to Git, your CI/CD system can call the n8n REST API or a Webhook trigger to re‑evaluate immediately.
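For example, a CI step after a prompt commit might look like this sketch, where the webhook URL and payload are placeholders matching whatever your Webhook trigger node expects:

```shell
# Hypothetical CI step: re-run the eval workflow after prompts change.
curl -fsS -X POST "https://n8n.example.com/webhook/promptfoo-eval" \
  -H 'Content-Type: application/json' \
  -d "{\"ref\": \"$GIT_COMMIT\"}"
```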

Auto‑fail the whole workflow

If you run n8n headless (for example via n8n start --tunnel to expose webhooks), CI pipelines (GitHub Actions, GitLab, …) can trigger this workflow via a webhook and check the HTTP response code, or invoke it directly with the n8n execute CLI command; returning exit 1 from the Execute Command node propagates the failure either way.
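Another way to hard-fail is from a Code node: throwing an error marks the whole n8n execution as failed, which the CI caller then observes. A minimal sketch, reusing the stats shape from the parse snippet earlier:

```javascript
// Sketch: hard-fail the workflow when Promptfoo reports failures.
// Throwing inside an n8n Code node errors the entire execution.
function gate(stats) {
  if (stats.failures > 0) {
    throw new Error(`Promptfoo gate: ${stats.failures} failing test case(s)`);
  }
  return stats;
}

gate({ successes: 12, failures: 0 }); // green path: passes through silently
```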

Security & best practices

  • Keep API keys secret – store them in the n8n credential store or inject as environment variables from Docker secrets, not hard‑coded in workflows.
  • Resource usage – Promptfoo supports caching via PROMPTFOO_CACHE_PATH; mount that directory to persist across runs.
  • Timeouts – wrap promptfoo eval with timeout --signal=SIGKILL 15m … (Linux) if you need hard execution limits.
  • Logging – route the stderr field of Execute Command to a dedicated log channel so you don’t miss stack traces.
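The cache and timeout bullets combine naturally in the Execute Command node; a sketch, where the cache path and the 15-minute limit are examples to adapt:

```shell
# Persist Promptfoo's cache on the mounted volume and cap the run at 15 minutes.
export PROMPTFOO_CACHE_PATH=/data/.promptfoo-cache
timeout --signal=SIGKILL 15m promptfoo eval \
  -c /data/promptfooconfig.yaml \
  --output /tmp/pf-results.json
```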

Troubleshooting

| Symptom | Likely cause / fix |
| --- | --- |
| Execute Command node not available | You’re on n8n Cloud; switch to self‑hosted. |
| `promptfoo: command not found` | Promptfoo not installed inside the container. Rebuild your Docker image or add an install step. |
| Run fails with ENOENT on config paths | Make sure the prompts/config volume is mounted at the same path you reference in the command. |
| Large evals time out | Increase the node’s “Timeout (s)” setting, or chunk your test cases and iterate inside the workflow. |

Next steps

  1. Combine Promptfoo with the n8n AI Transform node to chain evaluations into multi‑step RAG pipelines.
  2. Use n8n Insights (self‑hosted EE) to monitor historical pass‑rates and surface regressions.
  3. Check out the other CI integrations (GitHub Actions, CircleCI, etc.) for inspiration.

Happy automating!