# Setting up Promptfoo with Looper
This guide shows you how to integrate Promptfoo evaluations into a Looper CI/CD workflow so that every pull‑request (and optional nightly job) automatically runs your prompt tests.
## Prerequisites

- A working Looper installation with workflow execution enabled
- A build image (or declared tools) that provides Node 22+ and jq 1.6+
- `promptfooconfig.yaml` and your prompt fixtures (`prompts/**/*.json`) committed to the repository
## Create `.looper.yml`
Add the following file to the root of your repo:
```yaml
language: workflow                      # optional but common

tools:
  nodejs: 22                            # Looper provisions Node.js
  jq: 1.7

envs:
  global:
    variables:
      PROMPTFOO_CACHE_PATH: "${HOME}/.promptfoo/cache"

triggers:
  - pr                                  # run on every pull-request
  - manual: "Nightly Prompt Tests"      # manual button in UI
    call: nightly                       # invokes the nightly flow below

flows:
  # ---------- default PR flow ----------
  default:
    - (name Install Promptfoo) npm install -g promptfoo

    - (name Evaluate Prompts) |
        promptfoo eval \
          -c promptfooconfig.yaml \
          --prompts "prompts/**/*.json" \
          --share \
          -o output.json

    - (name Quality gate) |
        SUCC=$(jq -r '.results.stats.successes' output.json)
        FAIL=$(jq -r '.results.stats.failures' output.json)
        echo "✅ $SUCC  ❌ $FAIL"
        test "$FAIL" -eq 0              # non-zero exit fails the build

  # ---------- nightly scheduled flow ----------
  nightly:
    - call: default                     # reuse the logic above
    - (name Upload artefacts) |
        echo "TODO: push output.json to S3 files"
```
## How it works

| Section | Purpose |
| --- | --- |
| `tools` | Declares tool versions Looper should provision. |
| `envs.global.variables` | Environment variables available to every step. |
| `triggers` | Determines when the workflow runs (`pr`, `manual`, `cron`, etc.). |
| `flows` | Ordered shell commands; execution stops on the first non-zero exit. |
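The example config only uses the `pr` and `manual` triggers. If your Looper installation supports `cron` triggers, a nightly schedule might look roughly like the sketch below; the key name and schedule syntax here are assumptions, so check the documentation for your Looper version:

```yaml
triggers:
  - pr
  - cron: "0 2 * * *"   # assumption: nightly at 02:00; verify the syntax your Looper version expects
    call: nightly
```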
## Caching Promptfoo results

Looper lacks a first-class cache API. Two common approaches:

- Persistent volume – mount `${HOME}/.promptfoo/cache` on a reusable volume.
- Persistence tasks – pull/push the cache at the start and end of the flow, as sketched below:
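The exact persistence mechanism depends on your Looper installation; the troubleshooting table below mentions a `files pull` task, so a sketch along those lines might look like the following. Treat the `files pull` / `files push` invocations and their flags as placeholders for whatever your setup actually provides:

```yaml
flows:
  default:
    - (name Restore cache) |
        # placeholder: replace with your installation's cache/artifact pull task
        files pull --remote promptfoo-cache --local "$PROMPTFOO_CACHE_PATH" || true

    # ... evaluation steps from the default flow go here ...

    - (name Save cache) |
        # placeholder: replace with the matching push task
        files push --local "$PROMPTFOO_CACHE_PATH" --remote promptfoo-cache
```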
## Setting quality thresholds

Add a gate step that fails the build when the pass rate drops below a threshold:

```yaml
- (name Pass-rate gate) |
    TOTAL=$(jq '.results.stats.successes + .results.stats.failures' output.json)
    PASS=$(jq '.results.stats.successes' output.json)
    RATE=$(echo "scale=2; 100*$PASS/$TOTAL" | bc)
    echo "Pass rate: $RATE%"
    test $(echo "$RATE >= 95" | bc) -eq 1   # fail if < 95%
```
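If `bc` is not available on the build image (it is not declared under `tools` in the example above), the same threshold can be enforced with `jq` alone; its `-e` flag makes the step exit non-zero when the expression evaluates to false:

```yaml
- (name Pass-rate gate) |
    # exits 1 (failing the flow) when the pass rate drops below 95%
    jq -e '.results.stats as $s
           | ($s.successes / ($s.successes + $s.failures)) >= 0.95' output.json
```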
## Multi-environment evaluations
Evaluate both staging and production configs and compare failures:
```yaml
flows:
  compare-envs:
    - (name Eval-prod) |
        promptfoo eval \
          -c promptfooconfig.prod.yaml \
          --prompts "prompts/**/*.json" \
          -o output-prod.json

    - (name Eval-staging) |
        promptfoo eval \
          -c promptfooconfig.staging.yaml \
          --prompts "prompts/**/*.json" \
          -o output-staging.json

    - (name Compare) |
        PROD_FAIL=$(jq '.results.stats.failures' output-prod.json)
        STAGE_FAIL=$(jq '.results.stats.failures' output-staging.json)
        if [ "$STAGE_FAIL" -gt "$PROD_FAIL" ]; then
          echo "⚠️ Staging has more failures than production!"
        fi
```
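As written, the Compare step only prints a warning. If you want staging regressions to fail the build, a variant of that step can gate on the comparison instead (a sketch; adjust the policy to your needs):

```yaml
    - (name Compare and gate) |
        PROD_FAIL=$(jq '.results.stats.failures' output-prod.json)
        STAGE_FAIL=$(jq '.results.stats.failures' output-staging.json)
        echo "prod: $PROD_FAIL failures, staging: $STAGE_FAIL failures"
        # non-zero exit fails the flow when staging has more failures than production
        test "$STAGE_FAIL" -le "$PROD_FAIL"
```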
## Posting evaluation results to GitHub/GitLab

To send evaluation results elsewhere, use either:

- A GitHub task:

  ```yaml
  - github --add-comment \
      --repository "$CI_REPOSITORY" \
      --issue "$PR_NUMBER" \
      --body "$(cat comment.md)"   # set comment as appropriate
  ```

- cURL with a Personal Access Token (PAT) against the REST API, as sketched below.
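For the cURL route, a minimal sketch against GitHub's REST API is shown below. It reuses the `$CI_REPOSITORY` and `$PR_NUMBER` variables from the task above and assumes a `GITHUB_TOKEN` PAT injected from your secret store (pull-request comments go through the issues endpoint):

```yaml
- (name Comment on PR) |
    # build a short Markdown summary from the eval output
    SUCC=$(jq -r '.results.stats.successes' output.json)
    FAIL=$(jq -r '.results.stats.failures' output.json)
    printf '### Promptfoo results\n\n✅ %s passed, ❌ %s failed\n' "$SUCC" "$FAIL" > comment.md

    # post the comment; jq builds the JSON payload so the Markdown is escaped safely
    jq -n --rawfile body comment.md '{body: $body}' |
      curl -sS -X POST \
        -H "Authorization: Bearer $GITHUB_TOKEN" \
        -H "Accept: application/vnd.github+json" \
        -d @- \
        "https://api.github.com/repos/$CI_REPOSITORY/issues/$PR_NUMBER/comments"
```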
## Troubleshooting

| Problem | Remedy |
| --- | --- |
| `npm: command not found` | Add `nodejs:` under `tools` or use an image with Node pre-installed. |
| Cache not restored | Verify the cache path and that the `files pull` task succeeds. |
| Long-running jobs | Split prompt sets into separate flows or raise `timeoutMillis` in the build definition. |
| API rate limits | Enable the Promptfoo cache and/or rotate API keys. |
## Best practices

- Incremental testing – feed the output of `looper diff --name-only prompts/` into `promptfoo eval` to test only changed prompts (see the sketch after this list).
- Semantic version tags – tag prompt sets/configs so you can roll back easily.
- Secret management – store API keys in a secret store and inject them as environment variables.
- Reusable library flows – if multiple repos need the same evaluation, host the flow definition in a central repo and `import` it.
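A sketch of the incremental-testing idea, using plain `git diff` against a hypothetical `origin/main` base branch (substitute `looper diff` or whatever changed-file helper your installation provides):

```yaml
- (name Eval changed prompts) |
    # list changed prompt files relative to the main branch (assumes git history is available)
    CHANGED=$(git diff --name-only origin/main -- prompts/ | grep '\.json$' || true)
    if [ -z "$CHANGED" ]; then
      echo "No prompt changes detected; skipping evaluation"
    else
      # word-splitting of $CHANGED into one path per changed file is intentional
      promptfoo eval -c promptfooconfig.yaml --prompts $CHANGED -o output.json
    fi
```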