# Bitbucket Pipelines Integration
This guide demonstrates how to set up promptfoo with Bitbucket Pipelines to run evaluations as part of your CI pipeline.
## Prerequisites

- A Bitbucket repository with a promptfoo project (a minimal example config follows this list)
- Bitbucket Pipelines enabled for your repository
- API keys for your LLM providers stored as Bitbucket repository variables
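If you don't have a config yet, a minimal `promptfooconfig.yaml` looks something like this. The prompt, provider, and test values are placeholders; adjust them to your project:

```yaml
# promptfooconfig.yaml — minimal sketch with placeholder values
prompts:
  - 'Summarize this text in one sentence: {{text}}'
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: 'Bitbucket Pipelines runs CI/CD jobs for Bitbucket repositories.'
    assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: 'bitbucket'
```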
## Setting up Bitbucket Pipelines

Create a new file named `bitbucket-pipelines.yml` in the root of your repository with the following configuration:
```yaml
image: node:18

pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        caches:
          - node
        script:
          - npm ci
          - npm install -g promptfoo
          # Write result files so the artifacts below actually exist
          # (the .xml output is a JUnit-style report)
          - npx promptfoo eval --output promptfoo-results.json promptfoo-results.xml
        artifacts:
          - promptfoo-results.json
          - promptfoo-results.xml
```
## Environment Variables

Store your LLM provider API keys as repository variables in Bitbucket:

1. Navigate to your repository in Bitbucket
2. Go to **Repository settings > Pipelines > Repository variables**
3. Add a variable for each provider API key (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
4. Mark them as "Secured" to ensure they're not displayed in logs
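Secured variables are injected into the step's environment automatically, so promptfoo picks them up without extra configuration. If you want the pipeline to fail fast when a key is missing, a check like this works (shown for `OPENAI_API_KEY`; substitute your provider's variable):

```yaml
script:
  # Fail early with a clear message if the key is missing; never echo its value
  - test -n "$OPENAI_API_KEY" || (echo "OPENAI_API_KEY is not set" && exit 1)
  - npx promptfoo eval
```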
## Advanced Configuration

### Fail the Pipeline on Failed Assertions

You can configure the pipeline to fail when promptfoo assertions don't pass:
```yaml
script:
  - npm ci
  - npm install -g promptfoo
  - npx promptfoo eval --fail-on-error
```
### Custom Evaluation Configurations

Run evaluations with specific configuration files:
```yaml
script:
  - npm ci
  - npm install -g promptfoo
  - npx promptfoo eval --config custom-config.yaml
```
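If you maintain several configuration files (for example, one per feature or prompt family), you can run them all in one step. This sketch assumes they live under a hypothetical `configs/` directory:

```yaml
script:
  - npm ci
  - npm install -g promptfoo
  # Evaluate each config in turn, stopping at the first failure
  - for cfg in configs/*.yaml; do npx promptfoo eval --config "$cfg" || exit 1; done
```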
### Run on Pull Requests

Configure different behavior for pull requests:
```yaml
pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          - npx promptfoo eval
  pull-requests:
    '**':
      - step:
          name: Promptfoo PR Evaluation
          script:
            - npm ci
            - npm install -g promptfoo
            - npx promptfoo eval --fail-on-error
```
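Bitbucket also supports branch-specific pipelines, which is useful if you want the stricter check to run on pushes to your main branch as well, not just on pull requests. A sketch:

```yaml
pipelines:
  branches:
    main:
      - step:
          name: Promptfoo Main Branch Evaluation
          script:
            - npm ci
            - npm install -g promptfoo
            - npx promptfoo eval --fail-on-error
```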
### Scheduled Evaluations

Run evaluations on a schedule by defining a custom pipeline:
```yaml
pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          - npx promptfoo eval
  custom:
    nightly-evaluation:
      - step:
          name: Nightly Evaluation
          script:
            - npm ci
            - npm install -g promptfoo
            - npx promptfoo eval
```
Note that the schedule itself is not defined in `bitbucket-pipelines.yml`. After committing the `custom` pipeline above, go to **Repository settings > Pipelines > Schedules** in Bitbucket, create a new schedule, and select the target branch (e.g., `main`), the `nightly-evaluation` pipeline, and the desired cadence (e.g., daily at midnight UTC).
### Parallel Testing

Test across multiple configurations in parallel:
```yaml
image: node:18

pipelines:
  default:
    - parallel:
        - step:
            name: Evaluate with GPT-4
            script:
              - npm ci
              - npm install -g promptfoo
              # Write to a distinct output file so the parallel artifacts don't collide
              - npx promptfoo eval --providers.0.config.model=gpt-4 --output promptfoo-results-gpt4.json
            artifacts:
              - promptfoo-results-gpt4.json
        - step:
            name: Evaluate with Claude
            script:
              - npm ci
              - npm install -g promptfoo
              - npx promptfoo eval --providers.0.config.model=claude-3-opus-20240229 --output promptfoo-results-claude.json
            artifacts:
              - promptfoo-results-claude.json
```
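Because artifacts from parallel steps are available to any later step, you can append a step that post-processes both result files. The `compare-results.js` script below is hypothetical; substitute your own aggregation logic:

```yaml
    # Appended to the default section above; runs after both parallel steps
    # finish, with their artifacts already on disk
    - step:
        name: Compare Results
        script:
          # compare-results.js is a placeholder for your own comparison script
          - node compare-results.js promptfoo-results-gpt4.json promptfoo-results-claude.json
```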
### Using Pipes

Leverage Bitbucket Pipes for a more concise configuration:
```yaml
image: node:18

pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          # Write a JUnit-style XML report for the pipe below to pick up
          - npx promptfoo eval --output promptfoo-results.xml
        after-script:
          - pipe: atlassian/junit-report:0.3.0
            variables:
              REPORT_PATHS: 'promptfoo-results.xml'
```
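Alternatively, Bitbucket parses JUnit-style XML automatically when report files land in a directory named `test-results`, so you may not need a pipe at all. A sketch, assuming promptfoo's `.xml` output is JUnit-compatible:

```yaml
script:
  - npm ci
  - npm install -g promptfoo
  - mkdir -p test-results
  # Bitbucket auto-detects JUnit reports under test-results/
  - npx promptfoo eval --output test-results/promptfoo-results.xml
```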
## Troubleshooting

If you encounter issues with your Bitbucket Pipelines integration:

- **Check logs**: Review the detailed step logs in Bitbucket to identify errors
- **Verify repository variables**: Ensure your API keys are set correctly and marked as "Secured"
- **Pipeline timeouts**: Bitbucket caps each step at 120 minutes. For long-running evaluations, consider splitting the test suite across parallel steps
- **Debug interactively**: For complex issues, enable SSH access to debug the pipeline environment directly, or reproduce the step locally in the same Docker image (see the sketch below)
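For local reproduction, running the same image with your working tree mounted usually surfaces the problem quickly. A sketch, assuming Docker is installed and the relevant API key is exported in your shell:

```bash
# Reproduce the pipeline step locally in the same node:18 image
docker run -it --rm \
  -v "$(pwd)":/app -w /app \
  -e OPENAI_API_KEY \
  node:18 \
  bash -c "npm ci && npx promptfoo eval"
```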