Bitbucket Pipelines Integration

This guide demonstrates how to set up promptfoo with Bitbucket Pipelines to run evaluations as part of your CI pipeline.

Prerequisites

  • A Bitbucket repository with a promptfoo project
  • Bitbucket Pipelines enabled for your repository
  • API keys for your LLM providers stored as Bitbucket repository variables

Setting up Bitbucket Pipelines

Create a new file named bitbucket-pipelines.yml in the root of your repository with the following configuration:

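image: node:18

pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        caches:
          - node
        script:
          - npm ci
          - npm install -g promptfoo
          # Write the result files that are collected as artifacts below
          - npx promptfoo eval --output promptfoo-results.json promptfoo-results.xml
        artifacts:
          - promptfoo-results.json
          - promptfoo-results.xml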

Environment Variables

Store your LLM provider API keys as repository variables in Bitbucket:

  1. Navigate to your repository in Bitbucket
  2. Go to Repository settings > Pipelines > Repository variables
  3. Add variables for each provider API key (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY)
  4. Mark them as "Secured" to ensure they're not displayed in logs
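
Once the variables are defined, Bitbucket exposes them to the step as ordinary environment variables. The following is a minimal sketch, not part of the original setup, that fails fast with a readable message when a key is missing; it assumes the evaluation uses only OPENAI_API_KEY, so adjust the check to the providers you actually use.

```yaml
pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          # Assumption: only OPENAI_API_KEY is required for this evaluation.
          # Stop early if the repository variable was not defined.
          - if [ -z "$OPENAI_API_KEY" ]; then echo "OPENAI_API_KEY is not set"; exit 1; fi
          - npm ci
          - npm install -g promptfoo
          - npx promptfoo eval
```

The check only tests whether the variable is empty and never prints its value.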

Advanced Configuration

Fail the Pipeline on Failed Assertions

You can configure the pipeline to fail when promptfoo assertions don't pass:

script:
  - npm ci
  - npm install -g promptfoo
  - npx promptfoo eval --fail-on-error
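
When a step is allowed to fail, it can help to record the exit status alongside the results. The sketch below is one possible arrangement, assuming the results are written to a file named promptfoo-results.json (a name chosen here for illustration). It relies on Bitbucket's after-script section, which runs whether or not the script section succeeded, and on the BITBUCKET_EXIT_CODE variable that Bitbucket sets for after-script commands.

```yaml
pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          # promptfoo-results.json is an illustrative output file name
          - npx promptfoo eval --fail-on-error --output promptfoo-results.json
        after-script:
          # Runs after the script section regardless of its outcome
          - echo "Evaluation step exited with code $BITBUCKET_EXIT_CODE"
        artifacts:
          - promptfoo-results.json
```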

Custom Evaluation Configurations

Run evaluations with specific configuration files:

script:
  - npm ci
  - npm install -g promptfoo
  - npx promptfoo eval --config custom-config.yaml
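
The same flag can be repeated across commands to run several configurations in one step. This is a sketch only: the file names configs/smoke.yaml and configs/regression.yaml and the output names are hypothetical placeholders.

```yaml
script:
  - npm ci
  - npm install -g promptfoo
  # File names below are hypothetical; substitute your own config files.
  - npx promptfoo eval --config configs/smoke.yaml --output smoke-results.json
  - npx promptfoo eval --config configs/regression.yaml --output regression-results.json
```

Bitbucket stops a step at the first command that exits with a non-zero status, so the second evaluation runs only if the first one succeeds.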

Run on Pull Requests

Configure different behavior for pull requests:

pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          - npx promptfoo eval
  pull-requests:
    '**':
      - step:
          name: Promptfoo PR Evaluation
          script:
            - npm ci
            - npm install -g promptfoo
            - npx promptfoo eval --fail-on-error
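
The '**' pattern matches every pull request. To limit the stricter evaluation to some pull requests, the pattern can be narrowed; in Bitbucket it is matched against the pull request's source branch. The sketch below assumes a feature/ branch naming convention, which is an illustration rather than something this guide requires.

```yaml
pipelines:
  pull-requests:
    # Assumption: feature work happens on branches named feature/<something>.
    # The pattern is matched against the pull request's source branch.
    'feature/*':
      - step:
          name: Promptfoo PR Evaluation (feature branches)
          script:
            - npm ci
            - npm install -g promptfoo
            - npx promptfoo eval --fail-on-error
```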

Scheduled Evaluations

Run evaluations on a schedule by defining a custom pipeline and attaching a schedule to it:

pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          - npx promptfoo eval
  custom:
    nightly-evaluation:
      - step:
          name: Nightly Evaluation
          script:
            - npm ci
            - npm install -g promptfoo
            - npx promptfoo eval

Schedules are not declared in bitbucket-pipelines.yml. After committing the file, open Pipelines > Schedules in your repository, create a new schedule, and select the main branch, the nightly-evaluation pipeline, and a daily frequency.

Parallel Testing

Test across multiple configurations in parallel:

image: node:18

pipelines:
  default:
    - parallel:
        - step:
            name: Evaluate with GPT-4
            script:
              - npm ci
              - npm install -g promptfoo
              # Override the provider on the command line and write the artifact file
              - npx promptfoo eval --providers openai:gpt-4 --output promptfoo-results-gpt4.json
            artifacts:
              - promptfoo-results-gpt4.json
        - step:
            name: Evaluate with Claude
            script:
              - npm ci
              - npm install -g promptfoo
              - npx promptfoo eval --providers anthropic:messages:claude-3-opus-20240229 --output promptfoo-results-claude.json
            artifacts:
              - promptfoo-results-claude.json

Using Pipes

Leverage Bitbucket Pipes for a more concise configuration:

image: node:18

pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        script:
          - npm ci
          - npm install -g promptfoo
          # Write the XML report that the pipe reads below
          - npx promptfoo eval --output promptfoo-results.xml
        after-script:
          - pipe: atlassian/junit-report:0.3.0
            variables:
              REPORT_PATHS: 'promptfoo-results.xml'

Troubleshooting

If you encounter issues with your Bitbucket Pipelines integration:

  • Check logs: Review detailed logs in Bitbucket to identify errors
  • Verify repository variables: Ensure your API keys are correctly set
  • Pipeline timeouts: Bitbucket Pipelines enforces a time limit on each step. For long-running evaluations, split them across several steps or raise the limit with the step's max-time option
  • Debug with SSH: For complex issues, enable SSH access to debug the pipeline environment directly
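
For the timeout case, the limit can be set per step. A minimal sketch, assuming 30 minutes is an appropriate ceiling for your evaluation (the value is illustrative); max-time is expressed in minutes.

```yaml
pipelines:
  default:
    - step:
        name: Promptfoo Evaluation
        # Illustrative value: stop the step if it runs longer than 30 minutes
        max-time: 30
        script:
          - npm ci
          - npm install -g promptfoo
          - npx promptfoo eval
```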