# Azure Pipelines Integration
This guide demonstrates how to set up promptfoo with Azure Pipelines to run evaluations as part of your CI pipeline.
## Prerequisites
- A GitHub or Azure DevOps repository with a promptfoo project (a minimal config sketch follows this list)
- An Azure DevOps account with permission to create pipelines
- API keys for your LLM providers stored as Azure Pipeline variables
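If you don't yet have a promptfoo project, the sketch below shows what a minimal `promptfooconfig.yaml` might look like; the prompt, provider, and assertion are placeholders, so adapt them to your own prompts and models:

```yaml
# promptfooconfig.yaml -- minimal illustrative config (names and values are placeholders)
prompts:
  - 'Summarize the following text in one sentence: {{text}}'

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      text: 'Promptfoo runs LLM evaluations in CI so prompt regressions are caught before merge.'
    assert:
      - type: icontains
        value: 'promptfoo'
```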
## Setting up the Azure Pipeline

Create a new file named `azure-pipelines.yml` in the root of your repository with the following configuration:
```yaml
trigger:
  - main
  - master # Include if you use master as your main branch

pool:
  vmImage: 'ubuntu-latest'

variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '18.x'
    displayName: 'Install Node.js'

  - task: Cache@2
    inputs:
      key: 'npm | "$(Agent.OS)" | package-lock.json'
      restoreKeys: |
        npm | "$(Agent.OS)"
      path: $(npm_config_cache)
    displayName: 'Cache npm packages'

  - script: |
      npm ci
      npm install -g promptfoo
    displayName: 'Install dependencies'

  - script: |
      npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
      # Add other API keys as needed

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: 'promptfoo-results.xml'
      mergeTestResults: true
      testRunTitle: 'Promptfoo Evaluation Results'
    condition: succeededOrFailed()
    displayName: 'Publish test results'

  - task: PublishBuildArtifacts@1
    inputs:
      pathtoPublish: 'promptfoo-results.json'
      artifactName: 'promptfoo-results'
    condition: succeededOrFailed()
    displayName: 'Publish evaluation results'
```
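One thing to check before relying on the publish steps: they expect `promptfoo-results.xml` and `promptfoo-results.json` in the working directory, and a bare `promptfoo eval` does not write them. promptfoo can write a results file with its `--output` option (the format is inferred from the file extension); the sketch below covers the JSON artifact, and you should verify how your promptfoo version produces JUnit-style XML (or drop the `PublishTestResults@2` step if it doesn't):

```yaml
- script: |
    # Write a JSON results file for the artifact publish step.
    # Verify separately how your promptfoo version emits JUnit XML
    # for PublishTestResults@2, or remove that step.
    npx promptfoo eval --output promptfoo-results.json
  displayName: 'Run promptfoo evaluations'
  env:
    OPENAI_API_KEY: $(OPENAI_API_KEY)
    ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
```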
## Environment Variables

Store your LLM provider API keys as secret pipeline variables in Azure DevOps (or in a variable group, as shown after these steps):

- Navigate to your project in Azure DevOps
- Go to Pipelines > Your Pipeline > Edit > Variables
- Add variables for each provider API key (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
- Mark them as secret to ensure they're not displayed in logs
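If several pipelines need the same keys, an alternative is a variable group defined under Pipelines > Library and referenced from the YAML; the group name below is an example:

```yaml
variables:
  - group: llm-api-keys # example group containing OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.
  - name: npm_config_cache
    value: $(Pipeline.Workspace)/.npm
```

Either way, secret variables are not exposed to scripts automatically; they still have to be mapped explicitly through the `env:` block, as in the pipeline above.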
## Advanced Configuration

### Fail the Pipeline on Failed Assertions
You can configure the pipeline to fail when promptfoo assertions don't pass by modifying the script step:
```yaml
- script: |
    npx promptfoo eval --fail-on-error
  displayName: 'Run promptfoo evaluations'
  env:
    OPENAI_API_KEY: $(OPENAI_API_KEY)
```
### Configure Custom Output Location
If you want to customize where results are stored:
```yaml
- script: |
    npx promptfoo eval --output $(Build.ArtifactStagingDirectory)/promptfoo-results.json
  displayName: 'Run promptfoo evaluations'
```
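If you redirect results into `$(Build.ArtifactStagingDirectory)` like this, point the artifact publish step at the same location:

```yaml
- task: PublishBuildArtifacts@1
  inputs:
    pathtoPublish: '$(Build.ArtifactStagingDirectory)/promptfoo-results.json'
    artifactName: 'promptfoo-results'
  condition: succeededOrFailed()
  displayName: 'Publish evaluation results'
```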
### Run on Pull Requests
To run evaluations on pull requests, add a PR trigger:
```yaml
trigger:
  - main
  - master

pr:
  - main
  - master

# Rest of pipeline configuration
```
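CI and PR triggers also accept path filters, which is the most direct way to run builds only when prompt-related files change; the paths below assume your prompts live in a `prompts/` directory, so adjust them to your repository layout:

```yaml
pr:
  branches:
    include:
      - main
      - master
  paths:
    include:
      - promptfooconfig.yaml
      - prompts
```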
### Conditional Execution

Run promptfoo only on pull requests, or on pushes to main whose commit message contains a marker such as `[run-eval]`:
```yaml
steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '18.x'
    displayName: 'Install Node.js'

  - script: |
      npm ci
      npm install -g promptfoo
    displayName: 'Install dependencies'

  - script: |
      npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    condition: |
      and(
        succeeded(),
        or(
          eq(variables['Build.SourceBranch'], 'refs/heads/main'),
          eq(variables['Build.Reason'], 'PullRequest')
        ),
        or(
          eq(variables['Build.Reason'], 'PullRequest'),
          contains(variables['Build.SourceVersionMessage'], '[run-eval]')
        )
      )
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
```
### Using with Matrix Testing
Test across multiple configurations or models in parallel:
```yaml
strategy:
  matrix:
    gpt4:
      MODEL: 'gpt-4'
    claude:
      MODEL: 'claude-3-opus-20240229'

steps:
  - script: |
      npx promptfoo eval --providers.0.config.model=$(MODEL)
    displayName: 'Test with $(MODEL)'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
```
## Troubleshooting
If you encounter issues with your Azure Pipelines integration:
- Check logs: Review detailed logs in Azure DevOps to identify errors
- Verify API keys: Ensure your API keys are correctly set as pipeline variables
- Permissions: Make sure the pipeline has access to read your configuration files
- Node.js version: Promptfoo requires Node.js >= 18.0.0
If you're getting timeouts during evaluations, you may need to adjust the pipeline timeout settings or consider using a self-hosted agent for better stability with long-running evaluations.
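For the timeout case specifically, the job-level `timeoutInMinutes` setting raises the limit; a minimal sketch:

```yaml
jobs:
  - job: evaluate
    timeoutInMinutes: 120 # jobs on Microsoft-hosted agents typically default to a 60-minute timeout
    pool:
      vmImage: 'ubuntu-latest'
    steps:
      - script: npx promptfoo eval
        displayName: 'Run promptfoo evaluations'
        env:
          OPENAI_API_KEY: $(OPENAI_API_KEY)
```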