Azure Pipelines Integration

This guide demonstrates how to set up promptfoo with Azure Pipelines to run evaluations as part of your CI pipeline.

Prerequisites

  • A GitHub or Azure DevOps repository with a promptfoo project
  • An Azure DevOps account with permission to create pipelines
  • API keys for your LLM providers stored as Azure Pipeline variables

Setting up the Azure Pipeline

Create a new file named azure-pipelines.yml in the root of your repository with the following configuration:

trigger:
  - main
  - master # Include if you use master as your main branch

pool:
  vmImage: 'ubuntu-latest'

variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '18.x'
    displayName: 'Install Node.js'

  - task: Cache@2
    inputs:
      key: 'npm | "$(Agent.OS)" | package-lock.json'
      restoreKeys: |
        npm | "$(Agent.OS)"
      path: $(npm_config_cache)
    displayName: 'Cache npm packages'

  - script: |
      npm ci
      npm install -g promptfoo
    displayName: 'Install dependencies'

  - script: |
      npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
      # Add other API keys as needed

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: 'promptfoo-results.xml'
      mergeTestResults: true
      testRunTitle: 'Promptfoo Evaluation Results'
    condition: succeededOrFailed()
    displayName: 'Publish test results'

  - task: PublishBuildArtifacts@1
    inputs:
      pathtoPublish: 'promptfoo-results.json'
      artifactName: 'promptfoo-results'
    condition: succeededOrFailed()
    displayName: 'Publish evaluation results'

Environment Variables

Store your LLM provider API keys as secret pipeline variables in Azure DevOps:

  1. Navigate to your project in Azure DevOps
  2. Go to Pipelines > Your Pipeline > Edit > Variables
  3. Add variables for each provider API key (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY)
  4. Mark them as secret so they aren't displayed in logs (see the sketch after this list for mapping secrets into a step)
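
Secret variables aren't exposed to scripts as environment variables automatically; they have to be mapped into a step's env: block, which is what the pipeline above does. If you want to share the same keys across several pipelines, you can also define them in a variable group. A minimal sketch, assuming a variable group named llm-api-keys (a hypothetical name) created under Pipelines > Library:

variables:
  - group: llm-api-keys # holds OPENAI_API_KEY, ANTHROPIC_API_KEY as secret values

steps:
  - script: npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY) # secrets must be mapped explicitly into each step
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)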

Advanced Configuration

Fail the Pipeline on Failed Assertions

You can configure the pipeline to fail when promptfoo assertions don't pass by modifying the script step:

- script: |
    npx promptfoo eval --fail-on-error
  displayName: 'Run promptfoo evaluations'
  env:
    OPENAI_API_KEY: $(OPENAI_API_KEY)

Configure Custom Output Location

If you want to customize where results are stored:

- script: |
    npx promptfoo eval --output-path $(Build.ArtifactStagingDirectory)/promptfoo-results.json
  displayName: 'Run promptfoo evaluations'
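
If you change the output location, point the publish step at the same path, for example by adapting the PublishBuildArtifacts@1 task from the pipeline above:

- task: PublishBuildArtifacts@1
  inputs:
    pathtoPublish: '$(Build.ArtifactStagingDirectory)/promptfoo-results.json'
    artifactName: 'promptfoo-results'
  condition: succeededOrFailed()
  displayName: 'Publish evaluation results'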

Run on Pull Requests

To run evaluations on pull requests, add a PR trigger (for repositories hosted in Azure Repos Git, YAML pr triggers aren't used; configure PR validation through branch policies instead):

trigger:
  - main
  - master

pr:
  - main
  - master

# Rest of pipeline configuration

Conditional Execution

Run promptfoo only on pull requests, or on pushes to main whose commit message contains [run-eval]:

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '18.x'
    displayName: 'Install Node.js'

  - script: |
      npm ci
      npm install -g promptfoo
    displayName: 'Install dependencies'

  - script: |
      npx promptfoo eval
    displayName: 'Run promptfoo evaluations'
    condition: |
      and(
        succeeded(),
        or(
          eq(variables['Build.SourceBranch'], 'refs/heads/main'),
          eq(variables['Build.Reason'], 'PullRequest')
        ),
        or(
          eq(variables['Build.Reason'], 'PullRequest'),
          contains(variables['Build.SourceVersionMessage'], '[run-eval]')
        )
      )
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
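
If you want the evaluation to run only when certain files change, branch and path filters on the trigger are usually simpler than step conditions. A minimal sketch; the paths listed are assumptions, so adjust them to your repository layout:

trigger:
  branches:
    include:
      - main
  paths:
    include:
      - promptfooconfig.yaml
      - prompts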

Using with Matrix Testing

Test across multiple configurations or models in parallel:

strategy:
  matrix:
    gpt4:
      MODEL: 'gpt-4'
    claude:
      MODEL: 'claude-3-opus-20240229'

steps:
  - script: |
      npx promptfoo eval --providers.0.config.model=$(MODEL)
    displayName: 'Test with $(MODEL)'
    env:
      OPENAI_API_KEY: $(OPENAI_API_KEY)
      ANTHROPIC_API_KEY: $(ANTHROPIC_API_KEY)
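
Each matrix entry runs as its own job. If provider rate limits are a concern, you can cap how many jobs run at once with maxParallel, as in this sketch:

strategy:
  matrix:
    gpt4:
      MODEL: 'gpt-4'
    claude:
      MODEL: 'claude-3-opus-20240229'
  maxParallel: 1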

Troubleshooting

If you encounter issues with your Azure Pipelines integration:

  • Check logs: Review detailed logs in Azure DevOps to identify errors
  • Verify API keys: Ensure your API keys are correctly set as pipeline variables
  • Permissions: Make sure the pipeline has access to read your configuration files
  • Node.js version: Promptfoo requires Node.js >= 18.0.0

If you're getting timeouts during evaluations, you may need to adjust the pipeline timeout settings or consider using a self-hosted agent for better stability with long-running evaluations.
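
For example, the job timeout can be raised with timeoutInMinutes. A sketch, assuming the steps above are moved under an explicit job:

jobs:
  - job: promptfoo_eval
    timeoutInMinutes: 120
    pool:
      vmImage: 'ubuntu-latest'
    steps:
      # ... steps from the pipeline above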