Python
Use Python for promptfoo evals: providers, assertions, test generators, and prompts. Integrates with LangChain, LangGraph, CrewAI, and more.
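As a minimal sketch of the Python provider interface: promptfoo loads a file you reference as `file://provider.py` and calls its `call_api` function with the rendered prompt. The echo below stands in for a real model call (a LangChain chain, an OpenAI client, etc.), and the file name is arbitrary:

```python
# provider.py - minimal custom Python provider for promptfoo.
# Referenced from promptfooconfig.yaml as: file://provider.py


def call_api(prompt: str, options: dict, context: dict) -> dict:
    """Invoked by promptfoo once per test case with the rendered prompt."""
    # Placeholder: replace with a real model call.
    output = f"echo: {prompt}"

    # promptfoo expects a dict with an "output" key; "error" and
    # "tokenUsage" are also recognized in the response.
    return {"output": output}
```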
AWS CodeCommit
Run promptfoo from AWS CodeCommit-backed CodeBuild pipelines, store results as build artifacts, and optionally post scan summaries to CodeCommit pull requests.
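For the optional PR-comment step, one approach (a sketch, not the only wiring) is boto3's `codecommit` client, whose `post_comment_for_pull_request` call attaches a comment to a pull request revision. The repository name, pull request ID, commit IDs, and summary text below are placeholders you would read from the CodeBuild environment:

```python
# Post a promptfoo scan summary as a CodeCommit pull request comment.
import boto3

codecommit = boto3.client("codecommit")

codecommit.post_comment_for_pull_request(
    pullRequestId="42",                  # placeholder: PR that triggered the build
    repositoryName="my-llm-app",         # placeholder repository name
    beforeCommitId="<destination-tip>",  # destination branch tip at PR creation
    afterCommitId="<source-tip>",        # current tip of the source branch
    content="promptfoo scan: 3 failed assertions out of 120 tests",
)
```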
Model Context Protocol (MCP)
Enable Model Context Protocol (MCP) integration for enhanced tool use, persistent memory, and agentic workflows across providers.
MCP Server
Deploy promptfoo as a Model Context Protocol (MCP) server so external AI agents can access its evaluation and red teaming capabilities.
Splunk
Send Promptfoo red team findings to Splunk through the HTTP Event Collector (HEC) for SIEM alerting, dashboards, and incident workflows.
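Under the hood this is a POST to the HEC events endpoint. A rough sketch with `requests`, where the host, token, sourcetype, and the shape of the finding are all illustrative assumptions:

```python
# Forward a promptfoo red team finding to Splunk via HTTP Event Collector.
import requests

SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"
SPLUNK_HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

finding = {  # illustrative finding shape
    "plugin": "harmful:hate",
    "severity": "high",
    "provider": "openai:gpt-4o-mini",
    "prompt": "...",
}

resp = requests.post(
    SPLUNK_HEC_URL,
    headers={"Authorization": f"Splunk {SPLUNK_HEC_TOKEN}"},
    json={"sourcetype": "promptfoo:redteam", "event": finding},
    timeout=10,
)
resp.raise_for_status()
```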
Agent Skills
Install Promptfoo agent skills for eval writing and Codex red-team workflows, including provider setup, focused security configs, and scan result triage.
Azure Pipelines
Integrate promptfoo LLM testing into Azure Pipelines CI/CD: step-by-step setup, environment variables, and matrix testing configurations for automated AI evaluation.
Bitbucket Pipelines
Integrate promptfoo LLM testing with Bitbucket Pipelines CI/CD to automate evaluations, track results, and catch regressions in your AI models using built-in assertions.
Burp Suite
Test LLM applications for jailbreak vulnerabilities by integrating Promptfoo's red teaming capabilities with Burp Suite Intruder for automated security scanning.
CI/CD
Automate LLM testing in CI/CD pipelines with GitHub Actions, GitLab CI, and Jenkins for continuous security and quality checks.
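The pattern is the same across all of these systems: run `promptfoo eval`, keep the results as an artifact, and fail the job when assertions fail. A minimal sketch as a Python step, assuming a failing eval exits non-zero (promptfoo's exit-code behavior is configurable, so treat this as a starting point):

```python
# ci_gate.py - run a promptfoo eval and gate the CI job on its exit code.
import subprocess
import sys

result = subprocess.run(
    [
        "npx", "promptfoo@latest", "eval",
        "--config", "promptfooconfig.yaml",  # eval config checked into the repo
        "--output", "results.json",          # saved as a build artifact
    ]
)

# promptfoo exits non-zero when test cases fail, so propagating the
# return code is enough to fail the pipeline.
sys.exit(result.returncode)
```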
CircleCI
Automate LLM testing in CircleCI pipelines with promptfoo. Configure caching, API keys, and evaluation workflows to validate prompts and models in CI/CD environments.
GitHub Actions
Automate LLM prompt testing in CI/CD with GitHub Actions integration. Compare prompt changes, view diffs, and analyze results directly in pull requests using promptfoo.
GitLab CI
Automate LLM testing in GitLab CI pipelines with Promptfoo. Configure caching, API keys, and evaluation workflows to validate prompts and models in your CI/CD process.
Google Sheets
Import and export LLM test cases with Google Sheets integration. Configure public or authenticated access, write evaluation results, and run model-graded metrics.
Helicone
Monitor and optimize LLM testing with Helicone integration in Promptfoo. Track usage, costs, and latency while managing prompts through an open-source observability platform.
Jenkins
Integrate Promptfoo's LLM testing into Jenkins pipelines with automated evaluation, credential management, and CI/CD workflows for production AI deployments.
Jest & Vitest
Integrate LLM testing into Jest and Vitest workflows with custom matchers for semantic similarity, factuality checks, and automated prompt quality validation.
Langfuse
Integrate Langfuse prompts with Promptfoo for LLM testing. Configure version control, labels, and collaborative prompt management using environment variables and SDK setup.
Looper
Automate LLM testing in CI/CD by integrating Promptfoo with Looper workflows. Configure quality gates, caching, and multi-environment evaluations for production AI pipelines.
Mocha/Chai
Integrate Promptfoo LLM testing with Mocha and Chai to automate prompt quality checks using semantic similarity, factuality, and custom assertions in your test suite.
n8n
Integrate Promptfoo's LLM evaluation into your n8n workflows for automated testing, security and quality gates, and result sharing.
Portkey AI
Integrate the Portkey AI gateway with promptfoo for LLM testing, including prompt management, observability, and custom configurations with OpenAI models and APIs.
SharePoint
Import LLM test cases from Microsoft SharePoint. Configure certificate-based authentication and load test data from SharePoint CSV files.
SonarQube
Integrate Promptfoo security findings into SonarQube for centralized vulnerability tracking and CI/CD quality gates.
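One way to bridge the two is SonarQube's generic issue import: convert findings into the external-issues JSON consumed via `sonar.externalIssuesReportPaths`. A hedged sketch in which the findings file, its structure, and the severity mapping are assumptions about your export:

```python
# Convert promptfoo red team findings into SonarQube's generic issue format.
import json

SEVERITY_MAP = {"critical": "CRITICAL", "high": "MAJOR",
                "medium": "MINOR", "low": "INFO"}

with open("redteam-results.json") as f:   # assumed findings export
    findings = json.load(f)["findings"]   # assumed structure

issues = [
    {
        "engineId": "promptfoo",
        "ruleId": finding["plugin"],
        "severity": SEVERITY_MAP.get(finding["severity"], "INFO"),
        "type": "VULNERABILITY",
        "primaryLocation": {
            "message": finding["description"],
            # SonarQube needs a file path; findings aren't tied to a
            # source line, so point at the red team config.
            "filePath": "promptfooconfig.yaml",
        },
    }
    for finding in findings
]

with open("sonar-issues.json", "w") as f:
    json.dump({"issues": issues}, f, indent=2)
```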
Travis CI
Set up automated LLM testing in Travis CI pipelines with promptfoo. Configure environment variables, artifacts storage, and continuous evaluation of AI prompts and outputs.