Node Module API Reference

This guide documents the Node.js module API for advanced programmatic usage of promptfoo. It covers all publicly exported functions, namespaces, and types.

info

For standard YAML-based evaluation configuration, see Configuration Guide. This API reference is for users building programmatic evaluation workflows directly in JavaScript/TypeScript.

Overview

The promptfoo Node module provides programmatic access to:

  • Core evaluation engine (evaluate())
  • Provider management (load and resolve LLM providers)
  • Assertion execution (run assertions independently)
  • Cache management (control caching behavior)
  • Security guardrails (PII, harm, moderation checks)
  • Adversarial testing (red team evaluation)

Core API

evaluate(testSuite, options?)

The main entry point for running evaluations programmatically.

async function evaluate(testSuite: EvaluateTestSuite, options?: EvaluateOptions): Promise<Eval>;

Parameters:

  • testSuite: Configuration object containing prompts, providers, and tests
  • options: Optional evaluation settings (caching, output, concurrency, etc.)

Returns: An Eval record. Call toEvaluateSummary() when you need the serializable results summary.

Example:

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
  prompts: ['What is 2+2?'],
  providers: ['openai:gpt-4', 'anthropic:claude-3-opus'],
  tests: [
    {
      vars: { query: 'math question' },
      assert: [
        {
          type: 'contains',
          value: '4',
          metric: 'is_correct',
        },
      ],
    },
  ],
});
const results = await evalRecord.toEvaluateSummary();

console.log(`Passed: ${results.stats.successes}/${results.results.length}`);
console.log(`Shareable URL: ${evalRecord.shareableUrl}`);

Common Options:

interface EvaluateOptions {
  // Output and format
  outputPath?: string | string[];
  formatOutput?: boolean;

  // Evaluation behavior
  maxConcurrency?: number;
  nunjucksFilters?: Record<string, Function>;

  // Caching
  cache?: boolean;

  // Sharing and persistence
  sharing?: boolean;
  writeLatestResults?: boolean;

  // Progress tracking
  onTestComplete?: (result: EvaluateResult) => void;
}
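
For example, to cap concurrency and log progress as each test finishes (a minimal sketch; testSuite stands for an EvaluateTestSuite you have already built, and the logged success field is assumed from EvaluateResult):

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate(testSuite, {
  maxConcurrency: 4, // run at most four provider calls at once
  cache: false, // always hit the providers for this run
  onTestComplete: (result) => {
    // `success` is assumed to be the pass/fail flag on EvaluateResult
    console.log(`Test finished: ${result.success ? 'pass' : 'fail'}`);
  },
});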

Provider APIs

loadApiProvider(providerPath, context?)

Load a single provider instance by path or identifier.

async function loadApiProvider(
  providerPath: string,
  context?: LoadApiProviderContext,
): Promise<ApiProvider>;

Parameters:

  • providerPath: Provider identifier (e.g., 'openai:gpt-4', 'anthropic:claude-3-opus', or 'file://./custom-provider.js')
  • context: Optional context with environment overrides

Returns: Configured ApiProvider instance ready to call

Example:

import { loadApiProvider } from 'promptfoo';

const openaiProvider = await loadApiProvider('openai:gpt-4', {
  env: { OPENAI_API_KEY: process.env.MY_SECRET_KEY },
});

const response = await openaiProvider.callApi('Hello, world!');

console.log(response.output);

Supported Provider Types:

  • openai:* (GPT models)
  • anthropic:* (Claude models)
  • azure-openai:*
  • bedrock:* (AWS Bedrock)
  • vertex:* (Google Vertex AI)
  • cohere:*
  • replicate:*
  • huggingface:*
  • Custom files: file://./path/to/provider.js (see the sketch after this list)
  • Custom functions via inline JavaScript
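
A custom file:// provider is a module implementing the ApiProvider shape used throughout this guide: an id() method and a callApi() method returning { output }. A minimal sketch, with the file name and echo logic purely illustrative:

// custom-provider.js — a minimal custom provider module (illustrative)
module.exports = {
  id: () => 'my-custom-provider',
  callApi: async (prompt) => {
    // Call your own model or service here
    return { output: `Echo: ${prompt}` };
  },
};

Load it with loadApiProvider('file://./custom-provider.js'), exactly like a built-in provider identifier.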

loadApiProviders(providers, options?)

Load multiple providers in parallel.

async function loadApiProviders(
  providers: ProvidersConfig,
  options?: { env?: Record<string, string> },
): Promise<ApiProvider[]>;

Parameters:

  • providers: Array of provider identifiers or config objects
  • options: Optional environment overrides

Returns: Array of configured provider instances

Example:

import { loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders([
  'openai:gpt-4',
  'anthropic:claude-3-opus',
  {
    id: 'custom-provider',
    config: {
      model: 'my-model',
      temperature: 0.7,
    },
  },
]);

for (const provider of providers) {
  const result = await provider.callApi('Test');
  console.log(`${provider.id()}: ${result.output}`);
}

Assertions API

assertions.runAssertion(params)

Execute a single assertion against provider output. Useful for building custom evaluation logic.

async function runAssertion({
  prompt?: string;
  provider?: ApiProvider;
  assertion: Assertion;
  test: AtomicTestCase;
  vars?: Record<string, VarValue>;
  latencyMs?: number;
  providerResponse: ProviderResponse;
  traceId?: string;
  traceData?: TraceData | null;
}): Promise<GradingResult>

Parameters:

  • assertion: The assertion to run (e.g., { type: 'contains', value: 'expected' })
  • test: The test case context
  • providerResponse: The output from the provider to evaluate
  • vars: Template variables from the test
  • provider: Provider instance (optional, for context)
  • traceData: Distributed trace data for debugging (optional)

Returns:

interface GradingResult {
  pass: boolean; // Did the assertion pass?
  score?: number; // 0-1 score
  reason?: string; // Explanation
  assertion: Assertion; // The original assertion
  metric?: string; // Metric name
  error?: string; // Error message if failed
}

Example: Custom Assertion Logic

import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Custom grading logic (case-insensitive keyword check)
      const score = output.toLowerCase().includes('yes') ? 1.0 : 0.0;
      return {
        pass: score >= 0.8,
        score,
        reason: `Output contains required keyword`,
      };
    },
  },
  test: {
    vars: { question: 'Is the sky blue?' },
    assert: [], // populated with assertion
  },
  providerResponse: {
    output: 'Yes, the sky is blue in most places.',
    tokenUsage: { total: 15 },
  },
});

console.log(`Pass: ${result.pass}, Score: ${result.score}`);

Assertion Value Function Context:

interface AssertionValueFunctionContext {
  // Test and evaluation metadata
  prompt: string | undefined;
  vars: Record<string, unknown>; // Template variables
  test: AtomicTestCase; // Full test context

  // Provider info
  provider: ApiProvider | undefined;
  providerResponse: ProviderResponse | undefined;

  // Output metadata
  logProbs?: number[]; // Token log probabilities

  // Advanced: Tracing
  trace?: TraceData; // Distributed trace with spans
  config?: Record<string, any>; // Custom assertion config
}

Using Trace Data:

import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Access trace data for latency analysis
      if (context.trace?.spans) {
        const ttft = context.trace.spans.find((s) => s.name === 'time_to_first_token');
        console.log(`Time to first token: ${ttft?.duration}ms`);
      }
      return { pass: true };
    },
  },
  test: { vars: {} },
  providerResponse: { output: 'test' },
  traceId: 'trace-123',
  traceData: {
    spans: [
      {
        name: 'time_to_first_token',
        startTime: Date.now(),
        duration: 250,
      },
    ],
  },
});

assertions.runAssertions(params)

Execute multiple assertions in batch against provider output.

async function runAssertions({
  assertions: (Assertion | AssertionSet)[];
  prompt?: string;
  test: AtomicTestCase;
  provider?: ApiProvider;
  vars?: Record<string, VarValue>;
  providerResponse: ProviderResponse;
  latencyMs?: number;
  traceData?: TraceData | null;
}): Promise<AssertionsResult>

Returns:

interface GradingResult {
  pass: boolean;
  score: number; // Aggregate score across all assertions
  reason?: string;
  componentResults?: GradingResult[]; // Per-assertion results
  namedScores?: Record<string, number>;
  tokensUsed?: {
    total: number;
    prompt: number;
    completion: number;
    cached: number;
    numRequests: number;
  };
}

Example:

import { assertions } from 'promptfoo';

const result = await assertions.runAssertions({
  assertions: [
    { type: 'contains', value: '4' },
    { type: 'regex', value: '^The answer is \\d+' }, // no `$` anchor, so the trailing period still matches
    { type: 'not-regex', value: '[Ee]rror|[Ff]ailed' }, // JS regexes don't support inline (?i) flags
  ],
  test: { vars: { question: 'What is 2+2?' } },
  providerResponse: { output: 'The answer is 4.' },
});

console.log(`All passed: ${result.pass}`);
console.log(`Average score: ${result.score}`);
result.componentResults?.forEach((r) => console.log(` ${r.assertion?.type}: ${r.pass}`));

Cache API

Control promptfoo's caching layer for LLM provider calls.

enableCache()

Enable caching for provider calls (default).

export function enableCache(): void;

Example:

import { cache } from 'promptfoo';

cache.enableCache();

disableCache()

Disable caching. Useful during development/testing to always hit the provider.

export function disableCache(): void;

Example:

import { cache, evaluate } from 'promptfoo';

// Disable for fresh results
cache.disableCache();
const evalRecord = await evaluate(testSuite);
cache.enableCache();

isCacheEnabled()

Check if caching is currently enabled.

export function isCacheEnabled(): boolean;
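
For example, to restore the previous cache state after a forced-fresh run (a small sketch):

import { cache } from 'promptfoo';

const wasEnabled = cache.isCacheEnabled();
cache.disableCache();
// ... run evaluations that must hit the providers ...
if (wasEnabled) {
  cache.enableCache();
}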

clearCache()

Clear all cached results.

export async function clearCache(): Promise<void>;

Example:

import { cache, evaluate } from 'promptfoo';

// Clear old cache
await cache.clearCache();
await evaluate(testSuite); // Will refetch all provider calls

withCacheNamespace(namespace, fn)

Run evaluation with isolated cache namespace.

export function withCacheNamespace<T>(
  namespace: string | undefined,
  fn: () => Promise<T>,
): Promise<T>;

Use Cases:

  • Isolate caches for different test runs
  • Prevent cache collisions across environments
  • Version-specific caching

Example:

import { cache, evaluate } from 'promptfoo';

// Run v1 and v2 evals with separate caches
const v1Results = await cache.withCacheNamespace('v1', async () => {
  return evaluate(testSuiteV1);
});

const v2Results = await cache.withCacheNamespace('v2', async () => {
  return evaluate(testSuiteV2);
});

getCache()

Get the underlying cache instance for advanced operations.

export function getCache(): Cache;

interface Cache {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttl?: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(): Promise<void>;
}
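
A sketch of storing and reading a custom entry through this interface (the key is illustrative, and the TTL unit is an assumption; verify it against your cache backend):

import { cache } from 'promptfoo';

const store = cache.getCache();

// Store a custom value (TTL unit is an assumption — check your setup)
await store.set('my-app:baseline', { successes: 42 }, 3600);

const value = await store.get('my-app:baseline');
console.log(value); // { successes: 42 } if still cached

await store.del('my-app:baseline');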

fetchWithCache(url, options?, timeout?, format?, bust?, maxRetries?)

Fetch with caching enabled.

export async function fetchWithCache<T>(
  url: string,
  options?: RequestInit,
  timeout?: number,
  format?: 'json' | 'text',
  bust?: boolean,
  maxRetries?: number,
): Promise<FetchWithCacheResult<T>>;

Example:

import { cache } from 'promptfoo';

const result = await cache.fetchWithCache(
  'https://api.example.com/data',
  { method: 'GET' },
  undefined,
  'json',
);

console.log(result.cached); // true if from cache
console.log(result.data); // the fetched data

Configuration: Set cache location and TTL via environment variables:

# Cache directory (default: ~/.promptfoo/cache)
export PROMPTFOO_CACHE_PATH=/path/to/cache

# Cache TTL in seconds (default: 604800 = 7 days)
export PROMPTFOO_CACHE_TTL=604800

# Disable cache via env
export PROMPTFOO_CACHE_ENABLED=false

Guardrails API

Content safety and security layer. Use guardrails to detect PII, harmful content, and other safety concerns.

guardrails.guard(input)

Run general content moderation.

async function guard(input: string): Promise<GuardResult>;

Returns:

interface GuardResult {
  model: string;
  results: Array<{
    categories: Record<string, boolean>; // e.g., { hate: false, violence: true }
    category_scores: Record<string, number>; // Scores 0-1
    flagged: boolean; // Any category flagged?
  }>;
}

Example:

import { guardrails } from 'promptfoo';

const result = await guardrails.guard('This is a test message');

if (result.results[0].flagged) {
  console.log('Content flagged for safety issues:');
  Object.entries(result.results[0].categories).forEach(([name, flagged]) => {
    if (flagged) {
      console.log(`  ${name}: ${result.results[0].category_scores[name]}`);
    }
  });
}

guardrails.pii(input)

Detect personally identifiable information (PII).

async function pii(input: string): Promise<GuardResult>;

Returns: GuardResult with PII detection details

Example:

import { guardrails } from 'promptfoo';

const result = await guardrails.pii('John Doe, [email protected], SSN: 123-45-6789');

if (result.results[0].flagged) {
  console.log('PII detected:');
  if (result.results[0].payload?.pii) {
    result.results[0].payload.pii.forEach((item) => {
      console.log(`  ${item.type}: ${item.value}`);
    });
  }
}

guardrails.harm(input)

Detect harmful content (violence, hate speech, etc.).

async function harm(input: string): Promise<GuardResult>;
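
Usage mirrors guard(). A minimal sketch (the input string is illustrative):

import { guardrails } from 'promptfoo';

const result = await guardrails.harm('Some user-submitted text');

if (result.results[0].flagged) {
  console.log('Harmful content detected:', result.results[0].categories);
}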

guardrails.adaptive(request)

Run adaptive guardrails with custom configuration.

async function adaptive(request: AdaptiveRequest): Promise<AdaptiveResult>;
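
The AdaptiveRequest and AdaptiveResult shapes are not documented in this guide; the sketch below assumes the request carries the text to screen, and should be checked against the exported types:

import { guardrails } from 'promptfoo';

// The `input` field name is hypothetical — verify against the AdaptiveRequest type
const result = await guardrails.adaptive({
  input: 'User-submitted message to screen',
});
console.log(result);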

Red Team API

Adversarial testing framework for finding vulnerabilities in LLM systems.

redteam.generate(config)

Generate adversarial test cases.

async function generate(config: RedteamGenerateConfig): Promise<RedteamGenerateResult>;

See Red Teaming Guide for full documentation.
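
As a hedged sketch, the config below mirrors the red team YAML options (purpose, plugins, numTests); treat these field names as assumptions and consult the guide for the authoritative RedteamGenerateConfig shape:

import { redteam } from 'promptfoo';

// Field names mirror the redteam YAML config and are assumptions here
const result = await redteam.generate({
  purpose: 'Customer support chatbot for a retail bank',
  plugins: [{ id: 'prompt-injection' }],
  numTests: 5,
});
console.log(result);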


redteam.run(config)

Run full red team evaluation against a system.

async function run(config: RedteamRunConfig): Promise<RedteamRunResult>;

redteam.Plugins

Registry of available red team attack plugins.

interface Plugins {
  [pluginName: string]: typeof RedteamPluginBase;
}

// Example plugins
Plugins['prompt-injection']; // Injection attacks
Plugins['jailbreak']; // Jailbreak attempts
Plugins['rbac']; // RBAC bypasses
Plugins['sql-injection']; // SQL injection
// ... and many more

Creating Custom Plugins:

import { redteam } from 'promptfoo';

class MyCustomPlugin extends redteam.Base.Plugin {
  async run(params: RedteamPluginRunParams): Promise<RedteamPluginResult> {
    // Custom attack logic
    return {
      generated: ['attack1', 'attack2'],
      stats: { duration: 100 },
    };
  }
}

// Register and use
export default MyCustomPlugin;

redteam.Strategies

Registry of red team strategies for composing attacks.

interface Strategies {
  [strategyName: string]: RedteamStrategy;
}
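
Because Strategies is a plain registry object, you can enumerate the available strategy names directly:

import { redteam } from 'promptfoo';

// List the registered strategy names
console.log(Object.keys(redteam.Strategies));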

See Red Teaming Strategies for details.


redteam.Extractors

Extract system information for targeted attacks.

const {
  extractEntities, // Extract named entities from system
  extractMcpToolsInfo, // Extract MCP tool metadata
  extractSystemPurpose, // Extract system purpose and goal
} = redteam.Extractors;

Example:

import { redteam } from 'promptfoo';

const purpose = await redteam.Extractors.extractSystemPurpose({
  prompt: 'You are a helpful assistant...',
});

console.log(`System purpose: ${purpose}`);

redteam.Base.Plugin

Base class for custom red team plugins.

abstract class RedteamPluginBase {
  async run(params: {
    target: Prompt;
    injectVar?: string;
    conversationHistory?: Message[];
    options?: Record<string, unknown>;
  }): Promise<RedteamPluginResult>;
}

See the Creating Custom Plugins example under redteam.Plugins above for a usage sketch.


redteam.Base.Grader

Base class for custom red team graders.

abstract class RedteamGraderBase {
  async grade(params: { prompt: string; output: string; rubric: string }): Promise<GradingResult>;
}
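
A minimal sketch of a custom grader built on this signature; the refusal heuristic and the placeholder assertion field are illustrative:

import { redteam } from 'promptfoo';

class RefusalGrader extends redteam.Base.Grader {
  async grade({ prompt, output, rubric }) {
    // Illustrative heuristic: pass when the model refuses the adversarial request
    const refused = /i cannot|i can't|i won't/i.test(output);
    return {
      pass: refused,
      score: refused ? 1 : 0,
      reason: refused ? 'Model refused the request' : 'Model complied with the attack',
      assertion: { type: 'javascript' }, // placeholder to satisfy GradingResult
    };
  }
}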

Utilities

generateTable(evaluateTable, tableCellMaxLength?, maxRows?)

Format evaluation results as a display table.

export function generateTable(
  evaluateTable: EvaluateTable,
  tableCellMaxLength?: number,
  maxRows?: number,
): string;

Example:

import { evaluate, generateTable } from 'promptfoo';

const evalRecord = await evaluate(testSuite);
const table = await evalRecord.getTable();
console.log(generateTable(table));

isTransformFunction(value)

Check if a value is a transform function (for custom prompt/variable transforms).

export function isTransformFunction(value: unknown): value is TransformFunction;

Example:

import { isTransformFunction } from 'promptfoo';

const maybeTransform = (x) => x.toUpperCase();
console.log(isTransformFunction(maybeTransform)); // true

Advanced Patterns

Pattern 1: Programmatic Test Suite Generation

Dynamically generate test cases:

import { evaluate } from 'promptfoo';

const generateTests = (questions) => {
  return questions.map((q) => ({
    vars: { question: q },
    assert: [
      { type: 'contains', value: 'verified-fact' },
      {
        type: 'llm-rubric',
        value: 'Is the answer accurate?',
      },
    ],
  }));
};

const evalRecord = await evaluate({
  prompts: ['Answer: {{ question }}'],
  providers: ['openai:gpt-4'],
  tests: generateTests([
    'What is the capital of France?',
    'What is 2+2?',
    'What is photosynthesis?',
  ]),
});

Pattern 2: Custom Grading with Context

Use provider response data in assertions:

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
prompts: ['Analyze: {{ text }}'],
providers: ['openai:gpt-4'],
tests: [
{
vars: { text: 'Sample text' },
assert: [
{
type: 'javascript',
value: (output, context) => {
// Access token usage from provider response
const tokens = context.providerResponse?.tokenUsage?.total || 0;
return {
pass: tokens < 100,
reason: `Used ${tokens} tokens`,
};
},
},
],
},
],
});

Pattern 3: Batch Provider Testing

Load providers and test in parallel:

import { assertions, loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders(['openai:gpt-4', 'anthropic:claude-3-opus']);

const testCases = ['2+2=?', 'What is AI?'];

for (const provider of providers) {
  console.log(`Testing ${provider.id()}:`);
  for (const test of testCases) {
    const response = await provider.callApi(test);

    const result = await assertions.runAssertion({
      provider,
      assertion: { type: 'contains', value: 'answer' },
      test: { vars: { prompt: test } },
      providerResponse: response,
    });

    console.log(`  ${test}: ${result.pass ? '✓' : '✗'}`);
  }
}

Pattern 4: Cache Isolation for A/B Testing

Compare two model versions with isolated caches:

import { cache, evaluate } from 'promptfoo';

async function compareModels(oldVersion, newVersion, testSuite) {
  const oldEval = await cache.withCacheNamespace(`model-${oldVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:gpt-4-${oldVersion}`],
    }),
  );

  const newEval = await cache.withCacheNamespace(`model-${newVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:gpt-4-${newVersion}`],
    }),
  );

  const oldResults = await oldEval.toEvaluateSummary();
  const newResults = await newEval.toEvaluateSummary();

  return {
    oldPass: oldResults.stats.successes,
    newPass: newResults.stats.successes,
    improvement: newResults.stats.successes - oldResults.stats.successes,
  };
}

Type Definitions

All types are exported from promptfoo and available in TypeScript:

import type {
  // Main types
  EvaluateTestSuite,
  EvaluateOptions,
  EvaluateSummary,
  EvaluateResult,

  // Test configuration
  TestCase,
  AtomicTestCase,
  Assertion,
  AssertionSet,

  // Provider types
  ApiProvider,
  ProviderResponse,
  ProviderConfig,

  // Evaluation types
  GradingResult,
  AssertionValueFunctionContext,

  // Transform types
  TransformFunction,
  TransformContext,

  // Extension hooks
  BeforeAllExtensionHookContext,
  AfterEachExtensionHookContext,
  // ... and more
} from 'promptfoo';

API Stability

Stable APIs (Safe for production)

  • evaluate()
  • loadApiProvider(), loadApiProviders()
  • assertions.runAssertion(), assertions.runAssertions()
  • Cache namespace functions
  • Guardrails namespace

Experimental APIs (May change)

  • redteam.* (plugin/strategy system)
  • Trace data structures

Internal APIs (Don't use directly)

  • Functions not exported from main promptfoo module
  • Test utilities
  • CLI-specific functions

Troubleshooting

Cache issues?

// Clear cache if stale
import { cache } from 'promptfoo';
await cache.clearCache();

// Or disable for development
cache.disableCache();

Type errors with custom assertions?

import type { AssertionValueFunctionContext } from 'promptfoo';

// Type-safe context usage
value: (output, context: AssertionValueFunctionContext) => {
  // Full autocomplete and type checking
};

Need to debug evaluation flow?

// Enable verbose logging
process.env.LOG_LEVEL = 'debug';

// Or check trace data in assertions
const result = await assertions.runAssertion({
  // ...
  traceData: trace, // Access timing information
});

Examples Repository

Find runnable examples in the promptfoo examples directory.


See Also