Node Module API Reference

This guide documents the Node.js module API for advanced programmatic usage of promptfoo. It covers all publicly exported functions, namespaces, and types.

info

For standard YAML-based evaluation configuration, see Configuration Guide. This API reference is for users building programmatic evaluation workflows directly in JavaScript/TypeScript.

Overview

The promptfoo Node module provides programmatic access to:

  • Core evaluation engine (evaluate())
  • Provider management (load and resolve LLM providers)
  • Assertion execution (run assertions independently)
  • Cache management (control caching behavior)
  • Security guardrails (PII, harm, moderation checks)
  • Adversarial testing (red team evaluation)

Core API

evaluate(testSuite, options?)

The main entry point for running evaluations programmatically.

async function evaluate(testSuite: EvaluateTestSuite, options?: EvaluateOptions): Promise<Eval>;

Parameters:

  • testSuite: Configuration object containing prompts, providers, and tests
  • options: Optional evaluation settings (caching, output, concurrency, etc.)

Returns: An Eval record. Call toEvaluateSummary() when you need the serializable results summary.

Example:

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
  prompts: ['What is 2+2?'],
  providers: ['openai:gpt-4', 'anthropic:claude-3-opus'],
  tests: [
    {
      vars: { query: 'math question' },
      assert: [
        {
          type: 'contains',
          value: '4',
          metric: 'is_correct',
        },
      ],
    },
  ],
});
const results = await evalRecord.toEvaluateSummary();

console.log(`Passed: ${results.stats.successes}/${results.results.length}`);
console.log(`Shareable URL: ${evalRecord.shareableUrl}`);

Common Options:

interface EvaluateOptions {
  // Output and format
  outputPath?: string | string[];
  formatOutput?: boolean;

  // Evaluation behavior
  maxConcurrency?: number;
  nunjucksFilters?: Record<string, Function>;

  // Caching
  cache?: boolean;

  // Sharing and persistence
  sharing?: boolean;
  writeLatestResults?: boolean;

  // Progress tracking
  onTestComplete?: (result: EvaluateResult) => void;
}
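
For example, to cap concurrency and log progress as each test finishes (a minimal sketch; testSuite stands for an EvaluateTestSuite you have already built, and the logged success field is assumed from EvaluateResult):

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate(testSuite, {
  maxConcurrency: 4, // run at most four provider calls at once
  cache: false, // always hit the providers for this run
  onTestComplete: (result) => {
    // `success` is assumed to be the pass/fail flag on EvaluateResult
    console.log(`Test finished: ${result.success ? 'pass' : 'fail'}`);
  },
});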

Provider APIs

loadApiProvider(providerPath, context?)

Load a single provider instance by path or identifier.

async function loadApiProvider(
  providerPath: string,
  context?: LoadApiProviderContext,
): Promise<ApiProvider>;

Parameters:

  • providerPath: Provider identifier (e.g., 'openai:gpt-4', 'anthropic:claude-3-opus', or 'file://./custom-provider.js')
  • context: Optional context with environment overrides

Returns: Configured ApiProvider instance ready to call

Example:

import { loadApiProvider } from 'promptfoo';

const openaiProvider = await loadApiProvider('openai:gpt-4', {
  env: { OPENAI_API_KEY: process.env.MY_SECRET_KEY },
});

const response = await openaiProvider.callApi('Hello, world!');

console.log(response.output);

Supported Provider Types:

  • openai:* (GPT models)
  • anthropic:* (Claude models)
  • azure-openai:*
  • bedrock:* (AWS Bedrock)
  • vertex:* (Google Vertex AI)
  • cohere:*
  • replicate:*
  • huggingface:*
  • Custom files: file://./path/to/provider.js (see the sketch after this list)
  • Custom functions via inline JavaScript
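
A custom file:// provider is a module implementing the ApiProvider shape used throughout this guide: an id() method and a callApi() method returning { output }. A minimal sketch, with the file name and echo logic purely illustrative:

// custom-provider.js — a minimal custom provider module (illustrative)
module.exports = {
  id: () => 'my-custom-provider',
  callApi: async (prompt) => {
    // Call your own model or service here
    return { output: `Echo: ${prompt}` };
  },
};

Load it with loadApiProvider('file://./custom-provider.js'), exactly like a built-in provider identifier.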

loadApiProviders(providers, options?)

Load multiple providers in parallel.

async function loadApiProviders(
  providers: ProvidersConfig,
  options?: { env?: Record<string, string> },
): Promise<ApiProvider[]>;

Parameters:

  • providers: Array of provider identifiers or config objects
  • options: Optional environment overrides

Returns: Array of configured provider instances

Example:

import { loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders([
  'openai:gpt-4',
  'anthropic:claude-3-opus',
  {
    id: 'custom-provider',
    config: {
      model: 'my-model',
      temperature: 0.7,
    },
  },
]);

for (const provider of providers) {
  const result = await provider.callApi('Test');
  console.log(`${provider.id()}: ${result.output}`);
}

Assertions API

assertions.runAssertion(params)

Execute a single assertion against provider output. Useful for building custom evaluation logic.

async function runAssertion({
  prompt?: string;
  provider?: ApiProvider;
  assertion: Assertion;
  test: AtomicTestCase;
  vars?: Record<string, VarValue>;
  latencyMs?: number;
  providerResponse: ProviderResponse;
  traceId?: string;
  traceData?: TraceData | null;
}): Promise<GradingResult>

Parameters:

  • assertion: The assertion to run (e.g., { type: 'contains', value: 'expected' })
  • test: The test case context
  • providerResponse: The output from the provider to evaluate
  • vars: Template variables from the test
  • provider: Provider instance (optional, for context)
  • traceData: Distributed trace data for debugging (optional)

Returns:

interface GradingResult {
  pass: boolean; // Did the assertion pass?
  score?: number; // 0-1 score
  reason?: string; // Explanation
  assertion: Assertion; // The original assertion
  metric?: string; // Metric name
  error?: string; // Error message if failed
}

Example: Custom Assertion Logic

import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Custom grading logic (case-insensitive keyword check)
      const score = output.toLowerCase().includes('yes') ? 1.0 : 0.0;
      return {
        pass: score >= 0.8,
        score,
        reason: `Output contains required keyword`,
      };
    },
  },
  test: {
    vars: { question: 'Is the sky blue?' },
    assert: [], // populated with assertion
  },
  providerResponse: {
    output: 'Yes, the sky is blue in most places.',
    tokenUsage: { total: 15 },
  },
});

console.log(`Pass: ${result.pass}, Score: ${result.score}`);

Assertion Value Function Context:

interface AssertionValueFunctionContext {
  // Test and evaluation metadata
  prompt: string | undefined;
  vars: Record<string, unknown>; // Template variables
  test: AtomicTestCase; // Full test context

  // Provider info
  provider: ApiProvider | undefined;
  providerResponse: ProviderResponse | undefined;

  // Output metadata
  logProbs?: number[]; // Token log probabilities

  // Advanced: Tracing
  trace?: TraceData; // Distributed trace with spans
  config?: Record<string, any>; // Custom assertion config
}

Using Trace Data:

import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Access trace data for latency analysis
      if (context.trace?.spans) {
        const ttft = context.trace.spans.find((s) => s.name === 'time_to_first_token');
        console.log(`Time to first token: ${ttft?.duration}ms`);
      }
      return { pass: true };
    },
  },
  test: { vars: {} },
  providerResponse: { output: 'test' },
  traceId: 'trace-123',
  traceData: {
    spans: [
      {
        name: 'time_to_first_token',
        startTime: Date.now(),
        duration: 250,
      },
    ],
  },
});

assertions.runAssertions(params)

Execute multiple assertions in batch against provider output.

async function runAssertions({
  assertions: (Assertion | AssertionSet)[];
  prompt?: string;
  test: AtomicTestCase;
  provider?: ApiProvider;
  vars?: Record<string, VarValue>;
  providerResponse: ProviderResponse;
  latencyMs?: number;
  traceData?: TraceData | null;
}): Promise<AssertionsResult>

Returns:

interface GradingResult {
  pass: boolean;
  score: number; // Aggregate score across all assertions
  reason?: string;
  componentResults?: GradingResult[]; // Per-assertion results
  namedScores?: Record<string, number>;
  tokensUsed?: {
    total: number;
    prompt: number;
    completion: number;
    cached: number;
    numRequests: number;
  };
}

Example:

import { assertions } from 'promptfoo';

const result = await assertions.runAssertions({
  assertions: [
    { type: 'contains', value: '4' },
    { type: 'regex', value: '^The answer is \\d+' }, // no `$` anchor, so the trailing period still matches
    { type: 'not-regex', value: '[Ee]rror|[Ff]ailed' }, // JS regexes don't support inline (?i) flags
  ],
  test: { vars: { question: 'What is 2+2?' } },
  providerResponse: { output: 'The answer is 4.' },
});

console.log(`All passed: ${result.pass}`);
console.log(`Average score: ${result.score}`);
result.componentResults?.forEach((r) => console.log(` ${r.assertion?.type}: ${r.pass}`));

Cache API

Control promptfoo's caching layer for LLM provider calls.

enableCache()

Enable caching for provider calls (default).

export function enableCache(): void;

Example:

import { cache } from 'promptfoo';

cache.enableCache();

disableCache()

Disable caching. Useful during development/testing to always hit the provider.

export function disableCache(): void;

Example:

import { cache, evaluate } from 'promptfoo';

// Disable for fresh results
cache.disableCache();
const evalRecord = await evaluate(testSuite);
cache.enableCache();

isCacheEnabled()

Check if caching is currently enabled.

export function isCacheEnabled(): boolean;
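
For example, to restore the previous cache state after a forced-fresh run (a small sketch):

import { cache } from 'promptfoo';

const wasEnabled = cache.isCacheEnabled();
cache.disableCache();
// ... run evaluations that must hit the providers ...
if (wasEnabled) {
  cache.enableCache();
}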

clearCache()

Clear all cached results.

export async function clearCache(): Promise<void>;

Example:

import { cache, evaluate } from 'promptfoo';

// Clear old cache
await cache.clearCache();
await evaluate(testSuite); // Will refetch all provider calls

withCacheNamespace(namespace, fn)

Run evaluation with isolated cache namespace.

export function withCacheNamespace<T>(
  namespace: string | undefined,
  fn: () => Promise<T>,
): Promise<T>;

Use Cases:

  • Isolate caches for different test runs
  • Prevent cache collisions across environments
  • Version-specific caching

Example:

import { cache, evaluate } from 'promptfoo';

// Run v1 and v2 evals with separate caches
const v1Results = await cache.withCacheNamespace('v1', async () => {
  return evaluate(testSuiteV1);
});

const v2Results = await cache.withCacheNamespace('v2', async () => {
  return evaluate(testSuiteV2);
});

getCache()

Get the underlying cache instance for advanced operations.

export function getCache(): Cache;

interface Cache {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttl?: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(): Promise<void>;
}
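
A sketch of storing and reading a custom entry through this interface (the key is illustrative, and the TTL unit is an assumption; verify it against your cache backend):

import { cache } from 'promptfoo';

const store = cache.getCache();

// Store a custom value (TTL unit is an assumption — check your setup)
await store.set('my-app:baseline', { successes: 42 }, 3600);

const value = await store.get('my-app:baseline');
console.log(value); // { successes: 42 } if still cached

await store.del('my-app:baseline');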

fetchWithCache(url, options?, timeout?, format?, bust?, maxRetries?)

Fetch with caching enabled.

export async function fetchWithCache<T>(
  url: string,
  options?: RequestInit,
  timeout?: number,
  format?: 'json' | 'text',
  bust?: boolean,
  maxRetries?: number,
): Promise<FetchWithCacheResult<T>>;

Example:

import { cache } from 'promptfoo';

const result = await cache.fetchWithCache(
  'https://api.example.com/data',
  { method: 'GET' },
  undefined,
  'json',
);

console.log(result.cached); // true if from cache
console.log(result.data); // the fetched data

Configuration: Set cache location and TTL via environment variables:

# Cache directory (default: ~/.promptfoo/cache)
export PROMPTFOO_CACHE_PATH=/path/to/cache

# Cache TTL in seconds (default: 604800 = 7 days)
export PROMPTFOO_CACHE_TTL=604800

# Disable cache via env
export PROMPTFOO_CACHE_ENABLED=false

Guardrails API

Content safety and security layer. Use guardrails to detect PII, harmful content, and other safety concerns.

guardrails.guard(input)

Run general content moderation.

async function guard(input: string): Promise<GuardResult>;

Returns:

interface GuardResult {
  model: string;
  results: Array<{
    categories: Record<string, boolean>; // e.g., { hate: false, violence: true }
    category_scores: Record<string, number>; // Scores 0-1
    flagged: boolean; // Any category flagged?
  }>;
}

Example:

import { guardrails } from 'promptfoo';

const result = await guardrails.guard('This is a test message');

if (result.results[0].flagged) {
  console.log('Content flagged for safety issues:');
  Object.entries(result.results[0].categories).forEach(([name, flagged]) => {
    if (flagged) {
      console.log(`  ${name}: ${result.results[0].category_scores[name]}`);
    }
  });
}

guardrails.pii(input)

Detect personally identifiable information (PII).

async function pii(input: string): Promise<GuardResult>;

Returns: GuardResult with PII detection details

Example:

import { guardrails } from 'promptfoo';

const result = await guardrails.pii('John Doe, [email protected], SSN: 123-45-6789');

if (result.results[0].flagged) {
  console.log('PII detected:');
  if (result.results[0].payload?.pii) {
    result.results[0].payload.pii.forEach((item) => {
      console.log(`  ${item.type}: ${item.value}`);
    });
  }
}

guardrails.harm(input)

Detect harmful content (violence, hate speech, etc.).

async function harm(input: string): Promise<GuardResult>;
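
Usage mirrors guard(). A minimal sketch (the input string is illustrative):

import { guardrails } from 'promptfoo';

const result = await guardrails.harm('Some user-submitted text');

if (result.results[0].flagged) {
  console.log('Harmful content detected:', result.results[0].categories);
}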

guardrails.adaptive(request)

Run adaptive guardrails with custom configuration.

async function adaptive(request: AdaptiveRequest): Promise<AdaptiveResult>;
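
The AdaptiveRequest and AdaptiveResult shapes are not documented in this guide; the sketch below assumes the request carries the text to screen, and should be checked against the exported types:

import { guardrails } from 'promptfoo';

// The `input` field name is hypothetical — verify against the AdaptiveRequest type
const result = await guardrails.adaptive({
  input: 'User-submitted message to screen',
});
console.log(result);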

Red Team API

Adversarial testing framework for finding vulnerabilities in LLM systems.

redteam.generate(config)

Generate adversarial test cases.

async function generate(config: RedteamGenerateConfig): Promise<RedteamGenerateResult>;

See Red Teaming Guide for full documentation.
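
As a hedged sketch, the config below mirrors the red team YAML options (purpose, plugins, numTests); treat these field names as assumptions and consult the guide for the authoritative RedteamGenerateConfig shape:

import { redteam } from 'promptfoo';

// Field names mirror the redteam YAML config and are assumptions here
const result = await redteam.generate({
  purpose: 'Customer support chatbot for a retail bank',
  plugins: [{ id: 'prompt-injection' }],
  numTests: 5,
});
console.log(result);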


redteam.run(config)

Run full red team evaluation against a system.

async function run(config: RedteamRunConfig): Promise<RedteamRunResult>;

redteam.Plugins

Registry of available red team attack plugins.

interface Plugins {
  [pluginName: string]: typeof RedteamPluginBase;
}

// Example plugins
Plugins['prompt-injection']; // Injection attacks
Plugins['jailbreak']; // Jailbreak attempts
Plugins['rbac']; // RBAC bypasses
Plugins['sql-injection']; // SQL injection
// ... and many more

Creating Custom Plugins:

import { redteam } from 'promptfoo';

class MyCustomPlugin extends redteam.Base.Plugin {
  async run(params: RedteamPluginRunParams): Promise<RedteamPluginResult> {
    // Custom attack logic
    return {
      generated: ['attack1', 'attack2'],
      stats: { duration: 100 },
    };
  }
}

// Register and use
export default MyCustomPlugin;

redteam.Strategies

Registry of red team strategies for composing attacks.

interface Strategies {
  [strategyName: string]: RedteamStrategy;
}
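
Because Strategies is a plain registry object, you can enumerate the available strategy names directly:

import { redteam } from 'promptfoo';

// List the registered strategy names
console.log(Object.keys(redteam.Strategies));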

See Red Teaming Strategies for details.


redteam.Extractors

Extract system information for targeted attacks.

const {
  extractEntities, // Extract named entities from system
  extractMcpToolsInfo, // Extract MCP tool metadata
  extractSystemPurpose, // Extract system purpose and goal
} = redteam.Extractors;

Example:

import { redteam } from 'promptfoo';

const purpose = await redteam.Extractors.extractSystemPurpose({
  prompt: 'You are a helpful assistant...',
});

console.log(`System purpose: ${purpose}`);

redteam.Base.Plugin

Base class for custom red team plugins.

abstract class RedteamPluginBase {
  async run(params: {
    target: Prompt;
    injectVar?: string;
    conversationHistory?: Message[];
    options?: Record<string, unknown>;
  }): Promise<RedteamPluginResult>;
}

See the Creating Custom Plugins example under redteam.Plugins above for a usage sketch.


redteam.Base.Grader

Base class for custom red team graders.

abstract class RedteamGraderBase {
  async grade(params: { prompt: string; output: string; rubric: string }): Promise<GradingResult>;
}
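
A minimal sketch of a custom grader built on this signature; the refusal heuristic and the placeholder assertion field are illustrative:

import { redteam } from 'promptfoo';

class RefusalGrader extends redteam.Base.Grader {
  async grade({ prompt, output, rubric }) {
    // Illustrative heuristic: pass when the model refuses the adversarial request
    const refused = /i cannot|i can't|i won't/i.test(output);
    return {
      pass: refused,
      score: refused ? 1 : 0,
      reason: refused ? 'Model refused the request' : 'Model complied with the attack',
      assertion: { type: 'javascript' }, // placeholder to satisfy GradingResult
    };
  }
}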

Utilities

generateTable(evaluateTable, tableCellMaxLength?, maxRows?)

Format evaluation results as a display table.

export function generateTable(
  evaluateTable: EvaluateTable,
  tableCellMaxLength?: number,
  maxRows?: number,
): string;

Example:

import { evaluate, generateTable } from 'promptfoo';

const evalRecord = await evaluate(testSuite);
const table = await evalRecord.getTable();
console.log(generateTable(table));

isTransformFunction(value)

Check if a value is a transform function (for custom prompt/variable transforms).

export function isTransformFunction(value: unknown): value is TransformFunction;

Example:

import { isTransformFunction } from 'promptfoo';

const maybeTransform = (x) => x.toUpperCase();
console.log(isTransformFunction(maybeTransform)); // true

Advanced Patterns

Pattern 1: Programmatic Test Suite Generation

Dynamically generate test cases:

import { evaluate } from 'promptfoo';

const generateTests = (questions) => {
  return questions.map((q) => ({
    vars: { question: q },
    assert: [
      { type: 'contains', value: 'verified-fact' },
      {
        type: 'llm-rubric',
        value: 'Is the answer accurate?',
      },
    ],
  }));
};

const evalRecord = await evaluate({
  prompts: ['Answer: {{ question }}'],
  providers: ['openai:gpt-4'],
  tests: generateTests([
    'What is the capital of France?',
    'What is 2+2?',
    'What is photosynthesis?',
  ]),
});

Pattern 2: Custom Grading with Context

Use provider response data in assertions:

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
prompts: ['Analyze: {{ text }}'],
providers: ['openai:gpt-4'],
tests: [
{
vars: { text: 'Sample text' },
assert: [
{
type: 'javascript',
value: (output, context) => {
// Access token usage from provider response
const tokens = context.providerResponse?.tokenUsage?.total || 0;
return {
pass: tokens < 100,
reason: `Used ${tokens} tokens`,
};
},
},
],
},
],
});

Pattern 3: Batch Provider Testing

Load providers and test in parallel:

import { assertions, loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders(['openai:gpt-4', 'anthropic:claude-3-opus']);

const testCases = ['2+2=?', 'What is AI?'];

for (const provider of providers) {
  console.log(`Testing ${provider.id()}:`);
  for (const test of testCases) {
    const response = await provider.callApi(test);

    const result = await assertions.runAssertion({
      provider,
      assertion: { type: 'contains', value: 'answer' },
      test: { vars: { prompt: test } },
      providerResponse: response,
    });

    console.log(`  ${test}: ${result.pass ? '✓' : '✗'}`);
  }
}

Pattern 4: Cache Isolation for A/B Testing

Compare two model versions with isolated caches:

import { cache, evaluate } from 'promptfoo';

async function compareModels(oldVersion, newVersion, testSuite) {
  const oldEval = await cache.withCacheNamespace(`model-${oldVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:gpt-4-${oldVersion}`],
    }),
  );

  const newEval = await cache.withCacheNamespace(`model-${newVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:gpt-4-${newVersion}`],
    }),
  );

  const oldResults = await oldEval.toEvaluateSummary();
  const newResults = await newEval.toEvaluateSummary();

  return {
    oldPass: oldResults.stats.successes,
    newPass: newResults.stats.successes,
    improvement: newResults.stats.successes - oldResults.stats.successes,
  };
}

Type Definitions

All types are exported from promptfoo and available in TypeScript:

import type {
  // Main types
  EvaluateTestSuite,
  EvaluateOptions,
  EvaluateSummary,
  EvaluateResult,

  // Test configuration
  TestCase,
  AtomicTestCase,
  Assertion,
  AssertionSet,

  // Provider types
  ApiProvider,
  ProviderResponse,
  ProviderConfig,

  // Evaluation types
  GradingResult,
  AssertionValueFunctionContext,

  // Transform types
  TransformFunction,
  TransformContext,

  // Extension hooks
  BeforeAllExtensionHookContext,
  AfterEachExtensionHookContext,
  // ... and more
} from 'promptfoo';

API Stability

Stable APIs (Safe for production)

  • evaluate()
  • loadApiProvider(), loadApiProviders()
  • assertions.runAssertion(), assertions.runAssertions()
  • Cache namespace functions
  • Guardrails namespace

Experimental APIs (May change)

  • redteam.* (plugin/strategy system)
  • Trace data structures

Internal APIs (Don't use directly)

  • Functions not exported from main promptfoo module
  • Test utilities
  • CLI-specific functions

Troubleshooting

Cache issues?

// Clear cache if stale
import { cache } from 'promptfoo';
await cache.clearCache();

// Or disable for development
cache.disableCache();

Type errors with custom assertions?

import type { AssertionValueFunctionContext } from 'promptfoo';

// Type-safe context usage
value: (output, context: AssertionValueFunctionContext) => {
  // Full autocomplete and type checking
};

Need to debug evaluation flow?

// Enable verbose logging
process.env.LOG_LEVEL = 'debug';

// Or check trace data in assertions
const result = await assertions.runAssertion({
  // ...
  traceData: trace, // Access timing information
});

Examples Repository

Find runnable examples in the promptfoo examples directory.


See Also