Output Formats
Save and analyze your evaluation results in various formats.
Quick Start​
# Interactive web viewer (default)
promptfoo eval
# Save as HTML report
promptfoo eval --output results.html
# Export as JSON for further processing
promptfoo eval --output results.json
# Create CSV for spreadsheet analysis
promptfoo eval --output results.csv
# Generate XML for integration with other tools
promptfoo eval --output results.xml
Available Formats​
HTML Report​
Generate a visual, shareable report:
promptfoo eval --output report.html
Features:
- Interactive table with sorting and filtering
- Side-by-side output comparison
- Pass/fail statistics
- Shareable standalone file
Use when: Presenting results to stakeholders or reviewing outputs visually.
JSON Output​
Export complete evaluation data:
promptfoo eval --output results.json
Structure:
{
  "version": 3,
  "timestamp": "2024-01-15T10:30:00Z",
  "results": {
    "prompts": [...],
    "providers": [...],
    "outputs": [...],
    "stats": {...}
  }
}
Use when: Integrating with other tools or performing custom analysis.
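For quick programmatic checks, a minimal Python sketch like the following reads the report and prints its summary block (field names follow the structure shown above; exact nesting may vary between promptfoo versions):
import json

# Load the report written by `promptfoo eval --output results.json`.
with open('results.json', encoding='utf-8') as f:
    data = json.load(f)

# Print top-level metadata and the summary statistics block,
# assuming the results -> stats layout shown above.
print('Timestamp:', data.get('timestamp'))
print('Stats:', data.get('results', {}).get('stats'))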
CSV Export​
Create spreadsheet-compatible data:
promptfoo eval --output results.csv
Columns include:
- Test variables
- Prompt used
- Model outputs
- Pass/fail status
- Latency
- Token usage
Use when: Analyzing results in Excel, Google Sheets, or data science tools.
YAML Format​
Human-readable structured data:
promptfoo eval --output results.yaml
Use when: Reviewing results in a text editor or version control.
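To inspect the file programmatically, a minimal sketch using PyYAML (an assumption; any YAML parser will do) and assuming the layout mirrors the JSON output shown earlier:
import yaml  # PyYAML: pip install pyyaml

# Load the YAML report; the layout is assumed to mirror the JSON output.
with open('results.yaml', encoding='utf-8') as f:
    results = yaml.safe_load(f)

print('Outputs:', len(results.get('results', {}).get('outputs', [])))
print('Stats:', results.get('results', {}).get('stats'))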
XML Format​
Structured data for enterprise integrations:
promptfoo eval --output results.xml
Structure:
<promptfoo>
  <evalId>abc-123-def</evalId>
  <results>
    <version>3</version>
    <timestamp>2024-01-15T10:30:00Z</timestamp>
    <prompts>...</prompts>
    <providers>...</providers>
    <outputs>...</outputs>
    <stats>...</stats>
  </results>
  <config>...</config>
  <shareableUrl>...</shareableUrl>
</promptfoo>
Use when: Integrating with enterprise systems or other tools and workflows that require XML.
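As a rough sketch, the report can be parsed with Python's standard xml.etree.ElementTree; element names follow the structure above, so adjust the paths if your version emits a different layout:
import xml.etree.ElementTree as ET

# Parse the XML report and read a few fields from the layout shown above.
root = ET.parse('results.xml').getroot()  # <promptfoo>

print('Eval ID:', root.findtext('evalId'))
print('Timestamp:', root.findtext('results/timestamp'))

# Print whatever summary children <stats> contains.
stats = root.find('results/stats')
if stats is not None:
    for child in stats:
        print(f'{child.tag}: {child.text}')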
Configuration Options​
Setting Output Path in Config​
# Specify default output file
outputPath: evaluations/latest_results.html
prompts:
- '...'
tests:
- '...'
Multiple Output Formats​
Generate multiple formats simultaneously:
# Command line
promptfoo eval --output results.html --output results.json
# Chaining separate runs also works, but each run re-executes the evaluation
promptfoo eval --output results.json && \
promptfoo eval --output results.csv
Output Contents​
Standard Fields​
All formats include:
| Field | Description |
| --- | --- |
| timestamp | When the evaluation ran |
| prompts | Prompts used in evaluation |
| providers | LLM providers tested |
| tests | Test cases with variables |
| outputs | Raw LLM responses |
| results | Pass/fail for each assertion |
| stats | Summary statistics |
Detailed Metrics​
When available, outputs include:
- Latency: Response time in milliseconds
- Token Usage: Input/output token counts
- Cost: Estimated API costs
- Error Details: Failure reasons and stack traces
Analyzing Results​
JSON Processing Example​
const fs = require('fs');
// Load results
const results = JSON.parse(fs.readFileSync('results.json', 'utf8'));
// Analyze pass rates by provider
const providerStats = {};
results.results.outputs.forEach((output) => {
  const provider = output.provider;
  if (!providerStats[provider]) {
    providerStats[provider] = { pass: 0, fail: 0 };
  }
  if (output.pass) {
    providerStats[provider].pass++;
  } else {
    providerStats[provider].fail++;
  }
});
console.log('Pass rates by provider:', providerStats);
CSV Analysis with Pandas​
import pandas as pd
# Load results
df = pd.read_csv('results.csv')
# Group by provider and calculate metrics
summary = df.groupby('provider').agg({
    'pass': 'mean',
    'latency': 'mean',
    'cost': 'sum'
})
print(summary)
Best Practices​
1. Organize Output Files​
project/
├── promptfooconfig.yaml
├── evaluations/
│ ├── 2024-01-15-baseline.html
│ ├── 2024-01-16-improved.html
│ └── comparison.json
2. Use Descriptive Filenames​
# Include date and experiment name
promptfoo eval --output "results/$(date +%Y%m%d)-gpt4-temperature-test.html"
3. Version Control Considerations​
# .gitignore
# Exclude large output files
evaluations/*.html
evaluations/*.json
evaluations/*.csv
# But keep summary reports
!evaluations/summary-*.csv
4. Automate Report Generation​
#!/bin/bash
# run_evaluation.sh
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
promptfoo eval \
  --output "reports/${TIMESTAMP}-full.json" \
  --output "reports/${TIMESTAMP}-summary.html"
Sharing Results​
Web Viewer​
The default web viewer (promptfoo view) provides:
- Real-time updates during evaluation
- Interactive exploration
- Local-only (no data sent externally)
Sharing HTML Reports​
HTML outputs are self-contained:
# Generate report
promptfoo eval --output team-review.html
# Share via email, Slack, etc.
# No external dependencies required
Promptfoo Share​
For collaborative review:
# Share results with your team
promptfoo share
Creates a shareable link with:
- Read-only access
- Commenting capabilities
- No setup required for viewers
Troubleshooting​
Large Output Files​
For extensive evaluations:
# Limit output size
outputPath: results.json
sharing:
  # Exclude raw outputs from file
  includeRawOutputs: false
Encoding Issues​
Ensure proper encoding for international content:
# Explicitly set encoding
LANG=en_US.UTF-8 promptfoo eval --output results.csv
Performance Tips​
- Use JSON for large datasets - Most efficient format
- Generate HTML for presentations - Best visual format
- Use CSV for data analysis - Easy Excel/Sheets integration
- Stream outputs for huge evaluations - Process results incrementally (see the sketch after this list)
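Following up on the last tip, here is a minimal sketch of incremental processing with pandas chunked reads; the results.csv filename, the 'pass' column, and its values are assumptions, so adjust them to match your actual header:
import pandas as pd

# Process a very large results.csv in chunks instead of loading it all at once.
# The 'pass' column name and its values are assumptions; check your CSV header.
passed = 0
total = 0
for chunk in pd.read_csv('results.csv', chunksize=10_000):
    passed += chunk['pass'].astype(str).str.lower().isin(['true', '1']).sum()
    total += len(chunk)

print(f'Pass rate: {passed / total:.1%}')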
Related Documentation​
- Configuration Reference - All output options
- Integrations - Using outputs with other tools
- Command Line Guide - CLI options