
HuggingFace Datasets

Promptfoo can import test cases directly from HuggingFace datasets using the huggingface://datasets/ prefix.

Basic usage

To load an entire dataset:

tests: huggingface://datasets/fka/awesome-chatgpt-prompts

Run the evaluation:

npx promptfoo eval

Each dataset row becomes a test case with all dataset fields available as variables.
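Conceptually, the mapping is simple: each row's fields become that test case's `vars`. Here is an illustrative Python sketch of the behavior described above (this mirrors the documented semantics, not promptfoo's actual implementation):

```python
# Illustrative sketch: how dataset rows map to test cases.
# Not promptfoo's actual code -- just the documented behavior.

def rows_to_test_cases(rows):
    """Turn each dataset row into a test case whose vars are the row's fields."""
    return [{"vars": dict(row)} for row in rows]

rows = [
    {"act": "Linux Terminal", "prompt": "List the files in the current directory."},
    {"act": "Travel Guide", "prompt": "Suggest places to visit near Kyoto."},
]

test_cases = rows_to_test_cases(rows)
# Every field ("act", "prompt") is now available as a template variable.
```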

Dataset splits

Load specific portions of datasets using query parameters:

# Load from training split
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train

# Load from validation split with custom configuration
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=validation&config=custom

Use dataset fields in prompts

Dataset fields automatically become prompt variables. Here's how:

promptfooconfig.yaml
prompts:
- "Question: {{question}}\nAnswer:"

tests: huggingface://datasets/rajpurkar/squad

Query parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| split | Dataset split to load (train/test/validation) | test |
| config | Dataset configuration name | default |
| subset | Dataset subset (for multi-subset datasets) | none |
| limit | Maximum number of test cases to load | unlimited |

The loader accepts any parameter supported by the HuggingFace Datasets API; parameters beyond the common ones above are passed through to the API unchanged.
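To make the URL format concrete, the dataset path and query parameters can be separated with standard URL parsing. A minimal sketch (the helper name `parse_dataset_path` is illustrative, not part of promptfoo):

```python
from urllib.parse import parse_qs

def parse_dataset_path(tests_value):
    """Split a huggingface://datasets/ test path into (owner/repo, params).

    Illustrative helper, not part of promptfoo's API.
    """
    prefix = "huggingface://datasets/"
    if not tests_value.startswith(prefix):
        raise ValueError(f"expected {prefix} prefix")
    rest = tests_value[len(prefix):]
    path, _, query = rest.partition("?")
    # parse_qs returns lists; keep the first value for each parameter
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return path, params

path, params = parse_dataset_path(
    "huggingface://datasets/cais/mmlu?split=test&subset=physics&limit=10"
)
# path   -> "cais/mmlu"
# params -> {"split": "test", "subset": "physics", "limit": "10"}
```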

To limit the number of test cases:

tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train&limit=50

To load a specific subset (common with MMLU datasets):

tests: huggingface://datasets/cais/mmlu?split=test&subset=physics&limit=10

Authentication

For private datasets or increased rate limits, authenticate using your HuggingFace token. Set one of these environment variables:

# Any of these environment variables will work:
export HF_TOKEN=your_token_here
export HF_API_TOKEN=your_token_here
export HUGGING_FACE_HUB_TOKEN=your_token_here
Info: Authentication is required for private datasets and gated models. For public datasets, authentication is optional but provides higher rate limits.
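HuggingFace APIs accept the token as a standard bearer token. A minimal sketch of resolving the environment variables listed above into an `Authorization` header (illustrative, not promptfoo's actual code):

```python
import os

def hf_auth_headers(env=None):
    """Build an Authorization header from the first HF token variable found.

    Checks the same environment variables listed above; illustrative only.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HF_API_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name)
        if token:
            return {"Authorization": f"Bearer {token}"}
    return {}  # anonymous access: fine for public datasets, lower rate limits
```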

Implementation details

  • Each dataset row becomes a test case
  • All dataset fields are available as prompt variables
  • Large datasets are automatically paginated (100 rows per request)
  • Variable expansion is disabled to preserve original data
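The pagination behavior in the list above can be sketched as follows. The 100-rows-per-request figure comes from the implementation notes; the offset arithmetic and function name are illustrative:

```python
PAGE_SIZE = 100  # rows per request, per the implementation notes above

def page_windows(total_rows, page_size=PAGE_SIZE):
    """Yield (offset, length) pairs covering total_rows in page_size chunks."""
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)

# A limit=250 load would issue three requests:
windows = list(page_windows(250))  # [(0, 100), (100, 100), (200, 50)]
```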

Example configurations

Basic chatbot evaluation

promptfooconfig.yaml
description: Testing with HuggingFace dataset

prompts:
- 'Act as {{act}}. {{prompt}}'

providers:
- openai:gpt-4.1-mini

tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train

Question answering with limits

promptfooconfig.yaml
description: SQuAD evaluation with authentication

prompts:
- "Question: {{question}}\nContext: {{context}}\nAnswer:"

providers:
- openai:gpt-4.1-mini

tests: huggingface://datasets/rajpurkar/squad?split=validation&limit=100

env:
  HF_TOKEN: your_token_here

Example projects

| Example | Use Case | Key Features |
|---------|----------|--------------|
| Basic Setup | Simple evaluation | Default parameters |
| MMLU Comparison | Query parameters | Split, subset, limit |
| Red Team Safety | Safety testing | BeaverTails dataset |

Troubleshooting

Authentication errors

Ensure your HuggingFace token is set correctly: export HF_TOKEN=your_token

Dataset not found

Verify the dataset path format: owner/repo (e.g., rajpurkar/squad)

Empty results

Check that the specified split exists for the dataset. Try split=train if split=test returns no results.

Performance issues

Add the limit parameter to reduce the number of rows loaded: &limit=100

See Also