
HuggingFace Datasets

Promptfoo can import test cases directly from HuggingFace datasets using the huggingface://datasets/ prefix.

Basic usage

To load an entire dataset:

tests: huggingface://datasets/fka/awesome-chatgpt-prompts

Run the evaluation:

npx promptfoo eval

Each dataset row becomes a test case with all dataset fields available as variables.
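Conceptually, the mapping is simple: each row's fields become that test case's `vars`. Here is an illustrative Python sketch of the behavior described above (this mirrors the documented semantics, not promptfoo's actual implementation):

```python
# Illustrative sketch: how dataset rows map to test cases.
# Not promptfoo's actual code -- just the documented behavior.

def rows_to_test_cases(rows):
    """Turn each dataset row into a test case whose vars are the row's fields."""
    return [{"vars": dict(row)} for row in rows]

rows = [
    {"act": "Linux Terminal", "prompt": "List the files in the current directory."},
    {"act": "Travel Guide", "prompt": "Suggest places to visit near Kyoto."},
]

test_cases = rows_to_test_cases(rows)
# Every field ("act", "prompt") is now available as a template variable.
```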

Dataset splits

Load specific portions of datasets using query parameters:

# Load from training split
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train

# Load from validation split with custom configuration
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=validation&config=custom

Use dataset fields in prompts

Dataset fields automatically become prompt variables. Here's how:

promptfooconfig.yaml
prompts:
- "Question: {{question}}\nAnswer:"

tests: huggingface://datasets/rajpurkar/squad

Query parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| split | Dataset split to load (train/test/validation) | test |
| config | Dataset configuration name | default |
| subset | Dataset subset (for multi-subset datasets) | none |
| limit | Maximum number of test cases to load | unlimited |

The loader accepts any parameter supported by the HuggingFace Datasets API; parameters beyond the common ones above are passed through to the API unchanged.
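To make the URL format concrete, the dataset path and query parameters can be separated with standard URL parsing. A minimal sketch (the helper name `parse_dataset_path` is illustrative, not part of promptfoo):

```python
from urllib.parse import parse_qs

def parse_dataset_path(tests_value):
    """Split a huggingface://datasets/ test path into (owner/repo, params).

    Illustrative helper, not part of promptfoo's API.
    """
    prefix = "huggingface://datasets/"
    if not tests_value.startswith(prefix):
        raise ValueError(f"expected {prefix} prefix")
    rest = tests_value[len(prefix):]
    path, _, query = rest.partition("?")
    # parse_qs returns lists; keep the first value for each parameter
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return path, params

path, params = parse_dataset_path(
    "huggingface://datasets/cais/mmlu?split=test&subset=physics&limit=10"
)
# path   -> "cais/mmlu"
# params -> {"split": "test", "subset": "physics", "limit": "10"}
```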

To limit the number of test cases:

tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train&limit=50

To load a specific subset (common with MMLU datasets):

tests: huggingface://datasets/cais/mmlu?split=test&subset=physics&limit=10

Authentication

For private datasets or increased rate limits, authenticate using your HuggingFace token. Set one of these environment variables:

# Any of these environment variables will work:
export HF_TOKEN=your_token_here
export HF_API_TOKEN=your_token_here
export HUGGING_FACE_HUB_TOKEN=your_token_here
Info: Authentication is required for private datasets and gated models. For public datasets, authentication is optional but provides higher rate limits.
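HuggingFace APIs accept the token as a standard bearer token. A minimal sketch of resolving the environment variables listed above into an `Authorization` header (illustrative, not promptfoo's actual code):

```python
import os

def hf_auth_headers(env=None):
    """Build an Authorization header from the first HF token variable found.

    Checks the same environment variables listed above; illustrative only.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HF_API_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name)
        if token:
            return {"Authorization": f"Bearer {token}"}
    return {}  # anonymous access: fine for public datasets, lower rate limits
```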

Implementation details

  • Each dataset row becomes a test case
  • All dataset fields are available as prompt variables
  • Large datasets are automatically paginated (100 rows per request)
  • Variable expansion is disabled to preserve original data
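The pagination behavior in the list above can be sketched as follows. The 100-rows-per-request figure comes from the implementation notes; the offset arithmetic and function name are illustrative:

```python
PAGE_SIZE = 100  # rows per request, per the implementation notes above

def page_windows(total_rows, page_size=PAGE_SIZE):
    """Yield (offset, length) pairs covering total_rows in page_size chunks."""
    for offset in range(0, total_rows, page_size):
        yield offset, min(page_size, total_rows - offset)

# A limit=250 load would issue three requests:
windows = list(page_windows(250))  # [(0, 100), (100, 100), (200, 50)]
```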

Example configurations

Basic chatbot evaluation

promptfooconfig.yaml
description: Testing with HuggingFace dataset

prompts:
- 'Act as {{act}}. {{prompt}}'

providers:
- openai:gpt-4.1-mini

tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train

Question answering with limits

promptfooconfig.yaml
description: SQuAD evaluation with authentication

prompts:
- "Question: {{question}}\nContext: {{context}}\nAnswer:"

providers:
- openai:gpt-4.1-mini

tests: huggingface://datasets/rajpurkar/squad?split=validation&limit=100

env:
  HF_TOKEN: your_token_here

Example projects

| Example | Use Case | Key Features |
|---------|----------|--------------|
| Basic Setup | Simple evaluation | Default parameters |
| MMLU Comparison | Query parameters | Split, subset, limit |
| Red Team Safety | Safety testing | BeaverTails dataset |

Troubleshooting

Authentication errors

Ensure your HuggingFace token is set correctly: export HF_TOKEN=your_token

Dataset not found

Verify the dataset path format: owner/repo (e.g., rajpurkar/squad)

Empty results

Check that the specified split exists for the dataset. Try split=train if split=test returns no results.

Performance issues

Add the limit parameter to reduce the number of rows loaded: &limit=100

See Also