# HuggingFace Datasets

Promptfoo can import test cases directly from HuggingFace datasets using the `huggingface://datasets/` prefix.

## Basic usage

To load an entire dataset:

```yaml
tests: huggingface://datasets/fka/awesome-chatgpt-prompts
```

Run the evaluation:

```sh
npx promptfoo eval
```

Each dataset row becomes a test case with all dataset fields available as variables.
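For instance, a row with `act` and `prompt` columns is equivalent to a hand-written test case like this (illustrative values; the actual rows come from the dataset):

```yaml
# What one imported row looks like as a test case (illustrative values)
tests:
  - vars:
      act: 'A Linux terminal'
      prompt: 'Respond only with terminal output.'
```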

## Dataset splits

Load specific portions of datasets using query parameters:

```yaml
# Load from the training split
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train
```

```yaml
# Load from the validation split with a custom configuration
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=validation&config=custom
```

## Use dataset fields in prompts

Dataset fields automatically become prompt variables. Here's how:

```yaml
# promptfooconfig.yaml
prompts:
  - "Question: {{question}}\nAnswer:"

tests: huggingface://datasets/rajpurkar/squad
```
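Dataset fields can also feed assertions. A sketch, assuming the SQuAD `answers` column is exposed as a template variable with its usual `text` list (the column's exact structure is a property of the dataset, not something promptfoo guarantees):

```yaml
# Assert the model output contains the first reference answer (sketch)
tests: huggingface://datasets/rajpurkar/squad

defaultTest:
  assert:
    - type: contains
      value: '{{answers.text[0]}}'
```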

## Query parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `split` | Dataset split to load (`train`/`test`/`validation`) | `test` |
| `config` | Dataset configuration name | `default` |
| `subset` | Dataset subset (for multi-subset datasets) | none |
| `limit` | Maximum number of test cases to load | unlimited |

The loader accepts any parameter supported by the HuggingFace Datasets API; parameters beyond the common ones listed above are passed through to the API unchanged.
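For example, assuming pass-through behaves as described, you could forward `offset` (a parameter of the underlying HuggingFace datasets-server rows API) to skip initial rows:

```yaml
# Skip the first 200 rows, then load 50 (offset is forwarded to the API)
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train&offset=200&limit=50
```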

To limit the number of test cases:

```yaml
tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train&limit=50
```

To load a specific subset (common with MMLU datasets):

```yaml
tests: huggingface://datasets/cais/mmlu?split=test&subset=physics&limit=10
```

## Authentication

For private datasets or increased rate limits, authenticate using your HuggingFace token. Set one of these environment variables:

```sh
# Any of these environment variables will work:
export HF_TOKEN=your_token_here
export HF_API_TOKEN=your_token_here
export HUGGING_FACE_HUB_TOKEN=your_token_here
```
:::info
Authentication is required for private datasets and gated models. For public datasets, authentication is optional but provides higher rate limits.
:::

## Implementation details

- Each dataset row becomes a test case
- All dataset fields are available as prompt variables
- Large datasets are paginated automatically (100 rows per request)
- Variable expansion is disabled to preserve the original data
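The pagination above corresponds to pages of the HuggingFace datasets-server `/rows` endpoint. A sketch of requesting one page yourself (endpoint and parameters from the public datasets-server API; the `curl` line is left commented so the snippet stays offline):

```sh
# Build the URL for one 100-row page, matching the loader's page size
dataset="fka%2Fawesome-chatgpt-prompts"   # URL-encoded owner/repo
url="https://datasets-server.huggingface.co/rows?dataset=${dataset}&config=default&split=train&offset=0&length=100"
echo "$url"
# curl -s "$url"   # fetch the JSON page; increase offset by 100 to page through
```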

## Example configurations

### Basic chatbot evaluation

```yaml
# promptfooconfig.yaml
description: Testing with HuggingFace dataset

prompts:
  - 'Act as {{act}}. {{prompt}}'

providers:
  - openai:gpt-4.1-mini

tests: huggingface://datasets/fka/awesome-chatgpt-prompts?split=train
```

### Question answering with limits

```yaml
# promptfooconfig.yaml
description: SQuAD evaluation with authentication

prompts:
  - "Question: {{question}}\nContext: {{context}}\nAnswer:"

providers:
  - openai:gpt-4.1-mini

tests: huggingface://datasets/rajpurkar/squad?split=validation&limit=100

env:
  HF_TOKEN: your_token_here
```

## Example projects

| Example | Use Case | Key Features |
| --- | --- | --- |
| Basic Setup | Simple evaluation | Default parameters |
| MMLU Comparison | Query parameters | Split, subset, limit |
| Red Team Safety | Safety testing | BeaverTails dataset |

## Troubleshooting

### Authentication errors

Ensure your HuggingFace token is set correctly: `export HF_TOKEN=your_token`

### Dataset not found

Verify the dataset path format: `owner/repo` (e.g., `rajpurkar/squad`).

### Empty results

Check that the specified split exists for the dataset. Try `split=train` if `split=test` returns no results.
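One way to check which splits a dataset actually provides is the datasets-server `/splits` endpoint (public HuggingFace API; the `curl` line is left commented so the snippet stays offline):

```sh
# Ask the datasets-server which splits this dataset provides
dataset="rajpurkar%2Fsquad"   # URL-encoded owner/repo
url="https://datasets-server.huggingface.co/splits?dataset=${dataset}"
echo "$url"
# curl -s "$url"   # returns JSON listing each config/split pair
```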

### Performance issues

Add the `limit` parameter to reduce the number of rows loaded: `&limit=100`

## See Also