Context relevance

Measures the fraction of the retrieved context that is strictly required to answer the query.

Use when: You want to check if your retrieval is returning too much irrelevant content.

How it works: Extracts only the sentences absolutely required to answer the query. Score = required sentences / total sentences.

warning

This metric identifies the MINIMUM content needed, not all relevant content. A low score can mean either good retrieval (the answer was found along with useful supporting context) or bad retrieval (the context is mostly irrelevant).

Example:

```
Query: "What is the capital of France?"
Context: "Paris is the capital. France has great wine. The Eiffel Tower is in Paris."
Score: 0.33 (only the first sentence is required)
```
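The score arithmetic above can be sketched in a few lines. This is a simplified illustration, not the metric's actual implementation: it assumes an LLM judge has already decided which sentences are required (the `required` set below is that assumed judgment, hard-coded here).

```python
# Sketch of the scoring arithmetic: score = required sentences / total sentences.
# The "required" labels are a hypothetical judge output, hard-coded for illustration.
context = [
    "Paris is the capital.",
    "France has great wine.",
    "The Eiffel Tower is in Paris.",
]
required = {"Paris is the capital."}  # assumed judge verdict: only this is essential

score = len(required) / len(context)
print(round(score, 2))  # 0.33
```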

Configuration

```yaml
assert:
  - type: context-relevance
    threshold: 0.3 # At least 30% should be essential
```

Required fields

  • query - User's question (in test vars)
  • context - Retrieved text (in vars or via contextTransform)
  • threshold - Minimum score 0-1 (default: 0)

Full example

```yaml
tests:
  - vars:
      query: 'What is the capital of France?'
      context: 'Paris is the capital of France.'
    assert:
      - type: context-relevance
        threshold: 0.8 # Most content should be essential
```

Array context

Context can be provided as an array of chunks:

```yaml
tests:
  - vars:
      query: 'What are the benefits of RAG systems?'
      context:
        - 'RAG systems improve factual accuracy by incorporating external knowledge sources.'
        - 'They reduce hallucinations in large language models through grounded responses.'
        - 'RAG enables up-to-date information retrieval beyond training data cutoffs.'
        - 'The weather forecast shows rain this weekend.' # irrelevant chunk
    assert:
      - type: context-relevance
        threshold: 0.5 # Score: 3/4 = 0.75
```

Dynamic context extraction

For RAG systems that return context with their response:

```yaml
# Provider returns { answer: "...", context: "..." }
assert:
  - type: context-relevance
    contextTransform: 'output.context' # Extract context field
    threshold: 0.3
```

contextTransform can also return an array:

```yaml
assert:
  - type: context-relevance
    contextTransform: 'output.chunks' # Extract chunks array
    threshold: 0.5
```

Score interpretation

  • 0.8-1.0: Almost all content is essential (very focused or minimal retrieval)
  • 0.3-0.7: Mixed essential and supporting content (often ideal)
  • 0.0-0.3: Mostly non-essential content (may indicate poor retrieval)

Limitations

  • Only identifies the minimum sufficient content, not all relevant content
  • Single context strings are split by lines (use arrays for better accuracy)
  • Score interpretation varies by use case
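The line-splitting limitation is easy to see with a sketch. Assuming a naive newline split (an assumption mirroring the limitation above, not the metric's exact splitter), two sentences on the same line end up fused into a single chunk, which skews the denominator of the score:

```python
# Assumption: single context strings are split naively on newlines.
# Two sentences on one line become a single chunk.
blob = "Paris is the capital. France has great wine.\nThe Eiffel Tower is in Paris."
chunks_from_string = [line for line in blob.split("\n") if line.strip()]
print(len(chunks_from_string))  # 2 chunks, even though there are 3 sentences
```

Passing the three sentences as a YAML array avoids this, since each array element is treated as its own chunk.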

Further reading