How it works
Create a list of test cases
Use a representative sample of user inputs to reduce subjectivity when tuning prompts.
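As a rough illustration of what a list of test cases could look like, here is a minimal sketch in plain Python. The field names (`vars`, `expected`), the prompt template, and the `render_prompts` helper are assumptions made for this example, not the library's actual format.

```python
# Hypothetical test-case format: each case holds the variables to substitute
# into the prompt and a string the output is expected to contain.
TEST_CASES = [
    {"vars": {"text": "Hello, world"}, "expected": "bonjour"},
    {"vars": {"text": "Thank you"}, "expected": "merci"},
    {"vars": {"text": "Good night"}, "expected": "bonne nuit"},
]

# Illustrative prompt template with a single placeholder.
PROMPT_TEMPLATE = "Translate to French: {text}"


def render_prompts(template, cases):
    """Fill the template once per test case, preserving case order."""
    return [template.format(**case["vars"]) for case in cases]


if __name__ == "__main__":
    for prompt in render_prompts(PROMPT_TEMPLATE, TEST_CASES):
        print(prompt)
```

Keeping the cases as data, separate from the prompt, lets the same sample of inputs be reused for every prompt variant under comparison.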
Set up evaluation metrics
Use built-in metrics or LLM-graded evals, or define your own custom metrics.
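The three kinds of metric can be sketched as functions that map a model output to a score between 0 and 1. Everything below is hypothetical: `contains`, `max_words`, `llm_graded`, and `stub_grader` are names invented for this example, and the grader is a stand-in for a real model call.

```python
from typing import Callable

# A metric takes a model output and returns a score in [0, 1].
Metric = Callable[[str], float]


def contains(expected: str) -> Metric:
    """Simple deterministic check: case-insensitive substring match."""
    def metric(output: str) -> float:
        return 1.0 if expected.lower() in output.lower() else 0.0
    return metric


def max_words(limit: int) -> Metric:
    """Custom metric: pass only if the output has at most `limit` words."""
    def metric(output: str) -> float:
        return 1.0 if len(output.split()) <= limit else 0.0
    return metric


def llm_graded(rubric: str, grader: Callable[[str], str]) -> Metric:
    """LLM-graded metric: ask a grader model for a PASS/FAIL verdict."""
    def metric(output: str) -> float:
        verdict = grader(
            f"Rubric: {rubric}\nOutput: {output}\nAnswer PASS or FAIL."
        )
        return 1.0 if verdict.strip().upper().startswith("PASS") else 0.0
    return metric


def stub_grader(prompt: str) -> str:
    """Placeholder for a real model call (assumption for this sketch)."""
    return "PASS" if "thank" in prompt.lower() else "FAIL"
```

Giving every metric the same signature means deterministic checks, custom rules, and model-graded rubrics can be mixed freely in one evaluation run.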
Select the best prompt & model
Compare prompts and model outputs side by side, or integrate the library into your existing test/CI workflow.
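A side-by-side comparison amounts to scoring every prompt and model pair on the same test cases. The sketch below assumes two stub "models" written as plain functions and a substring pass criterion. All names here are illustrative and none of them are the library's API.

```python
from itertools import product

CASES = [
    {"text": "hello", "expected": "bonjour"},
    {"text": "thank you", "expected": "merci"},
]

PROMPTS = {
    "terse": "Translate to French: {text}",
    "polite": "Please translate the following text to French: {text}",
}

LEXICON = {"hello": "bonjour", "thank you": "merci"}


def echo_model(prompt: str) -> str:
    """Stub model that just repeats its input."""
    return prompt


def strict_model(prompt: str) -> str:
    """Stub model that only answers prompts starting with 'Please'."""
    if not prompt.startswith("Please"):
        return "I cannot help."
    return " ".join(fr for en, fr in LEXICON.items() if en in prompt.lower())


MODELS = {"echo": echo_model, "strict": strict_model}


def evaluate(prompts, models, cases):
    """Return {(prompt_name, model_name): pass_rate} over all cases."""
    results = {}
    for p_name, m_name in product(prompts, models):
        passed = 0
        for case in cases:
            output = models[m_name](prompts[p_name].format(text=case["text"]))
            passed += case["expected"] in output.lower()
        results[(p_name, m_name)] = passed / len(cases)
    return results


RESULTS = evaluate(PROMPTS, MODELS, CASES)
BEST = max(RESULTS, key=RESULTS.get)

if __name__ == "__main__":
    for (p_name, m_name), rate in sorted(RESULTS.items()):
        print(f"{p_name:8} {m_name:8} {rate:.0%}")
    print("best:", BEST)
```

In a CI job, the same results table could gate a merge, for example by asserting that the best pass rate stays above a chosen threshold.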