vllm's OpenAI-compatible server gives you local inference access to the many Hugging Face Transformers models that vllm supports.

To use vllm in your eval, set the apiBaseUrl config option to the address where you're hosting vllm, including the /v1 path (for example, http://localhost:8080/v1).

Here's an example config that uses Mixtral-8x7b for text completions:

providers:
  - id: openai:completion:mistralai/Mixtral-8x7B-v0.1
    config:
      apiBaseUrl: http://localhost:8080/v1

If desired, you can set the OPENAI_BASE_URL environment variable instead of using the apiBaseUrl config option.
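
As a minimal sketch of the environment variable approach, assuming vllm is listening on port 8080 as in the example config, the variable can be exported in the shell before running the eval. The `npx promptfoo eval` invocation in the comment is an assumption about how you launch your eval; adjust it to your setup.

```shell
# Assumption: vllm's OpenAI-compatible server is reachable at this address.
export OPENAI_BASE_URL=http://localhost:8080/v1

# Print the value to confirm what the eval process will inherit.
echo "$OPENAI_BASE_URL"

# Then run the eval from the same shell, for example:
# npx promptfoo eval
```

Because the variable is exported, any process started from the same shell session inherits it, so the provider entry no longer needs its own apiBaseUrl.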