Envoy AI Gateway
2 minute read
Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services.
How to use
1. Enable Envoy Gateway and Envoy AI Gateway
Both of them are enabled by default in values.global.yaml
and will be deployed in llmaz-system.
envoy-gateway:
enabled: true
envoy-ai-gateway:
enabled: true
However, Envoy Gateway and Envoy AI Gateway can be deployed standalone in case you want to deploy them in other namespaces.
2. Basic AI Gateway Example
To expose your models via Envoy Gateway, you need to create a GatewayClass, Gateway, and AIGatewayRoute. The following example shows how to do this.
We’ll deploy two models Qwen/Qwen2-0.5B-Instruct-GGUF
and Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF
with llama.cpp (cpu only) and expose them via Envoy AI Gateway.
The full example is here, apply it.
3. Check Envoy AI Gateway APIs
If Open-WebUI is enabled, you can chat via the webui (recommended), see documentation. Otherwise, following the steps below to test the Envoy AI Gateway APIs.
I. Port-forwarding the LoadBalancer
service in llmaz-system, like:
kubectl port-forward svc/envoy-default-default-envoy-ai-gateway-dbec795a 8080:80
II. Query http://localhost:8008/v1/models | jq .
, available models will be listed. Expected response will look like this:
{
"data": [
{
"id": "qwen2-0.5b",
"created": 1745327294,
"object": "model",
"owned_by": "Envoy AI Gateway"
},
{
"id": "qwen2.5-coder",
"created": 1745327294,
"object": "model",
"owned_by": "Envoy AI Gateway"
}
],
"object": "list"
}
III. Query http://localhost:8080/v1/chat/completions
to chat with the model. Here, we ask the qwen2-0.5b
model, the query will look like:
curl -H "Content-Type: application/json" -d '{
"model": "qwen2-0.5b",
"messages": [
{
"role": "system",
"content": "Hi."
}
]
}' http://localhost:8080/v1/chat/completions | jq .
Expected response will look like this:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
}
}
],
"created": 1745327371,
"model": "qwen2-0.5b",
"system_fingerprint": "b5124-bc091a4d",
"object": "chat.completion",
"usage": {
"completion_tokens": 10,
"prompt_tokens": 10,
"total_tokens": 20
},
"id": "chatcmpl-AODlT8xnf4OjJwpQH31XD4yehHLnurr0",
"timings": {
"prompt_n": 1,
"prompt_ms": 319.876,
"prompt_per_token_ms": 319.876,
"prompt_per_second": 3.1262114069201816,
"predicted_n": 10,
"predicted_ms": 1309.393,
"predicted_per_token_ms": 130.9393,
"predicted_per_second": 7.63712651587415
}
}
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.