Getting Started

This section contains the tutorials for llmaz.

1 - Prerequisites

This section contains the prerequisites for llmaz.

Requirements:

  • Kubernetes version >= 1.27 (you can verify your versions with the commands shown after this list).

    LWS requires Kubernetes v1.27 or higher. If you are on an older Kubernetes version and most of your workloads rely on single-node inference, LWS may be replaced with a Deployment-based fallback, in which Kubernetes Deployments manage the single-node inference workloads. See #32 for details and updates.

  • Helm 3, see installation.
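
Both prerequisites can be checked from the command line with the standard version commands:

kubectl version    # the reported server version should be v1.27 or higher
helm version       # should report a 3.x release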

Note that the llmaz helm chart will install the following by default:

  • LWS as the default inference workload in the llmaz-system namespace. If you have already installed it or want to deploy it in another namespace, append --set leaderWorkerSet.enabled=false to the command below.
  • Envoy Gateway and Envoy AI Gateway as the gateway frontend in the llmaz-system namespace. If you have already installed these two components or want to deploy them in another namespace, append --set envoy-gateway.enabled=false --set envoy-ai-gateway.enabled=false to the command below.
  • Open WebUI as the default chatbot. If you want to disable it, append --set open-webui.enabled=false to the command below.
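
For example, if LWS is already present in your cluster and you do not need the chatbot, the install command from the Installation section below would carry both flags:

helm install llmaz oci://registry-1.docker.io/inftyai/llmaz \
    --namespace llmaz-system --create-namespace --version 0.0.10 \
    --set leaderWorkerSet.enabled=false \
    --set open-webui.enabled=false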

2 - Installation

This section introduces the installation guidance for llmaz.

Install

helm install llmaz oci://registry-1.docker.io/inftyai/llmaz \
    --namespace llmaz-system --create-namespace --version 0.0.10
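
Once the release is installed, the controller (and, with the defaults described in Prerequisites, the gateway and chatbot components) should come up in the llmaz-system namespace. A quick check:

kubectl get pods -n llmaz-system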

Uninstall

helm uninstall llmaz --namespace llmaz-system
kubectl delete ns llmaz-system

If you want to delete the CRDs as well, run

kubectl delete crd \
    openmodels.llmaz.io \
    backendruntimes.inference.llmaz.io \
    playgrounds.inference.llmaz.io \
    services.inference.llmaz.io

Install from source

Change configurations

To change the default configuration, edit the values in values.global.yaml.

Do not edit values.yaml directly: it is auto-generated and will be overwritten.
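
As a minimal sketch, assuming the keys in values.global.yaml mirror the --set flags shown in the Prerequisites section (Helm maps --set a.b=c to nested YAML keys), disabling the bundled LWS and Open WebUI would look like:

# values.global.yaml (illustrative; keys assumed from the --set flags above)
leaderWorkerSet:
  enabled: false
open-webui:
  enabled: false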

Install

git clone https://github.com/inftyai/llmaz.git && cd llmaz
kubectl create ns llmaz-system && kubens llmaz-system   # kubens (from the kubectx project) switches the active namespace
make helm-install

Uninstall

helm uninstall llmaz --namespace llmaz-system
kubectl delete ns llmaz-system

If you want to delete the CRDs as well, run

kubectl delete crd \
    openmodels.llmaz.io \
    backendruntimes.inference.llmaz.io \
    playgrounds.inference.llmaz.io \
    services.inference.llmaz.io

Upgrade

Once you have changed the code, run the following command to upgrade the controller:

IMG=<image-registry>:<tag> make helm-upgrade
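
For example, with a hypothetical registry and tag (placeholders only; substitute your own image reference):

IMG=docker.io/myorg/llmaz:dev make helm-upgrade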

3 - Basic Usage

This section introduces the basic usage of llmaz.

Let’s assume that you have installed llmaz with the default settings, which means both the AI Gateway and Open WebUI are installed. Now let’s follow the steps below to chat with your models.

Deploy the Services

Run the following command to deploy two models (CPU only):

kubectl apply -f https://raw.githubusercontent.com/InftyAI/llmaz/refs/heads/main/docs/examples/envoy-ai-gateway/basic.yaml

Chat with Models

Wait until your services are ready.
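
You can watch the pods with kubectl; note that the namespace is an assumption here, since the example manifest typically creates the workloads in the default namespace:

kubectl get pods

The output generally looks like: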

NAME                                                            READY   STATUS            RESTARTS   AGE
ai-eg-route-extproc-default-envoy-ai-gateway-6ddcd49b64-ldwcd   1/1     Running           0          6m37s
qwen2--5-coder-0                                                1/1     Running           0          6m37s
qwen2-0--5b-0                                                   1/1     Running           0          6m37s

Once ready, you can access the Open WebUI by port-forwarding the service:

kubectl port-forward svc/open-webui 8080:80 -n llmaz-system

Now open http://localhost:8080 and start chatting: two models are available to you! 🎉
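
If you prefer to call the OpenAI-compatible API exposed by the AI Gateway directly instead of using the UI, a rough sketch follows. The gateway service name, port, and model ID are assumptions rather than verbatim values: locate the real service with kubectl get svc, and the model name here is inferred from the pod names above:

kubectl get svc -n llmaz-system                                             # locate the Envoy gateway service
kubectl port-forward svc/<envoy-gateway-service> 8888:80 -n llmaz-system   # name and port are placeholders
curl http://localhost:8888/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "qwen2-0.5b", "messages": [{"role": "user", "content": "Hello!"}]}'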