Documentation

Welcome to llmaz

llmaz (pronounced /lima:z/) aims to provide a production-ready inference platform for large language models on Kubernetes. It integrates closely with state-of-the-art inference backends to bring leading-edge research to the cloud.
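As a taste of the workflow, a minimal deployment might look like the sketch below. It registers a model and serves it through llmaz's custom resources; the exact field names, API versions, and the model used here are illustrative assumptions — see the Getting Started section for authoritative examples.

```yaml
# Illustrative sketch only: resource kinds, API versions, and fields
# are assumptions and may differ from the current llmaz API.
# Step 1: register a model so llmaz knows where to fetch its weights.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct   # hypothetical example model
---
# Step 2: serve the registered model; llmaz wires up an inference
# backend (e.g. vLLM) behind the scenes.
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
```

Applying both manifests with `kubectl apply -f` would then leave the controller to pull the model and launch the serving workload.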

High Level Overview

(Infrastructure diagram)

Architecture

(Architecture diagram)

Ready to get started?


Getting Started

This section contains tutorials for llmaz.

Features

This section covers the advanced features of llmaz.

Integrations

This section describes llmaz's integrations with other projects.

Develop Guidance

This section contains development guidance for people who want to learn more about this project.

Reference

This section contains the llmaz reference information.