Documentation

Welcome to llmaz

llmaz (pronounced /lima:z/) aims to provide a production-ready inference platform for large language models on Kubernetes. It integrates closely with state-of-the-art inference backends to bring leading-edge research to the cloud.
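As a taste of the workflow, a minimal deployment might look like the sketch below. It registers a model and serves it through llmaz's custom resources; the exact field names, API versions, and the model used here are illustrative assumptions — see the Getting Started section for authoritative examples.

```yaml
# Illustrative sketch only: resource kinds, API versions, and fields
# are assumptions and may differ from the current llmaz API.
# Step 1: register a model so llmaz knows where to fetch its weights.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct   # hypothetical example model
---
# Step 2: serve the registered model; llmaz wires up an inference
# backend (e.g. vLLM) behind the scenes.
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
```

Applying both manifests with `kubectl apply -f` would then leave the controller to pull the model and launch the serving workload.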

High Level Overview

(Infrastructure diagram)

Architecture

(Architecture diagram)

Ready to get started?


Getting Started

This section contains tutorials for llmaz.

Features

This section covers the advanced features of llmaz.

Integrations

This section describes llmaz's integrations with other projects.

Develop Guidance

This section contains development guidance for people who want to learn more about this project.

Reference

This section contains the llmaz reference information.