Key Features
Ease of Use
Users can quickly deploy an LLM service with minimal configuration.
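As a rough sketch of how small that configuration can be, a deployment boils down to two objects: an OpenModel describing the model and a Playground describing the inference workload (a minimal example assuming the Qwen/Qwen2-0.5B-Instruct model; exact field names may vary across llmaz versions):

```yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct   # pulled from Hugging Face by default
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b                # binds this workload to the OpenModel above
```

Applying both with `kubectl apply -f` should be enough to bring up an OpenAI-compatible inference endpoint.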
Broad Backend Support
llmaz supports a wide range of advanced inference backends for different scenarios, such as vLLM, Text Generation Inference, SGLang, and llama.cpp. Find the full list of supported backends here.
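Backends are chosen per Playground. An illustrative sketch of switching the example above to SGLang (the `backendRuntimeConfig` block follows the llmaz API, but the exact field names may differ between versions):

```yaml
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b-sglang
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
  backendRuntimeConfig:
    backendName: sglang   # assumption: names the SGLang BackendRuntime; vLLM is the default when omitted
```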
Heterogeneous Cluster Support
llmaz supports serving the same LLM on different accelerator types to optimize for cost and performance.
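One place this surfaces is in the model's inference flavors, which express an ordered accelerator preference. A hedged sketch (the GKE node-selector keys and model choice are illustrative assumptions):

```yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama3-8b
spec:
  familyName: llama3
  source:
    modelHub:
      modelID: meta-llama/Meta-Llama-3-8B-Instruct
  inferenceConfig:
    flavors:
      - name: a100              # preferred: schedule on A100 nodes when available
        limits:
          nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-tesla-a100
      - name: l4                # fallback: cheaper L4 nodes otherwise
        limits:
          nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4
```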
Various Model Providers
llmaz supports a wide range of model providers, such as HuggingFace, ModelScope, and object stores. llmaz handles model loading automatically, requiring no effort from users.
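The provider is expressed in the OpenModel's `source` field. Two hedged fragments for illustration (the bucket URI is a made-up placeholder):

```yaml
# From a model hub; Hugging Face is the default, ModelScope is also supported.
source:
  modelHub:
    name: Huggingface
    modelID: Qwen/Qwen2-0.5B-Instruct
---
# From an object store, addressed by URI.
source:
  uri: oss://example-bucket.oss-ap-southeast-1.aliyuncs.com/models/qwen2-0.5b-instruct
```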
Distributed Serving
Multi-host and homogeneous xPyD distributed serving are supported via LWS from day 0; heterogeneous xPyD support is planned for the future.
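For orientation, each multi-host replica maps onto a LeaderWorkerSet group. A simplified sketch of the kind of object that ends up managing the pods (the pod spec is trimmed to the essentials; the image and sizing are assumptions):

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llama3-405b
spec:
  replicas: 1                  # one serving group
  leaderWorkerTemplate:
    size: 4                    # 1 leader + 3 workers, each on its own host
    workerTemplate:
      spec:
        containers:
          - name: vllm
            image: vllm/vllm-openai:latest
            resources:
              limits:
                nvidia.com/gpu: 8
```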
AI Gateway Support
Offers capabilities such as token-based rate limiting and model routing through integration with Envoy AI Gateway.
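Model routing in Envoy AI Gateway is keyed on the model name in the request. A heavily simplified sketch (the route and backend names are placeholders; the Gateway attachment and the AIServiceBackend object are omitted, so consult the Envoy AI Gateway docs for the full schema):

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llmaz-routes
spec:
  schema:
    name: OpenAI                    # clients speak the OpenAI API to the gateway
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model   # populated from the "model" field of the request
              value: qwen2-0.5b
      backendRefs:
        - name: qwen2-backend       # an AIServiceBackend pointing at the llmaz service
```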
Scaling Efficiency
Built-in ChatUI
Out-of-the-box chatbot support with the integration of Open WebUI, offering capabilities like function calling, RAG, web search and more; see the configurations here.
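One manual way to wire this up is to point Open WebUI's `OPENAI_API_BASE_URL` at the llmaz OpenAI-compatible endpoint. A hedged sketch (the upstream service hostname and port are assumptions for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          env:
            - name: OPENAI_API_BASE_URL   # Open WebUI's upstream OpenAI-compatible API
              value: http://qwen2-0--5b-lb.llmaz-system.svc:8080/v1   # assumption: llmaz service address
          ports:
            - containerPort: 8080
```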