News

Llama Stack 0.1.3 is now available! See the release notes for more details.

Llama Stack

Llama Stack defines and standardizes the core building blocks needed to bring generative AI applications to market. It provides a unified set of APIs with implementations from leading service providers, enabling seamless transitions between development and production environments. More specifically, it provides:

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.

  • Plugin architecture to support the rich ecosystem of implementations of the different APIs in different environments like local development, on-premises, cloud, and mobile.

  • Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.

  • Multiple developer interfaces like CLI and SDKs for Python, Node, iOS, and Android.

  • Standalone applications as examples of how to build production-grade AI applications with Llama Stack.

We focus on making it easy to build production applications with the Llama model family, from the latest Llama 3.3 to specialized models like Llama Guard for safety.


Our goal is to provide pre-packaged implementations (aka “distributions”) which can be run in a variety of deployment environments. Llama Stack can support your entire app development lifecycle: start iterating locally, on mobile, or on desktop, and seamlessly transition to on-prem or public cloud deployments. At every point in this transition, the same set of APIs and the same developer experience are available.
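
Because the same client APIs are exposed in every environment, switching from local development to a hosted deployment is typically just a change of endpoint. A minimal sketch with the Python SDK (the URLs are placeholders; 8321 is the server's default port):

```python
from llama_stack_client import LlamaStackClient

# Local development: point the client at a distribution started with
# `llama stack run` (8321 is the default port; adjust to your setup).
local_client = LlamaStackClient(base_url="http://localhost:8321")

# Production: identical client code, pointed at a hosted deployment.
# The URL below is a placeholder for your own endpoint.
prod_client = LlamaStackClient(base_url="https://llama-stack.example.com")
```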

Available SDKs

We have a number of client-side SDKs available for different languages.

| Language | Client SDK                | Package             |
| -------- | ------------------------- | ------------------- |
| Python   | llama-stack-client-python | PyPI                |
| Swift    | llama-stack-client-swift  | Swift Package Index |
| Node     | llama-stack-client-node   | NPM                 |
| Kotlin   | llama-stack-client-kotlin | Maven               |
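
As an illustration of the developer experience, here is roughly what a chat completion looks like with the Python SDK. This is a hedged sketch: the model ID is an example and must match a model your running distribution actually serves.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Ask the Inference API for a chat completion. The model ID below is
# an example; list available models with `client.models.list()`.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about llamas."}],
)
print(response.completion_message.content)
```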

Supported Llama Stack Implementations

A number of “adapters” are available for popular Inference and Vector IO providers. For other APIs (particularly Safety and Agents), we provide reference implementations you can use to get started. We expect this list to grow as we gain confidence in the APIs and onboard more providers.

Inference API

| Provider           | Environments           |
| ------------------ | ---------------------- |
| Meta Reference     | Single Node            |
| Ollama             | Single Node            |
| Fireworks          | Hosted                 |
| Together           | Hosted                 |
| NVIDIA NIM         | Hosted and Single Node |
| vLLM               | Hosted and Single Node |
| TGI                | Hosted and Single Node |
| AWS Bedrock        | Hosted                 |
| Cerebras           | Hosted                 |
| Groq               | Hosted                 |
| SambaNova          | Hosted                 |
| PyTorch ExecuTorch | On-device iOS, Android |
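
Whichever provider backs a distribution, the client sees the same Inference API. As a small sketch (assuming a server on the default local port), the same discovery call works whether the stack is backed by Ollama, vLLM, Fireworks, or any other provider above:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Enumerate the models the running distribution serves; the provider
# behind them (Ollama, vLLM, a hosted API, ...) is abstracted away.
for model in client.models.list():
    print(model.identifier)
```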

Vector IO API

| Provider            | Environments           |
| ------------------- | ---------------------- |
| FAISS               | Single Node            |
| Chroma              | Hosted and Single Node |
| Postgres (PGVector) | Hosted and Single Node |
| Weaviate            | Hosted                 |
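
To give a flavor of the Vector IO API, below is a hedged sketch of registering a vector database, then inserting and querying chunks with the Python SDK. The provider ID, embedding model, and chunk shape are assumptions that depend on how your distribution is configured.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a vector DB backed by one of the providers above; the
# provider_id and embedding model here are placeholders.
client.vector_dbs.register(
    vector_db_id="docs",
    provider_id="faiss",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Insert a chunk, then run a similarity query against it.
client.vector_io.insert(
    vector_db_id="docs",
    chunks=[{
        "content": "Llama Stack standardizes the building blocks of GenAI apps.",
        "metadata": {"document_id": "readme"},
    }],
)
result = client.vector_io.query(vector_db_id="docs", query="What is Llama Stack?")
```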

Safety API

| Provider     | Environments                  |
| ------------ | ----------------------------- |
| Llama Guard  | Depends on Inference Provider |
| Prompt Guard | Single Node                   |
| Code Scanner | Single Node                   |
| AWS Bedrock  | Hosted                        |
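
As a hedged sketch of how the Safety API is used from the client side, the snippet below runs a shield over a user message before it reaches inference. The shield ID is an example and must match a shield registered with your distribution.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Screen user input with a safety shield; "meta-llama/Llama-Guard-3-8B"
# is an example shield ID.
result = client.safety.run_shield(
    shield_id="meta-llama/Llama-Guard-3-8B",
    messages=[{"role": "user", "content": "How do I bake a cake?"}],
    params={},
)
if result.violation:
    print("Blocked:", result.violation.user_message)
```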