Llama API
!!! info "Language Support"

    This provider is only supported in Python.
Llama API is a Meta-hosted API service that lets you integrate Llama models into your applications quickly and efficiently. It provides access to Llama models through a simple API interface, with inference provided by Meta, so you can focus on building AI-powered solutions without managing your own inference infrastructure.
With Llama API, you get access to state-of-the-art AI capabilities through a developer-friendly interface designed for simplicity and performance.
Installation
Llama API is configured as an optional dependency in Strands Agents. To install, run:
```shell
pip install 'strands-agents[llamaapi]' strands-agents-tools
```

After installing `llamaapi`, you can import and initialize Strands Agents' Llama API provider as follows:
```python
from strands import Agent
from strands.models.llamaapi import LlamaAPIModel
from strands_tools import calculator

model = LlamaAPIModel(
    client_args={
        "api_key": "<KEY>",
    },
    # **model_config
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
)

agent = Agent(model=model, tools=[calculator])
response = agent("What is 2+2")
print(response)
```

Configuration
Client Configuration
The `client_args` configure the underlying LlamaAPI client. For a complete list of available arguments, please refer to the LlamaAPI docs.
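For example, a minimal sketch that reads the API key from an environment variable instead of hard-coding it (`LLAMA_API_KEY` is an assumed variable name, not one the SDK requires):

```python
import os

# Build client_args without hard-coding the API key.
# LLAMA_API_KEY is an assumed environment variable name; fall back to a
# placeholder if it is not set.
client_args = {
    "api_key": os.environ.get("LLAMA_API_KEY", "<KEY>"),
}
```

These arguments are passed through unchanged to the underlying LlamaAPI client when constructing `LlamaAPIModel`.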
Model Configuration
The `model_config` configures the underlying model selected for inference. The supported configurations are:
| Parameter | Description | Example | Options |
|---|---|---|---|
| `model_id` | ID of the model to use. | `Llama-4-Maverick-17B-128E-Instruct-FP8` | reference |
| `repetition_penalty` | Controls the likelihood of generating repetitive responses (minimum: 1, maximum: 2, default: 1). | `1` | reference |
| `temperature` | Controls randomness of the response by setting a temperature. | `0.7` | reference |
| `top_p` | Controls diversity of the response by setting a probability threshold when choosing the next token. | `0.9` | reference |
| `max_completion_tokens` | The maximum number of tokens to generate. | `4096` | reference |
| `top_k` | Only sample from the top K options for each subsequent token. | `10` | reference |
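As a sketch, the parameters above can be collected into a plain `model_config` dictionary and unpacked into `LlamaAPIModel(**model_config)`; the values here are illustrative, not recommendations:

```python
# Illustrative model_config values drawn from the table above; these are
# the keyword arguments that would be unpacked into LlamaAPIModel.
model_config = {
    "model_id": "Llama-4-Maverick-17B-128E-Instruct-FP8",
    "temperature": 0.7,             # randomness of the response
    "top_p": 0.9,                   # probability threshold for sampling
    "top_k": 10,                    # sample only from the top K tokens
    "max_completion_tokens": 4096,  # cap on generated tokens
    "repetition_penalty": 1,        # 1 disables the penalty (range 1-2)
}

# Sanity-check the documented ranges before use.
assert 1 <= model_config["repetition_penalty"] <= 2
assert 0.0 <= model_config["top_p"] <= 1.0
```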
Troubleshooting
Module Not Found
If you encounter the error `ModuleNotFoundError: No module named 'llamaapi'`, this means you haven't installed the `llamaapi` dependency in your environment. To fix, run `pip install 'strands-agents[llamaapi]'`.
Advanced Features
Structured Output
Llama API models support structured output through their tool calling capabilities. When you use `Agent.structured_output()`, the Strands SDK converts your Pydantic models to tool specifications that Llama models can understand.
```python
from pydantic import BaseModel, Field

from strands import Agent
from strands.models.llamaapi import LlamaAPIModel


class BookAnalysis(BaseModel):
    """Analyze a book's key information."""

    title: str = Field(description="The book's title")
    author: str = Field(description="The book's author")
    genre: str = Field(description="Primary genre or category")
    summary: str = Field(description="Brief summary of the book")
    rating: int = Field(description="Rating from 1-10", ge=1, le=10)


model = LlamaAPIModel(
    client_args={"api_key": "<KEY>"},
    model_id="Llama-4-Maverick-17B-128E-Instruct-FP8",
)

agent = Agent(model=model)

result = agent.structured_output(
    BookAnalysis,
    """
    Analyze this book: "The Hitchhiker's Guide to the Galaxy" by Douglas Adams.
    It's a science fiction comedy about Arthur Dent's adventures through space
    after Earth is destroyed. It's widely considered a classic of humorous sci-fi.
    """,
)

print(f"Title: {result.title}")
print(f"Author: {result.author}")
print(f"Genre: {result.genre}")
print(f"Rating: {result.rating}")
```