All Models
18 Models · 3 Providers · PII Redacted

🦙Llama Models

by Meta (Open Source)

Meta's open-weights Llama family is the most widely deployed open-source LLM series. Compare Llama API pricing across Groq, Together, and DeepInfra to find the cheapest Llama provider. Llama 4 introduced two mixture-of-experts variants, the larger vision-capable Maverick and the long-context Scout, while Llama 3.3 remains a cost-efficient workhorse for production workloads.

From $0.03/M tokens
3 providers
28+ PII entities redacted

Why deploy Llama through OpenSourceAIHub?

Automatic PII Redaction

Every Llama request is scanned for 28+ PII entity types — SSNs, credit cards, emails, API keys, and more — before it reaches any provider.
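As an illustrative sketch only (the Hub's actual detector is not public and covers 28+ entity types, far beyond what regexes alone can do), a pre-flight redaction pass amounts to scrubbing matches before the prompt leaves your network:

```python
import re

# Hypothetical, minimal redaction pass covering three of the 28+ entity
# types the Hub scans for. This only illustrates the idea of scrubbing
# prompts before any provider sees them; it is not the Hub's scanner.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    # Replace each detected entity with a labeled placeholder.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com, SSN 123-45-6789."))
```

The real pipeline runs this scan on every request automatically, so your application code never has to.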

Smart Cost Routing

Llama is available across 3 providers. Our Smart Router picks the cheapest one per-request. 25% managed markup / 0% on Pro BYOK.
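The routing decision reduces to a minimum over quoted rates. As a sketch (not the Hub's actual router), using the Llama 3.3 70B rates from the pricing table on this page:

```python
# Sketch of per-request cheapest-provider selection. Rates are USD per
# 1M tokens for Llama 3.3 70B, taken from the pricing table on this page.
RATES = {
    "Together.ai": {"input": 0.88, "output": 0.88},
    "DeepInfra": {"input": 0.13, "output": 0.39},
    "Groq": {"input": 0.59, "output": 0.79},
}

def cheapest(prompt_tokens: int, expected_output_tokens: int) -> str:
    # Estimated dollar cost of this request on a given provider's rates.
    def cost(r):
        return (prompt_tokens * r["input"]
                + expected_output_tokens * r["output"]) / 1_000_000
    return min(RATES, key=lambda p: cost(RATES[p]))

print(cheapest(2000, 500))  # DeepInfra wins at these rates
```

In practice the router also has to account for availability and per-provider model IDs (Groq serves this model as llama-3.3-70b-versatile), but the cost comparison itself is this simple.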

Zero Code Changes

Change two lines in your OpenAI SDK — base_url and api_key — and every request flows through the Hub. Full backward compatibility.

Full Observability

Per-request logging of token counts, latency, DLP violations, and cost. Never wonder what your AI spend is again.
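As a rough illustration of what that accounting involves (the helper and rates are assumptions, not the Hub's billing code), the token counts in every chat completion's usage block are enough to estimate spend client-side:

```python
# Client-side cost estimate from the usage block every chat completion
# returns. Rates are USD per 1M tokens; the Hub's own logs record the
# actual billed figures per request.
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000

# e.g. usage.prompt_tokens=1200, usage.completion_tokens=300 against
# DeepInfra's $0.03/$0.06 Meta-Llama-3-8B rate:
print(f"${estimate_cost(1200, 300, 0.03, 0.06):.8f}")
```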

Llama Strengths

  • Open-weights — full transparency and audit capability
  • Multi-provider availability (Groq, Together, DeepInfra) for cost arbitrage
  • Llama 4 Maverick: 400B MoE with vision support
  • Llama 4 Scout: 1M context window for long-document tasks
  • Fine-tuning friendly — build domain-specific models on open weights

Available Llama Models (18)

Meta Llama 3 8B Instruct Reference

oah/llama-3-8b-chat-hf
Open Source

Deploy Meta Llama 3 8B Instruct Reference with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.20/M · Output: $0.20/M

Meta Llama 3.1 405B Instruct

oah/llama-3.1
Open Source

Deploy Meta Llama 3.1 405B Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $3.50/M · Output: $3.50/M

Llama 3.2 1B

oah/llama-3.2
Open Source

Deploy Llama 3.2 1B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.06/M · Output: $0.06/M

Meta Llama 3.3 70B Instruct Turbo

oah/llama-3.3
Open Source

Deploy Meta Llama 3.3 70B Instruct Turbo with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.13/M · Output: $0.39/M

Llama 4 Maverick Instruct (17Bx128E) FP8

oah/llama-4-maverick
Open Source

Deploy Llama 4 Maverick Instruct (17Bx128E) FP8 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.15/M · Output: $0.60/M

Llama 4 Scout Instruct (17Bx16E)

oah/llama-4-scout
Open Source

Deploy Llama 4 Scout Instruct (17Bx16E) with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · Groq · DeepInfra
Input: $0.11/M · Output: $0.34/M

Meta Llama 3 8B Instruct Lite

oah/meta-llama-3-8b-instruct-lite
Open Source

Deploy Meta Llama 3 8B Instruct Lite with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.10/M · Output: $0.10/M

Meta Llama 3.1 8B Instruct Turbo

oah/meta-llama-3.1
Open Source

Deploy Meta Llama 3.1 8B Instruct Turbo with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.06/M · Output: $0.06/M

nim/nvidia/llama-3.3-nemotron-super-49b-v1

oah/nvidia/llama-3.3-nemotron-super-49b
Open Source

Deploy nim/nvidia/llama-3.3-nemotron-super-49b-v1 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free · Output: Free

llama-3.1-8b-instant

oah/llama-3.1-8b-instant
Open Source

Deploy llama-3.1-8b-instant with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Groq
Input: $0.05/M · Output: $0.08/M

llama-3.3-70b-versatile

oah/llama-3.3-70b-versatile
Open Source

Deploy llama-3.3-70b-versatile with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Groq
Input: $0.59/M · Output: $0.79/M

NousResearch/Hermes-3-Llama-3.1-405B

oah/hermes-3-llama-3.1
Open Source

Deploy NousResearch/Hermes-3-Llama-3.1-405B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.30/M · Output: $0.30/M

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

oah/deepseek-r1-distill-llama
Open Source

Deploy deepseek-ai/DeepSeek-R1-Distill-Llama-70B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Reasoning · Input: $0.20/M · Output: $0.60/M

meta-llama/Llama-3.2-11B-Vision-Instruct

oah/llama-3.2-11b-vision
Open Source

Deploy meta-llama/Llama-3.2-11B-Vision-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.05/M · Output: $0.05/M

meta-llama/Llama-Guard-4-12B

oah/llama-guard-4
Open Source

Deploy meta-llama/Llama-Guard-4-12B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.18/M · Output: $0.18/M

meta-llama/Meta-Llama-3-8B-Instruct

oah/meta-llama-3
Open Source

Deploy meta-llama/Meta-Llama-3-8B-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.03/M · Output: $0.06/M

nvidia/Llama-3.1-Nemotron-70B-Instruct

oah/llama-3.1-nemotron
Open Source

Deploy nvidia/Llama-3.1-Nemotron-70B-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.60/M · Output: $0.60/M

nvidia/Llama-3.3-Nemotron-Super-49B-v1.5

oah/llama-3.3-nemotron-super-49b
Open Source

Deploy nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.10/M · Output: $0.40/M

Llama Pricing Comparison (per 1M tokens, USD)

Input / Output pricing by provider. Managed Mode adds a 25% managed markup. Pro BYOK = 0% markup.

Model | ID | Context | Vision | Together.ai | DeepInfra | Groq
Meta Llama 3 8B Instruct Reference | oah/llama-3-8b-chat-hf | 8K | No | $0.20/$0.20 | - | -
Meta Llama 3.1 405B Instruct | oah/llama-3.1 | 4K | No | $3.50/$3.50 | - | -
Llama 3.2 1B | oah/llama-3.2 | 131K | No | $0.06/$0.06 | - | -
Meta Llama 3.3 70B Instruct Turbo | oah/llama-3.3 | 131K | No | $0.88/$0.88 | $0.13/$0.39 | -
Llama 4 Maverick Instruct (17Bx128E) FP8 | oah/llama-4-maverick | 1.0M | Yes | $0.27/$0.85 | $0.15/$0.60 | -
Llama 4 Scout Instruct (17Bx16E) | oah/llama-4-scout | 1.0M | Yes | $0.18/$0.59 | $0.15/$0.45 | $0.11/$0.34
Meta Llama 3 8B Instruct Lite | oah/meta-llama-3-8b-instruct-lite | 8K | No | $0.10/$0.10 | - | -
Meta Llama 3.1 8B Instruct Turbo | oah/meta-llama-3.1 | 131K | No | $0.18/$0.18 | $0.06/$0.06 | -
nim/nvidia/llama-3.3-nemotron-super-49b-v1 | oah/nvidia/llama-3.3-nemotron-super-49b | 16K | No | Free/Free | - | -
llama-3.1-8b-instant | oah/llama-3.1-8b-instant | 131K | No | - | - | $0.05/$0.08
llama-3.3-70b-versatile | oah/llama-3.3-70b-versatile | 131K | No | - | - | $0.59/$0.79
NousResearch/Hermes-3-Llama-3.1-405B | oah/hermes-3-llama-3.1 | - | No | - | $0.30/$0.30 | -
deepseek-ai/DeepSeek-R1-Distill-Llama-70B | oah/deepseek-r1-distill-llama | - | No | - | $0.20/$0.60 | -
meta-llama/Llama-3.2-11B-Vision-Instruct | oah/llama-3.2-11b-vision | - | Yes | - | $0.05/$0.05 | -
meta-llama/Llama-Guard-4-12B | oah/llama-guard-4 | - | No | - | $0.18/$0.18 | -
meta-llama/Meta-Llama-3-8B-Instruct | oah/meta-llama-3 | - | No | - | $0.03/$0.06 | -
nvidia/Llama-3.1-Nemotron-70B-Instruct | oah/llama-3.1-nemotron | - | No | - | $0.60/$0.60 | -
nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | oah/llama-3.3-nemotron-super-49b | - | No | - | $0.10/$0.40 | -
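The markup arithmetic behind these figures is simple: Managed Mode multiplies the provider rate by 1.25, and Pro BYOK passes it through unchanged. As a sketch (the function is illustrative, not the Hub's billing code):

```python
# Managed Mode applies the 25% markup on top of the provider rate;
# Pro BYOK passes the provider rate through at 0% markup.
def effective_rate(provider_rate_per_m: float, mode: str) -> float:
    markup = {"managed": 1.25, "byok": 1.00}[mode]
    return round(provider_rate_per_m * markup, 4)

# DeepInfra's Llama 3.3 70B input rate from the table above: $0.13/M
print(effective_rate(0.13, "managed"))  # $0.1625/M in Managed Mode
print(effective_rate(0.13, "byok"))     # $0.13/M pass-through on Pro BYOK
```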

Llama Direct vs OpenSourceAIHub

What you get at each pricing tier. Hub adds security, governance, and multi-provider routing on top of raw API access.

Mode | What You Pay | PII Redaction | Budget Caps | Routing | Audit Trail
Direct to Meta | Provider pricing only | None | None | Manual | None
Hub — Managed Mode | Provider + 25% markup | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log
Hub — Pro BYOK ($29/mo) | Direct to provider (0% markup) | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log
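One consequence of the numbers above: Pro BYOK's $29/mo fee pays for itself once the 25% Managed markup would exceed it, i.e. at just over $116/month of provider spend. A quick check:

```python
# Break-even between Pro BYOK ($29/mo subscription, 0% markup) and
# Managed Mode (25% markup): BYOK wins once 0.25 * monthly_spend > 29,
# i.e. above $116/month of provider-rate spend.
def byok_saves_money(monthly_provider_spend_usd: float) -> bool:
    return 0.25 * monthly_provider_spend_usd > 29.0

print(byok_saves_money(100))  # below $116/mo, Managed is cheaper
print(byok_saves_money(200))  # above it, Pro BYOK wins
```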

Popular Use Cases

1

Privacy-sensitive deployments requiring model auditability

2

Cost-optimized chatbots and customer support agents

3

Long-document summarization and analysis (Scout 1M context)

4

Multi-provider redundancy with automatic failover

Integration — 2 Lines

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opensourceaihub.ai/v1",
    api_key="your_hub_api_key"
)

# Use any virtual model name from the pricing table above
response = client.chat.completions.create(
    model="oah/llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": "Hello!"}]
)

Use any virtual model name from the pricing table above (prefixed with oah/). Works with the standard OpenAI SDK. Every request is PII-scanned before reaching Meta (Open Source).

Frequently Asked Questions

What is the Llama API pricing on OpenSourceAIHub?
Llama API pricing varies by provider and model variant. In Managed Mode, we add a 25% markup on top of the provider's rate. With Pro BYOK ($29/mo), you pay the provider directly at 0% markup. Our Smart Router automatically picks the cheapest available provider for each request.
Who is the cheapest Llama provider?
It depends on the model. Per the pricing table above, DeepInfra currently has the lowest Llama rates on the Hub (Meta-Llama-3-8B-Instruct from $0.03/M input; Llama 3.3 70B Turbo at $0.13/$0.39 versus $0.59/$0.79 for Groq's llama-3.3-70b-versatile), while Groq is cheapest for Llama 4 Scout. Our Smart Router compares real-time pricing across Groq, Together.ai, and DeepInfra and automatically routes each request to the cheapest option.
Llama 3 vs Llama 4 — which should I use?
Llama 4 Maverick (400B-total-parameter MoE) and Llama 4 Scout (1M context) are the latest and most capable. Llama 3.3 70B remains the best value for cost-sensitive production workloads. All versions run through the Hub with identical PII protection.
Can I use my own Llama API keys with OpenSourceAIHub?
Yes. With Pro BYOK mode, store your Groq, Together.ai, or DeepInfra keys in the Hub (AES-256 encrypted). We route requests through your account at 0% markup — you only pay the provider directly.
Does OpenSourceAIHub store my Llama prompts?
No. Prompts are processed in volatile memory (RAM) and discarded immediately. We never persist, log, or train on your content. Only metadata (token counts, latency, violation types) is stored.
What happens when a Llama model is retired?
Retired models automatically move to the 'Previous Versions' section on this page. If a replacement exists, it's shown alongside. Your API calls will return a clear error indicating the model is deprecated.

Deploy Llama with Enterprise-Grade Security

Get started with 1,000,000 free credits. Every Llama request is PII-scanned, cost-optimized, and fully logged — zero configuration.


Explore Other Model Families

Model registry last updated: . Pricing shown is the lowest available rate across providers (per 1M tokens, USD). Actual pricing depends on provider and plan.