All Models
18 Models · 3 Providers · PII Redacted

🦙Llama Models

by Meta (Open Source)

Meta's open-weights Llama family is the most widely deployed open-source LLM series. Compare Llama API pricing across Groq, Together, and DeepInfra to find the cheapest Llama provider. Llama 4 introduced two mixture-of-experts variants, the larger vision-capable Maverick and the long-context Scout, while Llama 3.3 remains a cost-efficient workhorse for production workloads.

From $0.03/M tokens
3 providers
28+ PII entities redacted

Why deploy Llama through OpenSourceAIHub?

Automatic PII Redaction

Every Llama request is scanned for 28+ PII entity types — SSNs, credit cards, emails, API keys, and more — before it reaches any provider.
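As an illustrative sketch only (the Hub's actual detector is not public and covers 28+ entity types, far beyond what regexes alone can do), a pre-flight redaction pass amounts to scrubbing matches before the prompt leaves your network:

```python
import re

# Hypothetical, minimal redaction pass covering three of the 28+ entity
# types the Hub scans for. This only illustrates the idea of scrubbing
# prompts before any provider sees them; it is not the Hub's scanner.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact(text: str) -> str:
    # Replace each detected entity with a labeled placeholder.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com, SSN 123-45-6789."))
```

The real pipeline runs this scan on every request automatically, so your application code never has to.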

Smart Cost Routing

Llama is available across 3 providers. Our Smart Router picks the cheapest one per-request. 25% managed markup / 0% on Pro BYOK.
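The routing decision reduces to a minimum over quoted rates. As a sketch (not the Hub's actual router), using the Llama 3.3 70B rates from the pricing table on this page:

```python
# Sketch of per-request cheapest-provider selection. Rates are USD per
# 1M tokens for Llama 3.3 70B, taken from the pricing table on this page.
RATES = {
    "Together.ai": {"input": 0.88, "output": 0.88},
    "DeepInfra": {"input": 0.13, "output": 0.39},
    "Groq": {"input": 0.59, "output": 0.79},
}

def cheapest(prompt_tokens: int, expected_output_tokens: int) -> str:
    # Estimated dollar cost of this request on a given provider's rates.
    def cost(r):
        return (prompt_tokens * r["input"]
                + expected_output_tokens * r["output"]) / 1_000_000
    return min(RATES, key=lambda p: cost(RATES[p]))

print(cheapest(2000, 500))  # DeepInfra wins at these rates
```

In practice the router also has to account for availability and per-provider model IDs (Groq serves this model as llama-3.3-70b-versatile), but the cost comparison itself is this simple.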

Zero Code Changes

Change two lines in your OpenAI SDK — base_url and api_key — and every request flows through the Hub. Full backward compatibility.

Full Observability

Per-request logging of token counts, latency, DLP violations, and cost. Never wonder what your AI spend is again.
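As a rough illustration of what that accounting involves (the helper and rates are assumptions, not the Hub's billing code), the token counts in every chat completion's usage block are enough to estimate spend client-side:

```python
# Client-side cost estimate from the usage block every chat completion
# returns. Rates are USD per 1M tokens; the Hub's own logs record the
# actual billed figures per request.
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000

# e.g. usage.prompt_tokens=1200, usage.completion_tokens=300 against
# DeepInfra's $0.03/$0.06 Meta-Llama-3-8B rate:
print(f"${estimate_cost(1200, 300, 0.03, 0.06):.8f}")
```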

Llama Strengths

  • Open-weights — full transparency and audit capability
  • Multi-provider availability (Groq, Together, DeepInfra) for cost arbitrage
  • Llama 4 Maverick: 400B MoE with vision support
  • Llama 4 Scout: 1M context window for long-document tasks
  • Fine-tuning friendly — build domain-specific models on open weights

Available Llama Models (18)

Meta Llama 3 8B Instruct Reference

oah/llama-3-8b-chat-hf
Open Source

Deploy Meta Llama 3 8B Instruct Reference with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.20/M · Output: $0.20/M

Meta Llama 3.1 405B Instruct

oah/llama-3.1
Open Source

Deploy Meta Llama 3.1 405B Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $3.50/M · Output: $3.50/M

Llama 3.2 1B

oah/llama-3.2
Open Source

Deploy Llama 3.2 1B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.06/M · Output: $0.06/M

Meta Llama 3.3 70B Instruct Turbo

oah/llama-3.3
Open Source

Deploy Meta Llama 3.3 70B Instruct Turbo with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.13/M · Output: $0.39/M

Llama 4 Maverick Instruct (17Bx128E) FP8

oah/llama-4-maverick
Open Source

Deploy Llama 4 Maverick Instruct (17Bx128E) FP8 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.15/M · Output: $0.60/M

Llama 4 Scout Instruct (17Bx16E)

oah/llama-4-scout
Open Source

Deploy Llama 4 Scout Instruct (17Bx16E) with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · Groq · DeepInfra
Input: $0.11/M · Output: $0.34/M

Meta Llama 3 8B Instruct Lite

oah/meta-llama-3-8b-instruct-lite
Open Source

Deploy Meta Llama 3 8B Instruct Lite with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: $0.10/M · Output: $0.10/M

Meta Llama 3.1 8B Instruct Turbo

oah/meta-llama-3.1
Open Source

Deploy Meta Llama 3.1 8B Instruct Turbo with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai · DeepInfra
Input: $0.06/M · Output: $0.06/M

nim/nvidia/llama-3.3-nemotron-super-49b-v1

oah/nvidia/llama-3.3-nemotron-super-49b
Open Source

Deploy nim/nvidia/llama-3.3-nemotron-super-49b-v1 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Together.ai
Input: Free · Output: Free

llama-3.1-8b-instant

oah/llama-3.1-8b-instant
Open Source

Deploy llama-3.1-8b-instant with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Groq
Input: $0.05/M · Output: $0.08/M

llama-3.3-70b-versatile

oah/llama-3.3-70b-versatile
Open Source

Deploy llama-3.3-70b-versatile with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

Groq
Input: $0.59/M · Output: $0.79/M

NousResearch/Hermes-3-Llama-3.1-405B

oah/hermes-3-llama-3.1
Open Source

Deploy NousResearch/Hermes-3-Llama-3.1-405B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.30/M · Output: $0.30/M

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

oah/deepseek-r1-distill-llama
Open Source

Deploy deepseek-ai/DeepSeek-R1-Distill-Llama-70B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Reasoning · Input: $0.20/M · Output: $0.60/M

meta-llama/Llama-3.2-11B-Vision-Instruct

oah/llama-3.2-11b-vision
Open Source

Deploy meta-llama/Llama-3.2-11B-Vision-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.05/M · Output: $0.05/M

meta-llama/Llama-Guard-4-12B

oah/llama-guard-4
Open Source

Deploy meta-llama/Llama-Guard-4-12B with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.18/M · Output: $0.18/M

meta-llama/Meta-Llama-3-8B-Instruct

oah/meta-llama-3
Open Source

Deploy meta-llama/Meta-Llama-3-8B-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.03/M · Output: $0.06/M

nvidia/Llama-3.1-Nemotron-70B-Instruct

oah/llama-3.1-nemotron
Open Source

Deploy nvidia/Llama-3.1-Nemotron-70B-Instruct with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.60/M · Output: $0.60/M

nvidia/Llama-3.3-Nemotron-Super-49B-v1.5

oah/llama-3.3-nemotron-super-49b
Open Source

Deploy nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 with built-in PII redaction and Hub governance. Available on Managed Credits and BYOK.

DeepInfra
Input: $0.10/M · Output: $0.40/M

Llama Pricing Comparison (per 1M tokens, USD)

Input / Output pricing by provider. Managed Mode adds a 25% managed markup. Pro BYOK = 0% markup.

Model | ID | Context | Vision | Together.ai | DeepInfra | Groq
Meta Llama 3 8B Instruct Reference | oah/llama-3-8b-chat-hf | 8K | No | $0.20/$0.20 | - | -
Meta Llama 3.1 405B Instruct | oah/llama-3.1 | 4K | No | $3.50/$3.50 | - | -
Llama 3.2 1B | oah/llama-3.2 | 131K | No | $0.06/$0.06 | - | -
Meta Llama 3.3 70B Instruct Turbo | oah/llama-3.3 | 131K | No | $0.88/$0.88 | $0.13/$0.39 | -
Llama 4 Maverick Instruct (17Bx128E) FP8 | oah/llama-4-maverick | 1.0M | Yes | $0.27/$0.85 | $0.15/$0.60 | -
Llama 4 Scout Instruct (17Bx16E) | oah/llama-4-scout | 1.0M | Yes | $0.18/$0.59 | $0.15/$0.45 | $0.11/$0.34
Meta Llama 3 8B Instruct Lite | oah/meta-llama-3-8b-instruct-lite | 8K | No | $0.10/$0.10 | - | -
Meta Llama 3.1 8B Instruct Turbo | oah/meta-llama-3.1 | 131K | No | $0.18/$0.18 | $0.06/$0.06 | -
nim/nvidia/llama-3.3-nemotron-super-49b-v1 | oah/nvidia/llama-3.3-nemotron-super-49b | 16K | No | Free/Free | - | -
llama-3.1-8b-instant | oah/llama-3.1-8b-instant | 131K | No | - | - | $0.05/$0.08
llama-3.3-70b-versatile | oah/llama-3.3-70b-versatile | 131K | No | - | - | $0.59/$0.79
NousResearch/Hermes-3-Llama-3.1-405B | oah/hermes-3-llama-3.1 | - | No | - | $0.30/$0.30 | -
deepseek-ai/DeepSeek-R1-Distill-Llama-70B | oah/deepseek-r1-distill-llama | - | No | - | $0.20/$0.60 | -
meta-llama/Llama-3.2-11B-Vision-Instruct | oah/llama-3.2-11b-vision | - | Yes | - | $0.05/$0.05 | -
meta-llama/Llama-Guard-4-12B | oah/llama-guard-4 | - | No | - | $0.18/$0.18 | -
meta-llama/Meta-Llama-3-8B-Instruct | oah/meta-llama-3 | - | No | - | $0.03/$0.06 | -
nvidia/Llama-3.1-Nemotron-70B-Instruct | oah/llama-3.1-nemotron | - | No | - | $0.60/$0.60 | -
nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | oah/llama-3.3-nemotron-super-49b | - | No | - | $0.10/$0.40 | -
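The markup arithmetic behind these figures is simple: Managed Mode multiplies the provider rate by 1.25, and Pro BYOK passes it through unchanged. As a sketch (the function is illustrative, not the Hub's billing code):

```python
# Managed Mode applies the 25% markup on top of the provider rate;
# Pro BYOK passes the provider rate through at 0% markup.
def effective_rate(provider_rate_per_m: float, mode: str) -> float:
    markup = {"managed": 1.25, "byok": 1.00}[mode]
    return round(provider_rate_per_m * markup, 4)

# DeepInfra's Llama 3.3 70B input rate from the table above: $0.13/M
print(effective_rate(0.13, "managed"))  # $0.1625/M in Managed Mode
print(effective_rate(0.13, "byok"))     # $0.13/M pass-through on Pro BYOK
```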

Llama Direct vs OpenSourceAIHub

What you get at each pricing tier. Hub adds security, governance, and multi-provider routing on top of raw API access.

Mode | What You Pay | PII Redaction | Budget Caps | Routing | Audit Trail
Direct to Meta | Provider pricing only | None | None | Manual | None
Hub — Managed Mode | Provider + 25% markup | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log
Hub — Pro BYOK ($29/mo) | Direct to provider (0% markup) | 28+ PII types | Per-key hard caps | Smart Router | Full compliance log
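One consequence of the numbers above: Pro BYOK's $29/mo fee pays for itself once the 25% Managed markup would exceed it, i.e. at just over $116/month of provider spend. A quick check:

```python
# Break-even between Pro BYOK ($29/mo subscription, 0% markup) and
# Managed Mode (25% markup): BYOK wins once 0.25 * monthly_spend > 29,
# i.e. above $116/month of provider-rate spend.
def byok_saves_money(monthly_provider_spend_usd: float) -> bool:
    return 0.25 * monthly_provider_spend_usd > 29.0

print(byok_saves_money(100))  # below $116/mo, Managed is cheaper
print(byok_saves_money(200))  # above it, Pro BYOK wins
```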

Popular Use Cases

1

Privacy-sensitive deployments requiring model auditability

2

Cost-optimized chatbots and customer support agents

3

Long-document summarization and analysis (Scout 1M context)

4

Multi-provider redundancy with automatic failover

Integration — 2 Lines

from openai import OpenAI

client = OpenAI(
    base_url="https://api.opensourceaihub.ai/v1",
    api_key="your_hub_api_key"
)

# Use any virtual model name from the pricing table above
response = client.chat.completions.create(
    model="oah/llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": "Hello!"}]
)

Use any virtual model name from the pricing table above (prefixed with oah/). Works with the standard OpenAI SDK. Every request is PII-scanned before reaching Meta (Open Source).

Frequently Asked Questions

What is the Llama API pricing on OpenSourceAIHub?
Llama API pricing varies by provider and model variant. In Managed Mode, we add a 25% markup on top of the provider's rate. With Pro BYOK ($29/mo), you pay the provider directly at 0% markup. Our Smart Router automatically picks the cheapest available provider for each request.
Who is the cheapest Llama provider?
It depends on the model. Per the pricing table above, DeepInfra currently has the lowest Llama rates on the Hub (Meta-Llama-3-8B-Instruct from $0.03/M input; Llama 3.3 70B Turbo at $0.13/$0.39 versus $0.59/$0.79 for Groq's llama-3.3-70b-versatile), while Groq is cheapest for Llama 4 Scout. Our Smart Router compares real-time pricing across Groq, Together.ai, and DeepInfra and automatically routes each request to the cheapest option.
Llama 3 vs Llama 4 — which should I use?
Llama 4 Maverick (400B-total-parameter MoE) and Llama 4 Scout (1M context) are the latest and most capable. Llama 3.3 70B remains the best value for cost-sensitive production workloads. All versions run through the Hub with identical PII protection.
Can I use my own Llama API keys with OpenSourceAIHub?
Yes. With Pro BYOK mode, store your Groq, Together.ai, or DeepInfra keys in the Hub (AES-256 encrypted). We route requests through your account at 0% markup — you only pay the provider directly.
Does OpenSourceAIHub store my Llama prompts?
No. Prompts are processed in volatile memory (RAM) and discarded immediately. We never persist, log, or train on your content. Only metadata (token counts, latency, violation types) is stored.
What happens when a Llama model is retired?
Retired models automatically move to the 'Previous Versions' section on this page. If a replacement exists, it's shown alongside. Your API calls will return a clear error indicating the model is deprecated.

Deploy Llama with Enterprise-Grade Security

Get started with 1,000,000 free credits. Every Llama request is PII-scanned, cost-optimized, and fully logged — zero configuration.


Explore Other Model Families

Model registry last updated: . Pricing shown is the lowest available rate across providers (per 1M tokens, USD). Actual pricing depends on provider and plan.