OpenSourceAIHub Documentation — API Reference, Guides & Model Catalog
What is OpenSourceAIHub?
A plain-language introduction — no AI background required.
OpenSourceAIHub is a secure gateway that sits between your application and AI models like Llama, Mistral, Claude, GPT-4o, Gemini, Grok, and others. Think of it as a security checkpoint for every AI request your software makes.
When your application asks an AI model a question (called a “prompt”), that prompt might accidentally contain sensitive data — a customer's email address, a credit card number, an API key, or even a social security number. Without protection, that data gets sent directly to a third-party AI company's servers.
OpenSourceAIHub catches that sensitive data before it ever leaves your control. It scans every request, redacts or blocks the sensitive parts, then forwards the cleaned request to the AI model. Your users get their AI-powered features. Your company stays compliant and secure.
What the Hub Does (In One Sentence Each)
How It Works (The Big Picture)
Core Concepts
Key terms you'll see throughout this documentation, explained simply.
AI Providers (Groq, Together.ai, DeepInfra, Mistral AI, Anthropic, OpenAI, Google Gemini, xAI)
These are the 8 companies that host and run AI models on their servers. Think of them like different cell phone carriers — they all offer similar services (access to AI models) but at different prices and speeds. Groq is known for speed, Together.ai for variety, DeepInfra for cost, Mistral AI for their own models, Anthropic powers Claude, OpenAI powers GPT-4.1, Google Gemini offers Gemini models, and xAI provides Grok. You don't need accounts with any of them to use OpenSourceAIHub — unless you choose BYOK mode.
BYOK (Bring Your Own Key)
"Bring Your Own Key" means you create your own account with an AI provider (like OpenAI, Google Gemini, xAI, Groq, or Together.ai), get an API key from them, and store it in OpenSourceAIHub. When you send a request, the Hub uses YOUR key to talk to the provider. The provider bills you directly. The Hub adds the security layer for free. This is ideal if you already have provider accounts or want maximum control over costs. BYOK is also the only way to access premium closed-source models like GPT-4o, Gemini 1.5 Pro, and Grok-2.
Managed Mode (Wallet / Hub Credits)
Don't want to sign up with AI providers? No problem. In Managed Mode, you add credits to your OpenSourceAIHub wallet ($1 = 1,000,000 Hub Credits). Different models burn credits at different rates per actual token. The Hub uses its own provider keys on your behalf. The Hub charges wholesale cost plus a service fee — 25% markup for open-source models and 30% markup for closed-source models (OpenAI, Gemini, xAI) — deducted from your wallet automatically. This is the easiest way to get started — just top up and go.
Smart Routing
When you use Managed Mode, the Hub doesn't just send your request to one provider — it periodically indexes prices across all supported providers and selects the most cost-efficient route for your request, ensuring high performance at near-wholesale rates. This is best-effort optimization, not a guarantee of the absolute lowest price on every request. You don't need to do anything — it just works.
DLP (Data Loss Prevention) / AI Firewall
DLP is the security layer that scans every request for sensitive data before it reaches the AI model. It can detect 28+ types of sensitive information — from email addresses and credit cards to API keys and social security numbers. When it finds something, it either replaces the data with [REDACTED] (redact mode) or blocks the entire request (block mode), depending on your policy settings.
Wallet
Your prepaid credit balance on OpenSourceAIHub. You add credits via Stripe ($1 = 1,000,000 Hub Credits), and usage is deducted automatically each time you make an AI request in Managed Mode. Think of it like a prepaid balance for AI usage. 1,000,000 Hub Credits can cover thousands of simple AI requests.
Hub Credits
Hub Credits are the abstract currency for your wallet balance. $1.00 = 1,000,000 Hub Credits. Different AI models consume credits at different rates per actual LLM token — lightweight models like Llama 3 8B burn fewer credits per token, while premium models like GPT-4o burn more. This means your credit pool stretches further with efficient models.
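As a back-of-the-envelope illustration, the conversion works like this (a sketch assuming the catalog's Input/Output prices are dollars per million tokens; the function names are ours, not part of the API):

```javascript
// Documented rate: $1.00 = 1,000,000 Hub Credits.
const CREDITS_PER_DOLLAR = 1_000_000;

function dollarsToCredits(dollars) {
  return Math.round(dollars * CREDITS_PER_DOLLAR);
}

// Hypothetical estimator: assumes catalog prices are $ per 1M tokens.
function estimateCredits(inputTokens, outputTokens, inputPerM, outputPerM) {
  const dollars =
    (inputTokens / 1_000_000) * inputPerM +
    (outputTokens / 1_000_000) * outputPerM;
  return dollarsToCredits(dollars);
}

// 200 input + 300 output tokens on oah/llama-3-70b ($0.35 in / $0.40 out):
const cost = estimateCredits(200, 300, 0.35, 0.40); // → 190 credits
```

This is why efficient models stretch a wallet further: under the same assumptions, the same exchange on oah/llama-4-scout ($0.11 / $0.34) would burn roughly 124 credits.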
API Key
A secret string (like a password) that identifies who you are when making requests. OpenSourceAIHub gives you keys that start with "os_hub_" (user-level) or "oah_" (project-level). You include this key in every request so the Hub knows it's you. Never share your API keys publicly.
Which Mode Should I Use?
Solution Guides
Deep-dive tutorials for specific use cases.
Quickstart
Start sending secure, cost-optimized AI requests in under 2 minutes.
Before You Start
1. Create an account at app.opensourceaihub.ai using Google or GitHub sign-in.
2. Get your API key — go to the API Keys page in your dashboard and click “Generate New Key”. Copy and save it immediately (it's shown only once).
3. Choose your mode — either top up your wallet (Managed Mode, easiest) or store your own provider API keys (BYOK Mode, free). See “Which Mode Should I Use?” if you're unsure.
1. Install the OpenAI SDK
npm install openai

OpenSourceAIHub is OpenAI-compatible. Use the standard SDK — just change the baseURL.
2. Make Your First Request
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "os_hub_your_key_here",
baseURL: "https://api.opensourceaihub.ai/v1",
});
const response = await client.chat.completions.create({
model: "oah/llama-3-70b", // virtual model → smart-routed
messages: [
{ role: "user", content: "Explain the CAP theorem." }
],
max_tokens: 512,
});
console.log(response.choices[0].message.content);
// Custom headers are returned:
// x-request-id: req_xxxx
// x-hub-latency: hub overhead ms
// x-dlp-latency: DLP scan ms

How Routing Works
Managed Mode (Wallet)
The oah/ prefix activates Smart Routing. The Hub indexes pricing across Groq, Together.ai, DeepInfra, Mistral AI, Anthropic, OpenAI, Google Gemini, and xAI — then selects the most cost-efficient available provider for your request on a best-effort basis. Text models are billed per token; image generation is billed per event across three tiers: Performance (~3,750 credits), Standard (~50,000 credits), and Premium (~100,000 credits). All costs include the wholesale rate plus a Governance & Infrastructure fee (25% open-weight / 30% closed models) and are deducted from your prepaid wallet.
BYOK Mode (Bring Your Own Key)
When your project has a BYOK provider key configured, requests are routed directly to that provider using your own credentials. There is no cost-optimized routing — the Hub uses the provider you configured. You are billed directly by the provider. The full DLP / Firewall layer still applies to every request.
What Happens Behind the Scenes
Every time you send a request to the Hub, here's exactly what happens in order. The entire process typically takes under 100ms of Hub overhead:
1. Authenticate: The Hub verifies your API key. It checks if the key is valid, not expired, and not revoked.
2. Resolve Provider: Based on your mode (BYOK or Managed), the Hub determines which AI provider and credentials to use. In Managed Mode, the Smart Router selects the most cost-efficient available option.
3. Scan for Sensitive Data: The AI Firewall scans every message in your request for PII, credentials, and injection patterns. If your request contains images, those are OCR-scanned too.
4. Enforce Policy: If sensitive data is found, the Hub either replaces it with [REDACTED] or blocks the entire request — depending on your DLP policy settings.
5. Check Balance: In Managed Mode, the Hub estimates how much this request will cost and verifies your wallet can cover it. If not, you get a 402 error with a clear message.
6. Forward to AI: The cleaned, safe request is forwarded to the AI provider via the standard API. The Hub never stores your prompts or responses.
7. Log Metadata: Only metadata is recorded — token counts, cost, latency, entity types detected. Never the actual content of your prompts or responses.
8. Return Response: The AI provider's response is returned to your application with extra headers showing Hub latency and a correlation ID for debugging.
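The status codes a client can expect from these steps can be summarized in a small handler (a sketch: the 400, 402, and 413 behaviors are documented in this guide, while mapping 401 to an authentication failure is our assumption based on the Authenticate step):

```javascript
// Illustrative mapping from Hub status codes to a client-side reaction.
function hubErrorHint(status) {
  switch (status) {
    case 400: return "blocked_by_dlp_policy";    // firewall rejected the request
    case 401: return "check_api_key";            // invalid/expired/revoked key (assumed)
    case 402: return "top_up_wallet";            // wallet cannot cover estimated cost
    case 413: return "shrink_image_payload";     // image over the 5 MB limit
    default:  return "retry_or_contact_support"; // include x-request-id when reporting
  }
}
```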
The Hub is Stateless — Your App Manages Conversation Memory
This is an important concept to understand: the Hub does not store any conversation history. Every request you send is completely independent. The Hub processes it, scans it, routes it, and immediately forgets it.
So how does a chatbot “remember” earlier messages? That's your app's job. If a user is on their 10th message in a conversation, your application sends all 10 previous messages in the messages array every single time. This is exactly how the OpenAI API works, and the Hub follows the same pattern.
Example: A 3-message conversation
// On the user's 3rd message, your app sends ALL prior messages:
messages: [
{ role: "user", content: "What is DLP?" }, // Message 1
{ role: "assistant", content: "DLP stands for..." }, // AI reply 1
{ role: "user", content: "How does it work?" }, // Message 2
{ role: "assistant", content: "It scans every..." }, // AI reply 2
{ role: "user", content: "Show me an example." } // Message 3 (new)
]
// The Hub scans ALL 5 messages for PII, then forwards to the AI.

What this means for security: The Hub scans the entire messages array on every request — not just the newest message. Even if a user accidentally pastes a credit card number in message #2, the Hub catches it when message #5 is sent because message #2 is still in the array.
What this means for billing: In Managed Mode, you are billed for all tokens in the messages array, including the repeated history. Longer conversations cost more because they send more tokens per request. This is how all LLM APIs work — it's not specific to the Hub.
What this means for privacy: The Hub never stores your messages. Once the response is returned, the conversation data is gone from our systems. Only metadata (token count, cost, latency, entity types detected) is logged. Your conversation content is never written to disk or retained.
Supported & Unsupported Endpoints
The Hub implements the OpenAI /v1/chat/completions endpoint — the industry standard used by virtually all LLM applications. Some other OpenAI endpoints are not supported by design:
If your application currently uses the Threads API, you can migrate by managing the messages[] array in your own code (or database) and sending the full conversation with each request. This gives you the same “memory” behavior while keeping the security benefits of our stateless proxy.
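A minimal sketch of that pattern, in which your code owns the messages[] array and resends it in full on every turn (all names here are illustrative):

```javascript
// Sketch of app-side conversation memory. The Hub is stateless, so the
// app accumulates the history and sends the whole array each request.
function createConversation() {
  const messages = [];
  return {
    addUser(content) {
      messages.push({ role: "user", content });
      return [...messages]; // the array to send as `messages`
    },
    addAssistant(content) {
      messages.push({ role: "assistant", content });
    },
  };
}

const convo = createConversation();
convo.addUser("What is DLP?");
convo.addAssistant("DLP stands for...");
const payload = convo.addUser("How does it work?");
// payload now holds all 3 messages; every one is scanned again by the Hub
```

Persisting the same array to a database instead of memory gives you durable threads without changing the request shape.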
Response Headers
Every response includes these custom headers for full transparency:
| Header | Description |
|---|---|
| x-hub-latency | Hub overhead latency in ms (excludes upstream provider time) |
| x-dlp-latency | DLP / Firewall scan latency (ms) |
| x-request-id | Unique request ID for tracing and support (req_xxxx) |

Authentication
All requests require a Bearer token. Two key types are supported.
How API Keys Work
Every request to OpenSourceAIHub must include an API key in the Authorization header. This is how the Hub knows who you are, which security policies to apply, and how to bill you. API keys are like passwords for your AI requests — keep them secret and never include them in client-side code (like a website's JavaScript) where anyone can see them.
// Every request includes this header:
Authorization: Bearer os_hub_your_key_here

Hub API Key (os_hub_*)
User-Level — Your personal API key, tied to your user account. Created in the API Keys page of the dashboard. Best for individual developers, prototyping, and personal projects. The key is shown only once when created — after that, only an irreversible cryptographic hash is retained, so even our engineering team cannot retrieve your raw key.
Use this when you're a solo developer or want one key for all your projects.
Authorization: Bearer os_hub_*

Project API Key (oah_*)
Project-Level — A key scoped to a specific project. Each project can have its own DLP (security) policies, provider settings, and usage analytics. Multiple team members can use the same project key. Created within a project's Keys page.
Use this when you have multiple applications or teams that need separate security policies and usage tracking.
Authorization: Bearer oah_*

How Billing Is Determined
Your API key determines billing mode automatically. If your account (or project) has BYOK provider keys stored, the Hub uses your provider credentials — no wallet charge. If no BYOK keys are found, the Hub uses Managed Mode and deducts from your wallet (wholesale cost + 25% service fee for open-source models, 30% for closed-source models). You can mix both modes: BYOK for one provider and Managed for another.
Security Best Practices
Request Headers
Control routing, tagging, and analytics with optional request headers. All headers are optional — only include the ones you need.
Routing Overrides
By default, the Smart Router picks the cheapest provider for your model. Use these headers to override that behavior.
| Header | Description | Example |
|---|---|---|
| x-provider | Pin to a specific provider (bypasses Smart Routing) | groq |
| x-model | Override the model for this request | llama-3.3-70b-versatile |
Analytics Tags
Tag every request with metadata that flows into your project analytics dashboard. Use these headers to slice usage by feature, environment, or tenant — then filter the dashboard to see per-feature costs, latency, and PII violations.
| Header | Description | Example |
|---|---|---|
| x-feature | Tag requests by feature for analytics dashboards | chatbot-v2 |
| x-env | Tag by environment for environment-level analytics | production |
| x-tenant | Multi-tenant ID for per-customer tracking | customer_123 |
Example: Tagged Request
Pass analytics tags alongside your chat completion request. These tags are stored with every request event and appear in your project dashboard, where you can filter charts and the request table by any combination of feature, environment, and tenant.
curl -X POST https://api.opensourceaihub.ai/v1/chat/completions \
-H "Authorization: Bearer oah_your_project_key" \
-H "Content-Type: application/json" \
-H "x-feature: chatbot-v2" \
-H "x-env: production" \
-H "x-tenant: customer_123" \
-d '{
"model": "oah/llama-3-70b",
"messages": [{"role": "user", "content": "Hello"}]
}'

Dashboard Filtering
Once you start sending tagged requests, your project dashboard automatically shows filter dropdowns for each tag dimension. Filter by feature to see per-feature cost breakdowns, or by tenant to track per-customer usage — all charts, totals, and the request table update in real time.
Model Catalog
Virtual model names (oah/*) are provider-agnostic. In Managed Mode, the Smart Router optimizes for cost efficiency automatically. In BYOK Mode, use explicit provider model IDs.
Understanding AI Models & Providers
An AI model (like Llama 3, Mistral, or Claude) is the actual “brain” that processes your questions and generates responses. A provider (like OpenAI, Google Gemini, xAI, Groq, or Together.ai) is a company that runs these models on their servers and lets you use them via an API.
The Hub supports two categories of models:
- Sovereign (open-source) models — like Llama 4, DeepSeek, Qwen, Mistral, and Mixtral — are open-weight models hosted by multiple providers at different prices. For example, Llama 4 Maverick is available on Groq, Together.ai, and DeepInfra. The Smart Router indexes prices across all of them and optimizes for the best available rate.
- External (closed-source) models — like GPT-4.1 (OpenAI), Claude (Anthropic), Gemini (Google), and Grok (xAI) — are proprietary models only available from their creator. These are routed directly to the single provider that offers them.
Virtual model names (starting with oah/) let you request a model without choosing a provider. For sovereign models, the Hub's Smart Router will automatically select the most cost-efficient provider available. For external models, the Hub routes directly to the model's creator. If you need a specific provider, you can use the provider's native model name instead.
| Virtual Model | Input | Output |
|---|---|---|
| oah/llama-4-maverick | $0.20 | $0.60 |
| oah/llama-4-scout | $0.11 | $0.34 |
| oah/llama-3-70b | $0.35 | $0.40 |
| oah/deepseek-r1 | $0.50 | $2.15 |
| oah/deepseek-v3 | $0.21 | $0.79 |
| oah/qwen3-235b | $0.20 | $0.60 |
| oah/mixtral-8x7b | $0.45 | $0.70 |
| oah/mistral-large | $0.50 | $1.50 |
| oah/mistral-small | $0.10 | $0.30 |
| oah/codestral | $0.30 | $0.90 |
| oah/claude-sonnet-4.6 | $3.00 | $15.00 |
| oah/claude-opus-4.6 | $5.00 | $25.00 |
| oah/claude-haiku-4.5 | $1.00 | $5.00 |
| oah/claude-sonnet-4 | $3.00 | $15.00 |
| oah/gpt-4.1 | $2.00 | $8.00 |
| oah/gpt-4.1-mini | $0.40 | $1.60 |
| oah/gpt-4.1-nano | $0.10 | $0.40 |
| oah/o4-mini | $1.10 | $4.40 |
| oah/gemini-2.5-pro | $1.25 | $10.00 |
| oah/gemini-2.5-flash | $0.30 | $2.50 |
| oah/grok-3 | $3.00 | $15.00 |
| oah/grok-3-mini | $0.30 | $0.50 |
Not limited to the list above
The Hub supports any model your provider offers — not just the curated virtual models above. The table above shows models with optimized Smart Router support and verified pricing. In BYOK mode, if your provider releases a new model tomorrow, you can use it immediately by passing the provider's native model ID (e.g., gemini-2.5-flash) — any valid model string is forwarded directly. In Managed Mode, explicit model names are validated against the registry to prevent wasted credits on typos. Use oah/ virtual names for automatic Smart Router support.
Explicit Provider Models
You can bypass the Smart Router by using a provider-specific model ID directly. In BYOK mode, any model string is passed through to your provider. In Managed Mode, the model must exist in the Hub registry — unrecognized names are rejected with a helpful suggestion. We recommend using oah/ virtual names for the best experience.
// Recommended: virtual model names (works in all modes)
{ "model": "oah/gemini-2.5-flash" } // Smart Router picks best provider
{ "model": "oah/llama-3-70b" } // Cost-optimized across Groq, Together, DeepInfra
// Explicit: bypass Smart Router (BYOK — any model; Managed — registry-validated)
{ "model": "gemini-2.5-flash" } // Google Gemini directly
{ "model": "gpt-4.1" } // OpenAI directly
{ "model": "llama-3.3-70b-versatile" } // Groq directly (short alias)
// Some providers use org/model-name format for open-source models:
{ "model": "meta-llama/llama-4-maverick-17b-128e-instruct" } // Groq / Together / DeepInfra
{ "model": "deepseek-ai/DeepSeek-R1" } // Together / DeepInfra
{ "model": "Qwen/Qwen3-235B-A22B" } // Together / DeepInfra

Provider model IDs vary by provider. Closed-source models (OpenAI, Google, Anthropic) use short names like gpt-4.1. Open-source models hosted on inference providers (Groq, Together, DeepInfra) often use org/model-name format. Use the exact ID your provider expects, or use oah/ virtual names to avoid worrying about provider-specific IDs.
Which Model Should I Choose?
- oah/llama-4-maverick — Meta's flagship MoE model, excellent quality at low cost across 3 providers.
- oah/llama-4-scout or oah/gpt-4.1-nano — low cost, high speed. Good for simple tasks.
- oah/deepseek-r1 or oah/o4-mini — chain-of-thought reasoning models optimized for complex problem solving.
- oah/codestral or oah/claude-sonnet-4.6 — Mistral's specialized coding model or Anthropic's frontier coder.
- oah/claude-opus-4.6 or oah/gpt-4.1 — the most capable models from Anthropic and OpenAI.
- oah/gemini-2.5-flash, oah/gpt-4o-mini, or oah/claude-3.5-sonnet — these models accept image inputs (base64 or URL). Not all models support vision; the Hub will return a model_does_not_support_vision error with suggestions if you send images to a text-only model.
- oah/gpt-4.1-mini or oah/gemini-2.5-flash — excellent price-to-performance for production workloads.

AI Firewall (DLP)
Every request is scanned for PII, credentials, and injection patterns before reaching the LLM. The firewall is fail-closed: if the scan service is unavailable, the request is blocked.
Why Does This Matter?
When your application sends data to an AI model, that data leaves your systems and goes to a third-party provider's servers. If a user accidentally types their credit card number, social security number, or your company's API keys into a prompt, that sensitive data could end up in the AI provider's logs, training data, or be exposed in a breach.
The AI Firewall prevents this by scanning every request before it reaches the AI model and either redacting (replacing sensitive data with [REDACTED]) or blocking (rejecting the entire request).
Real-World Example: Before vs. After
What the user sends (unsafe):
"messages": [
{
"role": "user",
"content": "Hey, can you help me format this for my resume?\nName: Jane Smith\nEmail: jane.smith@acme.com\nSSN: 123-45-6789\nMy OpenAI key is sk-abc123xyz456"
}
]What the AI model receives (safe, after Hub redaction):
"messages": [
{
"role": "user",
"content": "Hey, can you help me format this for my resume?\nName: [REDACTED]\nEmail: [REDACTED]\nSSN: [REDACTED]\nMy OpenAI key is [REDACTED]"
}
]The AI model still understands the request and can help with resume formatting — but it never sees the actual personal data. The user gets their helpful response; the company stays GDPR/CCPA compliant.
Two Policy Actions
Sensitive data is replaced with [REDACTED] and the request is forwarded to the AI model. The AI can still process the rest of the prompt. This is the default for most entity types.
The entire request is rejected with a 400 error. Nothing is forwarded to the AI model. This is the default for prompt injection attacks and can be configured for any entity type.
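The difference between the two actions can be sketched with a toy detector (the email regex below is a stand-in for the Hub's real recognizers, and the return shape is illustrative, not the actual API response):

```javascript
// Toy illustration of REDACT vs BLOCK.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function applyPolicy(text, action) {
  const found = EMAIL_RE.test(text);
  EMAIL_RE.lastIndex = 0; // global regexes are stateful: reset between calls
  if (!found) return { forwarded: true, text };
  if (action === "BLOCK") {
    return { forwarded: false, error: "400 Security Violation" };
  }
  // REDACT: replace the sensitive span, forward the rest of the prompt
  return { forwarded: true, text: text.replace(EMAIL_RE, "[REDACTED]") };
}
```

In REDACT mode the model still receives a usable prompt; in BLOCK mode nothing leaves your control at all.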
28 Built-in Entities
Hub Governance Engine + Extended Detection

AI Firewall

| Entity | Description | Default Action |
|---|---|---|
| PROMPT_INJECTION | Jailbreak / injection heuristics | BLOCK |

Developer Secrets

| Entity | Description | Default Action |
|---|---|---|
| API_KEY | OpenAI, Anthropic, Google AI keys (sk-*, gsk_*) | REDACT |
| AWS_ACCESS_KEY | AWS AKIA* access key IDs | REDACT |
| AWS_SECRET_KEY | 40-char AWS secret access keys | REDACT |
| PRIVATE_KEY | RSA / EC / DSA / SSH PEM headers | REDACT |
| GITHUB_TOKEN | PATs (ghp_), OAuth (gho_), fine-grained tokens | REDACT |
| SLACK_WEBHOOK | Slack incoming webhook URLs | REDACT |

Financial & Crypto

| Entity | Description | Default Action |
|---|---|---|
| CREDIT_CARD | Visa, MC, Amex, Discover, etc. | REDACT |
| IBAN_CODE | International Bank Account Numbers | REDACT |
| US_BANK_NUMBER | US bank account numbers | REDACT |
| CRYPTO_ADDRESS | Bitcoin (legacy + bech32) and Ethereum | REDACT |
| US_ITIN | US Individual Taxpayer IDs | REDACT |

Personal Identifiers

| Entity | Description | Default Action |
|---|---|---|
| EMAIL_ADDRESS | Email addresses | REDACT |
| PHONE_NUMBER | International phone numbers | REDACT |
| US_SSN | US Social Security Numbers | REDACT |
| US_PASSPORT | US passport numbers | REDACT |
| PERSON | Named person entities | REDACT |
| STREET_ADDRESS | US/UK/global street addresses, PO Boxes, apartment/suite numbers | REDACT |
| DATE_TIME | Dates and times | REDACT |
| NRP | Nationality, religion, political group | REDACT |

UK / EU Identifiers

| Entity | Description | Default Action |
|---|---|---|
| UK_NINO | UK National Insurance Numbers | REDACT |
| UK_NHS_NUMBER | UK National Health Service Numbers | REDACT |

Network & Location

| Entity | Description | Default Action |
|---|---|---|
| IP_ADDRESS | IPv4 and IPv6 addresses | REDACT |
| MAC_ADDRESS | Network MAC addresses | REDACT |
| LOCATION | Named locations | REDACT |
| URL | URLs | REDACT |

Medical & Licenses

| Entity | Description | Default Action |
|---|---|---|
| MEDICAL_LICENSE | Medical license numbers | REDACT |
| US_DRIVER_LICENSE | US driver's license numbers | REDACT |

Custom IP Guard (Enterprise Regex)
Define custom regex patterns in your project's DLP policy to protect internal project names, proprietary terminology, internal IP ranges, or any sensitive string unique to your organization. Custom patterns run alongside the 28 built-in entities on every request.
// In your project's DLP Policy dashboard, add:
{
"custom_regexes": [
{
"name": "INTERNAL_PROJECT_NAME",
"pattern": "(?i)\\b(project[- ]?phoenix|codename[- ]?aurora)\\b",
"action": "REDACT"
},
{
"name": "INTERNAL_IP_RANGE",
"pattern": "\\b10\\.42\\.\\d{1,3}\\.\\d{1,3}\\b",
"action": "BLOCK"
}
]
}
// These patterns are applied in real-time alongside
// the 28 built-in entity recognizers on every request.DLP Policies
How the Hub applies DLP protection to every request — default global policy, project-level customization, and version history.
Global Default Policy — Maximum Protection
Every request that reaches the Hub is protected by the AI Firewall. If no custom project-level policy is configured, the Hub automatically applies the Global Default Policy:
Action
REDACT — PII replaced with [REDACTED]
Scope
All 28 entity types + Prompt Injection
This means that even if you haven't configured anything, your data is protected from the very first request. No entity is excluded — every PII category (personal identifiers, financial data, developer secrets, network info, medical data) is scanned and redacted automatically.
When Does the Default Policy Apply?
Hub API keys (os_hub_*) always use the global default policy. Hub keys are account-level and have no project context, so the default is the only DLP policy that applies. All 28 entities are scanned, PII is redacted.
Project API keys (oah_*) use the project's custom DLP policy if one has been configured. If the project has no custom policy, the global default policy applies — same behavior as Hub keys.
Playground keys (os_pg_*) always use the global default policy. The Playground is for testing — it uses the same maximum protection as Hub keys.
Project-Level Policy Customization
To override the global default, create a custom DLP policy in your project's DLP Policy page. Custom policies let you:
- Choose the action: REDACT (replace PII with placeholders, forward the request) or BLOCK (reject the entire request).
- Select specific entities: Pick exactly which of the 28 entity types to monitor — for example, a fintech app might only need credit cards, SSNs, and bank numbers.
- Add custom regex rules: Protect proprietary codenames, internal project names, or any pattern unique to your organization.
- Use templates: Quick presets like AI Firewall, GDPR Bundle, Fintech/PCI, DevOps/SRE, or Maximum Protection.
Per-Project Sensitivity Tiers
Each project can choose a protection sensitivity tier that controls how aggressively PII is detected. This lets you balance security vs. prompt utility for different use cases.
Strict (0.25)
Maximum security. Catches even partial names like "B George" and low-confidence matches. Best for Legal, Finance, and Healthcare.
Balanced (0.40)
Recommended default. Optimized for professional AI interactions. Full names, emails, SSNs, and secrets are caught reliably.
Relaxed (0.60)
Highest utility. Only high-confidence PII triggers redaction. Common words like "July" pass through. Best for developer tooling.
Sensitivity is configured per-project in the DLP Policy page. The global admin threshold still controls the default for Hub keys and Playground keys.
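Conceptually, a tier is just a confidence threshold applied to each detection (a sketch with hypothetical scores; the real engine's scoring is internal):

```javascript
// A tier is effectively a minimum confidence score for acting on a match.
const TIERS = { strict: 0.25, balanced: 0.40, relaxed: 0.60 };

function actionableDetections(detections, tier) {
  return detections.filter((d) => d.score >= TIERS[tier]);
}

const detections = [
  { entity: "PERSON", text: "B George", score: 0.30 },    // partial name, low confidence
  { entity: "US_SSN", text: "123-45-6789", score: 0.95 }, // unambiguous match
];
// strict acts on both; balanced and relaxed act only on the SSN
```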
Policy Versioning
Every time you save a policy, it creates a new immutable version (v1, v2, v3...). Old versions are never modified — they're deactivated. This gives you:
- Full audit trail: See who created each version, when, and what it blocked.
- Restore capability: Revert to any previous version with one click — this creates a new version as a copy, preserving the history.
- Version comparison: The project dashboard shows block rate per policy version so you can measure the impact of policy changes.
- Timeline markers: Cost and usage charts display markers at each policy change point.
Policy Resolution Flow
# Every request follows this resolution path:
1. Request arrives → authenticate key
2. Is it a project key?
├─ Yes → Does project have a custom DLP policy?
│ ├─ Yes → Use project policy (e.g. v3: BLOCK, 14 entities + 2 regex)
│ └─ No → Use global default (REDACT, all 28 entities)
└─ No → Is it a Hub key or Playground key?
└─ Use global default (REDACT, all 28 entities)
3. Scan messages with resolved policy → redact or block
4. Forward to LLM (if not blocked)
Vision Security (Multi-modal DLP)
Base64-encoded images in chat payloads are OCR-scanned for PII and secrets before the request is forwarded.
How It Works
1. Detect: The Hub identifies image_url entries with data:image/* Base64 payloads in the messages array.
2. Extract: The Vision Security Layer extracts all readable text from the image via OCR within a 5-second timeout.
3. Scan: Extracted text is scanned by the Hub Governance Engine (28 entities + custom regex).
4. Enforce: If violations are found, the entire request is blocked with a 400 Security Violation. The image is never forwarded.
5. Purge: The image is processed in RAM only and immediately discarded. It is never written to disk or stored.
const response = await client.chat.completions.create({
model: "oah/llama-3-70b",
messages: [
{
role: "user",
content: [
{ type: "text", text: "What is in this config file?" },
{
type: "image_url",
image_url: {
url: "data:image/jpeg;base64,/9j/4AAQ..."
},
},
],
},
],
max_tokens: 256,
});
// The Hub runs OCR on the image BEFORE forwarding:
// 1. Extracts text via the Vision Security Layer (OCR)
// 2. Scans extracted text for PII / secrets
// 3. If violations found → 400 Security Violation
// 4. Image is processed ephemerally — zero persistence, zero storage

Size Limit
Individual image payloads are limited to 5 MB. Requests exceeding this return 413 Payload Too Large.
Block, Don't Redact
For images, the action is always BLOCK. We cannot selectively redact pixels — if PII is found, the entire request is rejected.
Vision-Capable Model Routing
Not all models accept image inputs. The Hub automatically handles this:
- Virtual models (oah/*) in Managed Mode: The Smart Router automatically picks a vision-capable provider for the model. If no provider supports vision for that model, a clear error is returned with suggestions.
- Explicit models (BYOK): If the model you specify doesn't support vision, the Hub returns a model_does_not_support_vision error with suggested alternatives instead of a vague provider rejection.
Vision-capable models are marked with the Vision badge in the Model Catalog above. Examples: oah/gemini-2.5-flash, oah/gpt-4.1-mini, oah/claude-sonnet-4.6.
Billing & Wallet
Two modes: Bring Your Own Key (free pass-through) or Managed Credits (prepaid wallet with 25% markup for open-source models, 30% for closed-source).
How Billing Works — Plain Language
Think of OpenSourceAIHub as a restaurant that serves dishes from many kitchens (AI providers). You have two options for paying:
BYOK Mode
- You provide your own provider API keys
- No wallet deduction — zero Hub cost
- Full DLP / Firewall protection included
- Keys encrypted with AES-256-GCM
Managed Mode
- No keys needed — top up your wallet
- 25% markup (open-source) / 30% markup (closed-source)
- Smart Router optimizes for best rate
- Pre-flight balance check + max_tokens cap
- Image gen billed per-event: 3,750 credits (Performance) to 100K credits (Premium)
Total Token Budgeting
In Managed Mode, the Hub performs a pre-flight check before every request:
1. Estimates the input token cost for the entire messages array (full conversation history).
2. If the estimated input cost alone exceeds your wallet balance → 402 Insufficient Balance.
3. Caps max_tokens (output) based on remaining balance after input cost, preventing surprise overspend.
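The pre-flight logic can be sketched as follows (illustrative only: the token estimation and per-token credit rates here are assumptions, not the Hub's actual estimator):

```javascript
// Sketch of the Managed Mode pre-flight wallet check.
function preflight(walletCredits, estInputTokens, maxTokens, creditsPerInTok, creditsPerOutTok) {
  const inputCost = Math.ceil(estInputTokens * creditsPerInTok);
  if (inputCost > walletCredits) {
    return { ok: false, error: "402 Insufficient Balance" };
  }
  const remaining = walletCredits - inputCost;
  // Cap max_tokens so the output can never spend below zero
  const cappedMaxTokens = Math.min(maxTokens, Math.floor(remaining / creditsPerOutTok));
  return { ok: true, maxTokens: cappedMaxTokens };
}
```

With a large balance the requested max_tokens passes through unchanged; near the edge, the cap shrinks instead of letting the request overdraw the wallet.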
Media Generation Tiers
Image generation models are billed at a flat fee per event instead of per-token. Prices include the wholesale cost plus the Governance & Infrastructure fee (25% open-weight / 30% closed models).
Performance
3,750
~$0.004/image
Standard
50,000
~$0.05/image
Premium
100,000
~$0.10/image
Credits are deducted atomically based on our cost-optimization engine. Estimates include the standard Hub service fee.

| Model | Tier | Wholesale | Credits |
|---|---|---|---|
| FLUX.1-schnell | Performance | $0.003 | 3,750 |
| DALL-E 2 | Standard | $0.020 | 25,000 |
| FLUX.1-dev | Standard | $0.025 | 31,250 |
| SD XL Base 1.0 | Standard | $0.040 | 50,000 |
| FLUX.1-pro | Standard | $0.050 | 62,500 |
| Stable Diffusion 3 | Standard | $0.065 | 81,250 |
| DALL-E 3 | Premium | $0.080 | 100,000 |
| DALL-E 3 HD | Premium | $0.120 | 150,000 |
Unknown media models default to 125,000 credits ($0.125) to protect margins. Media requests are logged as IMAGE_GENERATION events.
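The relationship between the wholesale column and the credit column can be sketched as follows. This assumes 1,000,000 credits = $1.00 (as the table's figures imply) and uses the 25% open-weight fee; it is an illustrative formula, not a billing API.

```python
def wholesale_to_credits(wholesale_usd, fee=0.25):
    """Credits charged per image: wholesale cost plus the service fee.

    Assumes 1,000,000 credits = $1.00, implied by the table above.
    """
    return round(wholesale_usd * (1 + fee) * 1_000_000)
```

For example, FLUX.1-schnell at $0.003 wholesale works out to 3,750 credits, matching the Performance row above.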
Wallet Top-Up
Credits are purchased via Stripe one-time payments. Minimum top-up is 5M Credits ($5). Credits have no cash value, are non-transferable, and expire after 12 months of account inactivity. Wallet deductions are atomic and transaction-safe — you can never be double-charged or spend below zero, even under high concurrency.
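The never-below-zero guarantee amounts to checking and deducting as a single atomic step. A toy in-process sketch is below; the real Hub presumably relies on database transactions rather than a Python lock, and the class name is illustrative.

```python
import threading

class Wallet:
    """Toy wallet: deductions are all-or-nothing and never go below zero."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deduct(self, amount):
        with self._lock:       # check and deduct as one atomic step
            if amount > self.balance:
                return False   # reject instead of overdrafting
            self.balance -= amount
            return True
```

Because the check and the subtraction happen inside one critical section, two concurrent requests can never both pass the check and jointly overdraw the balance.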
Frequently Asked Questions
How much does a typical AI request cost?
A short question-and-answer exchange (about 100-200 words total) with Llama 3 70B costs roughly 100-500 credits in Managed Mode. That means 1,000,000 Hub Credits can cover thousands of requests. Image generation uses tiered pricing: Performance models (e.g., FLUX.1-schnell) cost ~3,750 credits, Standard models (e.g., SD XL) ~50,000, and Premium models (e.g., DALL-E 3) ~100,000 credits per image.
Can I use BYOK for one provider and Managed for another?
Yes! This is called hybrid mode. If you have a Groq API key stored in BYOK but not a Together.ai key, requests to Groq will use your key (no Hub charge) and requests to Together.ai will use Managed Mode (deducted from your wallet). The Hub handles this automatically.
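Hybrid mode boils down to a per-provider lookup, sketched below. The function and variable names are hypothetical, chosen only to illustrate the decision.

```python
def resolve_billing_mode(provider, byok_keys):
    """Return (mode, key) for a request.

    BYOK if the user stored a key for this provider,
    otherwise Managed (Hub keys, billed to the wallet).
    """
    if provider in byok_keys:
        return "BYOK", byok_keys[provider]
    return "MANAGED", None
```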
What happens if my wallet runs out mid-request?
The Hub checks your balance BEFORE sending the request. If your wallet can't cover the estimated cost, the request is rejected with a 402 error and a clear message telling you to top up. You'll never be charged more than your balance.
Is the DLP / security layer free?
Yes. The AI Firewall runs on every request regardless of billing mode. Whether you use BYOK or Managed Mode, your data is always scanned and protected at no extra charge.
What does the 25%/30% markup cover?
The markup covers the AI Firewall (DLP), Smart Routing, infrastructure, API key management, wallet billing, usage analytics, and dashboard features. Open-source models carry a 25% markup; closed-source models (OpenAI, Gemini, xAI) carry a 30% markup to account for higher wholesale costs. It's the service fee for not having to build all of this yourself.
Success Response Reference
Complete field-by-field reference for every successful /v1/chat/completions response.
Understanding Latency Fields
┌───────────────── Total Request (elapsed) ──────────────────┐
│                                                            │
│  ┌─ Hub Overhead ──┐    ┌─── Provider (upstream) ───┐      │
│  │  DLP scan       │    │  Network + LLM inference  │      │
│  │  routing        │    │                           │      │
│  │  validation     │    │                           │      │
│  └── latency_ms ───┘    └── upstream_latency_ms ────┘      │
│      (hub_meta)             (hub_meta)                     │
│                                                            │
│  dlp_latency_ms ⊆ latency_ms                               │
└────────────────────────────────────────────────────────────┘

latency_ms = total_elapsed − upstream_latency_ms
dlp_latency_ms ⊆ latency_ms (DLP scan portion only)
upstream_latency_ms = time spent waiting for the provider
Full Success Response Example
The response follows the standard OpenAI format with an additional hub_metadata object. Any client expecting OpenAI-style responses will work unchanged — hub_metadata can simply be ignored if not needed.
{
// ── Standard OpenAI-compatible fields ──────────────
"id": "chatcmpl-DKxHaZHGxRtRfLlIUglZWIcs2Q7FY",
"object": "chat.completion",
"created": 1773886814,
"model": "o3-mini",
"system_fingerprint": "fp_5b51a51e10",
"service_tier": "default",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Abraham Lincoln was the 16th president ...",
"tool_calls": null,
"function_call": null,
"provider_specific_fields": { "refusal": null },
"annotations": []
},
"provider_specific_fields": {}
}
],
"usage": {
"prompt_tokens": 502,
"completion_tokens": 2909,
"total_tokens": 3411,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 2368,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
// ── Hub-enriched fields ────────────────────────────
"x_request_id": "req_8138a426fe794d0a",
"hub_metadata": {
// Routing & mode
"requested_model": "oah/o3-mini",
"provider_selected": "openai",
"routing_mode": "smart_route",
"mode": "MANAGED",
"request_type": "CHAT_COMPLETION",
"intent": "GENERAL_CHAT",
"is_managed": true,
// Latency breakdown (milliseconds)
"latency_ms": 320,
"dlp_latency_ms": 68,
"upstream_latency_ms": 4977,
// Throughput
"tokens_per_sec": 150.3,
// Cost (USD)
"wholesale_cost_usd": 0.013352,
"cost_usd": 0.0173576,
"new_balance_usd": 0.95314815,
// DLP / PII results
"pii_detected": true,
"dlp_action": "redact",
"violations_count": 37,
"entity_types_detected": [
"DATE_TIME", "LOCATION", "NRP", "PERSON", "URL"
],
"redacted_prompt": "[REDACTED] ([REDACTED]) was the 16th president of [REDACTED], serving from ..."
}
}
Block Mode Response Example
When the DLP policy action is set to BLOCK, the request is rejected immediately and never forwarded to the AI provider. Each violation includes the entity type, character positions, and a confidence score.
{
"error": "pii_policy_violation",
"message": "Request blocked: PII detected in prompt",
"request_id": "req_a08a3d1067654a16",
"violations": [
{
"entity_type": "EMAIL_ADDRESS",
"start": 45,
"end": 64,
"score": 1.0
},
{
"entity_type": "US_SSN",
"start": 110,
"end": 121,
"score": 1.0
},
{
"entity_type": "CREDIT_CARD",
"start": 142,
"end": 161,
"score": 1.0
},
{
"entity_type": "IBAN_CODE",
"start": 172,
"end": 199,
"score": 1.0
},
{
"entity_type": "STREET_ADDRESS",
"start": 263,
"end": 283,
"score": 1.0
},
{
"entity_type": "PERSON",
"start": 33,
"end": 43,
"score": 0.85
},
{
"entity_type": "DATE_TIME",
"start": 88,
"end": 98,
"score": 0.95
},
{
"entity_type": "IP_ADDRESS",
"start": 306,
"end": 315,
"score": 0.95
},
{
"entity_type": "LOCATION",
"start": 285,
"end": 293,
"score": 0.85
}
]
}
Block Response Fields
| Field | Type | Description |
|---|---|---|
| error | string | Always "pii_policy_violation" for DLP blocks. |
| message | string | Human-readable reason: "Request blocked: PII detected in prompt". |
| request_id | string | Hub correlation ID for tracing and support (e.g. "req_a08a3d10..."). |
| violations | array | Array of detected entities. Each contains entity_type, start/end character offsets, and confidence score. |
| violations[].entity_type | string | The PII entity type detected (e.g. "EMAIL_ADDRESS", "PERSON", "CREDIT_CARD"). |
| violations[].start | integer | Start character offset in the original prompt. |
| violations[].end | integer | End character offset in the original prompt. |
| violations[].score | float | Detection confidence: 1.0 for pattern-based (email, SSN), 0.85–0.95 for NLP-based (person names, locations, dates). |
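The start/end offsets let a client reproduce the redaction locally, for example to show users exactly which spans tripped the policy. The helper below is an illustration, not a Hub API; note it applies spans from the end of the string so earlier offsets stay valid as the text changes length.

```python
def apply_redactions(text, violations):
    """Replace each flagged span with [REDACTED], mirroring redact mode.

    Spans are applied from the end of the string so that earlier
    character offsets remain valid as the text changes length.
    """
    for v in sorted(violations, key=lambda v: v["start"], reverse=True):
        text = text[:v["start"]] + "[REDACTED]" + text[v["end"]:]
    return text
```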
Top-Level Response Fields (Success / Redact Mode)
Standard OpenAI-compatible fields plus Hub-enriched additions.
| Field | Type | Source | Description |
|---|---|---|---|
| id | string | Provider | Unique completion ID assigned by the upstream provider (e.g. "chatcmpl-DKxH..."). |
| object | string | Provider | Always "chat.completion". |
| created | integer | Provider | Unix timestamp (seconds) when the completion was generated. |
| model | string | Hub | Actual provider model ID that ran the request (e.g. "o3-mini", "llama-3.3-70b-versatile"). This is the real model, not the requested virtual name. |
| system_fingerprint | string | Provider | Provider system version fingerprint. May be null for some providers. |
| service_tier | string | Provider | Service tier used (e.g. "default"). Provider-specific; may be absent for non-OpenAI providers. |
| choices | array | Provider | Array of completion choices. Each contains index, finish_reason, message (with role, content, tool_calls, function_call, annotations), and optional provider_specific_fields. |
| usage | object | Provider | Token usage: prompt_tokens, completion_tokens, total_tokens. May include completion_tokens_details (reasoning_tokens, etc.) and prompt_tokens_details (cached_tokens, etc.). |
| x_request_id | string | Hub | Hub correlation ID for tracing and support (e.g. "req_8138a42..."). Same value as the x-request-id response header. |
| hub_metadata | object | Hub | Hub-enriched metadata block with routing, latency, cost, and DLP details. See detailed reference below. |
hub_metadata Field Reference
| Field | Type | Present | Description |
|---|---|---|---|
| requested_model | string | Always | Exact model string the caller sent (e.g. "oah/o3-mini" or "gemini-2.5-flash"). |
| provider_selected | string | Always | Provider that served the request (openai, groq, together, deepinfra, mistral, anthropic, gemini, xai). |
| routing_mode | string | Always | "smart_route" if an oah/* virtual model was used; "explicit" if a provider model was specified directly. |
| mode | string | Always | Billing mode — "MANAGED" (Hub keys, billed to wallet) or "BYOK" (your own provider keys). |
| request_type | string | Always | "CHAT_COMPLETION" or "IMAGE_GENERATION". |
| intent | string | Always | Classified prompt intent (GENERAL_CHAT, CODE_GENERATION, SUMMARIZATION, etc.). |
| is_managed | boolean | Managed only | Present and true only when mode is MANAGED. |
| latency_ms | integer | Always | Hub overhead in milliseconds — total elapsed minus upstream time. Includes DLP scan, routing, validation, response assembly. |
| dlp_latency_ms | integer | Always | Time spent on DLP/PII scanning (subset of latency_ms). 0 if no scan was performed. |
| upstream_latency_ms | integer | Always | Time the LLM provider took to respond — network round-trip plus model inference. |
| tokens_per_sec | number | Always | Throughput: total_tokens / upstream_latency_seconds. |
| wholesale_cost_usd | number | Always | Raw provider cost in USD (what the provider charges). |
| cost_usd | number | Always | Billed cost in USD. Managed mode: wholesale + markup. BYOK: $0 (you pay provider directly). |
| new_balance_usd | number | Managed only | Remaining wallet balance after deduction. Only present in Managed mode. |
| pii_detected | boolean | Always | true if the DLP scanner found any PII / sensitive entities in the request. |
| dlp_action | string | If PII found | "redact" (entities replaced with [REDACTED]) or "detect" (flagged but not modified). Absent when no PII detected. |
| violations_count | integer | If PII found | Number of individual PII entities detected (e.g. 37 means 37 separate matches). |
| entity_types_detected | string[] | If PII found | Sorted list of unique entity types found (e.g. ["DATE_TIME", "LOCATION", "PERSON"]). |
| redacted_prompt | string | If redacted | The prompt after PII entities were replaced with [REDACTED]. Only present when dlp_action is "redact". |
| media_events | integer | Images only | Number of images processed. Only present for IMAGE_GENERATION requests. |
Response Headers
Key metrics are also available as HTTP headers for clients that need them without parsing the JSON body.
| Header | Example | Description |
|---|---|---|
| x-request-id | req_e0e48046... | Unique correlation ID for this request. Use this when contacting support. |
| x-hub-latency | 490 | Hub overhead in ms (same as hub_metadata.latency_ms). |
| x-dlp-latency | 189 | DLP scan time in ms (same as hub_metadata.dlp_latency_ms). |
| x-hub-intent | GENERAL_CHAT | Classified prompt intent. |
| x-hub-tokens-per-sec | 114.33 | Throughput metric. |
Total elapsed time is not returned directly in the response. Calculate it as: latency_ms + upstream_latency_ms. For the example above: 320 + 4,977 = 5,297 ms total.
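That calculation is trivial, but shown here for clarity as a helper your client could use:

```python
def total_elapsed_ms(hub_metadata):
    # Hub overhead + provider time = end-to-end request latency.
    return hub_metadata["latency_ms"] + hub_metadata["upstream_latency_ms"]
```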
Error Reference
Structured JSON error responses with correlation IDs for every failure mode.
Security Violation — DLP Block
Returned when the AI Firewall detects PII, secrets, or a prompt injection pattern and the DLP policy action is BLOCK. The request is never forwarded to the LLM.
{
"error": {
"message": "Security policy violation: request blocked.",
"type": "security_violation",
"code": 400,
"violations": [
{
"entity": "PROMPT_INJECTION",
"action": "BLOCK",
"start": 0,
"end": 47
}
],
"correlation_id": "req_a1b2c3d4"
}
}
Invalid Request
The request is malformed — empty messages array, empty last message content, streaming requested (not yet supported), or an unsafe image URL.
{
"detail": {
"error": "invalid_request",
"message": "Invalid request: messages array must not be empty.",
"correlation_id": "req_b2c3d4e5"
}
}
Unknown Virtual Model
The oah/* virtual model name is not recognized in the Hub's model registry. This is caught early — before DLP scanning or provider calls. Includes similar model suggestions when available.
{
"detail": {
"error": "unknown_virtual_model",
"message": "Virtual model 'oah/invalid_model' is not recognized. Did you mean: oah/llama-3-70b, oah/llama-4-maverick?",
"suggested_models": ["oah/llama-3-70b", "oah/llama-4-maverick"],
"docs_url": "/docs#models",
"correlation_id": "req_b2c3d4e5"
}
}
Unknown Model (Managed Mode)
In Managed Mode, an explicit (non-virtual) model name was not found in the registry. The response includes a suggested virtual model name when available.
{
"detail": {
"error": "unknown_model",
"message": "Model 'gpt-5-turbo' is not in the registry. Did you mean 'oah/gpt-4.1'?",
"suggested_model": "oah/gpt-4.1",
"correlation_id": "req_c3d4e5f6"
}
}
Upstream Bad Request
The LLM provider rejected the request (e.g. invalid parameters, unsupported features for the model). Passed through from the provider.
{
"detail": {
"error": "upstream_bad_request",
"message": "The AI provider rejected this request. Did you mean 'oah/llama-3-70b'? Please check your model name and parameters.",
"suggested_model": "oah/llama-3-70b",
"correlation_id": "req_d4e5f6g7"
}
}
Unauthorized
The API key is missing from the Authorization header.
{
"detail": "Missing API key"
}
Insufficient Balance
Managed Mode only. The wallet balance is too low to cover the estimated cost, or was exhausted mid-request. Also returned when no provider credentials could be resolved.
{
"detail": {
"error": "insufficient_balance",
"message": "Your Hub Wallet balance is $0.001. Minimum required: $0.01. Please top up.",
"correlation_id": "req_e5f6g7h8"
}
}
Budget Exceeded
Managed Mode only. The request would exceed the project's configured daily or monthly budget cap.
{
"detail": {
"error": "budget_exceeded",
"message": "This request would exceed your project daily budget of $5.00. Current spend: $4.98.",
"correlation_id": "req_f6g7h8i9"
}
}
Forbidden
The API key is valid but not authorized for this action — revoked key, playground key used externally, or a premium model accessed without a Pro subscription.
{
"detail": "Invalid or revoked API key"
}
Model Does Not Support Vision
The request contains image content (image_url parts) but the selected model does not support vision/multimodal inputs. The response includes suggested vision-capable models.
{
"detail": {
"error": "model_does_not_support_vision",
"message": "Model 'oah/llama-3.3-70b-versatile' does not support image inputs. Try a vision-capable model: oah/gemini-2.5-flash, oah/gpt-4o-mini, oah/claude-3.5-sonnet",
"suggested_models": ["oah/gemini-2.5-flash", "oah/gpt-4o-mini", "oah/claude-3.5-sonnet"],
"correlation_id": "req_v1s10n"
}
}
Payload Too Large
An image payload exceeds the 5 MB per-image limit.
{
"detail": {
"error": "image_too_large",
"message": "Image exceeds 5MB limit (received 7.2MB).",
"correlation_id": "req_g7h8i9j0"
}
}
Rate Limited
Too many requests in the current window. Respect the Retry-After header.
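A client-side backoff that honors Retry-After when present, and falls back to capped exponential backoff otherwise, might look like the sketch below. The function and parameter names are illustrative.

```python
def retry_delay_seconds(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Prefer the server's Retry-After header when given; otherwise use
    capped exponential backoff: 1s, 2s, 4s, ... up to `cap`.
    """
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))
```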
{
"detail": {
"error": "rate_limit_error",
"message": "Rate limit exceeded. Try again in 60 seconds.",
"correlation_id": "req_h8i9j0k1"
}
}
Security Processing Error
The PII/DLP scanning service is temporarily unavailable. The Hub fails closed — requests are blocked, never forwarded unscanned.
{
"detail": {
"error": "security_processing_error",
"message": "PII safety scan is temporarily unavailable. Requests are blocked for your protection. Please try again shortly.",
"correlation_id": "req_i9j0k1l2"
}
}
Upstream Provider Error
The LLM provider returned an unexpected error, rejected the credentials (Managed Mode), or could not be reached.
{
"detail": "The AI provider returned an error. Please try again or try a different model."
}
Service Temporarily Unavailable
An internal dependency (authentication, key resolution, or policy lookup) is temporarily unavailable. Safe to retry.
{
"detail": "Unable to verify your API key right now. Please try again shortly."
}
Upstream Timeout
The LLM provider did not respond within the timeout window. The request was not completed.
{
"detail": "The AI provider took too long to respond. Please try again."
}
Postman Collection
Import our pre-built collection to test every endpoint — including security violation and balance check scenarios.
OpenSourceAIHub API Collection
Includes requests for Smart-Routed completions, explicit provider calls, streaming, vision/multi-modal, PII detection tests, prompt injection tests, and all error scenarios. Pre-configured with collection variables.
Download Postman Collection
Postman v2.1 format · Import via File → Import in Postman
Frequently Asked Questions
Common questions about architecture, compatibility, and privacy.
Does the Hub remember my conversation history?
No. The Hub is completely stateless — it processes each request independently and immediately forgets it. Conversation memory is your application's responsibility. If a user is on their 10th message, your app sends all 10 previous messages in the messages[] array every time. The Hub scans the entire array for PII, routes it to the AI, and discards everything. Only metadata (token count, cost, latency) is logged — never your actual messages. This is exactly how the OpenAI API works.
Why do longer conversations cost more?
Because your app resends the full conversation history with every request. A 10-message conversation means the 10th request contains all 10 messages — so you're billed for all those tokens each time. This is standard for all LLM APIs (not Hub-specific). To manage costs, consider trimming older messages from the array or summarizing earlier conversation turns.
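One common trimming strategy is to keep any system prompt plus the most recent N turns. The sketch below illustrates the idea; real applications often summarize older turns instead, and the function name is hypothetical.

```python
def trim_history(messages, keep_last=8):
    """Keep system messages plus the `keep_last` most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```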
Do you support the OpenAI Threads / Assistants API?
No, and this is by design. The Threads API (/v1/threads) stores conversation messages on the server, making it stateful. This would require us to store billions of user messages — a massive storage cost and a direct privacy liability. Our stateless architecture means we never hold your conversation data, so there's nothing to breach, subpoena, or accidentally leak. If you need conversation memory, manage the messages[] array in your own code or database and send the full history with each request.
Does the Hub store my data?
No. We only log metadata (like "1 email address was blocked") — we never store the actual content of your messages or the AI's responses. Your data passes through our security layer and is forwarded to the AI provider. Nothing is saved on our servers. Images sent for vision scanning are processed in RAM only and immediately discarded.
Is this compatible with my existing code?
If your app uses the OpenAI SDK (the most popular AI library), you only need to change two lines: the API key and the base URL. Everything else — your models, your prompts, your response handling — stays exactly the same. The Hub implements the standard /v1/chat/completions endpoint.
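To make the "two-line change" concrete without assuming any SDK, here is a minimal sketch of the request an OpenAI-style client sends. Only the base URL and the API key differ from a direct OpenAI call; the base URL shown is a placeholder, so substitute your Hub's actual address.

```python
import json

def build_chat_request(api_key, messages, model="oah/llama-3-70b",
                       base_url="https://your-hub.example.com/v1"):
    """Return (url, headers, body) for an OpenAI-style chat completion.

    The payload shape is unchanged from OpenAI's: pointing an existing
    client at the Hub only swaps the base URL and the API key.
    """
    url = base_url + "/chat/completions"
    headers = {"Authorization": "Bearer " + api_key,
               "Content-Type": "application/json"}
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body
```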
What is the default DLP policy?
The Hub applies a "Maximum Protection" default policy to every request that doesn't have a custom project-level policy. This default scans for all 28 built-in entity types (PII, credentials, financial data, etc.) plus prompt injection patterns, and redacts any matches with [REDACTED] before forwarding to the AI model. Hub keys always use this default. Project keys use their project's custom policy if one exists, otherwise they also get this default. You are protected from your very first API call — no configuration required.
How do I customize my DLP policy?
Go to your project's DLP Policy page (Projects → Your Project → Policy). There you can change the action (REDACT vs. BLOCK), select specific entity types, add custom regex patterns, or apply quick templates (GDPR, Fintech, DevOps, etc.). Every save creates a new immutable version — you can view version history and restore any previous version with one click.
Does streaming (stream: true) work?
Not yet. In Phase 1 the AI Firewall scans the full request before forwarding, which requires processing the complete prompt. Sending stream: true will return a 400 error. Streaming with output scanning is planned for Phase 2.
Glossary
Quick reference for every key term used in this documentation.