OpenSourceAIHub Documentation — API Reference, Guides & Model Catalog

What is OpenSourceAIHub?

A plain-language introduction — no AI background required.

OpenSourceAIHub is a secure gateway that sits between your application and AI models like Llama, Mistral, Claude, GPT-4o, Gemini, Grok, and others. Think of it as a security checkpoint for every AI request your software makes.

When your application asks an AI model a question (called a “prompt”), that prompt might accidentally contain sensitive data — a customer's email address, a credit card number, an API key, or even a social security number. Without protection, that data gets sent directly to a third-party AI company's servers.

OpenSourceAIHub catches that sensitive data before it ever leaves your control. It scans every request, redacts or blocks the sensitive parts, then forwards the cleaned request to the AI model. Your users get their AI-powered features. Your company stays compliant and secure.

What the Hub Does (In One Sentence Each)

Protects sensitive data: Automatically detects and removes personal information, API keys, credit card numbers, and other sensitive data from your AI requests, using 28 built-in entity types plus your own custom patterns.
Works with any AI model: Supports Llama 4, DeepSeek, Qwen, Mistral, Claude, GPT-4.1, Gemini, Grok, and every other model your provider offers — through 8 providers: Groq, Together.ai, DeepInfra, Mistral AI, Anthropic, OpenAI, Google Gemini, and xAI.
Saves you money: Uses algorithmic cost selection across providers to optimize for the best available rate on each request.
Drop-in replacement: If your app already uses the OpenAI SDK (the most popular AI library), you only need to change two lines of code — the API key and the URL.
Stops prompt injection: Detects and blocks attempts to manipulate or “jailbreak” the AI model through crafted inputs.
Scans images too: If your app sends images to an AI model, the Hub extracts text from the image (OCR) and scans it for sensitive data before forwarding.

How It Works (The Big Picture)

Your App (sends a prompt) → OpenSourceAIHub (scans & cleans) → AI Provider (returns answer) → Your App (gets safe response)

Core Concepts

Key terms you'll see throughout this documentation, explained simply.

AI Providers (Groq, Together.ai, DeepInfra, Mistral AI, Anthropic, OpenAI, Google Gemini, xAI)

These are the 8 companies that host and run AI models on their servers. Think of them like different cell phone carriers: they all offer similar services (access to AI models) but at different prices and speeds. Groq is known for speed, Together.ai for variety, DeepInfra for cost, and Mistral AI for its own models; Anthropic provides Claude, OpenAI provides GPT-4.1, Google offers Gemini models, and xAI provides Grok. You don't need an account with any of them to use OpenSourceAIHub unless you choose BYOK mode.

BYOK (Bring Your Own Key)

"Bring Your Own Key" means you create your own account with an AI provider (like OpenAI, Google Gemini, xAI, Groq, or Together.ai), get an API key from them, and store it in OpenSourceAIHub. When you send a request, the Hub uses YOUR key to talk to the provider. The provider bills you directly, and the Hub adds the security layer for free. This is ideal if you already have provider accounts or want maximum control over costs. BYOK is also the way to reach closed-source models that are not in the Hub's Model Catalog, such as GPT-4o, Gemini 1.5 Pro, and Grok-2, because in Managed Mode explicit model names must exist in the Hub registry.

Managed Mode (Wallet / Hub Credits)

Don't want to sign up with AI providers? No problem. In Managed Mode, you add credits to your OpenSourceAIHub wallet ($1 = 1,000,000 Hub Credits), and the Hub uses its own provider keys on your behalf. Different models burn credits at different rates per token actually used. The Hub charges wholesale cost plus a service fee (a 25% markup for open-source models and a 30% markup for closed-source models such as OpenAI, Anthropic, Gemini, and xAI), deducted from your wallet automatically. This is the easiest way to get started: just top up and go.

Smart Routing

When you use Managed Mode, the Hub doesn't just send your request to one provider — it periodically indexes prices across all supported providers and selects the most cost-efficient route for your request, ensuring high performance at near-wholesale rates. This is best-effort optimization, not a guarantee of the absolute lowest price on every request. You don't need to do anything — it just works.
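In spirit, cost-based routing prices the same request at each provider that hosts the model and picks the cheapest one that is currently available. The sketch below illustrates only that idea: the provider prices are made up, the expected output size (for example, your max_tokens) is an assumption, and the real Smart Router's inputs and tie-breaking are not documented on this page. `ProviderQuote` and `pickCheapest` are hypothetical names.

```typescript
// Illustrative cost-based provider selection (not the Hub's actual router).
// Prices are hypothetical, in USD per 1M tokens.
interface ProviderQuote {
  provider: string;
  inputPerMTok: number;
  outputPerMTok: number;
  available: boolean; // e.g. the result of a health check
}

function pickCheapest(
  quotes: ProviderQuote[],
  expectedInput: number,
  expectedOutput: number,
): ProviderQuote | undefined {
  let best: ProviderQuote | undefined;
  let bestCost = Infinity;
  for (const q of quotes) {
    if (!q.available) continue; // never route to a provider that is down
    // Relative cost; dividing by 1M would not change the ranking.
    const cost = expectedInput * q.inputPerMTok + expectedOutput * q.outputPerMTok;
    if (cost < bestCost) {
      bestCost = cost;
      best = q;
    }
  }
  return best; // undefined if no provider is available
}

const quotes: ProviderQuote[] = [
  { provider: "groq", inputPerMTok: 0.59, outputPerMTok: 0.79, available: true },
  { provider: "together", inputPerMTok: 0.88, outputPerMTok: 0.88, available: true },
  { provider: "deepinfra", inputPerMTok: 0.35, outputPerMTok: 0.4, available: false },
];

console.log(pickCheapest(quotes, 1_000, 500)?.provider); // groq
```

Weighting by the expected input/output mix matters because providers price the two sides differently, so the cheapest provider for short answers is not always the cheapest for long ones.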

DLP (Data Loss Prevention) / AI Firewall

DLP is the security layer that scans every request for sensitive data before it reaches the AI model. It can detect 28+ types of sensitive information — from email addresses and credit cards to API keys and social security numbers. When it finds something, it either replaces the data with [REDACTED] (redact mode) or blocks the entire request (block mode), depending on your policy settings.

Wallet

Your prepaid credit balance on OpenSourceAIHub. You add credits via Stripe ($1 = 1,000,000 Hub Credits), and usage is deducted automatically each time you make an AI request in Managed Mode. Think of it like a prepaid balance for AI usage. 1,000,000 Hub Credits can cover thousands of simple AI requests.

Hub Credits

Hub Credits are the abstract currency for your wallet balance. $1.00 = 1,000,000 Hub Credits. Different AI models consume credits at different rates per actual LLM token — lightweight models like Llama 3 8B burn fewer credits per token, while premium models like GPT-4o burn more. This means your credit pool stretches further with efficient models.
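The credit math can be sketched as follows. It assumes the Model Catalog prices are wholesale USD per 1M tokens and that the 25% / 30% service fee is a multiplier on wholesale cost; `estimateCredits` and `ModelPrice` are hypothetical names, and any rounding the Hub applies is not modeled.

```typescript
// Sketch: estimate Hub Credits for one Managed Mode text request.
// Assumptions (not confirmed by this page): catalog prices are wholesale
// USD per 1M tokens, the service fee multiplies the wholesale cost, and
// no rounding is applied.
const CREDITS_PER_USD = 1_000_000; // $1.00 = 1,000,000 Hub Credits

interface ModelPrice {
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
  closedSource: boolean; // true → 30% fee, false → 25% fee
}

function estimateCredits(price: ModelPrice, inputTokens: number, outputTokens: number): number {
  const wholesaleUsd =
    (inputTokens / 1_000_000) * price.inputPerMTok +
    (outputTokens / 1_000_000) * price.outputPerMTok;
  const fee = price.closedSource ? 0.3 : 0.25;
  return wholesaleUsd * (1 + fee) * CREDITS_PER_USD;
}

// oah/llama-3-70b at its catalog prices: 1,000 input + 500 output tokens.
const llama70b: ModelPrice = { inputPerMTok: 0.35, outputPerMTok: 0.4, closedSource: false };
console.log(estimateCredits(llama70b, 1_000, 500).toFixed(1)); // 687.5
```

Because the $1-to-1M-credits rate and the per-1M-token price unit cancel, wholesale credits simply equal tokens × price (1,000 × 0.35 + 500 × 0.40 = 550), and the 25% fee brings that to 687.5. Under these assumptions, 1,000,000 credits would cover roughly 1,450 requests of this size.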

API Key

A secret string (like a password) that identifies who you are when making requests. OpenSourceAIHub gives you keys that start with "os_hub_" (user-level) or "oah_" (project-level). You include this key in every request so the Hub knows it's you. Never share your API keys publicly.

Which Mode Should I Use?

I'm new to AI / I just want to try it: Use Managed Mode. Top up your wallet with $5-10 and start making requests immediately. No provider accounts needed.
I already have OpenAI / Groq / Together.ai / etc. accounts: Use BYOK Mode. Store your provider keys in the Hub, and your requests go directly to the provider at their prices. Zero Hub cost.
I want the best of both: You can use BYOK for providers where you have keys and Managed Mode as a fallback. The Hub handles this automatically — no configuration needed.

Solution Guides

Deep-dive tutorials for specific use cases.

Quickstart

Start sending secure, cost-optimized AI requests in under 2 minutes.

Before You Start

  1. Create an account at app.opensourceaihub.ai using Google or GitHub sign-in.
  2. Get your API key: go to the API Keys page in your dashboard and click “Generate New Key”. Copy and save it immediately (it's shown only once).
  3. Choose your mode: either top up your wallet (Managed Mode, easiest) or store your own provider API keys (BYOK Mode, free). See “Which Mode Should I Use?” above if you're unsure.

1. Install the OpenAI SDK

npm install openai

OpenSourceAIHub is OpenAI-compatible. Use the standard SDK — just change the baseURL.

2. Make Your First Request

index.ts
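import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "os_hub_your_key_here",               // placeholder; load from an env var in real code
  baseURL: "https://api.opensourceaihub.ai/v1",
});

// .withResponse() returns the parsed completion plus the raw HTTP response,
// which is where the Hub's custom headers live.
const { data: response, response: raw } = await client.chat.completions
  .create({
    model: "oah/llama-3-70b",          // virtual model → smart-routed
    messages: [
      { role: "user", content: "Explain the CAP theorem." }
    ],
    max_tokens: 512,
  })
  .withResponse();

console.log(response.choices[0].message.content);

// Custom headers returned by the Hub:
console.log(raw.headers.get("x-request-id"));  // req_xxxx
console.log(raw.headers.get("x-hub-latency")); // Hub overhead (ms)
console.log(raw.headers.get("x-dlp-latency")); // DLP scan (ms)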

How Routing Works

Managed Mode (Wallet)

The oah/ prefix activates Smart Routing. The Hub indexes pricing across Groq, Together.ai, DeepInfra, Mistral AI, Anthropic, OpenAI, Google Gemini, and xAI — then selects the most cost-efficient available provider for your request on a best-effort basis. Text models are billed per token; image generation is billed per event across three tiers: Performance (~3,750 credits), Standard (~50,000 credits), and Premium (~100,000 credits). All costs include the wholesale rate plus a Governance & Infrastructure fee (25% open-weight / 30% closed models) and are deducted from your prepaid wallet.

BYOK Mode (Bring Your Own Key)

When your project has a BYOK provider key configured, requests are routed directly to that provider using your own credentials. There is no cost-optimized routing — the Hub uses the provider you configured. You are billed directly by the provider. The full DLP / Firewall layer still applies to every request.

What Happens Behind the Scenes

Every time you send a request to the Hub, here's exactly what happens in order. The entire process typically takes under 100ms of Hub overhead:

  1. Authenticate: The Hub verifies your API key. It checks that the key is valid, not expired, and not revoked.
  2. Resolve Provider: Based on your mode (BYOK or Managed), the Hub determines which AI provider and credentials to use. In Managed Mode, the Smart Router selects the most cost-efficient available option.
  3. Scan for Sensitive Data: The AI Firewall scans every message in your request for PII, credentials, and injection patterns. If your request contains images, those are OCR-scanned too.
  4. Enforce Policy: If sensitive data is found, the Hub either replaces it with [REDACTED] or blocks the entire request, depending on your DLP policy settings.
  5. Check Balance: In Managed Mode, the Hub estimates how much this request will cost and verifies your wallet can cover it. If not, you get a 402 error with a clear message.
  6. Forward to AI: The cleaned, safe request is forwarded to the AI provider via the standard API. The Hub never stores your prompts or responses.
  7. Log Metadata: Only metadata is recorded: token counts, cost, latency, and entity types detected. The actual content of your prompts or responses is never logged.
  8. Return Response: The AI provider's response is returned to your application with extra headers showing Hub latency and a correlation ID for debugging.

The Hub is Stateless — Your App Manages Conversation Memory

This is an important concept to understand: the Hub does not store any conversation history. Every request you send is completely independent. The Hub processes it, scans it, routes it, and immediately forgets it.

So how does a chatbot “remember” earlier messages? That's your app's job. If a user is on their 10th message in a conversation, your application sends all 10 previous messages in the messages array every single time. This is exactly how the OpenAI API works, and the Hub follows the same pattern.

Example: A 3-message conversation

// On the user's 3rd message, your app sends ALL prior messages:
messages: [
  { role: "user", content: "What is DLP?" },           // Message 1
  { role: "assistant", content: "DLP stands for..." },  // AI reply 1
  { role: "user", content: "How does it work?" },       // Message 2
  { role: "assistant", content: "It scans every..." },  // AI reply 2
  { role: "user", content: "Show me an example." }      // Message 3 (new)
]
// The Hub scans ALL 5 messages for PII, then forwards to the AI.

What this means for security: The Hub scans the entire messages array on every request — not just the newest message. Even if a user accidentally pastes a credit card number in message #2, the Hub catches it when message #5 is sent because message #2 is still in the array.

What this means for billing: In Managed Mode, you are billed for all tokens in the messages array, including the repeated history. Longer conversations cost more because they send more tokens per request. This is how all LLM APIs work — it's not specific to the Hub.
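A quick way to see the effect: assume every user message is about 50 tokens and every assistant reply about 200. On turn n the app resends n user messages and n − 1 replies, so input tokens per request grow linearly and the total over a conversation grows roughly quadratically. The function below (a hypothetical name) models only this simplified input side; output tokens are billed on top.

```typescript
// Simplified model: each user message is `userTokens` long and each
// assistant reply is `replyTokens` long. Returns the input tokens sent
// on each turn when the full history is resent every time.
function inputTokensPerTurn(turns: number, userTokens: number, replyTokens: number): number[] {
  const perTurn: number[] = [];
  for (let n = 1; n <= turns; n++) {
    // n user messages so far + (n - 1) earlier assistant replies
    perTurn.push(n * userTokens + (n - 1) * replyTokens);
  }
  return perTurn;
}

const perTurn = inputTokensPerTurn(5, 50, 200);
console.log(perTurn); // [ 50, 300, 550, 800, 1050 ]
console.log(perTurn.reduce((a, b) => a + b, 0)); // 2750
```

In this model the fifth request alone sends 1,050 input tokens, but the conversation as a whole has been billed for 2,750.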

What this means for privacy: The Hub never stores your messages. Once the response is returned, the conversation data is gone from our systems. Only metadata (token count, cost, latency, entity types detected) is logged. Your conversation content is never written to disk or retained.

Supported & Unsupported Endpoints

The Hub implements the OpenAI /v1/chat/completions endpoint — the industry standard used by virtually all LLM applications. Some other OpenAI endpoints are not supported by design:

POST /v1/chat/completions: Fully supported. Stateless, secure, cost-optimized. This is the only endpoint you need for text and vision requests.
Threads API (/v1/threads): Not supported. The OpenAI Assistants API stores messages server-side, which makes the API stateful. Supporting it would mean storing users' conversation content on our servers, a significant storage cost and a direct privacy liability. Our stateless design is a deliberate choice: we never store your conversation content, which means there's nothing to breach, subpoena, or accidentally leak.
Streaming (stream: true): Not yet supported (Phase 1). The AI Firewall scans the full request before forwarding, which requires processing the complete prompt first. Streaming with output scanning is planned for Phase 2.

If your application currently uses the Threads API, you can migrate by managing the messages[] array in your own code (or database) and sending the full conversation with each request. This gives you the same “memory” behavior while keeping the security benefits of our stateless proxy.
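A minimal sketch of that migration, assuming an in-memory Map stands in for your database. `ConversationStore`, `append`, and `messagesFor` are hypothetical names; in a real app you would pass `messagesFor(id)` as the `messages` field of each chat completion call and then append the assistant's reply.

```typescript
// Hypothetical replacement for server-side threads: the app owns the history.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

class ConversationStore {
  // In production this would be a database table keyed by conversation ID.
  private threads = new Map<string, ChatMessage[]>();

  append(conversationId: string, message: ChatMessage): void {
    const history = this.threads.get(conversationId) ?? [];
    history.push(message);
    this.threads.set(conversationId, history);
  }

  // Returns a new array so callers cannot add or remove stored messages by accident.
  messagesFor(conversationId: string): ChatMessage[] {
    return [...(this.threads.get(conversationId) ?? [])];
  }
}

// Build the full messages[] array for each stateless Hub request.
const store = new ConversationStore();
store.append("conv_1", { role: "user", content: "What is DLP?" });
store.append("conv_1", { role: "assistant", content: "DLP stands for..." });
store.append("conv_1", { role: "user", content: "How does it work?" });

const messages = store.messagesFor("conv_1"); // send this as `messages`
console.log(messages.length); // 3
```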

Response Headers

Every response includes these custom headers for full transparency:

x-hub-latency: Hub overhead latency in ms (excludes upstream provider time)
x-dlp-latency: DLP / Firewall scan latency (ms)
x-request-id: Unique request ID for tracing and support (req_xxxx)
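If you call the Hub with fetch instead of the SDK, these headers are on the Response object's `headers`. The helper below collects them into one record; it assumes the latency headers carry plain numeric millisecond values, and `readHubHeaders`, `HubMeta`, and `HeaderSource` are hypothetical names. A stub stands in for a real response so the example runs offline.

```typescript
// Minimal structural type: a Fetch API `Headers` object satisfies it.
interface HeaderSource {
  get(name: string): string | null;
}

interface HubMeta {
  requestId: string | null;
  hubLatencyMs: number | null;
  dlpLatencyMs: number | null;
}

// Parses a latency header; returns null if it is absent or not a plain number.
function toMs(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function readHubHeaders(headers: HeaderSource): HubMeta {
  return {
    requestId: headers.get("x-request-id"),
    hubLatencyMs: toMs(headers.get("x-hub-latency")),
    dlpLatencyMs: toMs(headers.get("x-dlp-latency")),
  };
}

// With fetch: const meta = readHubHeaders(res.headers);
// Here a stub stands in for a real response (values are illustrative).
const sample = new Map([
  ["x-request-id", "req_123"],
  ["x-hub-latency", "42"],
  ["x-dlp-latency", "17"],
]);
const meta = readHubHeaders({ get: (name) => sample.get(name) ?? null });
console.log(meta.requestId, meta.hubLatencyMs, meta.dlpLatencyMs); // req_123 42 17
```

Logging `requestId` alongside your own error reports makes it easy to quote the req_xxxx ID when contacting support.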

Authentication

All requests require a Bearer token. Two key types are supported.

How API Keys Work

Every request to OpenSourceAIHub must include an API key in the Authorization header. This is how the Hub knows who you are, which security policies to apply, and how to bill you. API keys are like passwords for your AI requests — keep them secret and never include them in client-side code (like a website's JavaScript) where anyone can see them.

// Every request includes this header:
Authorization: Bearer os_hub_your_key_here

Hub API Key (os_hub_*)

User-Level

Your personal API key, tied to your user account. Created in the API Keys page of the dashboard. Best for individual developers, prototyping, and personal projects. The key is shown only once when created — after that, only an irreversible cryptographic hash is retained, so even our engineering team cannot retrieve your raw key.

Use this when you're a solo developer or want one key for all your projects.

Authorization: Bearer os_hub_*

Project API Key (oah_*)

Project-Level

A key scoped to a specific project. Each project can have its own DLP (security) policies, provider settings, and usage analytics. Multiple team members can use the same project key. Created within a project's Keys page.

Use this when you have multiple applications or teams that need separate security policies and usage tracking.

Authorization: Bearer oah_*

How Billing Is Determined

Your API key determines billing mode automatically. If your account (or project) has BYOK provider keys stored, the Hub uses your provider credentials — no wallet charge. If no BYOK keys are found, the Hub uses Managed Mode and deducts from your wallet (wholesale cost + 25% service fee for open-source models, 30% for closed-source models). You can mix both modes: BYOK for one provider and Managed for another.

Security Best Practices

Never put API keys in client-side code (browser JavaScript, mobile apps). Use a server-side backend.
Use environment variables to store keys — never hardcode them in source code.
Rotate keys periodically. You can create new keys and revoke old ones in the dashboard.
Use project-scoped keys (oah_*) for production apps to isolate analytics and policies.
If a key is compromised, revoke it immediately in the dashboard — it takes effect instantly.
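Putting the first two practices into code, a server-side helper can read the key from an environment variable and fail fast if it is missing or has an unexpected prefix. The variable name `OAH_API_KEY` and the functions `classifyKey` and `loadHubKey` are hypothetical; the `os_hub_` and `oah_` prefixes are the ones documented on this page.

```typescript
// Hypothetical helper: load the Hub key from the environment, never from source code.
type KeyKind = "user" | "project";

// Only the two key types documented for API use are accepted in this sketch.
function classifyKey(key: string): KeyKind | null {
  if (key.startsWith("os_hub_")) return "user"; // Hub API key (user-level)
  if (key.startsWith("oah_")) return "project"; // Project API key
  return null;
}

function loadHubKey(env: Record<string, string | undefined>): { key: string; kind: KeyKind } {
  const key = env["OAH_API_KEY"]?.trim();
  if (!key) throw new Error("OAH_API_KEY is not set");
  const kind = classifyKey(key);
  // Never echo the key itself in errors or logs.
  if (!kind) throw new Error("OAH_API_KEY does not start with os_hub_ or oah_");
  return { key, kind };
}

// On a Node.js server: const { key } = loadHubKey(process.env);
```

Failing at startup rather than on the first request makes a missing or mistyped key obvious, and the error messages never include the key itself.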

Request Headers

Control routing, tagging, and analytics with optional request headers. All headers are optional — only include the ones you need.

Routing Overrides

By default, the Smart Router selects the most cost-efficient available provider for your model (best effort). Use these headers to override that behavior.

Header | Description | Example
x-provider | Pin to a specific provider (bypasses Smart Routing) | groq
x-model | Override the model for this request | llama-3.3-70b-versatile

Analytics Tags

Tag every request with metadata that flows into your project analytics dashboard. Use these headers to slice usage by feature, environment, or tenant — then filter the dashboard to see per-feature costs, latency, and PII violations.

Header | Description | Example
x-feature | Tag requests by feature for analytics dashboards | chatbot-v2
x-env | Tag by environment for environment-level analytics | production
x-tenant | Multi-tenant ID for per-customer tracking | customer_123

Example: Tagged Request

Pass analytics tags alongside your chat completion request. These tags are stored with every request event and appear in your project dashboard, where you can filter charts and the request table by any combination of feature, environment, and tenant.

curl -X POST https://api.opensourceaihub.ai/v1/chat/completions \
  -H "Authorization: Bearer oah_your_project_key" \
  -H "Content-Type: application/json" \
  -H "x-feature: chatbot-v2" \
  -H "x-env: production" \
  -H "x-tenant: customer_123" \
  -d '{
    "model": "oah/llama-3-70b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
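The same tagged request can be assembled in TypeScript. This sketch only builds the URL and fetch options, so the headers can be inspected without a network call; `buildTaggedRequest`, `Tags`, and `HubRequest` are hypothetical names, while the header names come from the tables above. With the OpenAI SDK, the same headers can instead be passed per call through its request-options argument (`{ headers: { ... } }`).

```typescript
// Hypothetical helper: assemble a tagged chat completion request for fetch().
interface Tags {
  feature?: string;  // x-feature
  env?: string;      // x-env
  tenant?: string;   // x-tenant
  provider?: string; // x-provider (bypasses Smart Routing)
}

interface HubRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTaggedRequest(apiKey: string, body: object, tags: Tags): HubRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  // All tag headers are optional: only send the ones that were provided.
  if (tags.feature) headers["x-feature"] = tags.feature;
  if (tags.env) headers["x-env"] = tags.env;
  if (tags.tenant) headers["x-tenant"] = tags.tenant;
  if (tags.provider) headers["x-provider"] = tags.provider;
  return {
    url: "https://api.opensourceaihub.ai/v1/chat/completions",
    method: "POST",
    headers,
    body: JSON.stringify(body),
  };
}

const req = buildTaggedRequest(
  "oah_your_project_key",
  { model: "oah/llama-3-70b", messages: [{ role: "user", content: "Hello" }] },
  { feature: "chatbot-v2", env: "production", tenant: "customer_123" },
);
// To send it: await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
console.log(req.headers["x-tenant"]); // customer_123
```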

Dashboard Filtering

Once you start sending tagged requests, your project dashboard automatically shows filter dropdowns for each tag dimension. Filter by feature to see per-feature cost breakdowns, or by tenant to track per-customer usage — all charts, totals, and the request table update in real time.

Model Catalog

Virtual model names (oah/*) are provider-agnostic. In Managed Mode, the Smart Router optimizes for cost efficiency automatically. In BYOK Mode, use explicit provider model IDs.

Understanding AI Models & Providers

An AI model (like Llama 3, Mistral, or Claude) is the actual “brain” that processes your questions and generates responses. A provider (like OpenAI, Google Gemini, xAI, Groq, or Together.ai) is a company that runs these models on their servers and lets you use them via an API.

The Hub supports two categories of models:

  • Sovereign (open-source) models — like Llama 4, DeepSeek, Qwen, Mistral, and Mixtral — are open-weight models hosted by multiple providers at different prices. For example, Llama 4 Maverick is available on Groq, Together.ai, and DeepInfra. The Smart Router indexes prices across all of them and optimizes for the best available rate.
  • External (closed-source) models — like GPT-4.1 (OpenAI), Claude (Anthropic), Gemini (Google), and Grok (xAI) — are proprietary models only available from their creator. These are routed directly to the single provider that offers them.

Virtual model names (starting with oah/) let you request a model without choosing a provider. For sovereign models, the Hub's Smart Router will automatically select the most cost-efficient provider available. For external models, the Hub routes directly to the model's creator. If you need a specific provider, you can use the provider's native model name instead.

Virtual Model | Input (USD / 1M tokens) | Output (USD / 1M tokens)
oah/llama-4-maverick | $0.20 | $0.60
oah/llama-4-scout | $0.11 | $0.34
oah/llama-3-70b | $0.35 | $0.40
oah/deepseek-r1 | $0.50 | $2.15
oah/deepseek-v3 | $0.21 | $0.79
oah/qwen3-235b | $0.20 | $0.60
oah/mixtral-8x7b | $0.45 | $0.70
oah/mistral-large | $0.50 | $1.50
oah/mistral-small | $0.10 | $0.30
oah/codestral | $0.30 | $0.90
oah/claude-sonnet-4.6 | $3.00 | $15.00
oah/claude-opus-4.6 | $5.00 | $25.00
oah/claude-haiku-4.5 | $1.00 | $5.00
oah/claude-sonnet-4 | $3.00 | $15.00
oah/gpt-4.1 | $2.00 | $8.00
oah/gpt-4.1-mini | $0.40 | $1.60
oah/gpt-4.1-nano | $0.10 | $0.40
oah/o4-mini | $1.10 | $4.40
oah/gemini-2.5-pro | $1.25 | $10.00
oah/gemini-2.5-flash | $0.30 | $2.50
oah/grok-3 | $3.00 | $15.00
oah/grok-3-mini | $0.30 | $0.50

Not limited to the list above

The Hub supports any model your provider offers — not just the curated virtual models above. The table above shows models with optimized Smart Router support and verified pricing. In BYOK mode, if your provider releases a new model tomorrow, you can use it immediately by passing the provider's native model ID (e.g., gemini-2.5-flash) — any valid model string is forwarded directly. In Managed Mode, explicit model names are validated against the registry to prevent wasted credits on typos. Use oah/ virtual names for automatic Smart Router support.

Explicit Provider Models

You can bypass the Smart Router by using a provider-specific model ID directly. In BYOK mode, any model string is passed through to your provider. In Managed Mode, the model must exist in the Hub registry — unrecognized names are rejected with a helpful suggestion. We recommend using oah/ virtual names for the best experience.

// Recommended: virtual model names (works in all modes)
{ "model": "oah/gemini-2.5-flash" }   // External model: routed directly to Google Gemini
{ "model": "oah/llama-3-70b" }        // Cost-optimized across Groq, Together, DeepInfra

// Explicit: bypass Smart Router (BYOK — any model; Managed — registry-validated)
{ "model": "gemini-2.5-flash" }        // Google Gemini directly
{ "model": "gpt-4.1" }                 // OpenAI directly
{ "model": "llama-3.3-70b-versatile" } // Groq directly (short alias)

// Some providers use org/model-name format for open-source models:
{ "model": "meta-llama/llama-4-maverick-17b-128e-instruct" }  // Groq / Together / DeepInfra
{ "model": "deepseek-ai/DeepSeek-R1" }                        // Together / DeepInfra
{ "model": "Qwen/Qwen3-235B-A22B" }                           // Together / DeepInfra

Provider model IDs vary by provider. Closed-source models (OpenAI, Google, Anthropic) use short names like gpt-4.1. Open-source models hosted on inference providers (Groq, Together, DeepInfra) often use org/model-name format. Use the exact ID your provider expects, or use oah/ virtual names to avoid worrying about provider-specific IDs.
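This page says Managed Mode rejects unrecognized explicit model names with a helpful suggestion, but not how the suggestion is chosen. A common approach is the nearest registry entry by edit distance; the sketch below uses Levenshtein distance over a tiny, hypothetical registry excerpt, and `validateModel`, its result shape, and the distance cutoff of 3 are all assumptions.

```typescript
// Illustrative registry check with a "did you mean" suggestion (Levenshtein distance).
function levenshtein(a: string, b: string): number {
  // dp[j] holds the edit distance between a[0..i) and b[0..j) for the current row i.
  const dp = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = dp[0]; // distance(a[0..i-1), b[0..0))
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = dp[j]; // distance(a[0..i-1), b[0..j))
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[j] = Math.min(dp[j] + 1, dp[j - 1] + 1, prevDiag + cost);
      prevDiag = above;
    }
  }
  return dp[b.length];
}

type Validation = { ok: true } | { ok: false; suggestion: string | null };

function validateModel(model: string, registry: string[], maxDistance = 3): Validation {
  if (registry.includes(model)) return { ok: true };
  let best: string | null = null;
  let bestDist = Infinity;
  for (const candidate of registry) {
    const d = levenshtein(model, candidate);
    if (d < bestDist) {
      bestDist = d;
      best = candidate;
    }
  }
  // Only suggest when the closest entry is plausibly a typo.
  return { ok: false, suggestion: bestDist <= maxDistance ? best : null };
}

const registry = ["gpt-4.1", "gpt-4.1-mini", "gemini-2.5-flash", "llama-3.3-70b-versatile"];
console.log(validateModel("gemini-2.5-flsh", registry)); // { ok: false, suggestion: 'gemini-2.5-flash' }
```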

Which Model Should I Choose?

For general use and chatbots: oah/llama-4-maverick — Meta's flagship MoE model, excellent quality at low cost across 3 providers.
For fast, cheap prototyping: oah/llama-4-scout or oah/gpt-4.1-nano — low cost, high speed. Good for simple tasks.
For reasoning & math: oah/deepseek-r1 or oah/o4-mini — chain-of-thought reasoning models optimized for complex problem solving.
For code generation: oah/codestral or oah/claude-sonnet-4.6 — Mistral's specialized coding model or Anthropic's frontier coder.
For maximum quality (enterprise): oah/claude-opus-4.6 or oah/gpt-4.1 — the most capable models from Anthropic and OpenAI.
For image/vision tasks: oah/gemini-2.5-flash, oah/gpt-4.1-mini, or oah/claude-sonnet-4.6. These models accept image inputs (base64 or URL). Not all models support vision; if you send images to a text-only model, the Hub returns a model_does_not_support_vision error with suggestions.
Want a fast, cheap closed-source model? oah/gpt-4.1-mini or oah/gemini-2.5-flash — excellent price-to-performance for production workloads.

AI Firewall (DLP)

Every request is scanned for PII, credentials, and injection patterns before reaching the LLM. The firewall is fail-closed: if the scan service is unavailable, the request is blocked.

Why Does This Matter?

When your application sends data to an AI model, that data leaves your systems and goes to a third-party provider's servers. If a user accidentally types their credit card number, social security number, or your company's API keys into a prompt, that sensitive data could end up in the AI provider's logs or training data, or be exposed in a breach.

The AI Firewall prevents this by scanning every request before it reaches the AI model and either redacting (replacing sensitive data with [REDACTED]) or blocking (rejecting the entire request).

Real-World Example: Before vs. After

What the user sends (unsafe):

"messages": [
  {
    "role": "user",
    "content": "Hey, can you help me format this for my resume?\nName: Jane Smith\nEmail: jane.smith@acme.com\nSSN: 123-45-6789\nMy OpenAI key is sk-abc123xyz456"
  }
]

What the AI model receives (safe, after Hub redaction):

"messages": [
  {
    "role": "user",
    "content": "Hey, can you help me format this for my resume?\nName: [REDACTED]\nEmail: [REDACTED]\nSSN: [REDACTED]\nMy OpenAI key is [REDACTED]"
  }
]

The AI model still understands the request and can help with resume formatting, but it never sees the actual personal data. The user gets a helpful response, and because the personal data never reaches the third-party provider, the company is better placed to meet its GDPR/CCPA obligations.

Two Policy Actions

REDACT

Sensitive data is replaced with [REDACTED] and the request is forwarded to the AI model. The AI can still process the rest of the prompt. This is the default for most entity types.

BLOCK

The entire request is rejected with a 400 error. Nothing is forwarded to the AI model. This is the default for prompt injection attacks and can be configured for any entity type.
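The difference between the two actions can be sketched with a few regular expressions. This is a deliberately tiny illustration, assuming simple patterns for email, US SSN, and sk- style keys; the Hub's real detectors (for example PERSON, which needs named-entity recognition) are far more thorough, and `applyPolicy` is a hypothetical name.

```typescript
// Toy DLP pass: a few regex detectors, then REDACT or BLOCK.
type Action = "REDACT" | "BLOCK";

const PATTERNS: Record<string, RegExp> = {
  EMAIL_ADDRESS: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  US_SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  API_KEY: /\bsk-[A-Za-z0-9_-]{8,}/g,
};

type ScanResult =
  | { blocked: true; entities: string[] }
  | { blocked: false; entities: string[]; text: string };

function applyPolicy(text: string, action: Action): ScanResult {
  const entities: string[] = [];
  let redacted = text;
  for (const [name, pattern] of Object.entries(PATTERNS)) {
    // match() and replace() with a /g regex reset lastIndex, so reusing these patterns is safe.
    if (text.match(pattern)) {
      entities.push(name);
      redacted = redacted.replace(pattern, "[REDACTED]");
    }
  }
  // BLOCK: reject the whole request if anything was found (the Hub answers with a 400).
  if (action === "BLOCK" && entities.length > 0) return { blocked: true, entities };
  return { blocked: false, entities, text: redacted };
}

const prompt = "Email: jane.smith@acme.com\nSSN: 123-45-6789\nMy OpenAI key is sk-abc123xyz456";
const result = applyPolicy(prompt, "REDACT");
if (!result.blocked) console.log(result.text);
// Email: [REDACTED]
// SSN: [REDACTED]
// My OpenAI key is [REDACTED]
```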

28 Built-in Entities

Hub Governance Engine + Extended Detection (columns: entity | what it detects | default action)

AI Firewall

PROMPT_INJECTION | Jailbreak / injection heuristics | BLOCK

Developer Secrets

API_KEY | OpenAI and Anthropic (sk-*), Groq (gsk_*), and Google AI keys | REDACT
AWS_ACCESS_KEY | AWS AKIA* access key IDs | REDACT
AWS_SECRET_KEY | 40-char AWS secret access keys | REDACT
PRIVATE_KEY | RSA / EC / DSA / SSH PEM headers | REDACT
GITHUB_TOKEN | PATs (ghp_), OAuth (gho_), fine-grained tokens | REDACT
SLACK_WEBHOOK | Slack incoming webhook URLs | REDACT

Financial & Crypto

CREDIT_CARD | Visa, MC, Amex, Discover, etc. | REDACT
IBAN_CODE | International Bank Account Numbers | REDACT
US_BANK_NUMBER | US bank account numbers | REDACT
CRYPTO_ADDRESS | Bitcoin (legacy + bech32) and Ethereum | REDACT
US_ITIN | US Individual Taxpayer IDs | REDACT

Personal Identifiers

EMAIL_ADDRESS | Email addresses | REDACT
PHONE_NUMBER | International phone numbers | REDACT
US_SSN | US Social Security Numbers | REDACT
US_PASSPORT | US passport numbers | REDACT
PERSON | Named person entities | REDACT
STREET_ADDRESS | US/UK/global street addresses, PO Boxes, apartment/suite numbers | REDACT
DATE_TIME | Dates and times | REDACT
NRP | Nationality, religion, political group | REDACT

UK / EU Identifiers

UK_NINO | UK National Insurance Numbers | REDACT
UK_NHS_NUMBER | UK National Health Service Numbers | REDACT

Network & Location

IP_ADDRESS | IPv4 and IPv6 addresses | REDACT
MAC_ADDRESS | Network MAC addresses | REDACT
LOCATION | Named locations | REDACT
URL | URLs | REDACT

Medical & Licenses

MEDICAL_LICENSE | Medical license numbers | REDACT
US_DRIVER_LICENSE | US driver's license numbers | REDACT

Custom IP Guard (Enterprise Regex)

Define custom regex patterns in your project's DLP policy to protect internal project names, proprietary terminology, internal IP ranges, or any sensitive string unique to your organization. Custom patterns run alongside the 28 built-in entities on every request.

DLP Policy → Custom Regex
// In your project's DLP Policy dashboard, add:
{
  "custom_regexes": [
    {
      "name": "INTERNAL_PROJECT_NAME",
      "pattern": "(?i)\\b(project[- ]?phoenix|codename[- ]?aurora)\\b",
      "action": "REDACT"
    },
    {
      "name": "INTERNAL_IP_RANGE",
      "pattern": "\\b10\\.42\\.\\d{1,3}\\.\\d{1,3}\\b",
      "action": "BLOCK"
    }
  ]
}

// These patterns are applied in real-time alongside
// the 28 built-in entity recognizers on every request.
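To try such patterns locally before saving them, note that JavaScript's RegExp does not accept a standalone leading `(?i)` (a flag syntax found in Python, PCRE, and Java). The hypothetical helper below maps a leading `(?i)` to the `i` flag and reports which rules fire; it assumes that any BLOCK match rejects the request even when other rules only redact, which this page does not state. `CustomRule`, `compileRule`, and `evaluate` are illustrative names.

```typescript
// Hypothetical local tester for custom_regexes rules (same shape as the JSON above).
interface CustomRule {
  name: string;
  pattern: string;
  action: "REDACT" | "BLOCK";
}

// Maps a leading (?i) to the "i" flag; other inline-flag forms are not handled here.
function compileRule(rule: CustomRule): RegExp {
  const caseInsensitive = rule.pattern.startsWith("(?i)");
  const source = caseInsensitive ? rule.pattern.slice(4) : rule.pattern;
  return new RegExp(source, caseInsensitive ? "gi" : "g");
}

interface EvalResult {
  action: "ALLOW" | "REDACT" | "BLOCK";
  matched: string[];
  redacted: string; // what would be forwarded; ignored when action is BLOCK
}

function evaluate(text: string, ruleSet: CustomRule[]): EvalResult {
  const matched: string[] = [];
  let redacted = text;
  let block = false;
  for (const rule of ruleSet) {
    const re = compileRule(rule);
    if (text.match(re)) {
      matched.push(rule.name);
      if (rule.action === "BLOCK") block = true;
      redacted = redacted.replace(re, "[REDACTED]");
    }
  }
  // Assumption: BLOCK wins over REDACT when several rules match.
  const action = block ? "BLOCK" : matched.length > 0 ? "REDACT" : "ALLOW";
  return { action, matched, redacted };
}

const rules: CustomRule[] = [
  { name: "INTERNAL_PROJECT_NAME", pattern: "(?i)\\b(project[- ]?phoenix|codename[- ]?aurora)\\b", action: "REDACT" },
  { name: "INTERNAL_IP_RANGE", pattern: "\\b10\\.42\\.\\d{1,3}\\.\\d{1,3}\\b", action: "BLOCK" },
];

const r1 = evaluate("Status of Project Phoenix?", rules);
console.log(r1.action, r1.redacted); // REDACT Status of [REDACTED]?
```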

DLP Policies

How the Hub applies DLP protection to every request — default global policy, project-level customization, and version history.

Global Default Policy — Maximum Protection

Every request that reaches the Hub is protected by the AI Firewall. If no custom project-level policy is configured, the Hub automatically applies the Global Default Policy:

Action

REDACT — PII replaced with [REDACTED]

Scope

All 28 built-in entity types (including Prompt Injection)

This means that even if you haven't configured anything, your data is protected from the very first request. No entity is excluded — every PII category (personal identifiers, financial data, developer secrets, network info, medical data) is scanned and redacted automatically.

When Does the Default Policy Apply?

Hub Key

Hub API keys (os_hub_*) always use the global default policy. Hub keys are account-level and have no project context, so the default is the only DLP policy that applies: all 28 entities are scanned and PII is redacted.

Project Key

Project API keys (oah_*) use the project's custom DLP policy if one has been configured. If the project has no custom policy, the global default policy applies — same behavior as Hub keys.

Playground

Playground keys (os_pg_*) always use the global default policy. The Playground is for testing — it uses the same maximum protection as Hub keys.

Project-Level Policy Customization

To override the global default, create a custom DLP policy in your project's DLP Policy page. Custom policies let you:

  • Choose the action: REDACT (replace PII with placeholders, forward the request) or BLOCK (reject the entire request).
  • Select specific entities: Pick exactly which of the 28 entity types to monitor — for example, a fintech app might only need credit cards, SSNs, and bank numbers.
  • Add custom regex rules: Protect proprietary codenames, internal project names, or any pattern unique to your organization.
  • Use templates: Quick presets like AI Firewall, GDPR Bundle, Fintech/PCI, DevOps/SRE, or Maximum Protection.

Per-Project Sensitivity Tiers

Each project can choose a protection sensitivity tier that controls how aggressively PII is detected. This lets you balance security vs. prompt utility for different use cases.

Strict (0.25)

Maximum security. Catches even partial names like "B George" and low-confidence matches. Best for Legal, Finance, and Healthcare.

Balanced (0.40)

Recommended default. Optimized for professional AI interactions. Full names, emails, SSNs, and secrets are caught reliably.

Relaxed (0.60)

Highest utility. Only high-confidence PII triggers redaction. Common words like "July" pass through. Best for developer tooling.

Sensitivity is configured per-project in the DLP Policy page. The global admin threshold still controls the default for Hub keys and Playground keys.
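One plausible reading of the tier numbers is a minimum detector confidence: a detection triggers only if its score is at least the tier's threshold, so a lower number catches more. That interpretation, the scores, and the names below (`Tier`, `THRESHOLDS`, `detectionsToAct`) are assumptions for illustration; only the three threshold values come from this page.

```typescript
// Assumed model: each detection carries a confidence score in [0, 1].
type Tier = "strict" | "balanced" | "relaxed";

const THRESHOLDS: Record<Tier, number> = {
  strict: 0.25,
  balanced: 0.4,
  relaxed: 0.6,
};

interface Detection {
  entity: string;
  text: string;
  score: number;
}

// Keeps only the detections confident enough to act on for this tier.
function detectionsToAct(detections: Detection[], tier: Tier): Detection[] {
  const min = THRESHOLDS[tier];
  return detections.filter((d) => d.score >= min);
}

// Hypothetical detector output for "B George mentioned July 4 and jane@acme.com"
const found: Detection[] = [
  { entity: "PERSON", text: "B George", score: 0.3 },
  { entity: "DATE_TIME", text: "July 4", score: 0.5 },
  { entity: "EMAIL_ADDRESS", text: "jane@acme.com", score: 1.0 },
];

for (const tier of ["strict", "balanced", "relaxed"] as const) {
  console.log(tier, detectionsToAct(found, tier).map((d) => d.entity));
}
// Entities acted on:
// strict: PERSON, DATE_TIME, EMAIL_ADDRESS
// balanced: DATE_TIME, EMAIL_ADDRESS
// relaxed: EMAIL_ADDRESS
```

This mirrors the tier descriptions: Strict catches the partial name, while Relaxed lets the date through and still redacts the email.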

Policy Versioning

Every time you save a policy, it creates a new immutable version (v1, v2, v3...). Old versions are never modified — they're deactivated. This gives you:

  • Full audit trail: See who created each version, when, and what it blocked.
  • Restore capability: Revert to any previous version with one click — this creates a new version as a copy, preserving the history.
  • Version comparison: The project dashboard shows block rate per policy version so you can measure the impact of policy changes.
  • Timeline markers: Cost and usage charts display markers at each policy change point.
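The versioning rules above (append-only versions, the newest one active, restore as a new copy) can be modeled with a small append-only list. The `PolicyHistory` class and the `Policy` fields are hypothetical; they show the shape of the idea, not the Hub's storage.

```typescript
// Hypothetical model of immutable, append-only DLP policy versions.
interface Policy {
  action: "REDACT" | "BLOCK";
  entities: readonly string[];
}

interface PolicyVersion {
  readonly version: number;       // v1, v2, v3...
  readonly policy: Policy;
  readonly createdBy: string;
  readonly createdAt: Date;
  readonly restoredFrom?: number; // set when a version is a restored copy
}

class PolicyHistory {
  private readonly versions: PolicyVersion[] = [];

  // Saving always appends a new version; readonly types plus a shallow freeze keep entries unedited.
  save(policy: Policy, createdBy: string, restoredFrom?: number): PolicyVersion {
    const entry: PolicyVersion = Object.freeze({
      version: this.versions.length + 1,
      policy: { action: policy.action, entities: [...policy.entities] }, // defensive copy
      createdBy,
      createdAt: new Date(),
      restoredFrom,
    });
    this.versions.push(entry);
    return entry;
  }

  // The newest version is active; older ones remain as deactivated history.
  active(): PolicyVersion | undefined {
    return this.versions[this.versions.length - 1];
  }

  // Restoring copies an old version forward as a brand-new version.
  restore(version: number, createdBy: string): PolicyVersion {
    const old = this.versions.find((v) => v.version === version);
    if (!old) throw new Error(`No policy version ${version}`);
    return this.save(old.policy, createdBy, version);
  }

  all(): readonly PolicyVersion[] {
    return this.versions;
  }
}

const history = new PolicyHistory();
history.save({ action: "REDACT", entities: ["EMAIL_ADDRESS"] }, "alice");
history.save({ action: "BLOCK", entities: ["EMAIL_ADDRESS", "US_SSN"] }, "bob");
history.restore(1, "alice"); // v3 is a copy of v1
console.log(history.active()?.version, history.active()?.policy.action); // 3 REDACT
```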

Policy Resolution Flow

# Every request follows this resolution path:
1. Request arrives → authenticate key
2. Is it a project key?
   ├─ Yes → Does the project have a custom DLP policy?
   │        ├─ Yes → Use project policy (e.g. v3: BLOCK, 14 entities + 2 regex)
   │        └─ No  → Use global default (REDACT, all 28 entities)
   └─ No  → Is it a Hub key or Playground key?
            └─ Use global default (REDACT, all 28 entities)
3. Scan messages with resolved policy → redact or block
4. Forward to LLM (if not blocked)
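The same decision tree can be written as a small function. The key prefixes come from this page; the `ResolvedPolicy` shape, passing the project's custom policy in as an argument, and throwing on unknown prefixes are assumptions for illustration.

```typescript
// Sketch of the documented resolution path; names and shapes are hypothetical.
type Action = "REDACT" | "BLOCK";

interface ResolvedPolicy {
  source: "project" | "global-default";
  action: Action;
  entities: "ALL" | string[];
}

// Global default: REDACT, all 28 entities.
const GLOBAL_DEFAULT: ResolvedPolicy = { source: "global-default", action: "REDACT", entities: "ALL" };

function resolvePolicy(apiKey: string, customPolicy?: { action: Action; entities: string[] }): ResolvedPolicy {
  if (apiKey.startsWith("oah_")) {
    // Project key: use the project's custom policy if one is configured.
    if (customPolicy) {
      return { source: "project", action: customPolicy.action, entities: customPolicy.entities };
    }
    return GLOBAL_DEFAULT;
  }
  if (apiKey.startsWith("os_hub_") || apiKey.startsWith("os_pg_")) {
    // Hub and Playground keys have no project context.
    return GLOBAL_DEFAULT;
  }
  // Assumption: in practice authentication would already have rejected this key.
  throw new Error("Unrecognized API key prefix");
}

console.log(resolvePolicy("os_pg_demo").source); // global-default
console.log(resolvePolicy("oah_demo", { action: "BLOCK", entities: ["CREDIT_CARD"] }).action); // BLOCK
```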

Vision Security (Multi-modal DLP)

Base64-encoded images in chat payloads are OCR-scanned for PII and secrets before the request is forwarded.

How It Works

  1. Detect: The Hub identifies image_url entries with data:image/* Base64 payloads in the messages array.
  2. Extract: The Vision Security Layer extracts all readable text from the image via OCR within a 5-second timeout.
  3. Scan: Extracted text is scanned by the Hub Governance Engine (28 entities + custom regex).
  4. Enforce: If violations are found, the entire request is blocked with a 400 Security Violation. The image is never forwarded.
  5. Purge: The image is processed in RAM only and immediately discarded. It is never written to disk or stored.

vision-request.ts
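// `client` is the OpenAI client configured in the Quickstart.
const response = await client.chat.completions.create({
  model: "oah/gemini-2.5-flash", // must be a vision-capable model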
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What is in this config file?" },
        {
          type: "image_url",
          image_url: {
            url: "data:image/jpeg;base64,/9j/4AAQ..."
          },
        },
      ],
    },
  ],
  max_tokens: 256,
});

// The Hub runs OCR on the image BEFORE forwarding:
// 1. Extracts text via the Vision Security Layer (OCR)
// 2. Scans extracted text for PII / secrets
// 3. If violations found → 400 Security Violation
// 4. Image is processed ephemerally — zero persistence, zero storage
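To see which parts of a request the Detect step would pick up, step 1's rule (image_url entries whose URL is a data:image/* Base64 payload) can be expressed as a small filter. This is a client-side sketch based on that reading of the rule; the Hub's actual matcher is not documented here, and the type and function names are hypothetical.

```typescript
// Minimal slice of the OpenAI chat message format used in the example above.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | ContentPart[];
}

interface InlineImage {
  messageIndex: number;
  mimeType: string; // e.g. "image/jpeg"
  base64: string;   // payload after the comma
}

// "data:image/<subtype>;base64,<payload>"; the s flag lets the payload span lines.
const DATA_IMAGE_URL = /^data:(image\/[a-z0-9.+-]+);base64,(.*)$/is;

// Returns every Base64-embedded image in the messages array, i.e. the
// payloads that step 1 (Detect) would pass on to OCR.
function findInlineImages(messages: ChatMessage[]): InlineImage[] {
  const found: InlineImage[] = [];
  messages.forEach((msg, messageIndex) => {
    if (typeof msg.content === "string") return; // plain text, no images
    for (const part of msg.content) {
      if (part.type !== "image_url") continue;
      const match = DATA_IMAGE_URL.exec(part.image_url.url);
      if (match) {
        found.push({ messageIndex, mimeType: match[1].toLowerCase(), base64: match[2] });
      }
    }
  });
  return found;
}
```

Remote (https://) image URLs are not matched by this filter, because the documented Detect rule only covers Base64 data URLs.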

Size Limit

Individual image payloads are limited to 5 MB. Requests exceeding this return 413 Payload Too Large.
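An oversized image costs a full round trip before it comes back as a 413, so a client can check the size first. The sketch assumes the 5 MB limit applies to the decoded image bytes and that 5 MB means 5 × 1024 × 1024 bytes. Neither point is stated on this page, so treat `MAX_IMAGE_BYTES` as an assumption and leave some headroom. The function names are hypothetical.

```typescript
// Assumption: the limit is on decoded bytes, with 5 MB = 5 * 1024 * 1024.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Decoded length of a Base64 string without allocating a buffer:
// every 4 characters encode 3 bytes, minus one byte per "=" of padding.
function base64DecodedBytes(b64: string): number {
  const clean = b64.replace(/\s+/g, "");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// True if the data URL's payload fits under the limit.
function imageWithinLimit(dataUrl: string, limit: number = MAX_IMAGE_BYTES): boolean {
  const comma = dataUrl.indexOf(",");
  const payload = comma >= 0 ? dataUrl.slice(comma + 1) : dataUrl;
  return base64DecodedBytes(payload) <= limit;
}
```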

Block, Don't Redact

For images, the action is always BLOCK. We cannot selectively redact pixels — if PII is found, the entire request is rejected.

Vision-Capable Model Routing

Not all models accept image inputs. The Hub automatically handles this:

  • Virtual models (oah/*) in Managed Mode: The Smart Router automatically picks a vision-capable provider for the model. If no provider supports vision for that model, a clear error is returned with suggestions.
  • Explicit models (BYOK): If the model you specify doesn't support vision, the Hub returns a model_does_not_support_vision error with suggested alternatives instead of a vague provider rejection.

Vision-capable models are marked with the Vision badge in the Model Catalog above. Examples: oah/gemini-2.5-flash, oah/gpt-4.1-mini, oah/claude-sonnet-4.6.

Billing & Wallet

Two modes: Bring Your Own Key (free pass-through) or Managed Credits (prepaid wallet with 25% markup for open-source models, 30% for closed-source).

How Billing Works — Plain Language

Think of OpenSourceAIHub as a restaurant that serves dishes from many kitchens (AI providers). You have two options for paying:

A. BYOK (“I brought my own ingredients”): You already have an account with a kitchen (like Groq). You give the Hub your kitchen access card (API key). The Hub sends your order to your kitchen using your card. The kitchen bills you directly. The Hub charges you nothing; you just get the security service for free.
B. Managed (“Let the restaurant handle it”): You don't have a kitchen account? No problem. You put money in your Hub wallet (like buying a gift card). The Hub finds the best-priced kitchen available for you, pays them, and charges you the wholesale price plus a service fee (25% for open-source models, 30% for closed-source models like OpenAI, Gemini, and xAI) from your wallet. No signups with providers needed.

BYOK Mode

  • You provide your own provider API keys
  • No wallet deduction — zero Hub cost
  • Full DLP / Firewall protection included
  • Keys encrypted with AES-256-GCM

Managed Mode

  • No keys needed — top up your wallet
  • 25% markup (open-source) / 30% markup (closed-source)
  • Smart Router optimizes for best rate
  • Pre-flight balance check + max_tokens cap
  • Image gen billed per-event: 3,750 credits (Performance) to 100K credits (Premium)

Total Token Budgeting

In Managed Mode, the Hub performs a pre-flight check before every request:

  1. Estimates the input token cost for the entire messages array (full conversation history).
  2. If the estimated input cost alone exceeds your wallet balance → 402 Insufficient Balance.
  3. Caps max_tokens (output) based on the remaining balance after input cost, preventing surprise overspend.
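The three pre-flight steps can be sketched as follows. The per-token prices are hypothetical inputs. The token estimate uses the rough "1 token ≈ 4 characters" rule from the Glossary rather than a real tokenizer. How the Hub treats a request whose remaining balance covers zero output tokens is not documented here. `preflight` and `estimateTokens` are illustrative names.

```typescript
// Hypothetical per-token prices for one model, in USD (markup included).
interface ModelPricing {
  inputUsdPerToken: number;
  outputUsdPerToken: number;
}

type Preflight =
  | { ok: false; status: 402; error: "insufficient_balance" }
  | { ok: true; maxTokens: number };

// Rough token estimate for the whole messages array using the
// "1 token ≈ 4 characters" rule of thumb. Real tokenizers differ.
function estimateTokens(texts: string[]): number {
  const chars = texts.reduce((sum, t) => sum + t.length, 0);
  return Math.ceil(chars / 4);
}

// Step 1: estimate the input cost. Step 2: reject with 402 if the input
// alone exceeds the balance. Step 3: cap max_tokens to what the remaining
// balance can pay for at the output price.
function preflight(
  inputTokens: number,
  requestedMaxTokens: number,
  balanceUsd: number,
  pricing: ModelPricing,
): Preflight {
  const inputCost = inputTokens * pricing.inputUsdPerToken;
  if (inputCost > balanceUsd) {
    return { ok: false, status: 402, error: "insufficient_balance" };
  }
  const affordable = Math.floor((balanceUsd - inputCost) / pricing.outputUsdPerToken);
  return { ok: true, maxTokens: Math.min(requestedMaxTokens, affordable) };
}
```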

Media Generation Tiers

Image generation models are billed at a flat fee per event instead of per-token. Prices include the wholesale cost plus the Governance & Infrastructure fee (25% open-weight / 30% closed models).

  • Performance: 3,750 credits (~$0.004/image)
  • Standard: 50,000 credits (~$0.05/image)
  • Premium: 100,000 credits (~$0.10/image)

Model | Tier | Wholesale | Credits
FLUX.1-schnell | Performance | $0.003 | 3,750
DALL-E 2 | Standard | $0.020 | 25,000
FLUX.1-dev | Standard | $0.025 | 31,250
SD XL Base 1.0 | Standard | $0.040 | 50,000
FLUX.1-pro | Standard | $0.050 | 62,500
Stable Diffusion 3 | Standard | $0.065 | 81,250
DALL-E 3 | Premium | $0.080 | 100,000
DALL-E 3 HD | Premium | $0.120 | 150,000

Credits are deducted atomically based on our cost-optimization engine. Estimates include the standard Hub service fee.

Unknown media models default to 125,000 credits ($0.125) to protect margins. Media requests are logged as IMAGE_GENERATION events.
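The credit figures follow from the top-up rate (5M Credits = $5, so 1,000,000 credits per dollar) and the service fee. Each row in the table works out to wholesale × 1.25 × 1,000,000: FLUX.1-schnell is $0.003 × 1.25 = $0.00375 = 3,750 credits, and the 125,000-credit default is $0.125. A minimal sketch of that conversion follows. The function names are hypothetical, and the fee rate is passed in rather than looked up.

```typescript
// 5,000,000 credits = $5 (the minimum top-up), so 1,000,000 credits = $1.
const CREDITS_PER_USD = 1_000_000;

// Round to the nearest whole credit to absorb floating-point noise.
function usdToCredits(usd: number): number {
  return Math.round(usd * CREDITS_PER_USD);
}

function creditsToUsd(credits: number): number {
  return credits / CREDITS_PER_USD;
}

// Flat per-image price: wholesale cost plus a fractional service fee.
function imageCredits(wholesaleUsd: number, feeRate: number): number {
  return usdToCredits(wholesaleUsd * (1 + feeRate));
}
```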

Wallet Top-Up

Credits are purchased via Stripe one-time payments. Minimum top-up is 5M Credits ($5). Credits have no cash value, are non-transferable, and expire after 12 months of account inactivity. Wallet deductions are atomic and transaction-safe — you can never be double-charged or spend below zero, even under high concurrency.

Frequently Asked Questions

How much does a typical AI request cost?

A short question-and-answer exchange (about 100-200 words total) with Llama 3 70B costs roughly 100-500 credits in Managed Mode. That means 1,000,000 Hub Credits can cover thousands of requests. Image generation uses tiered pricing: Performance models (e.g., FLUX.1-schnell) cost ~3,750 credits, Standard models (e.g., SDXL) ~50,000, and Premium models (e.g., DALL-E 3) ~100,000 credits per image.

Can I use BYOK for one provider and Managed for another?

Yes! This is called hybrid mode. If you have a Groq API key stored in BYOK but not a Together.ai key, requests to Groq will use your key (no Hub charge) and requests to Together.ai will use Managed Mode (deducted from your wallet). The Hub handles this automatically.

What happens if my wallet runs out mid-request?

The Hub checks your balance BEFORE sending the request. If your wallet can't cover the estimated cost, the request is rejected with a 402 error and a clear message telling you to top up. You'll never be charged more than your balance.

Is the DLP / security layer free?

Yes. The AI Firewall runs on every request regardless of billing mode. Whether you use BYOK or Managed Mode, your data is always scanned and protected at no extra charge.

What does the 25%/30% markup cover?

The markup covers the AI Firewall (DLP), Smart Routing, infrastructure, API key management, wallet billing, usage analytics, and dashboard features. Open-source models carry a 25% markup; closed-source models (OpenAI, Gemini, xAI) carry a 30% markup to account for higher wholesale costs. It's the service fee for not having to build all of this yourself.

Success Response Reference

Complete field-by-field reference for every successful /v1/chat/completions response.

Understanding Latency Fields

┌──────────────── Total Request (elapsed) ─────────────────┐
│                                                          │
│  ┌── Hub Overhead ──┐  ┌─── Provider (upstream) ───┐     │
│  │                  │  │                           │     │
│  │  DLP scan        │  │  Network + LLM inference  │     │
│  │  routing         │  │                           │     │
│  │  validation      │  │                           │     │
│  │                  │  │                           │     │
│  └── latency_ms ────┘  └── upstream_latency_ms ────┘     │
│     (hub_metadata)            (hub_metadata)             │
│                                                          │
│  dlp_latency_ms ⊆ latency_ms                             │
└──────────────────────────────────────────────────────────┘

latency_ms          = total_elapsed − upstream_latency_ms
dlp_latency_ms      ⊆ latency_ms  (DLP scan portion only)
upstream_latency_ms = time spent waiting for the provider
200

Full Success Response Example

The response follows the standard OpenAI format with an additional hub_metadata object. Any client expecting OpenAI-style responses will work unchanged — hub_metadata can simply be ignored if not needed.

{
  // ── Standard OpenAI-compatible fields ──────────────
  "id": "chatcmpl-DKxHaZHGxRtRfLlIUglZWIcs2Q7FY",
  "object": "chat.completion",
  "created": 1773886814,
  "model": "o3-mini",
  "system_fingerprint": "fp_5b51a51e10",
  "service_tier": "default",

  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Abraham Lincoln was the 16th president ...",
        "tool_calls": null,
        "function_call": null,
        "provider_specific_fields": { "refusal": null },
        "annotations": []
      },
      "provider_specific_fields": {}
    }
  ],

  "usage": {
    "prompt_tokens": 502,
    "completion_tokens": 2909,
    "total_tokens": 3411,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 2368,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  },

  // ── Hub-enriched fields ────────────────────────────
  "x_request_id": "req_8138a426fe794d0a",

  "hub_metadata": {
    // Routing & mode
    "requested_model": "oah/o3-mini",
    "provider_selected": "openai",
    "routing_mode": "smart_route",
    "mode": "MANAGED",
    "request_type": "CHAT_COMPLETION",
    "intent": "GENERAL_CHAT",
    "is_managed": true,

    // Latency breakdown (milliseconds)
    "latency_ms": 320,
    "dlp_latency_ms": 68,
    "upstream_latency_ms": 4977,

    // Throughput
    "tokens_per_sec": 150.3,

    // Cost (USD)
    "wholesale_cost_usd": 0.013352,
    "cost_usd": 0.0173576,
    "new_balance_usd": 0.95314815,

    // DLP / PII results
    "pii_detected": true,
    "dlp_action": "redact",
    "violations_count": 37,
    "entity_types_detected": [
      "DATE_TIME", "LOCATION", "NRP", "PERSON", "URL"
    ],
    "redacted_prompt": "[REDACTED] ([REDACTED]) was the 16th president of [REDACTED], serving from ..."
  }
}
400

Block Mode Response Example

When the DLP policy action is set to BLOCK, the request is rejected immediately and never forwarded to the AI provider. Each violation includes the entity type, character positions, and a confidence score.

{
  "error": "pii_policy_violation",
  "message": "Request blocked: PII detected in prompt",
  "request_id": "req_a08a3d1067654a16",
  "violations": [
    {
      "entity_type": "EMAIL_ADDRESS",
      "start": 45,
      "end": 64,
      "score": 1.0
    },
    {
      "entity_type": "US_SSN",
      "start": 110,
      "end": 121,
      "score": 1.0
    },
    {
      "entity_type": "CREDIT_CARD",
      "start": 142,
      "end": 161,
      "score": 1.0
    },
    {
      "entity_type": "IBAN_CODE",
      "start": 172,
      "end": 199,
      "score": 1.0
    },
    {
      "entity_type": "STREET_ADDRESS",
      "start": 263,
      "end": 283,
      "score": 1.0
    },
    {
      "entity_type": "PERSON",
      "start": 33,
      "end": 43,
      "score": 0.85
    },
    {
      "entity_type": "DATE_TIME",
      "start": 88,
      "end": 98,
      "score": 0.95
    },
    {
      "entity_type": "IP_ADDRESS",
      "start": 306,
      "end": 315,
      "score": 0.95
    },
    {
      "entity_type": "LOCATION",
      "start": 285,
      "end": 293,
      "score": 0.85
    }
  ]
}

Block Response Fields

Field | Type | Description
error | string | Always "pii_policy_violation" for DLP blocks.
message | string | Human-readable reason: "Request blocked: PII detected in prompt".
request_id | string | Hub correlation ID for tracing and support (e.g. "req_a08a3d10...").
violations | array | Array of detected entities. Each contains entity_type, start/end character offsets, and a confidence score.
violations[].entity_type | string | The PII entity type detected (e.g. "EMAIL_ADDRESS", "PERSON", "CREDIT_CARD").
violations[].start | integer | Start character offset in the original prompt.
violations[].end | integer | End character offset in the original prompt.
violations[].score | float | Detection confidence: 1.0 for pattern-based entities (email, SSN), 0.85–0.95 for NLP-based entities (person names, locations, dates).
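When a request is blocked, the violations array carries enough information to show a user what was flagged without echoing the values back. The sketch below masks each span with its entity type. It assumes `start`/`end` are code-point offsets with an exclusive `end`; the page does not say either way, so verify against real responses. `maskViolations` and the optional `minScore` cut-off are hypothetical.

```typescript
interface Violation {
  entity_type: string;
  start: number;
  end: number;
  score: number;
}

// Replaces each flagged span with "[ENTITY_TYPE]". Assumes code-point
// offsets and an exclusive `end`. Spans below `minScore` are left as-is.
function maskViolations(prompt: string, violations: Violation[], minScore = 0): string {
  const chars = Array.from(prompt); // split by code point, not UTF-16 unit
  const sorted = [...violations].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const v of sorted) {
    if (v.score < minScore) continue;
    if (v.end <= cursor) continue; // already covered by an earlier span
    const start = Math.max(v.start, cursor);
    out += chars.slice(cursor, start).join("") + `[${v.entity_type}]`;
    cursor = v.end;
  }
  return out + chars.slice(cursor).join("");
}
```

Sorting first means the function does not depend on the order the Hub returns violations in (the Block Mode example above lists them out of position order).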

Top-Level Response Fields (Success / Redact Mode)

Standard OpenAI-compatible fields plus Hub-enriched additions.

Field | Type | Source | Description
id | string | Provider | Unique completion ID assigned by the upstream provider (e.g. "chatcmpl-DKxH...").
object | string | Provider | Always "chat.completion".
created | integer | Provider | Unix timestamp (seconds) when the completion was generated.
model | string | Hub | Actual provider model ID that ran the request (e.g. "o3-mini", "llama-3.3-70b-versatile"). This is the real model, not the requested virtual name.
system_fingerprint | string | Provider | Provider system version fingerprint. May be null for some providers.
service_tier | string | Provider | Service tier used (e.g. "default"). Provider-specific; may be absent for non-OpenAI providers.
choices | array | Provider | Array of completion choices. Each contains index, finish_reason, message (with role, content, tool_calls, function_call, annotations), and optional provider_specific_fields.
usage | object | Provider | Token usage: prompt_tokens, completion_tokens, total_tokens. May include completion_tokens_details (reasoning_tokens, etc.) and prompt_tokens_details (cached_tokens, etc.).
x_request_id | string | Hub | Hub correlation ID for tracing and support (e.g. "req_8138a42..."). Same value as the x-request-id response header.
hub_metadata | object | Hub | Hub-enriched metadata block with routing, latency, cost, and DLP details. See the hub_metadata field reference below.

hub_metadata Field Reference

Field | Type | Present | Description
requested_model | string | Always | Exact model string the caller sent (e.g. "oah/o3-mini" or "gemini-2.5-flash").
provider_selected | string | Always | Provider that served the request (openai, groq, together, deepinfra, mistral, anthropic, gemini, xai).
routing_mode | string | Always | "smart_route" if an oah/* virtual model was used; "explicit" if a provider model was specified directly.
mode | string | Always | Billing mode: "MANAGED" (Hub keys, billed to wallet) or "BYOK" (your own provider keys).
request_type | string | Always | "CHAT_COMPLETION" or "IMAGE_GENERATION".
intent | string | Always | Classified prompt intent (GENERAL_CHAT, CODE_GENERATION, SUMMARIZATION, etc.).
is_managed | boolean | Managed only | Present and true only when mode is MANAGED.
latency_ms | integer | Always | Hub overhead in milliseconds: total elapsed minus upstream time. Includes DLP scan, routing, validation, and response assembly.
dlp_latency_ms | integer | Always | Time spent on DLP/PII scanning (subset of latency_ms). 0 if no scan was performed.
upstream_latency_ms | integer | Always | Time the LLM provider took to respond: network round-trip plus model inference.
tokens_per_sec | number | Always | Throughput: total_tokens / upstream_latency_seconds.
wholesale_cost_usd | number | Always | Raw provider cost in USD (what the provider charges).
cost_usd | number | Always | Billed cost in USD. Managed mode: wholesale + markup. BYOK: $0 (you pay the provider directly).
new_balance_usd | number | Managed only | Remaining wallet balance after deduction. Only present in Managed mode.
pii_detected | boolean | Always | true if the DLP scanner found any PII or sensitive entities in the request.
dlp_action | string | If PII found | "redact" (entities replaced with [REDACTED]) or "detect" (flagged but not modified). Absent when no PII is detected.
violations_count | integer | If PII found | Number of individual PII entities detected (e.g. 37 means 37 separate matches).
entity_types_detected | string[] | If PII found | Sorted list of unique entity types found (e.g. ["DATE_TIME", "LOCATION", "PERSON"]).
redacted_prompt | string | If redacted | The prompt after PII entities were replaced with [REDACTED]. Only present when dlp_action is "redact".
media_events | integer | Images only | Number of images processed. Only present for IMAGE_GENERATION requests.

Response Headers

Key metrics are also available as HTTP headers for clients that need them without parsing the JSON body.

Header | Example | Description
x-request-id | req_e0e48046... | Unique correlation ID for this request. Use this when contacting support.
x-hub-latency | 490 | Hub overhead in ms (same as hub_metadata.latency_ms).
x-dlp-latency | 189 | DLP scan time in ms (same as hub_metadata.dlp_latency_ms).
x-hub-intent | GENERAL_CHAT | Classified prompt intent.
x-hub-tokens-per-sec | 114.33 | Throughput metric.

Total elapsed time is not returned directly in the response. Calculate it as latency_ms + upstream_latency_ms. For the full success response example (latency_ms 320, upstream_latency_ms 4,977), that is 320 + 4,977 = 5,297 ms.
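Because only latency_ms and upstream_latency_ms are returned, the total and a few derived figures have to be computed on the client. The sketch below assumes the parsed response body is available as a plain object. The OpenAI Node SDK's response types do not declare hub_metadata, so it is read defensively here. `readHubLatency` and `summarizeLatency` are hypothetical helpers.

```typescript
// The latency fields from hub_metadata used in this sketch.
interface HubLatency {
  latency_ms: number;
  dlp_latency_ms: number;
  upstream_latency_ms: number;
}

// Reads the three latency fields if present and numeric; otherwise undefined.
function readHubLatency(body: unknown): HubLatency | undefined {
  const meta = (body as { hub_metadata?: Record<string, unknown> } | null)?.hub_metadata;
  if (!meta) return undefined;
  const { latency_ms, dlp_latency_ms, upstream_latency_ms } = meta;
  if (
    typeof latency_ms !== "number" ||
    typeof dlp_latency_ms !== "number" ||
    typeof upstream_latency_ms !== "number"
  ) {
    return undefined;
  }
  return { latency_ms, dlp_latency_ms, upstream_latency_ms };
}

// Total = Hub overhead + upstream time; DLP time is a subset of the overhead.
function summarizeLatency(m: HubLatency) {
  const totalMs = m.latency_ms + m.upstream_latency_ms;
  return {
    totalMs,
    hubOverheadMs: m.latency_ms,
    overheadExcludingDlpMs: m.latency_ms - m.dlp_latency_ms,
    hubSharePct: totalMs === 0 ? 0 : (100 * m.latency_ms) / totalMs,
  };
}
```

With the full success response example's values, totalMs is 5,297 and the non-DLP share of the Hub overhead is 320 − 68 = 252 ms.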

Error Reference

Structured JSON error responses with correlation IDs for every failure mode.

400

Security Violation — DLP Block

Returned when the AI Firewall detects PII, secrets, or a prompt injection pattern and the DLP policy action is BLOCK. The request is never forwarded to the LLM.

{
  "error": {
    "message": "Security policy violation: request blocked.",
    "type": "security_violation",
    "code": 400,
    "violations": [
      {
        "entity": "PROMPT_INJECTION",
        "action": "BLOCK",
        "start": 0,
        "end": 47
      }
    ],
    "correlation_id": "req_a1b2c3d4"
  }
}
400

Invalid Request

The request is malformed — empty messages array, empty last message content, streaming requested (not yet supported), or an unsafe image URL.

{
  "detail": {
    "error": "invalid_request",
    "message": "Invalid request: messages array must not be empty.",
    "correlation_id": "req_b2c3d4e5"
  }
}
400

Unknown Virtual Model

The oah/* virtual model name is not recognized in the Hub's model registry. This is caught early — before DLP scanning or provider calls. Includes similar model suggestions when available.

{
  "detail": {
    "error": "unknown_virtual_model",
    "message": "Virtual model 'oah/invalid_model' is not recognized. Did you mean: oah/llama-3-70b, oah/llama-4-maverick?",
    "suggested_models": ["oah/llama-3-70b", "oah/llama-4-maverick"],
    "docs_url": "/docs#models",
    "correlation_id": "req_b2c3d4e5"
  }
}
400

Unknown Model (Managed Mode)

In Managed Mode, an explicit (non-virtual) model name was not found in the registry. The response includes a suggested virtual model name when available.

{
  "detail": {
    "error": "unknown_model",
    "message": "Model 'gpt-5-turbo' is not in the registry. Did you mean 'oah/gpt-4.1'?",
    "suggested_model": "oah/gpt-4.1",
    "correlation_id": "req_c3d4e5f6"
  }
}
400

Upstream Bad Request

The LLM provider rejected the request (e.g. invalid parameters, unsupported features for the model). Passed through from the provider.

{
  "detail": {
    "error": "upstream_bad_request",
    "message": "The AI provider rejected this request. Did you mean 'oah/llama-3-70b'? Please check your model name and parameters.",
    "suggested_model": "oah/llama-3-70b",
    "correlation_id": "req_d4e5f6g7"
  }
}
401

Unauthorized

The API key is missing from the Authorization header.

{
  "detail": "Missing API key"
}
402

Insufficient Balance

Managed Mode only. The wallet balance is too low to cover the estimated cost, or was exhausted mid-request. Also returned when no provider credentials could be resolved.

{
  "detail": {
    "error": "insufficient_balance",
    "message": "Your Hub Wallet balance is $0.001. Minimum required: $0.01. Please top up.",
    "correlation_id": "req_e5f6g7h8"
  }
}
402

Budget Exceeded

Managed Mode only. The request would exceed the project's configured daily or monthly budget cap.

{
  "detail": {
    "error": "budget_exceeded",
    "message": "This request would exceed your project daily budget of $5.00. Current spend: $4.98.",
    "correlation_id": "req_f6g7h8i9"
  }
}
403

Forbidden

The API key is valid but not authorized for this action — revoked key, playground key used externally, or a premium model accessed without a Pro subscription.

{
  "detail": "Invalid or revoked API key"
}
400

Model Does Not Support Vision

The request contains image content (image_url parts) but the selected model does not support vision/multimodal inputs. The response includes suggested vision-capable models.

{
  "detail": {
    "error": "model_does_not_support_vision",
    "message": "Model 'oah/llama-3.3-70b-versatile' does not support image inputs. Try a vision-capable model: oah/gemini-2.5-flash, oah/gpt-4o-mini, oah/claude-3.5-sonnet",
    "suggested_models": ["oah/gemini-2.5-flash", "oah/gpt-4o-mini", "oah/claude-3.5-sonnet"],
    "correlation_id": "req_v1s10n"
  }
}
413

Payload Too Large

An image payload exceeds the 5 MB per-image limit.

{
  "detail": {
    "error": "image_too_large",
    "message": "Image exceeds 5MB limit (received 7.2MB).",
    "correlation_id": "req_g7h8i9j0"
  }
}
429

Rate Limited

Too many requests in the current window. Respect the Retry-After header.

{
  "detail": {
    "error": "rate_limit_error",
    "message": "Rate limit exceeded. Try again in 60 seconds.",
    "correlation_id": "req_h8i9j0k1"
  }
}
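A client can wait for the Retry-After interval before resending a 429, and the same applies to 503, which this reference describes as safe to retry. The sketch assumes the header carries either a number of seconds or an HTTP date, as HTTP allows. It also assumes a string request body so the body can be re-sent. `postWithRetry`, `retryAfterMs`, and the exponential fallback are illustrative choices, not part of the Hub API.

```typescript
// Statuses treated as retryable here: 429 (rate limited) and 503
// (described in this reference as safe to retry).
const RETRYABLE_STATUSES = new Set([429, 503]);

// Converts a Retry-After header (delta-seconds or HTTP date) to milliseconds.
// Falls back to `fallbackMs` when the header is missing or unparseable.
function retryAfterMs(header: string | null, fallbackMs: number, now: number = Date.now()): number {
  if (header === null || header.trim() === "") return fallbackMs;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

// Sends a request, retrying retryable statuses up to `maxAttempts` times.
// Assumes `init.body` is a string (or absent) so it can be re-sent safely.
async function postWithRetry(url: string, init: RequestInit, maxAttempts = 3): Promise<Response> {
  for (let attempt = 1; ; attempt++) {
    const res = await fetch(url, init);
    if (!RETRYABLE_STATUSES.has(res.status) || attempt >= maxAttempts) return res;
    const fallbackMs = 1000 * 2 ** (attempt - 1); // 1 s, 2 s, 4 s, ... without a header
    const waitMs = retryAfterMs(res.headers.get("retry-after"), fallbackMs);
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

For a chat request, `init` would carry `method: "POST"`, the `Authorization: Bearer` header, and a `JSON.stringify`-ed body aimed at https://api.opensourceaihub.ai/v1/chat/completions.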
500

Security Processing Error

The PII/DLP scanning service is temporarily unavailable. The Hub fails closed — requests are blocked, never forwarded unscanned.

{
  "detail": {
    "error": "security_processing_error",
    "message": "PII safety scan is temporarily unavailable. Requests are blocked for your protection. Please try again shortly.",
    "correlation_id": "req_i9j0k1l2"
  }
}
502

Upstream Provider Error

The LLM provider returned an unexpected error, rejected the credentials (Managed Mode), or could not be reached.

{
  "detail": "The AI provider returned an error. Please try again or try a different model."
}
503

Service Temporarily Unavailable

An internal dependency (authentication, key resolution, or policy lookup) is temporarily unavailable. Safe to retry.

{
  "detail": "Unable to verify your API key right now. Please try again shortly."
}
504

Upstream Timeout

The LLM provider did not respond within the timeout window. The request was not completed.

{
  "detail": "The AI provider took too long to respond. Please try again."
}
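The error bodies on this page come in four shapes: a plain `detail` string, a `detail` object with `error`/`message`/`correlation_id`, an `error` object (Security Violation), and the flat Block Mode body with `error`, `message`, and `request_id`. The client-side sketch below normalizes them so callers can branch on one `code` and always log the correlation ID. `HubError` and `parseHubError` are hypothetical names, and the shapes are taken only from the examples shown here.

```typescript
// One normalized view of a Hub error, whatever its JSON shape.
interface HubError {
  status: number;
  code: string; // e.g. "insufficient_balance"; "unknown" when the body has none
  message: string;
  correlationId?: string;
  suggestedModels: string[];
}

type Json = Record<string, unknown>;

const isObj = (v: unknown): v is Json =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const str = (v: unknown): string | undefined =>
  typeof v === "string" ? v : undefined;

// Collects "suggested_models" (array) or "suggested_model" (single string).
function suggestionsOf(d: Json): string[] {
  const many = d.suggested_models;
  if (Array.isArray(many)) return many.filter((m): m is string => typeof m === "string");
  const one = str(d.suggested_model);
  return one ? [one] : [];
}

function parseHubError(status: number, body: unknown): HubError {
  const base: HubError = { status, code: "unknown", message: "", suggestedModels: [] };
  if (!isObj(body)) return base;
  const detail = body.detail;
  const error = body.error;

  // { "detail": "Invalid or revoked API key" }
  if (typeof detail === "string") return { ...base, message: detail };

  // { "detail": { "error": "...", "message": "...", "correlation_id": "..." } }
  if (isObj(detail)) {
    return {
      ...base,
      code: str(detail.error) ?? base.code,
      message: str(detail.message) ?? "",
      correlationId: str(detail.correlation_id),
      suggestedModels: suggestionsOf(detail),
    };
  }

  // { "error": { "type": "security_violation", "message": "...", "correlation_id": "..." } }
  if (isObj(error)) {
    return {
      ...base,
      code: str(error.type) ?? base.code,
      message: str(error.message) ?? "",
      correlationId: str(error.correlation_id),
    };
  }

  // { "error": "pii_policy_violation", "message": "...", "request_id": "..." }
  if (typeof error === "string") {
    return { ...base, code: error, message: str(body.message) ?? "", correlationId: str(body.request_id) };
  }
  return base;
}
```

The `suggestedModels` list makes it straightforward to offer (or automatically try) an alternative after `unknown_virtual_model`, `unknown_model`, or `model_does_not_support_vision`.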

Postman Collection

Import our pre-built collection to test every endpoint — including security violation and balance check scenarios.

OpenSourceAIHub API Collection

Includes requests for Smart-Routed completions, explicit provider calls, streaming, vision/multi-modal, PII detection tests, prompt injection tests, and all error scenarios. Pre-configured with collection variables.

Download Postman Collection

Postman v2.1 format · Import via File → Import in Postman

Frequently Asked Questions

Common questions about architecture, compatibility, and privacy.

Does the Hub remember my conversation history?

No. The Hub is completely stateless — it processes each request independently and immediately forgets it. Conversation memory is your application's responsibility. If a user is on their 10th message, your app sends all 10 previous messages in the messages[] array every time. The Hub scans the entire array for PII, routes it to the AI, and discards everything. Only metadata (token count, cost, latency) is logged — never your actual messages. This is exactly how the OpenAI API works.

Why do longer conversations cost more?

Because your app resends the full conversation history with every request. A 10-message conversation means the 10th request contains all 10 messages — so you're billed for all those tokens each time. This is standard for all LLM APIs (not Hub-specific). To manage costs, consider trimming older messages from the array or summarizing earlier conversation turns.
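Trimming can be as simple as a sliding window over the history your app already keeps. This is a minimal sketch that assumes text-only messages. `ChatMsg` and `trimHistory` are hypothetical names, and summarizing earlier turns (the other option mentioned above) is not shown.

```typescript
// Minimal text-only message type for this sketch.
interface ChatMsg {
  role: "system" | "user" | "assistant";
  content: string;
}

// Sliding window: keep all system messages (moved to the front) plus the
// last `keepLast` user/assistant messages. Older turns are dropped, so each
// request resends, and is billed for, fewer tokens.
function trimHistory(messages: ChatMsg[], keepLast: number): ChatMsg[] {
  const system = messages.filter((m) => m.role === "system");
  const dialogue = messages.filter((m) => m.role !== "system");
  return [...system, ...dialogue.slice(Math.max(0, dialogue.length - keepLast))];
}
```

Dropped turns are no longer visible to the model, so choose the window size with that trade-off in mind.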

Do you support the OpenAI Threads / Assistants API?

No, and this is by design. The Threads API (/v1/threads) stores conversation messages on the server, making it stateful. This would require us to store billions of user messages — a massive storage cost and a direct privacy liability. Our stateless architecture means we never hold your conversation data, so there's nothing to breach, subpoena, or accidentally leak. If you need conversation memory, manage the messages[] array in your own code or database and send the full history with each request.

Does the Hub store my data?

No. We only log metadata (like "1 email address was blocked") — we never store the actual content of your messages or the AI's responses. Your data passes through our security layer and is forwarded to the AI provider. Nothing is saved on our servers. Images sent for vision scanning are processed in RAM only and immediately discarded.

Is this compatible with my existing code?

If your app uses the OpenAI SDK (the most popular AI library), you only need to change two lines: the API key and the base URL. Everything else — your models, your prompts, your response handling — stays exactly the same. The Hub implements the standard /v1/chat/completions endpoint.

What is the default DLP policy?

The Hub applies a "Maximum Protection" default policy to every request that doesn't have a custom project-level policy. This default scans for all 28 built-in entity types (PII, credentials, financial data, etc.) plus prompt injection patterns, and redacts any matches with [REDACTED] before forwarding to the AI model. Hub keys always use this default. Project keys use their project's custom policy if one exists, otherwise they also get this default. You are protected from your very first API call — no configuration required.

How do I customize my DLP policy?

Go to your project's DLP Policy page (Projects → Your Project → Policy). There you can change the action (REDACT vs. BLOCK), select specific entity types, add custom regex patterns, or apply quick templates (GDPR, Fintech, DevOps, etc.). Every save creates a new immutable version — you can view version history and restore any previous version with one click.

Does streaming (stream: true) work?

Not yet. In Phase 1 the AI Firewall scans the full request before forwarding, which requires processing the complete prompt. Sending stream: true will return a 400 error. Streaming with output scanning is planned for Phase 2.

Glossary

Quick reference for every key term used in this documentation.

API (Application Programming Interface): A way for software applications to communicate with each other. When your app sends a request to the Hub, it's using an API.
API Key: A secret string used to identify and authenticate your application when making requests. Like a password for machine-to-machine communication.
Base URL: The root address of an API. For OpenSourceAIHub, this is https://api.opensourceaihub.ai/v1. This is the address your application sends requests to.
Bearer Token: The format used to send your API key in HTTP requests. It looks like: Authorization: Bearer os_hub_your_key_here
BYOK (Bring Your Own Key): A mode where you provide your own AI provider API keys. The Hub uses your credentials to access the AI model. You pay the provider directly; the Hub's security layer is free.
DLP (Data Loss Prevention): Technology that detects and prevents sensitive data (like credit card numbers or personal information) from being leaked to external systems.
Endpoint: A specific URL path that accepts requests. The Hub's main endpoint is POST /v1/chat/completions.
Fail-Closed: A security design where, if the protection system fails or is unavailable, all requests are BLOCKED rather than allowed through unprotected. The Hub uses this approach.
LLM (Large Language Model): The AI "brain" that processes text and generates responses. Examples: Llama 3, Claude, Mistral, GPT-4o, Gemini, Grok. These are hosted by providers.
Managed Mode: A billing mode where the Hub uses its own provider keys on your behalf. Costs are deducted from your prepaid wallet at wholesale price + 25% markup (open-source models) or 30% markup (closed-source models).
Entity Recognition: AI-powered detection that identifies specific types of information in text, such as names, locations, dates, and organizations, even when they don't follow a fixed pattern. The Hub uses this as part of its DLP scanning.
OCR (Optical Character Recognition): Technology that extracts readable text from images. The Hub uses OCR to scan images for sensitive data before they reach the AI model.
PII (Personally Identifiable Information): Any data that can identify a specific person, such as names, email addresses, phone numbers, and social security numbers.
Prompt: The text input you send to an AI model. In the OpenAI format, prompts are structured as 'messages' with roles (system, user, assistant).
Prompt Injection: A security attack where a user crafts a special input designed to trick the AI model into ignoring its instructions or revealing its system prompt.
Provider: A company that hosts and runs AI models on its servers. The Hub supports 8 providers: Groq (fast inference), Together.ai (model variety), DeepInfra (low cost), Mistral AI (Mistral models), Anthropic (Claude), OpenAI (GPT-4.1), Google Gemini, and xAI (Grok). Any model your provider supports works; you're not limited to our curated catalog.
Redaction: Replacing sensitive data with a placeholder like [REDACTED] so the AI model never sees the original value.
SDK (Software Development Kit): A pre-built library that makes it easier to use an API. The OpenAI SDK (npm: openai, pip: openai) works with the Hub.
Sovereign Model: An open-source/open-weight AI model hosted by multiple providers. Examples: Llama 3, Mistral, Mixtral.
External Model: A closed-source AI model only available from its creator. Examples: GPT-4o (OpenAI), Gemini (Google), Grok (xAI).
Smart Routing: The Hub's automatic system that periodically indexes pricing across all supported providers and selects the most cost-efficient route for your request. Routing is best-effort and balances cost, latency, and availability.
Token: The unit AI models use to measure text. Roughly 1 token ≈ 3/4 of a word or 4 characters. Costs are calculated per token.
Virtual Model: A model name starting with oah/ that is provider-agnostic. The Smart Router decides which provider to use based on cost and availability.
Wallet: Your prepaid balance on the Hub, measured in US dollars. Top up via Stripe. Costs are deducted automatically in Managed Mode.
Webhook: An automated HTTP request sent from one system to another when an event happens. The Hub uses Stripe webhooks to process payments.