Phi Gateway
Many models. One API.
The AI gateway for protected health data. For developers, one OpenAI-compatible base URL — no SDK rewrite, no provider keys in the application. For IT and compliance, a control point at the wire where every request is identified, policy-checked, credential-isolated, and recorded before it leaves your boundary.
Drop in and route
Phi Gateway speaks the OpenAI API. The application changes one line. The gateway handles identity, policy, provider keys, and audit.
from openai import OpenAI
# Same SDK. New base URL. No provider keys in the app.
client = OpenAI(
    base_url="https://phi.your-hospital.internal/v1",
    api_key="agent:abc123...",  # identity token issued by Phi Gateway
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",  # provider/model: any route that policy allows
    messages=[
        {"role": "user", "content": "Summarize the findings in this report."}
    ],
)
Address any model as provider/model — openai/gpt-4o, anthropic/claude-sonnet-4, google/gemini-2.0-pro, local/llama-3-70b. Phi Gateway resolves the route, swaps in the real provider credential, forwards the request, and records the transaction. The application never holds a provider key.
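How the gateway resolves a route internally is not described here. As a minimal sketch of the addressing convention only, the helper below splits a provider/model string on the first slash. The names Route and parse_route are hypothetical and are not part of Phi Gateway or the OpenAI SDK.

```python
from typing import NamedTuple


class Route(NamedTuple):
    provider: str
    model: str


def parse_route(model_string: str) -> Route:
    """Split 'provider/model' on the first slash only, so model names
    that themselves contain a slash stay intact."""
    provider, sep, model = model_string.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_string!r}")
    return Route(provider, model)


print(parse_route("anthropic/claude-sonnet-4"))
```

A bare name such as gpt-4o raises ValueError in this sketch, because without a provider prefix there is nothing to route on.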
How a request flows
Policy, routing, credential isolation, and audit are not features layered on top. They are the four phases of every request, enforced at the infrastructure boundary on the way out and on the way back.
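The four phases can be pictured as one function that every request passes through. The sketch below is illustrative only: the Gateway class, its fields, and the send callback are assumptions made for this example, not the product's internals.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical upstream call: (provider, model, provider_key) -> reply text.
Send = Callable[[str, str, str], str]


@dataclass
class Gateway:
    allowed: dict[str, set[str]]   # identity token -> routes policy allows (assumed shape)
    provider_keys: dict[str, str]  # provider -> real credential, held only here
    ledger: list[dict] = field(default_factory=list)

    def handle(self, caller: str, model: str, send: Send) -> Optional[str]:
        started = time.monotonic()
        reply = None
        # Phase 1, policy: deny unless the route is listed for this caller.
        decision = "allow" if model in self.allowed.get(caller, set()) else "deny"
        if decision == "allow":
            # Phase 2, routing: "provider/model" names the upstream.
            provider, _, upstream_model = model.partition("/")
            # Phase 3, credential isolation: the real key is attached here,
            # so the caller never sees it. Assumes policy only lists
            # providers that have a configured key.
            reply = send(provider, upstream_model, self.provider_keys[provider])
        # Phase 4, audit: one record per request, allowed or denied.
        self.ledger.append({
            "caller": caller,
            "model": model,
            "decision": decision,
            "latency_s": time.monotonic() - started,
        })
        return reply


gateway = Gateway(
    allowed={"agent:abc123": {"local/llama-3-70b"}},
    provider_keys={"local": "example-key"},
)
reply = gateway.handle(
    "agent:abc123", "local/llama-3-70b", lambda p, m, k: f"{p} served {m}"
)
print(reply, gateway.ledger[-1]["decision"])
```

Denied requests still reach phase 4 in this sketch, so the ledger records refusals as well as successful calls.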
Cloud or on-prem
Same API surface, same SDK, two deployment shapes.
Phi Cloud
Flux-hosted. One endpoint, BAA with Flux, customer or Flux-pooled provider keys.
Built for evaluation, early deployments, and mid-size practices that want the control plane without operating it.
Phi On-Prem
Customer-deployed. Docker image, OVA, or Windows installer. Customer keys, customer perimeter, customer-chosen routes including local LLMs.
Built for hospitals, IDNs, regulated enterprises, and air-gapped sites where the trust boundary cannot move outside the network.
Phi Vault
The local store inside your perimeter — provider keys, identity tokens, the audit ledger, and optional de-identification mappings for workloads where reversible tokenization is sound. Ships with Phi On-Prem.
Not a magical PHI scrubber. Reversible tokenization works for structured extraction, classification, and controlled-schema report transforms where the model preserves tokens. Free-form chat that paraphrases or invents identifiers is out of scope. The vault is credible because its scope is explicit.
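To make the scope concrete, here is a minimal sketch of reversible tokenization, assuming the identifiers are already known from structured fields. TokenVault, the [[PHI_n]] token format, and the in-memory mapping are all hypothetical and do not describe Phi Vault's actual storage or token scheme.

```python
import re


class TokenVault:
    """In-memory stand-in for a de-identification mapping store."""

    TOKEN = re.compile(r"\[\[PHI_\d+\]\]")

    def __init__(self) -> None:
        self._token_for: dict[str, str] = {}
        self._value_for: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # The same value always maps to the same token.
        if value not in self._token_for:
            token = f"[[PHI_{len(self._token_for) + 1:04d}]]"
            self._token_for[value] = token
            self._value_for[token] = value
        return self._token_for[value]

    def redact(self, text: str, identifiers: list[str]) -> str:
        # Longest first, so "Jane Doe" is matched before "Jane".
        ordered = sorted((i for i in identifiers if i), key=len, reverse=True)
        if not ordered:
            return text
        pattern = re.compile("|".join(re.escape(i) for i in ordered))
        return pattern.sub(lambda m: self.tokenize(m.group(0)), text)

    def restore(self, text: str) -> str:
        # Tokens the vault did not issue are left untouched.
        return self.TOKEN.sub(
            lambda m: self._value_for.get(m.group(0), m.group(0)), text
        )


vault = TokenVault()
outbound = vault.redact("Jane Doe, MRN 12345: left wrist fracture.", ["Jane Doe", "12345"])
print(outbound)
print(vault.restore(outbound))
```

The round trip works only if the model returns the tokens verbatim. A reply that paraphrases a token or invents a new identifier cannot be restored, which is why free-form chat stays out of scope.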
For developers
Drop-in compatibility
The OpenAI SDK works unchanged. Existing clients, agent frameworks, and internal tools point at one new base URL. The only other change is the model string, which moves from gpt-4o to openai/gpt-4o so the gateway can route.
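For codebases with many hard-coded model names, the rename can be centralized in one helper. This is a sketch under the assumption that each legacy name maps to a single provider. The DEFAULT_PROVIDER table and to_gateway_model are hypothetical.

```python
# Hypothetical lookup table: legacy bare model name -> provider prefix.
DEFAULT_PROVIDER = {
    "gpt-4o": "openai",
    "claude-sonnet-4": "anthropic",
}


def to_gateway_model(model: str) -> str:
    """Return the provider/model form, leaving already-prefixed names alone."""
    if "/" in model:
        return model
    try:
        return f"{DEFAULT_PROVIDER[model]}/{model}"
    except KeyError:
        raise ValueError(f"no provider known for model {model!r}") from None


print(to_gateway_model("gpt-4o"))
```

Failing on unknown names, instead of guessing a provider, keeps a typo from silently routing to the wrong upstream.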
No provider keys in the application
Phi Gateway issues identity tokens scoped to your agent or service. Real provider keys live at the gateway. A compromised client has no provider key to leak, and its identity token is useful only inside the boundary.
Audit and cost for free
Every request is logged with caller, model, tokens, latency, and cost. Operator dashboards surface spend per agent and per provider. The same JSON feeds external SIEMs, billing systems, and the Model Governance Layer (MGL) fleet view.
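As an illustration of what a downstream consumer might do with that JSON, the sketch below totals spend per agent and per provider. It assumes one JSON object per line and the field names caller, model, and cost_usd. The actual log schema is not given here, so treat all three as placeholders.

```python
import json
from collections import defaultdict
from decimal import Decimal
from typing import Callable


def spend_by(jsonl: str, group: Callable[[dict], str]) -> dict[str, Decimal]:
    """Sum the (assumed) cost_usd field of each record, grouped by group(record)."""
    totals = defaultdict(Decimal)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        # Parse costs as Decimal so totals do not pick up float rounding error.
        record = json.loads(line, parse_float=Decimal)
        totals[group(record)] += Decimal(record["cost_usd"])
    return dict(totals)


log = "\n".join([
    '{"caller": "agent:billing", "model": "openai/gpt-4o", "cost_usd": 0.10}',
    '{"caller": "agent:radiology", "model": "anthropic/claude-sonnet-4", "cost_usd": 0.25}',
    '{"caller": "agent:billing", "model": "openai/gpt-4o", "cost_usd": 0.05}',
])

per_agent = spend_by(log, lambda r: r["caller"])
per_provider = spend_by(log, lambda r: r["model"].partition("/")[0])
print(per_agent)
print(per_provider)
```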
Same shape as the open standard
Phi Gateway is built on the cllama credential-starvation proxy. The wire protocol, identity model, and audit log shape follow the public cllama specification, so applications built against Phi Gateway run against any cllama-compatible proxy.
For IT and compliance
Policy at the infrastructure layer
Per-organization and per-role rules are authored in the Model Governance Layer. Policy is enforced by infrastructure, not by prompt: a billing workflow and a radiology assistant do not share permissions.
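The rule format used by Model Governance Layer is not shown here. As a hedged sketch of what per-organization, per-role enforcement could look like, the example below uses a hypothetical Rule record with glob patterns over provider/model names and denies anything no rule allows.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    organization: str
    role: str
    allowed_models: tuple[str, ...]  # glob patterns, e.g. "local/*" (assumed syntax)


def is_allowed(rules: list[Rule], organization: str, role: str, model: str) -> bool:
    """Default deny: a request passes only if some rule for this
    organization and role matches the requested model."""
    return any(
        rule.organization == organization
        and rule.role == role
        and any(fnmatchcase(model, pattern) for pattern in rule.allowed_models)
        for rule in rules
    )


rules = [
    Rule("st-example", "billing", ("local/*",)),
    Rule("st-example", "radiology", ("local/*", "anthropic/claude-sonnet-4")),
]

print(is_allowed(rules, "st-example", "billing", "anthropic/claude-sonnet-4"))
print(is_allowed(rules, "st-example", "radiology", "anthropic/claude-sonnet-4"))
```

Because the check runs at the gateway, nothing in a prompt can widen what the billing role is allowed to reach.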
Audit-ready evidence
Caller, route, model, decision, latency, tokens, and cost are recorded for every request. The ledger is append-only and independent of the model, suitable for HIPAA, PHIPA, or GDPR special-category review.
BAA path or your perimeter
Phi Cloud carries a BAA with Flux. Phi On-Prem keeps Flux out of the data path entirely — the gateway, the keys, the ledger, and the de-identification mappings all live inside your network.
Deployment forms you already operate
Docker container, OVA appliance, or Windows installer — the same packaging pattern used by the rest of the Flux medical fleet. No bespoke runtime, no proprietary orchestrator.