Design a Multi-Tenant Enterprise AI Platform

Clarify tenant boundaries

Treat compute, data, policy, and billing isolation as separate decisions rather than a single switch. Some tenants need hard separation; others can share models and infrastructure, provided there are strict controls over metadata, keys, and permissions.

Architecture

Tenant identity: every request carries tenant, user, role, policy, and budget context.
Data layer: per-tenant indexes or ACL-aware shared indexes with strict metadata filters.
Model gateway: route by tenant policy, cost budget, latency SLO, and quality target.
Tool gateway: enforce per-tenant tool permissions and approval gates.
Observability: trace model, prompt, retrieval, tools, cost, latency, and policy decisions by tenant.
Admin controls: quotas, audit logs, model allowlists, eval dashboards, and incident controls.
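
The tenant identity, data layer, model gateway, and tool gateway can be sketched together as below. This is a minimal illustration under stated assumptions, not a reference design: TenantContext, Document, filter_documents, choose_model, and authorize_tool are hypothetical names, and a real system would apply the retrieval filter inside the index query rather than after it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical per-request context; field names are assumptions."""
    tenant_id: str
    user_id: str
    roles: frozenset[str]
    allowed_models: frozenset[str]
    allowed_tools: frozenset[str]
    budget_remaining_usd: float


@dataclass(frozen=True)
class Document:
    """Hypothetical indexed chunk carrying ACL metadata."""
    doc_id: str
    tenant_id: str
    allowed_roles: frozenset[str]
    text: str


def filter_documents(ctx: TenantContext, candidates: list[Document]) -> list[Document]:
    """Keep documents owned by the caller's tenant and visible to one of its roles."""
    return [
        d for d in candidates
        if d.tenant_id == ctx.tenant_id and (d.allowed_roles & ctx.roles)
    ]


def choose_model(
    ctx: TenantContext,
    preference_order: list[str],
    estimated_cost_usd: dict[str, float],
) -> str:
    """Return the first preferred model that is allowlisted and fits the budget."""
    for name in preference_order:
        within_budget = estimated_cost_usd[name] <= ctx.budget_remaining_usd
        if name in ctx.allowed_models and within_budget:
            return name
    raise PermissionError(f"no permitted model within budget for {ctx.tenant_id}")


def authorize_tool(ctx: TenantContext, tool_name: str) -> None:
    """Raise if the tenant's policy does not allow this tool."""
    if tool_name not in ctx.allowed_tools:
        raise PermissionError(f"{tool_name} not permitted for {ctx.tenant_id}")
```

Passing one immutable context object through every gateway keeps the tenant check in a single, auditable shape. Latency SLO and quality target are omitted from choose_model to keep the sketch short; they would be additional filters in the same loop.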

Build-vs-buy decisions

Use managed models and vector stores early if the differentiation lies in the product workflow. Build custom routing, evals, and policy layers only when tenant requirements demand them.

Metrics

cost per tenant and feature
latency by tenant and route
isolation violations (target: zero)
retrieval permission failures
model quality by tenant
quota exhaustion and noisy-neighbor incidents
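
The first two metrics above could be computed from per-request records roughly as follows. This is a sketch only: RequestRecord and its fields are hypothetical, and the choice of p95 with a nearest-rank percentile is an assumption, since the source does not say which latency statistic to report.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical trace record emitted once per request."""
    tenant_id: str
    feature: str
    route: str
    cost_usd: float
    latency_ms: float


def cost_by_tenant_and_feature(records: list[RequestRecord]) -> dict[tuple[str, str], float]:
    """Sum spend per (tenant, feature) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for r in records:
        totals[(r.tenant_id, r.feature)] += r.cost_usd
    return dict(totals)


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def p95_latency_by_tenant_and_route(records: list[RequestRecord]) -> dict[tuple[str, str], float]:
    """Compute p95 latency per (tenant, route) pair."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for r in records:
        buckets[(r.tenant_id, r.route)].append(r.latency_ms)
    return {key: nearest_rank_percentile(vals, 95) for key, vals in buckets.items()}
```

Keying every aggregate by tenant first makes it straightforward to feed the same numbers into quotas and billing.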

Failure modes

Cross-tenant data leak: the highest-severity failure. Prevent it by enforcing tenant filters at retrieval time and permission checks at tool-access time.
Noisy neighbor: one tenant exhausts GPU, vector DB, or rate-limit capacity.
Cost runaway: one feature or tenant silently dominates spend.
Policy drift: tenant-specific rules change but prompts/tools/indexes are not updated.
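
Two of these failure modes have simple mechanical guards. The sketch below assumes a per-tenant token bucket against noisy neighbors and a spend-share check against cost runaway. TenantRateLimiter, spend_outliers, and the 50% default threshold are illustrative choices, not values taken from the source.

```python
from __future__ import annotations

import time
from typing import Callable


class TenantRateLimiter:
    """Token bucket kept separately per tenant, so one tenant cannot drain another's quota."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock  # injectable so tests can control time
        self._state: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last_seen)

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        now = self.clock()
        tokens, last = self._state.get(tenant_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._state[tenant_id] = (tokens, now)
        return allowed


def spend_outliers(spend_by_tenant: dict[str, float], max_share: float = 0.5) -> list[str]:
    """Return tenants whose share of total spend exceeds max_share (an assumed threshold)."""
    total = sum(spend_by_tenant.values())
    if total <= 0:
        return []
    return sorted(t for t, s in spend_by_tenant.items() if s / total > max_share)
```

In a multi-process deployment the bucket state would need to live in a shared store; the in-memory dictionary here only illustrates the per-tenant accounting.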

What the architect signal looks like

Close with the isolation decision: hard isolation for regulated or high-risk tenants, shared infrastructure with strict policy and audit controls for lower-risk tenants.
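
Stated as code, the closing rule is a one-line decision. This is only a sketch: TenantProfile, IsolationTier, and the two boolean flags are hypothetical, and how a tenant gets classified as regulated or high-risk is left outside the example.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationTier(Enum):
    HARD = "hard"      # dedicated infrastructure
    SHARED = "shared"  # shared infrastructure with strict policy and audit controls


@dataclass(frozen=True)
class TenantProfile:
    """Hypothetical classification inputs for a tenant."""
    tenant_id: str
    regulated: bool
    high_risk: bool


def choose_isolation(profile: TenantProfile) -> IsolationTier:
    """Hard isolation for regulated or high-risk tenants, shared otherwise."""
    if profile.regulated or profile.high_risk:
        return IsolationTier.HARD
    return IsolationTier.SHARED
```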