Your AI agents should not surprise you with a bill.
TokSuan is operated by TokenSmart LLC and sits between your agents and model providers, making every request visible, capped, and safely routed. Simple turns stop burning frontier-model money; hard turns keep the quality they need.
No token markup. Bring your own provider keys. Same-provider BYO judging by default. Self-hostable when you need it.
gpt-5.5gemini-2.5-flash-liteFour steps, no agent rewrite.
Add the provider key you already use, mint a TokSuan project key, swap base_url, then inspect the first receipt.
Open runtime. Hosted policy operations.
The gateway path is inspectable: budgets, routing, provider resolution, receipts, and key handling are open. Hosted TokSuan adds the work nobody wants to operate every week: benchmark rosters, aggregate routing intelligence, policy promotion, rollback, provider health, and abuse review.
Policy factory
Private eval recipes and model rosters produce candidate routing policies without touching the runtime path.
Aggregate intelligence
Hosted and opt-in self-host aggregates surface route pairs that repeatedly save money after privacy thresholds.
Ops guardrails
Provider health, DB sanity, incident snapshots, and report approvals keep policy changes reversible.
Keep the SDK. Swap the gateway.
Cursor, OpenClaw, Hermes, LangChain, Vercel AI SDK, Cline, and internal bots can keep their OpenAI-compatible workflow. TokSuan adds the receipts, budgets, and routing layer in the middle.
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://gateway.tokensmt.com/v1",
apiKey: "ts_your_project_key",
});
await client.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Ship it." }],
});Model gateways give access. TokSuan makes routing decisions.
Agent workloads are not normal API traffic. They retry, call tools, branch into long sessions, and silently swap from cheap to frontier models. TokSuan decides when a turn can move down, when it must stay on an advanced model, and gives each decision a receipt.
Bills arrive after the damage
Which agent, project, or prompt created the spike?
Agents repeat expensive mistakes
Looping sessions can keep spending while nobody is watching.
Routing needs proof
Cheaper models need evidence before production traffic moves.
See it, cap it, shrink it.
Three product surfaces work together: a ledger for visibility, budget guards for control, and routing proof for safe savings.
Every request becomes a receipt.
Project, provider, model, tags, latency, input tokens, output tokens, routing reason, and cost land in one ledger your team can inspect.
Budgets block before upstream.
Daily and monthly caps stop runaway spend before provider billing.
Loop detection catches repeats.
Repeated fingerprints can be stopped before an agent burns the same turn again.
Route cheaper only when the evidence is good enough.
Public benchmarks provide the day-one frontier. Shadow trials and project history teach TokSuan which provider works best for your agent over time.
No token markup
Keep paying providers directly. TokSuan is the control layer, not a reseller.
Encrypted BYO keys
Your app uses ts_ project keys while upstream secrets stay encrypted at rest.
Self-hostable
Run the Apache-2.0 code with Postgres when your team needs full control.
The questions buyers ask first.
The short version for engineering, finance, and security before you put production agent calls behind a gateway.
Do I need to change my app?
Usually one base_url and one API key. Your request shape stays OpenAI-compatible.
Is this OpenRouter?
No. OpenRouter gives access to many models. TokSuan decides which model an agent turn should use, enforces budgets, and learns from your workload.
Can I see the exact request that spent money?
Yes. The ledger stores model, provider, tokens, latency, tags, cost, and receipt headers.
Will this hurt model quality?
Routes promote only when the policy and your receipts show the cheaper model is safe; shadow trials let you prove quality before switching production traffic.
What if an agent loops?
Loop detection and budgets can stop repeated turns before they reach upstream.
Is hosted the only option?
No. Use the hosted gateway or self-host when deployment control matters more.
Send one agent request. Get one receipt your team can trust.
Estimate the opportunity, connect a provider key, and inspect the first request before changing a production route.