Spend control and routing for AI agents

Your AI agents should not surprise you with a bill.

TokSuan is operated by TokenSmart LLC and sits between your agents and model providers, making every request visible, capped, and safely routed. Simple turns stop burning frontier-model money; hard turns keep the quality they need.

Start free with email Estimate savings View routing wins OpenClaw guide Hermes guide Self-host docs

No token markup. Bring your own provider keys. Same-provider BYO judging by default. Self-hostable when you need it.

Asked model

gpt-5.5

Landed model

gemini-2.5-flash-lite

Cheaper route

89×

≈$4.2k / 100k similar turns

Quality proof

Quality checked

A receipt tells your team what changed, why it changed, and whether the cheaper route is ready to promote.

Get started in one SDK change

Four steps, no agent rewrite.

Add the provider key you already use, mint a TokSuan project key, swap base_url, then inspect the first receipt.

01Provider key

02Project key

03base_url

04Receipt

Hosted value

Open runtime. Hosted policy operations.

The gateway path is inspectable: budgets, routing, provider resolution, receipts, and key handling are open. Hosted TokSuan adds the work nobody wants to operate every week: benchmark rosters, aggregate routing intelligence, policy promotion, rollback, provider health, and abuse review.

See trust boundary See aggregate proof

Policy factory

Private eval recipes and model rosters produce candidate routing policies without touching the runtime path.

Aggregate intelligence

Hosted and opt-in self-host aggregates surface route pairs that repeatedly save money after privacy thresholds.

Ops guardrails

Provider health, DB sanity, incident snapshots, and report approvals keep policy changes reversible.

Developer friendly

Keep the SDK. Swap the gateway.

Cursor, OpenClaw, Hermes, LangChain, Vercel AI SDK, Cline, and internal bots can keep their OpenAI-compatible workflow. TokSuan adds the receipts, budgets, and routing layer in the middle.

Cursor OpenClaw Hermes LangChain Vercel AI SDK Cline

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.tokensmt.com/v1",
  apiKey: "ts_your_project_key",
});

await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Ship it." }],
});

Why teams add a control plane

Model gateways give access. TokSuan makes routing decisions.

Agent workloads are not normal API traffic. They retry, call tools, branch into long sessions, and silently swap from cheap to frontier models. TokSuan decides when a turn can move down, when it must stay on an advanced model, and gives each decision a receipt.

Bills arrive after the damage

Which agent, project, or prompt created the spike?

Agents repeat expensive mistakes

Looping sessions can keep spending while nobody is watching.

Routing needs proof

Cheaper models need evidence before production traffic moves.

Control loop

See it, cap it, shrink it.

Three product surfaces work together: a ledger for visibility, budget guards for control, and routing proof for safe savings.

Observe

Every request becomes a receipt.

Project, provider, model, tags, latency, input tokens, output tokens, routing reason, and cost land in one ledger your team can inspect.

Control

Budgets block before upstream.

Daily and monthly caps stop runaway spend before provider billing.

Agents

Loop detection catches repeats.

Repeated fingerprints can be stopped before an agent burns the same turn again.

Optimize

Route cheaper only when the evidence is good enough.

Public benchmarks provide the day-one frontier. Shadow trials and project history teach TokSuan which provider works best for your agent over time.

No token markup

Keep paying providers directly. TokSuan is the control layer, not a reseller.

Encrypted BYO keys

Your app uses ts_ project keys while upstream secrets stay encrypted at rest.

Self-hostable

Run the Apache-2.0 code with Postgres when your team needs full control.

Before you route traffic

The questions buyers ask first.

The short version for engineering, finance, and security before you put production agent calls behind a gateway.

Do I need to change my app?

Usually one base_url and one API key. Your request shape stays OpenAI-compatible.

Is this OpenRouter?

No. OpenRouter gives access to many models. TokSuan decides which model an agent turn should use, enforces budgets, and learns from your workload.

Can I see the exact request that spent money?

Yes. The ledger stores model, provider, tokens, latency, tags, cost, and receipt headers.

Will this hurt model quality?

Routes promote only when the policy and your receipts show the cheaper model is safe; shadow trials let you prove quality before switching production traffic.

What if an agent loops?

Loop detection and budgets can stop repeated turns before they reach upstream.

Is hosted the only option?

No. Use the hosted gateway or self-host when deployment control matters more.

Start with one real request

Send one agent request. Get one receipt your team can trust.

Estimate the opportunity, connect a provider key, and inspect the first request before changing a production route.

Start free Quickstart guide See trust boundary