AIMay 2026 · 2 min read

Cost-routing AI calls: what I learned shipping a Claude-powered translator

How we kept token bills bounded while translating hundreds of pages of Japanese business docs through Claude.

We needed a way for the team to translate Japanese spec docs, customer materials, and architecture diagrams into English on demand. Excel files. Word docs. drawio diagrams. The obvious answer was Claude — but the obvious-er problem was: token bills add up fast when business documents are running through the API.

After shipping the first version we ended up with two engineering problems that mattered, and one decision that solved them both.

The format-preservation problem

You can't just ship raw text to Claude and paste the response back. Excel cares about formulas. Word cares about styles, tables, and headings. drawio cares about shape positions and connections. If translation breaks any of that, the document arrives broken.

The fix was format-specific: a parser per file type that walks the structure, extracts only the human-readable strings, sends them to Claude in batched chunks, then rebuilds the document by mapping the translated strings back to their original cell / run / shape. The structure round-trips cleanly. Only the text changes.

The cost problem

The other half was just: spending money. The cheap fix is to use a cheaper model for everything. The honest fix is to recognize that not every chunk needs the same model.

We made cost a routing decision. Every chunk goes through:

  1. Cache lookup first — normalized source text + target language is the key. Identical strings (and we see a lot of them in business docs — repeated section headers, common phrases) never hit Claude twice.
  2. Tiered model selection — short and low-complexity strings go through a cheaper model. Anything with domain-specific or mixed-language context routes to the larger model.

No per-document tuning. The router decides.

What ended up mattering

The interesting thing is which problem turned out to be the harder one. I expected the cost problem; the format-preservation problem ate more design time. Specifically, drawio: its file format is a deflated XML inside a base64 wrapper, and shape positions plus connection labels both need translation but only the labels should change.

If you're building anything similar, my one piece of advice: design the cache and the model router before you write the first translation call. Retrofitting cost discipline after you're in production is harder than it sounds.