AI unit-economics worksheet
You don't have a business case until you can fill this in at production volume, with the TokenOps levers applied. The proof-of-concept ran on forty dollars. Production is a different animal.
The workflow: ______
Estimated monthly volume (transactions / documents / tasks): ______
Section 1 — What it costs today (per month)
| Cost line | How to calculate it | Your number |
|---|---|---|
| (a) Labor — people-hours on this workflow × fully loaded cost/hr | _______ hrs × $_______ /hr | $ |
| (b) Error and rework cost | Estimate re-do rate × labor above, or known error cost | $ |
| (c) Delay or opportunity cost | Late fees, lost throughput, or cost of the queue | $ |
| Total current cost / month | (a) + (b) + (c) | $ |
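The arithmetic in the Section 1 table can be sketched as below. This is a minimal illustration, not a prescribed method: the function name, parameter names, and every figure in the example are hypothetical placeholders for your own numbers. It uses the "re-do rate × labor" option for line (b).

```python
def current_monthly_cost(labor_hours: float, loaded_rate: float,
                         redo_rate: float, delay_cost: float) -> dict:
    """Section 1: total current cost per month = (a) + (b) + (c)."""
    labor = labor_hours * loaded_rate   # (a) people-hours x fully loaded cost/hr
    rework = redo_rate * labor          # (b) re-do rate x labor above
    total = labor + rework + delay_cost # (c) delay cost is entered directly
    return {"labor": labor, "rework": rework, "delay": delay_cost, "total": total}


# Hypothetical inputs: 320 hrs/month at $65/hr, 8% re-do rate, $1,500 in late fees.
current = current_monthly_cost(labor_hours=320, loaded_rate=65.0,
                               redo_rate=0.08, delay_cost=1_500.0)
for line, amount in current.items():
    print(f"{line:<8} ${amount:,.2f}")
```

If you have a known error cost instead of a re-do rate, replace the `rework` line with that figure rather than estimating it.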
Section 2 — What the AI version costs (per month at production volume)
Apply the four TokenOps levers before you enter inference cost. Entering the demo price here is the most common error.
TokenOps lever worksheet
| Lever | Applies here? (Y/N) | Estimated savings or note |
|---|---|---|
| Caching: mark stable system instructions / reference docs as cacheable. Cached reads cost roughly 90% less than standard input (per Anthropic pricing; AWS Bedrock announced the same 90% figure in December 2024). | | |
| Batch processing: for work where "by tomorrow morning" is fine, a flat 50% discount across major providers (Anthropic, OpenAI, Google Gemini). Stacks with caching multiplicatively (a cached read inside a batch job bills at roughly 0.10 × 0.50 = 5% of the standard input price). | | |
| Model routing: send the easy majority to a cheaper small model; reserve the frontier model for genuinely hard requests. RouteLLM research (UC Berkeley / Anyscale, ICLR 2025) held 95% of GPT-4 quality while routing only 14–26% of requests to the premium model, cutting per-query cost by up to 85%. | | |
| Context discipline: don't stuff the window. Send only the relevant section, not the whole document. Every unnecessary token costs money on every call, forever. | | |
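To see how the levers compound, here is a rough sketch that prices one workload four ways: at the demo price, with caching, with caching plus batch, and with routing on top. Everything specific is an assumption for illustration: the class and function names, the token counts, and the per-million-token prices are hypothetical, and the model ignores cache-write surcharges and cache misses. Only the 90% cached-read and 50% batch discounts come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    calls_per_month: int
    stable_input_tokens: int    # system prompt + reference docs, identical on every call
    variable_input_tokens: int  # the part that changes per call
    output_tokens: int


@dataclass
class Price:
    input_per_mtok: float   # USD per million input tokens (hypothetical)
    output_per_mtok: float  # USD per million output tokens (hypothetical)


def monthly_cost(w: Workload, p: Price, cache_discount: float = 0.0,
                 batch_discount: float = 0.0) -> float:
    """Monthly inference cost with the caching and batch levers applied.

    Simplification: no cache-write surcharge, every stable token is a cache hit.
    """
    billed_stable = w.stable_input_tokens * (1 - cache_discount)
    input_cost = (billed_stable + w.variable_input_tokens) * p.input_per_mtok / 1e6
    output_cost = w.output_tokens * p.output_per_mtok / 1e6
    # The batch discount applies to the whole call, so it stacks multiplicatively.
    return (input_cost + output_cost) * (1 - batch_discount) * w.calls_per_month


def routed_cost(w: Workload, premium: Price, small: Price,
                premium_share: float, **levers: float) -> float:
    """Routing lever: premium_share of calls go to the frontier model, the rest to the small one."""
    return (premium_share * monthly_cost(w, premium, **levers)
            + (1 - premium_share) * monthly_cost(w, small, **levers))


# Hypothetical workload and prices; replace with your own.
workload = Workload(calls_per_month=100_000, stable_input_tokens=2_000,
                    variable_input_tokens=500, output_tokens=300)
frontier = Price(input_per_mtok=3.00, output_per_mtok=15.00)
small = Price(input_per_mtok=0.25, output_per_mtok=1.25)

demo_price = monthly_cost(workload, frontier)
with_cache = monthly_cost(workload, frontier, cache_discount=0.90)
with_cache_batch = monthly_cost(workload, frontier,
                                cache_discount=0.90, batch_discount=0.50)
all_levers = routed_cost(workload, frontier, small, premium_share=0.25,
                         cache_discount=0.90, batch_discount=0.50)

for label, cost in [("Demo price", demo_price), ("+ caching", with_cache),
                    ("+ batch", with_cache_batch), ("+ routing (25% premium)", all_levers)]:
    print(f"{label:<26} ${cost:,.2f}")
```

With these made-up inputs the monthly bill falls from about $1,200 to about $660, then $330, then roughly $103. Context discipline shows up here as smaller `stable_input_tokens` and `variable_input_tokens`: trim those and every row drops with them.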
AI run-cost table (after applying levers above)
| Cost line | Your number |
|---|---|
| Inference / token cost at production volume (levers applied) | $ |
| Software / platform / seat licenses | $ |
| Human-in-the-loop review time (hrs/month × fully loaded rate) | $ |
| Maintenance + quarterly re-validation (amortized monthly) | $ |
| Total run cost / month | $ |
Section 3 — The honest answer
| Line | Amount |
|---|---|
| Monthly saving = total current cost − total run cost | $ |
| One-time build / setup cost | $ |
| Payback (months) = build cost ÷ monthly saving | _______ months |
| Cost per task (run cost ÷ monthly volume) | $_______ per task |
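The three Section 3 formulas can be sketched in a few lines. The function name and the example figures are hypothetical. One design choice worth flagging: when the monthly saving is zero or negative, the sketch reports an infinite payback rather than dividing, because the build cost is never recovered.

```python
import math


def honest_answer(total_current: float, total_run: float,
                  build_cost: float, monthly_volume: int) -> dict:
    """Section 3: monthly saving, payback in months, and cost per task."""
    saving = total_current - total_run
    # No positive saving means the build cost is never paid back.
    payback = build_cost / saving if saving > 0 else math.inf
    return {
        "monthly_saving": saving,
        "payback_months": payback,
        "cost_per_task": total_run / monthly_volume,
    }


# Hypothetical inputs: $23,964 current cost, $9,000 run cost,
# $60,000 build cost, 12,000 tasks per month.
result = honest_answer(total_current=23_964.0, total_run=9_000.0,
                       build_cost=60_000.0, monthly_volume=12_000)
print(f"Monthly saving: ${result['monthly_saving']:,.2f}")
print(f"Payback:        {result['payback_months']:.1f} months")
print(f"Cost per task:  ${result['cost_per_task']:.2f}")
```

With these placeholder numbers the payback comes out at about four months and the cost per task at $0.75.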
Section 4 — Spike risk check
The planned workload is predictable. Accidents blow up the bill — a retry loop, a workflow that re-sends the full context on every turn, a batch job pointed at the wrong model.
- ☐ We have a model gateway (e.g., LiteLLM, OpenRouter) that routes calls, enforces budgets, and logs every token.
- ☐ We have spike alerts that catch a runaway loop the afternoon it happens, not at month-end.
- ☐ We have per-team or per-workflow token budgets so one experiment can't consume the whole company's allowance.
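The logic behind the last two checkboxes can be sketched as follows. This is not the API of LiteLLM, OpenRouter, or any other gateway; `TokenBudget`, its parameters, and the workflow name are hypothetical, and a real gateway would persist usage and send notifications. The sketch flags an hour whose usage exceeds a multiple of the recent hourly average, and flags any workflow that passes its monthly token budget. A workflow with no registered budget is treated as having a budget of zero, which is an assumption chosen so that unregistered experiments surface immediately.

```python
from collections import defaultdict, deque


class TokenBudget:
    """In-memory per-workflow token budget with a simple hourly spike check."""

    def __init__(self, monthly_limits: dict, spike_factor: float = 5.0,
                 window_hours: int = 24):
        self.limits = monthly_limits
        self.spike_factor = spike_factor
        self.used = defaultdict(int)
        # Rolling window of recent hourly usage per workflow.
        self.hourly = defaultdict(lambda: deque(maxlen=window_hours))

    def record_hour(self, workflow: str, tokens: int) -> list:
        """Record one hour of usage and return any alerts it triggers."""
        alerts = []
        history = self.hourly[workflow]
        if history:
            baseline = sum(history) / len(history)
            if baseline > 0 and tokens > self.spike_factor * baseline:
                alerts.append("spike")
        history.append(tokens)
        self.used[workflow] += tokens
        if self.used[workflow] > self.limits.get(workflow, 0):
            alerts.append("over_budget")
        return alerts


# Hypothetical workflow with a 1,000,000-token monthly budget.
budget = TokenBudget({"invoice-triage": 1_000_000})
quiet_hours = [budget.record_hour("invoice-triage", 10_000) for _ in range(3)]
runaway_hour = budget.record_hour("invoice-triage", 200_000)  # 20x the usual hour
blowout_hour = budget.record_hour("invoice-triage", 800_000)  # pushes past the budget
print(quiet_hours, runaway_hour, blowout_hour)
```

The point of checking hourly rather than monthly is the one made in the checklist: a retry loop should trip the `spike` alert the afternoon it starts, well before the `over_budget` alert would fire.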
Decision gate: Payback under 12 months and AI Readiness score 18 or above (see Appendix A.1) = real candidate to scale. Can't fill in "total current cost"? Go measure the workflow first — you don't yet understand it well enough to automate it.
Want a second set of eyes on this in your firm? The no-sell promise applies — if it isn't a fit, I'll tell you in the first ten minutes.