Chapter 22 · companion worksheet

AI unit-economics worksheet

You don't have a business case until you can fill this in at production volume, with the TokenOps levers applied. The proof-of-concept ran on forty dollars. Production is a different animal.

The workflow: ______

Estimated monthly volume (transactions / documents / tasks): ______

Section 1 — What it costs today (per month)

Cost line	How to calculate it	Your number
(a) Labor — people-hours on this workflow × fully loaded cost/hr	_______ hrs × $_______ /hr	$
(b) Error and rework cost	Estimated re-do rate × labor cost (a), or known error cost	$
(c) Delay or opportunity cost	Late fees, lost throughput, or cost of the queue	$
Total current cost / month = (a) + (b) + (c)		$
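A minimal sketch of the Section 1 arithmetic in Python, for anyone who would rather script the worksheet than fill it in by hand. The function name, parameter names, and sample figures are illustrative assumptions, not numbers from the chapter. Line (b) uses the re-do-rate option rather than a known error cost.

```python
def current_monthly_cost(hours, rate_per_hour, redo_rate=0.0, delay_cost=0.0):
    """Return (labor, rework, delay, total) for Section 1 of the worksheet."""
    labor = hours * rate_per_hour          # line (a): people-hours x loaded rate
    rework = redo_rate * labor             # line (b): re-do rate x labor above
    total = labor + rework + delay_cost    # (a) + (b) + (c)
    return labor, rework, delay_cost, total


# Hypothetical inputs: 320 hrs/month at $65/hr, 10% re-do rate, $1,200 in late fees.
labor, rework, delay, total = current_monthly_cost(320, 65.0, 0.10, 1200.0)
print(f"(a) labor   ${labor:,.2f}")
print(f"(b) rework  ${rework:,.2f}")
print(f"(c) delay   ${delay:,.2f}")
print(f"total       ${total:,.2f}")
```

If you already know your error cost in dollars, pass `redo_rate=0.0` and fold the known figure into `delay_cost`, or add a parameter of your own.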

Section 2 — What the AI version costs (per month at production volume)

Apply the four TokenOps levers before you enter inference cost. Entering the demo price here is the most common error.

TokenOps lever worksheet

Lever	Applies here? (Y/N)	Estimated savings or note
Caching — mark stable system instructions / reference docs as cacheable. Cached reads cost roughly 90% less than standard input tokens (per Anthropic pricing; AWS Bedrock announced the same 90% figure in Dec 2024).
Batch processing — for work where "by tomorrow morning" is fine: flat 50% discount across major providers (Anthropic, OpenAI, Google Gemini). Stacks with caching multiplicatively.
Model routing — send the easy majority to a cheaper small model; reserve the frontier model for genuinely hard requests. RouteLLM research (UC Berkeley / Anyscale, ICLR 2025) held 95% of GPT-4 quality while routing only 14–26% of requests to the premium model, cutting per-query cost by up to 85%.
Context discipline — don't stuff the window. Send only the relevant section, not the whole document. Every unnecessary token costs money on every call, forever.
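The four levers can be combined into a rough monthly inference estimate, sketched below in Python. Treat it as a planning model under stated assumptions, not a billing calculator. The prices are made-up placeholders, `ModelPrice` and `monthly_inference_cost` are hypothetical names, and cache-write surcharges and storage fees are ignored. The 90% and 50% discounts are the figures from the lever table, and context discipline shows up simply as a smaller `input_tokens` value.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """Placeholder prices in dollars per million tokens (not real list prices)."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_inference_cost(requests, input_tokens, output_tokens, premium, small,
                           premium_share=1.0, cached_fraction=0.0,
                           batch_fraction=0.0, cache_discount=0.90,
                           batch_discount=0.50):
    """Estimate monthly inference cost with the levers applied.

    premium_share   routing: fraction of requests sent to the premium model
    cached_fraction caching: fraction of input tokens read from cache
    batch_fraction  batch: fraction of requests that can wait for batch pricing
    """
    def per_request(price):
        # Cached input tokens are billed at (1 - cache_discount) of the normal rate.
        input_multiplier = 1.0 - cached_fraction * cache_discount
        input_cost = input_tokens * input_multiplier * price.input_per_mtok / 1e6
        output_cost = output_tokens * price.output_per_mtok / 1e6
        return input_cost + output_cost

    # Routing: blend the two models by the share of traffic each one handles.
    blended = (premium_share * per_request(premium)
               + (1.0 - premium_share) * per_request(small))
    # Assumption: the batch discount applies on top of the cached price,
    # so the two discounts multiply.
    batch_multiplier = 1.0 - batch_fraction * batch_discount
    return requests * blended * batch_multiplier


premium = ModelPrice(10.0, 30.0)   # hypothetical frontier model
small = ModelPrice(1.0, 3.0)       # hypothetical small model

# Demo-style estimate: everything on the premium model, whole document sent.
demo = monthly_inference_cost(100_000, 6_000, 500, premium, small)

# Levers applied: trimmed context, half the input cached, 20% of traffic
# on the premium model, 40% of requests batched.
levered = monthly_inference_cost(100_000, 2_000, 500, premium, small,
                                 premium_share=0.2, cached_fraction=0.5,
                                 batch_fraction=0.4)
print(f"demo-style estimate: ${demo:,.2f}")
print(f"levers applied:      ${levered:,.2f}")
```

With these placeholder inputs the estimate falls from $7,500 to about $582 a month. Your own ratio will depend entirely on how much of the prompt is cacheable, how much work can wait, and how many requests really need the frontier model.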

AI run-cost table (after applying levers above)

Cost line	Your number
Inference / token cost at production volume (levers applied)	$
Software / platform / seat licenses	$
Human-in-the-loop review time (hrs/month × fully loaded rate)	$
Maintenance + quarterly re-validation (amortized monthly)	$
Total run cost / month	$

Section 3 — The honest answer

Line	Amount
Monthly saving = total current cost − total run cost	$
One-time build / setup cost	$
Payback (months) = build cost ÷ monthly saving	_______ months
Cost per task (run cost ÷ monthly volume)	$_______ per task
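The three Section 3 lines follow directly from the totals above. A short Python sketch, with a hypothetical function name and sample figures that are not from the chapter. One case the table does not spell out: if the monthly saving is zero or negative, the build cost is never paid back, which the sketch reports as an infinite payback.

```python
def honest_answer(current_cost, run_cost, build_cost, monthly_volume):
    """Return (monthly_saving, payback_months, cost_per_task) for Section 3."""
    saving = current_cost - run_cost
    # No positive saving means the build cost is never recovered.
    payback = build_cost / saving if saving > 0 else float("inf")
    cost_per_task = run_cost / monthly_volume
    return saving, payback, cost_per_task


# Hypothetical inputs: $24,080 current cost, $6,500 run cost,
# $60,000 one-time build, 100,000 tasks per month.
saving, payback, per_task = honest_answer(24_080.0, 6_500.0, 60_000.0, 100_000)
print(f"monthly saving: ${saving:,.2f}")
print(f"payback:        {payback:.1f} months")
print(f"cost per task:  ${per_task:.4f}")
```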

Section 4 — Spike risk check

The planned workload is predictable. Accidents blow up the bill — a retry loop, a workflow that re-sends the full context on every turn, a batch job pointed at the wrong model.

☐ We have a model gateway (e.g., LiteLLM, OpenRouter) that routes calls, enforces budgets, and logs every token.
☐ We have spike alerts that catch a runaway loop the afternoon it happens, not at month-end.
☐ We have per-team or per-workflow token budgets so one experiment can't consume the whole company's allowance.
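If you do not have a gateway yet, the last two checks can be approximated over whatever spend log you already export. The sketch below is plain Python and does not use any gateway's API. The function names, the 3x threshold, the one-dollar floor, and the sample figures are all assumptions to tune for your own traffic.

```python
from statistics import mean


def spike_alert(hourly_spend, window=24, factor=3.0, floor=1.0):
    """Return True if the latest hour's spend looks like a runaway.

    Compares the latest hour against the mean of up to `window` preceding
    hours. `floor` (dollars) keeps tiny absolute amounts from firing alerts
    and still catches a spike that starts from a baseline of zero.
    """
    if len(hourly_spend) < 2:
        return False
    *history, latest = hourly_spend[-(window + 1):]
    baseline = mean(history)
    return latest > max(factor * baseline, floor)


def over_budget(spend_by_workflow, budgets):
    """Return the workflows whose month-to-date spend exceeds their budget.

    A workflow with no budget entry is treated as having a budget of zero,
    so any unbudgeted spend is flagged.
    """
    return sorted(
        name for name, spent in spend_by_workflow.items()
        if spent > budgets.get(name, 0.0)
    )


# Hypothetical data: a steady $2/hour workload, then a retry loop kicks in.
normal_day = [2.0] * 24 + [2.1]
runaway_day = [2.0] * 24 + [40.0]
print(spike_alert(normal_day), spike_alert(runaway_day))

spend = {"invoices": 410.0, "experiments": 95.0}
budgets = {"invoices": 500.0, "experiments": 50.0}
print(over_budget(spend, budgets))
```

Run the spike check hourly rather than daily. The point of the checklist is to catch the loop the afternoon it starts.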

Decision gate: Payback under 12 months and AI Readiness score 18 or above (see Appendix A.1) = real candidate to scale. Can't fill in "total current cost"? Go measure the workflow first — you don't yet understand it well enough to automate it.
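The gate itself is two comparisons and one precondition, sketched here in Python. The function name and the "not yet" label for a failed gate are assumptions, since the worksheet names only the passing outcome. The readiness score is whatever Appendix A.1 produces; passing `None` for the current cost stands in for a Section 1 you could not fill in.

```python
def decision_gate(payback_months, readiness_score, total_current_cost):
    """Apply the worksheet's decision gate and return a short verdict."""
    if total_current_cost is None:
        # Section 1 could not be filled in: measure before automating.
        return "measure the workflow first"
    if payback_months < 12 and readiness_score >= 18:
        return "real candidate to scale"
    return "not yet"  # assumed label; the worksheet does not name this outcome


# Hypothetical inputs.
print(decision_gate(3.4, 21, 24_080.0))
print(decision_gate(14.0, 21, 24_080.0))
print(decision_gate(3.4, 21, None))
```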

Want a second set of eyes on this in your firm? The no-sell promise applies — if it isn't a fit, I'll tell you in the first ten minutes.
