LLM Cost Estimation: Token accounting and budget strategies
AI Unit Economics
Cloud compute is cheap. AI compute is expensive. A single GPT-4 call can cost $0.03. If 1 million users each make just one call, that's $30,000. You must be an architect of costs as much as of code.
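The arithmetic behind that figure can be sketched in a few lines. The prices below are illustrative assumptions chosen so that one call costs $0.03; they are not a live price list. `Pricing`, `call_cost`, and `projected_cost` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-1,000-token prices in USD (illustrative values, not a live price list)."""
    input_per_1k: float
    output_per_1k: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    return (
        (input_tokens / 1000) * pricing.input_per_1k
        + (output_tokens / 1000) * pricing.output_per_1k
    )


def projected_cost(cost_per_call: float, users: int, calls_per_user: int) -> float:
    """Total spend if every user makes `calls_per_user` requests."""
    return cost_per_call * users * calls_per_user


# Hypothetical pricing: a 500-token prompt with a 250-token reply costs $0.03.
example = Pricing(input_per_1k=0.03, output_per_1k=0.06)
per_call = call_cost(example, input_tokens=500, output_tokens=250)
total = projected_cost(per_call, users=1_000_000, calls_per_user=1)
print(f"per call: ${per_call:.4f}, total: ${total:,.0f}")
```

Raising `calls_per_user` from 1 to a more realistic daily figure shows how quickly the total scales with usage.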
1. Token Accounting
Always track tokens programmatically. Use libraries like Tiktoken to calculate the token count *before* sending the request. If a user tries to upload a 500-page PDF, block it at the gateway to prevent a $50 bill for a single call.
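A minimal sketch of such a gateway check follows. To stay dependency-free it takes the token counter as a parameter and defaults to a crude heuristic of about 4 characters per token, which is an assumption for illustration only. In practice you would pass a real tokenizer, for example a function that returns `len(encoding.encode(text))` for a tiktoken encoding. `enforce_budget` and `TokenBudgetExceeded` are hypothetical names.

```python
from typing import Callable


class TokenBudgetExceeded(Exception):
    """Raised when a prompt is larger than the per-request token budget."""


def rough_token_count(text: str) -> int:
    """Crude stand-in: roughly 4 characters per token for English text.

    Assumption for illustration only; swap in a real tokenizer in production.
    """
    return max(1, len(text) // 4)


def enforce_budget(
    prompt: str,
    max_input_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> int:
    """Return the token count, or raise before any paid API call is made."""
    tokens = count_tokens(prompt)
    if tokens > max_input_tokens:
        raise TokenBudgetExceeded(
            f"prompt is {tokens} tokens; limit is {max_input_tokens}"
        )
    return tokens


# A short prompt passes; an oversized upload is rejected at the gateway.
short = enforce_budget("Summarize this paragraph.", max_input_tokens=8000)
try:
    enforce_budget("x" * 400_000, max_input_tokens=8000)
    blocked = False
except TokenBudgetExceeded:
    blocked = True
```

Because the check runs before the request leaves your service, a rejected prompt costs nothing.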
2. The "Model Tier" Strategy
Don't use GPT-4 for everything.
- GPT-3.5 / Llama-3-8B ($): For simple summarization, classification, or formatting.
- GPT-4 / Claude 3 Opus ($$$): For complex reasoning, coding, and multi-step math.
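The tiering above can be expressed as a small routing function. This is a sketch under assumptions: the task labels and the `pick_tier` helper are hypothetical, and a real system would define its own task taxonomy and map each tier to a concrete deployment.

```python
from enum import Enum


class Tier(Enum):
    ECONOMY = "economy"  # e.g. GPT-3.5 / Llama-3-8B
    PREMIUM = "premium"  # e.g. GPT-4 / Claude 3 Opus


# Hypothetical task labels, taken from the two bullets above.
ECONOMY_TASKS = {"summarization", "classification", "formatting"}
PREMIUM_TASKS = {"complex_reasoning", "coding", "multi_step_math"}


def pick_tier(task_type: str) -> Tier:
    """Route known cheap tasks to the economy tier; everything else goes premium."""
    if task_type in ECONOMY_TASKS:
        return Tier.ECONOMY
    return Tier.PREMIUM


print(pick_tier("classification").value)
print(pick_tier("coding").value)
```

Unknown task types default to the premium tier here, which favors answer quality over cost. Defaulting to the economy tier instead is an equally valid choice if budget is the tighter constraint.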
4. Interview Mastery
Q: "What is 'Prompt Caching' and how does it save money?"
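Architect Answer: "In a RAG system, you often send the same 10,000-token 'Knowledge Base' with every request. Prompt caching (available in Azure OpenAI, among other providers) lets the provider keep that prefix in memory, as long as it is identical across calls. You pay the full input price for those 10,000 tokens on the first call; on subsequent calls the cached prefix is billed at a discounted rate, and only the new user message is billed at full price. Depending on the provider's discount, this can reduce the input cost of the repeated prefix by up to **90%** for repetitive enterprise tasks."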
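To see how the savings add up, here is a rough cost model. It assumes cached prefix tokens are billed at a discount rather than for free, that the first call pays full price, and it ignores cache-write surcharges and cache expiry. The price and the 0.9 discount are illustrative assumptions, and `input_cost` is a hypothetical helper.

```python
def input_cost(
    prefix_tokens: int,
    message_tokens: int,
    calls: int,
    price_per_1k: float,
    cached_discount: float = 0.0,
) -> float:
    """Input-token cost for `calls` requests that share one prefix.

    cached_discount is the fraction taken off cached prefix tokens
    (0.0 = no caching, 0.9 = cached tokens cost 10% of the normal price).
    """
    if calls <= 0:
        return 0.0
    full_prefix = prefix_tokens  # first call populates the cache at full price
    cached_prefix = prefix_tokens * (calls - 1) * (1 - cached_discount)
    messages = message_tokens * calls  # new user messages are never cached
    return (full_prefix + cached_prefix + messages) / 1000 * price_per_1k


# 10,000-token knowledge base, 200-token user message, 1,000 calls.
baseline = input_cost(10_000, 200, calls=1_000, price_per_1k=0.01)
with_cache = input_cost(10_000, 200, calls=1_000, price_per_1k=0.01, cached_discount=0.9)
savings = 1 - with_cache / baseline
print(f"baseline ${baseline:.2f}, cached ${with_cache:.2f}, savings {savings:.0%}")
```

With these numbers the saving on input tokens comes out a little under 90%, because the first call and every new user message are still billed at full price. Output tokens are not affected by prompt caching at all.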