How Are You Tracking AI API Costs in Your SaaS?
Practical patterns for tracking and optimizing AI API costs in SaaS applications -- per-user attribution, budgets, and alerts.
AI-powered SaaS applications have a cost structure that traditional SaaS does not -- every user action that triggers an LLM call or a data API request costs real money. Unlike compute costs that scale predictably with infrastructure, AI API costs scale with usage patterns that are hard to forecast. One power user running complex queries can consume more API budget than a hundred casual users.
Tracking and optimizing these costs is not optional once you pass the prototype stage. Here are the patterns that work in production.
Instrument Every API Call
The foundation of cost tracking is per-request instrumentation. Every call to an LLM or external data API should log the cost, latency, and context. At minimum, track:
- Which user or tenant triggered the request
- Which feature or workflow the request belongs to
- The model or API endpoint called
- Input and output token counts for LLM calls
- Credit cost for data API calls
- Response latency
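The dollar cost of an LLM call can be derived from its token counts. The following is a minimal sketch, assuming a per-million-token price table; the model names, the `PRICE_TABLE` constant, the `llmCostUsd` helper, and the prices themselves are all hypothetical placeholders, not real provider rates.

```typescript
// Prices in USD per one million tokens.
// PLACEHOLDER values for illustration only; replace them with the
// current rates published by your provider.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICE_TABLE: Record<string, ModelPrice> = {
  "example-small": { inputPerMTok: 1, outputPerMTok: 4 },
  "example-large": { inputPerMTok: 10, outputPerMTok: 30 },
};

// Converts token counts into a dollar cost using the price table above.
function llmCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = PRICE_TABLE[model];
  if (!price) {
    // Fail loudly so an unpriced model never gets logged as free.
    throw new Error(`No price configured for model "${model}"`);
  }
  return (
    (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) /
    1_000_000
  );
}
```

With the placeholder prices above, `llmCostUsd("example-small", 1_000_000, 500_000)` returns 3. Throwing on an unknown model is a deliberate choice in this sketch: a silent zero would hide spend from every downstream report.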
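An event record and tracking function covering these fields might look like this:

    interface ApiCostEvent {
      userId: string;
      tenantId: string;
      feature: string;
      provider: "openai" | "anthropic" | "scavio";
      endpoint: string;
      inputTokens?: number;  // LLM calls only
      outputTokens?: number; // LLM calls only
      credits?: number;      // data API calls only
      costUsd: number;
      latencyMs: number;
      timestamp: Date;
    }

    // `db` and `metrics` stand in for your application's own
    // database and metrics clients.
    async function trackApiCost(event: ApiCostEvent) {
      // Persist the raw event for later attribution queries.
      await db.apiCosts.insert(event);
      // Emit an aggregate metric tagged by provider and feature.
      await metrics.increment("api.cost.total", event.costUsd, {
        provider: event.provider,
        feature: event.feature,
      });
    }

Attribute Costs to Features, Not Just Users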
Per-user cost tracking tells you who is expensive. Per-feature cost tracking tells you what is expensive. The second is more actionable. When you know that your "competitive analysis" feature costs $0.45 per use and your "product search" feature costs $0.03, you can make informed decisions about pricing tiers and feature gating.
Tag every API call with the feature that triggered it. This lets you build dashboards that answer the questions that matter: which features are cost-effective, which need optimization, and which should be gated behind higher pricing tiers.
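Once events carry a feature tag, a per-feature roll-up is a short aggregation. This is a rough sketch, assuming events are already loaded into memory; `FeatureCostEvent`, `FeatureCostSummary`, and `summarizeByFeature` are hypothetical names. It treats each logged event as one use, so if a single use fans out into several API calls, group the events by a request ID first.

```typescript
interface FeatureCostEvent {
  feature: string;
  costUsd: number;
}

interface FeatureCostSummary {
  uses: number;
  totalUsd: number;
  avgUsdPerUse: number;
}

// Groups cost events by feature and computes total and average cost.
// Assumption: one event corresponds to one use of the feature.
function summarizeByFeature(
  events: FeatureCostEvent[],
): Map<string, FeatureCostSummary> {
  const summary = new Map<string, FeatureCostSummary>();
  for (const { feature, costUsd } of events) {
    const entry =
      summary.get(feature) ?? { uses: 0, totalUsd: 0, avgUsdPerUse: 0 };
    entry.uses += 1;
    entry.totalUsd += costUsd;
    entry.avgUsdPerUse = entry.totalUsd / entry.uses;
    summary.set(feature, entry);
  }
  return summary;
}
```

In production this aggregation would more likely run as a query in the database that stores the events; the in-memory version only illustrates the shape of the result.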
Set Budget Alerts, Not Just Limits
Hard rate limits frustrate users. Budget alerts let you respond before costs become a problem. Implement a three-tier alerting system:
- Warning -- at 70% of projected monthly budget. Review usage patterns and identify anomalies.
- Critical -- at 90% of budget. Investigate high-cost users or features and consider temporary throttling.
- Hard limit -- at 110% of budget. Degrade gracefully by switching to cached results or cheaper model tiers.
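The three tiers above can be sketched as a single classification function. The thresholds come from the list; the `BudgetTier` type and the `budgetTier` function are hypothetical names, and what happens at each tier (paging someone, throttling, switching models) is left to the caller.

```typescript
type BudgetTier = "ok" | "warning" | "critical" | "hard_limit";

// Classifies month-to-date (or projected) spend against the monthly budget.
// Thresholds: 70% warning, 90% critical, 110% hard limit.
function budgetTier(spendUsd: number, monthlyBudgetUsd: number): BudgetTier {
  if (monthlyBudgetUsd <= 0) {
    throw new Error("monthlyBudgetUsd must be positive");
  }
  const ratio = spendUsd / monthlyBudgetUsd;
  // Check the highest threshold first so each spend maps to one tier.
  if (ratio >= 1.1) return "hard_limit";
  if (ratio >= 0.9) return "critical";
  if (ratio >= 0.7) return "warning";
  return "ok";
}
```

For a $1,000 budget, a spend of $750 maps to "warning", $950 to "critical", and $1,200 to "hard_limit".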
Optimize the Data Layer First
In most AI SaaS applications, the data acquisition layer -- search APIs, web scraping, document retrieval -- is a larger cost driver than model inference. Two strategies reduce data costs significantly:
Response caching -- Cache search results for queries that are likely to repeat. A 15-minute cache on popular searches can cut data API costs by 30-50%.

    // `redis` stands in for your application's Redis client.
    async function searchWithCache(query: string, platform: string) {
      const cacheKey = `search:${platform}:${query}`;
      const cached = await redis.get(cacheKey);
      if (cached) return JSON.parse(cached);

      const response = await fetch("https://api.scavio.dev/api/v1/search", {
        method: "POST",
        headers: {
          "x-api-key": process.env.SCAVIO_API_KEY!,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ platform, query }),
      });
      // Fail before caching so an error response is never served from cache.
      if (!response.ok) {
        throw new Error(`Search request failed with status ${response.status}`);
      }
      const data = await response.json();
      // Cache for 900 seconds (15 minutes).
      await redis.setex(cacheKey, 900, JSON.stringify(data));
      return data;
    }

Request deduplication -- If multiple users trigger the same search within a short window, collapse them into a single API call.
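One common way to deduplicate requests is to share the in-flight promise among all callers asking for the same key. This is a minimal single-process sketch; `inFlight`, `dedupe`, and `fakeSearch` are hypothetical names, and a multi-instance deployment would need a shared lock or queue instead of a local `Map`.

```typescript
// Promises for requests that have started but not yet settled, keyed by query.
const inFlight = new Map<string, Promise<unknown>>();

// Returns the pending promise for `key` if one exists; otherwise starts
// the fetch and registers it until it settles.
function dedupe<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;

  const promise = fetcher().finally(() => {
    // Remove the entry on success or failure so later calls fetch fresh data.
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}

// Usage: two concurrent callers share one upstream call.
let upstreamCalls = 0;
const fakeSearch = async (): Promise<string[]> => {
  upstreamCalls += 1; // counts how often the "API" is actually hit
  return ["result"];
};

const first = dedupe("search:web:laptops", fakeSearch);
const second = dedupe("search:web:laptops", fakeSearch);
// `first` and `second` are the same promise; `upstreamCalls` is 1.
```

Deduplication complements caching rather than replacing it: the cache covers repeats after a response has arrived, while this covers repeats that arrive while the first request is still pending.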
Build a Cost Dashboard Early
Do not wait until your API bill surprises you. Build a cost dashboard during your first month of production traffic. It does not need to be fancy -- a daily summary of spend by provider, feature, and tenant is enough to catch problems early.
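A daily summary of this kind can start as a simple roll-up over logged events. This is a rough sketch, assuming events are available in memory and days are bucketed in UTC; `SpendEvent`, `Dimension`, and `dailySpend` are hypothetical names.

```typescript
interface SpendEvent {
  provider: string;
  feature: string;
  tenantId: string;
  costUsd: number;
  timestamp: Date;
}

type Dimension = "provider" | "feature" | "tenantId";

// Sums spend per UTC day for the chosen dimension.
// Keys have the form "YYYY-MM-DD|<dimension value>".
function dailySpend(
  events: SpendEvent[],
  dimension: Dimension,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const event of events) {
    const day = event.timestamp.toISOString().slice(0, 10); // UTC date
    const key = `${day}|${event[dimension]}`;
    totals.set(key, (totals.get(key) ?? 0) + event.costUsd);
  }
  return totals;
}
```

Calling it once per dimension yields the three views a first dashboard needs, without committing to a particular charting or BI tool.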
The teams that manage AI API costs well share one trait: they treat API spend as a first-class metric, right alongside revenue, churn, and latency. Track it from day one, attribute it to features, and optimize the expensive parts before they become unmanageable.