# API Cost Optimization: How Engineering Teams Cut Cloud Spend by 40%
The bill arrived and it was $23,000. Not from a DDoS. Not from runaway infrastructure. From an AI API that was called once per page view, wasn't cached, and didn't have any per-user rate limiting. The feature was three weeks old.
This is the new class of cost bug. Not just AI APIs — it includes any third-party API with consumption-based pricing: mapping APIs, data enrichment, SMS/email providers, database read credits, and CDN egress. Engineering teams that treat cost as a CFO problem instead of an engineering quality problem eventually face preventable budget shocks.
This guide covers auditing current API spend, implementing cost controls in code, and testing cost assumptions in CI.
## The API Cost Audit
Before optimizing, you need a clear picture of what you're spending:
```mermaid
flowchart TD
    A[Identify all external API dependencies] --> B
    B[Classify by cost model\nper-call / per-token / per-unit] --> C
    C[Instrument call frequency\nby feature and endpoint] --> D
    D[Calculate cost per user journey] --> E
    E{Cost per journey\nvs revenue?}
    E -->|Sustainable| F[Monitor + alert on budget]
    E -->|Unsustainable| G[Optimize call patterns]
    G --> H[Implement caching]
    G --> I[Batch & deduplicate]
    G --> J[Add rate limiting]
```
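The "calculate cost per user journey" step is plain arithmetic: multiply each API's unit price by how many times one complete journey calls it, then sum. A minimal sketch, with illustrative prices and call counts (not real quotes):

```typescript
// All per-call prices and call counts below are illustrative only.
type CallProfile = { service: string; calls: number; unitCostUsd: number };

// Sum what one complete user journey spends across external APIs.
function costPerJourney(profile: CallProfile[]): number {
  return profile.reduce((sum, p) => sum + p.calls * p.unitCostUsd, 0);
}

// Example journey: one AI summary, three geocoding lookups, one email.
const journey: CallProfile[] = [
  { service: 'ai', calls: 1, unitCostUsd: 0.002 },
  { service: 'maps', calls: 3, unitCostUsd: 0.005 },
  { service: 'email', calls: 1, unitCostUsd: 0.001 },
];

console.log(costPerJourney(journey)); // total USD for one journey
```

Compare that number against revenue per journey: a $0.018 journey is fine for a paid product and ruinous for a free ad-supported page at scale.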
## API Cost by Category
| API Category | Typical Pricing | Cost Risk | Mitigation |
|---|---|---|---|
| OpenAI GPT-4o | ~$2.50/1M input tokens | High | Caching, model selection, prompt optimization |
| Google Maps | $5–$7/1000 requests | Medium | Cache geocoding results (within ToS caching limits) |
| Twilio SMS | $0.0079/SMS | Low-Medium | Dedup, verify opt-ins, don't send test SMS to real numbers |
| SendGrid/Resend Email | $0.001/email | Low | Ensure no duplicate sends |
| Stripe API reads | Free | Low | Cache customer objects |
| AWS S3 GET | $0.0004/1000 | Low | CDN in front of S3 |
| Postgres read credits | Varies (Supabase: $0.09/million) | Medium at scale | Connection pooling, query optimization |
| IP Geolocation | $1–5/1000 | Low-Medium | Cache by IP with short TTL |
## Instrumenting API Calls
You cannot manage what you cannot measure. Wrap external API calls to record cost metrics:
```typescript
// lib/api-cost-tracking.ts
import OpenAI from 'openai';
import { createClient } from '@/lib/supabase/server';
import { getFromCache, setInCache } from '@/lib/cache'; // your cache layer (e.g. Redis-backed)

const openai = new OpenAI();

interface ApiCallMetrics {
  service: string;
  endpoint: string;
  userId?: string;
  inputTokens?: number;
  outputTokens?: number;
  estimatedCostUsd?: number;
  durationMs: number;
  cached: boolean;
}

export async function trackApiCall(metrics: ApiCallMetrics): Promise<void> {
  // Log to your analytics/observability platform
  // (PostHog, Segment, custom DB table, etc.)
  const supabase = await createClient();
  await supabase.from('api_cost_metrics').insert({
    service: metrics.service,
    endpoint: metrics.endpoint,
    user_id: metrics.userId,
    input_tokens: metrics.inputTokens,
    output_tokens: metrics.outputTokens,
    estimated_cost_usd: metrics.estimatedCostUsd,
    duration_ms: metrics.durationMs,
    cached: metrics.cached,
    created_at: new Date().toISOString(),
  });
}

// Wrapped OpenAI client
export async function chatCompletion(
  messages: Array<{ role: string; content: string }>,
  options: { userId?: string; cacheKey?: string; model?: string } = {},
) {
  const model = options.model ?? 'gpt-4o-mini'; // Default to the cheaper model
  const startTime = Date.now();

  // Check cache first
  if (options.cacheKey) {
    const cached = await getFromCache(options.cacheKey);
    if (cached) {
      await trackApiCall({
        service: 'openai',
        endpoint: 'chat.completions',
        userId: options.userId,
        durationMs: Date.now() - startTime,
        cached: true,
        estimatedCostUsd: 0,
      });
      return cached;
    }
  }

  const response = await openai.chat.completions.create({ model, messages });
  const usage = response.usage!;

  // Per-million-token prices: GPT-4o vs GPT-4o-mini
  const costPerMillionInput = model === 'gpt-4o' ? 2.5 : 0.15;
  const costPerMillionOutput = model === 'gpt-4o' ? 10.0 : 0.6;
  const estimatedCost =
    (usage.prompt_tokens * costPerMillionInput + usage.completion_tokens * costPerMillionOutput) / 1_000_000;

  await trackApiCall({
    service: 'openai',
    endpoint: 'chat.completions',
    userId: options.userId,
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    estimatedCostUsd: estimatedCost,
    durationMs: Date.now() - startTime,
    cached: false,
  });

  // Cache the result
  if (options.cacheKey) {
    await setInCache(options.cacheKey, response, { ttl: 3600 });
  }

  return response;
}
```
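A `cacheKey` only pays off if identical inputs map to the same key. One way to derive it (an assumption, not something the wrapper above prescribes) is to hash the model plus the full message list:

```typescript
import { createHash } from 'node:crypto';

// Deterministic cache key: identical (model, messages) pairs always map to
// the same key, so repeat prompts become cache hits instead of paid calls.
function promptCacheKey(
  model: string,
  messages: Array<{ role: string; content: string }>,
): string {
  const digest = createHash('sha256')
    .update(JSON.stringify({ model, messages }))
    .digest('hex');
  return `chat:${model}:${digest.slice(0, 32)}`;
}
```

Hashing the whole message array means any change to the prompt (even whitespace) produces a new key, which is the safe default; normalize inputs before hashing if you want looser matching.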
## Cost-Aware Testing
Write tests that assert on cost behavior, similar to how you write performance tests with latency budgets:
```typescript
// tests/cost/api-calls.test.ts
import { test, expect } from '@playwright/test';

test('dashboard page load does not trigger AI API calls', async ({ page }) => {
  const aiApiCalls: string[] = [];

  // Monitor outbound requests to AI providers.
  // Note: this only sees browser-initiated requests; AI calls made from your
  // backend must be caught by server-side cost tracking instead.
  page.on('request', (req) => {
    const url = req.url();
    if (
      url.includes('api.openai.com') ||
      url.includes('anthropic.com') ||
      url.includes('generativelanguage.googleapis.com')
    ) {
      aiApiCalls.push(url);
    }
  });

  await page.goto('/dashboard');
  await page.waitForLoadState('networkidle');

  // Dashboard load should NEVER trigger AI API calls (too expensive per page view)
  expect(aiApiCalls, `Unexpected AI API calls on dashboard load: ${aiApiCalls.join(', ')}`).toHaveLength(0);
});

test('scan analysis uses cached result on repeat calls', async ({ request }) => {
  const scanId = 'test-scan-123';

  // First call — generates AI analysis
  const first = await request.post('/api/scan/analyze', {
    data: { scanId },
    headers: { Authorization: `Bearer ${process.env.TEST_TOKEN}` },
  });
  expect(first.status()).toBe(200);

  // The API reports its spend in a response header
  const firstCostHeader = first.headers()['x-api-cost-usd'];
  expect(firstCostHeader, 'first call should report its API cost').toBeDefined();

  // Second identical call — should use cache
  const second = await request.post('/api/scan/analyze', {
    data: { scanId },
    headers: { Authorization: `Bearer ${process.env.TEST_TOKEN}` },
  });
  const secondCostHeader = second.headers()['x-api-cost-usd'];

  // Cached response should cost $0
  expect(parseFloat(secondCostHeader ?? '0')).toBe(0);

  // Content should be identical
  expect(await second.json()).toEqual(await first.json());
});
```
## Per-User Cost Budgets
Implement hard limits to prevent any single user from generating runaway costs:
```typescript
// lib/rate-limiting/api-budget.ts
import { redis } from '@/lib/redis';

const DAILY_AI_BUDGET_USD = 0.5; // Max $0.50 AI spend per user per day

export async function checkAndDeductApiBudget(
  userId: string,
  estimatedCostUsd: number,
): Promise<{ allowed: boolean; remainingBudget: number }> {
  // One key per user per UTC day
  const key = `api_budget:${userId}:${new Date().toISOString().slice(0, 10)}`;
  const pipeline = redis.pipeline();
  pipeline.incrbyfloat(key, estimatedCostUsd);
  pipeline.expire(key, 86400); // Auto-expire after 24h
  const results = await pipeline.exec();
  // ioredis returns [error, result] pairs; INCRBYFLOAT yields a string
  const newTotal = parseFloat(results?.[0]?.[1] as string);

  if (newTotal > DAILY_AI_BUDGET_USD) {
    // Deduct back and deny
    await redis.incrbyfloat(key, -estimatedCostUsd);
    return {
      allowed: false,
      remainingBudget: Math.max(0, DAILY_AI_BUDGET_USD - (newTotal - estimatedCostUsd)),
    };
  }

  return {
    allowed: true,
    remainingBudget: DAILY_AI_BUDGET_USD - newTotal,
  };
}
```
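A budget check has to run *before* the response exists, so it needs a pre-call estimate rather than the actual usage. A conservative sketch is to assume the worst case: every allowed output token gets generated. Prices mirror the table above; the function name is an illustration, not an existing API:

```typescript
// Per-million-token prices, matching the cost table above.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

// Worst-case pre-call estimate: charge the budget as if the model emits
// its full max_tokens allowance. Refund the difference after the real
// usage comes back if you want tighter accounting.
function estimateWorstCaseCostUsd(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * p.input + maxOutputTokens * p.output) / 1_000_000;
}
```

Feed that estimate into the per-user budget checker above and deny the request up front when even the worst case would exceed the daily cap.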
## CI Budget Test in Practice
```typescript
// tests/cost/budget-smoke.test.ts
import { test, expect } from '@playwright/test';
import { login } from '../helpers/auth'; // project-specific login helper

test('typical user session stays within $0.01 API cost budget', async ({ page }) => {
  // Simulate a typical user session: login, browse, run one scan.
  // Tag every request with a session id so server-side cost metrics
  // can be attributed back to this session.
  const sessionId = `test-session-${Date.now()}`;
  await page.setExtraHTTPHeaders({ 'x-test-session-id': sessionId });

  await page.goto('/login');
  await login(page, 'test@example.com', 'TestPassword123!');
  await page.goto('/dashboard');
  await page.click('[data-testid="new-scan-btn"]');
  await page.fill('[data-testid="scan-url"]', 'https://example.com');
  await page.click('[data-testid="start-scan"]');
  await page.waitForSelector('[data-testid="scan-complete"]', { timeout: 60_000 });

  // Query cost tracking for this session (test-only endpoint; resolves
  // against the baseURL from the Playwright config)
  const costResponse = await page.request.get(`/api/test/session-cost/${sessionId}`);
  const { totalCostUsd } = await costResponse.json();

  console.log(`Session cost: $${totalCostUsd.toFixed(4)}`);

  // Typical user session should not exceed $0.01
  expect(totalCostUsd).toBeLessThan(0.01);
});
```
Related articles: Also see optimizing CI/CD pipelines to reduce the API calls that drive costs, observability tooling to track API usage and surface cost anomalies, and aligning cost optimization with SLOs and error budget policies.
## Cost Optimization Quick Wins
| Optimization | Effort | Potential Savings |
|---|---|---|
| Cache AI responses for identical inputs | Low | 40–80% |
| Downgrade model (GPT-4o → GPT-4o-mini) | Low | 90%+ |
| Batch API calls instead of per-item | Medium | 30–50% |
| Cache geocoding results in Redis | Low | 70–90% |
| Add per-user daily spend limits | Low | Prevents runaway spend |
| Alert on 2× baseline daily spend | Low | Early warning |
| Remove AI from non-essential features | Medium | Case-by-case |
| Prompt compression (remove redundant context) | Medium | 20–40% token reduction |
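The "alert on 2× baseline daily spend" row deserves a concrete shape: compute a trailing-average baseline from recent daily totals (which the `api_cost_metrics` table above can supply) and flag today if it exceeds the baseline by the chosen factor. A minimal sketch; the 2× factor is the table's suggestion, not a universal constant:

```typescript
// Flag today's spend when it exceeds `factor` times the trailing average.
// `dailyTotalsUsd` holds recent daily totals (e.g. the last 7 days),
// aggregated from your cost metrics table.
function isSpendAnomalous(
  dailyTotalsUsd: number[],
  todayUsd: number,
  factor = 2,
): boolean {
  if (dailyTotalsUsd.length === 0) return false; // no baseline yet, no alert
  const baseline =
    dailyTotalsUsd.reduce((sum, d) => sum + d, 0) / dailyTotalsUsd.length;
  return todayUsd > baseline * factor;
}
```

Run this in a daily cron and page (or at least Slack) on a hit; the point is catching a $23,000 month in its first day, not in the invoice.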
Engineering teams that treat API costs as a quality metric — with automated budgets, cost tracking, and CI tests — avoid the billing surprises that are otherwise inevitable in a consumption-based pricing world.
Monitor your production application's behavior and health continuously: Try ScanlyApp free and set up automated checks that validate your application is performing correctly and efficiently.
