# API Cost Optimization: How Engineering Teams Cut Cloud Spend by 40%
The bill arrived and it was $23,000. Not from a DDoS. Not from runaway infrastructure. From an AI API that was called once per page view, wasn't cached, and didn't have any per-user rate limiting. The feature was three weeks old.
This is the new class of cost bug. Not just AI APIs — it includes any third-party API with consumption-based pricing: mapping APIs, data enrichment, SMS/email providers, database read credits, and CDN egress. Engineering teams that treat cost as a CFO problem instead of an engineering quality problem eventually face preventable budget shocks.
This guide covers auditing current API spend, implementing cost controls in code, and testing cost assumptions in CI.
## The API Cost Audit
Before optimizing, you need a clear picture of what you're spending:
```mermaid
flowchart TD
    A[Identify all external API dependencies] --> B
    B[Classify by cost model\nper-call / per-token / per-unit] --> C
    C[Instrument call frequency\nby feature and endpoint] --> D
    D[Calculate cost per user journey] --> E
    E{Cost per journey\nvs revenue?}
    E -->|Sustainable| F[Monitor + alert on budget]
    E -->|Unsustainable| G[Optimize call patterns]
    G --> H[Implement caching]
    G --> I[Batch & deduplicate]
    G --> J[Add rate limiting]
```
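The "calculate cost per user journey" step is plain arithmetic: multiply each API's unit price by how many times one complete journey calls it, then sum. A minimal sketch, with illustrative prices and call counts (not real quotes):

```typescript
// All per-call prices and call counts below are illustrative only.
type CallProfile = { service: string; calls: number; unitCostUsd: number };

// Sum what one complete user journey spends across external APIs.
function costPerJourney(profile: CallProfile[]): number {
  return profile.reduce((sum, p) => sum + p.calls * p.unitCostUsd, 0);
}

// Example journey: one AI summary, three geocoding lookups, one email.
const journey: CallProfile[] = [
  { service: 'ai', calls: 1, unitCostUsd: 0.002 },
  { service: 'maps', calls: 3, unitCostUsd: 0.005 },
  { service: 'email', calls: 1, unitCostUsd: 0.001 },
];

console.log(costPerJourney(journey)); // total USD for one journey
```

Compare that number against revenue per journey: a $0.018 journey is fine for a paid product and ruinous for a free ad-supported page at scale.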
## API Cost by Category
| API Category | Typical Pricing | Cost Risk | Mitigation |
|---|---|---|---|
| OpenAI GPT-4o | ~$2.50/1M input tokens | High | Caching, model selection, prompt optimization |
| Google Maps | $5–$7/1000 requests | Medium | Cache geocoding results (within ToS caching limits) |
| Twilio SMS | $0.0079/SMS | Low-Medium | Dedup, verify opt-ins, don't send test SMS to real numbers |
| SendGrid/Resend Email | $0.001/email | Low | Ensure no duplicate sends |
| Stripe API reads | Free | Low | Cache customer objects |
| AWS S3 GET | $0.0004/1000 | Low | CDN in front of S3 |
| Postgres read credits | Varies (Supabase: $0.09/million) | Medium at scale | Connection pooling, query optimization |
| IP Geolocation | $1–5/1000 | Low-Medium | Cache by IP with short TTL |
## Instrumenting API Calls
You cannot manage what you cannot measure. Wrap external API calls to record cost metrics:
```typescript
// lib/api-cost-tracking.ts
import OpenAI from 'openai';
import { createClient } from '@/lib/supabase/server';
import { getFromCache, setInCache } from '@/lib/cache'; // your cache layer (e.g. Redis-backed)

const openai = new OpenAI();

interface ApiCallMetrics {
  service: string;
  endpoint: string;
  userId?: string;
  inputTokens?: number;
  outputTokens?: number;
  estimatedCostUsd?: number;
  durationMs: number;
  cached: boolean;
}

export async function trackApiCall(metrics: ApiCallMetrics): Promise<void> {
  // Log to your analytics/observability platform
  // (PostHog, Segment, custom DB table, etc.)
  const supabase = await createClient();
  await supabase.from('api_cost_metrics').insert({
    service: metrics.service,
    endpoint: metrics.endpoint,
    user_id: metrics.userId,
    input_tokens: metrics.inputTokens,
    output_tokens: metrics.outputTokens,
    estimated_cost_usd: metrics.estimatedCostUsd,
    duration_ms: metrics.durationMs,
    cached: metrics.cached,
    created_at: new Date().toISOString(),
  });
}

// Wrapped OpenAI client
export async function chatCompletion(
  messages: Array<{ role: string; content: string }>,
  options: { userId?: string; cacheKey?: string; model?: string } = {},
) {
  const model = options.model ?? 'gpt-4o-mini'; // Default to the cheaper model
  const startTime = Date.now();

  // Check cache first
  if (options.cacheKey) {
    const cached = await getFromCache(options.cacheKey);
    if (cached) {
      await trackApiCall({
        service: 'openai',
        endpoint: 'chat.completions',
        userId: options.userId,
        durationMs: Date.now() - startTime,
        cached: true,
        estimatedCostUsd: 0,
      });
      return cached;
    }
  }

  const response = await openai.chat.completions.create({ model, messages });
  const usage = response.usage!;

  // Per-million-token prices: GPT-4o vs GPT-4o-mini
  const costPerMillionInput = model === 'gpt-4o' ? 2.5 : 0.15;
  const costPerMillionOutput = model === 'gpt-4o' ? 10.0 : 0.6;
  const estimatedCost =
    (usage.prompt_tokens * costPerMillionInput + usage.completion_tokens * costPerMillionOutput) / 1_000_000;

  await trackApiCall({
    service: 'openai',
    endpoint: 'chat.completions',
    userId: options.userId,
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    estimatedCostUsd: estimatedCost,
    durationMs: Date.now() - startTime,
    cached: false,
  });

  // Cache the result
  if (options.cacheKey) {
    await setInCache(options.cacheKey, response, { ttl: 3600 });
  }

  return response;
}
```
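A `cacheKey` only pays off if identical inputs map to the same key. One way to derive it (an assumption, not something the wrapper above prescribes) is to hash the model plus the full message list:

```typescript
import { createHash } from 'node:crypto';

// Deterministic cache key: identical (model, messages) pairs always map to
// the same key, so repeat prompts become cache hits instead of paid calls.
function promptCacheKey(
  model: string,
  messages: Array<{ role: string; content: string }>,
): string {
  const digest = createHash('sha256')
    .update(JSON.stringify({ model, messages }))
    .digest('hex');
  return `chat:${model}:${digest.slice(0, 32)}`;
}
```

Hashing the whole message array means any change to the prompt (even whitespace) produces a new key, which is the safe default; normalize inputs before hashing if you want looser matching.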
## Cost-Aware Testing
Write tests that assert on cost behavior, similar to how you write performance tests with latency budgets:
```typescript
// tests/cost/api-calls.test.ts
import { test, expect } from '@playwright/test';

test('dashboard page load does not trigger AI API calls', async ({ page }) => {
  const aiApiCalls: string[] = [];

  // Monitor outbound requests to AI providers.
  // Note: this only sees browser-initiated requests; AI calls made from your
  // backend must be caught by server-side cost tracking instead.
  page.on('request', (req) => {
    const url = req.url();
    if (
      url.includes('api.openai.com') ||
      url.includes('anthropic.com') ||
      url.includes('generativelanguage.googleapis.com')
    ) {
      aiApiCalls.push(url);
    }
  });

  await page.goto('/dashboard');
  await page.waitForLoadState('networkidle');

  // Dashboard load should NEVER trigger AI API calls (too expensive per page view)
  expect(aiApiCalls, `Unexpected AI API calls on dashboard load: ${aiApiCalls.join(', ')}`).toHaveLength(0);
});

test('scan analysis uses cached result on repeat calls', async ({ request }) => {
  const scanId = 'test-scan-123';

  // First call — generates AI analysis
  const first = await request.post('/api/scan/analyze', {
    data: { scanId },
    headers: { Authorization: `Bearer ${process.env.TEST_TOKEN}` },
  });
  expect(first.status()).toBe(200);

  // The API reports its spend in a response header
  const firstCostHeader = first.headers()['x-api-cost-usd'];
  expect(firstCostHeader, 'first call should report its API cost').toBeDefined();

  // Second identical call — should use cache
  const second = await request.post('/api/scan/analyze', {
    data: { scanId },
    headers: { Authorization: `Bearer ${process.env.TEST_TOKEN}` },
  });
  const secondCostHeader = second.headers()['x-api-cost-usd'];

  // Cached response should cost $0
  expect(parseFloat(secondCostHeader ?? '0')).toBe(0);

  // Content should be identical
  expect(await second.json()).toEqual(await first.json());
});
```
## Per-User Cost Budgets
Implement hard limits to prevent any single user from generating runaway costs:
```typescript
// lib/rate-limiting/api-budget.ts
import { redis } from '@/lib/redis';

const DAILY_AI_BUDGET_USD = 0.5; // Max $0.50 AI spend per user per day

export async function checkAndDeductApiBudget(
  userId: string,
  estimatedCostUsd: number,
): Promise<{ allowed: boolean; remainingBudget: number }> {
  // One key per user per UTC day
  const key = `api_budget:${userId}:${new Date().toISOString().slice(0, 10)}`;
  const pipeline = redis.pipeline();
  pipeline.incrbyfloat(key, estimatedCostUsd);
  pipeline.expire(key, 86400); // Auto-expire after 24h
  const results = await pipeline.exec();
  // ioredis returns [error, result] pairs; INCRBYFLOAT yields a string
  const newTotal = parseFloat(results?.[0]?.[1] as string);

  if (newTotal > DAILY_AI_BUDGET_USD) {
    // Deduct back and deny
    await redis.incrbyfloat(key, -estimatedCostUsd);
    return {
      allowed: false,
      remainingBudget: Math.max(0, DAILY_AI_BUDGET_USD - (newTotal - estimatedCostUsd)),
    };
  }

  return {
    allowed: true,
    remainingBudget: DAILY_AI_BUDGET_USD - newTotal,
  };
}
```
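A budget check has to run *before* the response exists, so it needs a pre-call estimate rather than the actual usage. A conservative sketch is to assume the worst case: every allowed output token gets generated. Prices mirror the table above; the function name is an illustration, not an existing API:

```typescript
// Per-million-token prices, matching the cost table above.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};

// Worst-case pre-call estimate: charge the budget as if the model emits
// its full max_tokens allowance. Refund the difference after the real
// usage comes back if you want tighter accounting.
function estimateWorstCaseCostUsd(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * p.input + maxOutputTokens * p.output) / 1_000_000;
}
```

Feed that estimate into the per-user budget checker above and deny the request up front when even the worst case would exceed the daily cap.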
## CI Budget Test in Practice
```typescript
// tests/cost/budget-smoke.test.ts
import { test, expect } from '@playwright/test';
import { login } from '../helpers/auth'; // project-specific login helper

test('typical user session stays within $0.01 API cost budget', async ({ page }) => {
  // Simulate a typical user session: login, browse, run one scan.
  // Tag every request with a session id so server-side cost metrics
  // can be attributed back to this session.
  const sessionId = `test-session-${Date.now()}`;
  await page.setExtraHTTPHeaders({ 'x-test-session-id': sessionId });

  await page.goto('/login');
  await login(page, 'test@example.com', 'TestPassword123!');
  await page.goto('/dashboard');
  await page.click('[data-testid="new-scan-btn"]');
  await page.fill('[data-testid="scan-url"]', 'https://example.com');
  await page.click('[data-testid="start-scan"]');
  await page.waitForSelector('[data-testid="scan-complete"]', { timeout: 60_000 });

  // Query cost tracking for this session (test-only endpoint; resolves
  // against the baseURL from the Playwright config)
  const costResponse = await page.request.get(`/api/test/session-cost/${sessionId}`);
  const { totalCostUsd } = await costResponse.json();

  console.log(`Session cost: $${totalCostUsd.toFixed(4)}`);

  // Typical user session should not exceed $0.01
  expect(totalCostUsd).toBeLessThan(0.01);
});
```
Related articles: Also see optimizing CI/CD pipelines to reduce the API calls that drive costs, observability tooling to track API usage and surface cost anomalies, and aligning cost optimization with SLOs and error budget policies.
## Cost Optimization Quick Wins
| Optimization | Effort | Potential Savings |
|---|---|---|
| Cache AI responses for identical inputs | Low | 40–80% |
| Downgrade model (GPT-4o → GPT-4o-mini) | Low | 90%+ |
| Batch API calls instead of per-item | Medium | 30–50% |
| Cache geocoding results in Redis | Low | 70–90% |
| Add per-user daily spend limits | Low | Prevents runaway spend |
| Alert on 2× baseline daily spend | Low | Early warning |
| Remove AI from non-essential features | Medium | Case-by-case |
| Prompt compression (remove redundant context) | Medium | 20–40% token reduction |
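The "alert on 2× baseline daily spend" row deserves a concrete shape: compute a trailing-average baseline from recent daily totals (which the `api_cost_metrics` table above can supply) and flag today if it exceeds the baseline by the chosen factor. A minimal sketch; the 2× factor is the table's suggestion, not a universal constant:

```typescript
// Flag today's spend when it exceeds `factor` times the trailing average.
// `dailyTotalsUsd` holds recent daily totals (e.g. the last 7 days),
// aggregated from your cost metrics table.
function isSpendAnomalous(
  dailyTotalsUsd: number[],
  todayUsd: number,
  factor = 2,
): boolean {
  if (dailyTotalsUsd.length === 0) return false; // no baseline yet, no alert
  const baseline =
    dailyTotalsUsd.reduce((sum, d) => sum + d, 0) / dailyTotalsUsd.length;
  return todayUsd > baseline * factor;
}
```

Run this in a daily cron and page (or at least Slack) on a hit; the point is catching a $23,000 month in its first day, not in the invoice.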
Engineering teams that treat API costs as a quality metric — with automated budgets, cost tracking, and CI tests — avoid the billing surprises that are otherwise inevitable in a consumption-based pricing world.
Monitor your production application's behavior and health continuously: Try ScanlyApp free and set up automated checks that validate your application is performing correctly and efficiently.
