
Database Locks and Deadlocks: How to Diagnose the Silent Killer of API Performance

Deadlocks and lock contention are among the hardest bugs to reproduce in testing environments — they require concurrent access, specific data states, and timing precision. This guide explains how to detect, reproduce, and prevent database locking issues before they hit production.

7 min read


The checkout flow worked perfectly in your staging environment. Thousands of automated test runs, never a failure. Then you released to production and started seeing sporadic 500 errors with the message: ERROR: deadlock detected.

Deadlocks and lock contention failures are the quintessential concurrency bugs: you cannot reproduce them single-threaded, they depend on precise timing, and they only emerge at production volumes. Yet they are entirely testable once you know the patterns.

This guide focuses on PostgreSQL, the database behind many modern SaaS platforms (including Supabase), but the concepts apply broadly.


What Is a Deadlock?

A deadlock occurs when two or more transactions each hold a lock that the other transaction needs, and none can proceed:

sequenceDiagram
    participant T1 as Transaction 1
    participant A as Row A
    participant B as Row B
    participant T2 as Transaction 2

    T1->>A: LOCK (acquired)
    T2->>B: LOCK (acquired)
    T1->>B: WAITING for lock...
    T2->>A: WAITING for lock...
    Note over T1,T2: 💀 DEADLOCK<br/>PostgreSQL kills one transaction

PostgreSQL detects the cycle and aborts one of the transactions with SQLSTATE 40P01 (deadlock detected). The losing transaction is rolled back and must be retried; if your application does not handle that retry, the user sees a 500.
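Conceptually, PostgreSQL's deadlock detector walks a "waits-for" graph looking for a cycle. A minimal TypeScript sketch of that idea (the graph here is hand-built to match the diagram; PostgreSQL derives it from its internal lock tables):

```typescript
// Simplified waits-for graph: each transaction maps to the
// transactions whose locks it is currently waiting on.
type WaitsFor = Map<string, string[]>;

// Depth-first walk: re-entering a transaction already on the current
// path means the waits-for graph contains a cycle, i.e. a deadlock.
function hasDeadlock(graph: WaitsFor, start: string): boolean {
  const onPath = new Set<string>();
  const visit = (txn: string): boolean => {
    if (onPath.has(txn)) return true;
    onPath.add(txn);
    const cycle = (graph.get(txn) ?? []).some((next) => visit(next));
    onPath.delete(txn);
    return cycle;
  };
  return visit(start);
}

// The scenario from the diagram: T1 waits on T2, T2 waits on T1.
const graph: WaitsFor = new Map([
  ['T1', ['T2']],
  ['T2', ['T1']],
]);
console.log(hasDeadlock(graph, 'T1')); // true
```

When the walk finds a cycle, the real detector picks a victim to abort, which is why exactly one of the two transactions gets the 40P01 error.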


Lock Types in PostgreSQL

Not all locks conflict with each other. Understanding the lock matrix is essential for writing correct concurrent queries:

Lock Mode: Conflicts With
ACCESS SHARE (SELECT): ACCESS EXCLUSIVE only
ROW SHARE (SELECT FOR UPDATE): EXCLUSIVE, ACCESS EXCLUSIVE
ROW EXCLUSIVE (INSERT/UPDATE/DELETE): SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
SHARE (CREATE INDEX): ROW EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
EXCLUSIVE: everything except ACCESS SHARE
ACCESS EXCLUSIVE (ALTER TABLE, DROP): everything

The practical implications: SELECT FOR UPDATE blocks, and is blocked by, concurrent UPDATEs on the same rows; ALTER TABLE and DROP take ACCESS EXCLUSIVE, which blocks everything, including plain SELECTs.
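The matrix can be encoded as a small lookup to reason about, or unit-test, which operations may run concurrently. This sketch covers only the modes listed above (PostgreSQL has eight table-level modes; SHARE UPDATE EXCLUSIVE and SHARE ROW EXCLUSIVE are omitted for brevity):

```typescript
// Subset of PostgreSQL's table-level lock modes (the rows above).
type LockMode =
  | 'ACCESS SHARE'
  | 'ROW SHARE'
  | 'ROW EXCLUSIVE'
  | 'SHARE'
  | 'EXCLUSIVE'
  | 'ACCESS EXCLUSIVE';

const conflictTable: Record<LockMode, LockMode[]> = {
  'ACCESS SHARE': ['ACCESS EXCLUSIVE'],
  'ROW SHARE': ['EXCLUSIVE', 'ACCESS EXCLUSIVE'],
  'ROW EXCLUSIVE': ['SHARE', 'EXCLUSIVE', 'ACCESS EXCLUSIVE'],
  'SHARE': ['ROW EXCLUSIVE', 'EXCLUSIVE', 'ACCESS EXCLUSIVE'],
  'EXCLUSIVE': ['ROW SHARE', 'ROW EXCLUSIVE', 'SHARE', 'EXCLUSIVE', 'ACCESS EXCLUSIVE'],
  'ACCESS EXCLUSIVE': ['ACCESS SHARE', 'ROW SHARE', 'ROW EXCLUSIVE', 'SHARE', 'EXCLUSIVE', 'ACCESS EXCLUSIVE'],
};

// True when a transaction holding `held` blocks one requesting `wanted`.
function conflicts(held: LockMode, wanted: LockMode): boolean {
  return conflictTable[held].includes(wanted);
}

console.log(conflicts('ACCESS SHARE', 'ROW EXCLUSIVE'));     // false: SELECT doesn't block UPDATE
console.log(conflicts('ROW EXCLUSIVE', 'ACCESS EXCLUSIVE')); // true: UPDATE blocks ALTER TABLE
```

Lock conflicts are symmetric, which makes a useful sanity check in tests: conflicts(a, b) must always equal conflicts(b, a).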


Common Deadlock Patterns in Application Code

Pattern 1: Inconsistent Update Order

The most common deadlock in application code: two transactions updating the same rows in different orders:

// ❌ DEADLOCK RISK: Two paths update rows in different orders

// Handler A: processes order first, then inventory
async function handleOrderA(orderId: string) {
  await db.transaction(async (trx) => {
    await trx('orders').where({ id: orderId }).update({ status: 'processing' });
    await trx('inventory').where({ product_id: '123' }).decrement('quantity', 1);
  });
}

// Handler B: updates inventory first, then order
async function handleOrderB(orderId: string) {
  await db.transaction(async (trx) => {
    await trx('inventory').where({ product_id: '123' }).decrement('quantity', 1);
    await trx('orders').where({ id: orderId }).update({ status: 'confirmed' });
  });
}

Fix: Always update tables/rows in the same canonical order throughout your codebase:

// ✅ Consistent order: inventory first, then orders — always
async function handleOrderConsistent(orderId: string) {
  await db.transaction(async (trx) => {
    // Always acquire locks in alphabetical or defined canonical order
    await trx('inventory').where({ product_id: '123' }).decrement('quantity', 1);
    await trx('orders').where({ id: orderId }).update({ status: 'processing' });
  });
}
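The canonical order can also be enforced mechanically rather than by convention: collect everything a transaction will lock, sort it, then execute in that order. A sketch of the idea (the LockTarget shape and helper name are illustrative, not from the handlers above):

```typescript
// One row a transaction intends to lock.
interface LockTarget {
  table: string;
  key: string;
}

// Sort by (table, key) so every code path acquires row locks in the
// same global order, making lock-order cycles impossible by construction.
function canonicalLockOrder(targets: LockTarget[]): LockTarget[] {
  return [...targets].sort(
    (a, b) => a.table.localeCompare(b.table) || a.key.localeCompare(b.key),
  );
}

const ordered = canonicalLockOrder([
  { table: 'orders', key: 'order-9' },
  { table: 'inventory', key: 'product-123' },
]);
console.log(ordered.map((t) => t.table)); // ['inventory', 'orders']
```

Sorting on row keys as well as table names matters: two transactions updating overlapping sets of rows in the same table can deadlock just as easily as two transactions crossing tables.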

Pattern 2: Missing Index on FK with Concurrent Deletes

-- Parent-child delete with an unindexed FK on the child table
DELETE FROM orders WHERE user_id = $1;

-- If order_items.order_id has no index, PostgreSQL must sequentially
-- scan order_items for every deleted order row to enforce the FK,
-- holding its locks far longer and widening the deadlock window.
-- Fix (non-blocking): CREATE INDEX CONCURRENTLY ON order_items (order_id);

Testing for Deadlocks

Concurrent Request Simulation with Playwright

// tests/concurrency/deadlock.test.ts
import { test, expect } from '@playwright/test';

test('concurrent checkout does not deadlock', async ({ browser }) => {
  // Create two independent contexts (two browser sessions)
  const contextA = await browser.newContext();
  const contextB = await browser.newContext();

  const pageA = await contextA.newPage();
  const pageB = await contextB.newPage();

  // Authenticate both sessions
  await authenticateAs(pageA, 'user-a@test.com');
  await authenticateAs(pageB, 'user-b@test.com');

  // Navigate both to the same limited-stock product
  await Promise.all([pageA.goto('/products/limited-item'), pageB.goto('/products/limited-item')]);

  // Both attempt to add to cart simultaneously
  await Promise.all([pageA.click('[data-testid="add-to-cart"]'), pageB.click('[data-testid="add-to-cart"]')]);

  // Both attempt checkout simultaneously. Start waiting for the responses
  // before the clicks that trigger them, so neither response is missed.
  // (The checkout testid is an assumption, mirroring add-to-cart.)
  const checkoutResponses = Promise.all([
    pageA.waitForResponse('**/api/checkout/initiate'),
    pageB.waitForResponse('**/api/checkout/initiate'),
  ]);
  await Promise.all([
    pageA.click('[data-testid="checkout"]'),
    pageB.click('[data-testid="checkout"]'),
  ]);
  const [responseA, responseB] = await checkoutResponses;

  // Both should succeed or at most one should fail gracefully (not 500)
  const statuses = [responseA.status(), responseB.status()];

  // At least one should succeed
  expect(statuses).toContain(200);

  // Neither should return a 500 (unhandled deadlock)
  expect(statuses).not.toContain(500);

  await contextA.close();
  await contextB.close();
});

API-Level Concurrent Request Test

// tests/concurrency/api-concurrent.test.ts
import { test, expect } from '@playwright/test';
test('inventory decrement is safe under concurrent requests', async ({ request }) => {
  const productId = 'test-product-with-qty-1';

  // Reset inventory to exactly 1 unit
  await request.post('/api/test/reset-inventory', {
    data: { productId, quantity: 1 },
  });

  // Fire 10 concurrent requests to purchase the single item
  const responses = await Promise.all(
    Array.from({ length: 10 }, () =>
      request.post('/api/cart/purchase', {
        data: { productId, quantity: 1 },
      }),
    ),
  );

  const statusCodes = responses.map((r) => r.status());
  const successes = statusCodes.filter((s) => s === 200).length;
  const expectedFailures = statusCodes.filter((s) => s === 409).length; // Out of stock
  const serverErrors = statusCodes.filter((s) => s >= 500).length;

  // Exactly 1 should succeed (only 1 unit in stock)
  expect(successes).toBe(1);

  // The rest should get a graceful 409, not a 500
  expect(serverErrors).toBe(0);
  expect(expectedFailures).toBe(9);
});
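For the test above to pass, the endpoint itself must decrement stock atomically. One way to get there, sketched with an injected query function so the logic can be exercised without a live database (the SQL, handler name, and status codes are assumptions, not the article's actual implementation): fold the stock check and the decrement into a single conditional UPDATE, so the row lock PostgreSQL takes for the UPDATE serializes concurrent buyers.

```typescript
// Minimal query interface: returns how many rows the statement touched.
type QueryFn = (sql: string, params: unknown[]) => Promise<{ rowCount: number }>;

// 200 when a unit was reserved, 409 when stock ran out.
async function purchase(query: QueryFn, productId: string): Promise<number> {
  const result = await query(
    // Guard and decrement in one statement: of N concurrent callers,
    // only those that still see quantity >= 1 match the WHERE clause.
    `UPDATE inventory
        SET quantity = quantity - 1
      WHERE product_id = $1
        AND quantity >= 1`,
    [productId],
  );
  return result.rowCount === 1 ? 200 : 409;
}
```

Because the guard lives in the WHERE clause, there is no SELECT-then-UPDATE window for a second request to slip through: the second transaction blocks on the row lock, re-evaluates the condition once the first commits, matches zero rows, and gets the 409 path.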

PostgreSQL Deadlock Diagnostics

When a deadlock occurs in production, PostgreSQL logs it. Here's how to enable and parse those logs:

-- Enable deadlock logging in postgresql.conf
-- (or via Supabase dashboard → Database → Connection Pool → Logs)
log_lock_waits = on       -- log whenever a session waits on a lock longer than deadlock_timeout
deadlock_timeout = 500ms  -- also the delay before the deadlock detector runs (default 1s)

-- Query current locks and what's blocking what
SELECT
  blocked_locks.pid AS blocked_pid,
  blocking_locks.pid AS blocking_pid,
  blocked_activity.query AS blocked_query,
  blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
  AND blocking_locks.relation = blocked_locks.relation
  AND blocking_locks.pid != blocked_locks.pid
  AND blocking_locks.granted
  AND NOT blocked_locks.granted
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid;

Connection Pool Configuration

Connection pool exhaustion produces errors that look like database errors but are actually infrastructure issues:

// Recommended Supabase/pg connection pool configuration
// server/db.ts

import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,

  // Max connections this service will open.
  // A common sizing heuristic for the database as a whole is
  // (CPU cores × 2) + effective disk spindles; divide that budget
  // across every app instance sharing the database.
  max: 10,

  // Kill idle connections after 10s to free pool slots
  idleTimeoutMillis: 10_000,

  // Fail fast if connection cannot be obtained in 5s
  // (better to fail immediately than have users waiting)
  connectionTimeoutMillis: 5_000,

  // Retry queries on transient errors (including deadlocks)
  // Pool-level retry is not built-in; use application-level retry below
});

// Application-level retry wrapper for deadlock-safe queries
export async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3, delayMs = 100): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err: unknown) {
      // Prefer SQLSTATE codes (node-postgres exposes them as err.code):
      // 40P01 = deadlock detected, 40001 = serialization failure.
      const code = (err as { code?: string }).code;
      const message = (err as Error).message ?? '';
      const isDeadlock = code === '40P01' || message.includes('deadlock detected');
      const isRetryable = code === '40001' || message.includes('could not serialize');

      if ((isDeadlock || isRetryable) && attempt < maxRetries - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
        continue;
      }
      throw err;
    }
  }
  throw new Error('Unreachable');
}

Related articles: see the full database performance optimisation checklist for what lies beyond lock management, testing practices that catch concurrency issues before production, and the distributed transaction challenges in microservices that cause cross-service deadlocks.


Prevention Checklist

Category: Action (Priority)
Query order: acquire locks in one canonical order everywhere (🔴 Critical)
Indexes: index every foreign key column (🔴 Critical)
Transactions: keep transactions short-lived (🔴 Critical)
Retry logic: retry deadlock errors with exponential backoff (🔴 Critical)
DDL changes: run schema changes in maintenance windows (🟡 High)
SELECT FOR UPDATE: use SKIP LOCKED for queue patterns (🟡 High)
Monitoring: log and alert on lock waits over 1s (🟡 High)
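The SKIP LOCKED entry is worth spelling out: for worker-queue workloads, SELECT ... FOR UPDATE SKIP LOCKED lets each worker claim a different pending row instead of piling up behind the same row lock. A hedged sketch of the claim statement (the jobs table and its columns are illustrative):

```typescript
// Each worker runs this statement: lock and claim at most one pending
// job, skipping rows another worker has already locked rather than
// blocking on them.
const CLAIM_JOB_SQL = `
  UPDATE jobs
     SET status = 'running', started_at = now()
   WHERE id = (
     SELECT id
       FROM jobs
      WHERE status = 'pending'
      ORDER BY created_at
      LIMIT 1
        FOR UPDATE SKIP LOCKED
   )
  RETURNING id;
`;
```

A worker that gets zero rows back simply sleeps and polls again; no two workers ever contend for the same job row, which removes the lock queue that plain FOR UPDATE would create.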

Deadlocks and lock contention are reproducible and testable when you instrument your staging database correctly and run concurrent load tests.

Test your database layer under concurrent load: Try ScanlyApp free and validate your API endpoints stay healthy under simultaneous user sessions.
