
Testing in Production Safely: 6 Techniques That Will Not Cost You Customer Trust

Learn how to safely test in production with feature flags, progressive delivery, canary deployments, synthetic monitoring, and chaos engineering techniques.

Michael Chen

Senior Test Architect

9 min read
Testing in Production Safely: 6 Techniques That Will Not Cost You Customer Trust

"We don't test in production." Every QA team says it. But here's the uncomfortable truth: you're already testing in production—you're just not doing it intentionally, safely, or measurably.

No staging environment perfectly replicates production load, data diversity, or edge cases. The moment you deploy, you're running an experiment on real users. The question isn't whether to test in production, but how to do it safely and effectively.

This guide covers modern strategies for testing in production: feature flags, progressive delivery, canary deployments, synthetic monitoring, and controlled chaos—all designed to catch issues before they impact your entire user base.

Table of Contents

  1. Why Test in Production?
  2. Feature Flags for Safe Testing
  3. Progressive Delivery Strategies
  4. Canary Deployments
  5. A/B Testing for Quality
  6. Synthetic Monitoring
  7. Chaos Engineering in Production
  8. Real User Monitoring
  9. Rollback Strategies
  10. Best Practices

Why Test in Production?

Limitations of Staging Environments

| Issue | Staging Reality | Production Reality | Testing Gap |
| --- | --- | --- | --- |
| Data volume | 1,000 records | 10,000,000 records | ❌ Missing scale issues |
| User behavior | QA scripts | Unpredictable patterns | ❌ Missing edge cases |
| Load | Minimal | 10,000 req/sec | ❌ Missing performance issues |
| Integrations | Mocked/stubbed | Real third-parties | ❌ Missing integration failures |
| Network | Reliable LAN | Global, variable latency | ❌ Missing network issues |

The Case for Intentional Production Testing

Catch real-world edge cases: Actual user behavior reveals bugs QA never imagined
Validate at scale: True performance only visible with production load
Verify third-party integrations: Staging mocks don't catch API changes
Test with real data: Data diversity exposes validation issues
Reduce risk: Gradual rollouts limit blast radius

Feature Flags for Safe Testing

Basic Feature Flag Implementation

// lib/feature-flags.ts
export interface FeatureFlags {
  newCheckoutFlow: boolean;
  enhancedSearch: boolean;
  aiRecommendations: boolean;
}

export class FeatureFlagService {
  private flags: Map<string, boolean> = new Map();

  constructor(private userId?: string) {}

  async initialize() {
    // Fetch flags from config service
    const response = await fetch('/api/feature-flags', {
      headers: {
        'X-User-ID': this.userId || '',
      },
    });

    const flags = await response.json();

    Object.entries(flags).forEach(([key, value]) => {
      this.flags.set(key, value as boolean);
    });
  }

  isEnabled(flag: keyof FeatureFlags): boolean {
    return this.flags.get(flag) ?? false;
  }

  // Testing helper: force enable flag
  forceEnable(flag: keyof FeatureFlags) {
    this.flags.set(flag, true);
  }
}

// Usage in application (inside a React component, i.e. a .tsx file)
const featureFlags = new FeatureFlagService(currentUser.id);
await featureFlags.initialize();

if (featureFlags.isEnabled('newCheckoutFlow')) {
  return <NewCheckoutFlow />;
} else {
  return <LegacyCheckoutFlow />;
}

Testing Feature Flags with Playwright

// tests/feature-flags.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Feature Flag Testing', () => {
  test('new checkout flow - flag enabled', async ({ page, context }) => {
    // Enable feature flag via cookie
    await context.addCookies([
      {
        name: 'feature_new_checkout',
        value: 'true',
        domain: 'localhost',
        path: '/',
      },
    ]);

    await page.goto('/checkout');

    // Verify new flow is active
    await expect(page.locator('[data-test="new-checkout-flow"]')).toBeVisible();
    await expect(page.locator('[data-test="legacy-checkout"]')).not.toBeVisible();
  });

  test('legacy checkout flow - flag disabled', async ({ page }) => {
    await page.goto('/checkout');

    // Verify legacy flow is active
    await expect(page.locator('[data-test="legacy-checkout"]')).toBeVisible();
    await expect(page.locator('[data-test="new-checkout-flow"]')).not.toBeVisible();
  });

  test('feature flag toggle works in real-time', async ({ page, context }) => {
    await page.goto('/dashboard');

    // Initially disabled
    await expect(page.locator('[data-test="ai-recommendations"]')).not.toBeVisible();

    // Enable via browser console (simulating hot-reload)
    await page.evaluate(() => {
      localStorage.setItem('feature_ai_recommendations', 'true');
      window.dispatchEvent(new Event('feature-flags-updated'));
    });

    // expect() retries automatically until the element appears, so no fixed wait is needed
    await expect(page.locator('[data-test="ai-recommendations"]')).toBeVisible({ timeout: 2000 });
  });
});

Percentage-Based Rollouts

// lib/progressive-rollout.ts
export class ProgressiveRollout {
  /**
   * Determine if a feature should be enabled for a given user
   * @param userId - Unique user identifier
   * @param rolloutPercentage - Percentage of users to enable (0-100)
   * @param featureName - Feature identifier for consistent hashing
   */
  isEnabledForUser(userId: string, rolloutPercentage: number, featureName: string): boolean {
    if (rolloutPercentage >= 100) return true;
    if (rolloutPercentage <= 0) return false;

    // Consistent hashing: same user always gets same result
    const hash = this.hashCode(`${userId}:${featureName}`);
    const bucket = Math.abs(hash % 100);

    return bucket < rolloutPercentage;
  }

  private hashCode(str: string): number {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      const char = str.charCodeAt(i);
      hash = (hash << 5) - hash + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return hash;
  }
}

// Usage
const rollout = new ProgressiveRollout();

// 10% rollout
if (rollout.isEnabledForUser(user.id, 10, 'new-dashboard')) {
  // User is in 10% group
}

// Testing: verify consistent assignment
test('users consistently assigned to rollout groups', () => {
  const rollout = new ProgressiveRollout();
  const userId = 'user-123';

  const result1 = rollout.isEnabledForUser(userId, 50, 'feature-x');
  const result2 = rollout.isEnabledForUser(userId, 50, 'feature-x');

  // Same user should always get same result
  expect(result1).toBe(result2);
});

test('rollout percentage is approximately correct', () => {
  const rollout = new ProgressiveRollout();
  const testUsers = Array.from({ length: 10000 }, (_, i) => `user-${i}`);

  const enabledCount = testUsers.filter((userId) => rollout.isEnabledForUser(userId, 25, 'test-feature')).length;

  const actualPercentage = (enabledCount / testUsers.length) * 100;

  // Should be close to 25% (within 2% margin)
  expect(actualPercentage).toBeGreaterThan(23);
  expect(actualPercentage).toBeLessThan(27);
});

Progressive Delivery Strategies

Ring Deployment Structure

graph TB
    A[New Feature] --> B[Ring 0: Internal<br/>5 minutes]
    B --> C[Ring 1: Beta Users<br/>1 hour]
    C --> D[Ring 2: 10% Users<br/>6 hours]
    D --> E[Ring 3: 50% Users<br/>24 hours]
    E --> F[Ring 4: 100% Users<br/>Full rollout]

    B -.-> G[Monitor: Errors, Latency]
    C -.-> G
    D -.-> G
    E -.-> G

    G -->|Issues Detected| H[Auto-Rollback]

    style F fill:#90EE90
    style H fill:#FF6B6B

Automated Progressive Rollout

// lib/progressive-delivery.ts
interface RolloutStage {
  name: string;
  percentage: number;
  duration: number; // minutes
  successCriteria: {
    maxErrorRate: number;
    maxLatencyP95: number;
    minSuccessRate: number;
  };
}

export class ProgressiveDeliveryController {
  private stages: RolloutStage[] = [
    {
      name: 'Internal',
      percentage: 0,
      duration: 5,
      successCriteria: {
        maxErrorRate: 0.01,
        maxLatencyP95: 500,
        minSuccessRate: 0.99,
      },
    },
    {
      name: 'Beta',
      percentage: 1,
      duration: 60,
      successCriteria: {
        maxErrorRate: 0.005,
        maxLatencyP95: 400,
        minSuccessRate: 0.995,
      },
    },
    {
      name: 'Small',
      percentage: 10,
      duration: 360,
      successCriteria: {
        maxErrorRate: 0.003,
        maxLatencyP95: 350,
        minSuccessRate: 0.997,
      },
    },
    {
      name: 'Large',
      percentage: 50,
      duration: 1440,
      successCriteria: {
        maxErrorRate: 0.002,
        maxLatencyP95: 300,
        minSuccessRate: 0.998,
      },
    },
    {
      name: 'Full',
      percentage: 100,
      duration: 0,
      successCriteria: {
        maxErrorRate: 0.001,
        maxLatencyP95: 250,
        minSuccessRate: 0.999,
      },
    },
  ];

  async executeRollout(featureName: string): Promise<void> {
    for (const stage of this.stages) {
      console.log(`🚀 Starting ${stage.name} rollout (${stage.percentage}%)`);

      // Update feature flag percentage
      await this.updateFeatureFlag(featureName, stage.percentage);

      // Wait for stage duration
      await this.sleep(stage.duration * 60 * 1000);

      // Check metrics
      const metrics = await this.getMetrics(featureName);

      if (!this.meetsSuccessCriteria(metrics, stage.successCriteria)) {
        console.error(`❌ ${stage.name} stage failed criteria. Rolling back.`);
        await this.rollback(featureName);
        throw new Error(`Rollout failed at ${stage.name} stage`);
      }

      console.log(`✅ ${stage.name} stage passed`);
    }

    console.log(`🎉 Full rollout complete for ${featureName}`);
  }

  private meetsSuccessCriteria(metrics: any, criteria: RolloutStage['successCriteria']): boolean {
    return (
      metrics.errorRate <= criteria.maxErrorRate &&
      metrics.latencyP95 <= criteria.maxLatencyP95 &&
      metrics.successRate >= criteria.minSuccessRate
    );
  }

  private async updateFeatureFlag(feature: string, percentage: number) {
    await fetch('/api/admin/feature-flags', {
      method: 'PATCH',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ feature, percentage }),
    });
  }

  private async getMetrics(feature: string) {
    const response = await fetch(`/api/metrics?feature=${feature}`);
    return await response.json();
  }

  private async rollback(feature: string) {
    await this.updateFeatureFlag(feature, 0);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}

Canary Deployments

Canary Testing with Health Checks

// tests/canary.spec.ts
test.describe('Canary Deployment Validation', () => {
  const CANARY_URL = process.env.CANARY_URL || 'https://canary.example.com';
  const PRODUCTION_URL = 'https://example.com';

  test('canary health check passes', async ({ request }) => {
    const response = await request.get(`${CANARY_URL}/health`);

    expect(response.status()).toBe(200);

    const health = await response.json();
    expect(health.status).toBe('healthy');
    expect(health.version).toMatch(/^\d+\.\d+\.\d+$/);
  });

  test('canary performance matches production', async ({ request }) => {
    const endpoints = ['/api/users', '/api/products', '/api/orders'];

    for (const endpoint of endpoints) {
      // Test canary
      const canaryStart = Date.now();
      const canaryResponse = await request.get(`${CANARY_URL}${endpoint}`);
      const canaryDuration = Date.now() - canaryStart;

      // Test production
      const prodStart = Date.now();
      const prodResponse = await request.get(`${PRODUCTION_URL}${endpoint}`);
      const prodDuration = Date.now() - prodStart;

      // Canary should be within 50% of production performance
      expect(canaryDuration).toBeLessThan(prodDuration * 1.5);

      console.log(`${endpoint}: Canary ${canaryDuration}ms vs Prod ${prodDuration}ms`);
    }
  });

  test('canary error rate acceptable', async ({ request }) => {
    const requests = 100;
    let errors = 0;

    const promises = Array.from({ length: requests }, async () => {
      try {
        const response = await request.get(`${CANARY_URL}/api/test`);
        if (response.status() >= 500) errors++; // count server errors, not just network failures
      } catch {
        errors++; // network-level failure
      }
    });

    await Promise.all(promises);

    const errorRate = errors / requests;

    // Error rate should be < 1%
    expect(errorRate).toBeLessThan(0.01);
  });
});

A/B Testing for Quality

A/B Test with Quality Metrics

// lib/ab-testing.ts
export class ABTestingFramework {
  assignVariant(userId: string, testName: string): 'A' | 'B' {
    const hash = this.hash(`${userId}:${testName}`);
    return hash % 2 === 0 ? 'A' : 'B';
  }

  trackEvent(userId: string, testName: string, eventName: string, value?: any) {
    const variant = this.assignVariant(userId, testName);

    // Send to analytics
    fetch('/api/analytics', {
      method: 'POST',
      body: JSON.stringify({
        userId,
        testName,
        variant,
        eventName,
        value,
        timestamp: new Date().toISOString(),
      }),
    });
  }

  private hash(str: string): number {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      hash = (hash << 5) - hash + str.charCodeAt(i);
    }
    return Math.abs(hash);
  }
}

// Usage: A/B test checkout flows
const abTest = new ABTestingFramework();

test('A/B test - variant A performance', async ({ page }) => {
  // Force user into variant A
  await page.addInitScript(() => {
    localStorage.setItem('ab_checkout_variant', 'A');
  });

  await page.goto('/checkout');

  const startTime = Date.now();
  await page.click('[data-test="complete-purchase"]');
  await page.waitForURL('/confirmation');
  const duration = Date.now() - startTime;

  // Track completion time
  expect(duration).toBeLessThan(5000);
  console.log(`Variant A: ${duration}ms`);
});

test('A/B test - variant B performance', async ({ page }) => {
  // Force user into variant B
  await page.addInitScript(() => {
    localStorage.setItem('ab_checkout_variant', 'B');
  });

  await page.goto('/checkout');

  const startTime = Date.now();
  await page.click('[data-test="complete-purchase"]');
  await page.waitForURL('/confirmation');
  const duration = Date.now() - startTime;

  expect(duration).toBeLessThan(5000);
  console.log(`Variant B: ${duration}ms`);
});
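Once both variants have collected data, you need a way to decide whether an observed difference is real or just noise. Below is a minimal two-proportion z-test sketch; the `VariantStats` shape and the 1.96 threshold (roughly 95% confidence) are illustrative assumptions, not part of the framework above:

```typescript
interface VariantStats {
  conversions: number;
  samples: number;
}

// Two-proportion z-test: is B's conversion rate significantly different from A's?
function isSignificant(a: VariantStats, b: VariantStats, zThreshold = 1.96): boolean {
  const pA = a.conversions / a.samples;
  const pB = b.conversions / b.samples;

  // Pooled proportion under the null hypothesis (no real difference)
  const pooled = (a.conversions + b.conversions) / (a.samples + b.samples);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / a.samples + 1 / b.samples));

  const z = Math.abs(pA - pB) / standardError;
  return z >= zThreshold; // ~95% confidence at 1.96
}

// Example: 5.2% vs 6.1% conversion over 10,000 users each
console.log(isSignificant({ conversions: 520, samples: 10000 }, { conversions: 610, samples: 10000 })); // → true
```

Declaring a winner before significance is reached is one of the most common ways A/B tests mislead, so gate any rollout decision on a check like this.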

Synthetic Monitoring

Continuous Production Monitoring

// tests/synthetic-monitoring.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Synthetic Monitoring - Production', () => {
  test.use({ baseURL: 'https://example.com' });

  test('critical user journey - signup to first purchase', async ({ page }) => {
    // Track journey timing
    const journeyStart = Date.now();

    // 1. Signup
    await page.goto('/signup');
    const email = `monitor-${Date.now()}@example.com`;
    await page.fill('[data-test="email"]', email);
    await page.fill('[data-test="password"]', 'MonitorPass123!');
    await page.click('[data-test="signup"]');

    await expect(page).toHaveURL('/dashboard', { timeout: 5000 });

    // 2. Browse products
    await page.goto('/products');
    await page.waitForSelector('[data-test="product-card"]', { timeout: 3000 });

    // 3. Add to cart
    await page.click('[data-test="product-1"] [data-test="add-to-cart"]');
    await expect(page.locator('[data-test="cart-count"]')).toHaveText('1', { timeout: 2000 });

    // 4. Checkout
    await page.goto('/checkout');
    await page.fill('[data-test="card-number"]', '4242424242424242');
    await page.fill('[data-test="card-expiry"]', '12/25');
    await page.fill('[data-test="card-cvc"]', '123');
    await page.click('[data-test="pay"]');

    // 5. Confirmation
    await expect(page).toHaveURL(/\/confirmation/, { timeout: 10000 });

    const journeyDuration = Date.now() - journeyStart;

    // Report to monitoring service
    await reportMetric('critical_journey_duration', journeyDuration);

    // Verify reasonable performance
    expect(journeyDuration).toBeLessThan(30000); // 30 seconds max
  });

  test('API availability - all critical endpoints', async ({ request }) => {
    const endpoints = ['/api/health', '/api/users/me', '/api/products', '/api/orders'];

    for (const endpoint of endpoints) {
      const start = Date.now();
      const response = await request.get(endpoint, {
        headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
      });
      const duration = Date.now() - start;

      expect(response.status()).toBeLessThan(400);
      expect(duration).toBeLessThan(1000);

      await reportMetric(`api_latency_${endpoint}`, duration);
    }
  });
});

async function reportMetric(name: string, value: number) {
  try {
    await fetch('https://monitoring.example.com/metrics', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        metric: name,
        value,
        timestamp: Date.now(),
        environment: 'production',
      }),
    });
  } catch {
    // Metric reporting must never fail the monitoring test itself
  }
}

Scheduled Synthetic Checks

# .github/workflows/synthetic-monitoring.yml
name: Synthetic Monitoring

on:
  schedule:
    - cron: '*/5 * * * *' # Every 5 minutes
  workflow_dispatch:

jobs:
  synthetic-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Run synthetic monitoring tests
        run: npx playwright test tests/synthetic-monitoring.spec.ts
        env:
          BASE_URL: https://example.com
          API_TOKEN: ${{ secrets.PROD_API_TOKEN }}

      - name: Alert on failure
        if: failure()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: '🚨 Synthetic monitoring failed!'
          webhook_url: ${{ secrets.SLACK_WEBHOOK }}

Chaos Engineering in Production

Controlled Failure Injection

// lib/chaos.ts
export class ChaosMonkey {
  constructor(private enabled: boolean = false) {}

  async withRandomLatency<T>(fn: () => Promise<T>, maxLatency: number = 1000): Promise<T> {
    // Inject extra latency into roughly 10% of calls when chaos is enabled
    if (this.enabled && Math.random() > 0.9) {
      const delay = Math.random() * maxLatency;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
    return await fn();
  }

  async withRandomFailure<T>(fn: () => Promise<T>, failureRate: number = 0.05): Promise<T> {
    if (this.enabled && Math.random() < failureRate) {
      throw new Error('Chaos Monkey: Simulated failure');
    }
    return await fn();
  }
}

// Usage in API
const chaos = new ChaosMonkey(process.env.CHAOS_ENABLED === 'true' && process.env.NODE_ENV === 'production');

app.get('/api/products', async (req, res) => {
  try {
    const products = await chaos.withRandomLatency(() => db.query('SELECT * FROM products'), 2000);

    res.json(products);
  } catch (error) {
    res.status(500).json({ error: 'Service unavailable' });
  }
});
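Injecting failures is only half the exercise: you also need to verify that your resilience mechanisms actually absorb them. A minimal sketch of that idea, pairing a failure injector (mirroring the ChaosMonkey above) with a retry helper; both are illustrative, not part of any library:

```typescript
// Minimal failure injector, mirroring ChaosMonkey.withRandomFailure above
class FailureInjector {
  constructor(private failureRate: number) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (Math.random() < this.failureRate) {
      throw new Error('Injected failure');
    }
    return fn();
  }
}

// Retry helper: the resilience mechanism under test
async function withRetry<T>(fn: () => Promise<T>, attempts = 5): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// With a 30% injected failure rate and 5 attempts, a call fails only with
// probability 0.3^5 ≈ 0.24%, so callers should almost never see an error
async function resilientFetch(): Promise<string> {
  const chaos = new FailureInjector(0.3);
  return withRetry(() => chaos.call(async () => 'ok'));
}
```

The point of running this in production-like conditions is to confirm the retry budget is large enough for your real failure rate, not just the simulated one.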

Rollback Strategies

Instant Rollback via Feature Flags

// lib/emergency-rollback.ts
export class EmergencyRollback {
  async killSwitch(featureName: string): Promise<void> {
    console.error(`🚨 KILL SWITCH ACTIVATED: ${featureName}`);

    // Disable feature flag immediately
    await fetch('/api/admin/feature-flags/disable', {
      method: 'POST',
      body: JSON.stringify({ feature: featureName }),
      headers: {
        Authorization: `Bearer ${process.env.ADMIN_TOKEN}`,
        'Content-Type': 'application/json',
      },
    });

    // Clear CDN cache
    await this.purgeCDN();

    // Notify team
    await this.notifyTeam(`Feature ${featureName} rolled back`);
  }

  private async purgeCDN() {
    // Implementation depends on CDN provider
  }

  private async notifyTeam(message: string) {
    await fetch(process.env.SLACK_WEBHOOK!, {
      method: 'POST',
      body: JSON.stringify({ text: message }),
    });
  }
}

Best Practices

Production Testing Checklist

| Strategy | Risk Level | Rollback Time | When to Use |
| --- | --- | --- | --- |
| Feature Flags | 🟢 Low | Instant | Always |
| Canary (5%) | 🟡 Medium | < 5 min | Major releases |
| Progressive (10→50→100) | 🟡 Medium | < 15 min | New features |
| A/B Testing | 🟢 Low | Instant | UX changes |
| Chaos Engineering | 🟡 Medium | N/A | Resilience validation |
| Synthetic Monitoring | 🟢 Low | N/A | Always |

Key Principles

  1. Always have a kill switch: Feature flags enable instant rollback
  2. Monitor everything: Errors, latency, success rate, user behavior
  3. Start small: 1% → 10% → 50% → 100%
  4. Automate rollback: Set thresholds and auto-revert on breach
  5. Separate deploy from release: Ship dark, enable gradually
  6. Test the rollback: Practice emergency procedures
  7. Communicate clearly: Alert team before/during/after tests
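Principle 4 can be sketched as a small watchdog that polls health metrics and flips the flag off the moment a threshold is breached. In this sketch, the `getMetrics` and `disableFlag` callbacks are assumptions you would wire to your own monitoring and feature-flag services:

```typescript
interface HealthMetrics {
  errorRate: number;  // fraction of failed requests, e.g. 0.002
  latencyP95: number; // milliseconds
}

interface RollbackThresholds {
  maxErrorRate: number;
  maxLatencyP95: number;
}

// True if any threshold is breached and rollback should fire
function shouldRollback(metrics: HealthMetrics, thresholds: RollbackThresholds): boolean {
  return metrics.errorRate > thresholds.maxErrorRate || metrics.latencyP95 > thresholds.maxLatencyP95;
}

// Watchdog: poll metrics on an interval and auto-revert on breach
function watchRollout(
  feature: string,
  getMetrics: () => Promise<HealthMetrics>,        // assumption: your monitoring API
  disableFlag: (feature: string) => Promise<void>, // assumption: your flag service
  thresholds: RollbackThresholds,
  intervalMs = 30_000,
): void {
  const timer = setInterval(async () => {
    const metrics = await getMetrics();
    if (shouldRollback(metrics, thresholds)) {
      clearInterval(timer);
      await disableFlag(feature); // instant kill switch, no redeploy
    }
  }, intervalMs);
}
```

Keeping the breach check a pure function (`shouldRollback`) makes the rollback policy itself unit-testable, which matters when this code is your last line of defense.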

Conclusion

Testing in production isn't reckless—it's essential. With feature flags, progressive rollouts, synthetic monitoring, and automated rollback mechanisms, you can safely validate changes with real users, real data, and real scale.

The key is intentionality: test in production deliberately, monitor obsessively, and always have an instant rollback plan. Start with feature flags, add canary deployments, and gradually build up to chaos engineering.

Your staging environment will never catch everything. Production testing will.

Related articles: see our posts on de-risking the deployments that precede production testing, building the observability foundation required to test safely in production, and choosing deployment strategies that make production testing incremental and safe.


Ready to safely test in production? Try ScanlyApp with built-in synthetic monitoring, progressive rollout tracking, and automated rollback triggers. Start free—no credit card required.
