Flaky Tests: How to Identify, Diagnose, and Eliminate Them Once and for All
Flaky tests are the silent killers of continuous integration. They pass sometimes, fail other times, and provide no real value except frustration. A test suite with even 5% flakiness can undermine team confidence and lead to ignoring real failures. This comprehensive guide provides battle-tested strategies for identifying, debugging, and eliminating test flakiness.
Understanding Flaky Tests
A flaky test is a test that exhibits non-deterministic behavior—sometimes passing and sometimes failing without any code changes.
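To make "non-deterministic" concrete, here is a contrived sketch: the outcome depends on a simulated network latency rather than on the code under test, so the same check can pass or fail across runs with no code change. (The function and values are illustrative, not from any real suite.)

```typescript
// A "test" whose verdict depends on timing, not on the code under test.
// If the assertion is "response arrives within 100 ms", the result varies run to run.
function fetchSimulated(latencyMs: number): 'pass' | 'fail' {
  return latencyMs < 100 ? 'pass' : 'fail';
}

console.log(fetchSimulated(80));  // fast network this run: pass
console.log(fetchSimulated(150)); // slow network next run: fail — same code
```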
The Cost of Flakiness
graph TD
A[Flaky Test Fails] --> B{Real Bug?}
B -->|Check| C[Developer Investigates]
C --> D{Actually Flaky}
D -->|Yes| E[Wasted Time]
D -->|No| F[Fix Bug]
E --> G[Re-run Tests]
G --> H{Still Fails?}
H -->|Yes| C
H -->|No| I[Merge, but no confidence]
style A fill:#ff6b6b
style E fill:#ff9a76
style I fill:#ffd93d
Impact Analysis:
| Flakiness Rate | Impact | Annual Cost (10-person team) |
|---|---|---|
| 1-2% | Minor annoyance | $10,000 - $20,000 |
| 3-5% | Significant disruption | $30,000 - $60,000 |
| 6-10% | Major productivity loss | $80,000 - $150,000 |
| >10% | Test suite becomes useless | $200,000+ |
Cost calculation based on: investigation time, re-runs, missed bugs, lost confidence
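The annual figures above can be approximated with a simple model. A minimal sketch follows; every input (CI runs per day, triage minutes, hourly cost, working days) is an illustrative assumption, not a measured value, so treat the output as an order-of-magnitude estimate.

```typescript
// Rough annual cost of flakiness; all inputs are illustrative assumptions.
interface FlakinessCostInputs {
  ciRunsPerDay: number;            // pipeline runs across the whole team
  flakinessRate: number;           // fraction of runs with a spurious failure, e.g. 0.05
  minutesPerInvestigation: number; // average time spent triaging a false failure
  hourlyCost: number;              // fully loaded engineer cost in $/hour
  workingDaysPerYear: number;
}

function annualFlakinessCost(i: FlakinessCostInputs): number {
  const falseFailuresPerYear = i.ciRunsPerDay * i.flakinessRate * i.workingDaysPerYear;
  const hoursWasted = (falseFailuresPerYear * i.minutesPerInvestigation) / 60;
  return Math.round(hoursWasted * i.hourlyCost);
}

// Example: 80 CI runs/day, 5% flakiness, 20 min triage, $100/hour, 230 working days
const cost = annualFlakinessCost({
  ciRunsPerDay: 80,
  flakinessRate: 0.05,
  minutesPerInvestigation: 20,
  hourlyCost: 100,
  workingDaysPerYear: 230,
});
console.log(cost); // → 30667
```

With these particular assumptions, investigation time alone lands in the tens of thousands of dollars per year, before counting re-runs and missed bugs.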
Root Causes of Flaky Tests
1. Timing Issues (60% of flaky tests)
Problem: Race Conditions
// ❌ FLAKY: Race condition
test('displays search results', async ({ page }) => {
await page.goto('/search');
await page.fill('[name="query"]', 'playwright');
await page.click('button[type="submit"]');
// PROBLEM: Results might not be loaded yet
const results = await page.locator('.result-item').count();
expect(results).toBeGreaterThan(0);
});
// ✅ FIXED: Explicit wait for element
test('displays search results', async ({ page }) => {
await page.goto('/search');
await page.fill('[name="query"]', 'playwright');
await page.click('button[type="submit"]');
// Wait for at least one result to appear
await page.waitForSelector('.result-item', { timeout: 5000 });
const results = await page.locator('.result-item').count();
expect(results).toBeGreaterThan(0);
});
Problem: Network Latency
// ❌ FLAKY: Assumes immediate response
test('fetches user data', async ({ page }) => {
await page.goto('/dashboard');
// PROBLEM: API might take varying time to respond
const userName = await page.locator('[data-testid="user-name"]').textContent();
expect(userName).toBe('John Doe');
});
// ✅ FIXED: Wait for network idle and specific state
test('fetches user data', async ({ page }) => {
await page.goto('/dashboard', { waitUntil: 'networkidle' });
// Wait for loading state to disappear
await page.waitForSelector('[data-testid="loading"]', { state: 'hidden' });
// Now data should be loaded
const userName = await page.locator('[data-testid="user-name"]');
await expect(userName).toHaveText('John Doe');
});
Problem: Animation and Transitions
// ❌ FLAKY: Clicks during animation
test('opens modal', async ({ page }) => {
await page.goto('/dashboard');
await page.click('[data-testid="open-modal"]');
// PROBLEM: Modal is animating in, button might not be clickable
await page.click('[data-testid="modal-submit"]');
});
// ✅ FIXED: Wait for stable state
test('opens modal', async ({ page }) => {
await page.goto('/dashboard');
await page.click('[data-testid="open-modal"]');
// Wait for modal to be fully visible and stable
const modal = page.locator('[data-testid="modal"]');
await expect(modal).toBeVisible();
// Wait for running animations/transitions to finish. (Checking
// getComputedStyle(...).transitionProperty === 'none' is unreliable, since its
// computed value defaults to 'all'; the Web Animations API is more robust.)
await page.waitForFunction(() => {
const element = document.querySelector('[data-testid="modal"]');
return element !== null && element.getAnimations().length === 0;
});
await page.click('[data-testid="modal-submit"]');
});
2. Test Isolation Issues (20% of flaky tests)
Problem: Shared State
// ❌ FLAKY: Tests share database state
describe('User Management', () => {
// First test runs
test('creates user', async () => {
const user = await createUser({ email: 'test@example.com' });
expect(user.id).toBeDefined();
});
// Second test fails if first ran already
test('prevents duplicate email', async () => {
// PROBLEM: test@example.com might already exist
await expect(
createUser({ email: 'test@example.com' })
).rejects.toThrow('Email already exists');
});
});
// ✅ FIXED: Proper isolation
describe('User Management', () => {
afterEach(async () => {
// Clean up after each test
await cleanupTestUsers();
});
test('creates user', async () => {
const email = `test-${Date.now()}@example.com`;
const user = await createUser({ email });
expect(user.id).toBeDefined();
});
test('prevents duplicate email', async () => {
const email = `test-${Date.now()}@example.com`;
// Create first user
await createUser({ email });
// Attempt duplicate
await expect(
createUser({ email })
).rejects.toThrow('Email already exists');
});
});
Problem: Cookie/Storage Leakage
// ❌ FLAKY: Tests share cookies
describe('Authentication', () => {
test('logs in successfully', async ({ page }) => {
await page.goto('/login');
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="password"]', 'password');
await page.click('button[type="submit"]');
await expect(page).toHaveURL(/.*dashboard/);
});
test('requires authentication', async ({ page }) => {
// PROBLEM: Might still have auth cookie from previous test
await page.goto('/dashboard');
await expect(page).toHaveURL(/.*login/); // Might fail!
});
});
// ✅ FIXED: Clear context between tests
describe('Authentication', () => {
test('logs in successfully', async ({ page }) => {
await page.goto('/login');
await page.fill('[name="email"]', 'test@example.com');
await page.fill('[name="password"]', 'password');
await page.click('button[type="submit"]');
await expect(page).toHaveURL(/.*dashboard/);
});
test('requires authentication', async ({ browser }) => {
// Create fresh context without cookies
const context = await browser.newContext();
const page = await context.newPage();
await page.goto('/dashboard');
await expect(page).toHaveURL(/.*login/);
await context.close();
});
});
// Note: Playwright's built-in `page` fixture creates a fresh browser context per
// test by default; state only leaks when contexts or storageState are explicitly shared.
3. External Dependencies (15% of flaky tests)
Problem: Third-Party APIs
// ❌ FLAKY: Depends on external API
test('fetches weather data', async ({ page }) => {
await page.goto('/weather');
// PROBLEM: External API might be slow or down
const temp = await page.locator('[data-testid="temperature"]').textContent();
expect(parseFloat(temp!)).toBeGreaterThan(0);
});
// ✅ FIXED: Mock external dependencies
test('fetches weather data', async ({ page }) => {
// Intercept and mock API call
await page.route('**/api/weather/**', (route) => {
route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
temperature: 72,
conditions: 'sunny',
}),
});
});
await page.goto('/weather');
await expect(page.locator('[data-testid="temperature"]')).toHaveText('72°F');
});
Problem: Date/Time Dependencies
// ❌ FLAKY: Depends on current time
test('shows greeting based on time', async ({ page }) => {
await page.goto('/dashboard');
// PROBLEM: Fails at midnight, morning, afternoon, etc.
const hour = new Date().getHours();
const expectedGreeting = hour < 12 ? 'Good morning' : 'Good afternoon';
await expect(page.locator('[data-testid="greeting"]')).toHaveText(expectedGreeting);
});
// ✅ FIXED: Control time
test('shows morning greeting', async ({ page }) => {
// Set fixed time: 9 AM
await page.addInitScript(() => {
const mockDate = new Date('2026-01-01T09:00:00');
Date.now = () => mockDate.getTime();
// Note: this stubs Date.now() only; code calling `new Date()` directly
// needs a fuller shim (or Playwright's clock API in newer versions).
});
await page.goto('/dashboard');
await expect(page.locator('[data-testid="greeting"]')).toHaveText('Good morning');
});
test('shows afternoon greeting', async ({ page }) => {
// Set fixed time: 2 PM
await page.addInitScript(() => {
const mockDate = new Date('2026-01-01T14:00:00');
Date.now = () => mockDate.getTime();
});
await page.goto('/dashboard');
await expect(page.locator('[data-testid="greeting"]')).toHaveText('Good afternoon');
});
4. Resource Constraints (5% of flaky tests)
Problem: Memory Pressure
// ❌ FLAKY: Memory leaks cause crashes
test.describe('Heavy Media Tests', () => {
test('loads 100 images', async ({ page }) => {
await page.goto('/gallery');
// PROBLEM: Might run out of memory
for (let i = 0; i < 100; i++) {
await page.click(`[data-image-id="${i}"]`);
await page.waitForSelector('[data-testid="lightbox"]');
// Lightbox component not properly cleaned up
}
});
});
// ✅ FIXED: Proper cleanup
test.describe('Heavy Media Tests', () => {
test('loads 100 images', async ({ page }) => {
await page.goto('/gallery');
for (let i = 0; i < 100; i++) {
await page.click(`[data-image-id="${i}"]`);
await page.waitForSelector('[data-testid="lightbox"]');
// Close lightbox to free memory
await page.click('[data-testid="close-lightbox"]');
await page.waitForSelector('[data-testid="lightbox"]', { state: 'hidden' });
}
});
// OR: Break into smaller tests
test('loads images 0-25', async ({ page }) => {
await testImageRange(page, 0, 25);
});
test('loads images 26-50', async ({ page }) => {
await testImageRange(page, 26, 50);
});
});
Detecting Flaky Tests
Strategy 1: Statistical Analysis
// scripts/detect-flaky-tests.ts
interface TestRun {
name: string;
passed: boolean;
duration: number;
timestamp: Date;
}
interface FlakinessSummary {
testName: string;
totalRuns: number;
failures: number;
flakeRate: number;
avgDuration: number;
stdDevDuration: number;
}
function analyzeFlakiness(runs: TestRun[]): FlakinessSummary[] {
// Group by test name
const grouped = new Map<string, TestRun[]>();
for (const run of runs) {
if (!grouped.has(run.name)) {
grouped.set(run.name, []);
}
grouped.get(run.name)!.push(run);
}
// Analyze each test
const summaries: FlakinessSummary[] = [];
for (const [testName, testRuns] of grouped) {
if (testRuns.length < 10) continue; // Need enough data
const failures = testRuns.filter((r) => !r.passed).length;
const flakeRate = (failures / testRuns.length) * 100;
// Calculate duration statistics
const durations = testRuns.map((r) => r.duration);
const avgDuration = durations.reduce((a, b) => a + b, 0) / durations.length;
const variance = durations.reduce((sum, d) => sum + Math.pow(d - avgDuration, 2), 0) / durations.length;
const stdDevDuration = Math.sqrt(variance);
// Flag as flaky if:
// 1. Failure rate between 5% and 95% (consistent failures aren't flaky)
// 2. High duration variance (>30% coefficient of variation)
const isFlaky = (flakeRate > 5 && flakeRate < 95) || stdDevDuration / avgDuration > 0.3;
if (isFlaky) {
summaries.push({
testName,
totalRuns: testRuns.length,
failures,
flakeRate,
avgDuration,
stdDevDuration,
});
}
}
// Sort by flake rate descending
return summaries.sort((a, b) => b.flakeRate - a.flakeRate);
}
// Usage: Analyze last 100 test runs
const flakyTests = analyzeFlakiness(recentTestRuns);
console.log('🔴 Flaky Tests Detected:\n');
for (const test of flakyTests) {
console.log(`${test.testName}`);
console.log(` Flake Rate: ${test.flakeRate.toFixed(1)}%`);
console.log(` Failures: ${test.failures}/${test.totalRuns}`);
console.log(` Duration: ${test.avgDuration.toFixed(0)}ms ± ${test.stdDevDuration.toFixed(0)}ms`);
console.log('');
}
Strategy 2: Automated Flakiness Detection in CI
# .github/workflows/flaky-test-detection.yml
name: Flaky Test Detection

on:
  schedule:
    # Run nightly
    - cron: '0 2 * * *'
  workflow_dispatch:

jobs:
  detect-flaky-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        run: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: npm ci
      - run: npx playwright install --with-deps

      # Run tests
      - run: npm run test:e2e
        continue-on-error: true
        env:
          RUN_NUMBER: ${{ matrix.run }}

      # Upload results
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results-${{ matrix.run }}
          path: test-results/

  analyze-results:
    needs: detect-flaky-tests
    runs-on: ubuntu-latest
    if: always()
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4

      # Download all results
      - uses: actions/download-artifact@v4
        with:
          path: all-test-results/

      # Analyze for flakiness
      - run: npm ci
      - run: node scripts/detect-flaky-tests.js

      # Create issue if flaky tests found
      - name: Create Issue
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const flakyTests = JSON.parse(fs.readFileSync('flaky-tests.json'));
            let body = '## 🔴 Flaky Tests Detected\n\n';
            body += '| Test | Flake Rate | Failures |\n';
            body += '|------|------------|----------|\n';
            for (const test of flakyTests) {
              body += `| ${test.name} | ${test.flakeRate}% | ${test.failures}/10 |\n`;
            }
            body += '\n---\n';
            body += 'These tests exhibited non-deterministic behavior over 10 runs.\n';
            body += 'Please investigate and fix before they spread to more failures.';
            github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Flaky Tests Detected: ${flakyTests.length} tests need attention`,
              body: body,
              labels: ['flaky-test', 'testing', 'priority-high']
            });
Strategy 3: Playwright Test Retries with Reporting
// playwright.config.ts
import { defineConfig } from '@playwright/test';
export default defineConfig({
// Retry failed tests
retries: process.env.CI ? 2 : 0,
reporter: [
['html'],
['json', { outputFile: 'test-results.json' }],
// Custom reporter to track retries
['./reporters/flaky-test-reporter.ts'],
],
use: {
trace: 'retain-on-failure',
video: 'retain-on-failure',
screenshot: 'only-on-failure',
},
});
// reporters/flaky-test-reporter.ts
import { Reporter, TestCase, TestResult } from '@playwright/test/reporter';
class FlakyTestReporter implements Reporter {
private flakyTests: Map<string, number> = new Map();
onTestEnd(test: TestCase, result: TestResult) {
// Track tests that passed but had retries
if (result.status === 'passed' && result.retry > 0) {
const testName = test.titlePath().join(' > ');
this.flakyTests.set(testName, result.retry);
}
// Track tests that failed even with retries
if (result.status === 'failed' && result.retry === test.retries) {
const testName = test.titlePath().join(' > ');
console.log(`❌ Test failed after ${result.retry} retries: ${testName}`);
}
}
onEnd() {
if (this.flakyTests.size > 0) {
console.log('\n⚠️ Flaky Tests Detected (passed after retry):');
for (const [test, retries] of this.flakyTests) {
console.log(` - ${test} (${retries} ${retries === 1 ? 'retry' : 'retries'})`);
}
console.log('\nThese tests should be investigated and fixed.\n');
}
}
}
export default FlakyTestReporter;
Debugging Flaky Tests
Technique 1: Stress Testing
# Run test 100 times to reproduce flakiness
for i in {1..100}; do
echo "Run $i"
npx playwright test tests/flaky-test.spec.ts --reporter=line
if [ $? -ne 0 ]; then
echo "Failed on run $i"
break
fi
done
# OR: Use Playwright repeat-each
npx playwright test tests/flaky-test.spec.ts --repeat-each=100
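How many repeats are enough? If a test fails with per-run probability p, the chance of reproducing the flake at least once in n runs is 1 − (1 − p)^n. A quick sketch (the numbers are illustrative):

```typescript
// Probability of seeing at least one failure in n runs,
// given a per-run failure probability p.
function reproductionProbability(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

// A 2%-flaky test reproduces in 100 runs about 87% of the time...
console.log(reproductionProbability(0.02, 100).toFixed(3)); // → "0.867"
// ...so a rarely failing test may need several hundred repeats to show itself.
console.log(reproductionProbability(0.02, 300).toFixed(3));
```

This is why a clean `--repeat-each=100` run does not prove a test is stable; it only bounds how flaky it can plausibly be.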
Technique 2: Video and Trace Analysis
// playwright.config.ts
export default defineConfig({
use: {
// Record video for every test (temporarily for debugging)
video: 'on',
// Capture trace for flaky tests
trace: 'on',
// Slow down actions to see what's happening
launchOptions: {
slowMo: 100, // 100ms delay between actions
},
},
});
// In flaky test, add debug logging
test('potentially flaky test', async ({ page }) => {
console.log('Starting test...');
await page.goto('/dashboard');
console.log('Navigated to dashboard');
await page.click('[data-testid="load-data"]');
console.log('Clicked load data button');
// Add screenshot before assertion
await page.screenshot({ path: 'before-assertion.png' });
const data = page.locator('[data-testid="data-loaded"]');
await expect(data).toBeVisible();
console.log('Data loaded successfully');
});
Technique 3: Verbose Logging
// lib/test-utils/enhanced-actions.ts
export class EnhancedActions {
constructor(private page: Page) {}
async clickWithLogging(selector: string, options?: { timeout?: number }) {
console.log(`[ACTION] Clicking: ${selector}`);
const element = this.page.locator(selector);
// Wait for element
console.log(`[WAIT] Waiting for element: ${selector}`);
await element.waitFor({ state: 'visible', timeout: options?.timeout });
// Check if clickable
console.log(`[CHECK] Verifying element is enabled: ${selector}`);
await expect(element).toBeEnabled();
// Perform click
console.log(`[CLICK] Executing click: ${selector}`);
await element.click();
console.log(`[SUCCESS] Clicked: ${selector}`);
}
async fillWithLogging(selector: string, value: string) {
console.log(`[ACTION] Filling ${selector} with: "${value}"`);
const element = this.page.locator(selector);
await element.waitFor({ state: 'visible' });
await element.fill(value);
// Verify value was set
const actualValue = await element.inputValue();
if (actualValue !== value) {
console.error(`[ERROR] Expected "${value}" but got "${actualValue}"`);
throw new Error(`Fill verification failed for ${selector}`);
}
console.log(`[SUCCESS] Filled: ${selector}`);
}
async waitForNetworkIdleWithLogging(options?: { timeout?: number }) {
console.log('[WAIT] Waiting for network idle...');
await this.page.waitForLoadState('networkidle', options);
console.log('[SUCCESS] Network idle achieved');
}
}
// Usage
test('form submission with detailed logging', async ({ page }) => {
const actions = new EnhancedActions(page);
await page.goto('/form');
await actions.fillWithLogging('[name="email"]', 'test@example.com');
await actions.fillWithLogging('[name="password"]', 'password123');
await actions.clickWithLogging('button[type="submit"]');
await actions.waitForNetworkIdleWithLogging();
await expect(page).toHaveURL(/.*success/);
});
Prevention Strategies
1. Design for Testability
// ❌ BAD: Hard to test reliably
function getCurrentTime(): string {
return new Date().toLocaleTimeString();
}
// Component
function Clock() {
const [time, setTime] = useState(getCurrentTime());
useEffect(() => {
const interval = setInterval(() => {
setTime(getCurrentTime());
}, 1000);
return () => clearInterval(interval);
}, []);
return <div>{time}</div>;
}
// ✅ GOOD: Testable design
interface ClockProps {
getCurrentTime?: () => Date;
}
function Clock({ getCurrentTime = () => new Date() }: ClockProps) {
const [time, setTime] = useState(getCurrentTime());
useEffect(() => {
const interval = setInterval(() => {
setTime(getCurrentTime());
}, 1000);
return () => clearInterval(interval);
}, [getCurrentTime]);
return <div>{time.toLocaleTimeString()}</div>;
}
// Test with controlled time
test('clock displays correct time', async ({ page }) => {
await page.addInitScript(() => {
// Mock time (cast needed because these are ad-hoc globals on window)
(window as any).mockDate = new Date('2026-01-01T12:00:00');
(window as any).getCurrentTime = () => (window as any).mockDate;
});
await page.goto('/clock');
await expect(page.locator('[data-testid="clock"]')).toHaveText('12:00:00 PM');
});
2. Implement Test Data Builders
// lib/test-data/builders.ts
class UserBuilder {
private user: Partial<User> = {};
withEmail(email: string): this {
this.user.email = email;
return this;
}
withName(name: string): this {
this.user.name = name;
return this;
}
withRole(role: Role): this {
this.user.role = role;
return this;
}
build(): User {
return {
id: `user-${Date.now()}-${Math.random()}`,
email: this.user.email || `test-${Date.now()}@example.com`,
name: this.user.name || 'Test User',
role: this.user.role || 'user',
createdAt: new Date(),
...this.user,
};
}
async create(): Promise<User> {
const user = this.build();
await db.users.insert(user);
return user;
}
}
// Usage - no data conflicts
test('user permissions', async () => {
const admin = await new UserBuilder().withRole('admin').create();
const user = await new UserBuilder().withRole('user').create();
// Each test has unique data
expect(admin.id).not.toBe(user.id);
});
3. Use Proper Waiting Strategies
// lib/test-utils/robust-waits.ts
export class RobustWaits {
static async waitForActionToComplete(
page: Page,
action: () => Promise<void>,
completionIndicator: string,
): Promise<void> {
// Wait for any loading states to disappear
await page
.waitForSelector('[data-loading="true"]', {
state: 'hidden',
timeout: 1000,
})
.catch(() => {}); // Ignore if doesn't exist
// Perform action
await action();
// Wait for completion
await page.waitForSelector(completionIndicator, { timeout: 10000 });
// Wait for network to settle
await page.waitForLoadState('networkidle');
}
static async waitForStableDOM(page: Page, timeout = 2000): Promise<void> {
let lastHTML = '';
let stableCount = 0;
const requiredStableCount = 3;
const startTime = Date.now();
while (stableCount < requiredStableCount) {
if (Date.now() - startTime > timeout) {
throw new Error('DOM did not stabilize within timeout');
}
const currentHTML = await page.content();
if (currentHTML === lastHTML) {
stableCount++;
} else {
stableCount = 0;
}
lastHTML = currentHTML;
await page.waitForTimeout(100);
}
}
}
// Usage
test('complex interaction with robust waiting', async ({ page }) => {
await page.goto('/dashboard');
await RobustWaits.waitForActionToComplete(
page,
() => page.click('[data-testid="refresh-data"]'),
'[data-testid="data-refreshed"]',
);
await RobustWaits.waitForStableDOM(page);
const dataCount = await page.locator('.data-item').count();
expect(dataCount).toBeGreaterThan(0);
});
Quarantine Strategy
When you can't fix a flaky test immediately, quarantine it:
// tests/flaky/README.md
/*
Quarantine Policy:
- Tests here are known to be flaky
- They run separately and don't block CI
- Maximum quarantine: 2 weeks
- After 2 weeks: fix or delete
*/
// playwright.config.ts
export default defineConfig({
projects: [
{
name: 'stable',
testIgnore: /.*flaky.*/,
},
{
name: 'quarantine',
testMatch: /.*flaky.*/,
retries: 3,
// Run separately in CI
},
],
});
# .github/workflows/tests.yml
jobs:
  stable-tests:
    runs-on: ubuntu-latest
    steps:
      - run: npx playwright test --project=stable

  quarantine-tests:
    runs-on: ubuntu-latest
    continue-on-error: true # Don't block pipeline
    steps:
      - run: npx playwright test --project=quarantine
Success Metrics
Track your progress eliminating flaky tests:
| Metric | Target | How to Measure |
|---|---|---|
| Flakiness Rate | <1% | (flaky tests / total tests) × 100 |
| Test Confidence | >95% | Survey team: "Do you trust test results?" |
| Retry Rate | <5% | Tests requiring retries to pass |
| Time in Quarantine | <1 week avg | Days between quarantine and fix |
| False Positive Rate | <2% | Failures investigated that weren't real bugs |
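The first rows of the table can be computed directly from CI run data. A minimal sketch follows; the `TestOutcome` shape and its field names are assumptions about what your reporter exports, not a standard format.

```typescript
// Per-test outcome as a reporter might export it (field names are assumptions).
interface TestOutcome {
  name: string;
  passed: boolean; // final status after any retries
  retries: number; // retries needed before passing (0 = passed first try)
  flaky: boolean;  // flagged by detection, e.g. passed only after a retry
}

function suiteMetrics(outcomes: TestOutcome[]) {
  const total = outcomes.length;
  const flaky = outcomes.filter((o) => o.flaky).length;
  const retried = outcomes.filter((o) => o.passed && o.retries > 0).length;
  return {
    flakinessRate: (flaky / total) * 100, // target: <1%
    retryRate: (retried / total) * 100,   // target: <5%
  };
}

const metrics = suiteMetrics([
  { name: 'login', passed: true, retries: 0, flaky: false },
  { name: 'search', passed: true, retries: 1, flaky: true },
  { name: 'checkout', passed: true, retries: 0, flaky: false },
  { name: 'profile', passed: true, retries: 0, flaky: false },
]);
console.log(metrics); // → { flakinessRate: 25, retryRate: 25 }
```

Wire this into the nightly detection job so the numbers trend over time instead of being measured ad hoc.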
Conclusion: Build Trust in Your Test Suite
Flaky tests are a cancer that metastasizes through your test suite, undermining trust and wasting time. Eliminating flakiness requires:
- Understanding root causes (timing, isolation, dependencies)
- Systematic detection (statistical analysis, automated checks)
- Rigorous debugging (stress testing, logging, trace analysis)
- Prevention practices (proper waits, test isolation, data management)
- Ongoing vigilance (metrics, quarantine, continuous improvement)
A reliable test suite is worth the investment—it's the foundation of confident continuous deployment.
Eliminate Flaky Tests with ScanlyApp
ScanlyApp provides advanced test stability features including flakiness detection, automatic retries with analysis, and detailed debugging traces to help you build a reliable test suite.
Start Your Free Trial and gain confidence in your test automation.
Related articles: Also see a forensic CI/CD approach to diagnosing flaky tests, cutting test time without introducing concurrency-driven flakiness, and reliable test environments as the foundation for stable tests.
