Kubernetes Ephemeral Test Environments: Spin Up a Fresh Cluster Per PR, Tear Down After Merge

Ephemeral per-PR environments solve the staging contention problem — every pull request gets its own isolated environment, runs tests against real infrastructure, and tears down automatically on merge. This guide covers building ephemeral Kubernetes test environments with namespace isolation and automated lifecycle management.

The shared staging environment problem is universal: Team A is testing a database migration, Team B is running load tests, and Team C is demonstrating a new feature to a customer. All three teams compete for the same environment, which is now simultaneously broken in three different ways.

Ephemeral environments solve this by giving every pull request or branch its own disposable, isolated environment. Spin it up when the PR opens, run tests against it, use it for review, tear it down when the PR merges. The infrastructure cost is temporary and proportional to active PRs.

This guide covers implementing ephemeral environments with Kubernetes namespace isolation.


Architecture Overview

flowchart TD
    A[Developer opens PR] --> B[CI workflow triggered]
    B --> C[Build Docker images]
    C --> D[Create Kubernetes namespace:\npr-42-feature-auth]
    D --> E[Deploy services to namespace]
    E --> F[Run automated tests]
    F --> G{Tests pass?}
    G -->|Yes| H[Post PR URL as comment]
    G -->|No| I[Post failure report]
    H --> J[Team uses environment for review]
    J --> K[PR merged or closed]
    K --> L[Delete namespace + all resources]
    L --> M[Environment gone]

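Each namespace gets its own frontend, backend, database, and worker services, so PRs share no application state. Namespaces scope names, RBAC, and quotas, but they still share cluster nodes and, unless NetworkPolicies restrict it, the pod network.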


Namespace Isolation Pattern

The core Kubernetes primitive is the namespace. Each ephemeral environment gets a namespace derived from the PR number:

# deploy/ephemeral/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: pr-${PR_NUMBER}
  labels:
    environment: ephemeral
    pr-number: '${PR_NUMBER}'
    created-by: 'ci'
  annotations:
    # Auto-cleanup annotation (requires custom controller or scheduled job)
    ttl: '7d'
    created-at: '${CREATED_AT}'

#!/bin/bash
# deploy/scripts/create-ephemeral-env.sh
set -e

PR_NUMBER=$1
IMAGE_TAG=${2:-"pr-${PR_NUMBER}"}
NAMESPACE="pr-${PR_NUMBER}"

echo "Creating ephemeral environment for PR #${PR_NUMBER}..."

# Create namespace
kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -

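# Label for lifecycle management (--overwrite keeps re-runs on new pushes idempotent)
kubectl label namespace "$NAMESPACE" --overwrite \
  environment=ephemeral \
  pr-number="${PR_NUMBER}"

# Record the creation time once, as an annotation: label values cannot contain ':',
# and the cleanup job reads the created-at annotation
if [ -z "$(kubectl get namespace "$NAMESPACE" -o jsonpath='{.metadata.annotations.created-at}')" ]; then
  kubectl annotate namespace "$NAMESPACE" created-at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
fi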

# Deploy all services with PR-specific image tags
helm upgrade --install "scanly-pr-${PR_NUMBER}" ./helm/scanly \
  --namespace "$NAMESPACE" \
  --set image.tag="${IMAGE_TAG}" \
  --set environment=ephemeral \
  --set database.url="${EPHEMERAL_DB_URL}" \
  --set ingress.host="pr-${PR_NUMBER}.staging.scanlyapp.com" \
  --wait \
  --timeout 5m

echo "Environment ready: https://pr-${PR_NUMBER}.staging.scanlyapp.com"
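
The flowchart shows a branch-suffixed namespace (`pr-42-feature-auth`), while the script uses plain `pr-${PR_NUMBER}`. If you want the suffixed form, the name has to remain a valid DNS label: lowercase alphanumerics and `-`, at most 63 characters, starting and ending with an alphanumeric. A minimal sketch, assuming a hypothetical helper `namespaceForPr` that is not part of the original scripts:

```typescript
// Hypothetical helper: build a namespace name such as "pr-42-feature-auth".
// Kubernetes namespace names must be valid RFC 1123 DNS labels.
function namespaceForPr(prNumber: number, branch: string = ""): string {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`invalid PR number: ${prNumber}`);
  }
  // Lowercase, collapse every run of other characters into one hyphen,
  // then trim hyphens from both ends of the branch slug.
  const slug = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  const raw = slug ? `pr-${prNumber}-${slug}` : `pr-${prNumber}`;
  // Enforce the 63-character limit without leaving a trailing hyphen.
  return raw.slice(0, 63).replace(/-+$/, "");
}

console.log(namespaceForPr(42, "feature/auth"));
```

If you adopt a suffixed name, the teardown step needs the same derivation, which is one reason to prefer the plain `pr-<number>` form used in the scripts.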

CI/CD Integration

# .github/workflows/ephemeral-env.yml
name: Ephemeral Environment
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

jobs:
  deploy-ephemeral:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBECONFIG_STAGING }}

      - name: Build and push images
        run: |
          docker build -t registry.example.com/scanly:pr-${{ github.event.pull_request.number }} .
          docker push registry.example.com/scanly:pr-${{ github.event.pull_request.number }}

      - name: Deploy ephemeral environment
        run: |
          bash deploy/scripts/create-ephemeral-env.sh \
            ${{ github.event.pull_request.number }} \
            "pr-${{ github.event.pull_request.number }}"

      - name: Run smoke tests
        run: |
          PR_URL="https://pr-${{ github.event.pull_request.number }}.staging.scanlyapp.com"

          # Wait for health check
          npx wait-on "$PR_URL/api/health" --timeout 120000

          # Run smoke test suite
          BASE_URL="${PR_URL}" npx playwright test tests/smoke/

      - name: Comment PR with environment URL
        uses: actions/github-script@v7
        with:
          script: |
            const url = `https://pr-${{ github.event.pull_request.number }}.staging.scanlyapp.com`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `🚀 **Preview environment ready:** [${url}](${url})\n\nAutomatically tears down when PR is closed.`
            });

  teardown-ephemeral:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBECONFIG_STAGING }}

      - name: Tear down ephemeral environment
        run: |
          NAMESPACE="pr-${{ github.event.pull_request.number }}"
          helm uninstall "scanly-${NAMESPACE}" --namespace "$NAMESPACE" --ignore-not-found
          kubectl delete namespace "$NAMESPACE" --ignore-not-found
          echo "Environment torn down."
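
Because the workflow also runs on `synchronize`, the comment step posts a new comment on every push. One way to avoid that is to tag the comment with a hidden marker and update it in place. The sketch below only decides what to do; the marker string and the names `planPreviewComment` and `IssueComment` are assumptions for illustration. Inside `actions/github-script` you would feed it the result of `github.rest.issues.listComments` and apply the plan with `createComment` or `updateComment` (the script there is plain JavaScript, so the type annotations would be dropped).

```typescript
// Hidden HTML marker identifying the preview comment (hypothetical convention).
const MARKER = "<!-- ephemeral-env-preview -->";

interface IssueComment {
  id: number;
  body: string;
}

type CommentPlan =
  | { kind: "create"; body: string }
  | { kind: "update"; commentId: number; body: string };

// Decide whether to create a new preview comment or update the existing one.
function planPreviewComment(existing: IssueComment[], url: string): CommentPlan {
  const body =
    `${MARKER}\n🚀 **Preview environment ready:** [${url}](${url})\n\n` +
    "Automatically tears down when PR is closed.";
  const previous = existing.find((c) => c.body.includes(MARKER));
  return previous
    ? { kind: "update", commentId: previous.id, body }
    : { kind: "create", body };
}

console.log(planPreviewComment([], "https://pr-42.staging.scanlyapp.com").kind);
```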

Database Isolation Strategies

Each ephemeral environment needs its own data store. Four common approaches:

| Strategy | Setup Time | Isolation | Cost | Complexity |
|---|---|---|---|---|
| Separate DB per namespace | 2-3 min | Full | Medium | Low |
| DB namespace (schema per PR) | 30s | Logical | Low | Medium |
| Shared DB with RLS per env | 0s | Logical | Minimal | High |
| Point-in-time snapshot branch | 1-2 min | Full + real data | Medium | Medium |

The schema-per-PR approach (the "DB namespace" row) is usually best for CI speed:

-- Create a schema-per-PR namespace
-- Run on test DB cluster during environment setup

DO $$
DECLARE
  pr_schema TEXT := 'pr_' || {PR_NUMBER};
BEGIN
  EXECUTE format('CREATE SCHEMA IF NOT EXISTS %I', pr_schema);
  EXECUTE format('SET search_path TO %I, public', pr_schema);
  -- Run migrations in this schema
END $$;
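
If the test DB cluster lives outside the PR's Kubernetes namespace, deleting the namespace will not remove the schema, so teardown likely needs a matching `DROP SCHEMA`. A minimal sketch that generates both statements, assuming hypothetical helpers `schemaForPr` and `schemaLifecycleSql`; validating the PR number as an integer keeps arbitrary text out of the SQL:

```typescript
// Hypothetical helper: the schema name used for a PR, e.g. "pr_42".
function schemaForPr(prNumber: number): string {
  if (!Number.isSafeInteger(prNumber) || prNumber <= 0) {
    throw new Error(`invalid PR number: ${prNumber}`);
  }
  return `pr_${prNumber}`;
}

// Build the statements for environment setup and teardown.
function schemaLifecycleSql(prNumber: number): { create: string; drop: string } {
  const schema = schemaForPr(prNumber);
  return {
    create: `CREATE SCHEMA IF NOT EXISTS "${schema}";`,
    // CASCADE also drops the tables that migrations created in the schema.
    drop: `DROP SCHEMA IF EXISTS "${schema}" CASCADE;`,
  };
}

console.log(schemaLifecycleSql(42).drop);
```

The `drop` statement would run from the teardown job, alongside the namespace deletion.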

Resource Limits and Cost Control

Ephemeral environments should have strict resource limits to prevent cost runaway:

# deploy/ephemeral/resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-quota
  namespace: pr-${PR_NUMBER}
spec:
  hard:
    requests.cpu: '2'
    requests.memory: 4Gi
    limits.cpu: '4'
    limits.memory: 8Gi
    pods: '20'
    services: '10'
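
With a quota on `requests.*` and `limits.*`, pods that omit those fields are rejected unless a LimitRange supplies defaults, and a release whose total requests exceed the quota will not fully schedule. The sketch below checks a planned set of workloads against the request quota before deploying. The names `cpuToMillicores`, `memoryToBytes`, and `fitsRequestQuota` are assumptions for illustration, and only a subset of Kubernetes quantity suffixes is handled.

```typescript
// Parse a Kubernetes CPU quantity ("500m", "2", "0.5") into millicores.
function cpuToMillicores(q: string): number {
  if (!/^\d+(\.\d+)?m?$/.test(q)) throw new Error(`unsupported CPU quantity: ${q}`);
  return q.endsWith("m") ? Number(q.slice(0, -1)) : Math.round(Number(q) * 1000);
}

// Subset of memory suffixes: binary (Ki, Mi, Gi) and decimal (k, M, G).
const MEMORY_UNITS: Record<string, number> = {
  Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30,
  k: 1e3, M: 1e6, G: 1e9,
};

// Parse a memory quantity ("512Mi", "4Gi", "1000000") into bytes.
function memoryToBytes(q: string): number {
  const m = /^(\d+(?:\.\d+)?)(Ki|Mi|Gi|k|M|G)?$/.exec(q);
  if (!m) throw new Error(`unsupported memory quantity: ${q}`);
  return Number(m[1]) * (m[2] ? MEMORY_UNITS[m[2]] : 1);
}

interface WorkloadRequests {
  cpu: string;      // per-pod CPU request
  memory: string;   // per-pod memory request
  replicas: number;
}

// True when the summed requests fit within requests.cpu and requests.memory.
function fitsRequestQuota(
  workloads: WorkloadRequests[],
  quota: { cpu: string; memory: string },
): boolean {
  let cpu = 0;
  let memory = 0;
  for (const w of workloads) {
    cpu += cpuToMillicores(w.cpu) * w.replicas;
    memory += memoryToBytes(w.memory) * w.replicas;
  }
  return cpu <= cpuToMillicores(quota.cpu) && memory <= memoryToBytes(quota.memory);
}

console.log(fitsRequestQuota([{ cpu: "500m", memory: "1Gi", replicas: 4 }], { cpu: "2", memory: "4Gi" }));
```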

#!/bin/bash
# Cleanup stale ephemeral environments (runs as daily cron job)

# Find namespaces older than MAX_AGE_HOURS
MAX_AGE_HOURS=48

kubectl get namespaces -l environment=ephemeral -o json | \
  jq -r '.items[] |
    select(.metadata.annotations["created-at"] != null) |
    .metadata.name + " " + .metadata.annotations["created-at"]' | \
  while read -r namespace created_at; do
    # date -d is GNU date syntax (Linux); BSD/macOS date needs different flags
    age_hours=$(( ($(date +%s) - $(date -d "$created_at" +%s)) / 3600 ))

    if [ "$age_hours" -gt "$MAX_AGE_HOURS" ]; then
      echo "Deleting stale namespace: $namespace (age: ${age_hours}h)"
      kubectl delete namespace "$namespace"
    fi
  done
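
The selection logic in the cleanup script is easier to unit test when it is separated from the `kubectl` calls. A minimal sketch, assuming a hypothetical `staleNamespaces` function that receives namespace names with their `created-at` annotation values. It compares exact timestamps rather than whole hours, so results can differ from the bash version by up to an hour at the boundary.

```typescript
interface NamespaceInfo {
  name: string;
  createdAt?: string; // value of the created-at annotation, if present
}

// Return the names of namespaces older than maxAgeHours.
// Entries with a missing or unparseable timestamp are skipped, not deleted.
function staleNamespaces(
  namespaces: NamespaceInfo[],
  maxAgeHours: number,
  now: Date = new Date(),
): string[] {
  const cutoffMs = now.getTime() - maxAgeHours * 3600 * 1000;
  return namespaces
    .filter((ns) => {
      if (!ns.createdAt) return false;
      const created = Date.parse(ns.createdAt);
      return !Number.isNaN(created) && created < cutoffMs;
    })
    .map((ns) => ns.name);
}

const example = staleNamespaces(
  [{ name: "pr-1", createdAt: "2024-01-07T00:00:00Z" }],
  48,
  new Date("2024-01-10T00:00:00Z"),
);
console.log(example);
```

Skipping unparseable timestamps is a deliberate choice here: a malformed annotation leaves an environment running rather than deleting one by mistake.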


Testing Within Ephemeral Environments

The ephemeral environment enables tests that are impossible on shared staging:

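// tests/smoke/ephemeral.test.ts
// Runs in the ephemeral environment context
import { test, expect } from '@playwright/test';
// getEmailOtp is a project-specific helper (not shown); this import path is an assumption
import { getEmailOtp } from '../helpers/email';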

test('full signup → onboarding → first scan flow', async ({ page }) => {
  const uniqueEmail = `e2e-${Date.now()}@test.example.com`;

  // Test the complete new-user journey in isolated env
  await page.goto('/signup');
  await page.fill('[data-testid="email"]', uniqueEmail);
  await page.fill('[data-testid="password"]', 'TestPassword123!');
  await page.click('[data-testid="signup-btn"]');

  // Complete email verification (ephemeral env has own mailbox)
  const otp = await getEmailOtp(uniqueEmail);
  await page.fill('[data-testid="otp-input"]', otp);
  await page.click('[data-testid="verify-btn"]');

  // Complete onboarding
  await page.fill('[data-testid="project-name"]', 'My Test Project');
  await page.fill('[data-testid="project-url"]', 'https://example.com');
  await page.click('[data-testid="start-scan-btn"]');

  // Wait for first scan to complete (ephemeral env has own worker)
  await expect(page.locator('[data-testid="scan-status"]')).toHaveText('Complete', { timeout: 60_000 });
});
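
The test relies on a `getEmailOtp` helper that the article does not show. One possible shape is sketched below, under clear assumptions: the OTP is a standalone 6-digit code, and the mailbox lookup is injected as a function (`fetchLatestBody`, hypothetical) because the mail service used by the ephemeral environment is not specified. Unlike the single-argument call in the test, this version takes the lookup as a second parameter.

```typescript
// Extract the first standalone 6-digit code from a message body.
function extractOtp(body: string): string | null {
  const match = /\b(\d{6})\b/.exec(body);
  return match ? match[1] : null;
}

// Poll the injected mailbox lookup until an OTP arrives or the timeout expires.
async function getEmailOtp(
  email: string,
  fetchLatestBody: (email: string) => Promise<string | null>,
  { timeoutMs = 30_000, intervalMs = 1_000 } = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const body = await fetchLatestBody(email);
    const otp = body ? extractOtp(body) : null;
    if (otp) return otp;
    // Wait before the next poll so the mail service is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`no OTP email for ${email} within ${timeoutMs}ms`);
}

console.log(extractOtp("Your verification code is 123456."));
```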

Once set up, ephemeral Kubernetes environments change the nature of pre-merge review: every PR has a live, testable, throwaway instance of the full system.

