Docker Production Checklist · The 15 Things That Matter

01 Use a minimal base image

Default to distroless, alpine, or -slim variants. The default Ubuntu image is ~80 MB before you add anything; node:20-alpine is ~50 MB; gcr.io/distroless/nodejs20 is ~30 MB and contains no shell, no package manager, no userland tools.

Smaller images mean faster pulls, faster cold starts, and less attack surface. An image without a shell or package manager gives an attacker who gains code execution far fewer tools to work with.

02 Use multi-stage builds

✓ multi-stage Dockerfile

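# Stage 1: build
# Note: native addons compiled on Alpine (musl) will not load on the glibc-based
# distroless runtime; switch to a Debian-based builder if you depend on any.
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step usually needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only runtime packages are copied out
RUN npm run build && npm prune --omit=dev

# Stage 2: runtime (only what we need)
FROM gcr.io/distroless/nodejs20
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
USER nonroot
# The distroless entrypoint is already node, so CMD is just the script path
CMD ["dist/server.js"]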

Build dependencies (compilers, npm, dev packages) stay in stage 1. Only the runtime artifacts copy into the final image. Typical size reduction: 60-80%.

03 Don't run as root

By default, containers run as root. If an attacker exploits your application, they have root inside the container and, if they find a container escape, potentially root on the host.

Add a non-root user:

✓ non-root user

# Alpine (BusyBox) syntax; Debian/Ubuntu images use groupadd/useradd instead
RUN addgroup -S app && adduser -S app -G app
USER app

# Or for distroless, use the built-in nonroot user:
USER nonroot
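
As an extra safeguard, the application itself can refuse to start as root, which catches a deployment that overrides the image's USER. A minimal sketch, assuming a Node.js service; assertNotRoot is a hypothetical helper name, not part of any library:

```javascript
// Hypothetical startup guard: fail fast if the process is running as root.
// process.getuid only exists on POSIX platforms, so elsewhere we report -1.
function assertNotRoot(getuid = () => (process.getuid ? process.getuid() : -1)) {
  if (getuid() === 0) {
    throw new Error('Refusing to run as root; set USER in the Dockerfile');
  }
}
```

Calling assertNotRoot() as the first line of the entrypoint script turns a misconfigured deployment into an immediate, visible crash rather than a silent privilege problem.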

04 Pin everything: base image, packages, layers

FROM node:20 means "whatever node:20 points to today." Pin to a digest or specific version:

✓ pinned base image

# Pin to specific version
FROM node:20.11.0-alpine3.19

# Or pin to digest (most secure — immutable)
FROM node@sha256:abc123...

The same goes for OS packages: apk add curl=8.5.0-r0, not apk add curl. Pinning keeps surprise updates from breaking your build or changing what you ship. It also freezes out security fixes, so bump pins deliberately and regularly rather than never.
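
One way to enforce base-image pinning is a small lint step in CI. The sketch below is an illustration only: findUnpinnedBaseImages is a hypothetical name, and "pinned" is assumed to mean either a digest or a tag containing a full x.y.z version.

```javascript
// Hypothetical CI lint: list FROM lines whose base image is not pinned.
// "Pinned" here means a digest (@sha256:...) or a tag with a full x.y.z version.
function findUnpinnedBaseImages(dockerfileText) {
  const stages = new Set(); // stage names declared with "AS <name>"
  const unpinned = [];
  for (const line of dockerfileText.split('\n')) {
    const match = line
      .trim()
      .match(/^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i);
    if (!match) continue;
    const [, image, alias] = match;
    // References to earlier build stages and "scratch" are not registry images.
    const isStageRef = stages.has(image.toLowerCase()) || image === 'scratch';
    const hasDigest = image.includes('@sha256:');
    const hasFullVersion = /:v?\d+\.\d+\.\d+/.test(image);
    if (!isStageRef && !hasDigest && !hasFullVersion) unpinned.push(image);
    if (alias) stages.add(alias.toLowerCase());
  }
  return unpinned;
}
```

A CI job could read the Dockerfile, call this function, and fail when the returned list is non-empty.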

05 Handle signals properly

When Kubernetes (or any orchestrator) wants to stop your container, it sends SIGTERM. If your process hasn't exited when the grace period ends (30 seconds by default in Kubernetes, 10 seconds for docker stop), it gets SIGKILL, which means dropped connections, lost work, and no clean shutdown logs.

Two common failure modes:

Shell-form CMD (CMD npm start) wraps your process in /bin/sh, which doesn't forward signals. Use exec form: CMD ["node", "server.js"].
Multiple processes without a proper init. If you must run multiple processes, use tini as PID 1 to reap zombies and forward signals.
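
The second case can be sketched as a Dockerfile fragment. It assumes an Alpine-based image, where the tini package installs the binary at /sbin/tini; server.js stands in for your own entrypoint.

```dockerfile
# Alpine-based image assumed; the tini package installs to /sbin/tini.
FROM node:20-alpine
RUN apk add --no-cache tini
WORKDIR /app
COPY . .
# tini runs as PID 1, forwards signals to node, and reaps zombie processes.
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
```

With plain Docker, docker run --init injects an equivalent init process without changing the image.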

✓ graceful shutdown in Node

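// Assumes `app` is your Express (or similar) application object.
const server = app.listen(3000);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining...');
  // Stop accepting new connections; the callback runs once open ones finish.
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
  // Idle keep-alive connections can hold close() open; exit before SIGKILL arrives.
  setTimeout(() => process.exit(1), 25_000).unref();
});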

06 Add a healthcheck

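Both Docker and Kubernetes use healthchecks to decide whether your container is alive, but they are configured separately. Kubernetes ignores the Dockerfile HEALTHCHECK instruction and relies on liveness/readiness probes in the pod spec, so define HEALTHCHECK for plain Docker or Swarm and probes for K8s.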

Two distinct concepts:

Liveness — is the container alive? If not, restart it. Should be a simple "the process is responding."
Readiness — is the container ready to receive traffic? Returns false during startup, during overload, while warming caches. K8s stops routing Service traffic to the pod while it is not ready.
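
In Kubernetes the two checks map onto two probe definitions. The fragment below is a sketch: the /healthz and /readyz paths, the port, and the timings are illustrative assumptions, not values from a real service.

```yaml
# Container-level fragment of a pod spec. Paths, port, and timings
# are illustrative assumptions; tune them for your service.
livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 3000
  periodSeconds: 5
  failureThreshold: 2
```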

Healthcheck endpoints should be cheap (no database calls, no auth) and excluded from your request logs (they would otherwise dominate log volume).
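
A minimal sketch of such endpoints using only Node's built-in http module. The paths /healthz and /readyz and the helper names are assumptions chosen for illustration; the probes are answered before the application handler runs, so they bypass its auth and request logging.

```javascript
const http = require('node:http');

let ready = false; // flip to true once startup work (cache warm-up, etc.) is done

function setReady(value) {
  ready = value;
}

// Handles the two probe paths; returns false for anything else so the
// caller can pass the request on to the real application.
function handleProbe(req, res) {
  if (req.url === '/healthz') {
    res.writeHead(200).end('ok'); // liveness: the process can respond
    return true;
  }
  if (req.url === '/readyz') {
    res.writeHead(ready ? 200 : 503).end(ready ? 'ready' : 'not ready');
    return true;
  }
  return false;
}

// Wraps an application handler so probes never reach it.
function createServer(appHandler) {
  return http.createServer((req, res) => {
    if (!handleProbe(req, res)) appHandler(req, res);
  });
}
```

Typical use would be createServer(app).listen(3000), followed by setReady(true) once initialization finishes and setReady(false) when shutdown begins.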

07 Never bake secrets into images

Anyone who gets hold of your image (through a registry leak, a CI compromise, or simply by pulling it) can inspect its layers and extract baked-in secrets. Deleting a secret in a later layer doesn't help; it remains readable in the earlier one.

Pass secrets at runtime:

Environment variables (basic, often acceptable)
Mounted secret files (Kubernetes Secrets, Docker Swarm secrets)
Sidecar fetching from a vault (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager)
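
On the application side, the first two options can share one lookup. A minimal sketch, assuming secrets are mounted as files in a directory (Docker Swarm uses /run/secrets by default; for Kubernetes, pass whatever mountPath the pod spec uses). readSecret is a hypothetical helper name.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: prefer a mounted secret file, fall back to an env var.
// The env var name is assumed to be the secret name in upper case.
function readSecret(name, { dir = '/run/secrets', env = process.env } = {}) {
  const file = path.join(dir, name);
  try {
    return fs.readFileSync(file, 'utf8').trim();
  } catch (err) {
    // A missing file is expected; surface permission errors and the like.
    if (err.code !== 'ENOENT') throw err;
  }
  const value = env[name.toUpperCase()];
  if (value === undefined) throw new Error(`Secret "${name}" not found`);
  return value;
}
```

Failing loudly when a secret is missing keeps a misconfigured container from starting with an empty credential.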

08 Use .dockerignore aggressively

Without a .dockerignore, every COPY . . includes .git/, node_modules/, .env, build artifacts, and IDE files. Two problems: bigger images and accidentally-leaked secrets.

✓ minimal .dockerignore

.git
.gitignore
node_modules
npm-debug.log
.env*
.vscode
.idea
*.md
test/
coverage/
dist/
.DS_Store
Dockerfile*

09 Order Dockerfile commands for cache efficiency

Docker caches each instruction. If a layer changes, every subsequent layer rebuilds. Put rarely-changing instructions early; frequently-changing ones (your source code) late.

✗ rebuilds dependencies on every code change

COPY . .
RUN npm install

✓ caches dependencies between code changes

COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY . .

Dependencies change less often than source code. Copy and install them first; npm ci then re-runs only when package.json or package-lock.json changes.

10 Scan images for vulnerabilities

Run vulnerability scans in CI. Tools: Trivy (open source, fast), Snyk, Grype, Docker Scout. They check your image against CVE databases and flag known vulnerabilities in base images and dependencies.

✓ trivy scan in CI

trivy image --severity HIGH,CRITICAL --exit-code 1 my-image:latest

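As written, this command fails the build on any HIGH or CRITICAL finding. If that blocks too often, gate on CRITICAL only and triage HIGH findings by hand: some have no fix available yet (Trivy's --ignore-unfixed flag skips those), and some sit in transitive dependencies you don't control, so you decide what to accept.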

11 Run with a read-only filesystem

Most applications don't need to write to disk at runtime. Run with read-only root filesystem; mount /tmp as a tmpfs if needed.

✓ read-only filesystem in K8s

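# Pod spec fragment: securityContext and volumeMounts belong to the container,
# volumes to the pod. The container name is a placeholder.
containers:
  - name: app
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: tmp
        mountPath: /tmp
volumes:
  - name: tmp
    emptyDir:
      medium: Memory  # tmpfs; a plain emptyDir: {} is disk-backed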

An attacker who gains code execution can't replace application binaries or modify configs, and anything they write to /tmp disappears when the container restarts. Major defense in depth.
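
Outside Kubernetes, Docker Compose offers the same controls. A sketch of a service fragment; the service name and image tag are placeholders.

```yaml
# Compose service fragment; the service and image names are placeholders.
services:
  app:
    image: my-image:v1.0.0
    read_only: true   # root filesystem mounted read-only
    tmpfs:
      - /tmp          # writable scratch space, held in memory
```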

12 Set resource limits

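Without limits, a memory leak in one container can starve or take down everything else on the host. With limits, the container dies and gets restarted; the host stays up.

Set both requests (what the scheduler reserves for you) and limits (the cap). Exceeding the memory limit gets the container OOM-killed; exceeding the CPU limit only throttles it: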

✓ K8s resource limits

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1000m"

13 Log to stdout/stderr, never files

Containerized applications should write logs to stdout and stderr. The container runtime captures both streams, and your log agent ships them to the aggregator.

Don't log to files inside the container — they fill the disk, they're invisible to log aggregators, and they disappear when the container dies. Even if your framework defaults to file logging, override it.
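
If the framework's logger is hard to redirect, a few lines are enough to emit structured logs on the standard streams. A minimal sketch; the function names and the JSON field names are illustrative choices, not a standard.

```javascript
// Minimal structured logger sketch: one JSON object per line,
// errors to stderr, everything else to stdout.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({ time: now.toISOString(), level, message, ...fields });
}

function log(level, message, fields) {
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, message, fields) + '\n');
}
```

A call such as log('info', 'server started', { port: 3000 }) produces a single line that most aggregators can parse without extra configuration.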

14 Emit metrics and traces

Production containers should expose metrics on a known endpoint (conventionally /metrics for Prometheus) and export traces (for example through an OpenTelemetry exporter). Without these, you're flying blind when something goes wrong.

Minimum metrics: request count, request duration, error rate, in-flight requests, queue depth (if applicable). Most frameworks have an off-the-shelf integration.
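
To show what a /metrics endpoint actually returns, here is a dependency-free sketch of a counter registry that renders the Prometheus text format. The function names are hypothetical, and a real service would normally use a client library such as prom-client instead.

```javascript
// Dependency-free sketch of a Prometheus-style counter registry.
// Label values are not escaped here; a real client library handles that.
const counters = new Map(); // key: metric name plus labels, value: count

function labelString(labels) {
  const parts = Object.entries(labels).map(([k, v]) => `${k}="${v}"`);
  return parts.length ? `{${parts.join(',')}}` : '';
}

function increment(name, labels = {}) {
  const key = name + labelString(labels);
  counters.set(key, (counters.get(key) || 0) + 1);
}

// Renders one "name{labels} value" sample per line.
function renderMetrics() {
  return [...counters].map(([key, value]) => `${key} ${value}\n`).join('');
}
```

Serving renderMetrics() at /metrics with the content type text/plain; version=0.0.4 gives Prometheus something it can scrape.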

15 Sign your images

Image signing (Cosign, Notary) lets you verify that the image deployed is the image you built. Without signing, anyone with registry write access can replace your image and you'd never know.

✓ sign with cosign

# Sign at build time
# (Cosign recommends referencing the image by digest; tags can be moved)
cosign sign --key cosign.key my-registry/my-image:v1.0.0

# Verify before deploying
cosign verify --key cosign.pub my-registry/my-image:v1.0.0

∞ The complete picture

Each item on this list is small. Together they're the difference between a container you'd hand to a customer and one you're embarrassed about in a security review.

Production-grade containers aren't fancier than dev ones — they're just less casual. Smaller, more locked down, more observable, more resilient. The discipline pays back the first time something goes wrong at 3am and your container actually behaves the way you'd hope.