Docker production checklist — the 15 things that matter.
Your container runs on your laptop. Great. Production is a different environment with different demands — security, observability, signal handling, secrets. The container that works in docker run is rarely the container you want in Kubernetes at 2am. This is the working checklist: 15 specific items, each with the why and the fix.
01 Use a minimal base image
Default to distroless, alpine, or -slim variants. The default Ubuntu image is ~80 MB before you add anything; node:20-alpine is ~50 MB; gcr.io/distroless/nodejs20 is ~30 MB and contains no shell, no package manager, no userland tools.
Smaller images mean faster pulls, faster cold starts, and less attack surface. An attacker who gets code execution in an image with no shell has no shell to spawn and no package manager to install tools with.
02 Use multi-stage builds
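# Stage 1: build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies, including the dev dependencies the build step needs
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so only production packages reach the runtime stage
RUN npm prune --omit=dev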
# Stage 2: runtime (only what we need)
FROM gcr.io/distroless/nodejs20
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
USER nonroot
CMD ["dist/server.js"]
Build dependencies (compilers, npm, dev packages) stay in stage 1. Only the runtime artifacts copy into the final image. Typical size reduction: 60-80%.
03 Don't run as root
Containers run as root by default. If an attacker exploits your application, they have root inside the container and, depending on container escape vectors, root on the host.
Add a non-root user:
# Alpine (BusyBox) syntax; Debian-based images use groupadd/useradd instead
RUN addgroup -S app && adduser -S app -G app
USER app
# Or for distroless, use the built-in nonroot user:
USER nonroot
04 Pin everything: base image, packages, layers
FROM node:20 means "whatever node:20 points to today." Pin to a digest or specific version:
# Pin to specific version
FROM node:20.11.0-alpine3.19
# Or pin to digest (most secure — immutable)
FROM node@sha256:abc123...
Same for OS packages: apk add curl=8.5.0-r0, not apk add curl. Pinning prevents surprise updates from breaking your build or introducing vulnerabilities.
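One way to keep the pinning rule from eroding is a small check in CI. The following is a rough sketch, not an established tool: classifyImageRef and findFloatingImages are hypothetical helpers, and the "starts with major.minor.patch" test is a heuristic assumption. It will also flag FROM lines that refer to an earlier build stage or to scratch, so treat the output as a prompt for review.

```javascript
// Hypothetical helper: classifies the image reference from a FROM line.
// Returns 'digest', 'version', or 'floating'. The version check is a heuristic.
function classifyImageRef(ref) {
  if (/@sha256:[0-9a-f]{64}$/.test(ref)) return 'digest';
  // Look only at the last path segment so a registry port (localhost:5000/node)
  // is not mistaken for a tag.
  const lastSegment = ref.split('/').pop();
  const tag = lastSegment.includes(':') ? lastSegment.split(':')[1] : '';
  // A tag that starts with major.minor.patch is treated as a specific version.
  return /^\d+\.\d+\.\d+/.test(tag) ? 'version' : 'floating';
}

// Returns the FROM lines in a Dockerfile whose image is not pinned.
function findFloatingImages(dockerfileText) {
  return dockerfileText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /^FROM\s+/i.test(line))
    .filter((line) => {
      // Skip flags such as --platform=... to find the image reference.
      const ref = line.split(/\s+/).slice(1).find((t) => !t.startsWith('--')) || '';
      return classifyImageRef(ref) === 'floating';
    });
}
```

Under these assumptions, node:20 is reported as floating, while node:20.11.0-alpine3.19 and a full sha256 digest pass.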
05 Handle signals properly
When Kubernetes (or any orchestrator) wants to stop your container, it sends SIGTERM. If the process has not exited when the grace period ends (30 seconds by default in Kubernetes, 10 seconds for docker stop), it gets SIGKILL, which means dropped connections, lost work, and bad shutdown logs.
Two common failure modes:
- Shell-form CMD (CMD npm start) wraps your process in /bin/sh, which doesn't forward signals. Use exec form: CMD ["node", "server.js"].
- Multiple processes without a proper init. If you must run multiple processes, use tini as PID 1 to reap zombies and forward signals.
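// `app` is assumed to be an Express-style application created earlier.
const server = app.listen(3000);
process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining...');
  // Stop accepting new connections; the callback runs once in-flight requests finish.
  server.close(() => {
    console.log('Server closed');
    process.exit(0);
  });
  // Safety net (25 s is an assumed value, chosen to fit inside a 30 s grace period):
  // exit anyway if draining takes too long.
  setTimeout(() => process.exit(1), 25000).unref();
});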
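06 Add a healthcheck
Both Docker and Kubernetes use healthchecks to decide if your container is alive, but through separate mechanisms: Docker and Swarm read the HEALTHCHECK instruction in the Dockerfile, while Kubernetes ignores it and relies on its own liveness/readiness probes. Define whichever your platform uses, or both if the image runs in both.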
Two distinct concepts:
- Liveness — is the container alive? If not, restart it. Should be a simple "the process is responding."
- Readiness — is the container ready to receive traffic? Returns false during startup, during overload, while warming caches. K8s removes the pod from the load balancer when not ready.
Healthcheck endpoints should be cheap (no database calls, no auth) and excluded from your application logs, where they would otherwise dominate volume.
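A minimal sketch of the two endpoints in Node, to make the liveness/readiness split concrete. The paths /healthz and /readyz, the state object, and the function names are all assumptions for illustration; use whatever your probes are configured to hit.

```javascript
const http = require('node:http');

// Hypothetical readiness flag: set to true once startup work (cache warm-up,
// connection pools) is done, and back to false when shutdown begins.
const state = { ready: false };

// Pure routing function so the logic is easy to test: returns an HTTP status code.
function healthStatus(path, currentState) {
  if (path === '/healthz') return 200; // liveness: the process is responding
  if (path === '/readyz') return currentState.ready ? 200 : 503; // readiness
  return 404;
}

// Builds the server without starting it; call .listen(port) where your app starts.
function createHealthServer(currentState = state) {
  return http.createServer((req, res) => {
    const status = healthStatus(req.url, currentState);
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(http.STATUS_CODES[status]);
  });
}
```

Neither handler touches a database or writes a log line, which keeps the probes cheap. Flipping state.ready to false at the start of a SIGTERM handler lets the orchestrator stop routing traffic before the server closes.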
07 Never bake secrets into images
Anyone who pulls your image (registry leaks, CI compromise, image-layer inspection) can extract baked-in secrets. Even if you delete them in a later layer, they're in earlier layers.
Pass secrets at runtime:
- Environment variables (basic, often acceptable)
- Mounted secret files (Kubernetes Secrets, Docker Swarm secrets)
- Sidecar fetching from a vault (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager)
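On the application side, the first two options can share one lookup. This is a sketch under assumptions: loadSecret is a hypothetical helper, and the NAME_FILE suffix is a convention some official images follow, not something Docker or Kubernetes enforces.

```javascript
const fs = require('node:fs');

// Hypothetical helper: looks up NAME_FILE first (path to a mounted secret file),
// then falls back to the NAME environment variable.
function loadSecret(name, env = process.env) {
  const filePath = env[`${name}_FILE`];
  if (filePath) {
    // Mounted secret files often end with a newline; trim it.
    return fs.readFileSync(filePath, 'utf8').trim();
  }
  const value = env[name];
  if (value === undefined || value === '') {
    // Fail fast at startup instead of running with a missing credential.
    throw new Error(`Missing secret: set ${name} or ${name}_FILE`);
  }
  return value;
}
```

A call such as loadSecret('DB_PASSWORD') at startup then works unchanged whether the value arrives as an environment variable or as a mounted file.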
08 Use .dockerignore aggressively
Without a .dockerignore, every COPY . . includes .git/, node_modules/, .env, build artifacts, and IDE files. Two problems: bigger images and accidentally-leaked secrets.
.git
.gitignore
node_modules
npm-debug.log
.env*
.vscode
.idea
*.md
test/
coverage/
dist/
.DS_Store
Dockerfile*
09 Order Dockerfile commands for cache efficiency
Docker caches each instruction. If a layer changes, every subsequent layer rebuilds. Put rarely-changing instructions early; frequently-changing ones (your source code) late.
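# Bad: any source change invalidates the cache for the install layer
COPY . .
RUN npm install
# Good: the install layer is reused until the package manifests change
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .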
Dependencies change less often than source code. Copy and install them first, so npm ci re-runs only when package.json or package-lock.json changes.
10 Scan images for vulnerabilities
Run vulnerability scans in CI. Tools: Trivy (open source, fast), Snyk, Grype, Docker Scout. They check your image against CVE databases and flag known vulnerabilities in base images and dependencies.
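# Fail the build on CRITICAL findings
trivy image --severity CRITICAL --exit-code 1 my-image:latest
# Report HIGH findings for triage without failing the build
trivy image --severity HIGH --exit-code 0 my-image:latest
Fail the build on CRITICAL findings. Triage HIGH findings (some can't be fixed by you because they're in transitive deps; you decide what to accept).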
11 Run with a read-only filesystem
Most applications don't need to write to disk at runtime. Run with read-only root filesystem; mount /tmp as a tmpfs if needed.
# Container spec
securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
  - name: tmp
    mountPath: /tmp
# Pod spec
volumes:
  - name: tmp
    emptyDir:
      medium: Memory   # tmpfs; a plain emptyDir: {} is disk-backed
An attacker who gains code execution can't drop a binary into the image filesystem, modify configs, or persist anything outside the writable mounts. Major defense in depth.
12 Set resource limits
Without limits, a memory leak in one container can exhaust the host's memory and take down every workload on it. With a limit, only the leaking container is OOM-killed and restarted; the host stays up.
Set both requests (what you're guaranteed) and limits (the cap):
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1000m"
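The units in that fragment are easy to misread: Mi is a binary suffix (1 Mi = 1024 × 1024 bytes) and m means millicores (1000m = one full core). A small sketch that spells out the arithmetic; memoryToBytes and cpuToCores are hypothetical helpers that handle only the integer forms used above, not the full Kubernetes quantity syntax.

```javascript
// Converts a memory quantity with an optional binary suffix (Ki, Mi, Gi) to bytes.
function memoryToBytes(quantity) {
  const match = /^(\d+)(Ki|Mi|Gi)?$/.exec(quantity);
  if (!match) throw new Error(`Unsupported memory quantity: ${quantity}`);
  const factors = { Ki: 1024, Mi: 1024 ** 2, Gi: 1024 ** 3 };
  return Number(match[1]) * (match[2] ? factors[match[2]] : 1);
}

// Converts a CPU quantity ("250m" millicores or "1" whole cores) to cores.
function cpuToCores(quantity) {
  const match = /^(\d+)(m)?$/.exec(quantity);
  if (!match) throw new Error(`Unsupported CPU quantity: ${quantity}`);
  return match[2] ? Number(match[1]) / 1000 : Number(match[1]);
}
```

With the values above, the container is guaranteed 268,435,456 bytes and a quarter of a core, and is capped at 536,870,912 bytes and one core.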
13 Log to stdout/stderr, never files
Containerized applications write logs to stdout and stderr. The container runtime captures them and routes to your log aggregator.
Don't log to files inside the container — they fill the disk, they're invisible to log aggregators, and they disappear when the container dies. Even if your framework defaults to file logging, override it.
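If you are not using a logging library, the whole idea fits in a few lines. This is a minimal sketch, assuming one JSON object per line is what your aggregator expects; formatLogLine and log are hypothetical names, not part of any framework.

```javascript
// Formats one log entry as a single JSON line. `now` is injectable for testing.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  return JSON.stringify({ time: now.toISOString(), level, message, ...fields });
}

// Errors go to stderr, everything else to stdout; no files involved.
function log(level, message, fields) {
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, message, fields) + '\n');
}
```

A call such as log('info', 'server started', { port: 3000 }) writes one line to stdout, and the container runtime takes it from there.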
14 Emit metrics and traces
Production containers should expose metrics on a known endpoint (conventionally /metrics for Prometheus) and export traces, for example through OpenTelemetry exporters. Without these, you're flying blind when something goes wrong.
Minimum metrics: request count, request duration, error rate, in-flight requests, queue depth (if applicable). Most frameworks have a one-line integration.
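To show what the scrape endpoint actually returns, here is a dependency-free sketch of three of those metrics in the Prometheus text exposition format. The metric names and helper functions are assumptions for illustration; in a real service a client library would normally do this for you and add duration histograms.

```javascript
// In-memory counters; hypothetical names chosen for this example.
const metrics = { requestsTotal: 0, errorsTotal: 0, inFlight: 0 };

// Call when a request arrives.
function recordStart(m = metrics) {
  m.requestsTotal += 1;
  m.inFlight += 1;
}

// Call when the response has been sent; 5xx responses count as errors.
function recordEnd(statusCode, m = metrics) {
  m.inFlight -= 1;
  if (statusCode >= 500) m.errorsTotal += 1;
}

// Renders the counters in Prometheus text exposition format.
function renderMetrics(m = metrics) {
  return [
    '# TYPE http_requests_total counter',
    `http_requests_total ${m.requestsTotal}`,
    '# TYPE http_request_errors_total counter',
    `http_request_errors_total ${m.errorsTotal}`,
    '# TYPE http_requests_in_flight gauge',
    `http_requests_in_flight ${m.inFlight}`,
  ].join('\n') + '\n';
}
```

Serving the result of renderMetrics() at /metrics with the content type text/plain; version=0.0.4 is enough for Prometheus to scrape it. Error rate is then derived at query time from the two counters.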
15 Sign your images
Image signing (Cosign, Notary) lets you verify that the image deployed is the image you built. Without signing, anyone with registry write access can replace your image and you'd never know.
Sign in CI, verify on deploy:
# Sign at build time
cosign sign --key cosign.key my-registry/my-image:v1.0.0
# Verify before deploying
cosign verify --key cosign.pub my-registry/my-image:v1.0.0
∞ The complete picture
Each item on this list is small. Together they're the difference between a container you'd hand to a customer and one you're embarrassed about in a security review.
Production-grade containers aren't fancier than dev ones — they're just less casual. Smaller, more locked down, more observable, more resilient. The discipline pays back the first time something goes wrong at 3am and your container actually behaves the way you'd hope.