Detect Unhealthy Containers the Smart Way 🐳🩺

Running a container doesn’t mean your app is running fine. It might look like everything’s green from the outside… but inside? Your app could be frozen, stuck, or completely dead 🧊💀

Welcome to the world of Docker HEALTHCHECK — a super underrated feature that can make or break your reliability game. Today we’ll dive into:

✅ Why HEALTHCHECK is essential
⚠️ Real risks of skipping it
⚙️ How Docker HEALTHCHECK Works
🛠️ How to add it to your Dockerfiles
👀 Two practical test cases (healthy vs unhealthy)

❗ Why You Should Care

Let’s be honest. We often celebrate when our container is “up and running” — but that just means the process inside hasn’t crashed. It doesn’t tell us if:

The web server is responding 🕸️
The database is reachable 📉
Your app logic is frozen in a loop 🔁

Without a healthcheck, Docker only knows whether the main process is still running, and it treats that as “fine”. That’s dangerous in production, and even in dev it gives you a false sense of security.

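Healthchecks add real visibility. If your app doesn’t behave as expected, Docker marks the container as unhealthy, and tools that read that status can act on it: Docker Swarm replaces unhealthy tasks, and Docker Compose can hold back dependent services until a container is healthy. (Kubernetes is the exception. It ignores the Dockerfile HEALTHCHECK and relies on its own liveness and readiness probes.)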

⚙️ How Docker `HEALTHCHECK` Works — Under the Hood

When you add a HEALTHCHECK instruction in your Dockerfile, you’re telling the Docker engine to periodically run a command inside the container to determine its health status. Here’s how it works step by step:

🧱 1. The `HEALTHCHECK` Instruction

Example:

HEALTHCHECK --interval=10s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:5000/health || exit 1

You’re defining:

Option	Meaning
`CMD`	The command to run inside the container. It should exit with `0` for healthy and `1` for unhealthy.
`--interval`	Time between checks (default: 30s). The first check runs one interval after the container starts.
`--timeout`	How long a single check may run before it counts as a failure (default: 30s).
`--retries`	Number of consecutive failures before the container is marked `unhealthy` (default: 3).
`--start-period`	Grace period after startup during which failures don’t count toward `--retries` (default: 0s).
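The following Python sketch shows how these options interact. It estimates how long a container whose checks always fail takes to be marked unhealthy. It assumes no `--start-period` and follows the scheduling described in the Docker reference: the first check runs one interval after start, and each later check runs one interval after the previous one finishes. `parse_duration` and `detection_window` are hypothetical helpers, not part of any Docker SDK.

```python
import re

# Go-style duration units as accepted by "10s", "1m30s" or "500ms".
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# "ms" comes before "m" in the alternation so "500ms" isn't read as minutes.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '1m30s' into seconds."""
    total, pos = 0.0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # garbage between tokens
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # no unit at all, or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return total


def detection_window(interval="30s", timeout="30s", retries=3, check_seconds=None):
    """Seconds from container start until it is marked unhealthy, assuming
    every check fails and each one runs for check_seconds (worst case:
    the full timeout). Ignores --start-period."""
    iv = parse_duration(interval)
    run = parse_duration(timeout) if check_seconds is None else check_seconds
    # Check k finishes at k * (interval + run); check number `retries` flips the status.
    return retries * (iv + run)


print(detection_window("10s", "3s", 3))                   # worst case: 39.0
print(detection_window("10s", "3s", 3, check_seconds=0))  # instant failures: 30.0
```

With the example values (10s interval, 3s timeout, 3 retries), a broken app is flagged after roughly 30 to 39 seconds. The defaults stretch that to as much as three minutes.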

🧠 2. Docker Monitors Using a Background Healthcheck Manager

When you start a container that has a HEALTHCHECK, the Docker daemon starts a small monitor for that container. The monitor runs the CMD inside the container (much like a `docker exec`) and waits for it to finish. It then schedules the next run one interval after the previous check completes.

Everything is handled by the Docker daemon, which stores each result in the container’s health state.

🧪 3. Exit Codes Determine Health

Docker executes the healthcheck command inside the container, and uses its exit code to decide the result:

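Exit Code	Meaning
`0`	Healthy ✅
`1`	Unhealthy ❌
`2`	Reserved by Docker, don’t use it

Any other non-zero exit code also counts as a failed check. So does a command that is missing or can’t be started. That’s why the examples end with `|| exit 1`. curl reports failures with its own codes, such as 7 (couldn’t connect) or 22 (HTTP error, thanks to `-f`), and `|| exit 1` maps them all to the documented `1`.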

Docker tracks the consecutive failures, and once the retry limit is reached, the container is marked as unhealthy.
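Here is a rough Python model of that rule. It runs a check through a shell, as Docker does for the shell form of `CMD`, and treats only exit code 0 as success. It assumes a POSIX `/bin/sh` on the machine running the sketch, and `run_check` and `check_passed` are illustrative names. A missing binary makes the shell exit with 127, and a check that exceeds its timeout is reported here as 1.

```python
import subprocess


def run_check(command: str, timeout: float = 30.0) -> int:
    """Run a shell-form healthcheck command and return its exit code.
    A command that exceeds the timeout is reported as exit code 1."""
    try:
        result = subprocess.run(
            command, shell=True, timeout=timeout,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return 1
    return result.returncode


def check_passed(exit_code: int) -> bool:
    """Only 0 means healthy; every other code counts as a failed check."""
    return exit_code == 0


print(run_check("exit 0"))               # 0   -> passed
print(run_check("exit 22"))              # 22  -> failed (curl -f style HTTP error)
print(run_check("no-such-binary-xyz"))   # 127 -> failed (command not found)
print(run_check("(exit 22) || exit 1"))  # 1   -> normalized by `|| exit 1`
```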

🔄 4. Status Stored in Container Metadata

You can view this with:

docker inspect --format='{{json .State.Health}}' [container_name] | jq

It shows:

Status: starting, healthy, or unhealthy
FailingStreak: how many times it failed consecutively
Log: the most recent check results (Docker keeps the last five), each with start/end timestamps, exit code, and output
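If you’d rather read that JSON from a script than eyeball it with jq, a minimal Python sketch could look like this. It accepts either the list returned by `docker inspect <container>` or just the `.State.Health` object printed by the `--format` command above. `summarize_health` is a hypothetical helper, and the sample data is abbreviated and illustrative.

```python
import json


def summarize_health(inspect_output: str) -> dict:
    """Extract status, failing streak and last exit code from `docker inspect` JSON.
    Accepts the full inspect output (a list) or just the .State.Health object."""
    data = json.loads(inspect_output)
    if isinstance(data, list):  # full `docker inspect <name>` output
        health = data[0]["State"].get("Health")
    else:                       # `--format '{{json .State.Health}}'` output
        health = data
    if not health:              # no HEALTHCHECK configured ("null" / missing key)
        return {"status": "none", "failing_streak": 0, "last_exit_code": None}
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health["Status"],
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
    }


# Illustrative, abbreviated sample shaped like `docker inspect broken`.
sample = json.dumps([{"State": {"Health": {
    "Status": "unhealthy",
    "FailingStreak": 4,
    "Log": [{"ExitCode": 1, "Output": "curl: (22) The requested URL returned error: 500\n"}],
}}}])
print(summarize_health(sample))
# {'status': 'unhealthy', 'failing_streak': 4, 'last_exit_code': 1}
```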

Docker updates this metadata after every check, and you can consume it via:

CLI (docker ps, docker inspect)
Docker Remote API (/containers/id/json)
Orchestration tools such as Docker Swarm and Docker Compose

🪄 5. No Magic, Just Smart Logic

Docker doesn’t inject anything magical into your container. It simply:

Executes the given command using binaries that already exist in the image (like curl, wget, etc.). If the image doesn’t ship curl, you have to install it, as the Dockerfile below does.
Waits for the result
Updates internal health state

But this tiny mechanism becomes powerful when combined with:

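Swarm services, which replace tasks that turn unhealthy
Docker Compose’s depends_on with condition: service_healthy, which waits for a dependency to pass its check
Health-aware proxies such as Traefik, which skip containers that aren’t healthy
Alerting systems (via Docker health_status events or logs)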

💡 A Note About “Starting”

The first check runs one `--interval` after the container starts. Until a check passes (or `--retries` checks fail in a row), the container status shows as:

"Status": "starting"

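As soon as a check succeeds, the status becomes healthy. If `--retries` consecutive checks fail, it becomes unhealthy. `--start-period` (default 0s) gives slow-booting apps a grace window: failures inside that window don’t count toward `--retries` while the container is still starting.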
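These transitions form a tiny state machine. The Python sketch below replays a sequence of check results using the rules above. The status starts as `starting`. A passing check sets `healthy` and resets the streak. `--retries` consecutive failures set `unhealthy`. Failures inside `--start-period` are ignored while the container is still starting. For simplicity it assumes check k runs at exactly k × interval seconds and ignores `--start-interval` (Docker Engine 25+). `replay` is an illustrative name.

```python
def replay(results, interval=10.0, retries=3, start_period=0.0):
    """Replay check results (True = exit 0) and return the status after each one.
    Assumes check k (1-based) runs k * interval seconds after the container starts."""
    status, streak = "starting", 0
    history = []
    for k, ok in enumerate(results, start=1):
        elapsed = k * interval
        if ok:
            status, streak = "healthy", 0
        elif status == "starting" and elapsed < start_period:
            pass  # failure inside the start period: not counted
        else:
            streak += 1
            if streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history


print(replay([False, False, True, False, False, False]))
# ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
print(replay([False] * 4, start_period=30))
# ['starting', 'starting', 'starting', 'starting']
```

The second call shows why `--start-period` helps slow apps. With the 10s interval, only the check at the 30-second mark counts, so four failures are not yet enough to flip the status.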

🚫 What Healthchecks DON’T Do

❌ They do not stop or restart containers by themselves
❌ They don’t directly affect container networking or DNS
❌ They don’t send alerts unless you wire them to an external system

🔧 Adding a HEALTHCHECK to Your Dockerfile

It’s simple! Here’s the syntax:

HEALTHCHECK --interval=10s --timeout=3s --retries=3 CMD curl -f http://localhost:5000/health || exit 1

This checks every 10 seconds if the /health endpoint returns a success. If it fails 3 times in a row, the container becomes unhealthy.
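The command doesn’t have to be curl. Slim images such as `python:3.11-slim` ship without it, which is why the Dockerfile below installs it. If the image already has Python, a standard-library probe avoids the extra package. This is a minimal sketch that assumes a script saved under a hypothetical name such as `healthcheck.py` and wired up with something like `HEALTHCHECK CMD python healthcheck.py`. Any HTTP error, refused connection, or timeout yields 1.

```python
import http.client
import urllib.request

# Assumed endpoint: the same /health route the curl examples use.
HEALTH_URL = "http://127.0.0.1:5000/health"


def probe(url: str = HEALTH_URL, timeout: float = 3.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (e.g. a 500), refused connections and socket
        # timeouts are OSError subclasses; HTTPException covers malformed replies.
        return 1
```

Used as a healthcheck script, the file would end with `sys.exit(probe())` (plus `import sys`), so the probe’s return value becomes the check’s exit code.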

🧪 Let’s Test It in Action

We’ll create two test containers:

✅ Healthy App

This one includes a proper /health endpoint that always returns 200 OK.

Dockerfile:

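FROM python:3.11-slim
ENV DEBIAN_FRONTEND=noninteractive
WORKDIR /app
# Install curl: the slim image doesn't ship it, and the healthcheck needs it
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir flask
# Copy the app after installing dependencies so code edits don't bust the cached layers
COPY app.py .
EXPOSE 5000
HEALTHCHECK --interval=10s CMD curl -f http://127.0.0.1:5000/health || exit 1
CMD ["python", "app.py"]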

app.py:

from flask import Flask
app = Flask(__name__)

@app.route('/')
def home():
    return "All good!"

@app.route('/health')
def health():
    return "OK", 200

app.run(host="0.0.0.0", port=5000)

👉 Build and run:

docker build -t healthy-app .
docker run -d --name healthtest -p 5000:5000 healthy-app
sleep 15   # the first check only runs one interval (10s) after start
docker inspect --format='{{.State.Health.Status}}' healthtest

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
55e9b6f148a9 healthcheck_test "python app.py" 18 seconds ago Up 18 seconds (healthy) 0.0.0.0:5000->5000/tcp, [::]:5000->5000/tcp healthtest

🎉 You’ll get: healthy

❌ Unhealthy App

Now let’s break the /health endpoint.

Modified app.py:

@app.route('/health')
def health():
    return "Error", 500

Build and run again:

docker build -t unhealthy-app .
docker run -d --name broken unhealthy-app
sleep 35   # three failed checks, 10s apart, before the status flips
docker inspect --format='{{.State.Health.Status}}' broken

💥 Result: unhealthy

The health log records every failed attempt. (The progress-meter noise in Output comes from curl; running it as curl -fsS would keep only the error line.)

$ docker inspect broken | jq '.[].State.Health.Log'
[
  {
    "Start": "2025-06-05T19:26:07.323262742+02:00",
    "End": "2025-06-05T19:26:07.367595028+02:00",
    "ExitCode": 1,
    "Output": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     5    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\ncurl: (22) The requested URL returned error: 500\n"
  },
  {
    "Start": "2025-06-05T19:26:17.369661511+02:00",
    "End": "2025-06-05T19:26:17.408770486+02:00",
    "ExitCode": 1,
    "Output": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     5    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\ncurl: (22) The requested URL returned error: 500\n"
  },
  {
    "Start": "2025-06-05T19:26:27.409488914+02:00",
    "End": "2025-06-05T19:26:27.450101106+02:00",
    "ExitCode": 1,
    "Output": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     5    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\ncurl: (22) The requested URL returned error: 500\n"
  },
  {
    "Start": "2025-06-05T19:26:37.450803223+02:00",
    "End": "2025-06-05T19:26:37.492805511+02:00",
    "ExitCode": 1,
    "Output": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     5    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\ncurl: (22) The requested URL returned error: 500\n"
  }
]

👁️ What’s the Impact?

Scenario	Behavior
No HEALTHCHECK	No health status at all: `docker ps` shows only `Up`, and `.State.Health` is absent
HEALTHCHECK passes	Container state = `healthy` ✅
HEALTHCHECK fails `--retries` times in a row	Container state = `unhealthy` 🚨

Why it matters:

Orchestration tools such as Docker Swarm and Docker Compose rely on this signal
You can detect failing containers early during development
It helps your CI/CD pipeline make smart decisions
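In a CI job, that decision often boils down to “wait until the container is healthy, or fail the build”. Here is a minimal Python sketch that assumes the docker CLI is on PATH. `wait_for_healthy` and its parameters are hypothetical. The status source and the sleep function are injectable, so the polling logic can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Read .State.Health.Status via the docker CLI.
    Raises CalledProcessError if the inspect command fails (e.g. unknown container)."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_healthy(container: str, timeout: float = 60.0, poll: float = 2.0,
                     get_status: Callable[[str], str] = docker_health_status,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the container is healthy (True); unhealthy or timeout gives False."""
    waited = 0.0
    while waited <= timeout:
        status = get_status(container)
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False  # the failing streak already reached --retries
        sleep(poll)       # still "starting": try again later
        waited += poll
    return False

# Example CI usage: sys.exit(0 if wait_for_healthy("healthtest") else 1)
```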

🧠 Final Thoughts

A HEALTHCHECK is like a pulse check for your app ❤️‍🩹
Just because a container runs doesn’t mean your service is okay.

Whether you’re in local development or scaling in production, a tiny HEALTHCHECK line in your Dockerfile can save you hours of debugging and nights of firefighting.

So go ahead — make your containers honest.

Docker HEALTHCHECK is:

A built-in mechanism that runs periodic commands inside containers
Based entirely on the exit status of your script or command
Tracked by the Docker daemon, with results exposed via CLI & API
Powerful when combined with orchestration, restarts, and alerts

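📚 Bonus tip: Want to auto-restart unhealthy containers?
Don’t rely on restart policies for this:

docker run --restart=on-failure ...

only restarts a container when its main process exits with a non-zero code. Docker never restarts a container just because it turned unhealthy. To get that behavior, run it as a Swarm service (Swarm replaces unhealthy tasks), or use an external watcher that reacts to Docker’s health_status events.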
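Standalone Docker won’t restart an unhealthy container on its own, so one option is a small watcher on Docker’s event stream. This is a minimal Python sketch under stated assumptions. It assumes the docker CLI is available. It also assumes that `docker events --format '{{json .}}'` prints one JSON object per line, with an `Action` such as `health_status: unhealthy` and the container name under `Actor.Attributes.name`. The function names are hypothetical, and the restart callback is injectable.

```python
import json
import subprocess
from typing import Callable, Iterable, Iterator


def unhealthy_containers(event_lines: Iterable[str]) -> Iterator[str]:
    """Yield the name of each container that emits a 'health_status: unhealthy' event."""
    for line in event_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("Action") == "health_status: unhealthy":
            actor = event["Actor"]
            yield actor.get("Attributes", {}).get("name", actor.get("ID"))


def restart(name: str) -> None:
    """Restart one container through the docker CLI."""
    subprocess.run(["docker", "restart", name], check=True)


def watch(restart_fn: Callable[[str], None] = restart) -> None:
    """Stream container events from the daemon and restart unhealthy containers."""
    with subprocess.Popen(
        ["docker", "events", "--filter", "type=container", "--format", "{{json .}}"],
        stdout=subprocess.PIPE, text=True,
    ) as proc:
        for name in unhealthy_containers(proc.stdout):
            restart_fn(name)
```

Keeping the parsing in `unhealthy_containers` separate from the subprocess plumbing means the event handling can be tested with plain strings.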
