TIL about container lifecycle hooks
in Kubernetes, and more specifically how preStop
hooks can be used to avoid downtime during deployments.
Somewhat simplified, when new pods are rolled out during a deployment, Kubernetes tries to balance the number of available pods by shutting down and starting up one replica at a time. A gotcha in this process is that the component responsible for managing the pod lifecycle is independent of the component responsible for routing traffic to pods. In practice, this means that if a pod shuts down before the routing has been updated, requests can still be routed to pods that have already been killed. This is likely to result in timeouts, 503s, and the like.
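For reference, the pace at which replicas are replaced is controlled by the Deployment's update strategy. A minimal sketch of a conservative, one-at-a-time configuration (the values here are illustrative; the defaults for both fields are 25%):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # never drop below the desired replica count
      maxSurge: 1        # bring up at most one extra pod at a time

Note that even a conservative rollout like this does nothing about the routing race described above.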
To alleviate this, one common approach is to
add a shutdown delay to your application. Here, a signal handler is added that catches SIGTERM
and delays the normal shutdown procedure by some amount of time (e.g. 5 seconds). This enables the application to keep responding to new requests
until the ingress controller has had time to deregister the pod. After the configured delay, the normal shutdown procedure is initiated,
rejecting new requests and completing in-flight requests before shutting down.
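As an illustration only, here's a minimal sketch of what such a shutdown delay might look like in a Go HTTP server; the port, the 5-second delay, and the 25-second drain timeout are assumptions, not something from the original setup:

package main

import (
    "context"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8080"}

    go func() {
        // Catch SIGTERM instead of letting it terminate the process immediately.
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGTERM)
        <-sigs

        // Keep serving new requests while the ingress controller deregisters the pod.
        time.Sleep(5 * time.Second)

        // Then stop accepting new requests and finish in-flight ones.
        ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
        defer cancel()
        srv.Shutdown(ctx)
    }()

    srv.ListenAndServe()
}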
It turns out that in Kubernetes there's a simpler approach. In your Deployment
(or wherever you specify the container configuration)
you can set a preStop
lifecycle hook that runs before the shutdown signal is sent to the container. There you can
simply wait for some duration before continuing:
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
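For context, a sketch of where this sits in a full Deployment manifest (the name, image, and port are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0.0
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]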
If you're running Kubernetes v1.33 or greater, or have the PodLifecycleSleepAction
feature gate enabled, there's an equivalent hook handler
implementation that lets you avoid having to include the sleep
binary in your container image:
lifecycle:
  preStop:
    sleep:
      seconds: 5
This is not a foolproof solution, but it's a common enough thing to do that Kubernetes decided to add a hook handler specifically for it. One problem
with it is that you can't know exactly how long the ingress controller will take to update its registrations. There might be a more complicated way
around this that waits until the pod has been deregistered by checking the Kubernetes API, but I haven't looked further into it. Another thing to be aware
of is that the runtime of the preStop
hook counts toward the pod's shutdown grace period. This means that if your hook runs sleep 30
while your terminationGracePeriodSeconds
remains at its default value of 30, your pod will have 0 seconds left to perform its shutdown procedure and will
be forcefully killed by Kubernetes.
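As a rough sketch of how to account for this (the numbers are just examples), the grace period can be bumped so the sleep doesn't eat into the time the application needs for its own shutdown:

spec:
  terminationGracePeriodSeconds: 35  # 5s preStop sleep + up to 30s for the app's own shutdown
  containers:
    - name: my-app
      image: my-app:1.0.0
      lifecycle:
        preStop:
          sleep:
            seconds: 5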