What the heck? The latest upgrade of my Kubernetes cluster gave me headaches. Not only because it failed with a timeout – mainly because the root cause was not obvious. In fact, the maintainers of Kubernetes made an infrastructure change a long time ago but forgot to communicate it properly to their users.
But before we start the rant, let’s check what happened – I tried to upgrade from v1.18.2 to v1.18.14. This happened:
```
timed out waiting for the condition
couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced
k8s.io/kubernetes/cmd/kubeadm/app/phases/upgrade.rollbackOldManifests
```
So I re-ran the upgrade with verbosity turned on. No additional information. What I saw was that the kube-apiserver wouldn't come up – and no log file gave a reason why. I asked Google – very little information, but one hint: the image pull could be failing.
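If you want to test the image-pull theory yourself, `kubeadm` can list and pull the control-plane images for a target version. A minimal sketch, meant to be run on a control-plane node (guarded so it fails gracefully elsewhere):

```shell
# List the control-plane images kubeadm would use for the target version,
# then try to pull them; a failing pull points at the repository problem.
if command -v kubeadm >/dev/null 2>&1; then
  kubeadm config images list --kubernetes-version v1.18.14
  kubeadm config images pull --kubernetes-version v1.18.14
else
  echo "kubeadm not found - run this on a control-plane node"
fi
```

The listed image names include the repository prefix kubeadm is configured with, so this also shows you at a glance which registry your cluster is pointing at.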
Another search revealed that the Kubernetes maintainers had changed their image repository to k8s.gcr.io – presumably a long time ago. Checking my cluster more thoroughly, I found that the old repository was still being used. But why did my cluster not know the new one? I had been upgrading version by version since the beginning.
Next I searched for information on how to change it – nothing in the Kubernetes docs (WTF!), only a note in some change request. You need to look at the `kubeadm-config` ConfigMap in your `kube-system` namespace. There you'll find the repository address. Changing it to the correct name finally did the trick, and the upgrade succeeded.
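In practice the real edit is `kubectl -n kube-system edit configmap kubeadm-config`. To make the change concrete, here is the same fix shown as a text transformation on a sample ClusterConfiguration – the old repository name below is a made-up placeholder, not what your cluster will actually contain:

```shell
# The real edit on a live cluster:
#   kubectl -n kube-system edit configmap kubeadm-config
# Below, the same change as a text transformation on a sample
# ClusterConfiguration (old repository name is a placeholder):
sample_config='apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: old-registry.example.com
kubernetesVersion: v1.18.2'

# Point imageRepository at the current repository:
echo "$sample_config" | sed 's|^imageRepository: .*|imageRepository: k8s.gcr.io|'
```

After saving the ConfigMap, the next `kubeadm upgrade` run reads the corrected repository from it.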
But the more I think about this challenge, the angrier I get.
- How can such an essential change not be communicated more prominently – especially since the old repository was abandoned with v1.18.6, the last image version published there? Every upgrade document since 1.18 should carry a warning that the old repo is out of order, plus a link to the change procedure.
- Why does the error message not tell you anything useful? The stack trace says nothing about what actually happened.
- And why – for God’s sake – does the upgrade procedure itself not check for this essential change? Especially since v1.18.7.
This is a very unprofessional way of maintaining software. Kubernetes is the foundation of so many production systems now that such an essential change must be taken more seriously by the maintainers. Breaking the upgrade procedure endangers all of these systems, and neither proper communication nor risk mitigation was in place.
I need to stress that upgrading Kubernetes is always risky. I have run into so many issues in the past that blocked an upgrade. Most of them were better documented, so I could resolve them. But this infrastructure change is a sign of unprofessional risk management – and I hope they will do much better next time.