zet

LOG20240918073011: Wednesday, September 18, 2024, 7:30:11AM EDT

Discovered yesterday that our K8SAPP specification has some substantial flaws stemming from the build, deploy, and undeploy scripts that were added to my original spec. The original spec did not call for any specific scripts, only for documenting particular procedures for these workflows. That openness allowed each K8SAPP to describe its own upgrade steps, depending on the app itself, instead of leaving them out. Now that I’m tasked with upgrading our Harbor application and turning it into a K8SAPP, this missing part leaves a hole: the spec does not account for very critical procedures such as upgrading from one version to another. The Harbor upgrade procedure is well documented, but there is literally nothing in our K8SAPP specification that accounts for an upgrade procedure. Sure, it can be done, but nothing in the spec codifies those steps or captures what was done.

Harbor makes me realize that certain Kubernetes applications can never be made easier than just following the steps documented by the application itself. I could have finished the upgrade in a few days had I just followed the steps. This whole thing about turning it into a K8SAPP just doesn’t work.

The problem is complicated further by the original decision not to use the PostgreSQL that the Bitnami chart embeds. We have a Harbor installation that does not match any described by the upstream source, making upgrades damn near impossible without a lot of work. This is the “second day” problem of Kubernetes that a lot of tech writers and architects have lamented. Fucking up a Harbor upgrade could cost us literally millions of dollars: several thousand for every hour our core applications are down while we work out which PostgreSQL data migration we missed and how it made everything incompatible with our existing database.

“But you can just test it all in advance.” Actually, we cannot, not using the production data in our current PostgreSQL database. To do that we’d have to mirror that data in full to a separate instance of PostgreSQL in dev and attempt the migration there.
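For what it's worth, the mirroring step itself might look something like the sketch below. It only builds the `pg_dump` and `pg_restore` command lines and prints them; nothing is executed. The connection strings, the dump file name, and the assumption that a single database is all that needs copying are hypothetical placeholders, not our actual setup.

```python
import shlex


def mirror_commands(prod_dsn: str, dev_dsn: str,
                    dump_path: str = "harbor-prod.dump") -> list[list[str]]:
    """Build (but do not run) the commands to copy prod data into dev.

    prod_dsn and dev_dsn are hypothetical PostgreSQL connection strings.
    """
    # Dump production in PostgreSQL's custom archive format.
    dump = [
        "pg_dump",
        "--format=custom",
        f"--file={dump_path}",
        f"--dbname={prod_dsn}",
    ]
    # Restore into the dev instance, dropping existing objects first.
    restore = [
        "pg_restore",
        "--clean",
        "--if-exists",
        "--no-owner",
        f"--dbname={dev_dsn}",
        dump_path,
    ]
    return [dump, restore]


if __name__ == "__main__":
    # Placeholder hosts and database name, for illustration only.
    for cmd in mirror_commands(
        "postgresql://prod-db.example/registry",
        "postgresql://dev-db.example/registry",
    ):
        print(shlex.join(cmd))
```

The upgrade and its data migrations would then be attempted against the dev copy only. Getting a full copy of production data somewhere safe is the hard part, and this sketch does not solve that.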

The main point is that every divergence from the upstream product distribution puts us at significant risk of creating scenarios where we can no longer go to the community or product owner and ask for guidance when something doesn’t work. Maintaining our own PostgreSQL is just one specific example of this.