This article explains how I stopped a software vendor and at least one of its customers from hemorrhaging $50K+ a year in unnecessary costs.
Repair means applying security patches as soon as available. Rotate means rotating credentials often. Those two are useful and, at this stage of our industry, necessary for good security hygiene.
The third R, “Repave,” meaning “bring the software image back to a known state,” is popular but, I argue today, useless: in a truly cloud-native world, it is not needed.
That’s it! Save money, use immutable containers, don’t repave.
End of story.
(Repair, on the other hand, remains absolutely relevant: container images do contain outdated software, especially the base image—libraries, run-time systems, etc. Folk who use containers should regularly rebuild their container images to include the latest security patches. But that is a story for another time.)
I could be writing an article about the technicalities of why repaving is not needed.
I could be explaining how repaving was invented for a world where software was installed anew upon every new deployment by mutating an OS image using an installation program, how the system image remained mutable afterwards, and therefore how a risk remained for hackers to install malware on top of it.
I could be explaining how FreeBSD Jails (2000), Solaris Zones (2004), and Docker (2013) have been specifically designed and implemented to support immutable filesystem mounts, with all executable code and configuration “baked in” so that no amount of bugs, mistakes or hackers can ever change the software within the container and make it diverge from a known-good state.
I could also point out the glaringly obvious: software deployed in clouds today most often comes as immutable containers, and thus this form of cloud-native software, once deployed, can never change, by construction.
I could then conclude that requests from enterprise-y customers for procedures and/or processes to “repave” a container-based software deployment are, at best, meaningless; and in truth, just a way to throw good money after bad.
The Enterprise mindset is to keep doing something long after it is no longer needed. News at 11! How is that even an interesting story to write about?
The real story here is how a software vendor (who shall remain nameless) got roped into spending significant effort over about a year, and thus extremely significant $$$$, to entertain a “repaving” story for container-based deployments, without anybody realizing what was going on.
This story is not even very long, but it is instructive.
The story begins as follows: a customer says “I need a repaving procedure.” Vendor assumes that the customer is installing on bare metal or in a VM. Vendor obliges and starts a year-long initiative to ensure that the software behaves properly when reinstalled from scratch. This was a technically complex project because the software had to remain online while the repaving took place.
Then someone comes in and listens to a report about that customer’s deployments and learns that the customer is really using Kubernetes and deploys their software as immutable containers.
It turns out that:
- The customer was saying “I need repaving” to one person, and did not mention their use of containers to that person. This first person did not ask about containers either.
- Meanwhile, the customer was saying “I use containers” to another person. The second person was not involved in the repaving story.
And so two folk at the vendor did not connect the dots, and the organization spent a minimum of $50k (I estimate $70k-$100k) in combined hours from 4 departments to solve a problem that did not need solving. Classic miscommunication.
(Also, did I mention that this vendor recommends containers as their primary deployment mechanism, and their own business strategy revolves around containers? That the repaving story even got traction internally, given that business focus, astounds me.)
To me personally, what is painful to admit is that I was listening to both sides of the story for that entire year, and it took me far too long to connect the dots.
I would really like to be able to defend myself and say that the customer was a large organization, and thus that I was assuming that there were really two different groups of users, one using containers and one using bare metal / VMs and repaving. But that was just that—an assumption, and it was my job to check that assumption, and I did not.
What is the lesson here?
At the very minimum, when a customer says “I need something,” a vendor should probably ask “why?”
Then, if the reason given is security-related, the vendor should probably ask a security expert to verify any assumptions.
Then, assuming a security expert is involved in the discussion, the expert would ask about, then take into account, the customer’s entire deployment practices to evaluate the problem to solve, in a holistic manner.
This way, lateral knowledge about the customer’s practices would surface, and could be used to point out that the “something” being asked for is redundant.
And, of course, the obvious: security “best practices” should not be implemented without understanding the threat model.
Repaving is not needed with immutable containers.
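That statement leans on the containers actually being immutable, which is something a vendor can ask the customer to verify. In Kubernetes terms, this roughly means images pinned by digest and a read-only root filesystem. Below is a minimal Python sketch of such a check, not a complete audit. The function name `immutability_gaps` and the example registry are made up for illustration; `securityContext.readOnlyRootFilesystem` and `@sha256:` digest references are standard Kubernetes and OCI conventions. The sketch assumes a pod spec already parsed into a dict, and it ignores init containers, writable volumes, and pod-level settings.

```python
from typing import Any


def immutability_gaps(pod_spec: dict[str, Any]) -> list[str]:
    """Return reasons why containers in a pod spec may not be immutable.

    Hypothetical helper for illustration only: it inspects just two
    per-container settings and nothing else.
    """
    gaps: list[str] = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # A tag such as ":latest" can be re-pointed to a different image;
        # a digest reference always resolves to the same image content.
        if "@sha256:" not in container.get("image", ""):
            gaps.append(f"{name}: image is not pinned by digest")

        # Without this flag, the container gets a writable layer at runtime.
        security_context = container.get("securityContext") or {}
        if security_context.get("readOnlyRootFilesystem") is not True:
            gaps.append(f"{name}: root filesystem is writable")
    return gaps


if __name__ == "__main__":
    # Example spec with an assumed registry and a mutable tag.
    spec = {
        "containers": [
            {"name": "app", "image": "registry.example.com/app:latest"},
        ]
    }
    for gap in immutability_gaps(spec):
        print(gap)
    # prints:
    # app: image is not pinned by digest
    # app: root filesystem is writable
```

An empty result from a check like this is the kind of lateral knowledge that would have ended the repaving discussion early; a non-empty one is a reason to fix the deployment, not to build a repaving procedure.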
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.