The first question that popped into my mind when I became aware of the global IT chaos that started this Friday, and of the causes being articulated for it, was, “How do two independent pieces of software interact so violently that the net effect of that interaction is mass disruption?”
Patch failures are a known phenomenon. In fact, SecureIQLab published a thoughtful, forward-looking blog post warning technology users about the disruption that patch failures can bring. Furthermore, Microsoft boo-booing a patch distribution is hardly a novel phenomenon. Borrowing the reasoning of John Wheeler, it is not one because it has been observed in at least a few instances in the past.
CNN unjustly blamed Microsoft for this disruption. According to news sources, CrowdStrike is the culprit, and at least one major Texas university independently corroborated this. I will take the university's word for it because its cafeteria was disrupted and full of disgruntled students. Under what circumstances does an independent entity that is not part of Microsoft cause a technology disruption through a patch?
CrowdStrike is a global, NASDAQ-listed cybersecurity company that produces security software and provides cybersecurity services. The software it produces is typically installed on top of the operating system (typically, because for some of its product line nothing needs to be installed at all). Since operating systems are not inherently secure, additional protection is required to keep the bad guys at bay. Such additional protection is provided by cybersecurity companies such as CrowdStrike, and it is achieved by interacting with the operating system: change monitoring, hooking, detouring, pattern pushes, and so on.
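To give a feel for the simplest of these interactions, change monitoring, here is a minimal, hypothetical sketch in Python that polls a list of watched files and flags modifications by comparing hashes. The path in WATCHED_FILES is a placeholder of my own; real endpoint sensors do this kind of work in kernel space with drivers and file-system filters rather than polling from user space, which is exactly why a faulty update can bring the whole machine down with it.

```python
import hashlib
import time

# Hypothetical, user-space illustration of change monitoring:
# hash a set of watched files and flag any modification.
WATCHED_FILES = [
    r"C:\Windows\System32\drivers\example.sys",  # placeholder path, not a real product file
]

def file_digest(path):
    """Return the SHA-256 digest of a file, or '' if it cannot be read."""
    try:
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    except OSError:
        return ""

def monitor(paths, interval_seconds=5):
    """Poll the watched files and print an alert whenever a digest changes."""
    baseline = {p: file_digest(p) for p in paths}
    while True:
        time.sleep(interval_seconds)
        for p in paths:
            current = file_digest(p)
            if current != baseline[p]:
                print(f"[ALERT] {p} changed")
                baseline[p] = current

if __name__ == "__main__":
    monitor(WATCHED_FILES)
```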
Furthermore, some cybersecurity products use libraries such as DLLs (dynamic-link libraries) from the operating system to provide protection. It's sad, but it's true. They do so because best engineering practice favors documented over undocumented calls to achieve the desired functionality.
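For instance, calling into an operating-system DLL through its documented interface looks roughly like the sketch below, which uses Python's ctypes to call kernel32's documented GetTickCount64 function. It is purely an illustration of the "documented call" pattern and runs only on Windows; it has nothing to do with CrowdStrike's actual code.

```python
import ctypes

# Illustrative only (Windows): load a system DLL and call a documented Win32 API.
# GetTickCount64 is a documented kernel32 function returning milliseconds since boot.
kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

GetTickCount64 = kernel32.GetTickCount64
GetTickCount64.restype = ctypes.c_ulonglong  # documented return type: ULONGLONG
GetTickCount64.argtypes = []                 # the API takes no arguments

uptime_ms = GetTickCount64()
print(f"System uptime: {uptime_ms / 1000:.0f} seconds")
```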
When I first heard the news, I considered the following scenarios. I speculated instead of testing and verifying things myself because some of the software involved is hard to get, and the potential for a lawsuit is enormous. That is as much as I will say about why I am sticking to the speculative route.
Scenario 1: It’s isolated to CrowdStrike and Windows Installation.
Scenario 2: It’s on Windows installation, irrespective of CrowdStrike installation.
Scenario 3: It’s actually a spooky action.
Scenario 4: It’s only on the modified installation of Windows and CrowdStrike together.
All of these scenarios are concerning. Let’s expound on these scenarios.
Scenario #1. This scenario pretty much exposes the fact that wherever the disruption happened, CrowdStrike and Windows were both present. In other words, those who rely heavily on security by obscurity are no longer obscure. This is invaluable information for cyber bad actors wishing to attack CrowdStrike installations: they get two attack vectors, the operating system and third-party products such as those provided by CrowdStrike. Enterprises should start assessing and managing this risk as soon as possible. The good news, at least, is that the problem is isolated to Windows plus CrowdStrike, so other combinations, or the lack thereof, can breathe a sigh of relief for the time being. Responsible users should ask these questions: What on earth was being done to the system files? What happened to the QA and engineering process? What is being done to ensure that critical updates are tested on bare metal along with every major and minor update?
Scenario #2. This scenario theorizes that Microsoft and CrowdStrike are linked by a common thread controlled by CrowdStrike, and that this common thread resides in certain builds of the Microsoft operating system. It means a critical operating system component is shared with, and managed by, a third party; that is a supply-chain threat. Either such integration is not adequately tested, or the workflow around it is not working.
Scenario #3. This scenario says that it's spooky action: something akin to a solar flare caused some electrons to jump from one location to another and affected a semiconductor in one way, shape, or form. That worked its way through the motherboard and into the Windows kernel, affecting CrowdStrike and/or Microsoft because they were doing some really advanced stuff that only affected them. This, too, is plausible.
Scenario #4. The users have modified the operating system in such a way that two independent actions result in catastrophic consequences.
With all these disruptions and the potential complexity of restoring things to a working state, we recommend the following.
Recommendations
1. Initiate your risk assessment and management protocol, because the bad guys are watching and are also testing. I would prioritize ransomware risk mitigation first.
2. Keep a good device restoration program for rainy days.
3. Test patches (and pray like hell) before they get deployed; see the sketch after this list.
4. Ask the vendor what they are doing now, in the next few days, and over the long term in their engineering and QA process to apply the lessons learned. Just saying "Waterfall," "Agile," or "DevOps" is not going to cut it.
5. Ask the vendor to publish a publicly available dashboard that imparts information about updates, delivery, and installation, with clear metrics on successes and failures at a minimum. If this vendor publishes any more blog posts about newfound APTs before restoring the trust lost to these outages, ditch that vendor and get a new one.
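To make recommendation 3 a little more concrete, here is a hypothetical sketch of a staged-rollout gate: push a patch to a small canary group first, let it soak, check a health signal, and only then promote it to the wider fleet. The deploy_to and healthy functions below are placeholders of my own for whatever patch-management and telemetry tooling an organization actually uses.

```python
import time

# Hypothetical staged-rollout gate; deploy_to() and healthy() are placeholders
# for real patch-management and telemetry APIs.
CANARY_GROUP = ["canary-host-01", "canary-host-02"]
FLEET = [f"host-{n:03d}" for n in range(100)]

def deploy_to(hosts, patch_id):
    """Placeholder: hand the patch to whatever deployment tool is in use."""
    print(f"Deploying {patch_id} to {len(hosts)} host(s)")

def healthy(hosts):
    """Placeholder: query telemetry and confirm every host still boots and checks in."""
    return True  # replace with a real health check

def staged_rollout(patch_id, soak_minutes=60):
    deploy_to(CANARY_GROUP, patch_id)
    time.sleep(soak_minutes * 60)      # let the canaries soak
    if not healthy(CANARY_GROUP):
        print(f"Halting rollout of {patch_id}: canaries are unhealthy")
        return False
    deploy_to(FLEET, patch_id)         # promote only after the canaries survive
    return True

if __name__ == "__main__":
    staged_rollout("example-patch-001", soak_minutes=0)
```

Even a gate this crude forces the basic question of whether the canary machines still boot before the rest of the fleet receives the update.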