Maintaining system resilience in a technological world: What the CrowdStrike outage can teach us
Posted on September 4, 2024 by Rob May
In July, a single faulty update from CrowdStrike disrupted businesses around the world. From grounded flights to halted surgeries, the incident was a stark reminder of how dependent we’ve become on interconnected digital systems. But more importantly, it raised the question: How prepared is your business to withstand such an unexpected failure?
What was the CrowdStrike outage?
The CrowdStrike outage was caused by a faulty update that was rolled out overnight between the 18th and 19th July (depending on time zone) and caused severe disruption across the globe.
CrowdStrike said an update to its Falcon Sensor contained an undetected error that rendered Windows systems with CrowdStrike installed useless, leaving them showing the aptly named ‘blue screen of death’. For CrowdStrike, these updates were normally routine and happened every Friday. However, these routine updates were subject to less stringent testing, which meant the fault slipped through the net.
On the 6th August, CrowdStrike released their executive summary of the issue, which stated that:
“On July 19, 2024, a Rapid Response Content update was delivered to certain Windows hosts, evolving the new capability first released in February 2024. The sensor expected 20 input fields, while the update provided 21 input fields. In this instance, the mismatch resulted in an out-of-bounds memory read, causing a system crash. Our analysis, together with a third-party review, confirmed this bug is not exploitable by a threat actor.”
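The field-count mismatch described in that summary can be illustrated with a minimal sketch. This is not CrowdStrike's code: the names `EXPECTED_FIELDS`, `validate_update` and `apply_update` are hypothetical, and Python raises an error rather than reading invalid memory, so the sketch only shows the kind of defensive check that rejects an update whose shape does not match what the software expects.

```python
# Hypothetical illustration only: not CrowdStrike's implementation.
EXPECTED_FIELDS = 20  # number of input fields the (assumed) sensor supports


def validate_update(fields: list, expected: int = EXPECTED_FIELDS) -> bool:
    """Return True only if the update supplies exactly the expected field count."""
    return len(fields) == expected


def apply_update(fields: list) -> str:
    """Apply an update only after its shape has been checked."""
    if not validate_update(fields):
        # Reject the update instead of reading a field that should not exist.
        raise ValueError(
            f"update rejected: expected {EXPECTED_FIELDS} fields, got {len(fields)}"
        )
    return "applied"


good_update = ["value"] * 20
bad_update = ["value"] * 21

status = apply_update(good_update)
bad_update_is_valid = validate_update(bad_update)
```

Here the 20-field update is applied, while the 21-field update fails validation and would be rejected before it could cause a crash.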
What is CrowdStrike and the Falcon Sensor?
CrowdStrike is a cybersecurity company that provides cloud-delivered solutions to protect endpoints, cloud workloads, and identities. They offer a suite of cybersecurity software products, used by dozens of industries, including airlines, hospitals, banks, and retailers, to prevent hacks and data breaches.
Their Falcon Sensor is one of their flagship software products. It protects systems from cyber-attacks by monitoring computers for signs of malicious activity and helping to lock down threats. Their website lists the Falcon platform starting at $99.99 per device, per year, so companies may choose to install it only on the critical devices where a cybersecurity incident would have the biggest impact.
What was the impact of the CrowdStrike outage on Windows devices?
Across the world, major industries felt the impact of the outage. In the UK, Germany and Israel, important hospital appointments were cancelled at the last minute, and some services declared a critical incident. In Alaska the 911 lines went down, with New Hampshire and Ohio also reporting similar issues. Payment systems for some shops in the UK were affected, with isolated incidents reported at Waitrose and Morrisons. Sky News and CBBC were off air in the morning before resuming broadcasting in the afternoon.
What can IT firms learn from the outage?
As an IT support provider, we felt the impact of the CrowdStrike outage acutely. Our team was immediately thrust into action, responding to numerous client inquiries and troubleshooting affected systems. The incident disrupted our operations and those of our clients, reinforcing the critical role we play in maintaining system stability and security. It was a stark reminder of how dependent modern businesses are on the seamless functioning of third-party software and the essential need for rapid, effective response strategies when things go wrong.
It also reinforced the importance of system resilience and highlighted our collective reliance on third-party software. While these incidents can’t be foreseen, it’s important that we have contingency plans and risk assessments in place so that, when incidents like this outage do occur, we are able to react appropriately.
Rob May, Founder and Executive Chairman of ramsac, said, “This event serves as a much needed reminder of the critical importance of exhaustive testing and contingency planning in software development. It also highlights the necessity for companies to have robust incident response plans in place, ensuring that when such unforeseen events occur, they can be swiftly and effectively managed.”
“In the broader context, I think the incident calls into question the reliability of the interconnected systems upon which modern society relies. As we continue to advance technologically, the complexity of these systems will only increase, along with the potential for similar disruptive events. It is incumbent upon us in the tech industry to continuously refine our practices, ensuring that the digital foundation of our world remains as stable and secure as possible.”
How can we prevent this level of incident in the future?
There is no way to stop companies from releasing updates, and no 100% guaranteed way to prevent cyber-attacks, but there are many things that businesses and IT teams can do to reduce the risk of an outage of this scale and financial impact.
Staggered rollouts of new features and updates
While CrowdStrike has now enabled staggered rollouts for Falcon, any company can adopt this process itself. With rule-based systems, you can install updates in carefully planned stages, starting with lower-risk devices, so that a faulty update is caught before it reaches mission-critical computers and causes a severe outage.
Your IT support team will be able to help set this up and can provide insight into what the right rollout should look like.
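As a rough illustration of the idea, the sketch below installs an update one group ("ring") of devices at a time and stops as soon as too many devices in a ring fail a health check. It is a simplified example under stated assumptions, not a description of how CrowdStrike or any particular patch management tool works: the function `staged_rollout`, the device names, and the `install` and `is_healthy` callbacks are all hypothetical.

```python
from typing import Callable, Dict, List


def staged_rollout(
    rings: List[List[str]],
    install: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Dict[str, List[str]]:
    """Install an update ring by ring, halting if a ring looks unhealthy."""
    updated: List[str] = []
    failed: List[str] = []
    for index, ring in enumerate(rings):
        ring_failures = 0
        for device in ring:
            install(device)
            updated.append(device)
            if not is_healthy(device):
                failed.append(device)
                ring_failures += 1
        # Stop before the next ring if this one exceeded the failure threshold.
        if ring and ring_failures / len(ring) > max_failure_rate:
            skipped = [d for later in rings[index + 1:] for d in later]
            return {"updated": updated, "failed": failed, "skipped": skipped}
    return {"updated": updated, "failed": failed, "skipped": []}


# Hypothetical rings: test machines first, mission-critical systems last.
rings = [
    ["test-pc-1", "test-pc-2"],
    ["office-pc-1", "office-pc-2"],
    ["payments-server"],
]
installed: List[str] = []

# Simulate an update that breaks one office machine.
result = staged_rollout(
    rings,
    install=installed.append,
    is_healthy=lambda device: device != "office-pc-2",
)
```

In this simulated run the rollout halts after the second ring, so the payments server never receives the faulty update. In practice, the rings, the health check and the acceptable failure rate would be agreed with your IT support team.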
Supply chain risk management
Software like CrowdStrike’s Falcon has full access to mission-critical computers, so it’s important to have a risk management and assessment plan in place for every supplier. In a world where we rely so heavily on multiple partners and software products, it’s key that we all understand the risks of installing or using a supplier’s product, and the effect that it going down or becoming corrupted could have on your company’s productivity.
As Rob May summarised, “The CrowdStrike update debacle is more than just a cautionary tale; it is a loud and clear call for heightened vigilance and resilience in the face of our growing digital dependence. I just hope that the lessons learned from this incident will be valuable and used in fortifying the systems that underpin our daily lives.”
Are you looking to increase your system resilience?
The CrowdStrike incident was a wake-up call. Don’t wait for a crisis to discover the gaps in your system resilience. Here at ramsac, our IT support and cybersecurity teams ensure you maintain system uptime and keep your business working and productive. Contact us today to safeguard your systems and secure your future.