The danger of single points of IT failure

Put simply, a single point of failure is a part of a system that, if it fails, stops the entire system from working. This can be extremely damaging to an organisation, because downtime causes significant revenue loss for any business, so single points of failure should be identified and, where possible, eliminated. In IT your system is, to an extent, only as strong as its weakest link; however, not all links are created equal. Some single points of failure may be acceptable because they are quick to resolve or prohibitively expensive to avoid.
Organisations often need to apply the 80/20 rule: spending 20% of the budget can address 80% of the single points of failure (for example servers with redundant power supplies, RAID disks, and multiple hosts with shared storage), but the last 20% will cost 80% of the budget to resolve (for example redundant connectivity entering the building via different routes, or generator backup). At ramsac we usually aim to deal with the major single points of failure that are relatively low cost to address, then look at other business continuity options to deal with the rest. So rather than installing generator backup power, most organisations will have systems replicated to another site or to the cloud, which can be used in the event of a power failure. That addresses far more potential issues and costs a fraction of installing and maintaining a generator system.
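To make the definition concrete, one way to hunt for single points of failure is to take each component out of the picture in turn and check whether the rest of the system still works. The Python sketch below illustrates this under simplifying assumptions: the network is a hypothetical map of which components connect to which, and "working" just means users can still reach the internet. The component names are made up for the example.

```python
from collections import deque

# Hypothetical office network: each component lists the components it is
# directly connected to. Names are illustrative, not a real topology.
NETWORK = {
    "users": ["switch_a", "switch_b"],
    "switch_a": ["users", "firewall"],
    "switch_b": ["users", "firewall"],
    "firewall": ["switch_a", "switch_b", "router"],
    "router": ["firewall", "internet"],
    "internet": ["router"],
}


def reachable(graph, start, goal, failed):
    """Breadth-first search from start to goal that ignores the failed component."""
    if failed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in graph[node]:
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(graph, start, goal):
    """Return every component whose failure on its own cuts start off from goal."""
    return sorted(
        node
        for node in graph
        if node not in (start, goal) and not reachable(graph, start, goal, node)
    )


if __name__ == "__main__":
    print(single_points_of_failure(NETWORK, "users", "internet"))  # ['firewall', 'router']
```

In this example the two switches cover for each other, so neither is flagged, while the firewall and the router are. For a large estate, graph algorithms for finding cut vertices (articulation points) can do a similar job more efficiently than trying every removal, but the brute-force version is easier to follow.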
Don’t overlook the obvious
Sometimes the single point of failure is right under your nose, and for that reason it may be overlooked. A few years ago one organisation had generator backup for power, which they diligently tested on a regular basis. It worked perfectly until they had a full power failure and could not start the generator, because its starter motor was powered from the mains supply. In another organisation the servers were highly resilient, but part of the network kept failing, regularly, about once a week. It eventually turned out that a cleaner was unplugging a switch to plug in their vacuum, taking a large part of the network down each time. It just shows that you need to think about the system as a whole, not just focus on the big servers in the middle.
Examples of single points of failure
  • People – For some organisations the single point of failure is not hardware or software but a person. Often one or two people are responsible for several systems that require specialist knowledge to operate, troubleshoot and recover when something goes wrong. If that person is on holiday, off sick or leaves the company, the resulting knowledge gap could be devastating to the business if a problem occurs. Identifying which employees are potential single points of failure, and putting processes in place so that their knowledge is shared through documentation and training, can help organisations overcome this issue.
  • Hardware – Hardware is probably the most obvious single point of failure and usually the most critical. If a server goes down and you don’t have any backup systems, it can bring your entire organisation to a standstill. Similarly, if a router fails and users lose internet connectivity, it can be a major problem if you work in the cloud. Having a second router available for redundancy, even if it can only support a few critical tasks, can keep a business operating in the short term.
  • Services/Providers – If one of your suppliers has a problem or outage at their end, it can directly affect your organisation and become a single point of failure, especially if they house your offsite data or provide your internet or voice services. Having a backup plan in place for issues outside your control can prevent them from disrupting your organisation.
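The people problem in the first bullet can be checked in a similar way to hardware. This Python sketch assumes you have recorded, in a hypothetical dictionary, which staff can operate, troubleshoot and recover each system; it reports which systems would be left with nobody to support them if a given person were on holiday, off sick or left. The system and staff names are illustrative only.

```python
# Hypothetical knowledge map: for each system, the people who can operate it,
# fix problems and recover it. The names are illustrative only.
KNOWLEDGE = {
    "payroll": {"alice"},
    "crm": {"bob", "carol"},
    "backups": {"alice"},
    "phone_system": {"dave"},
}


def systems_without_cover(knowledge, absent):
    """Return the systems nobody can support while the given people are away."""
    absent = set(absent)
    return sorted(system for system, people in knowledge.items() if not people - absent)


def people_single_points_of_failure(knowledge):
    """Map each person to the systems that would be left uncovered if they were absent."""
    everyone = set().union(*knowledge.values())
    result = {}
    for person in sorted(everyone):
        uncovered = systems_without_cover(knowledge, {person})
        if uncovered:
            result[person] = uncovered
    return result


if __name__ == "__main__":
    for person, systems in people_single_points_of_failure(KNOWLEDGE).items():
        print(f"{person} is a single point of failure for: {', '.join(systems)}")
```

Here alice and dave are flagged, while the CRM stays covered if either bob or carol is away; calling systems_without_cover with several names at once also shows what happens when two people are off at the same time.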
Your single points of failure
When determining the single points of failure in your organisation, it is important to ask yourself:
  • What happens if this system fails?
  • What happens if any service dependency I have fails?
  • What happens if this person is off ill?
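The first two questions are about knock-on effects, which can be traced mechanically once you have written down what depends on what. In the Python sketch below the dependency map is hypothetical and assumes every listed dependency is essential (no redundancy); given one failed component, it walks the map in reverse to find everything that stops working as a result.

```python
from collections import deque

# Hypothetical dependency map: each item lists what it directly depends on.
# Every dependency is treated as essential, i.e. there is no redundancy.
DEPENDS_ON = {
    "email": ["internet", "cloud_email_provider"],
    "crm": ["file_server", "internet"],
    "file_server": ["power", "core_switch"],
    "internet": ["router", "isp"],
    "router": ["power"],
    "core_switch": ["power"],
    "voip_phones": ["internet", "core_switch"],
}


def knock_on_failures(depends_on, failed):
    """Return everything that stops working, directly or indirectly, when `failed` fails."""
    # Invert the map so we can walk from a component to the things that use it.
    used_by = {}
    for item, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(item)

    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependant in used_by.get(current, []):
            if dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)
    return sorted(affected)


if __name__ == "__main__":
    print(knock_on_failures(DEPENDS_ON, "router"))
    # ['crm', 'email', 'internet', 'voip_phones']
```

With this map a power failure takes out everything except the external providers, which echoes the generator story above. Where you do have redundancy, such as two internet lines, you would need to model the alternatives rather than treating each dependency as essential.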
Spotting and removing single points of failure
The best place to start is an audit that reviews every element of your IT infrastructure, including your ISP, email provider, software, other external IT services, servers, storage devices, laptops, computers, telecoms systems and people (basically, anything connected to your network). Be thorough: record the types of equipment, their age and the support contracts you have in place. The audit should also map the links between the parts of your IT infrastructure, showing the knock-on effect on the rest of the system if, for example, you lose internet connectivity or a server crashes. List your single points of failure, then create a matrix that plots the ease and cost of fixing each one against the impact of it failing. This helps you prioritise which ones you can tolerate or cover with business continuity measures, and which must be fixed. As Benjamin Franklin is often quoted as saying, “By failing to prepare, you are preparing to fail.”
At ramsac we can help you identify your single points of failure as part of the free IT health check we offer to organisations, to make sure your technology and information assets are working for you, not against you. Learn more in our free guide and contact us for more information.
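The ease and cost versus impact matrix from the audit step would usually live in a spreadsheet, but its logic can be sketched in Python. The 1 to 5 scoring scales, the threshold of 3, the four recommendation labels and the example entries are all assumptions made for illustration, not a ramsac methodology.

```python
from dataclasses import dataclass


@dataclass
class SinglePointOfFailure:
    name: str
    impact: int  # assumed scale: 1 (minor) to 5 (business stops)
    cost_to_fix: int  # assumed scale: 1 (cheap and quick) to 5 (prohibitively expensive)


def classify(spof, threshold=3):
    """Place a single point of failure in one quadrant of an impact vs cost matrix."""
    high_impact = spof.impact >= threshold
    cheap_fix = spof.cost_to_fix < threshold
    if high_impact and cheap_fix:
        return "fix now"
    if high_impact:
        return "cover with business continuity"
    if cheap_fix:
        return "fix when convenient"
    return "tolerate"


def prioritise(spofs):
    """Order by highest impact first, then cheapest fix, and attach a recommendation."""
    ordered = sorted(spofs, key=lambda s: (-s.impact, s.cost_to_fix))
    return [(s.name, classify(s)) for s in ordered]


if __name__ == "__main__":
    # Illustrative scores only; replace them with the results of your own audit.
    audit = [
        SinglePointOfFailure("single server power supply", impact=5, cost_to_fix=1),
        SinglePointOfFailure("single internet line", impact=5, cost_to_fix=4),
        SinglePointOfFailure("mains power (no generator)", impact=5, cost_to_fix=5),
        SinglePointOfFailure("one person knows payroll", impact=4, cost_to_fix=2),
        SinglePointOfFailure("office printer", impact=1, cost_to_fix=1),
        SinglePointOfFailure("single CCTV recorder", impact=1, cost_to_fix=4),
    ]
    for name, action in prioritise(audit):
        print(f"{action:32} {name}")
```

With these sample scores the cheap, high-impact fixes come out as "fix now", while the expensive items such as a second internet line or generator backup land in "cover with business continuity", which mirrors the 80/20 approach described at the start of the article.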