
How to Identify Single Points of Failure and Ensure Continuity

Published on August 27, 2021


Single points of failure can bring your whole system to a standstill or, at best, force it into "limp mode". If you have ever worked on a car, you know what this means: one faulty sensor can bring the whole machine to a halt in seconds.

And while one faulty sensor might seem like a minuscule problem compared to a whole functioning engine, it's that sensor that tells the engine when to fire the pistons. Small things often control much larger things and have an effect on the whole organism - and your business continuity.

Single Points of Failure Identification Steps

The best way to win the fight against single points of failure (SPOFs) is to identify them before they become a problem. The steps below work as a checklist for performing a SPOF audit in your company.

Make a list of your IT, tools and communication systems

To assess which single point might be giving you trouble, you need a road map. List every technical component that is connected to your network or is part of your system, such as:

  • Storage devices/backup systems for your email and cloud
  • Local/remote servers
  • ISP
  • Network infrastructure

When making this list, you may want to test out your components and see if they're all working properly. You might find weak points and learn lessons before a SPOF occurs.

If you don't have a "map" like this when disaster strikes, drawing one up should be your first step after the failure happens.

When working on a real failure, use this map to test which of these components don't have redundancy - go through them one by one. This is usually enough to identify the source of the SPOF.
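The inventory check described above can be sketched in a few lines of Python. This is only an illustration: the `Component` fields, the example entries and the rule "fewer than two independent instances means no redundancy" are assumptions made for the sketch, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str       # e.g. "ISP" or "Local file server"
    category: str   # grouping used in the inventory list
    instances: int  # how many independent copies or providers exist


def find_spofs(inventory):
    """Return the names of components that have no redundancy."""
    return [c.name for c in inventory if c.instances < 2]


# Hypothetical inventory; replace with the results of your own audit.
inventory = [
    Component("Email backup storage", "storage", 2),
    Component("Local file server", "server", 1),
    Component("ISP", "network", 1),
    Component("Core switch", "network", 2),
]

for name in find_spofs(inventory):
    print(f"No redundancy: {name}")
```

Keeping the list as data rather than in someone's head means the same check can be rerun whenever a component is added or retired.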

The Meridian BCMS business impact analysis process allows you to identify the critical IT systems/applications your critical activities rely on, capture any available workarounds and identify any single points of failure in these areas.

Contingency for your ISP - service provider failure

If a single point of failure turns out to be something outside your control, such as a fault on your internet provider's side, things can get tricky.

This is why a lot of businesses that depend on having internet access 24/7 invest in a secondary internet provider. It ensures business continuity no matter what happens on your primary provider's end. When your internet goes out, chances are that your competitors' internet will be functioning just fine. They will have access to their website, data, and email services when you won't. The things an outage will affect include:

  • VoIP phone systems
  • CRM programs
  • Cloud services and tools
  • Email
  • Other communications
  • Shipping and tracking

Another benefit of introducing a redundant internet provider as an option is that it can help you with fluctuating bandwidth during busier times and it can improve your customer experience. It's not just for emergencies but can help with the everyday functioning of your business.
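The failover idea behind a secondary provider can be sketched as follows. This is a minimal illustration, assuming the links are listed in priority order and that some health probe exists. The names `pick_active_link`, `primary-isp` and `secondary-isp` are hypothetical, and the probe here is a stand-in rather than a real network check.

```python
from typing import Callable, Optional, Sequence


def pick_active_link(
    links: Sequence[str],
    is_up: Callable[[str], bool],
) -> Optional[str]:
    """Return the first link, in priority order, whose health probe passes."""
    for link in links:
        if is_up(link):
            return link
    return None  # every provider is down


# Hypothetical link names, listed in priority order.
links = ["primary-isp", "secondary-isp"]

# Stand-in probe: pretend the primary provider has an outage.
outage = {"primary-isp"}
active = pick_active_link(links, lambda link: link not in outage)
print(active)
```

In practice the probe would be whatever check your router or monitoring tool provides. Passing it in as a function keeps the selection logic easy to test without a real outage.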

Keep human-caused SPOFs in mind

Human error is behind a lot of disasters in human history, and SPOFs are only a tiny fraction of this long list of disasters.

Most failures come down to a simple error, caused either by an honest mistake or by ignorance. Less commonly, the failure might be caused on purpose - while none of us like to think of this possibility, it's better to be aware of it when working to solve a problem like a single point of failure.

Avoiding Single Points of Failure

Sometimes it's hard to prevent something so unexpected - other times it's not so unexpected after all. While you might not know exactly when a failure will strike, the best defense is a good offense.

In this case, an "attack" is:

  • Preparing a Single Point of Failure management plan
  • Making a list of your components
  • Having a backup for every one of those components
  • Knowing who's in charge in case of emergency

The Meridian BCMS application will allow you to identify any single points of failure you may have, give you the ability to raise risks against them and put you in a better position to manage/plan for them.

Redundancy and SPOF management - pros and cons

Although redundancy might sound unnecessary or even wasteful, in this case it means duplicating software and hardware, either through continuous backups or by keeping backup hardware available. This way, if a part of your system becomes unavailable, crashes, or becomes corrupted, you can replace it quickly without losing time or continuity.

This simply means having a "spare". Using two cloud backups instead of one - or having your data in more than one data center. It might seem "redundant" but will save you a lot of pain if something does happen.
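A rough way to see why a "spare" helps is to compute the chance that at least one copy is reachable. The sketch below assumes the copies fail independently of each other, which is a simplification (two backups in the same data center would not qualify), and the 99% figure is a hypothetical input rather than a measured value.

```python
def combined_availability(availability: float, copies: int) -> float:
    """Probability that at least one of `copies` independent copies is up."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies must be down at once for the data to be unreachable.
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical figure: each backup is reachable 99% of the time.
single = combined_availability(0.99, 1)
double = combined_availability(0.99, 2)
print(f"one copy: {single:.4f}, two copies: {double:.4f}")
```

Under these assumptions, a second copy cuts the chance of having no reachable backup from 1% to 0.01%.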

What about the costs of redundancy?

They are often much lower than the costs of operating in emergency mode or having your system go down - sometimes the loss of a backup can stop your business from functioning for more than just a few hours. And even "just a few hours" can be very damaging to both your bottom line and your reputation in today's marketplace.

If it happens to a business whose main selling point is trust or brand authority, it's hard to recover and carry on after a long blackout. These are huge costs, and compared with the rather small cost of having redundancy in place, the benefits are substantial.
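The comparison can be made concrete with a back-of-the-envelope calculation. Every number in the sketch below is a hypothetical placeholder, not a benchmark, and the model deliberately ignores reputational damage, which is harder to price.

```python
def expected_outage_cost(outages_per_year: float,
                         hours_per_outage: float,
                         cost_per_hour: float) -> float:
    """Expected yearly loss from downtime, ignoring reputational damage."""
    return outages_per_year * hours_per_outage * cost_per_hour


# All figures below are hypothetical placeholders, not benchmarks.
yearly_loss = expected_outage_cost(outages_per_year=2,
                                   hours_per_outage=4,
                                   cost_per_hour=1_500)
redundancy_cost = 3_000  # assumed yearly cost of a backup provider

print(f"expected loss: {yearly_loss}, redundancy: {redundancy_cost}")
if redundancy_cost < yearly_loss:
    print("redundancy costs less than the expected downtime")
else:
    print("review the numbers")
```

Plugging in your own estimates for outage frequency, duration and hourly cost shows quickly whether a given redundancy option is worth its price.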

Ensuring Business Continuity - Conclusion

We have listed some of the most common examples of single points of failure and how to prevent them by being vigilant and planning for the worst. Business continuity means your business functions carry on uninterrupted even while an emergency is happening.

In conclusion, let's break it down: in order to manage your emergency plan and check for potential points of failure in your business, it's best to divide all the functions by systems and by steps. This way, you can plan for the failure of each one of them in turn, but what's more important - use the list to identify these failures as they happen.

Overall, you should follow these three steps:

  1. Identification
  2. Preparedness and planning for continuity
  3. Fixing the failure while your system stays operational

Finally, make sure you dust off your disaster management planning and update it - run tests and check if it's up to date as often as you can. There is nothing worse than coming face to face with an emergency and finding out that your carefully crafted plan is outdated and can't be used!
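One simple way to keep plans from going stale is to record when each one was last reviewed and flag the overdue ones. The sketch below is illustrative only: the plan names, the review dates and the 180-day limit are assumed values, so pick whatever policy fits your business.

```python
from datetime import date, timedelta


def stale_plans(last_reviewed, today, max_age_days=180):
    """Return plan names whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items()
                  if reviewed < cutoff)


# Hypothetical review log; the 180-day limit is an assumed policy.
last_reviewed = {
    "ISP failover": date(2021, 7, 1),
    "Backup restore": date(2020, 11, 15),
}
overdue = stale_plans(last_reviewed, today=date(2021, 8, 27))
print(overdue)
```

Running a check like this on a schedule turns "as often as you can" into a concrete reminder.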