Let me start this blog post by asking you a question.
Q. When was the last time you looked at your redundancy?
Yes, this is a very vague question because there’s a lot of interpretation you can put into it. Even though it’s very valid to think about your personal redundancy (who can fill in for you in the workplace), I’m writing about the technical redundancy, in the broadest sense.
The Reason
Before I dig in, why this post? Well, last week a data centre in the Netherlands caught fire. You can read more about it here (in Dutch, but your browser should be able to translate it). In short, a fire at the power station of this data centre took it all down. At the time of writing, no data loss or personal harm has been reported. But all servers, and with them all services, went down. And they still are, 5 days later.
There are a number of clients named in the articles you can find online. I’m not going to name any of them here. But these clients reported that one or more of their services were down, which disrupted their daily operations.
Now, many of the clients named may not even have been aware they were using that data centre; they were using a service offered by a third-party provider. I genuinely do not know if those providers offer redundancy. But if they do, apparently, their clients did not use it.
What is redundancy
Quite simply, it’s the ability to continue working during an outage. In the past, many IT companies with all their servers in their own buildings had an emergency diesel power system outside. When the power went down, the system outside started working, and everyone could go on. In the gap between the mains power going down and the generator coming up, UPSs (Uninterruptible Power Supplies) would take over to keep the power on.
The same goes for your databases. At some point, you need to perform maintenance. But you do not want the data to go offline. In this case, redundancy can mean the ability to do this without interruption for your end users. With SQL Server, you can achieve this with a High Availability and Disaster Recovery (HADR) setup, for example Always On availability groups. This also helps when a server unexpectedly goes down: this setup will detect it, reroute all traffic to a working server, and end users will hardly notice anything.
The thing is, it won’t help you when everything is in the same building, and that building is deprived of power.
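To make that point concrete, here is a minimal sketch of the idea. It is not how SQL Server does it; the replica names, building names and the `pick_target` helper are all made up for illustration. It only shows that failover saves you from a single server failing, and that it cannot save you when every replica sits in the building that loses power.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    building: str
    healthy: bool = True


def pick_target(replicas):
    """Return the first healthy replica, or None if every replica is down."""
    for replica in replicas:
        if replica.healthy:
            return replica
    return None


def building_outage(replicas, building):
    """Mark every replica in the given building as unavailable."""
    for replica in replicas:
        if replica.building == building:
            replica.healthy = False


# Hypothetical setup: both replicas share one building.
same_building = [Replica("sql-1", "dc-a"), Replica("sql-2", "dc-a")]
# Hypothetical setup: the secondary lives in a second building.
two_buildings = [Replica("sql-1", "dc-a"), Replica("sql-2", "dc-b")]

# A single server failing is survivable: traffic moves to sql-2.
same_building[0].healthy = False
print(pick_target(same_building).name)  # sql-2

# A building-wide power loss is only survivable with a second location.
building_outage(same_building, "dc-a")
building_outage(two_buildings, "dc-a")
print(pick_target(same_building))       # None
print(pick_target(two_buildings).name)  # sql-2
```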
Business continuity
You can also describe this as business continuity: how much of your company keeps running if or when a specific service goes offline. What is the impact when this happens, and what are your alternatives?
The other side of the coin is: how much are you willing to invest in this? A second location with separate power, different internet connections, etcetera will cost more. But it can save you in the event of a disaster.
As you know, the chances of a data centre going down are not zero. But it’s not a weekly occurrence either. This is something to take with you into the investment-versus-continuity equation. I can’t do that for you; I can only urge you to check whether you’ve done it.
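One very rough way to put numbers into that equation is to compare the expected yearly loss from an outage with the yearly cost of the extra redundancy. The sketch below does just that. Every number in it is invented for illustration, and the function names are mine; you’ll need to plug in your own estimates.

```python
def expected_annual_loss(outage_probability_per_year, outage_days, loss_per_day):
    """Expected yearly loss: chance of an outage times what the outage would cost."""
    return outage_probability_per_year * outage_days * loss_per_day


def redundancy_pays_off(outage_probability_per_year, outage_days, loss_per_day,
                        annual_redundancy_cost):
    """True when the expected yearly loss exceeds the yearly cost of redundancy."""
    loss = expected_annual_loss(outage_probability_per_year, outage_days, loss_per_day)
    return loss > annual_redundancy_cost


# Made-up example: a 2% chance per year of a 5-day outage that costs 20,000 per day,
# compared with 1,500 per year for a second location.
loss = expected_annual_loss(0.02, 5, 20_000)
print(f"Expected annual loss: {loss:.0f}")
print("Redundancy pays off:", redundancy_pays_off(0.02, 5, 20_000, 1_500))
```

Keep in mind that this only covers direct losses. Reputational damage is much harder to put a number on, so treat the outcome as a lower bound.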
The hyperscalers
Now, when you start using the current global hyperscalers (Microsoft, AWS, Google), many of their services offer some form of redundancy. I’m using Microsoft in my example as I’m the most familiar with it, but it’s more or less the same with the others.
For instance, when you deploy databases or storage accounts, you can choose between local, zone, and geo redundancy. This means your service runs either in three different racks, in three different buildings, or in two different locations (roughly 500 kilometres or more apart). You need to realise that more redundancy will cost more money.
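As a small illustration of those three levels, the sketch below maps the Azure Storage redundancy SKU names to the largest outage your data should survive. The SKU names are the real ones; the mapping is my own simplification, and the inventory and helper names are hypothetical. Note that with geo redundancy, a failover to the second location is still needed before the data is reachable again.

```python
from enum import IntEnum


class Outage(IntEnum):
    RACK = 1
    BUILDING = 2
    REGION = 3


# Simplified assumption: the largest outage each redundancy option guards
# your data against. Check the documentation for your exact service.
LARGEST_OUTAGE_COVERED = {
    "Standard_LRS": Outage.RACK,       # local: copies within one data centre
    "Standard_ZRS": Outage.BUILDING,   # zone: copies across availability zones
    "Standard_GRS": Outage.REGION,     # geo: extra copies in a second region
    "Standard_GZRS": Outage.REGION,    # geo-zone: zone plus a second region
}


def data_survives(sku, outage):
    """True if the given redundancy option covers an outage of this size."""
    return LARGEST_OUTAGE_COVERED[sku] >= outage


def needs_attention(inventory, outage=Outage.BUILDING):
    """List the resources whose data would not survive the given outage."""
    return [name for name, sku in inventory.items()
            if not data_survives(sku, outage)]


# Hypothetical inventory of storage accounts and their redundancy setting.
inventory = {
    "backups": "Standard_LRS",
    "orders": "Standard_ZRS",
    "archive": "Standard_GRS",
}
print(needs_attention(inventory))  # ['backups']
```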
The global hyperscalers have infrastructure across multiple regions, making it relatively easy for them to offer this redundancy. For more local companies, it’s much harder. Local redundancy is usually feasible, but zone redundancy is much harder to offer. Let alone geo redundancy.
Now, your local data centre cannot compete with this scale. And it’s not supposed to. That’s not its use case.
Your third-party supplier
However, if you’re using an online service (regardless of which one), you may want to ask them how they handle redundancy. Especially, how they handle data centre outages. Because, as the article at the beginning of this post has shown, it’s a real possibility that the data centre goes down. And you need to be aware of how your service will suffer when that happens.
Redundancy usually comes at an extra cost. At that point, you need to do a calculation. What’s your value of business continuity? How much are you willing to pay extra to avoid a mention in a newspaper or online article?
My advice
Check with your suppliers whether they offer redundancy and, if so, how it works. It will cost money, but you can do the maths yourself to see if it’s worth it. For example, I’m using a local data centre to store my personal backups. Data I prefer not to lose if my laptop fails. But it’s not critical enough to require 24/7 access.
If you’re running on Azure, for instance, check your resource configuration; zone redundancy isn’t that expensive and will save you if one building goes offline.