HA with a Dash of DR

I often end up discussing HA and DR with customers and colleagues, so defining both terms and sharing my view on them seems like a good way to start this blog:

Today most IT folks mix up DR and HA because of the possibilities the industry has given them with technologies like stretched clusters and synchronous storage mirroring.

High Availability (HA) is about Local Availability, Downtime Avoidance and Disaster Avoidance. Disaster Recovery (DR), on the other hand, is all about Site Availability and Recovery, for example how and where to recover from a disaster that has wiped out the entire datacenter.

Downtime avoidance and disaster avoidance are mainly driven by Operational Level Agreements (OLA) and Service Level Agreements (SLA), whereas disaster recovery is driven by the Recovery Time Objective (RTO: how long may it take to bring the application back online) and the Recovery Point Objective (RPO: how much time may pass between two backups, in other words how much data am I allowed to lose).
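To make RTO and RPO a bit more tangible, here is a minimal Python sketch. It assumes simple periodic backups, so the worst-case data loss equals one full backup interval. The function names and all the numbers are made up for illustration and do not come from any real product:

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """With periodic backups, a disaster striking just before the next
    backup loses (almost) one full interval of data."""
    return backup_interval


def meets_objectives(backup_interval: timedelta,
                     estimated_restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare a backup/restore setup against the agreed RPO and RTO."""
    return {
        "rpo_met": worst_case_data_loss(backup_interval) <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


# Hypothetical example: nightly backups, restore takes about 6 hours,
# while the business asks for RPO = 4 hours and RTO = 8 hours.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_restore_time=timedelta(hours=6),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

In this made-up case the restore is fast enough, but nightly backups can lose up to 24 hours of data, so the RPO is missed and the backup interval would have to shrink.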

But let’s have a deeper look into this: Enhancing local availability improves the uptime of an application in failure or maintenance situations. This can be achieved with technologies like clustering, vMotion and Live Migration within a local site. So this is HA, and typical HA is site local, because with HA I do not react to the complete outage of a site.

For situations where your complete site fails, e.g. a plane crashing on your site or a complete power blackout, DR is used to protect your data. You cannot achieve this with HA and stretched clusters (a cluster across two sites), because when your data is gone on the primary site there is nothing left to migrate over to the other site. Stretched clusters (storage, compute, etc.) also do not protect you against configuration errors or accidentally deleted data, because every change is immediately synchronized to the remote site. That is why backup strategies are also part of DR.
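The point about synchronization can be shown with a small toy model in Python. It is only a sketch: MirroredStore is a hypothetical class that mimics a synchronously mirrored volume with two dictionaries, and has nothing to do with how a real storage array is implemented:

```python
import copy


class MirroredStore:
    """Toy model of a volume that is synchronously mirrored to a second site."""

    def __init__(self):
        self.primary = {}
        self.remote = {}

    def write(self, key, value):
        # Every write lands on both sites at the same time.
        self.primary[key] = value
        self.remote[key] = value

    def delete(self, key):
        # A delete is just another change, so it is mirrored as well.
        self.primary.pop(key, None)
        self.remote.pop(key, None)


store = MirroredStore()
store.write("orders.db", "v1")

# Point-in-time backup, kept separately from the mirrored volume.
backup = copy.deepcopy(store.primary)

# Somebody deletes the data by accident.
store.delete("orders.db")

print("orders.db" in store.remote)  # False: the mirror faithfully copied the mistake
print(backup.get("orders.db"))      # v1: only the backup still has the data
```

The mirror does exactly what it was built for, which is keeping both sites identical. Only the separate point-in-time copy lets you go back to the state before the mistake.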

Stretched clusters (for example an ESXi cluster across two sites) require an L2 extension between both sites. As a result you end up with one virtual datacenter spread across two physical sites. This is more or less an Active/Active datacenter, with all its benefits and drawbacks (this is definitely another discussion for a future blog entry).

To conclude this HA and DR discussion:

Use HA for local downtime and disaster avoidance, and DR for cross-site disaster recovery and backup/restore strategies.

Stop mixing both strategies, because that will never give you a real DR solution. As somebody once said: “No, Dropbox is not a backup”!


