Disaster Avoidance and Recovery Planning in a Cloudified World

by  \  5 Jan, 2011 \  5:51pm EDT

If you need to protect critical systems, then you need redundancy. However, duplicating every component, including the Mechanical, Electrical & Plumbing (MEP) systems of your data center, gets tremendously expensive. Historically there haven't been many good alternatives for providing redundancy. In some cases the redundancy is so complex that it actually increases the likelihood of an outage; some of the mainstream clustering options are good examples. On top of that there is redundant hardware at each tier, along with other protections in the physical infrastructure and in personnel. It's unfortunate but true that many of us have implemented disaster solutions that we pray will never be truly tested, because we don't know for sure that they'll work.
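The basic arithmetic behind redundancy, and why added complexity can undercut it, can be sketched in a few lines of Python. This is a minimal sketch that assumes independent failures and treats the clustering/failover layer as one more component that must be up; the 99.5% node and 99.9% cluster-layer figures are hypothetical. On paper a redundant pair looks dramatically better than a single node, but once the failover machinery itself has to work, its availability caps the result.

```python
def parallel_availability(*availabilities: float) -> float:
    """Service is up if ANY redundant component is up.

    Idealized: assumes independent failures and instant, always-successful failover.
    """
    prob_all_down = 1.0
    for a in availabilities:
        prob_all_down *= 1.0 - a
    return 1.0 - prob_all_down


def serial_availability(*availabilities: float) -> float:
    """Service is up only if EVERY component in the chain is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


if __name__ == "__main__":
    node = 0.995  # hypothetical single-node availability
    pair = parallel_availability(node, node)
    # Hypothetical clustering/failover layer that must itself be working
    # for the redundant pair to deliver its theoretical availability.
    clustered = serial_availability(pair, 0.999)
    print(f"single node:               {node:.4%}")
    print(f"redundant pair (ideal):    {pair:.4%}")
    print(f"pair behind 99.9% cluster: {clustered:.4%}")
```

In this simplified model the pair reaches 99.9975% only in theory; behind a 99.9% clustering layer it drops below 99.9%, which is the "complexity increases the likelihood of an outage" effect in numbers.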

“Disaster” as defined by Wiktionary

“An unexpected natural or man-made catastrophe of substantial extent causing significant physical damage or destruction, loss of life or sometimes permanent change to the natural environment; An unforeseen event causing great loss, upset or unpleasantness of whatever kind” en.wiktionary.org/wiki/disaster

When I discuss “Disaster Recovery Planning” I prefer the phrase “Disaster Avoidance & Recovery Planning” (DARP). I use DARP because I consider a disaster to be an unmitigated problem affecting your application availability. In other words, the problem occurs and you have no repeatable strategy in place to return your operations to normal within a set period of time. “Avoidance”, in my definition, means either avoiding the outage altogether or recovering systems to normal operations in a controlled and well-understood way.

What if you need high availability (HA) and a DARP strategy, but can’t afford expensive clustering solutions and physical redundancy in your building MEP?

Some enterprises will always need five nines (99.999% uptime) and will continue to build Tier IV data centers and cluster all of their systems, but this is extremely expensive and not justified for the majority of businesses.

[Figure: Uptime graphic]

The “sweet spot” of 99.95% system availability equates to just under 4.5 hours of downtime over the course of a year. Few systems in an enterprise can justify a 4X-plus increase in availability costs just to shrink that number to about 5 minutes of downtime, or 99.999% uptime. Most of the CEOs and CFOs I've talked to about these numbers give the same answer: “I'll take the 4.5 hours for 25% of the cost, please.” It's generally a simple equation: the potential cost of an outage versus the cost to avoid it. The trick is developing a realistic cost model for downtime versus the cost of enabling higher uptime.
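A quick way to check these figures and frame the cost decision is the minimal Python sketch below. It assumes a 365-day year and a flat, hypothetical cost per hour of outage, and the `worth_upgrading` helper is an illustration rather than a complete cost model: it ignores reputational damage, penalties, and the fact that some outages hurt more than others.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def worth_upgrading(current: float, target: float,
                    outage_cost_per_hour: float,
                    extra_cost_per_year: float) -> bool:
    """True if the downtime avoided by moving to `target` is worth more than it costs."""
    hours_avoided = annual_downtime_hours(current) - annual_downtime_hours(target)
    return hours_avoided * outage_cost_per_hour > extra_cost_per_year


if __name__ == "__main__":
    for a in (0.9995, 0.99999):
        hours = annual_downtime_hours(a)
        print(f"{a:.3%} uptime -> {hours:.2f} h ({hours * 60:.1f} min) per year")

    # Hypothetical figures: $50,000 per hour of outage, and the 99.999%
    # design costs an extra $1,500,000 per year to build and run.
    print(worth_upgrading(0.9995, 0.99999, 50_000, 1_500_000))
```

With a 365-day year, 99.95% works out to 262.8 minutes (about 4.4 hours) and 99.999% to about 5.3 minutes. With the hypothetical figures above, the roughly 4.3 hours avoided per year are worth far less than the extra spend, so the upgrade doesn't pay for itself.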

Cloud can dramatically change the economics of DARP and HA

The good news is that today's technology is making “avoidance” and “rapid recovery” just part of doing business. Virtualization and cloud solutions now offer protection and recovery options that historically would have been cost-prohibitive. These technologies can also change how you view your data center facilities.

Example: High Cost Traditional Data Center Redundancy

  • 1 Tier III or Tier IV facility (Primary Facility)
    • Housing for all operational (in use) applications and hardware
  • 1 Tier I or Tier II facility or vendor supplied recovery center (Secondary Facility)
    • Warm or Cold site for performing recovery operations in the event of a disaster affecting the primary site
  • Issues in the above model
    • High cost of building and maintaining Tier III and above facilities. The cost for a Tier IV facility is roughly 4X the cost of a Tier I facility of similar capacity
    • The primary facility needs to be large enough to house all production environments and the associated test and dev equipment.
    • The secondary facility is being paid for, but is generally only utilized when a disaster strikes
    • Any recovery equipment positioned at the secondary facility is wasted, but still requires ongoing support to ensure it's current with the production environments
    • Total footprint for both facilities is roughly 1.5X what your total production requirements are

Example: Cloud or Highly Virtualized Data Centers

  • 2 Tier II facilities (both primary, active DCs)
    • Each facility has a combination of production and test/dev gear
    • Both facilities are built to a lower-cost Tier level (Tier II vs. Tier III or IV)
  • Opportunities in the Cloud DC model
    • Much more likely to succeed when a recovery is actually needed!
    • Teams are active in both locations
    • As part of a routine maintenance function you can be moving applications from one site to another on a regular basis (won’t be a “surprise” activity during a disaster)
    • Lower cost through pay-as-you-go pricing and/or buying only what you're using
    • Total capacity is 1X your actual need because the spaces are shared
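The footprint and cost difference between the two models can be roughed out with a small Python model. Only the roughly 4X Tier IV to Tier I cost ratio and the 1.5X vs. 1X footprints come from the examples above; the Tier II and Tier III multipliers, the 1.0/0.5 split between primary and secondary, and the even split between the two active sites are hypothetical placeholders to replace with your own figures.

```python
from dataclasses import dataclass

# Relative cost per unit of capacity, with Tier I = 1.0.
# Only the Tier IV value (about 4x Tier I) comes from the article;
# the Tier II and Tier III values are hypothetical placeholders.
TIER_COST = {"I": 1.0, "II": 1.5, "III": 2.5, "IV": 4.0}


@dataclass
class Facility:
    name: str
    tier: str
    capacity: float  # share of total production requirement housed here

    def relative_cost(self) -> float:
        return TIER_COST[self.tier] * self.capacity


def summarize(label: str, facilities: list[Facility]) -> tuple[float, float]:
    """Print and return (total footprint, total relative cost)."""
    footprint = sum(f.capacity for f in facilities)
    cost = sum(f.relative_cost() for f in facilities)
    print(f"{label}: footprint {footprint:.1f}x, relative cost {cost:.2f}")
    return footprint, cost


if __name__ == "__main__":
    traditional = [
        Facility("primary", "IV", 1.0),   # production plus test/dev (hypothetical split)
        Facility("secondary", "I", 0.5),  # warm/cold recovery site
    ]
    active_active = [
        Facility("site A", "II", 0.5),    # shared production and test/dev
        Facility("site B", "II", 0.5),
    ]
    summarize("Traditional", traditional)
    summarize("Active/active", active_active)
```

With these placeholder numbers the traditional layout comes out at a relative cost of 4.5 against 1.5 for the active/active pair; the real ratio depends entirely on the multipliers and capacity splits you plug in.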

The two examples outlined above aren't comprehensive, but they are accurate in showing that you can not only provide a much better guarantee of service to your business, but also do it for millions less every year. They are also not meant to give the false impression that you can easily create two “real time” environments. Under emergency circumstances there will likely be downtime for the applications that were active in the affected data center. However, with proper planning and design the outage window will fit within your acceptable recovery time strategy, so the event never becomes a disaster.
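Under the DARP definition above, an outage only becomes a disaster when recovery can't be completed within the agreed time. The minimal Python sketch below applies that check to the loss of one site. It assumes each application has a recovery time objective (RTO) and a recovery time measured during the routine site-to-site moves described earlier; the application names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    site: str                    # data center where the app is currently active
    rto_minutes: float           # acceptable recovery time agreed with the business
    est_recovery_minutes: float  # measured during the last routine site-to-site move


def assess_site_loss(apps: list[App], failed_site: str) -> dict[str, list[str]]:
    """Classify the apps affected by losing one site.

    'avoided'  -> recovery fits inside the RTO (an outage, not a disaster)
    'disaster' -> recovery would exceed the RTO
    Apps active at the surviving site are unaffected and not listed.
    """
    result: dict[str, list[str]] = {"avoided": [], "disaster": []}
    for app in apps:
        if app.site != failed_site:
            continue
        within_rto = app.est_recovery_minutes <= app.rto_minutes
        result["avoided" if within_rto else "disaster"].append(app.name)
    return result


if __name__ == "__main__":
    # Hypothetical portfolio split across two active sites.
    apps = [
        App("orders", "A", rto_minutes=60, est_recovery_minutes=35),
        App("reporting", "A", rto_minutes=240, est_recovery_minutes=300),
        App("email", "B", rto_minutes=120, est_recovery_minutes=45),
    ]
    print(assess_site_loss(apps, failed_site="A"))
```

Anything that lands in the "disaster" bucket is a candidate for redesign or a renegotiated RTO before a real event forces the question.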

Also, don't forget that two of the primary motivations for moving to cloud are making your business more agile and reducing IT costs; the above is just more icing on the cake.

Adopting virtualization & Cloud computing to its fullest

I've always enjoyed working with technology solutions that provide options beyond the original purchase requirement, and cloud computing is one of them. You may have started your cloud computing journey in a variety of ways, but using cloud for DARP and HA likely represents an untapped source of value and cost savings for your enterprise that is well worth leveraging. Inexpensive cloud-based HA and DARP may open up new customer markets or product offerings, or even save your business one day.
