Video Blog: Cloud SLAs and the Specter of the Unknown
on May 19th, 2011 at 4:59 pm EDT
Service Level Agreements (SLAs) can make people nervous. SLAs typically provide some level of performance guarantee and therefore offer a type of insurance against unexpected events. However, unlike life insurance, where actuarial tables offer fairly predictable life expectancies when applied to large pools of policyholders, the world of cloud infrastructure doesn’t have a similar track record of experience to draw on. Also, the various “risk factors” that could adversely affect a cloud SLA are not yet well defined or understood, as the underlying technologies are all maturing rapidly.
What are some of the sources of uncertainty when it comes to guaranteeing the performance of a typical enterprise business application? There are many, but suffice it to say that mitigating them effectively starts with understanding your application very well. How does it use the network (WAN, LAN, IPs, load balancers), how sensitive is it to latency, and what kind of storage does it use? Is it processor heavy, memory heavy, I/O heavy, or all of the above? What applications and data sources is it linked to, and how are those sources managed? You’ll also want a realistic assessment of your availability requirements and of what, if anything, you can live without for a period of time. Then there are other important aspects like compliance, validation, and security. In short, a typical cloud service provider can’t possibly know this much about the application workloads it hosts.
Even if you’ve done all of the above correctly, you still have to compare and weigh your mapping of the application against the specific design of the cloud you’ll be using. Because of these concerns and complexities, SLAs will continue to be difficult, if not impossible, to nail down effectively for some time to come, as I described in a brief interview at Interop 2011 in Las Vegas last week.
The fact is, most cloud providers don’t know what to capture in an SLA that would prove meaningful to each customer’s requirements. Because they don’t know what’s meaningful, they tend to create “feel good” SLAs that cover only the obvious things. I don’t want to go into the details of what might be included, but the key fact is that service providers can’t cover your loss from an outage. Imagine a service provider with only $100 million in annual revenue that has a $20 billion a year major retailer as a customer. If the service provider had an outage that lasted for more than a few hours, the rebate owed to the customer could potentially exceed the provider’s own annual revenue if it were expected to cover “actual business impact”. The telecom industry has been writing SLAs for many years, and it actually has laws protecting it against large service failure rebates.
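The retailer example can be put into rough numbers. The sketch below is only a back-of-the-envelope illustration, not a real rebate formula: it assumes the retailer’s revenue is spread evenly over the year, that an outage stops all of that revenue, and it introduces a hypothetical impact_multiplier for periods (a holiday peak, say) when an hour is worth more than the average hour. The function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def hourly_revenue(annual_revenue):
    """Average revenue per hour, assuming revenue is spread evenly (an assumption)."""
    return annual_revenue / HOURS_PER_YEAR


def hours_until_exposure_exceeds(provider_annual_revenue,
                                 customer_annual_revenue,
                                 impact_multiplier=1.0):
    """Outage length at which the customer's lost revenue equals the
    provider's entire annual revenue.

    impact_multiplier is a hypothetical knob: 1.0 means an average hour,
    10.0 means an hour worth ten times the average.
    """
    loss_per_hour = hourly_revenue(customer_annual_revenue) * impact_multiplier
    return provider_annual_revenue / loss_per_hour


if __name__ == "__main__":
    provider = 100e6   # $100 million provider, from the example above
    retailer = 20e9    # $20 billion retailer, from the example above

    print(f"Retailer revenue per average hour: ${hourly_revenue(retailer):,.0f}")
    print(f"Break-even outage, average hours: "
          f"{hours_until_exposure_exceeds(provider, retailer):.1f} h")
    print(f"Break-even outage, 10x peak hours: "
          f"{hours_until_exposure_exceeds(provider, retailer, 10.0):.1f} h")
```

Under the even-revenue assumption the retailer earns a little over $2 million per hour, so the exposure adds up quickly, and it adds up far faster if the outage lands in a peak trading period or if “actual business impact” is counted as more than lost revenue alone.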
I’m not saying you should ignore SLAs. An SLA is a contract, and any good business will attempt to live by it as best it can. In fact, try to understand in detail exactly what that SLA does and does not cover, so you can better design your cloud workload disaster recovery or high availability plans around it. And certainly don’t make the mistake of assuming you’re magically protected just because your service provider gave you one.

