Posted on October 29, 2015 by Keith Groom
And with worker productivity and thousands of dollars on the line for every second of lost connectivity, this worst-case scenario is a costly one – unless the proper steps are taken upfront.
This topic is top of mind for organizations, especially those running on Office 365. So what can they do to prevent outages and mitigate the risk of downtime?
Microsoft’s 99.9 percent uptime guarantee doesn’t tell the whole story.
This impressive number often leads to the mistaken understanding that outages simply don’t happen, or are so rare they aren’t worth worrying about. That is patently false. As we’ve seen in recent examples, large swaths of Microsoft cloud-run tenants can go offline at any moment. In fact, according to one estimate, in Q2 of 2015 alone there were 4,550,000 days’ worth of downtime, across 100 million users, for apps like Exchange, Active Directory, and Skype.
(It’s worth mentioning that while this situation is significant, it’s a reality not unique to Microsoft’s platform: all cloud providers, from Amazon to Google, have similar downtime occurrences. Anyone looking for a 100% uptime guarantee is in for a letdown.)
Microsoft’s SLA remains accurate in large part due to the sheer size of the O365 user base, which makes even those seemingly large outage numbers negligible next to the bigger picture. But to bring this stat closer to home, consider that there are 8,760 hours in a year. Even with a 99.9 percent uptime guarantee, that still leaves almost 9 hours of potential downtime per year.
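To make that arithmetic concrete, here is a minimal sketch in Python. The function names are our own, and the 91-day length for Q2 (April through June) is an assumption; the downtime and user figures are the estimate quoted above, not official Microsoft numbers.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that still fit inside an uptime guarantee."""
    return period_hours * (1 - availability_pct / 100)


def implied_availability_pct(total_downtime_days, users, period_days):
    """Average availability implied by total downtime spread across all users."""
    return 100 * (1 - total_downtime_days / (users * period_days))


# A 99.9 percent guarantee over one year: about 8.76 hours
print(f"{downtime_budget_hours(99.9):.2f} hours per year")

# The Q2 2015 estimate, assuming a 91-day quarter: about 99.95 percent
print(f"{implied_availability_pct(4_550_000, 100_000_000, 91):.2f} percent")
```

Under those assumptions, the quoted outage total works out to roughly an hour of downtime per user for the quarter, which is how millions of lost days can still sit inside a 99.9 percent guarantee.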
With all that said, there are a number of things enterprises must do to prepare for a Microsoft cloud outage.
1. Create a contingency plan
Sounds obvious, as most good advice does. But we are constantly surprised at how few of our clients have done the proper thinking upfront, only to find themselves struggling blindly when an issue does occur. The plan you create will inform everything from how your team reacts, to the technologies you put in place to remedy things, to the expectations you set with everyone from your peers, to your end-users, to the boss pacing back and forth demanding to know why the heck Exchange is still down. Your plan should include the basics, such as: who do you call when there is an issue, what is the protocol, what is the SLA you can expect, and what happens if that SLA isn’t met.
2. Pick your tools to monitor, prevent and mitigate
Now it’s your job to investigate and determine the right tools and technologies to either prevent an issue or mitigate its damages. There are a number of proven strategies which we recommend.
Chief among them is to build a hybrid cloud environment, one that relies mostly on the Microsoft data center but, in case of an outage, keeps mission-critical data (like the last two weeks of Exchange mailboxes) easily and quickly accessible through an on-premises or redundant server somewhere else. There are even solutions, such as Mimecast, whose continuity services can prevent email downtime altogether.
On top of all this, there is plenty you can do to prevent downtime or, if all else fails, get an advance alert that an issue is imminent. Office365Mon, for example, can alert your team when something like a configuration change will cause you problems, giving you a chance to fix it before you are affected.
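For teams curious what basic monitoring looks like under the hood, here is a rough sketch in Python of a health probe that raises an alert only after several consecutive failures. This is our own illustration, not how Office365Mon or any other product works; the URL you probe, the threshold of three, and the function names are all assumptions.

```python
import urllib.request


def probe(url, timeout=5.0):
    """Return True if the endpoint answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # Covers DNS failures, refused connections, timeouts and HTTP errors
        return False


def should_alert(results, threshold=3):
    """True once the most recent `threshold` probes have all failed.

    Requiring several failures in a row avoids paging the team
    over a single dropped request.
    """
    if len(results) < threshold:
        return False
    return not any(results[-threshold:])


# Example history: healthy twice, then three failed probes in a row
history = [True, True, False, False, False]
if should_alert(history):
    print("Service looks down: start the contingency plan")
```

In practice you would call probe() on a schedule against an endpoint you choose, append each result to the history, and wire should_alert() to whatever notification channel your contingency plan names.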
3. Leverage managed services
We also recommend our customers consider an Office 365 managed service provider, such as the one we offer with our Keystone team. In many cases, your partner can do a lot to remedy your issues with a much faster turnaround, and with far less stress. For one, you are often given instant, 24/7 access to real, live, on-the-phone support teams, so you can put a name and a voice to the person dealing with your issue right away. Plus, managed service providers such as Softchoice offer you a secret weapon unavailable to you on your own: clout. Even if you pay for Premier support services, companies like Softchoice have built up longtime relationships with Microsoft and have access to faster escalation processes and detailed insight about the problem. Crucially, this edge often allows your provider to proactively fix your issue right away by implementing its own configurations on your tenant, instead of waiting around for the Microsoft fix.