Improving RTO

March 22, 2009

Ever since we built our first D/R infrastructure and plan, a sense of continuous improvement has grabbed me, and it still does. I’m particularly interested in improving RTO, since we have already reached a near-perfect (read: 0) RPO thanks to that tech jewel, EMC RecoverPoint. Since I deal with open systems, my D/R scenarios fall into two main areas: physical and virtual environments. Needless to say, the virtual environment (read: VMware 🙂) is a piece of cake for D/R scenarios, even more so now that we have acquired and are setting up Site Recovery Manager (more on that in a future post). But what about the physical servers? Well, we managed to standardise our scenarios using imaging tools, e.g. ‘ghosting’ the source OS, replicating the staging area through RecoverPoint, and restoring at the D/R side. And here is the part I like least: the time required for each physical restore is far beyond the virtual environment’s RTO… And this drives me mad as hell!
So, I’m trying to find more exotic alternatives that hopefully will allow me to shorten the RTO for these boxes, and the overall RTO of course.
I really liked the idea (at least for the Linux boxes) of giving DRBD a try… And while it is a great tool (particularly since it allows for active/active writes and 3-way sync), it would cover only the Linux side… leaving my windoze boxes in the dust.
I tried, really, to find some generic, cross-platform tool that could provide continuous snapshots/imaging/replication… But nothing seems to fit my requirements so far. So, this is what I’m going to do:
First, review some of my physical clusters, and transform them into virtual farms that rely on ‘lighter’ storage (read NAS, vs SAN).
Second, use the reclaimed SAN space as boot LUNs for the remaining physical nodes, in order to create a sassy boot-from-SAN environment.
Doing some quick math on this, it could bring my overall RTO from about 7 hours down to roughly 1 (and I’m talking about roughly 180 servers, 15 of which are physical).
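To make that kind of quick math concrete, here is a minimal Python sketch. It assumes that servers in a group are restored in parallel "waves" and that the overall RTO is set by the slowest group. Every per-server time and parallelism figure below is a made-up placeholder, picked only so the totals land near the 7-hour and 1-hour marks mentioned above; none of them are measurements from my environment, and the split of servers after the migration is equally hypothetical.

```python
from math import ceil

def group_minutes(servers, minutes_each, parallel):
    """Time to recover one group when `parallel` servers are restored at once."""
    waves = ceil(servers / parallel)
    return waves * minutes_each

def overall_rto_minutes(groups):
    """Groups recover side by side, so the slowest group sets the overall RTO."""
    return max(group_minutes(*g) for g in groups)

# (servers, minutes per restore, restores in parallel) -- all hypothetical
before = [
    (15, 84, 3),    # physical boxes restored from ghost images
    (165, 10, 30),  # virtual machines
]
after = [
    (8, 15, 8),     # remaining physical boxes, booting from SAN
    (172, 10, 30),  # virtual machines, including converted clusters
]

print(overall_rto_minutes(before) / 60)  # 7.0 hours
print(overall_rto_minutes(after) / 60)   # 1.0 hours
```

The point of the model is simply that the physical restores dominate: once they stop being the slowest group, the overall RTO collapses to whatever the virtual side needs.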
Further improvement could come from future versions of the VMware P2V features (vSphere 4.0), and from automating software reconfiguration at the D/R side.
So far, so good 🙂

