Structuring your migration activities

Introduction

On data centre migration (DCM) projects of any reasonable size, you need to think carefully about how to structure the actual migrations. On smaller projects you may simply move all the in-scope servers over a single weekend and be done. On a larger project you are much less likely to be able to do this. This article looks at some commonly used structuring techniques and concepts for DCM projects.

Waves

Many DCM projects are structured in “waves”. A wave may consist of servers that have a close business relationship, or of servers of a particular type; for example, you may decide that all test and development servers will form one wave. Waves allow the DCM management team to break the whole migration down into more manageable pieces. Depending on how you structure them, waves can also reduce risk. The obvious example is migrating test and development systems first. This limits the business impact of any problems and at the same time allows the DCM team to build familiarity with the various applications and with the overall as-is landscape. That familiarity, together with the troubleshooting experience gained, can reduce the risk when the production counterparts of those servers are moved in a subsequent wave. This only goes so far, because test and development systems may not be exact replicas of production: test may be a single server, for example, whereas production is some form of high-availability cluster. Some organisations do have exact or near-exact replicas of production for pre-production testing, and migrating these environments ahead of production may provide useful insight and further reduce risk.

A DCM project will typically have multiple waves. Sometimes the DCM team concludes that it has bitten off more than it can chew with a wave and decides to split it into smaller waves. This happens more often than you might think, and it is not necessarily a sign of bad planning. Detailed knowledge of the migration landscape grows over time, sometimes as a result of bitter experience.

Note also that migration waves need not run strictly one after another. If you have the resources, you may decide to run two or more waves in an overlapping fashion. The objective is usually to shorten the elapsed time of the overall DCM program.

A wave will typically consist of a non-trivial number of servers. It is hard to put a number on “non-trivial”, partly because migration complexity is not always tied to the sheer number of servers being migrated; it can also be governed by factors such as the business criticality and technical complexity of those servers. For the sake of argument, however, let’s say that 30 or more server images is a reasonable starting figure.
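
To make the wave idea concrete, here is a minimal Python sketch of one way to draft waves, assuming each server has been tagged with a hypothetical application name and environment. It migrates non-production tiers before production, keeps an application’s servers in the same wave, and starts a new wave when a hypothetical size cap would be exceeded. Real wave planning also weighs business criticality and technical complexity, which this sketch ignores.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Server:
    name: str
    app: str          # hypothetical field: application the server belongs to
    environment: str  # hypothetical values: "dev", "test", "preprod", "prod"


# Lower rank migrates earlier: non-production tiers first, production last.
# An environment not listed here raises KeyError, which is deliberate.
ENV_ORDER = {"dev": 0, "test": 1, "preprod": 2, "prod": 3}


def plan_waves(servers, max_wave_size=40):
    """Draft waves one environment tier at a time, keeping each application's
    servers together and starting a new wave when adding an application
    would push the current wave past max_wave_size (a hypothetical cap)."""
    waves = []
    ordered = sorted(servers, key=lambda s: (ENV_ORDER[s.environment], s.app, s.name))
    for _env, env_servers in groupby(ordered, key=lambda s: s.environment):
        current = []
        for _app, app_group in groupby(env_servers, key=lambda s: s.app):
            app_servers = list(app_group)
            if current and len(current) + len(app_servers) > max_wave_size:
                waves.append(current)
                current = []
            current.extend(app_servers)
        if current:
            waves.append(current)
    return waves


servers = [
    Server("crm-db-p", "crm", "prod"),
    Server("crm-web-t", "crm", "test"),
    Server("hr-app-d", "hr", "dev"),
    Server("crm-web-p", "crm", "prod"),
]
for number, wave in enumerate(plan_waves(servers, max_wave_size=2), start=1):
    print(number, [s.name for s in wave])
# 1 ['hr-app-d']
# 2 ['crm-web-t']
# 3 ['crm-db-p', 'crm-web-p']
```

An application with more servers than the cap still lands in a single, oversized wave; in practice that is the signal to consider splitting it, as described above.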

Server affinity groups

A wave may be subdivided by grouping servers according to notional affinity. These affinity groups represent close business or technical relationships between servers. From my standpoint, the primary use of affinity groups within a DCM is to identify which servers can, or must, be rolled back as a group. Let’s say that during your live migration you hit a problem that cannot be resolved in the available time. Rather than abandoning the entire live migration, you may consider rolling back just a number of servers. It’s a bit like an atomic transaction in database terms: the servers in the group either all complete the move or are all returned to their original state. A well-defined affinity group tells you which servers you should roll back. In principle defining this could be very straightforward; for example, we simply roll back all the servers that are part of application “X”, the application that is manifesting the problem we can’t solve within the time window. Sometimes, however, it is more complex. Consider the case where application “X” uses a database on a SQL Server cluster that also hosts databases for many other applications. Is the SQL Server cluster part of the affinity group or not? I am afraid I don’t have any magic bullets for you in this space. You need to carry out detailed analysis and discuss the trade-offs with the business.
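
Affinity analysis is essentially a graph problem. The minimal Python sketch below, assuming you have recorded pairwise “must stay together” dependencies between servers, walks outwards from the failing application’s servers to propose a rollback set. Servers marked as shared, such as the hypothetical sql-cluster in the example, are not traversed; they are reported separately, because whether they belong in the group is exactly the trade-off that needs discussing with the business. The function name and server names are illustrative only.

```python
from collections import defaultdict, deque


def rollback_candidates(dependencies, failing, shared):
    """Propose a rollback set for a failing application.

    dependencies: iterable of (server_a, server_b) pairs that must stay together
    failing:      servers of the application that hit the unresolved problem
    shared:       servers used by many applications (e.g. a shared SQL Server
                  cluster); the search stops at them and only reports them
    Returns (rollback_set, needs_decision).
    """
    # Build an undirected dependency graph.
    graph = defaultdict(set)
    for a, b in dependencies:
        graph[a].add(b)
        graph[b].add(a)

    rollback, needs_decision = set(), set()
    queue = deque(failing)
    while queue:  # breadth-first search from the failing servers
        server = queue.popleft()
        if server in rollback:
            continue
        rollback.add(server)
        for neighbour in graph[server]:
            if neighbour in shared:
                needs_decision.add(neighbour)  # flag for analysis, do not traverse
            elif neighbour not in rollback:
                queue.append(neighbour)
    return rollback, needs_decision


deps = [("x-web", "x-app"), ("x-app", "sql-cluster"),
        ("y-app", "sql-cluster"), ("y-web", "y-app")]
to_roll_back, to_discuss = rollback_candidates(deps, ["x-web"], shared={"sql-cluster"})
print(sorted(to_roll_back))  # ['x-app', 'x-web']
print(sorted(to_discuss))    # ['sql-cluster']
```

Without the shared marker the walk would pull application “Y” into the rollback as well, which is precisely the knock-on effect the shared cluster question is about.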

Event based or drip feed?

How should you manage your migration activities, or more specifically your cutover activities? Once again, most professional services organisations will lead with an “Event based” approach. This means that the actual migration cutover activities are concentrated into a defined period of time. Usually this “Event” will be held at an agreed time outside normal business hours and will have a specified duration, such as from midnight on Friday to 01:00 on Monday. There will typically be a lot of organisation and planning surrounding an “Event”, and normally it will encompass all the servers within a wave. Remember that the “Event” is a box around the cutover activities rather than around all migration activities. Sometimes all migration activities will take place within the bounds of the event, but usually other activities are going on outside it. For example, you may have been running some form of storage replication for days or even weeks before the event. In this case only the activities of shutting down applications, closing databases and then stopping the storage replication to the target environment fall within the scope of the event.
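
A simple arithmetic check is a useful sanity test when planning an event. The minimal Python sketch below assumes the cutover tasks run one after another with estimated durations, and that a block of time is held back at the end of the window for rolling back. The function name, task durations, 12-hour rollback reserve and June 2024 dates are all hypothetical; “midnight on Friday” is taken to mean the end of Friday.

```python
from datetime import datetime, timedelta


def check_event_plan(start, end, tasks, rollback_reserve):
    """Check whether sequential cutover tasks fit inside an event window.

    tasks:            list of (description, estimated duration) run in order
    rollback_reserve: time held back at the end of the window for rollback
    Returns (fits, rollback_deadline, finish), where rollback_deadline is the
    latest time the cutover may finish while still leaving the reserve intact.
    """
    finish = start + sum((duration for _, duration in tasks), timedelta())
    rollback_deadline = end - rollback_reserve
    return finish <= rollback_deadline, rollback_deadline, finish


start = datetime(2024, 6, 8, 0, 0)  # midnight at the end of Friday 7 June 2024
end = datetime(2024, 6, 10, 1, 0)   # 01:00 on Monday 10 June 2024
tasks = [
    ("Shut down applications", timedelta(hours=2)),
    ("Close databases", timedelta(hours=1)),
    ("Stop storage replication to target", timedelta(hours=1)),
]
print(end - start)  # 2 days, 1:00:00
fits, deadline, finish = check_event_plan(start, end, tasks, rollback_reserve=timedelta(hours=12))
print(fits, finish, deadline)  # True 2024-06-08 04:00:00 2024-06-09 13:00:00
```

Because storage replication runs for days or weeks beforehand, only the short final steps need to fit inside the window, which is what makes a weekend event feasible at all.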

The alternative to this approach is what I call “drip feed”. This is where server images are migrated in small numbers, possibly during the normal working day. This approach, which by its nature is more loosely defined, does not carry the management overhead of an “Event”. In the extreme case servers may be migrated one by one. You could argue that each of these migrations is still an “Event”, and you would have a point, but in practice the management and planning overhead is pared down to the minimum in a “drip feed” approach.

In my experience the “drip feed” approach is only suitable for low-risk servers, such as inactive test and development servers. Unless your entire migration consists of such servers, you will probably end up adopting a mixed approach, using “Event” based migration for business-critical services. Having said that, several cloud migrations I have been involved with started by moving test and development into the cloud. If that fits your profile, then your first migration could quite possibly be run entirely as a “drip feed” operation.

Conclusion

For most DCMs you will almost certainly not be able to avoid a largely Wave/Event-based program, with all the management overhead that approach involves. Executed correctly, the Wave/Event-based approach lowers the risk of migrations and thereby provides a higher-quality solution. However, as ever, quality costs both time and money.