Large Scale Migration Best Practices
Migrations are among the most frequent, important, and challenging projects in a software engineering career. I have done many migrations involving data, software, and services, and I will share the lessons I learned in this article. A migration is a heavily cross-functional project: it involves multiple teams and requires a lot of effort from planning through execution, with many things to consider along the way.
The goals of a migration can vary. The two we deal with every day are performance and cost, but there can be others, for example when a new technology is needed to support 'XYZ' features. Migrations also come in many types: moving data between two services, redesigning a backend (such as going from a monolith to microservices), or revamping an API or SDK.
In general, when the existing system cannot meet the requirements, hits bottlenecks, or is not cost-efficient, an alternative solution must be considered. This is the point where one should start thinking about a migration.
Like coding and design, migrations are important enough to have their own set of best practices. These should be considered as soon as a migration project is on the roadmap, because they contribute directly to its success.
Let’s dig into some best practices that I gathered from my readings and experience.
Benchmarking
Benchmarking is very important in software engineering, especially when deciding between multiple alternative solutions. It gives a deeper understanding of each option, whether in terms of performance or cost. Every team should benchmark, and it should happen as a preliminary step at a very early stage, i.e. the proof of concept. Migration should be kept in mind when doing the initial research work.
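To make the idea concrete, here is a minimal Python sketch of a proof-of-concept benchmark that compares an old and a candidate implementation on median and p95 latency. The functions `old_lookup` and `new_lookup` are hypothetical stand-ins that I made up for illustration; in a real benchmark they would call the actual systems with a realistic workload.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Time `fn` over several runs and summarize latency in milliseconds."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Simple nearest-rank style index for the 95th percentile.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }


# Hypothetical stand-ins for a request against the old and the new system.
def old_lookup() -> int:
    return sum(i * i for i in range(20_000))


def new_lookup() -> int:
    return sum(i * i for i in range(5_000))


if __name__ == "__main__":
    for name, fn in [("old", old_lookup), ("new", new_lookup)]:
        result = benchmark(fn)
        print(f"{name}: median={result['median_ms']:.3f} ms, "
              f"p95={result['p95_ms']:.3f} ms")
```

Reporting a tail percentile next to the median is a deliberate choice here: a candidate system can look fine on average and still be worse in the slow cases that users notice.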
Involving a Product Manager
Product managers play an important role in general, yet we don’t usually see them involved in migrations. The benefits of having a product manager own the migration are tremendous. Just like any new product, a migration needs one to keep it on track, understand customer needs, prioritize the important pieces, and free up engineers’ time by filling communication gaps. They also add another perspective to the migration. So try to onboard one!
Prioritization, Collaboration and Communication
Daily or weekly prioritization is important, but it only helps if you collaborate and communicate with the end users, for example to find out which service is the most important and would make a good use case to test first. Prioritizing the mini projects within a migration the right way speeds up the process. Communication is another important factor: it helps with deprecating, onboarding, learning, and getting feedback as early as possible (discussed in detail later in the article). This fills gaps that could easily have been missed during the migration without proper engagement. To reiterate the previous point, having a product manager to handle these issues is great, but that does not mean engineers should stop prioritizing, collaborating, and communicating themselves. Everyone plays their role!
Roll Back Strategy
Things go wrong, and surprises happen more often than expected, so it is always better to have a rollback strategy. Popular version control systems let you revert changes quickly, but large-scale changes that take time can be problematic, especially if they involve infrastructure and require downtime. It is a good idea to keep the old system running for the time being. Maintaining and developing both systems costs money and resources, but that cost is temporary, lasting only until the new system runs smoothly, and it means that going back in an emergency requires just a switch.
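The "just a switch" idea can be sketched in Python as a small router that sends traffic to either the old or the new backend depending on a single flag. Everything here (`MigrationSwitch`, `old_service`, `new_service`) is a hypothetical name chosen for illustration, and a real setup would likely keep the flag in a config store or feature-flag service rather than in memory.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MigrationSwitch:
    """Routes each request to the old or the new backend based on one flag."""

    old_backend: Callable[[str], str]
    new_backend: Callable[[str], str]
    use_new: bool = False  # start on the old system until the new one is proven

    def cut_over(self) -> None:
        """Send traffic to the new system."""
        self.use_new = True

    def roll_back(self) -> None:
        """Emergency switch: send traffic back to the old system."""
        self.use_new = False

    def handle(self, request: str) -> str:
        backend = self.new_backend if self.use_new else self.old_backend
        return backend(request)


# Hypothetical stand-ins for the two systems kept running side by side.
def old_service(request: str) -> str:
    return f"old:{request}"


def new_service(request: str) -> str:
    return f"new:{request}"


if __name__ == "__main__":
    switch = MigrationSwitch(old_service, new_service)
    print(switch.handle("ping"))  # served by the old system
    switch.cut_over()
    print(switch.handle("ping"))  # served by the new system
    switch.roll_back()
    print(switch.handle("ping"))  # back on the old system
```

This only works because the old system is still running, which is exactly the temporary cost described above.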
Automation
Migrations might look straightforward on the surface, as if they require little more than moving from one thing to another. Automating everything helps, but it is often neglected, especially in immature companies where manual work takes precedence. Then, when things go wrong, panic surfaces, an incident occurs, a postmortem follows, and resolution takes time. Investing in internal tools that help with migrations reduces the problems caused by manual processes. These tools can be temporary one-off tools focused on the migration or long-term additions to existing tools. Automate, automate, and automate!
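As an illustration of what a small one-off migration tool could look like, the Python sketch below copies records in batches, skips records that are already correct so that it is safe to re-run, and verifies the result afterwards. The in-memory dictionaries are an assumption standing in for the real source and target stores, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def batches(keys: List[str], size: int) -> Iterable[List[str]]:
    """Yield the keys in fixed-size chunks."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def migrate(source: Dict[str, str], target: Dict[str, str],
            batch_size: int = 2) -> int:
    """Copy missing or differing records; return how many were written.

    Records that already match are skipped, so re-running is safe.
    """
    written = 0
    for batch in batches(sorted(source), batch_size):
        for key in batch:
            if target.get(key) != source[key]:
                target[key] = source[key]
                written += 1
    return written


def verify(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Return the keys that are missing or mismatched in the target."""
    return [key for key in sorted(source) if target.get(key) != source[key]]


if __name__ == "__main__":
    old_store = {"user:1": "alice", "user:2": "bob", "user:3": "carol"}
    new_store = {"user:1": "alice"}  # a previous partial run
    print("written:", migrate(old_store, new_store))
    print("mismatches:", verify(old_store, new_store))
```

Making the tool idempotent and pairing it with a verification step is what lets a failed or interrupted run be retried calmly instead of being repaired by hand.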
Documentation and Learning Workshops
Documentation is king! However, it is hard to maintain. Just as every piece of software needs documentation, so does the migration process. Migration-related documents are typically used early on, since they cover topics like how to migrate to the new API. These self-serve documents unlock a lot of potential and speed up onboarding. Another important aspect is learning workshops: dedicated sessions, much like a tutorial class, given to a wide audience and recorded so that they can become part of the documentation. These always help in the long run. Engineers must make documentation part of their routine work so that the right resource is available when needed. Big companies usually have a technical writer for these tasks, but engineers still play an important role in connecting the dots.
Tracking Progress Through Metrics
Migration progress can be measured in many ways, depending on the goal of the migration as discussed earlier. For example, if the migration was done to improve the performance of the system, then, just as with benchmarking, a successful migration is one that keeps delivering good numbers. From a technical point of view, it helps to compare metrics such as resource utilization, uptime/downtime, and latency between the old and new service. Similarly, users onboarded, daily active users, data migrated, and services deprecated are some important metrics. Building a dashboard helps in communicating this across the board.
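A rough Python sketch of how a few of these numbers could be turned into dashboard-ready values is shown below. The `MigrationSnapshot` fields and the sample figures are made up for illustration; real values would come from your own monitoring and tracking systems.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MigrationSnapshot:
    """Hypothetical point-in-time numbers collected for the dashboard."""

    users_total: int
    users_onboarded: int
    services_total: int
    services_deprecated: int
    old_p95_latency_ms: float
    new_p95_latency_ms: float


def percent(done: float, total: float) -> float:
    """Percentage of `done` out of `total`, rounded to one decimal place."""
    return 0.0 if total == 0 else round(100.0 * done / total, 1)


def summarize(s: MigrationSnapshot) -> Dict[str, float]:
    """Compute progress and old-vs-new comparison metrics."""
    latency_delta = s.new_p95_latency_ms - s.old_p95_latency_ms
    return {
        "users_onboarded_pct": percent(s.users_onboarded, s.users_total),
        "services_deprecated_pct": percent(s.services_deprecated, s.services_total),
        # Negative means the new service is faster than the old one.
        "p95_latency_change_pct": percent(latency_delta, s.old_p95_latency_ms),
    }


if __name__ == "__main__":
    snapshot = MigrationSnapshot(
        users_total=1000, users_onboarded=250,
        services_total=8, services_deprecated=2,
        old_p95_latency_ms=120.0, new_p95_latency_ms=90.0,
    )
    for name, value in summarize(snapshot).items():
        print(f"{name}: {value}")
```

Keeping adoption metrics and technical metrics in the same summary makes it easier to see both how far along the migration is and whether it is still meeting its original goal.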
Deprecation
How do we get everyone to start using the new service so we can kill the old one? That is an underrated challenge: deprecating systems and services is very hard in general. The number of users is one of the many factors that make it difficult. Imagine thousands of engineers using your SDK, and now you want them to update their code. Ideally, an incremental process helps, but companies usually have tightly coupled services that become the bottleneck. To deal with these challenges, documentation and communication from the start help, and most importantly, deprecating the services incrementally encourages users to migrate incrementally as well. It is also good practice to add warnings and deprecation deadlines as part of the process so that everyone is aware of them.
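For an SDK, one way to add such warnings is a decorator built on Python's standard `warnings` module, sketched below. The function names `fetch_user` and `fetch_user_v2` and the removal date are hypothetical placeholders, not part of any real SDK.

```python
import functools
import warnings
from typing import Any, Callable


def deprecated(replacement: str, removal_date: str) -> Callable:
    """Mark a function as deprecated, naming its replacement and deadline."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed on "
                f"{removal_date}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical new SDK call.
def fetch_user_v2(user_id: int) -> dict:
    return {"id": user_id, "source": "new"}


# Hypothetical old SDK call, kept working while users migrate.
@deprecated(replacement="fetch_user_v2", removal_date="2099-01-01")
def fetch_user(user_id: int) -> dict:
    return fetch_user_v2(user_id)


if __name__ == "__main__":
    warnings.simplefilter("always", DeprecationWarning)
    print(fetch_user(7))
```

The old entry point keeps working and simply delegates to the new one, so users can migrate incrementally while every call reminds them of the deadline.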
In general, the engineering culture of a company plays an important part in how it deals with challenges like migrations. A good, healthy engineering culture can unblock a lot of things by default and take some of the pain out of such projects. If you look closely, all of the above practices complement each other.
This topic came to mind when I was reading The Pragmatic Engineer newsletter about how big tech companies perform migrations.