So your team can migrate faster
Cloud migration. It’s a term that comes up in most enterprise conversations at least once. While the term represents the practice of moving from on-premises infrastructure to cloud infrastructure, what is meant by “cloud migration” has evolved.
Cloud migration is no longer as simple as moving from on-prem servers to AWS EC2. It could include moving to managed databases or API gateways, or maybe you need AWS for some workloads and Azure for others. Perhaps you’re a financial or public sector organization, and you need a private cloud. Or maybe you need to meet special regulatory requirements.
In this article, we’re going to look at three best practices for making cloud migration easier for your enterprise:
- Refine your culture.
- Engage in intelligent change.
- Observe and monitor.
By applying these guiding principles to your organization’s cloud migration effort, you’ll keep it from turning into an aimless and inefficient slog. Instead, you’ll have the organizational awareness and the tools in place to tackle the job with confidence.
Before we dive right into our best practices, however, let’s first touch on the different approaches typically used during cloud migration.
Though there are guidelines for cloud migration, there is no one-size-fits-all approach. Today’s cloud environment is complicated by the diversity of options, industry nuances, and business needs. Traditionally, businesses chose one of the following five R’s:
- Rehost: This is your traditional lift-and-shift approach. For example, you have applications running on-prem in VMs, and you redeploy your application to VMs running in a cloud.
- Refactor: This is similar to rehost, except that you have a midpoint step to make tweaks (lift, tweak, and shift).
- Revise: This is an amalgamation of rehost and refactor. The revise approach often includes significant application changes to leverage the capabilities of the destination cloud.
- Rebuild: This takes your effort one step further than the revise approach by, in some cases, fully rebuilding applications to leverage the destination cloud’s capabilities.
- Replace: With this approach, an organization cordons off and removes a piece of functionality that was being maintained in-house, replacing it with a third-party option.
Now, let’s take these and put a modern cloud spin on them:
- Rehost: You have applications running on-prem in VMs that you redeploy to VMs running in multiple clouds.
- Refactor: Your midpoint step makes tweaks for each destination cloud.
- Revise: You include significant application changes to leverage the capabilities of each destination cloud.
- Rebuild: You fully rebuild applications to leverage multiple destination cloud capabilities.
- Replace: You replace pieces of functionality with potentially multiple third-party options.
Multiple clouds, specialized use cases, and industry regulations quickly compound the complexity of cloud migration. Even amid this complexity, organizations can follow three key best practices to smooth out and simplify the cloud migration effort.
Perhaps one of the most difficult yet rewarding best practices is cultural refinement. You probably have a loose idea of who’s in charge of what pieces of your technical estate, and that’s a good start. However, before attempting a cloud migration, much more awareness is necessary.
One good technique is to use a RACI matrix for a given component or domain. This will clearly show who is responsible and accountable for each change happening during the migration, and who needs to be consulted or informed about it. The cloud moves fast, and your team needs to move faster. Knowing whom to go to for each piece is key.
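As a rough illustration, a RACI matrix can be kept as plain data and checked automatically. The sketch below is one possible shape, not a prescribed format: the `RaciEntry` class, the `validate` and `owner_of` helpers, and the component and team names are all hypothetical. It assumes the common convention that each component has exactly one accountable owner.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RaciEntry:
    """RACI assignment for one component or domain (hypothetical structure)."""
    component: str
    responsible: list[str]                       # people doing the work
    accountable: list[str]                       # should hold exactly one owner
    consulted: list[str] = field(default_factory=list)
    informed: list[str] = field(default_factory=list)


def validate(entry: RaciEntry) -> list[str]:
    """Return human-readable problems; an empty list means the entry is usable."""
    problems = []
    if len(entry.accountable) != 1:
        problems.append(
            f"{entry.component}: expected exactly one accountable owner, "
            f"found {len(entry.accountable)}"
        )
    if not entry.responsible:
        problems.append(f"{entry.component}: nobody is responsible")
    return problems


def owner_of(matrix: list[RaciEntry], component: str) -> Optional[str]:
    """Return the single accountable owner, or None if missing or ambiguous."""
    for entry in matrix:
        if entry.component == component and len(entry.accountable) == 1:
            return entry.accountable[0]
    return None


# Hypothetical example data.
matrix = [
    RaciEntry("billing-db", ["dana"], ["lee"], ["sam"], ["ops-team"]),
    RaciEntry("api-gateway", [], ["alex", "kim"]),
]

for entry in matrix:
    for problem in validate(entry):
        print(problem)
```

Running a check like this before the migration starts surfaces components with no clear owner while there is still time to fix the gap.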
Another piece of the cultural refinement is not just identifying key metrics, but also putting those metrics in writing! Many are hesitant to take this approach because it often shines a light on operational inefficiencies. Remember: If it’s not measured, it can’t be improved.
This practice should be adopted at various levels. From the application-team perspective, metrics around network and storage latency may be important. For the management level and higher, however, composite service level objectives (SLOs) that show a high-level team metric may be more important.
Be clear and concise on what comprises an SLO and, if possible, standardize the generation of the SLOs among as many teams as possible. While you might have service level agreements (SLAs) to enforce contractual guarantees, SLOs help your entire organization understand how application performance and reliability impact your customers and the overall business.
Remember, SLOs enable SLAs, and SLAs enable your customers.
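To make the idea of a standardized SLO concrete, here is a minimal sketch of one way to compute it. It assumes an event-based indicator (good events divided by total events) and a request-weighted composite across services. The `ServiceWindow` type, the function names, the service names, and the numbers are illustrative assumptions, not a definition your organization must adopt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceWindow:
    """Event counts for one service over one measurement window."""
    name: str
    good_events: int
    total_events: int


def composite_sli(windows: list[ServiceWindow]) -> float:
    """Request-weighted ratio of good events across all services."""
    total = sum(w.total_events for w in windows)
    if total == 0:
        return 1.0  # no traffic means nothing failed
    return sum(w.good_events for w in windows) / total


def error_budget_remaining(measured_sli: float, target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = 1.0 - target
    if budget <= 0:
        raise ValueError("target must be below 1.0 to leave an error budget")
    return 1.0 - (1.0 - measured_sli) / budget


# Hypothetical numbers for two services.
windows = [
    ServiceWindow("checkout", good_events=990, total_events=1000),
    ServiceWindow("search", good_events=2985, total_events=3000),
]
team_sli = composite_sli(windows)
print(f"composite SLI: {team_sli:.5f}")
print(f"budget left at a 99% target: {error_budget_remaining(team_sli, 0.99):.3f}")
```

Because every team would compute the number the same way, the composite figure can be compared across teams and rolled up for management.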
Carrying forward the idea of codifying metrics, it’s essential to lower the barrier to entry for observability and monitoring. If a team member needs to copy and paste a PromQL query in order to answer a business-impacting question, you can consider this an opportunity for improvement.
This can be an expansive and complex topic, but ultimately it comes down to answering business questions as quickly and accurately as possible. Most often, this is achieved by coupling your data store(s) with a flexible visualization system that’s open and available for all levels.
Change management often conjures up the image of a stifling board of attendees whose sole job is to poke holes in progress and to say no. Luckily, that’s not what’s meant by “intelligent change.”
Intelligent change is an approach to cloud migration that uses technical gating instead of process gating. In other words, protection should be enforced through automated processes like end-to-end testing, continuous integration, and provability through distributed tracing.
Pieces that can’t be covered by technical gating (or would require a non-trivial amount of work) should be moved lower on the migration list.
The process of creating the technical gating for smaller or easier workloads often paves the way for more complex pieces to follow the same processes. Work through each piece iteratively until a sufficient level of correctness and functionality is achieved, then rinse and repeat.
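The gating idea can be sketched in a few lines. In the example below, each gate is a function that returns `True` when its automated check passes, and workloads are ordered so that those with the most automated coverage migrate first. The gate names, workload names, and coverage scores are hypothetical, and in practice each check would call out to your CI system, test runner, or tracing backend.

```python
from typing import Callable

Check = Callable[[], bool]


def evaluate_gates(gates: dict[str, Check]) -> list[str]:
    """Run every automated check and return the names of those that failed."""
    return [name for name, check in gates.items() if not check()]


def migration_order(coverage: dict[str, float]) -> list[str]:
    """Order workloads so those with the most automated coverage go first."""
    return sorted(coverage, key=lambda workload: coverage[workload], reverse=True)


# Hypothetical gates; real ones would query CI, an end-to-end suite, or traces.
gates = {
    "continuous_integration": lambda: True,
    "end_to_end_tests": lambda: False,
    "trace_comparison": lambda: True,
}
failed = evaluate_gates(gates)
print("blocked by:" if failed else "clear to proceed", failed)

# Hypothetical share of each workload covered by technical gates.
coverage = {"legacy-batch": 0.2, "web-frontend": 0.9, "orders-api": 0.7}
print(migration_order(coverage))
```

The point of the design is that a change is blocked by a failing check, not by a meeting, and that poorly covered workloads naturally sink to the bottom of the list.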
Observation and monitoring are critical to validating the success of your cloud migration effort. First, before the effort begins, make systems and applications observable if they’re not already. Observability is not the same thing as monitoring: observability is about the signals a system exposes, while monitoring is about watching those signals and acting on them.
As an example, you can monitor if a database is up or down based on whether or not you can make a connection to it, but a database is observable when you can see utilization metrics, query times, and active connection counts.
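The difference can be shown with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a real database server, and the `is_up` and `timed_query` helpers are hypothetical names. The first function answers only the up/down question, while the second exposes one of the signals mentioned above, query time.

```python
import sqlite3
import time


def is_up(conn: sqlite3.Connection) -> bool:
    """Up/down check: can we run a trivial query at all?"""
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.Error:
        return False


def timed_query(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    """Run a query and return its rows together with the elapsed seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real database
print("up:", is_up(conn))

rows, elapsed = timed_query(conn, "SELECT 1")
print(f"query returned {rows} in {elapsed:.6f}s")

conn.close()
print("up after close:", is_up(conn))
```

A real deployment would export the timing (along with utilization and connection counts) to a metrics store instead of printing it.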
Second, make systems and applications monitorable if they’re not already. This is how you make informed decisions based on observability. Examples of questions to ask yourself include:
- Can I send alerts based on monitoring data?
- Can my infrastructure self-heal based on monitoring?
- How long does it take me to answer X about a system/application?
In essence, monitorability is the nexus of decision-making during a cloud migration. Without it, success is difficult, if not impossible, to achieve.
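As a minimal sketch of the first question above, an alert rule can be expressed as a function over recent samples. The rule shown here, firing only after a metric stays above a threshold for several consecutive samples, is one common pattern chosen for illustration. The function name, the threshold, and the sample values are assumptions.

```python
def should_alert(samples: list[float], threshold: float, consecutive: int) -> bool:
    """Fire when `consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


# Hypothetical CPU utilization samples, one per minute.
sustained = [0.20, 0.90, 0.95, 0.97]
spiky = [0.90, 0.20, 0.90, 0.90]

print(should_alert(sustained, threshold=0.8, consecutive=3))
print(should_alert(spiky, threshold=0.8, consecutive=3))
```

Requiring a sustained breach keeps a single noisy sample from paging someone, which matters during a migration when brief spikes are expected.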
Lastly, it’s paramount to have a single pane of glass through which you can trace each change and deployment back to its impact (or, ideally, its lack of impact) on the environment. The ability to zero in on likely culprits of issues during your cloud migration is critical.
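One simple way to zero in on likely culprits is to list the changes deployed shortly before an issue began. The sketch below assumes you already record each change with a timestamp. The `Change` type, the one-hour default window, and the sample data are hypothetical, and a real platform would correlate on much richer signals than time alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """One recorded change or deployment."""
    service: str
    description: str
    deployed_at: datetime


def likely_culprits(
    changes: list[Change],
    incident_start: datetime,
    window: timedelta = timedelta(hours=1),
) -> list[Change]:
    """Changes deployed within `window` before the incident, newest first."""
    candidates = [
        c for c in changes
        if incident_start - window <= c.deployed_at <= incident_start
    ]
    return sorted(candidates, key=lambda c: c.deployed_at, reverse=True)


# Hypothetical change log and incident time.
incident = datetime(2024, 1, 1, 12, 0)
changes = [
    Change("api", "deploy v2", datetime(2024, 1, 1, 11, 45)),
    Change("db", "schema change", datetime(2024, 1, 1, 9, 0)),
    Change("cache", "config tweak", datetime(2024, 1, 1, 11, 10)),
    Change("api", "deploy v3", datetime(2024, 1, 1, 12, 30)),
]

for change in likely_culprits(changes, incident):
    print(change.deployed_at.time(), change.service, change.description)
```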
In reality, there are many best practices, probably enough to fill several in-depth books about how to migrate to the cloud. These three practices give you the “who,” the “what,” and the “why” of how to migrate successfully. Now comes the difficult part: what tooling should you use?
There’s always the roll-your-own option, but if you’re staring down a cloud migration (especially in a large tech estate), going with a homegrown option will make things more difficult.
For a cloud migration, we would almost always recommend choosing an established vendor, of which there are many. One particularly good option is Lightstep. Lightstep is building SRE tools for the cloud-native enterprise, including observability, monitoring, and incident response. One of the key use cases for their platform is cloud migrations!