Cloud migration is one of the most consequential infrastructure decisions an enterprise can make. When executed well, it unlocks elasticity, reduces operational burden, accelerates delivery velocity, and positions organizations to leverage managed services that would be prohibitively expensive to build in-house. When executed poorly, it produces cost overruns, security vulnerabilities, performance regressions, and months of painful remediation work that erodes confidence in the cloud strategy entirely.
The difference between success and failure is rarely about the technology. Cloud providers have mature tooling and well-documented migration paths. The failures we see consistently at Cozcore's cloud migration practice stem from inadequate planning, incomplete discovery, underestimated dependencies, and a lack of structured methodology. Teams jump from "we need to move to the cloud" to provisioning EC2 instances without the intermediate steps that determine whether the migration will succeed.
This guide provides a comprehensive, phase-by-phase checklist for enterprise cloud migration. It is drawn from dozens of migrations we have executed across industries including fintech, healthcare, SaaS, and e-commerce. Whether you are migrating 10 workloads or 500, the methodology is the same. The scale changes; the discipline does not.
Why Cloud Migration Fails (and How to Prevent It)
Before walking through the migration checklist, it is worth understanding the patterns that lead to failure. Research from Gartner and McKinsey consistently identifies the same root causes in failed or stalled cloud migrations. Recognizing these patterns upfront allows you to build safeguards into your plan.
The Most Common Failure Patterns
Incomplete discovery. Teams underestimate the number of applications, their interdependencies, and the data volumes involved. A server that "nobody uses" turns out to run a cron job that feeds data to three downstream systems. An application that was supposed to be stateless stores session data on local disk. These surprises surface during migration execution, when they are most expensive to address.
Lift-and-shift everything. Rehosting (lift and shift) is the fastest migration strategy, but applying it indiscriminately leads to running cloud-hostile architectures on cloud infrastructure. Applications that were designed for fixed-capacity, vertically scaled servers do not automatically benefit from cloud elasticity. Without re-architecting, you pay cloud prices for on-premises architecture, which is almost always more expensive than either staying on-premises or refactoring properly.
Ignoring the operating model. Migration is not just a technology change. It changes how teams provision infrastructure, deploy code, manage security, and respond to incidents. Organizations that migrate infrastructure without adapting their operational processes end up with cloud resources managed through manual portal clicks and support tickets, negating most of the agility benefits of cloud adoption.
Underestimating data migration. Moving terabytes or petabytes of data takes time, bandwidth, and careful orchestration. Teams that treat data migration as an afterthought discover that their 50TB database transfer will take weeks over their existing network connection, blowing past their migration window.
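The arithmetic behind that surprise is simple enough to check before a migration window is agreed. The following is a minimal sketch, not a sizing tool: the function name, the 500 Mbps link, and the `efficiency` factor (the share of the nominal link rate the transfer actually gets) are illustrative assumptions.

```python
def transfer_time_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move data_tb terabytes over a network link.

    data_tb uses decimal terabytes (10**12 bytes). efficiency is the assumed
    fraction of the nominal link rate available to the transfer (1.0 = ideal).
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("data_tb must be >= 0, link_gbps > 0, 0 < efficiency <= 1")
    bits_to_move = data_tb * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / effective_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical scenario: a 50 TB database over a shared 500 Mbps link.
    for efficiency in (1.0, 0.5, 0.33):
        days = transfer_time_hours(50, 0.5, efficiency) / 24
        print(f"{efficiency:.0%} of link available: {days:.1f} days")
```

Under these assumed numbers the transfer takes a little over nine days in ideal conditions and roughly four weeks if only a third of the link is available, which is why the estimate belongs in the assessment phase rather than the cutover plan.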
Phase 1: Assessment and Discovery
The assessment phase is where you build a complete picture of what you are migrating, why, and what constraints you are operating under. Rushing through this phase or skipping it entirely is the single most common cause of migration failure. Plan to spend 4 to 8 weeks on assessment for a mid-size enterprise migration.
Infrastructure Inventory Audit
Begin with a complete inventory of your existing infrastructure. This includes servers, databases, storage systems, network devices, load balancers, DNS configurations, certificates, and any third-party appliances. For each asset, capture:
- Hardware specifications: CPU cores, memory, storage capacity and type (SSD, HDD, NAS, SAN), network bandwidth.
- Software inventory: Operating system and version, installed applications and versions, middleware, runtime versions, and license types.
- Utilization data: Average and peak CPU utilization, memory usage, disk I/O, and network throughput over at least 30 days. This data is critical for right-sizing cloud instances.
- Network configuration: IP addresses, DNS entries, firewall rules, VPN connections, load balancer configurations, and SSL certificates.
Automated discovery tools dramatically accelerate this process. AWS Application Discovery Service, Azure Migrate, Google Cloud Migration Center, and third-party tools like Cloudamize and Device42 can scan your network, identify assets, capture utilization data, and map dependencies automatically. Manual inventory is error-prone and slow; automated discovery is a prerequisite for any migration involving more than a handful of servers.
Application Dependency Mapping
Individual server inventories are necessary but not sufficient. You need to understand how applications communicate with each other, which databases they depend on, which external services they call, and which shared resources they consume. A dependency map reveals migration groups: sets of applications that must be migrated together because they are tightly coupled.
Network traffic analysis tools can capture communication patterns between servers over a period of weeks, revealing dependencies that no architecture diagram or tribal knowledge fully captures. AWS Application Discovery Service agents, for example, track TCP connections between servers and identify network dependencies automatically. Complement automated discovery with stakeholder interviews for each major application to identify dependencies that may not be visible in network traffic, such as batch file transfers, shared database access, and authentication dependencies.
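Once dependencies are captured as pairs of applications, migration groups can be derived mechanically. The sketch below treats every observed dependency as tight coupling and returns the connected components of the resulting graph. That is a deliberate simplification: in practice some dependencies tolerate cross-environment latency and do not force a joint move. The application names are hypothetical.

```python
from collections import defaultdict


def migration_groups(dependencies, apps=()):
    """Group applications that are linked, directly or indirectly, by a dependency.

    dependencies: iterable of (app_a, app_b) pairs, direction ignored.
    apps: optional extra application names, so isolated apps get their own group.
    """
    graph = defaultdict(set)
    for app in apps:
        graph.setdefault(app, set())
    for source, target in dependencies:
        graph[source].add(target)
        graph[target].add(source)

    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from this application.
        component = set()
        stack = [start]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(graph[current] - seen)
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    observed = [("web", "api"), ("api", "orders-db"), ("batch", "reports-db")]
    for group in migration_groups(observed, apps=["wiki"]):
        print(group)
```

Very large groups are usually a sign that a shared database or authentication service is acting as a hub; those are worth reviewing with application owners before accepting the grouping.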
Current Cost Baseline
Establishing a clear cost baseline for your current infrastructure is essential for building a migration business case and for measuring post-migration cost performance. Include all direct costs: hardware depreciation or lease payments, data center hosting or colocation fees, power and cooling, network bandwidth, software licenses, and the operational staff time required to maintain the environment. Many organizations underestimate their true on-premises cost by 30 to 50 percent because they exclude facilities, power, and operations staff from the calculation.
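To make the effect of excluded cost categories concrete, here is a small sketch of a baseline calculation. The category names follow the list above; the dollar figures in the example are invented purely for illustration, and which categories count as "commonly omitted" is our assumption based on the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass
class AnnualOnPremCosts:
    """Annual on-premises costs by category, in one currency."""
    hardware: float          # depreciation or lease payments
    colocation: float        # data center hosting or colocation fees
    power_cooling: float
    network: float
    licenses: float
    operations_staff: float  # staff time spent maintaining the environment

    def total(self) -> float:
        return sum(getattr(self, field.name) for field in fields(self))

    def commonly_omitted_share(self) -> float:
        """Share of the true total hidden when facilities, power, and staff are left out."""
        omitted = self.colocation + self.power_cooling + self.operations_staff
        return omitted / self.total()


if __name__ == "__main__":
    # Made-up figures for illustration only.
    baseline = AnnualOnPremCosts(400_000, 120_000, 80_000, 50_000, 150_000, 200_000)
    print(f"TCO baseline: {baseline.total():,.0f}")
    print(f"Hidden if facilities, power, and staff are excluded: {baseline.commonly_omitted_share():.0%}")
```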
Assessment Phase Checklist
- Complete infrastructure inventory with automated discovery tooling
- Map application dependencies and identify migration groups
- Capture 30+ days of utilization data for right-sizing
- Document all network configurations, firewall rules, and DNS entries
- Identify compliance and regulatory requirements for each workload
- Catalog all software licenses and determine cloud portability
- Calculate current total cost of ownership (TCO) baseline
- Identify applications that are candidates for retirement
- Interview application owners for each major workload
- Document data volumes, growth rates, and retention requirements
Phase 2: Strategy Selection
With a complete inventory and dependency map in hand, the next phase is assigning a migration strategy to each workload. The industry-standard framework is the "6 Rs" taxonomy, which AWS popularized by expanding the five migration strategies originally described by Gartner. Evaluate each workload against all six strategies, and select the one that best fits its business value, technical complexity, and the organization's cloud maturity.
The 6 Rs of Cloud Migration
Rehost (Lift and Shift). Move the application to the cloud with minimal changes. The operating system, application code, and configuration remain essentially the same, running on cloud VMs instead of physical servers. This is the fastest strategy and carries the lowest technical risk, but it also delivers the fewest cloud-native benefits. Rehosting is appropriate for applications that need to leave the data center quickly (due to lease expiration, for example) and will be optimized later, or for stable applications that do not warrant the investment of refactoring.
Replatform (Lift and Reshape). Make targeted optimizations during migration without changing the core architecture. Common replatforming moves include migrating a self-managed MySQL database to Amazon RDS or Azure Database for MySQL, replacing a self-managed message queue with Amazon SQS or Azure Service Bus, or containerizing an application without re-architecting it. Replatforming delivers meaningful operational benefits (reduced management overhead, improved availability) with moderate migration effort.
Refactor (Re-Architect). Redesign the application to be cloud-native, typically adopting microservices, containerization, serverless computing, and managed services. This delivers the greatest long-term benefits in scalability, resilience, and operational efficiency, but requires the highest investment in time and engineering effort. Refactoring is justified for applications that are strategically important, will need to scale significantly, or are fundamentally limited by their current monolithic architecture.
Repurchase (Replace with SaaS). Replace the existing application with a commercial SaaS product. Common examples include moving from a self-hosted CRM to Salesforce, from self-hosted email to Microsoft 365 or Google Workspace, or from a custom HR system to Workday. Repurchase eliminates operational overhead entirely but requires data migration and user retraining.
Retire (Decommission). Identify and turn off applications that are no longer needed. Every enterprise has servers running applications that nobody uses, development environments that were never decommissioned, or redundant systems from past acquisitions. A thorough discovery phase typically identifies 10 to 20 percent of the application portfolio as retirement candidates. Every retired application is one fewer thing to migrate, manage, and pay for.
Retain (Keep On-Premises). Some workloads should not be migrated, at least not yet. Applications with hard regulatory constraints that prohibit cloud hosting, workloads with extreme latency requirements that demand physical proximity to on-premises systems, or applications late in their life cycle that will be decommissioned within a year are all candidates for retention. Retain is not failure; it is a deliberate, informed decision.
Decision Framework for Strategy Assignment
For each application, evaluate the following criteria to assign the appropriate migration strategy:
| Criteria | Rehost | Replatform | Refactor |
|---|---|---|---|
| Business criticality | Low to medium | Medium | High |
| Expected lifespan | 1-3 years | 3-5 years | 5+ years |
| Scalability needs | Stable, predictable | Moderate growth | Significant growth expected |
| Migration timeline pressure | High (fast exit needed) | Moderate | Low (can invest time) |
| Team cloud maturity | Low | Moderate | High |
| Migration effort | Weeks | Weeks to months | Months to quarters |
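The table can be turned into a rough first-pass screen across a large portfolio. The sketch below scores five of the criteria from 0 (points to rehost) to 2 (points to refactor) and averages them. The equal weighting, the boundary handling for lifespan, and the function name are all assumptions; the result is a starting point for discussion with application owners, not a decision. It covers only the three strategies in the table, so retire, retain, and repurchase candidates need to be filtered out beforehand.

```python
LEVELS = {"low": 0, "medium": 1, "moderate": 1, "high": 2}
GROWTH = {"stable": 0, "moderate": 1, "significant": 2}
STRATEGIES = ("rehost", "replatform", "refactor")


def suggest_strategy(criticality, lifespan_years, growth, timeline_pressure, cloud_maturity):
    """Suggest rehost, replatform, or refactor from the decision-table criteria."""
    scores = [
        LEVELS[criticality],
        0 if lifespan_years < 3 else 1 if lifespan_years <= 5 else 2,
        GROWTH[growth],
        2 - LEVELS[timeline_pressure],  # high timeline pressure favors rehost
        LEVELS[cloud_maturity],
    ]
    # Equal weighting is an assumption; adjust to reflect your priorities.
    return STRATEGIES[round(sum(scores) / len(scores))]


if __name__ == "__main__":
    print(suggest_strategy("low", 2, "stable", "high", "low"))
    print(suggest_strategy("medium", 4, "moderate", "moderate", "moderate"))
    print(suggest_strategy("high", 8, "significant", "low", "high"))
```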
Phase 3: Planning and Architecture
With strategies assigned to each workload, the planning phase translates those strategies into an actionable migration plan. This phase establishes the cloud foundation (landing zone), defines the migration wave plan, and builds the runbooks that will guide execution.
Cloud Landing Zone Design
A landing zone is the foundational cloud environment that all migrated workloads will inhabit. It is not a single account or subscription; it is a multi-account architecture with standardized networking, security, identity, and governance controls. Building a well-designed landing zone before migrating any workloads prevents the accumulation of technical debt that is painful and expensive to remediate after the fact.
Key components of a landing zone include:
- Account structure: Separate AWS accounts, Azure subscriptions, or GCP projects for production, staging, development, shared services, security, and logging. AWS Control Tower, Azure Landing Zones, and Google Cloud Foundation Toolkit provide pre-built landing zone templates.
- Networking: Hub-and-spoke or mesh VPC/VNet architecture with defined CIDR ranges, subnet segmentation (public, private, data), VPN or Direct Connect/ExpressRoute/Cloud Interconnect connectivity to on-premises, and DNS resolution between cloud and on-premises environments.
- Identity and access management: Federated identity using your existing identity provider (Active Directory, Okta, Microsoft Entra ID, formerly Azure AD) with single sign-on to cloud consoles and APIs. Role-based access control with least-privilege policies. Service accounts for application workloads with scoped permissions.
- Security baselines: Encryption at rest and in transit enabled by default. Security group and network ACL templates. Centralized logging to a dedicated security account. Amazon GuardDuty, Microsoft Defender for Cloud (formerly Azure Security Center), or Google Security Command Center enabled for threat detection.
- Governance: Service control policies (AWS), management group policies (Azure), or organization policies (GCP) that enforce guardrails across all accounts. Cost allocation tags required on all resources. Budget alerts configured at the account and organization level.
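Address planning is one landing zone task that is easy to validate in code before anything is provisioned. The sketch below uses Python's standard `ipaddress` module to carve a VPC/VNet range into per-tier, per-zone subnets and to flag overlap with on-premises ranges, which would break routing over VPN or dedicated connectivity. The CIDR ranges, tier names, and two-zone layout are example assumptions.

```python
import ipaddress


def plan_subnets(vpc_cidr, tiers=("public", "private", "data"), zones=2, new_prefix=24):
    """Allocate one subnet per (tier, zone) pair from the VPC/VNet range, in order."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for tier in tiers:
        for zone in range(zones):
            try:
                plan[(tier, zone)] = next(blocks)
            except StopIteration:
                raise ValueError("range too small for the requested subnets") from None
    return plan


def overlapping_ranges(vpc_cidr, on_prem_cidrs):
    """Return the on-premises ranges that collide with the proposed cloud range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [cidr for cidr in on_prem_cidrs if vpc.overlaps(ipaddress.ip_network(cidr))]


if __name__ == "__main__":
    for (tier, zone), subnet in plan_subnets("10.20.0.0/16").items():
        print(f"{tier:8} zone {zone}: {subnet}")
    print("Conflicts:", overlapping_ranges("10.20.0.0/16", ["10.0.0.0/8", "192.168.0.0/16"]))
```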
Migration Wave Planning
A wave plan organizes workloads into ordered groups (waves) based on dependencies, risk, and business impact. The first wave should include low-risk, low-dependency workloads that serve as a proving ground for your migration tooling and processes. Subsequent waves increase in complexity and business criticality as the team gains experience and confidence.
A typical wave structure for an enterprise migration:
- Wave 0 (Foundation): Landing zone, connectivity, shared services (Active Directory, DNS, monitoring).
- Wave 1 (Pilot): 2-3 non-critical applications selected to validate the migration process end-to-end. These should be representative of your application portfolio but carry low business risk if migration issues arise.
- Wave 2-N (Production): Remaining applications grouped by dependency clusters, migrated in order of increasing criticality. Each wave typically contains 5-15 applications depending on team capacity and application complexity.
- Final Wave: The most critical, complex, or sensitive applications that benefit from all the lessons learned in previous waves.
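A first draft of the production waves can be generated from the migration groups and then adjusted by hand. The sketch below orders groups by increasing criticality and packs them into waves up to a capacity limit, never splitting a group across waves. The `MigrationGroup` type, the numeric criticality scale, and the group names are hypothetical, and the foundation and pilot waves are assumed to be planned separately.

```python
from typing import NamedTuple


class MigrationGroup(NamedTuple):
    name: str
    criticality: int  # 1 = least business-critical
    apps: tuple       # applications that must move together


def plan_waves(groups, max_apps_per_wave=15):
    """Pack dependency groups into waves, least critical first."""
    ordered = sorted(groups, key=lambda g: (g.criticality, len(g.apps), g.name))
    waves, current, app_count = [], [], 0
    for group in ordered:
        # Start a new wave when adding this group would exceed team capacity.
        if current and app_count + len(group.apps) > max_apps_per_wave:
            waves.append(current)
            current, app_count = [], 0
        current.append(group.name)
        app_count += len(group.apps)
    if current:
        waves.append(current)
    return waves


if __name__ == "__main__":
    portfolio = [
        MigrationGroup("billing", 3, ("b1", "b2", "b3", "b4", "b5", "b6")),
        MigrationGroup("wiki", 1, ("wiki",)),
        MigrationGroup("intranet", 1, ("portal", "search")),
        MigrationGroup("crm", 2, ("crm-web", "crm-api", "crm-db", "crm-etl")),
    ]
    for number, wave in enumerate(plan_waves(portfolio, max_apps_per_wave=6), start=2):
        print(f"Wave {number}: {wave}")
```

A group larger than the limit still gets a wave of its own, which is a useful prompt to revisit either the grouping or the capacity assumption.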
Security and Compliance Planning
Security planning for cloud migration must address the shared responsibility model, where the cloud provider secures the infrastructure and you secure your workloads, data, and access. Map your existing security controls to cloud-native equivalents. Identify gaps where cloud introduces new security considerations (public API endpoints, IAM misconfigurations, storage bucket exposure) and define controls for each.
For regulated industries, engage your compliance team early. Verify that your target cloud regions and services meet regulatory requirements. Many compliance frameworks (HIPAA, PCI DSS, SOC 2) have cloud-specific guidance and pre-certified cloud service configurations. Building compliance into your landing zone and migration process from the start is dramatically cheaper than retrofitting it after migration.
Phase 4: Migration Execution
Execution is where the plan meets reality. The key to successful execution is a structured, repeatable process that is refined through pilot migrations before being applied to production workloads at scale. Our DevOps engineering team follows a consistent execution methodology across all client migrations.
Pilot Migration
The pilot migration validates your entire migration pipeline end-to-end with a small number of low-risk workloads. The goal is not just to move the workloads, but to test every aspect of the migration process: tooling, networking, DNS cutover, monitoring, rollback procedures, and the communication workflow between teams. Document every issue encountered during the pilot and update your runbooks before proceeding to production waves.
A successful pilot migration should confirm:
- Migration tooling works correctly for your workload types (VMs, databases, storage)
- Network connectivity between cloud and on-premises is functioning (VPN, DNS resolution, firewall rules)
- Applications start and operate correctly in the cloud environment
- Monitoring and alerting are capturing metrics from cloud workloads
- Rollback procedures work and can be executed within the defined window
- The team can execute the migration runbook without significant improvisation
Batch Migration Execution
After a successful pilot, proceed with production waves. For each wave, follow this execution sequence:
- Pre-migration validation: Verify cloud infrastructure is provisioned according to the architecture plan. Confirm network connectivity, IAM roles, and security groups are configured. Run application smoke tests in the target environment with test data.
- Data synchronization: Start continuous data replication from source databases to cloud databases using AWS DMS, Azure Database Migration Service, or equivalent tooling. Allow replication to reach steady state before proceeding to cutover.
- Application deployment: Deploy application code to cloud compute resources. For rehost migrations, this typically means replicating or converting source servers into cloud instances using AWS Application Migration Service (MGN), which replaced the discontinued AWS Server Migration Service, or Azure Migrate. For replatform or refactor migrations, deploy from your CI/CD pipeline.
- Pre-cutover testing: Run comprehensive integration tests against the cloud deployment while the on-premises environment still handles production traffic. Verify functionality, performance, and data integrity.
- Cutover: Switch production traffic from on-premises to cloud. This typically involves DNS changes, load balancer target updates, or application connection string changes. Execute during a low-traffic window to minimize risk. Monitor closely for the first 24 to 48 hours.
- Post-cutover validation: Verify all functionality in production. Monitor error rates, latency, and business metrics. Keep the on-premises environment running as a rollback target for 1 to 2 weeks.
For zero-downtime cutover techniques, see our detailed guide on zero-downtime deployment strategies.
Cutover Strategies
The cutover is the highest-risk moment in the migration. Three common cutover approaches exist:
- Big bang cutover: All traffic switches to the cloud at a single point in time. Simplest to coordinate but highest risk. Best for applications with natural low-traffic windows (weekends, holidays) and where the team has high confidence from testing.
- Phased cutover: Traffic is gradually shifted from on-premises to cloud over a period of hours or days. Weighted DNS routing or load balancer configuration enables percentage-based traffic splitting. This provides a safety valve where traffic can be shifted back if issues emerge.
- Parallel running: Both environments handle production traffic simultaneously for a period, with results compared for consistency. This is the safest approach for data-critical applications but requires the most complex synchronization logic.
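For a phased cutover, the decision logic between steps is worth writing down before the change window. The sketch below only decides the next traffic percentage from an observed error rate; applying that percentage through weighted DNS or a load balancer is provider-specific and not shown. The step sizes and the 1 percent error threshold are illustrative assumptions.

```python
def next_cloud_weight(current, error_rate, steps=(0, 5, 25, 50, 100), max_error_rate=0.01):
    """Return the next percentage of production traffic to route to the cloud.

    Advances one step while the observed error rate stays within the threshold,
    and shifts all traffic back to on-premises as soon as it does not.
    """
    if current not in steps:
        raise ValueError(f"current weight {current} is not one of {steps}")
    if error_rate > max_error_rate:
        return steps[0]
    position = steps.index(current)
    return steps[min(position + 1, len(steps) - 1)]


if __name__ == "__main__":
    weight = 0
    # Hypothetical error rates observed after each traffic shift.
    for observed_error_rate in (0.001, 0.002, 0.004, 0.003):
        weight = next_cloud_weight(weight, observed_error_rate)
        print(f"error rate {observed_error_rate:.1%} -> route {weight}% to cloud")
```

A single threshold on error rate is the simplest possible gate; latency and business metrics usually deserve the same treatment.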
Phase 5: Optimization and Governance
Migration is not complete when traffic is running in the cloud. The optimization phase is where you capture the cost savings, performance improvements, and operational efficiencies that justified the migration in the first place. Without deliberate optimization, cloud costs typically exceed on-premises costs by 20 to 40 percent due to over-provisioned resources and unused services. Our cloud cost optimization practice frequently identifies 30 to 50 percent savings in post-migration environments.
Right-Sizing Compute Resources
The instances provisioned during migration are often based on the specifications of the on-premises servers they replace, which were typically over-provisioned for peak capacity. After migration, analyze actual utilization data from cloud monitoring (CloudWatch, Azure Monitor, Google Cloud Monitoring) and right-size instances to match actual workload requirements. AWS Compute Optimizer, Azure Advisor, and Google Cloud Recommender provide automated right-sizing recommendations.
Right-sizing is not a one-time activity. Workload patterns change over time, and instances should be reviewed quarterly. Automating this review process ensures that over-provisioned resources are identified and corrected before they accumulate significant waste.
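The core of a right-sizing review can be sketched in a few lines. This example looks at CPU only and sizes an instance so that its 95th-percentile load lands near a target utilization. The 60 percent target, the list of available vCPU sizes, and the sample data are assumptions; a real review would also weigh memory, disk I/O, and network.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct is an integer from 1 to 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_vcpus(current_vcpus, cpu_util_samples, target_util=60.0,
                    sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Smallest available size that keeps p95 CPU utilization at or below target_util."""
    p95 = percentile(cpu_util_samples, 95)
    # vCPUs actually consumed at p95, scaled up to leave headroom.
    needed = current_vcpus * p95 / target_util
    for size in sizes:
        if size >= needed:
            return size
    return sizes[-1]


if __name__ == "__main__":
    # Hypothetical utilization percentages for a 16-vCPU instance.
    mostly_idle = [10] * 95 + [30] * 5
    print("Recommended vCPUs:", recommend_vcpus(16, mostly_idle))
```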
Reserved Instances and Savings Plans
Once workload patterns are stable (typically 1 to 3 months after migration), commit to reserved instances or savings plans for predictable workloads. AWS Reserved Instances and Savings Plans, Azure Reservations, and GCP Committed Use Discounts offer 30 to 60 percent discounts compared to on-demand pricing in exchange for 1 or 3 year commitments. Apply reservations to your steady-state baseline workloads and use on-demand or spot instances for variable and burst capacity.
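Whether a commitment pays off depends on how many hours the workload actually runs. The sketch below assumes the simplest model, in which a commitment is billed for every hour of the term whether or not the resource is used; the hourly rate and discount are placeholder numbers, not provider pricing.

```python
HOURS_PER_YEAR = 8760


def breakeven_utilization(discount):
    """Fraction of hours a resource must run before a commitment beats on-demand."""
    return 1.0 - discount


def annual_savings(on_demand_hourly, discount, utilization):
    """Yearly savings from committing, compared with paying on-demand only when running.

    A negative result means the commitment costs more than on-demand would.
    """
    on_demand_cost = on_demand_hourly * HOURS_PER_YEAR * utilization
    committed_cost = on_demand_hourly * (1.0 - discount) * HOURS_PER_YEAR
    return on_demand_cost - committed_cost


if __name__ == "__main__":
    # Placeholder figures: 1.00 per hour on-demand, 40% commitment discount.
    print(f"Break-even utilization: {breakeven_utilization(0.40):.0%}")
    for utilization in (1.0, 0.75, 0.5):
        print(f"{utilization:.0%} utilization: {annual_savings(1.00, 0.40, utilization):+,.0f} per year")
```

With a 40 percent discount the break-even point is 60 percent utilization, which is why commitments suit the steady-state baseline and not burst capacity.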
Cost Monitoring and Governance
Implement comprehensive cost monitoring from day one. Every resource should be tagged with cost allocation tags (environment, team, application, cost center) that enable granular cost attribution. Set up budget alerts at the account, team, and application level to catch unexpected cost increases before they become significant. Use tools like AWS Cost Explorer, Azure Cost Management, or Google Cloud Billing to analyze spending patterns and identify optimization opportunities.
Governance policies should prevent cost-related anti-patterns: restrict the ability to launch expensive instance types without approval, require tagging on all resources, and automatically stop or right-size idle resources. Defining infrastructure as code (Terraform, CloudFormation, Pulumi) helps apply these policies consistently across all environments.
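A tagging policy is only useful if violations are detected. The sketch below checks an exported resource inventory for missing or empty cost allocation tags. The tag keys mirror the four dimensions mentioned above, but the exact key names, the inventory shape, and the resource IDs are assumptions; in practice the inventory would come from your provider's resource or billing export.

```python
REQUIRED_TAGS = ("environment", "team", "application", "cost-center")


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty on one resource."""
    present = {key.lower() for key, value in resource_tags.items() if str(value).strip()}
    return [tag for tag in required if tag not in present]


def untagged_resources(inventory):
    """Map each non-compliant resource ID to the tags it is missing."""
    report = {}
    for resource_id, tags in inventory.items():
        gaps = missing_tags(tags)
        if gaps:
            report[resource_id] = gaps
    return report


if __name__ == "__main__":
    # Hypothetical inventory export.
    inventory = {
        "vm-001": {"Environment": "prod", "Team": "payments",
                   "Application": "billing", "Cost-Center": "CC-42"},
        "vm-002": {"Environment": "dev", "Team": ""},
    }
    for resource_id, gaps in untagged_resources(inventory).items():
        print(f"{resource_id} is missing: {', '.join(gaps)}")
```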
Cloud Provider Comparison for Migration
Each major cloud provider has distinct strengths and migration tooling. The following comparison highlights the key differentiators relevant to enterprise migration decisions.
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| Migration Assessment | Migration Hub, Application Discovery Service | Azure Migrate, Service Map | Migration Center, Fit Assessment |
| Server Migration | Application Migration Service (MGN) | Azure Migrate (agentless and agent-based) | Migrate to Virtual Machines (formerly Migrate for Compute Engine) |
| Database Migration | DMS, SCT for schema conversion | Database Migration Service, Data Migration Assistant | Database Migration Service, Datastream |
| Data Transfer | Snowball, DataSync, Transfer Family | Data Box, AzCopy, Storage Migration Service | Transfer Appliance, Storage Transfer Service |
| Landing Zone | Control Tower, Organizations | Landing Zones, Management Groups | Foundation Toolkit, Organization |
| Key Strength | Broadest service portfolio, largest partner ecosystem | Microsoft stack integration, hybrid capabilities | Data analytics, ML/AI, Kubernetes-native |
| Best For | General-purpose, multi-service architectures | .NET, SQL Server, Active Directory workloads | Data-intensive, analytics, containerized workloads |
Common Migration Pitfalls and How to Avoid Them
Even with thorough planning, cloud migrations encounter predictable challenges. Awareness of these pitfalls allows you to build preventive measures into your migration plan rather than reacting to them during execution.
- Skipping the pilot wave. Teams under timeline pressure sometimes skip pilot migrations and go straight to production workloads. This is a false economy. The pilot is where you discover tooling gaps, networking issues, and process breakdowns in a low-risk context. Skipping it means discovering these issues with critical workloads, where the cost of failure is dramatically higher.
- Underestimating data transfer time. A 10TB database over a 1 Gbps connection takes roughly 22 hours to transfer under ideal conditions. Real-world conditions include competing traffic, protocol overhead, and transfer tool limitations that can double or triple that estimate. For large data volumes, use physical transfer devices (AWS Snowball, Azure Data Box) or start continuous replication weeks before the planned cutover.
- Neglecting DNS TTL management. If your DNS records have a 24-hour TTL and you change them during cutover, some clients will continue hitting the old IP for up to 24 hours. Reduce TTL to 60 seconds at least 48 hours before cutover, perform the DNS change, and increase TTL after the migration is confirmed stable.
- Forgetting about egress costs. Cloud providers charge for data leaving their network (egress). Applications with significant data transfer between cloud and on-premises, or between cloud regions, can generate unexpected egress charges. Model egress costs during the planning phase and optimize data flows to minimize cross-boundary transfers.
- Treating migration as a purely technical project. Cloud migration affects operations teams, security teams, finance teams, and application owners. Without stakeholder alignment and clear communication, technical success can be undermined by organizational resistance. Establish a migration governance board that includes stakeholders from all affected functions.
- Not planning for rollback. Every migration wave should have a documented, tested rollback plan. If the cloud deployment fails validation, you need to be able to redirect traffic back to the on-premises environment within a defined window. Keep on-premises environments operational for one to two weeks after a successful cutover as a safety net.
- Ignoring license compliance. Software licensing models vary dramatically in cloud environments. Microsoft SQL Server, Oracle Database, and many enterprise applications have cloud-specific licensing terms that differ from on-premises agreements. Bring Your Own License (BYOL) arrangements may require specific instance types or tenancy configurations. Engage your licensing team or a licensing consultant during the planning phase to avoid compliance violations.
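The DNS TTL pitfall above comes down to date arithmetic that is easy to get wrong under pressure. As a minimal sketch: resolvers may keep serving the old record for up to its current TTL, so the TTL has to be lowered at least one full old-TTL period before cutover. The safety factor of 2 reproduces the 48-hour lead time for a 24-hour TTL; the function name, the factor, and the example date are assumptions.

```python
from datetime import datetime, timedelta


def latest_ttl_reduction(cutover, current_ttl_seconds, safety_factor=2.0):
    """Latest moment to lower the TTL so caches holding the old TTL expire before cutover."""
    if current_ttl_seconds <= 0 or safety_factor < 1:
        raise ValueError("TTL must be positive and safety_factor at least 1")
    return cutover - timedelta(seconds=current_ttl_seconds * safety_factor)


if __name__ == "__main__":
    # Hypothetical cutover window: 02:00 on a Saturday, records currently at a 24-hour TTL.
    cutover = datetime(2025, 6, 14, 2, 0)
    deadline = latest_ttl_reduction(cutover, current_ttl_seconds=86_400)
    print("Lower TTL to 60 seconds no later than:", deadline.isoformat(sep=" "))
```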
Migration Readiness Checklist
Use this checklist to verify readiness before beginning migration execution. Every item should be completed and verified before proceeding to the first production wave.
- Infrastructure discovery and inventory completed with automated tooling
- Application dependency map reviewed and validated by application owners
- Migration strategy (6 Rs) assigned to every workload in the portfolio
- Cloud landing zone deployed with networking, IAM, security, and governance
- VPN or dedicated connectivity established between on-premises and cloud
- DNS architecture designed with low-TTL cutover strategy
- Migration wave plan created with workloads grouped by dependency and risk
- Rollback plan documented and tested for each wave
- Monitoring and alerting configured in the cloud environment
- Cost allocation tags defined and enforced via policy
- Security controls implemented and validated against compliance requirements
- Migration runbooks written and reviewed by the execution team
- Pilot migration completed successfully with lessons incorporated
- Stakeholder communication plan established with escalation paths
- License compliance verified for all software being migrated
- Data transfer estimates calculated and validated against migration windows
Getting Started with Your Migration
Cloud migration is a complex but well-understood challenge. The organizations that succeed are those that invest in thorough assessment, structured planning, and disciplined execution. The checklist and methodology in this guide provide the framework; the execution requires a team with deep cloud architecture expertise, migration tooling experience, and the operational discipline to follow the process even when timelines create pressure to cut corners.
If your organization is planning a cloud migration, whether it is your first workload or your five hundredth, the principles are the same. Discover everything. Plan deliberately. Execute in waves. Optimize continuously. The cloud is not a destination; it is an operating model. Getting there safely and efficiently requires treating the migration with the same rigor you apply to building your products.
Ready to start your cloud migration? Our experienced DevOps engineers can conduct a migration readiness assessment and build a detailed migration plan tailored to your infrastructure, compliance requirements, and business objectives. Contact our team to schedule an assessment.