Why We Left the Cloud
How a 3-person team migrated to dedicated servers.

For years, Blue ran on cloud infrastructure. Our frontend was deployed on Render. Our database lived on AWS Aurora in Singapore. It worked — until it didn’t.
This is the story of how a three-person engineering team, with zero bare metal experience, migrated our entire production infrastructure to dedicated servers. The result: dramatically lower costs, faster performance across the board, and zero downtime since the migration.
The breaking point
Render’s autoscaling had a problem. During traffic spikes — exactly the moments when reliability matters most — the scaling transitions would fail. The app would go down for a few minutes. Not once. Not occasionally. Several times a day.
We tolerated it for months. We’d get support tickets, apologize to customers, and move on. But when your platform serves thousands of companies running their daily operations, “it’ll scale back up in a few minutes” isn’t an answer.
Meanwhile, AWS Aurora was doing its job. The database was stable. But we were paying thousands a month for a managed MySQL instance that, as we’d later discover, was significantly slower than a dedicated server costing a fraction of the price.
The combination of Render’s instability and the cloud bill every month made the decision inevitable. We had to take control of our own infrastructure.
Why now
Here’s the honest part: we couldn’t have done this a year ago.
Nobody on our team had ever managed bare metal servers. We’re three engineers building a product — not a devops team. The idea of provisioning servers, setting up replication, managing failover, writing deployment pipelines from scratch — it was genuinely beyond our capabilities.
Two things changed that made this possible.
First, the tooling matured. Disco gave us a deployment platform that runs entirely from a CLI and a disco.json file in our repo. No UI required. Just push to GitHub, and it deploys. Ansible gave us infrastructure as code — every server configuration is version controlled, reproducible, and auditable.
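We won't paste our production config here, but for a sense of scale, a disco.json is a small file along these lines. Treat the field names as illustrative; the authoritative schema is in Disco's docs.

```json
{
  "version": "1.0",
  "services": {
    "web": {
      "port": 8000
    }
  }
}
```

That file, plus a Dockerfile, is the whole deployment contract: push to the repo, and Disco builds and ships it.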
Second, AI changed what a small team can do. We used Claude Code extensively throughout this migration — writing Ansible playbooks, planning the migration phases, debugging MySQL replication configs, and working through problems that would have previously required hiring a specialist. This isn’t a small thing. A migration like this was literally not in our skillset twelve months prior. AI didn’t just accelerate the work — it made it possible.
People warned us not to do it. “You’re not an infra company.” “Stick to what you know.” “Cloud is fine, just pay the bill.” We heard it all. But the math didn’t lie, and we were tired of being at the mercy of someone else’s autoscaling bugs.
Choosing the stack
Hetzner was the obvious choice for servers. Its reputation in the bootstrapper community is well-earned, and nothing we compared came close on price-to-performance. Our application server — 48 cores, 128GB RAM, dual NVMe SSDs — costs less per month than what we were paying Render alone. We chose their FSN1 datacenter in Germany. With the US and Europe accounting for half our user base, Germany is geographically central, and the strong privacy laws are a bonus. Our origin had been in Singapore purely because that's where we started the company — no technical reason.
Disco replaced Render as our deployment layer. We'd been using Coolify for our dev environment on AWS EC2, but it felt like it was trying to do everything. Disco is the opposite — opinionated, CLI-first, and built on Docker Swarm. A disco.json file and a Dockerfile are all you need. Everything is controlled through code, which is exactly what we wanted.
We had a call with Greg, Disco’s founder, who was incredibly helpful. We even submitted a PR for custom Docker parameters in the disco.json config, and their team improved on it and shipped it within days. When you find an open-source tool backed by a responsive, thoughtful team, you know you’ve made the right call.
Ansible handles all server configuration. Every firewall rule, every MySQL setting, every monitoring agent — it’s all in playbooks, checked into Git. If a server disappears tomorrow, we can recreate it from scratch. This was a major philosophical shift for us: from clicking around cloud dashboards to treating infrastructure as code.
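To give a flavor of what that looks like in practice, here's a playbook sketch in that spirit. The hosts, paths, and VLAN range are illustrative, not our actual config:

```yaml
# Illustrative sketch, not our actual playbook: restrict MySQL to the
# private network, template out a version-controlled MySQL config, and
# restart the service only when that file changes.
- hosts: db_servers
  become: true
  tasks:
    - name: Allow MySQL only from the app server's private network
      community.general.ufw:
        rule: allow
        port: "3306"
        proto: tcp
        from_ip: 10.0.0.0/24   # illustrative private VLAN range

    - name: Deploy mysqld config from a template checked into Git
      ansible.builtin.template:
        src: templates/mysqld.cnf.j2
        dest: /etc/mysql/mysql.conf.d/mysqld.cnf
      notify: restart mysql

  handlers:
    - name: restart mysql
      ansible.builtin.service:
        name: mysql
        state: restarted
```

Because the template lives in Git, a MySQL tuning change is a pull request, not a mystery edit on a live box.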
Cloudflare sits in front of everything for CDN, DDoS protection, and edge caching. With a global user base, Cloudflare means the origin server location matters far less than people assume.
The phased migration
We didn’t migrate everything at once. That would have been reckless for a team doing this for the first time.
Phase 1: Frontend (January 2026). We moved our Vue 3 frontend to Hetzner first. This was the lowest-risk move — Cloudflare sits in front of it anyway, so we could test the entire deployment pipeline, get comfortable with Disco, and build confidence on production traffic without risking the database or API.
We ran the frontend on the new infrastructure for about two months while we prepared everything else.
Phase 2: Backend services (March 2026). Next came the API, the import/export engine, the collaboration server, and all background services. By this point, we’d been running the application server for nearly 100 days without a single second of downtime. The confidence was there.
We spun up multiple instances of our Node.js API using cluster mode, and even then we were using only a fraction of the server’s capacity. The difference from Render was night and day — no autoscaling failures, no cold starts, just consistent performance.
Phase 3: The database (March 2026). This was the scary one. Hundreds of gigabytes of production MySQL data, serving thousands of companies. This required two separate migrations.
First, we migrated Aurora from Singapore to Germany, keeping it on Aurora but moving it closer to our new application server. This immediately improved latency for API-to-database calls.
Then, a few days later on a Saturday morning Asia time — chosen deliberately because it’s Friday night in the US and the dead of night in Europe, statistically our quietest window — we did the real migration: Aurora to self-managed MySQL on Hetzner.
We put up a maintenance notice 72 hours in advance, a persistent banner at the top of the app that couldn’t be dismissed. When the window arrived, we put Blue into read-only mode for about 12 hours while the data migrated. The app stayed up the entire time — users could view everything, they just couldn’t write.
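For the curious, MySQL has a built-in switch for exactly this kind of window. This is a sketch of the standard mechanism, not necessarily the exact levers we pulled (our banner and app-level read-only mode were separate from it):

```sql
-- Freeze writes at the database layer for the migration window.
-- super_read_only = ON also blocks accounts with elevated privileges,
-- and implicitly turns read_only ON as well.
SET GLOBAL super_read_only = ON;

-- ... copy the data to the new primary ...

-- Re-enable writes. Turning read_only OFF also clears super_read_only.
SET GLOBAL read_only = OFF;
```

Reads keep working throughout, which is what let the app stay up in view-only mode.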
We had a full runbook prepared, tested on staging, with rollback steps at every phase. The idea of it was scarier than the execution. There was one hiccup — created_at timestamps didn’t port across cleanly due to a tooling issue, requiring a follow-up migration to fix those columns across 100+ tables. A minor headache, but nothing that affected users.
The results
The numbers speak for themselves.
Cost: Our monthly infrastructure bill dropped from thousands of dollars to a fraction of that. We’re getting significantly more compute power for significantly less money.
Database performance: Reads are roughly 30% faster out of the box on dedicated hardware versus Aurora. No tuning, no tricks — just faster disks and no shared-tenancy overhead.
Deploy times: API deployments went from about two minutes on Render to around 15 seconds. Frontend builds are 50% faster — under a minute now. When you deploy multiple times a day, this adds up fast.
Stability: Zero downtime since the migration. No more autoscaling crashes. No more apologizing to customers for “brief interruptions.” The server just runs.
Control: Every piece of our infrastructure is defined in code. Ansible playbooks, Disco configs, MySQL replication — it’s all in Git. We can see exactly what’s running, why, and reproduce it from scratch if needed.
What AI made possible
This deserves its own section because it fundamentally changes the calculus for small teams considering a cloud exit.
A year ago, we would not have attempted this migration. We didn’t have the expertise in server administration, MySQL replication, ProxySQL configuration, repmgr failover management, Ansible automation, or any of the dozens of specialized skills a migration like this demands.
Claude Code changed that equation entirely. It helped us write Ansible playbooks for MySQL replication and failover. It helped us plan the phased migration strategy. It debugged ProxySQL configs and network VLAN setups. It wrote the runbook for the database cutover.
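For context on what "MySQL replication configs" means concretely, the core of pointing a replica at a primary in MySQL 8 looks roughly like this. The host, user, and password are placeholders:

```sql
-- On the replica: tell it where the primary is and start replicating.
-- Connection details below are placeholders, not real credentials.
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '10.0.0.2',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'replace-me',
  SOURCE_AUTO_POSITION = 1;   -- GTID-based positioning

START REPLICA;

-- Verify: both replication threads should report 'Yes',
-- and Seconds_Behind_Source should trend toward zero.
SHOW REPLICA STATUS\G
```

The statements are short; the hard part is everything around them — users, GTIDs, firewall rules, failover behavior — which is exactly where an AI pair earning its keep on the debugging loop mattered.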
This isn’t about AI replacing expertise — it’s about making expertise accessible. We still made every decision. We still tested everything. We still ran the migration ourselves on a Saturday morning, watching the progress, ready to roll back. But the gap between “we can’t do this” and “we can do this” was bridged by having an AI pair that knows infrastructure.
If you’re a small team that’s been told you need to hire a devops engineer before you can leave the cloud — that’s no longer true. The tools exist now. Plan carefully, migrate in phases, and use AI to fill the knowledge gaps.
Would we do it again
Without hesitation.
The cloud made sense when we were starting out. Render and AWS let us ship fast without thinking about servers. But as we grew, the costs scaled faster than the value. We were paying a premium for convenience, and that convenience came with instability we couldn’t control.
Dedicated servers aren’t right for everyone. If you’re pre-product-market-fit and iterating fast, the cloud’s flexibility is worth the premium. But once your architecture stabilizes and your traffic patterns are predictable, the math changes dramatically.
For us, the cloud exit wasn’t just a cost optimization. It was a shift in philosophy — from renting someone else’s computers and hoping their autoscaling works, to owning our infrastructure, defined in code, running on hardware we control.
Three engineers. Zero devops experience. One migration. And the best infrastructure we’ve ever had.
— Manny