We just moved a client’s production workload off the public cloud and rebuilt their infrastructure from the ground up. The result is the kind of work that doesn’t fit in a status update — so here is the full story, the trade-offs we accepted, and what every growing company should ask before they sign their next AWS bill.
The problem was simple — and familiar
Like many growing companies, our client was hosting their applications on a major cloud provider. Every month the same two questions came up at the leadership table.
Why are we paying this much? Cloud bills had quietly tripled over eighteen months. Most of the growth wasn’t from new features or new customers — it was from “small” line items that nobody was watching: NAT gateway traffic, cross-AZ data transfer, idle managed-service buffers, and replicated storage that nobody had pruned in a year. The team had stopped reading the bill in detail because reading it didn’t change anything.
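To make that concrete, here is a back-of-envelope model of those "small" line items. Every unit price below is an illustrative assumption for the sketch, not a quote of any provider's current list price; the point is that none of these costs maps to a feature or a customer.

```python
# Back-of-envelope model of the "quiet" line items on a cloud bill.
# All unit prices are illustrative assumptions, not real list prices.
NAT_GATEWAY_PER_GB = 0.045        # assumed $/GB processed through a NAT gateway
CROSS_AZ_PER_GB = 0.02            # assumed $/GB of cross-AZ data transfer
REPLICATED_STORAGE_PER_GB = 0.10  # assumed $/GB-month of unpruned replicated storage

def monthly_hidden_cost(nat_gb: float, cross_az_gb: float, idle_storage_gb: float) -> float:
    """Sum the line items that rarely map to a feature or a customer."""
    return (nat_gb * NAT_GATEWAY_PER_GB
            + cross_az_gb * CROSS_AZ_PER_GB
            + idle_storage_gb * REPLICATED_STORAGE_PER_GB)

# Example: 20 TB through NAT, 30 TB cross-AZ, 5 TB of forgotten replicas.
cost = monthly_hidden_cost(20_000, 30_000, 5_000)
print(f"${cost:,.0f}/month")  # → $2,000/month
```

At these assumed rates, traffic and storage that nobody provisioned deliberately add up to thousands of dollars a month, which is exactly how a bill triples without anyone shipping more features.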
Why does every customer require a different management process? Each tenant had been onboarded as a one-off — a custom VPC, a custom database, a custom set of IAM roles. By customer number twelve, the operational surface was unmanageable. A configuration change for one customer meant a four-hour ticket for the platform team. There was no leverage in growth.
The first problem was draining the budget. The second was draining the team’s time. We saw both, and our recommendation was direct: it’s time to leave.
The honest framing. Public cloud isn’t bad. It’s a poor fit for a specific shape of workload — predictable traffic, multi-tenant by design, cost-sensitive, where the elasticity premium isn’t paying for itself anymore. That described our client exactly.
What we built
We built a multi-tenant Platform-as-a-Service infrastructure on private VDS (Virtual Dedicated Server) instances, fully under our control. The shape of the system:
- A single control plane that provisions tenants, runs deployments, and handles upgrades.
- Per-tenant isolation at the namespace level — each customer gets their own Kubernetes namespace, their own database schema, their own observability scope — but they share the underlying nodes for cost efficiency.
- Identity and policy managed centrally with Keycloak, so the same access model applies whether a tenant has one user or fifty.
- Self-service onboarding through an internal portal — picking the right combination of services for a new customer is now a ten-minute form, not a week of platform-team coordination.
- Closed management surface — the orchestration layer is reachable only from a VPN-gated jump host. There is no public internet path into the things that control everything else.
This is closer to how Render, Fly.io, or Heroku built their platforms — except sized for a single company’s needs and operated by the people who use it daily.
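The core idea behind the onboarding speed is that every per-tenant artifact is derived from a single identifier instead of being hand-assigned. The sketch below shows one way a control plane could render those isolation primitives; the naming convention (`tenant-<slug>` namespaces, `t_<slug>` schemas) and the `TenantPlan` type are hypothetical, not the client's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantPlan:
    """Everything the control plane derives from one onboarding form entry."""
    namespace: str   # Kubernetes namespace for workload isolation
    db_schema: str   # per-tenant schema in the shared database
    log_labels: dict  # labels that scope observability to this tenant

def plan_tenant(slug: str) -> TenantPlan:
    # Hypothetical convention: one namespace, one schema, one label set per
    # tenant, all derived from a single slug so nothing is hand-assigned.
    if not slug.isidentifier():
        raise ValueError(f"tenant slug must be a valid identifier: {slug!r}")
    return TenantPlan(
        namespace=f"tenant-{slug}",
        db_schema=f"t_{slug}",
        log_labels={"tenant": slug, "managed-by": "control-plane"},
    )

plan = plan_tenant("acme")
print(plan.namespace, plan.db_schema)  # → tenant-acme t_acme
```

Because the plan is a pure function of the slug, onboarding customer thirteen is structurally identical to onboarding customer two — which is precisely the leverage the one-off VPC-per-customer setup never had.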
The outcomes that mattered
After three months in production:
- Monthly infrastructure costs dropped significantly. We don’t quote the exact percentage publicly, but the spend trajectory crossed below the old cloud baseline in the second month and kept going.
- New customer onboarding went from days to ten minutes. What used to be a multi-team handoff is now a form on the internal portal.
- All environments became observable from a single point. One Grafana, one Loki, one Tempo. The whole platform is legible from one screen.
- The management layer was completely closed to public access. No more public-internet-facing dashboards. No more “we’ll set up SSO later.” The reachable surface is dramatically smaller.
- Vendor lock-in was eliminated. The same Helm charts, the same infrastructure-as-code definitions, will run on any provider with a VDS API. If we want to multi-home tomorrow, we can.
These outcomes don’t show up on a marketing page. They show up on a finance report and in a platform team’s morale.
The real message of this project
“Digital transformation” is too often discussed as adding new tools. A new dashboard, a new AI integration, a new observability product. Tools matter, but they’re the visible 10%. What creates lasting impact is whether the foundation those tools sit on is built right.
A foundation that will still scale three years from now. A foundation that will still stay secure when a key team member leaves. A foundation that’s sustainable on whatever budget you have in 2028 — not just whatever you have today.
This is the layer most companies under-invest in until it breaks. By the time it breaks, the cost of fixing it is much higher than the cost of building it right the first time.
Where our work begins
This is where IWWOMI’s work starts. We don’t just build applications or AI solutions — we design and deploy the entire infrastructure that keeps them running. From the data layer to deployment pipelines, from security to observability. The same discipline that makes our AI transformation work production-grade is what makes our infrastructure work survive contact with growth.
Some adjacent reading from our team:
- Cloud Migration Strategy: A Complete Guide — the framework we use to decide what moves and what stays.
- DevOps Best Practices for Modern Development Teams — the operational practices behind self-hosted at scale.
- Microservices Architecture: When and How to Use It — the architectural shape that makes multi-tenant possible.
- Database Optimization Techniques for High-Traffic Apps — what we tune once the database becomes the bottleneck.
When to consider this
You probably should not exit the public cloud if:
- Your traffic is genuinely spiky, and you rely on autoscaling you couldn’t justify staffing an in-house team to replicate.
- You’re a small team without operational depth, and one of the founders is on call.
- You depend on managed services (RDS, Aurora, DynamoDB) in ways that would take a year to replicate.
You probably should consider it if:
- Your traffic profile is predictable and your cloud bill is growing faster than your customer base.
- You serve multiple tenants that are structurally similar.
- Compliance (KVKK, GDPR, sector-specific) keeps adding requirements you find hard to satisfy on shared infrastructure.
- A single API price change from a single provider could meaningfully hurt your margins.
The third option — and often the right one — isn’t to fully exit. It’s to build a self-hosted core for the steady-state workload and keep a small cloud footprint for the burst. The economics of that hybrid shape usually beat either pure cloud or pure on-prem.
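The hybrid argument reduces to simple arithmetic. The model below compares the three shapes under the assumptions that self-hosted capacity must be provisioned for what it serves continuously while cloud capacity bills per use at a premium; every number fed into it is illustrative, not client data.

```python
def compare_shapes(avg_load: float, peak_load: float, avg_burst: float,
                   vds_rate: float, cloud_rate: float) -> dict:
    """Monthly cost of three infrastructure shapes for one workload.

    Loads are in abstract capacity units; rates are cost per unit-month.
    All inputs below are illustrative assumptions, not real pricing.
    """
    return {
        # Cloud bills on what you actually use, at a premium rate.
        "pure_cloud": avg_load * cloud_rate,
        # Self-hosted must be provisioned for the peak, at a lower rate.
        "pure_self_hosted": peak_load * vds_rate,
        # Hybrid: steady-state sized on VDS, only the burst bills at cloud rates.
        "hybrid": avg_load * vds_rate + avg_burst * cloud_rate,
    }

costs = compare_shapes(avg_load=100, peak_load=180, avg_burst=10,
                       vds_rate=40, cloud_rate=100)
print(min(costs, key=costs.get))  # → hybrid
```

With these illustrative inputs the hybrid shape wins because the premium is paid only on the 10 burst units, not on the 100 units that run around the clock. The ranking flips if the burst dominates the profile — which is the whole point of the "when to consider this" checklist above.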
The technical deep-dive
For the full technical write-up — the architecture diagrams, the trade-offs we accepted, what we’d do differently, and the specific tooling choices (Kubernetes, Cilium, Longhorn, Tempo, Loki, Argo CD) — read our team lead Abdullah Taş’s piece on Medium:
From Zero to Production: The Story of Building a Self-Hosted PaaS Architecture →
Ready to talk?
If your infrastructure is struggling to keep up with growing workloads, or if your cloud bill has stopped being sustainable, this is exactly the kind of work we do. The first conversation is a 30-minute call where we look at your current setup, your trajectory, and what a different shape would mean for you. No commitment, no slide deck, just an honest read.
Get in touch — we’d love to hear what you’re building.