We built HumanifyCV on one EC2 box. Here's what broke and what held.
A single m6i.large running Next.js, Postgres, k8s, and a GitHub runner. Everything coexists until it doesn't.
By HumanifyCV team
HumanifyCV runs on one AWS EC2 m6i.large instance. That's 2 vCPUs and 3.7 GB of RAM. On that box: a kubeadm-installed k8s cluster, Postgres as a StatefulSet, the Next.js app, the GitHub Actions self-hosted runner, and ingress-nginx on hostNetwork. It's a ridiculous setup and we'd recommend it to anyone shipping a pre-launch product.
Why one box
At our scale, a multi-node setup is all cost and no benefit. RDS, ALBs, and separate runner hosts all exist to solve problems we don't have yet. One box means one bill, one place to SSH, one thing to restart. The whole cluster fits in about 2 GB of RAM at idle.
What broke
Three things have actually gone wrong, and all three came down to resource pressure on a small box: disk in the first case, and memory, directly or indirectly, in the other two.
- Docker build cache accumulated on every deploy until it passed 10 GB. The kubelet set the node's DiskPressure condition to true and started evicting pods (ingress first, because it has the lowest PriorityClass), so the site served 502s even though the app pods were fine.
- The kubelet refused to start because we had added swap without setting failSwapOn: false in its configuration. Classic k8s gotcha; it took 20 minutes to find.
- sshd timed out at banner exchange because the box was under memory pressure and forking a session stalled. The fix was to add swap, which is what set up the kubelet failure in the previous point.
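The first two failures are cheap to check for ahead of time. Below is a minimal Python sketch of a preflight that could run at the end of each deploy. It is an illustration, not what currently runs on the box. The 80% threshold and the function names are assumptions, and the config path is where kubeadm writes the kubelet configuration by default. It prunes the Docker build cache once the root filesystem fills up, and warns when swap is active while failSwapOn is not set to false.

```python
import shutil
import subprocess

DISK_THRESHOLD = 0.80  # assumed value: prune once the filesystem is 80% full
KUBELET_CONFIG = "/var/lib/kubelet/config.yaml"  # kubeadm's default location


def disk_used_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def swap_is_active(proc_swaps_text):
    """True if /proc/swaps lists at least one entry below its header line."""
    lines = [ln for ln in proc_swaps_text.splitlines() if ln.strip()]
    return len(lines) > 1


def kubelet_tolerates_swap(config_text):
    """True if the kubelet config text sets failSwapOn to false."""
    for line in config_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "failSwapOn":
            # Drop any trailing "# comment" before comparing.
            return value.split("#")[0].strip().lower() == "false"
    return False  # field absent: the kubelet defaults to failSwapOn: true


def main():
    if disk_used_fraction("/") > DISK_THRESHOLD:
        # Removes build cache only; images and containers are left alone.
        subprocess.run(["docker", "builder", "prune", "--force"], check=True)

    with open("/proc/swaps") as f:
        swap_on = swap_is_active(f.read())
    with open(KUBELET_CONFIG) as f:
        swap_ok = kubelet_tolerates_swap(f.read())
    if swap_on and not swap_ok:
        print("swap is on but failSwapOn is not false; kubelet will refuse to start")
```

Calling main() as the last step of a deploy job would catch both conditions before the next reboot does. The parsing is deliberately a plain line scan so the sketch needs nothing outside the standard library; a real check would likely use a YAML parser.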
What held
Every other assumption. Postgres on local disk handles the load fine; we're nowhere near the I/O ceiling. The GitHub runner shares CPU with the kubelet without fighting for it. Ingress-nginx on hostNetwork binds straight to the host's ports, which is faster than going through a NodePort. None of those has cost us anything.
“If your infra is load-bearing before you have users, you built the wrong infra.”
We'll migrate to two nodes when we need to, not before. Meanwhile the box costs $60/month and does everything.