Open-source dogma is quietly wrecking your cloud bill.
You installed HPA and VPA because the CNCF told you to.
You tuned a few values.
You hoped for the best.
What you got instead was:
- Replica counts bouncing like a yo-yo
- Missed traffic spikes
- Autoscalers fighting each other
- And a bill that keeps climbing
Your scaling isn’t intelligent.
It’s reactive chaos.
So we did something heretical.
We deleted all of it.
No HPA. No VPA. No custom metrics adapters.
We replaced 5,000+ lines of Go-based autoscaling machinery with a single 100-line Rust binary running as a DaemonSet.
The result?
- 22% cost reduction in the first week
- p99 latency variance flattened into a straight line
- Zero scaling incidents during peak traffic
This is the truth Kubernetes won’t tell you in 2026.
Why HPA and VPA are fundamentally broken (and always were)
1. They are backward-looking
HPA reacts to the last 30-60 seconds of CPU or memory.
By the time it scales:
- Requests are already queued
- Latency has already spiked
- Users have already felt pain
Reactive scaling is damage control, not prevention.
2. They don’t coordinate
HPA scales out.
Minutes later, VPA decides:
“Actually, these pods need more memory.”
Now you have:
- More pods
- Bigger pods
- On the same nodes
Congratulations. You just over-provisioned everything.
3. They are cost-blind
HPA doesn’t know:
- Spot vs on-demand pricing
- Imminent spot interruptions
- Whether the bottleneck is CPU, I/O, cache, or database
It will happily scale you onto the most expensive option available.
The replacement: a single Rust scaler binary
We didn’t build “another autoscaler.”
We built a feedback controller, in the control-theory sense.
It does exactly one thing:
- Read signals
- Make a decision
- Patch the Kubernetes API
All intelligence comes from inputs, not framework complexity.
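The whole control loop fits in a sketch like this (the helper bodies and the 15-second tick are placeholders for the real wiring to Prometheus, AWS pricing, and the Kubernetes API):

// Minimal sketch of the loop; helpers are stubs, not the real binary.
use std::{thread, time::Duration};

struct Signals;                // pricing, business metrics, downstream health, forecast
enum Decision { Hold }         // plus ScaleOut, MoveToCheaperNode, ScaleDown in the real thing

fn read_signals() -> Signals { Signals }               // 1. read signals
fn decide(_s: &Signals) -> Decision { Decision::Hold } // 2. make a decision
fn patch_cluster(_d: Decision) {}                      // 3. patch the Kubernetes API

fn main() {
    loop {
        let signals = read_signals();
        let decision = decide(&signals);
        patch_cluster(decision);
        thread::sleep(Duration::from_secs(15)); // one decision per tick
    }
}
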
The four signals HPA never sees
1. Real-time AWS pricing
- Actual node cost
- Spot vs on-demand
- Predicted price changes
Scaling without price awareness is financial negligence.
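As a sketch of what price awareness can look like in code (the instance types, prices, and the 20% risk cutoff below are invented for illustration):

// Illustrative only: prices and the risk threshold are made up.
struct NodePricing {
    instance_type: &'static str,
    on_demand_usd_hr: f64,
    spot_usd_hr: f64,
    interruption_risk: f64, // predicted chance of a spot reclaim this hour, 0.0..1.0
}

impl NodePricing {
    // Count on spot pricing only while an interruption looks unlikely.
    fn effective_cost(&self) -> f64 {
        if self.interruption_risk < 0.2 { self.spot_usd_hr } else { self.on_demand_usd_hr }
    }
}

fn main() {
    let candidates = [
        NodePricing { instance_type: "m6i.large", on_demand_usd_hr: 0.096, spot_usd_hr: 0.038, interruption_risk: 0.05 },
        NodePricing { instance_type: "c6i.large", on_demand_usd_hr: 0.085, spot_usd_hr: 0.044, interruption_risk: 0.40 },
    ];
    let cheapest = candidates
        .iter()
        .min_by(|a, b| a.effective_cost().total_cmp(&b.effective_cost()))
        .unwrap();
    println!("place new capacity on {}", cheapest.instance_type);
}
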
2. Business metrics
Not CPU.
Things that matter:
- Orders per second
- Checkout latency
- Cache hit ratio
We scale on business pressure, not resource proxies.
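A minimal sketch of reading one such metric straight from Prometheus over its HTTP API (the PromQL expression, URL, and port are examples, and reqwest's blocking client is an assumed dependency):

// Cargo deps (assumed): reqwest with the "blocking" feature.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let promql = "sum(rate(orders_total[1m]))"; // orders per second, assuming an orders_total counter
    let body = reqwest::blocking::Client::new()
        .get("http://prometheus:9090/api/v1/query")
        .query(&[("query", promql)])
        .send()?
        .text()?;
    // The real binary parses this JSON and feeds the number into the decision loop.
    println!("{body}");
    Ok(())
}
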
3. Downstream health
If database p95 latency is rising, scaling the app layer makes things worse.
HPA does exactly that.
Our scaler backs off to protect the system.
4. Traffic prediction (5-minute horizon)
A tiny in-memory time-series model (Rust salmon crate) predicts traffic before it arrives.
Scaling happens ahead of demand, not after failure.
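As a stand-in for that model (not the actual crate), even a few lines of Holt-style exponential smoothing give a usable 5-minute-ahead number:

// Stand-in forecaster: Holt's double exponential smoothing over one-minute
// request-rate samples, projected 5 steps ahead. Samples are illustrative.
struct Holt {
    alpha: f64, // level smoothing factor
    beta: f64,  // trend smoothing factor
    level: f64,
    trend: f64,
}

impl Holt {
    fn new(first_sample: f64) -> Self {
        Holt { alpha: 0.5, beta: 0.3, level: first_sample, trend: 0.0 }
    }
    fn observe(&mut self, value: f64) {
        let prev_level = self.level;
        self.level = self.alpha * value + (1.0 - self.alpha) * (prev_level + self.trend);
        self.trend = self.beta * (self.level - prev_level) + (1.0 - self.beta) * self.trend;
    }
    fn forecast(&self, steps_ahead: f64) -> f64 {
        self.level + steps_ahead * self.trend
    }
}

fn main() {
    let req_per_s = [950.0, 980.0, 1020.0, 1075.0, 1140.0];
    let mut model = Holt::new(req_per_s[0]);
    for v in &req_per_s[1..] {
        model.observe(*v);
    }
    println!("expected req/s in 5 minutes: {:.0}", model.forecast(5.0));
}
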
The algorithm (brutally simple)
let decision = if forecasted_traffic > current_cap {
    ScaleDecision::ScaleOut
} else if node_price_spike_imminent() {
    ScaleDecision::MoveToCheaperNode
} else if downstream_latency > threshold {
    ScaleDecision::ScaleDown // protect the database
} else {
    ScaleDecision::Hold
};
One decision.
One atomic action.
Scale replicas, adjust resource requests, and choose cheaper capacity together.
No thrashing. No tug-of-war.
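One way to express that single atomic action, sketched with the kube crate (an assumption; the namespace, workload names, and numbers are placeholders):

// Cargo deps (assumed): kube, k8s-openapi, tokio, serde_json.
use k8s_openapi::api::apps::v1::Deployment;
use kube::{api::{Patch, PatchParams}, Api, Client};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::try_default().await?;
    let deployments: Api<Deployment> = Api::namespaced(client, "shop");

    // One server-side apply carries the replica count AND the resource requests,
    // so sizing and scale can never disagree.
    let desired = json!({
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "spec": {
            "replicas": 12,
            "template": { "spec": { "containers": [{
                "name": "checkout",
                "resources": { "requests": { "cpu": "500m", "memory": "768Mi" } }
            }]}}
        }
    });
    deployments
        .patch("checkout", &PatchParams::apply("rust-scaler"), &Patch::Apply(&desired))
        .await?;
    Ok(())
}
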
Deployment: a 5-minute DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rust-scaler
spec:
  selector:
    matchLabels:
      app: rust-scaler
  template:
    metadata:
      labels:
        app: rust-scaler
    spec:
      containers:
        - name: scaler
          image: ghcr.io/your-org/scaler:latest
          args: ["--prometheus-url=http://prometheus"]
          env:
            - name: AWS_REGION
              value: us-east-1
Runs on every node.
Makes local decisions for local pods.
No central brain.
No single point of failure.
Results that embarrassed our platform team
Black Friday
The scaler began scaling 8 minutes before traffic increased, having learned weekly patterns.
Result:
- Zero queue buildup
- Flat latency curve
Cost-aware migration
Non-critical batch jobs were moved to spot nodes 12 minutes before a price spike, then moved back later.
No alert.
No human intervention.
The silent fix
It detected a steady RSS increase (memory leak) in a legacy service.
Instead of:
- OOM at 3 AM
It:
- Gradually increased memory requests over 2 hours
We discovered the issue a week later in logs.
Scaling is not an infrastructure problem.
It’s a business logic problem.
Generic autoscalers are built for:
- The average workload
- The average company
- The lowest common denominator
You are not average.
A small, explicit binary you fully understand will always outperform a massive, generic black box you don’t control.
Take back control
- Fork our open-source scaler
- Add one business metric (e.g., cart_size)
- Run it alongside HPA for one week
- Compare decisions
You’ll see the lie immediately.
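For step 4, a bare-bones shadow comparison might look like this sketch (namespace and names are placeholders; kube and k8s-openapi are assumed dependencies):

// Log what HPA wants next to what your scaler would do, and diff them for a week.
use k8s_openapi::api::autoscaling::v2::HorizontalPodAutoscaler;
use kube::{Api, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::try_default().await?;
    let hpas: Api<HorizontalPodAutoscaler> = Api::namespaced(client, "shop");

    let hpa = hpas.get("checkout").await?;
    let hpa_wants = hpa.status.map(|s| s.desired_replicas).unwrap_or(0);

    let scaler_wants = 9; // plug in your scaler's decision for this tick
    println!("HPA wants {hpa_wants} replicas, the scaler wants {scaler_wants}");
    Ok(())
}
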
Final truth
The future of cloud-native scaling isn’t more controllers, CRDs, and YAML.
It’s radical simplicity powered by better data.
Sometimes the smartest tool in your platform is the one you wrote yourself.
!! Thanks for your precious time !!