Open-source dogma is quietly wrecking your cloud bill.

You installed HPA and VPA because the CNCF told you to.

You tuned a few values.

You hoped for the best.

What you got instead was:

  • Replica counts bouncing like a yo-yo
  • Missed traffic spikes
  • Autoscalers fighting each other
  • And a bill that keeps climbing

Your scaling isn’t intelligent.

It’s reactive chaos.

So we did something heretical.

We deleted all of it.

No HPA. No VPA. No custom metrics adapters.

We replaced 5,000+ lines of Go-based autoscaling machinery with a single 100-line Rust binary running as a DaemonSet.

The result?

22% cost reduction in the first week

p99 latency variance flattened into a straight line

Zero scaling incidents during peak traffic

This is the truth Kubernetes won’t tell you in 2026.

Why HPA and VPA are fundamentally broken (and always were)

1. They are backward-looking

HPA reacts to the last 30-60 seconds of CPU or memory.

By the time it scales:

  • Requests are already queued
  • Latency has already spiked
  • Users have already felt pain

Reactive scaling is damage control, not prevention.

2. They don’t coordinate

HPA scales out.

Minutes later, VPA decides:

“Actually, these pods need more memory.”

Now you have:

  • More pods
  • Bigger pods
  • On the same nodes

Congratulations. You just over-provisioned everything.

3. They are cost-blind

HPA doesn’t know:

  • Spot vs on-demand pricing
  • Imminent spot interruptions
  • Whether the bottleneck is CPU, I/O, cache, or database

It will happily scale you onto the most expensive option available.

The replacement: a single Rust scaler binary

We didn’t build “another autoscaler.”

We built a feedback controller, in the control-theory sense.

It does exactly one thing:

  • Read signals
  • Make a decision
  • Patch the Kubernetes API

All intelligence comes from inputs, not framework complexity.
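
The whole loop fits on one screen. Here is a minimal sketch of its shape, with the signal readers and the API patch stubbed out; names like read_signals and apply_decision are illustrative, not the real binary's API.

use std::{thread, time::Duration};

// Placeholder signal bundle; the real fields are described in the next section.
struct Signals {
    forecasted_rps: f64,
    capacity_rps: f64,
}

enum Decision {
    ScaleOut,
    Hold,
}

fn read_signals() -> Signals {
    // Stub: the real binary reads Prometheus, AWS pricing, and downstream health here.
    Signals { forecasted_rps: 480.0, capacity_rps: 600.0 }
}

fn decide(s: &Signals) -> Decision {
    if s.forecasted_rps > s.capacity_rps {
        Decision::ScaleOut
    } else {
        Decision::Hold
    }
}

fn apply_decision(d: Decision) {
    // Stub: the real binary issues a single PATCH against the Kubernetes API.
    match d {
        Decision::ScaleOut => println!("scale out"),
        Decision::Hold => println!("hold"),
    }
}

fn main() {
    loop {
        let signals = read_signals();    // read signals
        let decision = decide(&signals); // make a decision
        apply_decision(decision);        // patch the Kubernetes API
        thread::sleep(Duration::from_secs(15)); // control interval (assumed)
    }
}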

The four signals HPA never sees

1. Real-time AWS pricing

  • Actual node cost
  • Spot vs on-demand
  • Predicted price changes

Scaling without price awareness is financial negligence.
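
Here is roughly what that signal looks like inside the binary, as a sketch. The hard-coded prices and the 30% margin are assumptions; in practice the numbers come from the AWS Price List API, EC2 DescribeSpotPriceHistory, and the spot interruption notice in instance metadata.

// Hypothetical pricing snapshot for the node this DaemonSet copy runs on.
#[derive(Debug)]
struct NodePricing {
    on_demand_per_hour: f64,    // e.g. from the AWS Price List API
    current_spot_per_hour: f64, // e.g. from EC2 DescribeSpotPriceHistory
    interruption_notice: bool,  // e.g. EC2 metadata: /latest/meta-data/spot/instance-action
}

impl NodePricing {
    /// Prefer spot only while it is meaningfully cheaper and not about to be reclaimed.
    /// The 30% margin is an assumed threshold, not a universal rule.
    fn prefer_spot(&self) -> bool {
        !self.interruption_notice
            && self.current_spot_per_hour < 0.7 * self.on_demand_per_hour
    }
}

fn main() {
    let node = NodePricing {
        on_demand_per_hour: 0.192,
        current_spot_per_hour: 0.058,
        interruption_notice: false,
    };
    println!("prefer spot capacity: {}", node.prefer_spot());
}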

2. Business metrics

Not CPU.

Things that matter:

  • Orders per second
  • Checkout latency
  • Cache hit ratio

We scale on business pressure, not resource proxies.
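
Pulling a business metric out of Prometheus is a few lines. A sketch, assuming a Prometheus reachable at the address passed to the DaemonSet and a hypothetical orders_total counter exported by the checkout service, using the standard /api/v1/query endpoint via reqwest (blocking + json features enabled):

use serde_json::Value;

fn orders_per_second(prometheus_url: &str) -> Result<f64, Box<dyn std::error::Error>> {
    // PromQL over the HTTP API; `orders_total` is a hypothetical business counter.
    let resp: Value = reqwest::blocking::Client::new()
        .get(format!("{prometheus_url}/api/v1/query"))
        .query(&[("query", "sum(rate(orders_total[1m]))")])
        .send()?
        .json()?;

    // Prometheus returns each sample as ["<unix-ts>", "<value-as-string>"].
    let value = resp["data"]["result"][0]["value"][1]
        .as_str()
        .ok_or("empty query result")?
        .parse::<f64>()?;
    Ok(value)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ops = orders_per_second("http://prometheus")?;
    println!("orders/sec: {ops:.1}");
    Ok(())
}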

3. Downstream health

If database p95 latency is rising, scaling the app layer makes things worse.

HPA does exactly that.

Our scaler backs off to protect the system.

4. Traffic prediction (5-minute horizon)

A tiny in-memory time-series model (Rust salmon crate) predicts traffic before it arrives.

Scaling happens ahead of demand, not after failure.
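
As a stand-in for the production model (not the actual crate), here is a hand-rolled blend of an exponentially weighted moving average and a same-slot-last-week baseline; the smoothing factors are assumed values.

struct TrafficForecaster {
    ewma: f64,        // smoothed recent requests/sec
    alpha: f64,       // smoothing factor (assumed value)
    weekly: Vec<f64>, // one bucket per 5-minute slot of the week
}

impl TrafficForecaster {
    fn new() -> Self {
        Self { ewma: 0.0, alpha: 0.3, weekly: vec![0.0; 7 * 24 * 12] }
    }

    /// Feed the observation for the current 5-minute slot of the week.
    fn observe(&mut self, slot: usize, rps: f64) {
        self.ewma = self.alpha * rps + (1.0 - self.alpha) * self.ewma;
        // Slowly learn the weekly pattern for this slot.
        self.weekly[slot] = 0.9 * self.weekly[slot] + 0.1 * rps;
    }

    /// Forecast one slot (5 minutes) ahead: trust the weekly pattern when it
    /// expects more traffic than we see right now, so scaling starts before the spike.
    fn forecast_next(&self, slot: usize) -> f64 {
        let next = (slot + 1) % self.weekly.len();
        self.ewma.max(self.weekly[next])
    }
}

fn main() {
    let mut f = TrafficForecaster::new();
    // Pretend the same slot last week saw a spike to 900 requests/sec.
    f.weekly[101] = 900.0;
    f.observe(100, 250.0);
    println!("forecast for next slot: {:.0} rps", f.forecast_next(100));
}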

The algorithm (brutally simple)

let decision = if forecasted_traffic > current_cap {
    ScaleDecision::ScaleOut
} else if node_price_spike_imminent() {
    ScaleDecision::MoveToCheaperNode
} else if downstream_latency > threshold {
    ScaleDecision::ScaleDown // protect DB
} else {
    ScaleDecision::Hold
};

One decision.

One atomic action.

Scale replicas, adjust resource requests, and choose cheaper capacity – together.

No thrashing. No tug-of-war.
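
Applying the decision is one server call. A minimal sketch using the kube and k8s-openapi crates; the deployment name, namespace, node label, and concrete numbers are illustrative.

use k8s_openapi::api::apps::v1::Deployment;
use kube::api::{Api, Patch, PatchParams};
use kube::Client;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    let client = Client::try_default().await?;
    let deployments: Api<Deployment> = Api::namespaced(client, "shop");

    // Replicas, resource requests, and node placement travel in one PATCH,
    // so no second controller can contradict half of the decision.
    let patch = json!({
        "spec": {
            "replicas": 12,
            "template": {
                "spec": {
                    "nodeSelector": { "eks.amazonaws.com/capacityType": "SPOT" },
                    "containers": [{
                        "name": "checkout",
                        "resources": {
                            "requests": { "cpu": "500m", "memory": "768Mi" }
                        }
                    }]
                }
            }
        }
    });

    // Strategic merge so the containers list is merged by name, not replaced.
    deployments
        .patch("checkout", &PatchParams::default(), &Patch::Strategic(&patch))
        .await?;
    Ok(())
}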

Deployment: a 5-minute DaemonSet

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rust-scaler
spec:
  selector:
    matchLabels:
      app: rust-scaler
  template:
    metadata:
      labels:
        app: rust-scaler
    spec:
      containers:
        - name: scaler
          image: ghcr.io/your-org/scaler:latest
          args: ["--prometheus-url=http://prometheus"]
          env:
            - name: AWS_REGION
              value: us-east-1

Runs on every node.

Makes local decisions for local pods.

No central brain.

No single point of failure.
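
Each copy only sees the pods scheduled on its own node. A minimal sketch, again with kube and k8s-openapi, assuming the node name is injected through the Downward API as NODE_NAME:

use k8s_openapi::api::core::v1::Pod;
use kube::api::ListParams;
use kube::{Api, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The node name is assumed to be injected via the Downward API.
    let node = std::env::var("NODE_NAME")?;

    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::all(client);

    // A field selector restricts the view to pods scheduled on this node.
    let lp = ListParams::default().fields(&format!("spec.nodeName={node}"));
    for pod in pods.list(&lp).await? {
        println!("local pod: {}", pod.metadata.name.unwrap_or_default());
    }
    Ok(())
}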

Results that embarrassed our platform team

Black Friday

The scaler began scaling 8 minutes before traffic increased, having learned weekly patterns.

Result:

  • Zero queue buildup
  • Flat latency curve

Cost-aware migration

Non-critical batch jobs were moved to spot nodes 12 minutes before a price spike, then moved back later.

No alert.

No human intervention.

The silent fix

It detected a steady RSS increase (memory leak) in a legacy service.

Instead of an OOM kill at 3 AM, it gradually increased memory requests over 2 hours.

We only discovered the issue a week later, in the logs.

Scaling is not an infrastructure problem.

It’s a business logic problem.

Generic autoscalers are built for:

  • The average workload
  • The average company
  • The lowest common denominator

You are not average.

A small, explicit binary you fully understand will always outperform a massive, generic black box you don’t control.

Take back control

  1. Fork our open-source scaler 
  2. Add one business metric (e.g., cart_size)
  3. Run it alongside HPA for one week
  4. Compare decisions
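
Step 4 needs no extra tooling: log what the incumbent HPA wants next to what your scaler wants and diff them after a week. A sketch, where the HPA name, namespace, and the hard-coded target are illustrative:

use k8s_openapi::api::autoscaling::v2::HorizontalPodAutoscaler;
use kube::{Api, Client};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::try_default().await?;
    let hpas: Api<HorizontalPodAutoscaler> = Api::namespaced(client, "shop");

    // What the incumbent HPA wants right now.
    let hpa_wants = hpas
        .get("checkout")
        .await?
        .status
        .map(|s| s.desired_replicas)
        .unwrap_or_default();

    // Stand-in for the Rust scaler's own replica target for the same deployment.
    let scaler_wants = 9;

    println!("HPA wants {hpa_wants} replicas, scaler wants {scaler_wants}");
    Ok(())
}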

You’ll see the lie immediately.

Final truth

The future of cloud-native scaling isn’t more controllers, CRDs, and YAML.

It’s radical simplicity powered by better data.

Sometimes the smartest tool in your platform is the one you wrote yourself.

!! Thanks for your precious time !!