My notes on Avoiding overload

In AWS, services are split into two groups:

  • services executing customer requests (the data plane)

  • services managing configuration and resources (the control plane)

In general, the larger data plane fleet makes calls to the smaller control plane fleet.

How do we avoid overloading the control plane?

Consider the direction of API calls: does the larger fleet call the smaller one, or vice versa?

When the calling fleet exceeds the scale of the control plane fleet by a factor of 100 or more, the setup requires fine-tuning.

There can be request bursts from the DP side; even a 10% increase in the fleet might be too sudden for the CP side.

  • retries

Goodput (the throughput of successfully handled requests) can quickly drop to zero.

  • load shedding can help to keep goodput steady longer
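To make the load shedding idea concrete for myself, here is a minimal sketch. The `LoadShedder` class and its `max_in_flight` limit are my own hypothetical names, not anything from AWS: the server rejects new requests outright once too many are in flight, so the ones it did accept can still finish.

```python
class LoadShedder:
    """Reject new requests once too many are already in flight (hypothetical sketch)."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.accepted = 0
        self.rejected = 0

    def try_acquire(self):
        """Return True if the request may proceed, False if it is shed."""
        if self.in_flight >= self.max_in_flight:
            self.rejected += 1
            return False
        self.in_flight += 1
        self.accepted += 1
        return True

    def release(self):
        """Call when an accepted request finishes."""
        if self.in_flight > 0:
            self.in_flight -= 1
```

A handler would call `try_acquire()` first, answer with a cheap error when it returns False, and call `release()` once the accepted work is done. Real systems usually pick the limit from measurements rather than a fixed number.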

The biggest challenge with this architecture is scale mismatch.

Instead, the outnumbered CP side can write the latest configuration to S3 buckets, delegating the scaling problem to S3, and the DP clients can periodically check the buckets for changes.

Regardless of the size of the DP fleet, the CP fleet can stay small.

This increases the availability of the control plane.

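This property is called static stability (the data plane keeps working with the last configuration it fetched even when the control plane is impaired), and it is a desirable attribute in distributed systems.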
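A rough sketch of the polling side, to remind myself how the pieces fit. `InMemoryStore` is a hypothetical stand-in for the S3 bucket (no real S3 calls here), and `DataPlaneNode` is my own name for a DP server. The point is that a failed fetch leaves the last known configuration in place.

```python
class InMemoryStore:
    """Hypothetical stand-in for an S3 bucket holding one config object."""

    def __init__(self):
        self.version = 0
        self.config = {}
        self.available = True

    def put(self, config):
        """Called by the control plane to publish a new configuration."""
        self.version += 1
        self.config = dict(config)

    def get(self):
        """Called by data plane nodes; fails when the store is unreachable."""
        if not self.available:
            raise ConnectionError("store unreachable")
        return self.version, dict(self.config)


class DataPlaneNode:
    def __init__(self, store):
        self.store = store
        self.version = 0
        self.config = {}

    def poll_once(self):
        """Fetch the latest config; on failure keep the last known one."""
        try:
            version, config = self.store.get()
        except ConnectionError:
            return False
        if version > self.version:
            self.version, self.config = version, config
        return True
```

Each DP node would call `poll_once()` on a timer. The CP only ever writes to the store, so its load does not grow with the number of DP nodes.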

In the case of dynamic configuration, this solution may not be practical since it would require updating S3 buckets too frequently.

Another approach is to reverse the direction of communication and let the CP set the pace of traffic.

In this architecture, the CP sets the pace of the communication load. If the CP is underscaled, it continues to work, just at a slower pace.
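A small sketch of what "the CP sets the pace" could look like. The `ControlPlane` class and the `pushes_per_tick` budget are hypothetical names of mine, and the actual push is reduced to recording a version. However large the DP fleet is, the CP does a fixed amount of work per tick and simply takes more ticks to get around.

```python
from collections import deque


class ControlPlane:
    """Pushes config to DP nodes at a rate the CP chooses (hypothetical sketch)."""

    def __init__(self, dp_nodes, pushes_per_tick):
        self.queue = deque(dp_nodes)
        self.pushes_per_tick = pushes_per_tick
        self.delivered = {}  # node -> config version last pushed to it

    def tick(self, version):
        """Push `version` to at most `pushes_per_tick` nodes, round-robin."""
        pushed = []
        for _ in range(min(self.pushes_per_tick, len(self.queue))):
            node = self.queue.popleft()
            self.delivered[node] = version  # stands in for the real push call
            self.queue.append(node)
            pushed.append(node)
        return pushed
```

With five nodes and a budget of two, it takes three ticks to reach every node. A smaller budget only makes the rollout slower; it does not overload the CP.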

The downside of this approach is that the CP must maintain the list of DP nodes and handle the unavailability of any of them. Splitting the DP nodes between the CP nodes helps to ensure every DP node receives the latest configuration.
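One simple way to split DP nodes between CP nodes is to hash each DP node name and take it modulo the number of CP nodes. This is only my assumption of how the split could work, and the helper names `owner` and `partition` are hypothetical. It ignores CP node failures, and plain modulo hashing reshuffles most assignments when the CP fleet size changes.

```python
import hashlib


def owner(dp_node, cp_nodes):
    """Pick the CP node responsible for a DP node using a stable hash."""
    digest = hashlib.sha256(dp_node.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(cp_nodes)
    return cp_nodes[index]


def partition(dp_nodes, cp_nodes):
    """Map every CP node to the list of DP nodes it should push to."""
    assignment = {cp: [] for cp in cp_nodes}
    for dp in dp_nodes:
        assignment[owner(dp, cp_nodes)].append(dp)
    return assignment
```

Because the hash is stable, every CP node can compute the same assignment on its own, and each DP node ends up with exactly one owner.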

These are notes I take when learning something new, reading, or watching. They mainly help me refresh and remember what I've consumed.