DynamoDB announced Auto Scaling not long ago. It is quite a handy feature if the load on your tables is not uniform: it can save you a fortune, because AWS monitors the table's performance and raises or lowers the provisioned throughput as needed.
However, it did not work well for our use case, and we were forced to build our own speedometer.
AWS native Auto Scaling uses CloudWatch for its performance analytics, and that introduces a delay in its response. In our experience it takes at least three minutes before the CloudWatch data allows table capacity to be raised or lowered. So if, for example, a table runs at 75% utilization for five minutes under a stable load, Auto Scaling raises capacity as expected. It is easy to see that the blue line, which shows the actual load, always sits above the table's actual provisioned capacity.
On the other hand, the app can receive an enormous number of requests within a very short period of time. In that case the table is not prepared, and requests start being throttled. This means request data can potentially be lost, for example if there is no mechanism for re-sending throttled requests, or if RAM or disk space runs out.
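A re-send mechanism for throttled requests can be as simple as retrying with exponential backoff. Below is a minimal sketch: `ThrottledError`, `send_with_retry`, and the delay values are all illustrative stand-ins (in a real app the callable would wrap a boto3 `put_item` call and the exception would be DynamoDB's `ProvisionedThroughputExceededException`), not our production code.

```python
import time


class ThrottledError(Exception):
    """Stand-in for DynamoDB's ProvisionedThroughputExceededException."""


def send_with_retry(write_fn, item, max_attempts=5, base_delay=0.05):
    """Re-send a throttled request with exponential backoff.

    write_fn is whatever actually writes to the table; it is a plain
    callable here so the sketch stays self-contained.
    """
    for attempt in range(max_attempts):
        try:
            return write_fn(item)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error instead of losing data
            time.sleep(base_delay * 2 ** attempt)
```

The point is simply that a throttled write is held and retried rather than dropped, so a short capacity shortfall costs latency instead of data.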
One more problem: if the load is quite high, capacity rises in steps, always lagging behind the actual request rate and keeping the throttled-error rate high and stable.
This problem led us to a custom-made speedometer that reacts as requests come in. It counts requests and predicts whether the table is about to be throttled. If there is a risk of hitting the table's limits, the speedometer reacts straight away, and the new table capacity is ready within 10-30 seconds, which is the time DynamoDB takes to apply a capacity update. Moreover, DynamoDB is designed to absorb spikes as long as they are relatively short.
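The idea above can be sketched as a sliding-window request counter that triggers a capacity update the moment the observed rate approaches the provisioned limit, instead of waiting for CloudWatch metrics. Everything here is an assumption for illustration: the class name, the 80% headroom threshold, and the 2x over-provisioning factor are hypothetical, and `apply_capacity` stands in for the real table update (e.g. a boto3 `update_table` call, which takes roughly 10-30 seconds to complete).

```python
import time
from collections import deque


class Speedometer:
    """Count requests over a sliding window and scale before throttling."""

    def __init__(self, provisioned, apply_capacity, window=10.0, headroom=0.8):
        self.provisioned = provisioned        # current provisioned units/sec
        self.apply_capacity = apply_capacity  # callback that changes the table
        self.window = window                  # seconds of history to keep
        self.headroom = headroom              # react at 80% of capacity
        self.events = deque()

    def record(self, now=None):
        now = time.monotonic() if now is None else now
        self.events.append(now)
        # drop events that fell out of the sliding window
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        rate = len(self.events) / self.window
        # react immediately instead of waiting ~3 minutes for CloudWatch
        if rate > self.provisioned * self.headroom:
            self.provisioned = int(rate * 2)  # over-provision; fine-tune later
            self.apply_capacity(self.provisioned)
```

Because the counter sees every request, a sudden spike trips the threshold within the window length rather than after several minutes of metric aggregation; the deliberate over-provisioning factor is exactly the fine-tuning knob mentioned below.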
It is noticeable that the custom speedometer over-provisions, but that is a matter of fine-tuning. The good news is that no data is lost.