DynamoDB announced Auto Scaling not long ago. It is a handy feature if the load on your tables is not uniform: it can save you a fortune, because AWS watches the performance of your table and raises or lowers throughput as needed. However, it did not work well for our use case, and we were forced to build our own Speedometer for DynamoDB.
Problems with AWS CloudWatch
AWS native Auto Scaling relies on CloudWatch for performance analytics, and this introduces a delay in the response. In our experience it takes at least 3 minutes before CloudWatch data can trigger a raise or drop of table capacity. So if, for example, a table sits at 75% utilization for 5 minutes under a stable load, Auto Scaling will scale up as expected. It is easy to see in the picture below that the blue line, which shows the actual load, is always above the actual table capacity.
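The effect of that reporting lag can be shown with a toy simulation. This is only an illustration of the mechanism, not AWS's actual algorithm: `METRIC_DELAY`, `TARGET_UTIL`, and the 1.5x scaling step are made-up parameters.

```python
# Toy simulation of metric-delayed scaling: the scaler only sees the
# utilization reported METRIC_DELAY ticks ago, so after a load jump the
# table keeps throttling until the stale metric catches up.
# All numbers are illustrative, not AWS's actual parameters.

METRIC_DELAY = 3      # ticks of CloudWatch-style reporting lag
TARGET_UTIL = 0.75    # scale up when the observed utilization exceeds this

def simulate(load, capacity=100):
    """Return the per-tick count of throttled requests for a load series."""
    history = []          # utilization as the delayed metric sees it
    throttled = []
    for tick, demand in enumerate(load):
        throttled.append(max(0, demand - capacity))
        history.append(min(demand, capacity) / capacity)
        # The scaler reacts only to the metric from METRIC_DELAY ticks ago.
        if tick >= METRIC_DELAY and history[tick - METRIC_DELAY] > TARGET_UTIL:
            capacity = int(capacity * 1.5)   # step-wise increase
    return throttled

# A sudden jump from 50 to 300 requests per tick:
spike = [50] * 3 + [300] * 7
print(simulate(spike))   # throttling persists for several ticks after the jump
```

Even though the load jumps at tick 3, the scaler does not react until the delayed metric shows it, and then climbs in steps, which is exactly the shape we saw on the CloudWatch graphs.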
On the other hand, the app could receive an enormous number of requests within a very short period of time. In this case the table is not prepared, and throttled requests start piling up. That means request data could potentially be lost, for example if there is no mechanism for re-sending throttled requests, or if RAM or disk space runs out.
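A minimal re-sending mechanism is exponential backoff with jitter around the write call. This is a sketch, not our production code: `ThrottledError` stands in for DynamoDB's `ProvisionedThroughputExceededException`, and `put_item` is whatever callable performs the actual write.

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for DynamoDB's ProvisionedThroughputExceededException."""

def send_with_retry(put_item, item, max_attempts=5, base_delay=0.05):
    """Retry a throttled write with exponential backoff and full jitter.

    `put_item` is injected so the logic stays testable; it is expected to
    raise ThrottledError when the table throttles the request.
    """
    for attempt in range(max_attempts):
        try:
            return put_item(item)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error, don't drop data
            # Sleep a random interval up to base_delay * 2^attempt.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

With a wrapper like this, a short throttling episode delays writes instead of losing them; only a sustained overload (or exhausted RAM/disk on the buffering side) still causes loss.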
One more problem: if the load is quite high, capacity rises in steps and always lags behind the actual request rate, keeping the throttling error rate high and stable.
These problems led us to a custom-made speedometer that reacts as requests come in. It counts requests and estimates whether the table is about to be throttled. If there is a chance of hitting the table limits, the speedometer reacts straight away, and the new table capacity is ready in 10-30 seconds, which is how long DynamoDB takes to update capacity. Moreover, DynamoDB is designed to allow spikes to happen if they are relatively short.
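The core idea can be sketched as a sliding-window rate counter that scales up before the limit is hit. This is a minimal illustration, not our actual implementation: the class name, the 80% headroom, and the 1.5x over-provisioning factor are all assumptions, and `update_capacity` stands in for whatever actually calls DynamoDB's `UpdateTable`.

```python
import time
from collections import deque

class Speedometer:
    """Count requests in a sliding window and raise table capacity
    proactively when the observed rate approaches the provisioned limit.

    `update_capacity` is injected so the logic stays testable; in real
    use it would call DynamoDB's UpdateTable.
    """

    def __init__(self, capacity, update_capacity,
                 window=10.0, headroom=0.8, factor=1.5):
        self.capacity = capacity          # current provisioned units/sec
        self.update_capacity = update_capacity
        self.window = window              # seconds of history to keep
        self.headroom = headroom          # react at 80% of the limit
        self.factor = factor              # over-provision on purpose
        self.events = deque()

    def record(self, now=None):
        """Call on every request; scales up if the rate gets too close."""
        now = time.monotonic() if now is None else now
        self.events.append(now)
        # Drop events that fell out of the sliding window.
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()
        rate = len(self.events) / self.window
        if rate > self.capacity * self.headroom:
            self.capacity = int(self.capacity * self.factor)
            self.update_capacity(self.capacity)   # ready in ~10-30 s
```

Because the decision is made per request rather than from a delayed metric, the scale-up fires as soon as the measured rate crosses the headroom threshold, and DynamoDB's tolerance for short spikes covers the 10-30 seconds the capacity update takes.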
Admittedly, the custom speedometer over-provisions, but that is a matter of fine tuning. The good news is that no data is lost.