Predicting records size with neural networks in key-value stores under heterogeneous multiget workloads
Files
CabreraAlba_68171400_2020.pdf
Open access - Adobe PDF
- 2.22 MB
Details
- Abstract
- Key-value stores are the backbone of many interactive cloud services. The response times of these services, which are expected to be fast almost all of the time, are largely tied to the latency of their underlying key-value database engines. Keeping this latency low remains a challenging task, in particular for the queries with the worst response times, which typically lie in the tail of the latency distribution. In this thesis, we consider exploiting record size prediction as complementary information for scheduling algorithms during replica selection, so that they can make more informed decisions. We explore the conditions under which this becomes feasible and, more importantly, beneficial, in the context of multiget queries involving heterogeneous data. To this end, we propose a shallow neural network that predicts whether a queried record-column pair will be large, even before this particular record (i.e., primary key) has been seen for the first time. Our network incurs limited computational overhead at inference time and, unlike Bloom filters, can be updated "on the go" without needing to be restarted or retrained from scratch. Experiments conducted on several synthetic datasets confirm that the proposed network reaches a beneficial state, namely more true positives identified than mistakes made, after seeing approximately 25% of all the records. Our results suggest that it is indeed possible to accurately predict whether a key-column opset will be large, but only when large records are not evenly distributed among the columns or, equivalently, when they are "rather grouped" into a small set of columns.
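The "beneficial state" criterion in the abstract (more true positives than mistakes) can be illustrated with a running tally of prediction outcomes. The following is a minimal sketch, not the thesis implementation: the names `BenefitTracker` and `first_beneficial_index` are hypothetical, and it assumes that a "mistake" means any misclassification, i.e. a false positive or a false negative.

```python
from dataclasses import dataclass


@dataclass
class BenefitTracker:
    """Running confusion counts for a binary 'will this record be large?' predictor."""
    tp: int = 0  # predicted large, was large
    fp: int = 0  # predicted large, was small
    fn: int = 0  # predicted small, was large
    tn: int = 0  # predicted small, was small

    def observe(self, predicted_large: bool, actually_large: bool) -> None:
        """Record one prediction together with its ground-truth outcome."""
        if predicted_large and actually_large:
            self.tp += 1
        elif predicted_large:
            self.fp += 1
        elif actually_large:
            self.fn += 1
        else:
            self.tn += 1

    @property
    def mistakes(self) -> int:
        # Assumption: a "mistake" is any misclassification (FP or FN).
        return self.fp + self.fn

    def is_beneficial(self) -> bool:
        """True once true positives strictly outnumber mistakes."""
        return self.tp > self.mistakes


def first_beneficial_index(outcomes):
    """Return how many observations were needed before the tracker first
    became beneficial, or None if it never did."""
    tracker = BenefitTracker()
    for count, (predicted, actual) in enumerate(outcomes, start=1):
        tracker.observe(predicted, actual)
        if tracker.is_beneficial():
            return count
    return None
```

Dividing the returned count by the total number of records gives the fraction of the dataset seen before the predictor first pays off, which is the kind of quantity the abstract reports as approximately 25%.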