Comp Sci Talks for Data Engineers: Distributed, Low Latency Scheduling with Sparrow
When you need to execute code on a cluster of machines, deciding which machine should run that code becomes a complex problem, known as scheduling. Routing is a closely related problem that many of us know from the recent RapGenius incident, in which randomized request routing left some servers with long queues while others sat idle. It turns out that simple improvements to randomized routing can dramatically improve performance! Sparrow is a distributed scheduling algorithm for low-latency, high-throughput workloads.
David Greenberg (Research Methodologist, Two Sigma Investments) uses the paper Sparrow: Distributed, Low Latency Scheduling by Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica to frame the conversation. He reviews the Sparrow algorithm, its use in a big data MapReduce application, and other places where it applies.
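
To make the claim about improving randomized routing concrete, here is a minimal sketch of one well-known improvement, often called "the power of two choices": instead of sending each task to a single random worker, probe two random workers and pick the one with the shorter queue. This is an illustration, not Sparrow itself and not code from the paper or the talk. The function names, the queue-length model, and the parameters (10,000 tasks, 100 workers) are all assumptions chosen for the example, and tasks never finish in this toy model.

```python
import random


def assign_random(num_tasks, num_workers, rng):
    """Place each task on one uniformly random worker; return queue lengths."""
    queues = [0] * num_workers
    for _ in range(num_tasks):
        queues[rng.randrange(num_workers)] += 1
    return queues


def assign_sampled(num_tasks, num_workers, rng, probes=2):
    """Probe `probes` distinct random workers per task; use the shortest queue."""
    queues = [0] * num_workers
    for _ in range(num_tasks):
        candidates = rng.sample(range(num_workers), probes)
        target = min(candidates, key=lambda w: queues[w])
        queues[target] += 1
    return queues


if __name__ == "__main__":
    # Hypothetical workload: 10,000 tasks over 100 workers, fixed seed.
    rng = random.Random(0)
    print("purely random, longest queue:", max(assign_random(10_000, 100, rng)))
    print("two probes,    longest queue:", max(assign_sampled(10_000, 100, rng)))
```

Running the script prints the longest queue under each policy. With two probes the longest queue should stay much closer to the average of 100 tasks per worker. The sketch leaves out everything that makes a real scheduler hard, such as tasks completing, multiple schedulers acting at once, and stale queue information.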