The Next Generation of Apache Hadoop

August 25, 2016

Apache Hadoop turned ten this year. To celebrate, Karthik and I gave a talk at USENIX ATC '16 about open problems to solve in Hadoop's second decade. This was an opportunity to revisit our academic roots and get a new crop of graduate students interested in the real distributed systems problems we're trying to solve in industry.

This is a huge topic and we only had a 25-minute talk slot, so we pitched problems rather than solutions. However, we did have some ideas in our back pocket, and the hallway track and the birds-of-a-feather session we hosted afterwards led to a lot of good discussion.

Karthik and I split up the content thematically, which worked really well. I covered scalability, meaning sharded filesystems and federated resource management. Karthik addressed scheduling (unifying batch jobs and long-running services) and utilization (overprovisioning, preemption, isolation).

I'm hoping to give this talk again in longer form, since I'm proud of the content.

Slides: pptx

USENIX site with PDF slides and audio

Talking big ideas like this with Karthik also made me nostalgic for graduate school. Karthik is one of the most impressive people I know; I thought he'd left graduate school for Cloudera as I did, but he's actually been working on his PhD on nights and weekends! While we were prepping this presentation for ATC, he was also working on a submission for SoCC, and he is apparently close to graduating.
