Thursday, March 5 • 15:15 - 17:15
Poster: 'Speeding up Big Data Jobs through better traffic flow management in file systems,' Simbarashe Dzinamarira, Rice University

Distributed file systems that underlie big data processing frameworks often employ data replication to guard against node failures. A common practice in these file systems is to couple replication with primary copy writes. Unfortunately, under real workloads, replication flows end up contending arbitrarily for limited disk bandwidth, both with one another and with time-sensitive data flows. As a result, task execution times increase significantly and cluster-wide resource utilization becomes inefficient.

We present FC-FS (Flow Controlled File System), a departure from conventional design that decouples replication from primary copy writes and performs job-aware flow control on file system traffic. Just as you would not back up your computer when it is busy, FC-FS pauses replication when computers in a cluster are busy.

FC-FS employs established flow control concepts such as credit-based flow control and weighted fair queuing in novel ways to realize a variety of policy objectives, including speeding up executing tasks, rapidly allocating under-utilized disk bandwidth to replication writes, reducing the time to reach failure resilience for important jobs, and progressively achieving increasing levels of failure resilience.
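
For readers unfamiliar with the two concepts named above, the following is a minimal generic sketch of each. It is not FC-FS's implementation, and the abstract does not say how FC-FS combines them. The class names `CreditGate` and `WeightedFairQueue`, the flow labels, and the weights are all hypothetical, and the queue uses a simplified self-clocked approximation of weighted fair queuing.

```python
import heapq
from collections import defaultdict


class CreditGate:
    """Credit-based flow control: a sender may transmit only while it holds credits.

    Hypothetical illustration; the receiver calls grant() when it has capacity.
    """

    def __init__(self, credits):
        self.credits = credits

    def try_consume(self):
        # Spend one credit if available; otherwise the sender must wait.
        if self.credits > 0:
            self.credits -= 1
            return True
        return False

    def grant(self, n=1):
        # The receiver returns credits as it frees up capacity.
        self.credits += n


class WeightedFairQueue:
    """Simplified (self-clocked) weighted fair queue over named flows."""

    def __init__(self, weights):
        self.weights = weights                  # flow name -> relative share
        self.last_finish = defaultdict(float)   # last finish tag per flow
        self.virtual_time = 0.0
        self.heap = []
        self.seq = 0                            # tie-breaker: arrival order

    def enqueue(self, flow, size):
        # A request starts no earlier than the flow's previous finish tag.
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + size / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, size))
        self.seq += 1

    def dequeue(self):
        # Serve the request with the smallest finish tag.
        if not self.heap:
            return None
        finish, _, flow, size = heapq.heappop(self.heap)
        self.virtual_time = finish
        return flow, size


if __name__ == "__main__":
    # Assumed weights: primary writes get twice the share of replication.
    q = WeightedFairQueue({"primary": 2.0, "replication": 1.0})
    for _ in range(4):
        q.enqueue("primary", 1.0)
    for _ in range(4):
        q.enqueue("replication", 1.0)
    print([q.dequeue()[0] for _ in range(8)])
```

With both flows backlogged, the queue serves roughly two primary writes for every replication write, and once the primary flow drains, replication receives the remaining bandwidth. A credit gate placed in front of the replication flow would additionally let a busy receiver withhold credits to pause it.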

We have augmented HDFS to realize FC-FS. Extensive evaluation on a 30-node Hadoop cluster shows that FC-FS can halve the duration of a job’s write phase while reducing average job runtime by up to 20%.

Thursday March 5, 2015 15:15 - 17:15 CST
BioScience Research Collaborative, 6500 Main Street, Houston, TX 77005