2012

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

Matei Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, Ion Stoica


AI summary

This paper introduces Resilient Distributed Datasets (RDDs), a distributed memory abstraction for fault-tolerant in-memory cluster computing, implemented in Spark. Spark runs iterative applications up to 20x faster than Hadoop, speeds up a real-world data analytics report by 40x, and can interactively scan a 1 TB dataset with 5-7 s latency.

Main Contributions

  • Introduces Resilient Distributed Datasets (RDDs), a distributed memory abstraction for fault-tolerant in-memory cluster computing.
  • RDDs enable efficient data reuse in a broad range of applications.
  • RDDs provide fault tolerance by logging the transformations used to build a dataset (its lineage) rather than the actual data.
  • Implements RDDs in a system called Spark.
  • Spark outperforms Hadoop by up to 20x in iterative applications.
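
The lineage idea in the list above can be illustrated with a toy, single-process sketch. The `ToyRDD` class and its `lose` method are hypothetical names made up for this illustration; this is not Spark's API or implementation. It only shows that a dataset which remembers its parent and the transformation that produced it can be rebuilt after its in-memory copy is lost.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; illustrative only, not Spark's API."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # base data in "stable storage" (root dataset only)
        self.parent = parent  # lineage: the dataset this one was derived from
        self.fn = fn          # lineage: the coarse-grained transformation applied
        self.cached = None    # in-memory copy; may be lost on a failure

    def map(self, f):
        # Record the transformation instead of computing it eagerly.
        return ToyRDD(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Materialize on demand, recomputing from lineage if nothing is cached.
        if self.cached is None:
            if self.parent is None:
                self.cached = list(self.source)
            else:
                self.cached = self.fn(self.parent.collect())
        return list(self.cached)

    def lose(self):
        # Simulate losing the in-memory copy (e.g. a node failure).
        self.cached = None


base = ToyRDD(source=[1, 2, 3, 4, 5, 6])
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = evens_squared.collect()      # computed once and kept in memory
evens_squared.lose()                 # the in-memory copy disappears
recovered = evens_squared.collect()  # rebuilt by replaying the recorded lineage
```

No copy of the derived data is ever written out here: only the chain of transformations is kept, which is the trade-off the contribution list describes.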

Abstract

We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted form of shared memory, based on coarse-grained transformations rather than fine-grained updates to shared state. However, we show that RDDs are expressive enough to capture a wide class of computations, including recent specialized programming models for iterative jobs, such as Pregel, and new applications that these models do not capture. We have implemented RDDs in a system called Spark, which we evaluate through a variety of user applications and benchmarks.
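
To make the data-reuse argument in the abstract concrete, here is a minimal sketch of an iterative job that either reloads its input on every pass or keeps it in memory. `CountingStorage` and `fit_mean` are hypothetical names invented for this example, and the sketch only counts loads from "stable storage"; it does not measure or reproduce any speedup reported in the paper.

```python
class CountingStorage:
    """Hypothetical stand-in for stable storage that counts how often it is read."""

    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self.rows)


def fit_mean(storage, iterations, keep_in_memory, lr=0.25):
    """Gradient descent on sum((w - x)^2): an iterative job that rescans its input."""
    w = 0.0
    cached = None
    for _ in range(iterations):
        if keep_in_memory:
            if cached is None:
                cached = storage.load()  # loaded once, then reused
            points = cached
        else:
            points = storage.load()      # reloaded on every iteration
        grad = sum(2 * (w - x) for x in points) / len(points)
        w -= lr * grad
    return w


data = [1.0, 2.0, 3.0, 4.0]

reload_store = CountingStorage(data)
w_reload = fit_mean(reload_store, iterations=10, keep_in_memory=False)

memory_store = CountingStorage(data)
w_memory = fit_mean(memory_store, iterations=10, keep_in_memory=True)
```

Both runs reach the same estimate, but the in-memory variant touches storage once instead of once per iteration, which is the kind of reuse the abstract says current frameworks handle inefficiently.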


