Evaluating the impact of data placement to spark and SciDB with an Earth Science use case

Khoa Doan, Amidu O. Oloso, Kwo Sen Kuo, Thomas L. Clune, Hongfeng Yu, Brian Nelson, Jian Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Scopus citations

Abstract

We investigate the impact of data placement on two Big Data technologies, Spark and SciDB, with a use case from Earth Science where data arrays are multidimensional. Simultaneously, this investigation provides an opportunity to evaluate the performance of the technologies involved. Two datastores, HDFS and Cassandra, are used with Spark for our comparison. It is found that Spark with Cassandra performs better than with HDFS, but SciDB performs better yet than Spark with either datastore. The investigation also underscores the value of having data aligned for the most common analysis scenarios in advance on a shared nothing architecture. Otherwise, repartitioning needs to be carried out on the fly, degrading overall performance.

Original languageEnglish (US)
Title of host publicationProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
EditorsRonay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages341-346
Number of pages6
ISBN (Electronic)9781467390040
DOIs
StatePublished - 2016
Event4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016Dec 8 2016

Publication series

NameProceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Other

Other4th IEEE International Conference on Big Data, Big Data 2016
Country/TerritoryUnited States
CityWashington
Period12/5/1612/8/16

Keywords

  • SciDB
  • SciDB
  • Spark
  • data layout
  • multimensional arrays

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Evaluating the impact of data placement to spark and SciDB with an Earth Science use case'. Together they form a unique fingerprint.

Cite this