Implementing connected component labeling as a user defined operator for SciDB

Amidu Oloso, Kwo Sen Kuo, Thomas Clune, Paul Brown, Alex Poliakov, Hongfeng Yu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Scopus citations

Abstract

We have implemented a flexible User Defined Operator (UDO) for labeling connected components of a binary mask expressed as an array in SciDB, a parallel distributed database management system based on the array data model. This UDO is able to process very large multidimensional arrays by exploiting SciDB's memory management mechanism that efficiently manipulates arrays whose memory requirements far exceed available physical memory. The UDO takes as primary inputs a binary mask array and a binary stencil array that specifies the connectivity of a given cell to its neighbors. The UDO returns an array of the same shape as the input mask array with each foreground cell containing the label of the component it belongs to. By default, dimensions are treated as non-periodic, but the UDO also accepts optional input parameters to specify periodicity in any of the array dimensions. The UDO requires four stages to completely label connected components. In the first stage, labels are computed for each subarray or chunk of the mask array in parallel across SciDB instances using the weighted quick union (WQU) with half-path compression algorithm. In the second stage, labels around chunk boundaries from the first stage are stored in a temporary SciDB array that is then replicated across all SciDB instances. Equivalences are resolved by again applying the WQU algorithm to these boundary labels. In the third stage, relabeling is done for each chunk using the resolved equivalences. In the fourth stage, the resolved labels, which so far are 'flattened' coordinates of the original binary mask array, are renamed with sequential integers for legibility. The UDO is demonstrated on a 3-D mask of O(10^11) elements, with O(10^8) foreground cells and O(10^6) connected components. The operator completes in 19 minutes using 84 SciDB instances.
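
The per-chunk labeling described above (WQU with half-path compression, connectivity taken from a binary stencil, optional periodic dimensions, and final renaming of flattened-coordinate labels) can be illustrated with a minimal sketch. It assumes an in-memory 2-D list of 0/1 values rather than a SciDB array, and it interprets "half-path compression" as path halving; the names `WeightedQuickUnion` and `label_2d` are hypothetical and are not part of the SciDB API or the authors' code.

```python
class WeightedQuickUnion:
    """Union-find with union by size and path halving.

    Elements are integer ids; here they are flattened (row-major) cell coordinates.
    """

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Weighted: hang the smaller tree under the root of the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def label_2d(mask, stencil, periodic=(False, False)):
    """Label a 2-D 0/1 mask; nonzero stencil entries mark connected offsets.

    The stencil has odd extent in both dimensions and its centre is the cell
    itself. periodic gives one flag per dimension (hypothetical parameter).
    Returns a grid with 0 for background and 1, 2, ... in scan order.
    """
    rows, cols = len(mask), len(mask[0])
    r0, c0 = len(stencil) // 2, len(stencil[0]) // 2  # stencil centre
    offsets = [(di - r0, dj - c0)
               for di, row in enumerate(stencil)
               for dj, v in enumerate(row)
               if v and (di, dj) != (r0, c0)]
    wqu = WeightedQuickUnion()
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            here = i * cols + j  # flattened coordinate used as the element id
            wqu.add(here)
            for di, dj in offsets:
                ni, nj = i + di, j + dj
                if periodic[0]:
                    ni %= rows  # wrap around a periodic dimension
                if periodic[1]:
                    nj %= cols
                if 0 <= ni < rows and 0 <= nj < cols and mask[ni][nj]:
                    wqu.add(ni * cols + nj)
                    wqu.union(here, ni * cols + nj)
    # Rename root ids (flattened coordinates) to sequential integers.
    renamed = {}
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if mask[i][j]:
                root = wqu.find(i * cols + j)
                out[i][j] = renamed.setdefault(root, len(renamed) + 1)
    return out
```

Because weighted union picks roots by tree size, the raw root labels depend on the order of unions; renaming roots in order of first appearance during a row-major scan makes the final labels deterministic.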

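The data flow of the four stages (independent chunk labeling, boundary equivalence resolution, relabeling, and sequential renaming) can be mimicked on a single machine. The following is a minimal sketch under stated assumptions: the mask is a 2-D list split into column chunks of hypothetical width `chunk_cols`, connectivity is 4-neighbor, no dimension is periodic, and the replicated boundary array is modelled as one shared dictionary. `chunked_label`, `find`, and `union` are illustrative names, not SciDB functions.

```python
def find(parent, x):
    """Root of x, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # point x at its grandparent
        x = parent[x]
    return x


def union(parent, size, a, b):
    """Weighted quick union: hang the smaller tree under the larger root."""
    for v in (a, b):
        if v not in parent:
            parent[v], size[v] = v, 1
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]


def chunked_label(mask, chunk_cols):
    """Four-stage labeling of a 2-D 0/1 mask split into column chunks.

    Assumes 4-connectivity and non-periodic dimensions. Returns a grid with
    0 for background and 1, 2, ... for components, numbered in scan order.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[None] * cols for _ in range(rows)]  # None marks background
    starts = range(0, cols, chunk_cols)

    # Stage 1: label each chunk on its own; a label is the flattened global
    # coordinate of the chunk-local root (done per instance in parallel in SciDB).
    for c0 in starts:
        c1 = min(c0 + chunk_cols, cols)
        parent, size = {}, {}
        for i in range(rows):
            for j in range(c0, c1):
                if not mask[i][j]:
                    continue
                here = i * cols + j
                union(parent, size, here, here)  # register the cell
                if i > 0 and mask[i - 1][j]:
                    union(parent, size, here, here - cols)
                if j > c0 and mask[i][j - 1]:  # stay inside this chunk
                    union(parent, size, here, here - 1)
        for i in range(rows):
            for j in range(c0, c1):
                if mask[i][j]:
                    labels[i][j] = find(parent, i * cols + j)

    # Stage 2: pair up labels on either side of each chunk boundary and
    # resolve the equivalences with a second union-find.
    parent, size = {}, {}
    for c in starts[1:]:
        for i in range(rows):
            if mask[i][c - 1] and mask[i][c]:
                union(parent, size, labels[i][c - 1], labels[i][c])

    # Stage 3: relabel every cell whose label took part in an equivalence.
    for i in range(rows):
        for j in range(cols):
            if labels[i][j] in parent:
                labels[i][j] = find(parent, labels[i][j])

    # Stage 4: rename flattened-coordinate labels to 1, 2, ... in scan order.
    renamed = {}
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if labels[i][j] is not None:
                out[i][j] = renamed.setdefault(labels[i][j], len(renamed) + 1)
    return out
```

For any chunk width the result should match labeling the whole mask as a single chunk. Supporting diagonal stencils or periodic dimensions would require stage 2 to also collect diagonal and wrap-around pairs across chunk boundaries, which this sketch omits.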
Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Editors: Ronay Ak, George Karypis, Yinglong Xia, Xiaohua Tony Hu, Philip S. Yu, James Joshi, Lyle Ungar, Ling Liu, Aki-Hiro Sato, Toyotaro Suzumura, Sudarsan Rachuri, Rama Govindaraju, Weijia Xu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2948-2952
Number of pages: 5
ISBN (Electronic): 9781467390040
DOI: 10.1109/BigData.2016.7840945
State: Published - Jan 1 2016
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: Dec 5 2016 to Dec 8 2016

Publication series

Name: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016

Keywords

  • Connected Component Labeling
  • MemArray
  • SciDB
  • User Defined Operator
  • Weighted Quick Union
  • array
  • connectivity
  • equivalencies
  • mask

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Citation

Oloso, A., Kuo, K. S., Clune, T., Brown, P., Poliakov, A., & Yu, H. (2016). Implementing connected component labeling as a user defined operator for SciDB. In R. Ak, G. Karypis, Y. Xia, X. T. Hu, P. S. Yu, J. Joshi, L. Ungar, L. Liu, A-H. Sato, T. Suzumura, S. Rachuri, R. Govindaraju, & W. Xu (Eds.), Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 2948-2952). [7840945] (Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2016.7840945