SLIDE 1

A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures

Daniel Adornes, Dalvan Griebler, Cleverson Ledur, Luiz Gustavo Fernandes

Pontifical Catholic University of Rio Grande do Sul (PUCRS), Faculty of Informatics (FACIN), Computer Science Graduate Program (PPGCC), Parallel Application Modeling Group (GMAP)

SEKE 2015, Wyndham Pittsburgh University Center

SLIDE 2
Introduction

  • New challenges for software engineers and developers
  • Instead of getting faster, computer architectures are becoming more parallel
  • Depending on the amount of data to be processed, local memory is not enough and distributed systems become a necessity
  • Programming interfaces become prone to excessive complexity

Introduction Background Related Work Unified Interface Evaluation Conclusions

SLIDE 3

MapReduce abstract model

  • 2004: Google introduced the MapReduce abstract model, based on two operations, map and reduce, which originally come from functional programming languages
  • Simplicity and scalability for developing software that processes large datasets
  • Aimed at, but not limited to, distributed environments
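The two operations can be illustrated with a minimal single-threaded C++ sketch: a map function turns each input record into intermediate (key, value) pairs, the pairs are grouped by key, and a reduce function folds each group into one result. The `map_reduce` template and its signature are assumptions made for illustration; they do not correspond to the Google, Hadoop, or Phoenix++ APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Single-threaded sketch of the MapReduce abstract model (hypothetical API).
// KOut/VOut must be given explicitly; KIn/VIn are deduced from the input.
// map_fn(key, value, out) appends intermediate pairs to `out`;
// reduce_fn(key, values) returns one output value per distinct key.
template <typename KOut, typename VOut, typename KIn, typename VIn,
          typename MapFn, typename ReduceFn>
std::map<KOut, VOut> map_reduce(const std::vector<std::pair<KIn, VIn>>& input,
                                MapFn map_fn, ReduceFn reduce_fn) {
    // Map phase: collect every emitted intermediate pair.
    std::vector<std::pair<KOut, VOut>> intermediate;
    for (const auto& [k, v] : input) map_fn(k, v, intermediate);

    // Grouping (shuffle) phase: gather all values that share a key.
    std::map<KOut, std::vector<VOut>> groups;
    for (const auto& [k, v] : intermediate) groups[k].push_back(v);

    // Reduce phase: one call per distinct key.
    std::map<KOut, VOut> output;
    for (const auto& [k, vs] : groups) output.emplace(k, reduce_fn(k, vs));
    return output;
}
```

For instance, a map that parses lines such as `POA 31` into (station, temperature) pairs and a reduce that keeps the maximum yields the highest reading per station. Grouping everything in one `std::map` is a simplification; real implementations spread both phases across workers.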

SLIDE 4


MapReduce job execution flow - Dean and Ghemawat (2004, p. 3)
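In the execution flow described by Dean and Ghemawat, map output is split into R regions, one per reduce task, with `hash(key) mod R` as the default partitioning function, so every occurrence of a key reaches the same reducer. The sketch below assumes `std::hash` as the hash function and an in-memory vector of pairs; `partition_pairs` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Assigns each intermediate pair to one of R partitions using
// hash(key) mod R. Precondition: R > 0.
template <typename K, typename V>
std::vector<std::vector<std::pair<K, V>>> partition_pairs(
    const std::vector<std::pair<K, V>>& intermediate, std::size_t R) {
    std::vector<std::vector<std::pair<K, V>>> partitions(R);
    std::hash<K> hasher;
    for (const auto& kv : intermediate) {
        // Equal keys hash equally, so they always land in the same region.
        partitions[hasher(kv.first) % R].push_back(kv);
    }
    return partitions;
}
```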

SLIDE 5

MapReduce abstract model

“Many different implementations of the MapReduce interface are possible. The right choice depends on the environment. For example, one implementation may be suitable for a small shared-memory machine, another for a large NUMA multi-processor, and yet another for an even larger collection of networked machines.”

Dean and Ghemawat (2004, p. 3)

SLIDE 6

MapReduce implementations

  • 2004: MapReduce original publication
  • 2005: Hadoop
  • 2007: Phoenix
  • 2009: Phoenix Rebirth
  • 2010: Tiled-MapReduce
  • 2011: Phoenix++

SLIDE 8
Hadoop interface components

  • Language: Java
  • Mapper and Reducer
  • Writable
  • InputFormat
  • RecordReader
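To make the roles of InputFormat and RecordReader concrete: with Hadoop's plain-text input, each record is one line, keyed by the byte offset at which the line starts, which is why the WordCount mapper receives a `LongWritable` key and a `Text` value. The C++ function below only mimics that behavior on an in-memory string. It is not Hadoop code, `read_line_records` is a hypothetical name, and it ignores lines that cross split boundaries as well as `\r\n` line endings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Splits an input split into (byte offset, line) records, mimicking a
// line-oriented RecordReader. The '\n' terminator is not part of the value.
std::vector<std::pair<std::int64_t, std::string>> read_line_records(const std::string& split) {
    std::vector<std::pair<std::int64_t, std::string>> records;
    std::size_t start = 0;
    while (start < split.size()) {
        std::size_t end = split.find('\n', start);
        if (end == std::string::npos) end = split.size();  // last line without '\n'
        records.emplace_back(static_cast<std::int64_t>(start),
                             split.substr(start, end - start));
        start = end + 1;  // skip past the '\n'
    }
    return records;
}
```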

SLIDE 9

Phoenix++

  • Language: C++
  • Efficient key-value storage
  • Modular storage options: containers
  • Effective combiner stage
  • Aggressively calls the combiner after every map emit
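Combining right after each emit means intermediate storage holds one partial result per distinct key instead of one entry per emitted value. A minimal C++ sketch of that idea for a sum combiner follows; `SumCombiningContainer` is a hypothetical name and does not reproduce Phoenix++'s actual container or combiner classes.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <unordered_map>

// Hash-based container that folds each emitted value into the running
// total for its key immediately, instead of buffering a list of values.
template <typename K, typename V>
class SumCombiningContainer {
public:
    void emit(const K& key, const V& value) {
        // operator[] value-initializes a missing key to V{} (0 for numbers).
        storage_[key] += value;
    }
    const std::unordered_map<K, V>& results() const { return storage_; }
    std::size_t distinct_keys() const { return storage_.size(); }

private:
    std::unordered_map<K, V> storage_;  // one slot per distinct key
};
```

Six emits of three repeated words, for example, leave only as many entries as there are distinct words, which is what keeps memory proportional to the key set rather than to the number of emits.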

SLIDE 10

Phoenix++

  • Modular storage options
  • Specialized container types

Key distribution | Sample applications | Container type
*:* | Word Count | variable-size hash table
*:k | Histogram, Linear Regression, K-means, String Match | array with fixed mapping
1:1 | Matrix Multiplication, PCA | shared array
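For a *:k distribution the k keys are known in advance, so a container can index a fixed-size array directly instead of hashing. The sketch below assumes a hypothetical linear-regression-style job that accumulates five sums (Σx, Σy, Σx², Σy², Σxy) under five fixed keys; `FixedArrayContainer`, `map_point`, and `slope` are illustrative names, not Phoenix++ API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The k = 5 fixed keys of the hypothetical job.
enum Key : std::size_t { SX, SY, SXX, SYY, SXY, NUM_KEYS };

// Array container with fixed mapping: key i is stored in slot i.
template <std::size_t K>
struct FixedArrayContainer {
    std::array<double, K> slots{};  // value-initialized to 0.0
    void emit(std::size_t key, double value) { slots[key] += value; }  // assumes key < K
};

// Map step for one (x, y) point: emits to each of the five fixed keys.
void map_point(double x, double y, FixedArrayContainer<NUM_KEYS>& out) {
    out.emit(SX, x);
    out.emit(SY, y);
    out.emit(SXX, x * x);
    out.emit(SYY, y * y);
    out.emit(SXY, x * y);
}

// Least-squares slope computed from the accumulated sums over n points.
double slope(const FixedArrayContainer<NUM_KEYS>& c, double n) {
    const auto& s = c.slots;
    return (n * s[SXY] - s[SX] * s[SY]) / (n * s[SXX] - s[SX] * s[SX]);
}
```

With the points (1, 3), (2, 5), and (3, 7), which lie on y = 2x + 1, the sums are Σx = 6, Σy = 15, Σx² = 14, and Σxy = 34, giving a slope of (3·34 - 6·15) / (3·14 - 6²) = 2.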

SLIDE 11

Performance comparison

Hadoop vs. Phoenix++

Figure: 1 GB word count experiment using Phoenix++ and Hadoop on a multi-core architecture. The y-axis uses a logarithmic scale.

SLIDE 12
Related work

  • Important research has been done on improving Hadoop for high performance at the single-node level.
  • No research was found on building a unified MapReduce programming interface.

SLIDE 13

Figure: related work positioned along two dimensions, abstraction and performance on shared memory. Approaches shown: Hone, Appuswamy et al., Azwraith, Phoenix++, Hadoop, Phoenix, Phoenix 2, and Tiled-MapReduce.

SLIDE 14

Unified MapReduce programming interface

  • One single programming interface
  • Transformation rules for the Hadoop and Phoenix++ programming interfaces
  • State-of-the-art shared-memory and distributed solutions

SLIDE 15
Unified MapReduce programming interface

  • Focus on the MapReduce logic
  • An abstraction capable of keeping key performance components
  • Extensible in the future to cover new solutions and architectures (e.g., GPGPUs)

SLIDE 16

Unified MapReduce programming interface

@MapReduce<NAME, K_IN, V_IN, K_OUT, V_OUT, K_DIST> {
    @Map(key, value) {
        // Map code logic
    }
    @SumReducer
}
@Type name(attr_name: attr_type, …)
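K_DIST appears to carry the key-distribution information that Phoenix++ uses to select a container. One way a code generator could interpret it, assuming the three distributions in the Phoenix++ container table and the `"*:768"` form used by the Histogram example, is sketched below; `choose_container`, `ContainerKind`, and `ContainerChoice` are hypothetical names and not part of the proposed transformation rules.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Container kinds from the key-distribution table:
//   "*:*" -> variable-size hash table
//   "*:k" -> array with fixed mapping (k slots, e.g. "*:768")
//   "1:1" -> shared array
enum class ContainerKind { HashTable, FixedArray, SharedArray };

struct ContainerChoice {
    ContainerKind kind;
    std::size_t size;  // number of slots for FixedArray, 0 otherwise
};

// Returns std::nullopt when the K_DIST string is not recognized.
std::optional<ContainerChoice> choose_container(const std::string& k_dist) {
    if (k_dist == "*:*") return ContainerChoice{ContainerKind::HashTable, 0};
    if (k_dist == "1:1") return ContainerChoice{ContainerKind::SharedArray, 0};
    if (k_dist.rfind("*:", 0) == 0 && k_dist.size() > 2) {  // starts with "*:"
        const std::string digits = k_dist.substr(2);
        for (char ch : digits)
            if (ch < '0' || ch > '9') return std::nullopt;  // k must be a number
        return ContainerChoice{ContainerKind::FixedArray,
                               static_cast<std::size_t>(std::stoul(digits))};
    }
    return std::nullopt;
}
```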

SLIDE 17

Unified MapReduce programming interface

@MapReduce<NAME, K_IN, V_IN, K_OUT, V_OUT, K_DIST> {
    …
    @Reduce(key, values) {
        double product = 1
        for (int i = 0; i < length(values); i++)
            product *= values[i]
        emit(key, product)
    }
}
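Unlike @SumReducer, this reduction is a product, so it is written out as an explicit @Reduce block. For comparison, a hand-written C++ equivalent follows: it multiplies all values grouped under one key and emits a single (key, product) pair. The function name and the vector used as the emit target are assumptions, and this is not the code the transformation rules generate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Multiplies every value grouped under `key` and appends (key, product)
// to `out`, which stands in for emit(key, product).
void product_reduce(const std::string& key, const std::vector<double>& values,
                    std::vector<std::pair<std::string, double>>& out) {
    double product = 1;  // multiplicative identity, as in the DSL code
    for (std::size_t i = 0; i < values.size(); i++)  // size_t avoids a signed/unsigned mismatch
        product *= values[i];
    out.emplace_back(key, product);
}
```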

SLIDE 18

Transformation process

Stage | Elements
First | imports/includes
Second | @MapReduce, @Map, @Reduce, @Type, global variables
Third | unsolved keywords
Fourth | variable types
Fifth | functions
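The fourth stage, variable types, maps the types used in the unified interface to the types of each target. A hypothetical lookup table restricted to the correspondences visible in the Histogram and WordCount examples (for instance, `ulonglong` becomes `LongWritable` in Hadoop and `uint64_t` in Phoenix++) could look like the sketch below; the actual rule set is not shown in the slides and is likely broader.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

enum class Target { Hadoop, PhoenixPP };

// Returns the target type for a unified-interface type, or std::nullopt
// when this (deliberately partial) table has no rule for it.
std::optional<std::string> map_type(Target target, const std::string& unified) {
    // Taken from the Hadoop Histogram and WordCount examples.
    static const std::map<std::string, std::string> hadoop = {
        {"long", "LongWritable"}, {"int", "IntWritable"},
        {"ulonglong", "LongWritable"}, {"text", "Text"}, {"string", "Text"}};
    // Taken from the Phoenix++ Histogram example (key and value types).
    static const std::map<std::string, std::string> phoenixpp = {
        {"int", "intptr_t"}, {"ulonglong", "uint64_t"}};

    const auto& table = (target == Target::Hadoop) ? hadoop : phoenixpp;
    auto it = table.find(unified);
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```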

SLIDE 19

Unified interface - Histogram

@Type pixel(r: ushort, g: ushort, b: ushort)
@MapReduce<HistogramMR, long, pixel, int, ulonglong, "*:768">
    @Map(key, p)
        emit(p.b, 1)
        emit(p.g+256, 1)
        emit(p.r+512, 1)
    @SumReducer
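What the Histogram program computes can be checked with a plain C++ version: each pixel emits three keys (blue in 0..255, green in 256..511, red in 512..767) and @SumReducer adds up the ones, so the "*:768" distribution fits a fixed 768-slot array. `Pixel` mirrors `@Type pixel`; `histogram` is a hypothetical name, and channel values are assumed to be below 256.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Pixel {
    std::uint16_t r, g, b;  // ushort channels, as declared in @Type pixel
};

// Returns 768 counters: [0, 256) blue, [256, 512) green, [512, 768) red.
// Assumes every channel value is < 256; larger values would index out of range.
std::array<std::uint64_t, 768> histogram(const std::vector<Pixel>& pixels) {
    std::array<std::uint64_t, 768> bins{};  // zero-initialized counters
    for (const Pixel& p : pixels) {
        bins[p.b] += 1;        // emit(p.b, 1)
        bins[p.g + 256] += 1;  // emit(p.g+256, 1)
        bins[p.r + 512] += 1;  // emit(p.r+512, 1)
    }
    return bins;
}
```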

SLIDE 20

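Hadoop interface - Histogram

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class HistogramMR {
    // Pixel is a custom Writable type whose definition is not shown on the slide
    public static class Map extends Mapper<LongWritable, Pixel, IntWritable, LongWritable> {
        private final static LongWritable one = new LongWritable(1);

        @Override
        public void map(LongWritable key, Pixel p, Context context)
                throws IOException, InterruptedException {
            // Same bin layout as the unified version: blue 0-255, green 256-511, red 512-767
            context.write(new IntWritable(p.getB()), one);
            context.write(new IntWritable(p.getG() + 256), one);
            context.write(new IntWritable(p.getR() + 512), one);
        }
    }
}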

SLIDE 21

Phoenix++ interface - Histogram

class HistogramMR : public MapReduceSort<HistogramMR, pixel, intptr_t, uint64_t,
    array_container<intptr_t, uint64_t, sum_combiner, 768
#ifdef TBB
        , tbb::scalable_allocator
#endif
    > >
{
public:
    void map(data_type const& value, map_container& out) const {
        emit_intermediate(out, value.b, 1);
        emit_intermediate(out, value.g + 256, 1);
        emit_intermediate(out, value.r + 512, 1);
    }
};

SLIDE 22

Unified interface - WordCount

@MapReduce<WordCountMR, long, text, string, int>
    @Map(key, value)
        toupper(value)
        tokenize(value)
        emit(token, 1)
    @SumReducer
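A plain C++ rendering of the same computation: upper-case the value, split it into tokens, emit (token, 1) per token, and sum the ones per token. Splitting on whitespace is an assumption about what `tokenize` does, and `word_count` is a hypothetical name; the map and the sum reduction are fused into a single counting loop for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

std::map<std::string, int> word_count(std::string value) {
    // toupper(value): cast through unsigned char to keep std::toupper well-defined
    for (char& ch : value)
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));

    // tokenize(value) + emit(token, 1) + @SumReducer
    std::map<std::string, int> counts;
    std::istringstream tokens(value);
    std::string token;
    while (tokens >> token) counts[token] += 1;
    return counts;
}
```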

SLIDE 23

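Hadoop interface - WordCount

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMR {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            // Emit (word, 1) for every whitespace-separated token in the line
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    …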

SLIDE 24
Phoenix++ interface - WordCount

  • C++ includes: ~6 lines
  • MapReduce blocks: ~25 lines
  • Custom split: ~24 lines
  • Custom types (C++ struct): ~34 lines
  • TOTAL: 89 lines

SLIDE 25

Performance evaluation - Hadoop

Figure: mean execution time in seconds for the original and generated Hadoop code (30 executions) for Histogram, K-means, Linear Regression, Word Count, and Word Length.

SLIDE 26

Performance evaluation - Phoenix++

Figure: mean execution time in seconds for the original and generated Phoenix++ code (30 executions) for Histogram, K-means, Linear Regression, Word Count, and Word Length.

SLIDE 27

SLOCCount

  • Counts source lines of code (SLOC)
  • Estimates development effort based on the COCOMO model
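SLOCCount's default effort estimate is the basic COCOMO model in organic mode, effort = 2.4 × KSLOC^1.05 person-months. The sketch below applies that formula; the slides do not state which COCOMO parameters were used, so treat it as an illustration of the model rather than a reproduction of the reported figures. `cocomo_effort_pm` and `effort_reduction` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Basic COCOMO, organic mode: effort in person-months from SLOC.
double cocomo_effort_pm(double sloc) {
    const double ksloc = sloc / 1000.0;
    return 2.4 * std::pow(ksloc, 1.05);
}

// Relative effort reduction when code shrinks from `before` to `after` SLOC.
// The 2.4 factor cancels, leaving 1 - (after / before)^1.05.
double effort_reduction(double before, double after) {
    return 1.0 - cocomo_effort_pm(after) / cocomo_effort_pm(before);
}
```

Because the exponent is slightly above 1, the estimated effort falls a little faster than the line count when code shrinks.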

SLIDE 28

SLOC count and reduction

Application | Phoenix++ | Hadoop | Unified Interface | Reduction vs. Phoenix++ | Reduction vs. Hadoop
WordCount | 89 | 27 | 8 | 91.01% | 70.37%
WordLength | 95 | 33 | 14 | 85.26% | 57.58%
Histogram | 22 | 170 | 9 | 59.09% | 94.71%
K-means | 98 | 244 | 57 | 41.84% | 76.64%
Linear Regression | 31 | 171 | 18 | 41.94% | 89.47%
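Each reduction percentage in the table is the fraction of lines saved relative to the original version, (1 - unified / original) × 100; for WordCount against Phoenix++, 1 - 8/89 ≈ 91.01%. A short C++ check of that arithmetic, with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Percentage of SLOC saved by the unified version relative to the original.
double sloc_reduction_pct(double original, double unified) {
    return (1.0 - unified / original) * 100.0;
}

// Rounds to two decimals, as the table reports percentages.
double round2(double x) { return std::round(x * 100.0) / 100.0; }
```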

SLIDE 30

SLOC reduction

Figure: SLOC reduction of the unified interface version with curly braces compared to Hadoop and Phoenix++, per application (Histogram, K-means, Linear Regression, Word Count, Word Length). Reductions range from 41.84% to 94.71%.

SLIDE 31

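Figure: related work and the proposed unified interface positioned along two dimensions, abstraction and performance on shared memory. Approaches shown: Hone, Appuswamy et al., Azwraith, Phoenix++, Hadoop, Phoenix, Phoenix 2, Tiled-MapReduce, and the Unified Interface.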

SLIDE 32

Conclusions

  • MapReduce implementations for lower-level architectures, in particular, lose the abstraction that MapReduce originally aimed for
  • Through a comprehensive set of transformation rules, it is possible to effectively cover the components of the Phoenix++ and Hadoop programming interfaces

SLIDE 33

Conclusions

  • The performance evaluation shows a variation of less than 3% between the original and generated versions for all sample applications
  • SLOC and effort reductions from 41.84% up to 96.48% are achieved

SLIDE 34

Conclusions

  • Code written with the proposed unified interface can be reused to target different architectures
  • Phoenix++ provides some optimizations for NUMA architectures that are not supported by the transformation rules

SLIDE 35

Future work

  • Implementing the compiler and code generator based on the proposed transformation rules
  • Extending the transformation rules to support MapReduce solutions for other architectures (e.g., GPGPUs)
