A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures

SLIDE 1

Daniel Adornes, Dalvan Griebler, Cleverson Ledur, Luiz Gustavo Fernandes

Pontifical Catholic University of Rio Grande do Sul (PUCRS), Faculty of Informatics (FACIN), Computer Science Graduate Program (PPGCC), Parallel Application Modeling Group (GMAP)

SEKE 2015

A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures

Wyndham Pittsburgh University Center

SLIDE 2

Introduction

New challenges for software engineers and developers
Instead of being faster, computer architectures are more parallel
Depending on the amount of data to be processed, local memory is not enough and distributed systems become a necessity
Programming interfaces become prone to excessive complexity

Introduction Background Related Work Unified Interface Evaluation Conclusions

SLIDE 3

MapReduce abstract model

2004 - Google introduced the MapReduce abstract model, based on two operations, map and reduce, originally from functional programming languages
Simplicity and scalability for developing software to process large datasets
Aimed at, but not limited to, distributed environments
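The two operations can be illustrated with a minimal single-process sketch. This is not the Google, Hadoop, or Phoenix++ implementation; the class `MiniMapReduce`, its `run` method, and the `wordLengths` example are hypothetical names chosen only to show how map emits key-value pairs, how values are grouped by key, and how reduce folds each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical single-process illustration of the model: map, group by key, reduce.
final class MiniMapReduce {

    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map phase: every emitted (key, value) pair is appended to its key's group.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            mapper.accept(input, (k, v) ->
                    groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        // Reduce phase: one reducer call per distinct key.
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            result.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Example job: count how many words exist for each word length.
    static Map<Integer, Integer> wordLengths(List<String> lines) {
        return MiniMapReduce.<String, Integer, Integer, Integer>run(
                lines,
                (line, emit) -> {
                    for (String word : line.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            emit.accept(word.length(), 1);
                        }
                    }
                },
                (length, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }

    public static void main(String[] args) {
        System.out.println(wordLengths(List.of("a bb cc", "ddd")));
    }
}
```

Only the two lambdas are specific to the job; everything else is the reusable skeleton, which is the separation the abstract model aims for.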

SLIDE 4


MapReduce job execution flow - Dean and Ghemawat (2004, p. 3)

SLIDE 5

MapReduce abstract model

"Many different implementations of the MapReduce interface are possible. The right choice depends on the environment. For example, one implementation may be suitable for a small shared-memory machine, another for a large NUMA multi-processor, and yet another for an even larger collection of networked machines"

Dean and Ghemawat (2004, p. 3)

SLIDE 6

MapReduce implementations

2004 - MapReduce original publication
2005 - Hadoop
2007 - Phoenix
2009 - Phoenix Rebirth
2010 - Tiled-MapReduce
2011 - Phoenix++


SLIDE 8

Hadoop interface components

Language: Java
Mapper and Reducer
Writable
InputFormat
RecordReader

SLIDE 9

Phoenix++

Language: C++
Efficient key-value storage
Modular storage options: Containers
Effective combiner stage
Aggressively call combiner after every map emit

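The effect of calling the combiner after every map emit can be sketched as follows. This is a plain Java analogy, not Phoenix++ code (which is C++); `CombiningContainer` and its methods are hypothetical names. The point is that the container keeps one running value per key rather than a growing list of intermediate values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical container that applies a sum combiner on every emit,
// so it stores one running total per key instead of a list of values.
final class CombiningContainer {
    private final Map<String, Long> totals = new HashMap<>();
    private long emits = 0;

    void emit(String key, long value) {
        emits++;
        // Combine immediately: merge the new value into the stored total.
        totals.merge(key, value, Long::sum);
    }

    long get(String key) {
        return totals.getOrDefault(key, 0L);
    }

    // Number of entries actually held in memory (distinct keys).
    int storedEntries() {
        return totals.size();
    }

    // Number of emit calls made by the map code.
    long emitCount() {
        return emits;
    }

    public static void main(String[] args) {
        CombiningContainer out = new CombiningContainer();
        for (String word : "to be or not to be".split(" ")) {
            out.emit(word, 1);
        }
        System.out.println(out.emitCount() + " emits, " + out.storedEntries() + " stored entries");
    }
}
```

With this scheme, memory use grows with the number of distinct keys, not with the number of emits.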

SLIDE 10

Phoenix++

Modular storage options
Specialized Container types

Key distribution | Sample applications                                 | Container type
*:*              | Word Count                                          | variable-size hash table
*:k              | Histogram, Linear Regression, K-means, String Match | array with fixed mapping
1:1              | Matrix Multiplication, PCA                          | shared array
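For the *:k case, where the number of distinct keys is known in advance, an array with fixed mapping can replace the hash table. The sketch below is a Java analogy rather than the Phoenix++ `array_container`; the class name `FixedArrayContainer`, the `histogram` helper, and the assumption of 8-bit colour channels (values 0 to 255) are illustrative only. The key layout follows the Histogram example in this deck: blue in 0..255, green in 256..511, red in 512..767.

```java
// Hypothetical fixed-size container for a *:k key distribution:
// the key is used directly as an array index and values are summed in place.
final class FixedArrayContainer {
    private final long[] slots;

    FixedArrayContainer(int k) {
        slots = new long[k];
    }

    void emit(int key, long value) {
        if (key < 0 || key >= slots.length) {
            throw new IndexOutOfBoundsException("key out of range: " + key);
        }
        slots[key] += value; // sum combiner applied on every emit
    }

    long get(int key) {
        return slots[key];
    }

    // Assumption: each pixel is {r, g, b} with 8-bit channel values (0..255).
    static FixedArrayContainer histogram(int[][] pixels) {
        FixedArrayContainer out = new FixedArrayContainer(768);
        for (int[] p : pixels) {
            out.emit(p[2], 1);        // blue  -> keys 0..255
            out.emit(p[1] + 256, 1);  // green -> keys 256..511
            out.emit(p[0] + 512, 1);  // red   -> keys 512..767
        }
        return out;
    }

    public static void main(String[] args) {
        FixedArrayContainer h = histogram(new int[][] {{255, 0, 0}, {255, 0, 10}});
        System.out.println("red=255 count: " + h.get(767));
    }
}
```

No hashing or resizing is needed, which is why a fixed mapping is attractive whenever k is known at compile time.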

SLIDE 11

Performance comparison

Hadoop vs Phoenix++

Experiment of 1 GB word count using Phoenix++ and Hadoop on a multi-core architecture. The y-axis is in a logarithmic scale.


SLIDE 12

Related work

Significant research exists on improving Hadoop for high performance at the single-node level.
No research was found on building a unified MapReduce programming interface.

SLIDE 13

[Figure: related work positioned along two axes, Abstraction and Performance on shared-memory. Systems shown: Hone, Appuswamy et al., Azwraith, Phoenix++, Hadoop, Phoenix, Phoenix 2, Tiled-MapReduce]

SLIDE 14

Unified MapReduce programming interface

One single programming interface
Transformation rules for Hadoop and Phoenix++ programming interfaces
Shared-memory and distributed state-of-the-art solutions

SLIDE 15

Unified MapReduce programming interface

Focus on MapReduce logic
Abstraction capable of keeping key performance components
Can later be extended to cover new solutions and architectures (e.g., GPGPUs)

SLIDE 16

Unified MapReduce programming interface

    @MapReduce<NAME, K_IN, V_IN, K_OUT, V_OUT, K_DIST> {
      @Map(key, value) {
        // Map code logic
      }
      @SumReducer
    }

    @Type name(attr_name: attr_type, …)

SLIDE 17

Unified MapReduce programming interface

    @MapReduce<NAME, K_IN, V_IN, K_OUT, V_OUT, K_DIST> {
      …
      @Reduce(key, values) {
        double product = 1
        for (int i = 0; i < length(values); i++)
          product *= values[i]
        emit(key, product)
      }
    }
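For readers less familiar with the DSL notation, the custom @Reduce body above multiplies all values of a key. A plain Java equivalent of that logic, with the hypothetical class name `ProductReducer`, might look like this; it is only an illustration of the reduce semantics, not output of the proposed transformation rules.

```java
import java.util.List;

// Hypothetical plain-Java counterpart of the product reducer:
// folds all values of one key into their product.
final class ProductReducer {

    static double reduce(List<Double> values) {
        double product = 1.0; // multiplicative identity, as in the DSL example
        for (double v : values) {
            product *= v;
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(2.0, 3.0, 4.0)));
    }
}
```

An empty list of values yields 1.0, the starting value of the accumulator.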

SLIDE 18

Transformation process

Stage  | Elements
First  | imports/includes
Second | @MapReduce, @Map, @Reduce, @Type, global variables
Third  | unsolved keywords
Fourth | variable types
Fifth  | functions
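The variable-types stage can be pictured as a lookup from DSL type names to target-framework types. The sketch below is not the authors' rule set: the class `TypeMappingSketch`, the table contents, and the rule that a custom @Type name is simply capitalized are assumptions inferred from the Histogram and WordCount examples in this deck. It only builds strings and does not depend on Hadoop.

```java
import java.util.Map;

// Hypothetical illustration of a "variable types" transformation step:
// DSL type names are rewritten to Hadoop Writable class names.
final class TypeMappingSketch {

    // Assumed mapping, inferred from the Histogram and WordCount slides.
    private static final Map<String, String> HADOOP_TYPES = Map.of(
            "long", "LongWritable",
            "ulonglong", "LongWritable",
            "int", "IntWritable",
            "string", "Text",
            "text", "Text");

    static String toHadoop(String dslType) {
        String mapped = HADOOP_TYPES.get(dslType);
        if (mapped != null) {
            return mapped;
        }
        // Assumption: a custom @Type becomes a class with a capitalized name (pixel -> Pixel).
        return Character.toUpperCase(dslType.charAt(0)) + dslType.substring(1);
    }

    // Builds the generic part of a Hadoop Mapper declaration from the four DSL types.
    static String mapperSignature(String kIn, String vIn, String kOut, String vOut) {
        return "Mapper<" + toHadoop(kIn) + ", " + toHadoop(vIn) + ", "
                + toHadoop(kOut) + ", " + toHadoop(vOut) + ">";
    }

    public static void main(String[] args) {
        System.out.println(mapperSignature("long", "pixel", "int", "ulonglong"));
        System.out.println(mapperSignature("long", "text", "string", "int"));
    }
}
```

A real code generator would need a second table for Phoenix++ (for example mapping to C++ integer types) and would run only after keywords and blocks have been resolved in the earlier stages.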

SLIDE 19

Unified interface - Histogram

    @type pixel(r: ushort, g: ushort, b: ushort)

    @MapReduce<HistogramMR, long, pixel, int, ulonglong, "*:768">
      @Map(key, p)
        emit(p.b, 1)
        emit(p.g+256, 1)
        emit(p.r+512, 1)
      @SumReducer

SLIDE 20

Hadoop interface - Histogram

    public class HistogramMR {

      public static class Map
          extends Mapper<LongWritable, Pixel, IntWritable, LongWritable> {

        private final static LongWritable one = new LongWritable(1);

        @Override
        public void map(LongWritable key, Pixel p, Context context)
            throws IOException, InterruptedException {
          context.write(new IntWritable(p.getR()), one);
          context.write(new IntWritable(p.getG() + 256), one);
          context.write(new IntWritable(p.getB() + 512), one);
        }
      }
    }

SLIDE 21

Phoenix++ interface - Histogram

    class HistogramMR : public MapReduceSort<HistogramMR, pixel, intptr_t, uint64_t,
        array_container<intptr_t, uint64_t, sum_combiner, 768
    #ifdef TBB
        , tbb::scalable_allocator
    #endif
        > >
    {
    public:
      void map(data_type const& value, map_container& out) const {
        emit_intermediate(out, value.b, 1);
        emit_intermediate(out, value.g+256, 1);
        emit_intermediate(out, value.r+512, 1);
      }
    };

SLIDE 22

Unified interface - WordCount

    @MapReduce<WordCountMR, long, text, string, int>
      @Map(key, value)
        toupper(value)
        tokenize(value)
          emit(token, 1)
      @SumReducer
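The three DSL steps (toupper, tokenize, emit) combined with @SumReducer amount to the following logic. This is a framework-free Java sketch under the assumption that tokens are separated by whitespace; the class name `WordCountSketch` is hypothetical and the code is not what the transformation rules would generate.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java rendering of the WordCount logic:
// upper-case the input, split it into tokens, emit (token, 1), sum per token.
final class WordCountSketch {

    static Map<String, Integer> count(String value) {
        Map<String, Integer> counts = new TreeMap<>();
        // toupper(value)
        String upper = value.toUpperCase(Locale.ROOT);
        // tokenize(value): whitespace-separated tokens (an assumption of this sketch)
        StringTokenizer tokenizer = new StringTokenizer(upper);
        while (tokenizer.hasMoreTokens()) {
            // emit(token, 1) followed by the sum reducer
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not To BE"));
    }
}
```

Because of the upper-casing step, "to" and "To" are counted under the same key.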

SLIDE 23

Hadoop interface - WordCount

    public class WordCountMR {

      public static class Map
          extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          String line = value.toString();
          StringTokenizer tokenizer = new StringTokenizer(line);
          while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
          }
        }
      }
      …

SLIDE 24

Phoenix++ interface - WordCount

C++ includes              | ± 6 lines
MapReduce blocks          | ± 25 lines
Custom split              | ± 24 lines
Custom types - C++ struct | ± 34 lines
TOTAL                     | 89 lines

SLIDE 25

Performance evaluation - Hadoop

[Figure: mean execution time in seconds for original and generated Hadoop code (30 executions). Axes: Application (Histogram, Kmeans, Linear Regression, Word Count, Word Length) and Execution time in seconds. Legend (Version): Generated, Original]

SLIDE 26

Performance evaluation - Phoenix++

[Figure: mean execution time in seconds for original and generated Phoenix++ code (30 executions). Axes: Application (Histogram, Kmeans, Linear Regression, Word Count, Word Length) and Execution time in seconds. Legend (Version): Generated, Original]

SLIDE 27

SLOCCount

Source Lines of Code counting
Effort estimate based on COCOMO model


SLIDE 28

SLOC count and reduction

Application       | Phoenix++ | Hadoop | Unified Interface | Reduction compared to Phoenix++ | Reduction compared to Hadoop
WordCount         | 89        | 27     | 8                 | 91.01%                          | 70.37%
WordLength        | 95        | 33     | 14                | 85.26%                          | 57.58%
Histogram         | 22        | 170    | 9                 | 59.09%                          | 94.71%
K-means           | 98        | 244    | 57                | 41.84%                          | 76.64%
Linear Regression | 31        | 171    | 18                | 41.94%                          | 89.47%
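The reduction percentages are consistent with the usual relative-difference formula, reduction = (original SLOC - unified SLOC) / original SLOC. The small Java sketch below reproduces that arithmetic from the SLOC counts; the class and method names are hypothetical, and the formula is inferred from the numbers rather than stated on the slides.

```java
import java.util.Locale;

// Hypothetical helper that recomputes SLOC reduction from two line counts.
final class SlocReduction {

    // Percentage of lines saved when going from originalSloc to unifiedSloc.
    static double reductionPercent(int originalSloc, int unifiedSloc) {
        return 100.0 * (originalSloc - unifiedSloc) / originalSloc;
    }

    // Formats with two decimals, matching the precision used in the table.
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    public static void main(String[] args) {
        // WordCount: 89 lines in Phoenix++, 27 in Hadoop, 8 in the unified interface.
        System.out.println("vs Phoenix++: " + format(reductionPercent(89, 8)));
        System.out.println("vs Hadoop:    " + format(reductionPercent(27, 8)));
    }
}
```

For WordCount this gives 81/89 and 19/27 of the lines saved, which rounds to the 91.01% and 70.37% reported.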


SLIDE 30

SLOC reduction

[Figure: SLOC reduction for the interface version with curly braces. Axes: Application (Histogram, K-means, Linear Regression, Word Count, Word Length) and Reduced SLOC. Legend (Framework): Hadoop, Phoenix++. Plotted values range from 41.84% to 94.71%]

SLIDE 31

[Figure: related work positioned along two axes, Abstraction and Performance on shared-memory, now including the proposed Unified Interface. Systems shown: Unified Interface, Hone, Appuswamy et al., Azwraith, Phoenix++, Hadoop, Phoenix, Phoenix 2, Tiled-MapReduce]

SLIDE 32

Conclusions

MapReduce implementations, particularly those for lower-level architectures, lose the abstraction MapReduce originally aimed for
Through a comprehensive set of transformation rules, it is possible to effectively cover the components of the Phoenix++ and Hadoop programming interfaces

SLIDE 33

Conclusions

Performance evaluation shows less than 3% variance between original and generated versions for all sample applications
SLOC and effort reductions ranging from 41.84% up to 96.48% are achieved

SLIDE 34

Conclusions

Code written with the proposed unified interface can be reused for addressing different architectures
Phoenix++ provides some optimizations for NUMA architectures, which are not supported by the transformation rules

SLIDE 35

Future work

Actual construction of the compiler and code generator based on the proposed transformation rules
Extension of the transformation rules for compatibility with MapReduce solutions for different architectures (e.g., GPGPUs)

