Hadoop Map Reduce
MapReduce 2-in-1
A programming paradigm
A query execution engine
A kind of functional programming
We focus on the MapReduce execution engine of Hadoop through YARN
Logical View of MapReduce
During MapReduce, the …
Map Reduce
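The logical view can be sketched in plain Java without Hadoop. This is only an illustration of the paradigm, not Hadoop's API: the class `LogicalWordCount` and its methods are hypothetical names, and the whole computation runs in one process. The "map" step turns each line into words, and the "reduce" step collapses each group of identical words into a count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, single-process illustration of the map/reduce paradigm.
class LogicalWordCount {
    // "map": turn one input record (a line) into zero or more words.
    static Stream<String> map(String line) {
        return Arrays.stream(line.trim().split("\\s+")).filter(w -> !w.isEmpty());
    }

    // Group identical words, then "reduce" each group to its size.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(LogicalWordCount::map)
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or", "not to be")));
    }
}
```

Running `main` prints `{be=2, not=1, or=1, to=2}`. A `TreeMap` is used only so that the output order is deterministic.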
[Figure labels: Developer, Driver, MR Program, MR Job, Master node, Slave nodes]
Driver → Job submission → Job preparation → Map → Shuffle → Reduce → Cleanup
Key (String)    Value (String)
Input           hdfs://user/eldawy/README.txt
Output          hdfs://user/eldawy/wordcount
Mapper          edu.ucr.cs.cs167.eldawy.WordCount
Reducer         …
JAR File        …
(Two entries in the figure are marked "User-defined".)
[Figure labels: Master node, Serialized over network]
[Figure labels: Configuration, JAR File, Master node]
HDFS → InputFormat#getSplits() → Split1, Split2, .., SplitM
Mappers: Mapper1, Mapper2, .., MapperM
FileInputSplit: Path, Start, End
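The split computation can be sketched as follows. This is a simplified stand-in, not Hadoop's `InputFormat` implementation: `SplitSketch`, `Split`, and the fixed `splitSize` parameter are hypothetical, and the sketch ignores block locations and other refinements. It only shows how a file is cut into (Path, Start, End) ranges, each of which would be handed to one mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of cutting one file into byte-range splits.
class SplitSketch {
    // Stand-in for the slide's split description: a path plus a byte range [start, end).
    static final class Split {
        final String path;
        final long start;
        final long end;

        Split(String path, long start, long end) {
            this.path = path;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return path + " [" + start + ", " + end + ")";
        }
    }

    // Cut [0, fileLength) into consecutive ranges of at most splitSize bytes.
    static List<Split> getSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long end = Math.min(start + splitSize, fileLength);
            splits.add(new Split(path, start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        // The file length and split size here are made-up numbers for illustration.
        getSplits("hdfs://user/eldawy/README.txt", 300, 128).forEach(System.out::println);
    }
}
```

With a 300-byte file and 128-byte splits, the sketch yields three ranges: [0, 128), [128, 256), and [256, 300), which would correspond to three map tasks.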
[Figure: Master node with the input splits (map tasks) IS1, IS2, IS3, IS4, IS5, …, ISM]
[Figure: map tasks Map1, Map2, Map3, …, MapM and reduce tasks Reduce1, Reduce2, …, ReduceN]
Map task i:
[Figure: Input Split → map → (k, v) pairs → Partition → one partition per reduce task (Reduce1, Reduce2, …, ReduceN), with keys ranging from kA to kZ inside each partition]
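The partition step can be sketched in plain Java. The arithmetic in `getPartition` mirrors what Hadoop's default `HashPartitioner` does with a key's hash code, but everything else is an assumption for illustration: `PartitionSketch` is a hypothetical class, keys are plain `String`s rather than Hadoop `Writable` types, values are omitted, and partitions are numbered from 0.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of how one map task routes its output keys to reducers.
class PartitionSketch {
    // Clear the sign bit of the hash, then take the remainder by the reducer count.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Distribute the keys emitted by one map task into one bucket per reducer.
    static List<List<String>> partition(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(getPartition(key, numReduceTasks)).add(key);
        }
        // Sort each bucket by key before it is made available to its reducer.
        for (List<String> bucket : buckets) {
            Collections.sort(bucket);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of("c", "a", "b", "a"), 3));
    }
}
```

Because the partition depends only on the key, every record with the same key lands at the same reducer regardless of which map task emitted it.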
Reduce task j:
[Figure: Copy → Sort → Reduce. The (k, v) pairs of part1, part2, part3, …, partM are copied from Map1, Map2, Map3, …, MapM]
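The sort stage on the reduce side can be sketched as a merge of already-sorted inputs. This is a minimal illustration under assumptions, not Hadoop's shuffle code: `MergeSketch` is a hypothetical class, each inner list stands for the key-sorted partition copied from one map task, and values are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of merging the sorted parts a reducer copied from M map tasks.
class MergeSketch {
    // Merge M key-sorted lists into a single key-sorted list.
    static List<String> merge(List<List<String>> sortedParts) {
        // Each queue entry is {partIndex, positionInPart}, ordered by the key it points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> sortedParts.get(a[0]).get(a[1])
                        .compareTo(sortedParts.get(b[0]).get(b[1])));
        for (int p = 0; p < sortedParts.size(); p++) {
            if (!sortedParts.get(p).isEmpty()) {
                heads.add(new int[] {p, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<String> part = sortedParts.get(head[0]);
            merged.add(part.get(head[1]));
            // Advance within the same part if it has more keys.
            if (head[1] + 1 < part.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(List.of("a", "c"), List.of("b", "c"), List.of("a"))));
    }
}
```

Running `main` prints `[a, a, b, c, c]`. Merging is enough here because each part arrives already sorted, so no full re-sort of the combined data is needed.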
[Figure: the sorted (k, v) pairs form groups by key (k1, k2, k3, …, kN), and reduce is applied to each group]
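The grouping shown in the figure can be sketched as a single pass over key-sorted records. The class `GroupReduceSketch`, the parallel-list representation, and the summing `reduce` function are assumptions chosen for illustration; a real job would supply its own reduce logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of calling reduce once per run of equal keys.
class GroupReduceSketch {
    // Example reduce function: sum all values that share one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Input must already be sorted by key; keys.get(i) pairs with values.get(i).
    static Map<String, Integer> run(List<String> keys, List<Integer> values) {
        Map<String, Integer> output = new LinkedHashMap<>();
        int i = 0;
        while (i < keys.size()) {
            String key = keys.get(i);
            List<Integer> group = new ArrayList<>();
            // Collect the run of consecutive records with the same key.
            while (i < keys.size() && keys.get(i).equals(key)) {
                group.add(values.get(i));
                i++;
            }
            // One reduce call per distinct key.
            output.put(key, reduce(key, group));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "k1", "k2", "k2", "k3", "k3", "k3");
        List<Integer> values = List.of(1, 1, 1, 1, 1, 1, 1);
        System.out.println(run(keys, values));
    }
}
```

With the seven records from the figure and a value of 1 for each, `main` prints `{k1=2, k2=2, k3=3}`. The single pass works only because the input is sorted, which is why the sort stage precedes reduce.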