- M. Carmen Ruiz, Diego Pérez and Damas Gruska
CPA 2017: The 39th Communicating Process Architectures, Malta, 20-23 August
Outline
1. Motivation
2. Our work
3. Formal Modelling of Map/Reduce
4. ...
Social media data allows researchers to conduct a wide variety of research analyses and has proven to be of great interest, for instance for longitudinal analysis.
Processing this huge amount of data becomes a Big-Data problem, since the data is produced continuously all around the world.
Map/Reduce has a widely used Open-Source implementation (Hadoop).
Cloud computing features have become of interest in conjunction with Hadoop, such as high availability and distributed environment provisioning.
A set of virtual machines (a virtual cluster) must be hired in order to perform any Hadoop application execution.
The resources dedicated to this task (number of virtual machines) must be taken into account in order to minimise the cost, since the number of virtual machines hired for a certain study is directly related to its cost.
The basis of Map/Reduce consists of splitting the input data into data chunks that are distributed to the worker nodes, where they are processed. Later on, the results are combined and collected.
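As a minimal illustration of this pattern (a sketch only, not Hadoop's actual API; the chunk count, worker count and per-chunk work below are arbitrary choices), in Python:

```python
from multiprocessing import Pool

def map_chunk(chunk):
    # Work done on a worker node for one data chunk
    # (here: a toy sum of squares).
    return sum(x * x for x in chunk)

def reduce_results(partials):
    # Combine and collect the partial results from the workers.
    return sum(partials)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n = 8                                    # number of chunks/workers
    chunks = [data[i::n] for i in range(n)]  # split input into chunks
    with Pool(processes=n) as pool:
        partials = pool.map(map_chunk, chunks)  # distribute to workers
    print(reduce_results(partials))             # collect the combined result
```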
Syntax: P ::= stop | a.P | <b, α>.P | P ⊕ P | P ||A P | recX.P

Types of actions:
N = {N1, N2, ..., Nm}: number of resources of each type
Z = {Z1, Z2, ..., Zm}, where Zi = {b1, b2, ..., bi}: actions which need resources of type i
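To make the grammar concrete, here is a small sketch (ours, not part of the authors' tool) that encodes BTC-style terms as Python dataclasses, ending with an example term:

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

Proc = Union["Stop", "Act", "TimedAct", "Choice", "Par", "Rec", "Var"]

@dataclass(frozen=True)
class Stop:                 # stop
    pass

@dataclass(frozen=True)
class Act:                  # a.P  (action prefix)
    name: str
    cont: Proc

@dataclass(frozen=True)
class TimedAct:             # <b, α>.P  (action b lasting α time units)
    name: str
    duration: float
    cont: Proc

@dataclass(frozen=True)
class Choice:               # P ⊕ P
    left: Proc
    right: Proc

@dataclass(frozen=True)
class Par:                  # P ||A P  (parallel, synchronising on A)
    left: Proc
    right: Proc
    sync: FrozenSet[str]

@dataclass(frozen=True)
class Rec:                  # recX.P
    var: str
    body: Proc

@dataclass(frozen=True)
class Var:                  # process variable X
    name: str

# Example term: a worker that repeatedly performs a timed map action,
# i.e. recX.<map, tM>.X (the duration value here is arbitrary).
worker = Rec("X", TimedAct("map", 64.044, Var("X")))
```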
Tool pipeline: Specification Wizard → specification file → Syntax Analyser (reports a Syntax Error, or yields the System Specification) → Graph Generator → Performance Evaluator → Results.
Underlying techniques: BTC syntax and operational semantics, branch-and-bound techniques, the DBL scheme, parallel computing, grid computing.
[[sys_Map_Red]]Z,N ≡ [[ BLOCK || BLOCK || ... || BLOCK || OVERLAP || SYN_CLEANUP || SYN_SETUP ]]{act_worker},{n}

BLOCK       ≡ SETUP . MAP . REDUCE . CLEAN
SETUP       ≡ <setup, tS> . synR . synRR
MAP         ≡ <act_worker> . synS . <recordReader, tRr> . <map, tM> . <act_worker> . synSS
REDUCE      ≡ <act_worker> . <shuffle, tSh> . <sort, tSrt> . <reduce, tR> . <output, tOpt> . <act_worker>
CLEAN       ≡ synC . synCC . <clean, tC>
OVERLAP     ≡ synS . ... . synS . synSS . ... . synSS
SYN_CLEANUP ≡ synC . ... . synC . synCC . ... . synCC
SYN_SETUP   ≡ synR . ... . synR . synRR . ... . synRR
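Read informally, and setting aside the map/reduce overlap that the OVERLAP process captures, the makespan for B blocks on n workers is roughly (a back-of-envelope reading on our part, not a formula stated in the slides):

T ≈ tS + ⌈B/n⌉ · (tRr + tM + tSh + tSrt + tR + tOpt) + tC

The tool itself derives the exact figure from the operational semantics rather than from such a closed form.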
PRECONDITIONS OF EACH TRANSITION
Main Task   Sub-Task        Parameter
Setup       Setup           tS
Map         Record Reader   tRr
            Map             tM
Reduce      Shuffle         tSh
            Sort            tSrt
            Reduce          tR
            Output          tOpt
Clean Up    Clean Up        tC
The goal is to obtain the utmost performance with the minimum number of resources, or at the minimum cost.
Before the actual execution, a performance study must be performed to determine the resources needed, which depend on the requirements, the type of application and the volume of data to be processed.
To validate the model, we chose a concrete application: video encoding. The current standard is H.264; however, its successor, known as H.265 (or HEVC), has been shown to improve on H.264 and represents the future of video encoding.
The encoder was run as a Hadoop application, exploiting the distributed processing it provides; the encoded video sequence used was "BasketballDrill" (832x480).
The times measured for each of the sub-phases that make up the model of Map/Reduce are shown below.
Main Task   Sub-Task        Parameter   Time (ms)
Setup       Setup           tS          125
Map         Record Reader   tRr         506
            Map             tM          64044
Reduce      Shuffle         tSh         16187
            Sort            tSrt        500
            Reduce          tR          75
            Output          tOpt        65
Clean Up    Clean Up        tC          125
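Plugging these measurements into the rough makespan formula sketched after the model gives a quick sanity check (a toy estimator only; the tool evaluates the full model, including the map/reduce overlap, and the block count used below is our assumption, as the slides do not state it):

```python
import math

# Measured sub-phase times in ms, from the table above.
tS, tRr, tM = 125, 506, 64044
tSh, tSrt, tR, tOpt, tC = 16187, 500, 75, 65, 125

def estimate_s(blocks: int, workers: int) -> float:
    """Rough makespan: blocks are processed in waves of `workers`;
    the overlap captured by the OVERLAP process is ignored."""
    per_block = tRr + tM + tSh + tSrt + tR + tOpt  # 81377 ms per block
    waves = math.ceil(blocks / workers)
    return (tS + waves * per_block + tC) / 1000.0

for n in (2, 4, 8, 16):
    print(f"{n:2d} workers -> {estimate_s(16, n):6.1f} s")
```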
- To replace the input parameters in the Map/Reduce model
- To provide the number of data blocks
- To establish the number of workers (variable n)
(this information is included into the model)
The result is the time that the application takes to analyse this number of chunks.
Data has been obtained for different configurations (1 master VM + # worker VMs):

Workers   Execution Time   Improvement
 2        10m 51s          -
 3        07m 20s          32.41%
 4        05m 25s          26.14%
 5        04m 35s          15.38%
 6        03m 47s          17.45%
 7        03m 30s           7.49%
 8        02m 43s          22.38%
 9        02m 28s           9.20%
10        02m 26s           1.35%
11        02m 26s           0%
12        02m 26s           0%
13        02m 26s           0%
14        02m 26s           0%
15        02m 26s           0%
16        01m 21s          44.52%
[Chart: execution time per configuration, real observation vs. formal model]
The second case study deals with the analysis of social media data. The aim is to translate the underlying social observation and analysis mechanisms into an embedded research tool that supports the development and execution of social media research analysis.
The input data has been split into 300 blocks of equal size.
The model follows the specifications of Amazon EC2 M1.small instances, and the hiring costs stated by Amazon have been considered. The experiment estimates the analysis of 100 million Tweets (divided into 300 blocks) within the Amazon EC2 Cloud. Data has been obtained for different configurations (1 master VM + # worker VMs):

Workers   Execution Time   Price
2         7h 15m 02s       $43.92
3         4h 50m 05s       $36.60
4         3h 37m 33s       $36.60
5         2h 54m 03s       $32.94
6         2h 25m 15s       $38.43
7         2h 04m 45s       $43.92
8         1h 49m 02s       $32.94
9         1h 36m 59s       $36.60
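The non-monotonic prices in the table come from per-started-hour billing: hiring more VMs can be cheaper when it pushes the run under the next hour boundary. A sketch of that calculation (the hourly rate below is back-calculated from the table's figures, not quoted from Amazon's price list):

```python
import math

RATE = 1.83  # $ per VM-hour, inferred from the table above (an assumption)

def price(workers: int, runtime_h: float) -> float:
    vms = workers + 1              # 1 master VM + worker VMs
    billed = math.ceil(runtime_h)  # every started hour is billed in full
    return vms * billed * RATE

# 2 workers, 7h 15m -> 3 VMs x 8 h; 5 workers, 2h 54m -> 6 VMs x 3 h
print(f"{price(2, 7.25):.2f} {price(5, 2.9):.2f}")  # 43.92 32.94
```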
The tool allows managers to evaluate cost and performance in terms of the deployment strategy, or to choose the best deployment strategy in terms of the expected cost/performance, with the objective of meeting deadline and cost restrictions.
In detail:
- A formal model of Map/Reduce as implemented by the Hadoop module.
- Savings in resources, by helping to choose the optimal resource hiring.