Mixing Cloud and Grid Resources for Many Task Computing
David Abramson
Monash e-Science and Grid Engineering Lab (MeSsAGE Lab), Faculty of Information Technology
Science Director, Monash e-Research Centre
ARC Professorial Fellow
Introduction
[Figure: fitted effective potential U_eff(r) against radius r (a.u.), in a form built from two exponential terms with fit parameters A1, A2, B1, B2; molecular fragments containing C, H and F atoms are shown alongside]
Nimrod/O Results
[Figure: the Nimrod tool family (Nimrod/G, Nimrod/O, Nimrod/E, Nimrod Portal), driven by a plan file and reaching the grid middleware through actuators]
parameter pressure float range from 5000 to 6000 points 4
parameter concent float range from 0.002 to 0.005 points 2
parameter material text select anyof "Fe" "Al"
task main
    copy compModel node:compModel
    copy inputFile.skel node:inputFile.skel
    node:substitute inputFile.skel inputFile
    node:execute ./compModel < inputFile > results
    copy node:results results.$jobname
endtask
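The plan file declares three parameters, and Nimrod generates one job per combination of their values. The Python sketch below illustrates that expansion, assuming `points N` means N evenly spaced values including both end points. The helper names and the `${name}` placeholder convention in `substitute` are hypothetical, not Nimrod's own.

```python
from itertools import product


def float_range(start: float, stop: float, points: int) -> list[float]:
    """Evenly spaced values from start to stop inclusive (assumed meaning of `points N`)."""
    if points == 1:
        return [start]
    step = (stop - start) / (points - 1)
    return [start + i * step for i in range(points)]


# Parameter definitions mirroring the plan file.
parameters = {
    "pressure": float_range(5000, 6000, 4),
    "concent": float_range(0.002, 0.005, 2),
    "material": ["Fe", "Al"],  # select anyof "Fe" "Al"
}


def expand_jobs(params: dict[str, list]) -> list[dict]:
    """Return one job (a dict of parameter values) per point in the cross product."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]


def substitute(template: str, job: dict) -> str:
    """Hypothetical stand-in for node:substitute: replace ${name} with the job's value."""
    text = template
    for name, value in job.items():
        text = text.replace("${" + name + "}", str(value))
    return text


if __name__ == "__main__":
    jobs = expand_jobs(parameters)
    print(len(jobs))  # 4 * 2 * 2 = 16
    print(substitute("P=${pressure} C=${concent} M=${material}", jobs[0]))
```

With 4 pressure points, 2 concentrations and 2 materials, the sweep yields 16 independent jobs, any of which can run on any available machine.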
– Prepare jobs using the portal
– Jobs are scheduled and executed dynamically
– Jobs are sent to available machines
– Results are displayed and interpreted
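Because the jobs are independent, a dispatcher can hand the next job to whichever machine becomes free first. The sketch below simulates that greedy policy with a heap; the machine names and per-job runtimes are made up for illustration, and Nimrod/G's own scheduling can also weigh deadlines and costs.

```python
import heapq


def simulate_dispatch(n_jobs: int, machines: dict[str, float]) -> tuple[dict[str, int], float]:
    """Greedy dynamic dispatch: each job goes to the machine that becomes free first.

    `machines` maps a (hypothetical) machine name to its runtime per job in minutes.
    Returns the jobs run on each machine and the makespan (finish time of the last job).
    """
    # Heap of (time the machine next becomes free, machine name).
    free_at = [(0.0, name) for name in machines]
    heapq.heapify(free_at)
    counts = {name: 0 for name in machines}
    makespan = 0.0
    for _ in range(n_jobs):
        start, name = heapq.heappop(free_at)
        finish = start + machines[name]
        counts[name] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, name))
    return counts, makespan
```

For example, `simulate_dispatch(6, {"fast": 1.0, "slow": 2.0})` gives the fast machine four jobs and the slow one two, finishing at minute 4: faster machines naturally absorb more of the work.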
Example applications: Antenna Design, Drug Docking, Aerofoil Design
[Figure: Nimrod/G architecture. Components: Nimrod/G client and GUI; Enfuzion API + database; generator and creator working from the plan file and run file; job scheduler, agent scheduler, DB server and agents; Globus, Condor and Legion actuators; grid information server(s); grid middleware; Globus- (G), Condor- (C) and Legion- (L) enabled nodes, each with an RM and TS. The layers are marked Level 1 to Level 3. RM: Local Resource Manager; TS: Trade Server.]
[Figure: Kepler architecture. The Ptolemy core with Kepler core extensions (Authentication, Vergil GUI, SMS, Kepler Object Manager, Provenance Framework, Documentation, Smart Re-run / Failure Recovery, Actor & Data SEARCH, Type System Ext) and further Kepler GUI extensions]
Nimrod Director
[Figure: in Kepler, the Nimrod Director clones an actor (Clone 1, Clone 2, Clone 3) so that the copies run concurrently]
– On a computer with 8 cores (2 x quad-core), execution is 8 times faster
– No actor parameters need setting
– No difference from the parameter sweep actors
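Cloning works because each copy of an actor can process a different input independently. A minimal Python sketch, assuming a hypothetical `SquareActor` with a `fire` method (not Kepler's Java actor API), where each clone handles its own share of the tokens in a thread pool:

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class SquareActor:
    """Hypothetical actor: fire() turns one input token into one output token."""

    def __init__(self) -> None:
        self.fired = 0  # per-instance state; each clone gets its own copy

    def fire(self, token: int) -> int:
        self.fired += 1
        return token * token


def run_with_clones(actor, tokens: list[int], n_clones: int = 8) -> list[int]:
    """Run n_clones deep copies of `actor` concurrently; results keep input order."""
    clones = [copy.deepcopy(actor) for _ in range(n_clones)]

    def work(k: int) -> list[tuple[int, int]]:
        # Clone k handles tokens k, k + n_clones, k + 2 * n_clones, ... sequentially,
        # so no clone is ever fired from two threads at once.
        return [(i, clones[k].fire(tokens[i])) for i in range(k, len(tokens), n_clones)]

    results = [0] * len(tokens)
    with ThreadPoolExecutor(max_workers=n_clones) as pool:
        for part in pool.map(work, range(n_clones)):
            for i, value in part:
                results[i] = value
    return results
```

Threads suit actors that launch external programs, as Nimrod jobs usually do; a CPU-bound pure-Python actor would need `ProcessPoolExecutor` to use all eight cores.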
[Figure: Nimrod/O components: Domain Definer, Points Generator, Optimizer, Constraint Enforcer and Execute Model, evaluating an objective F(x, y, z, w, …)]
– Built from key components
– Search algorithms
– Parallel execution
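The Nimrod/O components map onto a search loop: the domain definer fixes the bounds, the points generator proposes candidates, the constraint enforcer keeps them feasible, the model is executed for each candidate, and the optimizer decides where to move next. The sketch below wires these together with a simple compass search and a toy objective; both are assumptions chosen for brevity, not necessarily an algorithm Nimrod/O provides. Each batch of candidates is evaluated in parallel, standing in for jobs farmed out to machines.

```python
from concurrent.futures import ThreadPoolExecutor

# Domain definer: hypothetical bounds for each parameter.
BOUNDS = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}


def enforce_constraints(point: dict[str, float]) -> dict[str, float]:
    """Constraint enforcer: clip every coordinate back into its domain."""
    return {k: min(max(v, BOUNDS[k][0]), BOUNDS[k][1]) for k, v in point.items()}


def execute_model(point: dict[str, float]) -> float:
    """Execute model: a cheap hypothetical objective F(x, y) in place of a real simulation."""
    return (point["x"] - 1.0) ** 2 + (point["y"] + 2.0) ** 2


def generate_points(centre: dict[str, float], step: float) -> list[dict[str, float]]:
    """Points generator: the 2n compass neighbours of the current centre."""
    points = []
    for k in centre:
        for sign in (1.0, -1.0):
            p = dict(centre)
            p[k] += sign * step
            points.append(enforce_constraints(p))
    return points


def optimise(step: float = 1.0, tol: float = 1e-4) -> tuple[dict[str, float], float]:
    """Optimizer: compass search, evaluating each batch of neighbours in parallel."""
    centre = {k: (lo + hi) / 2 for k, (lo, hi) in BOUNDS.items()}
    best = execute_model(centre)
    with ThreadPoolExecutor() as pool:
        while step > tol:
            candidates = generate_points(centre, step)
            values = list(pool.map(execute_model, candidates))
            i = min(range(len(values)), key=values.__getitem__)
            if values[i] < best:
                centre, best = candidates[i], values[i]  # move to the better neighbour
            else:
                step /= 2  # no improvement: shrink the pattern
    return centre, best
```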
[Figure: jobs against time (minutes) on each resource for an experiment using Linux cluster - Monash (20), Sun - ANL (5), SP2 - ANL (5), SGI - ANL (15) and SGI - ISI (10)]
Soft real-time scheduling problem
[Figure: a second run with Linux cluster - Monash (5), Sun - ANL (10), SP2 - ANL (10), SGI - ANL (15) and SGI - ISI (20)]
[Figure: two further runs plotting jobs against time (minutes) on Condor-Monash, Linux-Prosecco-CNR, Linux-Barbera-CNR, Solaris/Ultas2-TITech, SGI-ISI and Sun-ANL]
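Treating scheduling as a soft real-time problem means finishing the jobs by a deadline without overspending a budget. Below is a simplified cost-first allocation in that spirit, assuming each resource has a fixed (hypothetical) price per job and throughput; it is a sketch, not Nimrod/G's actual scheduler.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cost_per_job: float     # hypothetical price charged per job
    jobs_per_minute: float  # hypothetical sustained throughput


def cost_optimise(n_jobs: int, deadline: float, budget: float,
                  resources: list[Resource]) -> dict[str, int]:
    """Fill the cheapest resources first, giving each only as many jobs as it can
    finish before the deadline. Raises ValueError if the deadline or budget fails."""
    plan: dict[str, int] = {}
    remaining = n_jobs
    spent = 0.0
    for r in sorted(resources, key=lambda r: r.cost_per_job):
        if remaining == 0:
            break
        capacity = int(r.jobs_per_minute * deadline)  # jobs this resource can finish in time
        take = min(capacity, remaining)
        if take > 0:
            plan[r.name] = take
            spent += take * r.cost_per_job
            remaining -= take
    if remaining > 0:
        raise ValueError("deadline cannot be met with these resources")
    if spent > budget:
        raise ValueError("budget exceeded")
    return plan
```

For example, with resources A (5 per job, 1 job/min), B (10 per job, 2 jobs/min) and C (20 per job, 5 jobs/min), 30 jobs due in 10 minutes fit on A and B, whereas a 5-minute deadline also pulls in the more expensive C.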
[Figure: Globus 4.0 services on the server (RFT, GRAM, Delegation, Index, Trigger, Archiver, CAS, OGSA-DAI, GTCP, plus user-supplied Java services) used by grid-deploy-aware clients; highlighted areas: deployment, high performance, virtualization]
[Figure: deployment service. On the client machine, application source is compiled by .NET compilers to intermediate code; the deployment service installs it on the grid resource as an application binary among the installed applications, and GRAM executes it through an application handle. The grid resource runs the .NET runtime and the .NET Parallel Virtual Machine over Globus/OGSA.]
[Figure: DistAnt deployment. The DistAnt deployment client on the local host sends an Ant build file and un-configured application files to the remote host, where, within the user security scope, the DistAnt service in the Globus user hosting environment uses the Managed Job Service (GRAM, via RSL) and the Reliable File Transfer Service (GridFTP) to produce a configured, instantiated application. The original figure numbers these steps 1 to 6.]
[Figure: two ways to give virtual-machine applications HPC communication.
Runtime-Internal (our approach): the HPC communication library (System.HPC, HPC Comm) sits inside the runtime, alongside the system libraries and runtime core, on the virtual machine above the native OS and interconnect.
Runtime-External (existing approach): the HPC application reaches HPC Comm outside the runtime through managed-to-native bindings, above the native OS and interconnect.]
[Figure: jobs from a Nimrod experiment submitted by a Nimrod actuator to a local batch system, e.g., SGE, PBS, LSF, Condor]
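An actuator for a local batch system mainly has to translate a Nimrod job into that system's submission format. Below is a minimal sketch of that translation for three of the systems named above. The function is hypothetical, and only the standard job-name and wall-time directives are shown; Condor uses a separate submit-description file format, so it is left out.

```python
def render_job_script(system: str, job_name: str, command: str,
                      walltime: str = "01:00:00") -> str:
    """Render a minimal submission script for one Nimrod job (hypothetical helper)."""
    if system == "PBS":
        header = [f"#PBS -N {job_name}", f"#PBS -l walltime={walltime}"]
    elif system == "SGE":
        header = [f"#$ -N {job_name}", f"#$ -l h_rt={walltime}"]
    elif system == "LSF":
        # LSF's -W takes [hours:]minutes, so drop the seconds field.
        header = [f"#BSUB -J {job_name}", f"#BSUB -W {walltime[:5]}"]
    else:
        raise ValueError(f"unsupported batch system: {system}")
    return "\n".join(["#!/bin/sh", *header, command]) + "\n"
```

Keeping this translation inside the actuator means the experiment itself never needs to know which batch system runs its jobs.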
– (Clusters vs Grid)
– Build once, run everywhere
– Deadline driven
– Budget driven
possible.”
– Scale-out to supplement locally and nationally available resources
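[Figure: Nimrod across grids and clouds. The Nimrod/G portal and Nimrod/O/E/K submit the jobs of a Nimrod experiment to actuators. Existing actuators (Globus, ...) reach agents through grid middleware and services; new actuators (EC2, Azure, IBM, OCCI?, ...) start VMs running agents through a RESTful IaaS API.]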
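Supporting EC2, Azure or other clouds amounts to writing new actuators behind a common interface, so the rest of Nimrod stays unchanged. Below is a sketch of such a plug-in interface with hypothetical class names; the cloud actuator only returns placeholder IDs where a real one would call the provider's RESTful IaaS API to boot VMs running agents.

```python
from abc import ABC, abstractmethod


class Actuator(ABC):
    """Hypothetical actuator interface: hides how a resource type starts Nimrod agents."""

    @abstractmethod
    def start_agents(self, count: int) -> list[str]:
        """Start `count` agents and return identifiers for them."""


class GlobusActuator(Actuator):
    def start_agents(self, count: int) -> list[str]:
        # A real actuator would submit agent jobs through grid middleware (e.g. GRAM).
        return [f"globus-agent-{i}" for i in range(count)]


class CloudActuator(Actuator):
    def __init__(self, provider: str) -> None:
        self.provider = provider

    def start_agents(self, count: int) -> list[str]:
        # A real actuator would call the provider's IaaS API to boot VMs whose
        # images start an agent; here we only return placeholder identifiers.
        return [f"{self.provider}-vm-{i}/agent" for i in range(count)]


ACTUATORS: dict[str, Actuator] = {
    "globus": GlobusActuator(),
    "ec2": CloudActuator("ec2"),
    "azure": CloudActuator("azure"),
}


def scale_out(requests: dict[str, int]) -> list[str]:
    """Start agents on several resource types at once, e.g. grid plus cloud."""
    agents = []
    for kind, count in requests.items():
        agents.extend(ACTUATORS[kind].start_agents(count))
    return agents
```

A call such as `scale_out({"globus": 2, "ec2": 1})` mixes grid and cloud agents in one request, supplementing local resources with cloud capacity.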
[Figure: Nimrod on Azure. The Nimrod server's Azure actuator places the agent package (cspkg) and experiment files in Azure Blob storage and agent parameters on an Azure Queue; worker roles run the agent, which reads from the queue and runs the user application(s), using blobs for data.]
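In the Azure design, work is decoupled through storage: the actuator places job parameters on a queue and workers pull from it, writing results to blob storage. The sketch below imitates that pattern in-process, with `queue.Queue` standing in for an Azure Queue and a dict for blob storage; it does not use the Azure SDK, and the job and result formats are hypothetical.

```python
import queue
import threading


def run_queue_experiment(jobs: list[dict], n_workers: int = 4) -> dict[str, str]:
    """Queue/worker pattern: queue.Queue stands in for an Azure Queue, a dict for Blob storage."""
    work: queue.Queue = queue.Queue()
    blobs: dict[str, str] = {}
    lock = threading.Lock()

    # Server side: the actuator posts one message (job parameters) per job.
    for job in jobs:
        work.put(job)

    def worker() -> None:
        while True:
            try:
                job = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this worker exits
            # Placeholder for running the user application with the job's parameters.
            result = f"ran {job['name']} with {job['params']}"
            with lock:
                blobs[f"results/{job['name']}"] = result

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return blobs
```

A real worker would delete a queue message only after storing its result, so that if the worker dies, the message becomes visible again after its visibility timeout and another worker retries it.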
– Jeff Tan – Maria Indrawan
– Blair Bethwaite – Slavisa Garic – Jin Chao
– Rob Gray
– Shahaan Ayyub – Philip Chan – Colin Enticott – ABM Russell – Steve Quinette – Ngoc Dinh (Minh)
– Greg Watson – Rajkumar Buyya – Andrew Lewis – Nam Tran – Wojtek Goscinski – Aaron Searle – Tim Ho – Donny Kurniawan – Tirath Ramdas
– Amazon – Axceleon – Australian Partnership for Advanced Computing (APAC) – Australian Research Council – Cray Inc – CRC for Enterprise Distributed Systems (DSTC) – GrangeNet (DCITA) – Hewlett Packard – IBM – Microsoft – Sun Microsystems – US Department of Energy