1/23
Distributed Systems Architecture Research Group
Universidad Complutense de Madrid
José Luis Vázquez-Poletti (on behalf of Eduardo Huedo)
GridWay Scalability and Interoperation for DRMAA codes
2/23
Contents
Interoperability
3/23
What is GridWay?
GridWay is a Globus Toolkit component for meta-scheduling, creating a scheduler virtualization layer on top of Globus services (GRAM, MDS & GridFTP)
and guidelines for collaborative development.
and supports several OGF standards.
systems, supporting resource accounting and the definition of state-of-the-art scheduling policies.
assuring compatibility of applications with LRM systems that implement the standard, such as SGE, Condor, Torque,...
controlling jobs, that could be described using the OGF standard JSDL.
4/23
Global Architecture of a Computational Grid
[Figure: layered view of a computational Grid. Layers: Applications (CLI, DRMAA codes in .C and .java, Results), Grid Meta-Scheduler (GridWay), Grid Middleware (Globus), Infrastructure (PBS, SGE).]
Application-Infrastructure decoupling
5/23
GridWay Internals
[Figure: GridWay internals]
GridWay Core: Request Manager, Dispatch Manager, Scheduler, Execution Manager, Transfer Manager, Information Manager, Job Pool, Host Pool
DRMAA library, CLI
Grid File Transfer Services: GridFTP, RFT
Grid Execution Services: pre-WS GRAM, WS GRAM
Grid Information Services: MDS2, MDS2 GLUE, MDS4
Resource Discovery, Resource Monitoring
Job Preparation, Job Termination, Job Migration
Job Submission, Job Monitoring, Job Control, Job Migration
6/23
What is DRMAA?
Distributed Resource Management Application API
http://www.drmaa.org/
Open Grid Forum standard
Homogeneous interface to different Distributed Resource Managers (DRM): SGE, Condor, PBS/Torque, GridWay
Bindings: C, Java, Perl (GW 5.2+), Ruby (GW 5.2+), Python (GW 5.2+)
7/23
C Binding
drmaa_run_job (job_id, DRMAA_JOBNAME_BUFFER-1, jt, error, DRMAA_ERROR_STRING_BUFFER-1);
8/23
Java Binding
session.runJob(jt);
9/23
Ruby Binding
(result, job_id, error)=drmaa_run_job(jt)
10/23
Python Binding
(result, job_id, error)=drmaa_run_job(jt)
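The Python and Ruby calls above return a (result, job_id, error) tuple, so every call site has to check the result code itself. A minimal sketch of a helper that turns a failed call into an exception follows. The helper `checked`, the exception class, and the stub `fake_drmaa_run_job` are hypothetical names introduced for illustration; the stub only stands in for the real binding so the sketch can run without GridWay installed. The success code 0 is taken from the DRMAA C binding and is assumed to carry over to the scripting bindings.

```python
# Success code as defined in the DRMAA C binding (assumed to be the same
# value in the scripting-language bindings).
DRMAA_ERRNO_SUCCESS = 0


class DrmaaCallError(RuntimeError):
    """Hypothetical exception raised when a DRMAA call reports failure."""


def checked(call_result):
    """Unpack a (result, value, error) tuple and raise on failure."""
    result, value, error = call_result
    if result != DRMAA_ERRNO_SUCCESS:
        raise DrmaaCallError(f"DRMAA call failed with code {result}: {error}")
    return value


def fake_drmaa_run_job(jt):
    """Stand-in for the binding's drmaa_run_job; NOT the real library.

    The job id "42" and the error code 1 are made-up values.
    """
    if jt is None:
        return (1, None, "no job template given")
    return (DRMAA_ERRNO_SUCCESS, "42", "")


if __name__ == "__main__":
    job_id = checked(fake_drmaa_run_job({"remote_command": "/bin/ls"}))
    print("submitted job", job_id)
```

With the real binding, the same helper would wrap the actual call, for example `job_id = checked(drmaa_run_job(jt))`.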
Perl Binding
($result, $job_id, $error)=drmaa_run_job($jt);
11/23
Definition (by OGF GIN-CG)
directly via common open standards in the near future.
WSRF…
infrastructures to work together as a short-term solution. Two alternatives:
another".
GridWay provides both adapters (Middleware Access Drivers, MADs) and a gateway (GridGateWay, a WSRF GRAM service encapsulating GridWay). GridWay's lightweight design helps to maintain scalability.
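The adapter alternative can be pictured as one common interface with one driver per middleware. The sketch below shows only that general pattern. It is not GridWay's MAD protocol: every class, method, and host name in it is hypothetical, and the drivers return descriptive strings where a real driver would contact the middleware.

```python
from abc import ABC, abstractmethod


class ExecutionDriver(ABC):
    """Hypothetical common interface that a scheduler core would call."""

    @abstractmethod
    def submit(self, host: str, executable: str) -> str:
        """Submit a job and return a middleware-specific description."""


class PreWsGramDriver(ExecutionDriver):
    def submit(self, host, executable):
        # A real adapter would talk to a pre-WS GRAM service here.
        return f"pre-WS GRAM: {executable} on {host}"


class WsGramDriver(ExecutionDriver):
    def submit(self, host, executable):
        # A real adapter would talk to a WS GRAM service here.
        return f"WS GRAM: {executable} on {host}"


class SchedulerCore:
    """Toy core that picks the adapter registered for a host's middleware."""

    def __init__(self):
        self._drivers = {}

    def register(self, middleware: str, driver: ExecutionDriver) -> None:
        self._drivers[middleware] = driver

    def submit(self, middleware: str, host: str, executable: str) -> str:
        return self._drivers[middleware].submit(host, executable)


if __name__ == "__main__":
    core = SchedulerCore()
    core.register("pre-ws", PreWsGramDriver())
    core.register("ws", WsGramDriver())
    print(core.submit("ws", "cluster-a.example.org", "/bin/hostname"))
```

The core never needs to know which middleware sits behind a host; supporting another infrastructure means registering one more driver.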
12/23
How do we achieve interoperability?
“A device that allows one system to connect to and work with another”
[Figure labels: Users; GridWay; pre-WS / WS; gLite; SGE Cluster; PBS Cluster; (Virtual) Organization; Applications; Middleware]
13/23
EGEE
Middleware
brings together scientists and engineers from more than 240 institutions in 45 countries world-wide to provide a seamless Grid infrastructure for e-Science that is available to scientists 24 hours-a-day.
14/23
Open Science Grid
Middleware
and storage resources into a uniform shared cyberinfrastructure for large-scale scientific research. It is built and operated by a consortium of universities, national laboratories, scientific collaborations and software developers.
15/23
TeraGrid
Middleware
class resources at eleven partner sites to create an integrated, persistent computational resource
16/23
Application Description
(CNIO)
sets, UniRef90 and UniRef50.
17/23
CD-HIT Parallel
compare each division in parallel
hit)
this one (cd-hit-2d)
with larger databases
[Figure: DB is divided (Div.) into A, B, C, D; intermediate results A90, B-A, C-A, D-A, B90, C-AB, D-AB, C90, D-ABC, D90; Merge produces DB90.]
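The task names in the figure suggest a round-based plan: cluster one division, compare every remaining division against the result in parallel, then repeat with the next division. The sketch below reconstructs that plan from the figure labels alone, so treat it as an assumption about the scheme rather than the authors' implementation. The function name `plan_rounds` and the tuple layout are hypothetical; only the step names cd-hit, cd-hit-2d and merge and the output labels come from the slides.

```python
def plan_rounds(divisions):
    """Return a list of rounds; tasks inside one round are independent.

    Task tuples (layout is an assumption of this sketch):
      ("cd-hit", input, output)                  cluster one division
      ("cd-hit-2d", reference, input, output)    filter a division against
                                                 an already clustered one
      ("merge", inputs, "DB90")                  combine clustered divisions
    """
    current = {d: d for d in divisions}  # latest filtered form of each division
    done = ""                            # names of divisions clustered so far
    clustered = []
    rounds = []
    for i, d in enumerate(divisions):
        rep = f"{d}90"
        rounds.append([("cd-hit", current[d], rep)])
        clustered.append(rep)
        done += d
        remaining = divisions[i + 1:]
        if remaining:
            parallel = []
            for r in remaining:
                out = f"{r}-{done}"
                parallel.append(("cd-hit-2d", rep, current[r], out))
                current[r] = out
            rounds.append(parallel)
    rounds.append([("merge", tuple(clustered), "DB90")])
    return rounds


if __name__ == "__main__":
    for number, tasks in enumerate(plan_rounds(list("ABCD"))):
        print(number, tasks)
```

For four divisions the plan has eight rounds, and the generated output names (A90, B-A, C-A, D-A, B90, C-AB, D-AB, C90, D-ABC, D90, DB90) match the labels in the figure. Only the cd-hit-2d rounds contain more than one task, so they are the part that can run in parallel.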
18/23
[Figure: Front-end (cd-hit-div, merge) and PBS running A90, B90, C-A, D-A, C-AB, D-AB, C90, D-ABC, D90]
Database division/merging is performed in the front-end
underlying DRMS
19/23
[Figure: PBS running cd-hit-div, merge and tasks C-A, D-A, C-AB, D-AB, D-ABC, B90, C90, D90]
Merge sequential tasks to reduce
Provide a uniform interface (DRMAA) to interact with different DRMS. Some file manipulation is still needed.
[Figure: DRMS and GridWay, reaching your local cluster, EGEE, TG and OSG]
20/23
Running with 10 divisions
21/23
Job States - Running with 14 divisions
22/23
Who’s behind the GridWay Metascheduler?
GridWay
23/23
Questions?