ISGC 2004 Taipei
ARDA: Technology for Distributed Analysis
Jakub Moscicki
ARDA Project Jakub.Moscicki@cern.ch
http://cern.ch/arda www.eu-egee.org cern.ch/lcg
EGEE is a project funded by the European Union under contract IST-2003-508833
Service Interaction and Architecture
Data Management
Files on the GRID
Metadata Catalogs
GRID Services and Databases
Interactivity on the GRID
Connectivity & Protocols
LHC Experiments' Software
ARDA-Alice ARDA-Atlas ARDA-CMS ARDA-LHCb
Taiwan, Russia
POOL, SEAL, ROOT
functionality and coherence
scientific community (not only HEP)
Regular Services (Metadata Catalog,....)
if mapped one-to-one to the WSDL
– e.g. paging of query results
– language bindings, platform compatibility,...
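The point about paging of query results can be illustrated with a minimal sketch. It assumes a hypothetical WSDL-mapped operation `raw_query(query, offset, limit)` that returns one full response per call; the in-memory `ROWS` list and both function names are invented for illustration and are not part of any ARDA or gLite interface. The client-side API hides the repeated calls behind an ordinary iterator.

```python
from typing import Callable, Iterator, List

# Hypothetical stand-in for a WSDL-mapped operation: one call, one full response.
# A real stub would be generated from the WSDL; this only mimics the call shape.
ROWS = [f"row-{i}" for i in range(10)]

def raw_query(query: str, offset: int, limit: int) -> List[str]:
    """Stateless call: every request carries all it needs (no server session)."""
    matching = [r for r in ROWS if query in r]
    return matching[offset:offset + limit]

def paged_query(call: Callable[[str, int, int], List[str]],
                query: str, page_size: int = 4) -> Iterator[str]:
    """Client-side API: hides paging behind a plain iterator."""
    offset = 0
    while True:
        page = call(query, offset, page_size)
        yield from page
        if len(page) < page_size:  # a short page means there is nothing left
            return
        offset += page_size

results = list(paged_query(raw_query, "row"))
```

Keeping the paging loop in the API layer means the service interface can stay a simple request/response operation, while the user code just iterates over results.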
“Interoperability Service”
WSDL
API
WSDL
Client Proxy
– authorisation to access different services
Entry Point (Bootstrapping)
client File Catalog Metadata Replica Workload
libgliteUI *Ssl-poolserver
ROOT gshell
gLite/AliEn Services
SHM-Bus session Authentication
gLite Authen. (GSI, PAM)
libgliteIO
Iod
File Access Services (via catalogue)
T
S S I/O UI
C2PERL gLite.pl *Mtpoolserver (repeated for each server instance in the diagram)
Embedded Perl module
Shell commands
ability to replace implementation of components
Interface
Interface
Categories of files
– File GUID lifecycle and management
– Versioning
Role of LFN:
Persistent references
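As a rough illustration of how LFNs, GUIDs and replicas relate, here is a toy catalog. The slide does not say whether persistent references should go through the LFN or the GUID, so the sketch only shows the mechanics: the GUID is assigned once, while the human-readable LFN can change. The class, its methods and the `srm://` paths are all hypothetical and do not correspond to any real catalog interface.

```python
import uuid
from typing import Dict, List

class FileCatalog:
    """Toy catalog: LFN -> GUID -> replica locations (all names hypothetical)."""

    def __init__(self) -> None:
        self.lfn_to_guid: Dict[str, str] = {}
        self.replicas: Dict[str, List[str]] = {}

    def register(self, lfn: str, pfn: str) -> str:
        guid = str(uuid.uuid4())  # the GUID is assigned once, at registration
        self.lfn_to_guid[lfn] = guid
        self.replicas[guid] = [pfn]
        return guid

    def add_replica(self, guid: str, pfn: str) -> None:
        self.replicas[guid].append(pfn)

    def rename(self, old_lfn: str, new_lfn: str) -> None:
        # Only the human-readable name changes; the GUID stays the same
        self.lfn_to_guid[new_lfn] = self.lfn_to_guid.pop(old_lfn)

    def resolve(self, lfn: str) -> List[str]:
        return list(self.replicas[self.lfn_to_guid[lfn]])

catalog = FileCatalog()
guid = catalog.register("/grid/demo/f1", "srm://site-a.example/data/f1")
catalog.add_replica(guid, "srm://site-b.example/data/f1")
catalog.rename("/grid/demo/f1", "/grid/demo/v2/f1")
locations = catalog.resolve("/grid/demo/v2/f1")
```

A reference stored as a GUID survives the rename; a reference stored as the old LFN does not, which is one reason the role of the LFN needs to be defined explicitly.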
ISGC 2004 Taipei- 13
Filesystem Metadata: property of File Catalog
Physics Metadata: in a separate database
Many parallel developments in LHC experiments
Generic metadata service
– any objects = datasets, applications, users,...
Reliable file transfer
Misuse protection
Example: TMDB (CMS)
RefDB McRunjob
T0 worker nodes GDB castor pool
Tapes Export Buffers Transfer agent RLS TMDB
Reconstruction instructions; Reconstruction jobs; Reconstructed data; Checks what has arrived; Updates; Summaries of successful jobs
CMS DC04 production
– 2x PIII, 2.40GHz, 1GB RAM
Example: RFT (Reliable File Transfer, GT3)
Interface
Interface
– encoding of binary data and complex structures: overhead 10x
– floating point representation: parser dependent
– SOAP/https encryption: overhead 30%
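The encoding overhead can be made concrete with a small measurement. The sketch below compares an array of doubles in packed binary form with the same values written as text, one XML element per value, and with a binary-to-text encoding. It uses Base64 as a stand-in for the UU encoding mentioned in these slides, and the element name `item` is illustrative only. The ratios depend on the data and on the markup, so this does not attempt to reproduce the 10x figure quoted above.

```python
import base64
import struct

values = [i * 0.1 for i in range(1000)]

# Compact binary form: 8 bytes per IEEE-754 double
binary = struct.pack(f"<{len(values)}d", *values)

# Text form, one XML element per value, roughly as a SOAP array would carry it
# (the element name "item" is illustrative, not taken from any real schema)
xml_text = "".join(f"<item>{v!r}</item>" for v in values).encode("utf-8")

# Base64 keeps the exact bits but still grows the payload by about one third
b64 = base64.b64encode(binary)

ratio_xml = len(xml_text) / len(binary)
ratio_b64 = len(b64) / len(binary)

# repr() of a Python float round-trips exactly; other printers/parsers may not,
# which is the "parser dependent" risk for floating point values sent as text
roundtrip = [float(repr(v)) for v in values]

print(f"XML text / binary: {ratio_xml:.2f}")
print(f"Base64 / binary:   {ratio_b64:.2f}")
```

Sending the packed bytes through a binary-to-text encoding avoids the floating point formatting question entirely, at the cost of a fixed size increase.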
Grid Access Library (ARDA-Alice)
grid command encoding+encryption
SOAP Body
Session ID Session Crypto(t)
SOAP Body
Sym.Cipher SSLEncryption
SOAP String Call XY Arg1 = '...'; Arg2 = '...'; Arg3 = '...';
CODEC / Serializer, UU Encoding
Session ID Session Crypto Session ID Session Crypto
SOAP-Proxy
Meta-Data (MySQL) User User User
[Plots: time to completion vs. number of rows selected, one curve per number of concurrent clients (1 to 150); clients per second, one curve per number of rows (5 to 150)]
AMI behaviour using many concurrent clients:
– Note that Web Services are “stateless” (there are no built-in handles for the concept of session, transaction, etc.): 1 query = 1 (full) response
CMS Metadata and Bookkeeping Database
MySQL backend, PHP script frontend
Result handling:
PHP proxy
Meta-Data (MySQL) User User User
Oracle backend, XML-RPC frontend
[Plots: time to completion (s) vs. number of rows selected, one curve per number of clients (1 to 100); response time (s) vs. number of clients]
Unoptimized SQL access
200 rows = 180KB
problems:
– ACID
– transactions
– timeouts
– large queries
benefits
– simple user query may map to a complex ‘optimized’ internal query
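The last benefit can be sketched with an in-memory SQLite database. The schema, the table names and the function `datasets_with_min_events` are invented for illustration and are not taken from any experiment's metadata catalog. The user asks one simple question, and the service decides how to join, aggregate and index the data.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE file (dataset_id INTEGER, events INTEGER);
    CREATE INDEX file_by_dataset ON file (dataset_id);
    INSERT INTO dataset VALUES (1, 'run-a'), (2, 'run-b');
    INSERT INTO file VALUES (1, 500), (1, 700), (2, 100);
""")

def datasets_with_min_events(min_events: int) -> List[Tuple[str, int]]:
    """Simple user-facing call; the join and aggregation stay on the service side."""
    sql = """
        SELECT d.name, SUM(f.events)
        FROM dataset AS d JOIN file AS f ON f.dataset_id = d.id
        GROUP BY d.id, d.name
        HAVING SUM(f.events) >= ?
        ORDER BY d.name
    """
    return conn.execute(sql, (min_events,)).fetchall()

result = datasets_with_min_events(1000)
```

Because clients never see the SQL, the service is free to change the schema or tune the internal query without breaking them.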
SOAP over Jabber (IM)
Outbound Connectivity Required
Examples
Security models
SOAP over http is connectionless
Application packager Workflow editor Production editor
Production manager
Production DB Production preparation
Edit Instantiate workflow Create application tar file
Central Services
Monitoring DB Bookkeeping DB
Production resources Agent A
Site A
Agent n
Site n
Job XML Job status Meta XML
Castor MSS CERN
Dataset replica
Agent B
Site B
Central Storage
Job request
Bookkeeping Service Monitoring Service Production Service
[Diagram labels: USER SESSION, TcpRouter, PROOF MASTER SERVER, PROOF SLAVES]
no support for hierarchical Grid infrastructure, only local cluster mode.
active feedback from beta-testers and users
discussion platform for development teams
technology & ideas exchange medium
building technology knowledge-base
sharing ideas and implementations
Towards a usable end-to-end prototype of Distributed Analysis
ARDA