SLIDE 1

Predrag BUNCIC, Thorsten KOLLEGGER & Pierre VANDE VYVRE

ALICE-USA, May 2013, CERN

SLIDE 2: Requirements and Strategy

  • Focus of the ALICE upgrade on physics probes requiring high statistics: sample 10 nb⁻¹
  • Online System Requirements
  • Sample the full 50 kHz Pb-Pb interaction rate (current limit at ~500 Hz, a factor 100 increase)
  • ~1.1 TByte/s detector readout, but the ALICE data processing capacity will not allow more than a storage bandwidth of ~20 GByte/s
  • Many physics probes: the classical trigger/event-filter approach is not efficient
  • Massive data volume reduction
  • Data reduction by (partial) online reconstruction and compression
  • Store only reconstruction results, discard raw data
  • Demonstrated with TPC clustering since Pb-Pb 2011
  • Optimized data structures for lossless compression (see the sketch after this list)
  • Algorithms designed to allow for offline reconstruction passes with improved calibrations
  • Implies a much tighter coupling between online and offline reconstruction software
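
The reduction strategy above keeps only compact reconstruction output instead of raw data. As a purely illustrative sketch (not ALICE code; the struct names, field widths and the 1/64 quantization step are assumptions, and in practice the quantization would be chosen at or below the detector resolution), the C++ below shows the kind of "optimized data structure" meant here: fixed-point fields plus delta-encoding, which a subsequent generic lossless entropy coder compresses efficiently.

    // Illustrative only: pack reconstructed clusters into a compact fixed-point
    // layout and delta-encode the time coordinate; small, regular integers are
    // what a lossless entropy coder run afterwards compresses well.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Cluster {        // reconstructed cluster, as produced by clustering
        float pad;          // pad coordinate within the row
        float time;         // drift-time coordinate
        float charge;       // total charge
    };

    struct PackedCluster {  // fixed-point, delta-encoded representation
        uint16_t padQ;      // pad quantized to 1/64 of a pad
        uint16_t dTimeQ;    // delta to the previous cluster's quantized time
        uint16_t chargeQ;   // charge clamped to 16 bits
    };

    std::vector<PackedCluster> pack(std::vector<Cluster> clusters) {
        // Sorting by time makes the time deltas small and regular.
        std::sort(clusters.begin(), clusters.end(),
                  [](const Cluster& a, const Cluster& b) { return a.time < b.time; });
        std::vector<PackedCluster> out;
        out.reserve(clusters.size());
        uint16_t prevTime = 0;
        for (const auto& c : clusters) {
            const uint16_t t = static_cast<uint16_t>(c.time * 64.0f);
            out.push_back({static_cast<uint16_t>(c.pad * 64.0f),
                           static_cast<uint16_t>(t - prevTime),
                           static_cast<uint16_t>(std::min(c.charge, 65535.0f))});
            prevTime = t;
        }
        return out;
    }

    int main() {
        const std::vector<Cluster> clusters = {
            {10.3f, 100.7f, 512.f}, {11.1f, 101.2f, 300.f}, {54.9f, 101.9f, 950.f}};
        const auto packed = pack(clusters);
        std::cout << "raw:    " << clusters.size() * sizeof(Cluster) << " bytes\n"
                  << "packed: " << packed.size() * sizeof(PackedCluster) << " bytes\n";
    }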

SLIDE 3: Online-Offline Computing (O2)

  • 3 projects: DAQ, HLT, Offline
  • LS1, Run 2: Prepare (update) and operate 3 independent systems
  • LS2 upgrade (re-design and implement): “Upgrade of the ALICE Experiment”, Letter of Intent (LoI), CERN-LHCC-2012-12
  • Run 3: 1 common new online and offline computing system
  • LS2 upgrade
  • Software panel
  • Common Technical Design Report (TDR) by September 2014
  • Common computing farm and software framework in production by 2018

[Timeline 2010-2026: Run 1, LS1, Run 2, LS2, Run 3, LS3, Run 4. The DAQ, HLT and Offline projects each prepare and operate their systems through Run 2; the software panel and the TDR fall in the LS2 upgrade period; the common system is operated from Run 3 onwards.]

SLIDE 4: Software Panel

  • Panel (6 people from DAQ, HLT, offline)
  • Worked on how to go about a common software framework
  • Started in March and delivered its final report in December 2012
  • The final report is the basis of our future work
  • The success of the panel also let us experience the way we will work in the future
  • Computing Working Groups (CWGs)
  • People from DAQ, HLT, offline and detector groups, according to needs and availability
  • Working together on a common topic
  • Delivering a result (report, procedure, code) adopted by all

SLIDE 5: O2 Computing

  • 11 Computing Working Groups (CWGs) have started to work
  • 2 CWGs will start later
  • More could be added

[Organization chart: Institution Boards (Computing Board, Online Institution Board), the O2 Steering Board, the Projects (DAQ, HLT, Offline) with their Project Leaders, and the Computing Working Groups: CWG1 Architecture, CWG2 Procedures & Tools, CWG3 DataFlow, CWG4 Data Model, CWG5 Platforms, CWG6 Calibration, CWG7 Reconstruction, CWG8 Simulation, CWG9 QA/DQM/Visualization, CWG10 Control, CWG11 Sw Lifecycle, CWG12 Hardware, CWG13 Sw Framework, plus placeholders (CWGnn) for groups still to be added.]

SLIDE 6: Computing Working Groups (1)

Each CWG is listed with its main tasks and topics, followed by its milestones & deliverables.

CWG1 Architecture, Framework & Distrib. Comput.
  • Propose the general architecture, including the distributed computing
  • Design the overall framework, define components and interfaces
  Milestones: Q4 2013 Online system conceptual design note; Q2 2014 Sw framework architecture proposal

CWG2 Tools, guidelines, procedures
  • Propose survey and evaluation procedures
  • Propose common guidelines and policies
  • Select tools to implement procedures, policies and the development environment
  Milestones: Q2 2013 Procedures and policies draft proposal; Q3 2013 Procedures and policies proposal; Q3 2013 Common tools proposal

CWG3 Dataflow & Condition Data
  • Design and develop mechanisms to exchange & share physics & conditions data
  • Communication, data transport, dataflow control and configuration, Shuttle, OCDB
  • Computing system simulation
  Milestones: Q3 2013 Dataflow simulation; Q4 2013 Dataflow design proposal; Q2 2014 Dataflow demonstrator

CWG4 Data model
  • Study the functional and performance aspects of the data model
  • Data format and data exchange
  • Data compression
  Milestones: Q4 2013 Data model proposal; Q2 2014 Data model demonstrator

CWG5 Computing Platforms
  • Study parallel technologies and alternative platforms (Atom, ARM, etc.)
  • Define constraints for an effective usage of parallel technologies
  • Assess the advantages of sw constructions, e.g. threads vs. processes (see the sketch after this list)
  Milestones: Q3 2013 Define list of platforms and benchmarks; Q4 2013 Guidelines for the use of parallel platforms (hw+sw); Q2 2014 Proposal on the use of parallelism

CWG10 Control, configuration & monitoring
  • Control: experiment control system
  • Configuration
  • Bookkeeping and monitoring: alarm system and eLogBook
  Milestones: Q4 2013 Survey of methods and tools for the experiment control; Q2 2014 Tools selection + demonstrator

CWG12 Computing hardware
  • Test and recommend commercial hw
  • Design and develop custom hw
  • Purchase, install and administrate the reference and production systems
  Milestones: Q4 2013 First tests of the key technologies; Q2 2014 A few options for each hw item
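
To make CWG5's "threads vs. processes" question concrete, here is a minimal, purely illustrative C++ sketch (not CWG5 material; the data and the work function are placeholders, and only the 36-sector count reflects the actual TPC read-out layout) that splits per-event work across std::thread workers, one per detector sector.

    // Illustrative only: shared-memory thread parallelism over independent
    // detector sectors. A process-based variant would fork one worker per
    // sector and exchange results via IPC instead of shared memory.
    #include <cstddef>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Placeholder for per-sector work (e.g. cluster finding in one TPC sector).
    double processSector(const std::vector<float>& samples) {
        return std::accumulate(samples.begin(), samples.end(), 0.0);
    }

    int main() {
        constexpr std::size_t nSectors = 36;   // the ALICE TPC has 36 read-out sectors
        std::vector<std::vector<float>> sectors(nSectors, std::vector<float>(1000, 1.0f));
        std::vector<double> results(nSectors, 0.0);

        std::vector<std::thread> workers;
        for (std::size_t i = 0; i < nSectors; ++i) {
            // Each thread works on its own sector and writes to its own slot,
            // so no further synchronization is needed.
            workers.emplace_back([&, i] { results[i] = processSector(sectors[i]); });
        }
        for (auto& w : workers) w.join();

        std::cout << "sum over sectors: "
                  << std::accumulate(results.begin(), results.end(), 0.0) << "\n";
    }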

SLIDE 7: Computing Working Groups (2)

Each CWG is listed with its main tasks and topics, followed by its milestones & deliverables.

CWG6 Calibration
  • On-line calibration
  • Assess the feasibility of automatic calibration procedures
  In close contact with PWG-PP Calibration

CWG7 Reconstruction
  • Study the online frame-based reconstruction
  • Assess the feasibility of time-frame based reconstruction (see the sketch after this list)
  In close contact with PWG-PP Tracking and Alignment

CWG8 Physics simulation
  • Study the physics simulation, incl. on time-frames and continuous readout
  • Physics simulation packages
  In close contact with PWG-PP Embedding & Mix, Monte-Carlo

CWG9 QA, DQM, visualization
  • Online and offline DQM and QA
  • Data visualization, event display
  In close contact with PWG-PP QA

CWG11 Software Lifecycle
  • Implement the complete software lifecycle: development, releases, QC
  • Implement the policies defined by CWG2 on the reference system
  • Code quality enforcement system
  Milestones: Q3 2013 Lifecycle draft proposal; Q4 2013 Lifecycle proposal; Q1 2014 Reference system in production

CWG13 Software framework
  • Design and develop an analysis framework and associated tools
  • Define an AOD data format
  • Study the global performance for batch and interactive use
  Milestones: Q4 2013 Design note; Q2 2014 Demonstrate the improvement compared to the present framework
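
As a rough illustration of what "time-frame based reconstruction" (CWG7) means with continuous readout, the following self-contained C++ sketch (all types, units and the frame length are assumptions, not the O2 design) groups time-stamped digits into fixed-length time frames, the unit the reconstruction would then operate on instead of single triggered events.

    // Illustrative only: with continuous (trigger-less) readout there is no
    // "event" boundary in the raw data; digits are grouped into fixed-length
    // time frames and reconstructed frame by frame.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <vector>

    struct Digit {
        uint64_t timestampNs;  // placeholder time unit
        uint32_t channel;
        uint16_t adc;
    };

    using TimeFrame = std::vector<Digit>;

    // Split a stream of digits into frames of fixed duration (frameLengthNs).
    std::map<uint64_t, TimeFrame> buildTimeFrames(const std::vector<Digit>& stream,
                                                  uint64_t frameLengthNs) {
        std::map<uint64_t, TimeFrame> frames;
        for (const auto& d : stream) {
            frames[d.timestampNs / frameLengthNs].push_back(d);
        }
        return frames;
    }

    int main() {
        const std::vector<Digit> stream = {
            {120, 1, 40}, {950, 2, 55}, {1800, 1, 70}, {2600, 3, 33}};
        // At a 50 kHz Pb-Pb interaction rate any frame of O(ms) length contains
        // several overlapping collisions, so reconstruction must work per frame.
        for (const auto& entry : buildTimeFrames(stream, 1000 /* toy frame length */)) {
            std::cout << "frame " << entry.first << ": "
                      << entry.second.size() << " digits\n";
        }
    }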

SLIDE 8: Deliverables & Milestones

  • TDR
  • Procedure, internal notes
  • Technology choice or options
  • Algorithms
  • Demonstrators, prototypes
  • TDR section
  • O2 deliverables for the future system
  • Custom component (hw, fw, sw)
  • Commercial component (hw, fw, sw)


SLIDE 9: O2 Institutes

  • Institutes
  • FIAS, Frankfurt, Germany
  • IIT, Mumbai, India
  • IPNO, Orsay, France
  • IRI, Frankfurt, Germany
  • Rudjer Bošković Institute, Zagreb, Croatia
  • SUP, São Paulo, Brazil
  • University of Technology, Warsaw, Poland
  • Wigner Institute, Budapest, Hungary
  • CERN, Geneva, Switzerland
  • Looking for more people
  • Need people with computing skills and from detector groups
  • CWG membership is neither closed nor rigid:
  • New members are more than welcome to join
  • Joining must be explicit and agreed with the group leader

SLIDE 10: O2 Budget

Item             Description                          Cost [MCHF]
DDL Fibres       Detector read-out fibres             0.9
EPN              Event Processing Nodes               4.1
FLP and C-RORC   Detector read-out nodes              0.9
Infrastructure   Racks, power, cooling                1.3
Networks         Network equipment at Point 2         0.8
Servers          Control, configuration, monitoring   0.5
Storage          Data storage                         0.6
Central DCS                                           0.2
Total (50 kHz)   Total for a 50 kHz read-out rate     9.3
Total (100 kHz)  Total for a 100 kHz read-out rate    15.1

  • O2 budget not yet fully covered
  • Draft spending profile 2017-19:

    Year   Fraction [%]
    2017   32
    2018   29
    2019   39
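
  • (Reading the tables: the item costs add up to the quoted 9.3 MCHF total for the 50 kHz option, and the draft spending profile sums to 100% (32 + 29 + 39).)
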
SLIDE 11: Logistics: Mailing Lists

  • Mailing lists
  • https://e-groups.cern.ch/
  • Lists: alice-o2-xxx
  • Mailing lists archives
  • https://groups.cern.ch/group/alice-o2-cwgxx/
  • Accessible by all ALICE members


SLIDE 12: Logistics: Indico and Twiki

  • CWG2 is now defining in more detail the procedures and tools of the O2 project
  • Some tools have been selected in order to start the work
  • Indico O2 area: https://indico.cern.ch/categoryDisplay.py?categId=4601
  • Wiki O2 area: https://twiki.cern.ch/twiki/bin/viewauth/ALICE/AliceO2

SLIDE 13: Thanks!

SLIDE 14: Event Size

  • Expected data sizes for minimum bias Pb-Pb collisions at full LHC energy

Detector   After Zero Suppression (MByte)   After Data Compression (MByte)
TPC        20.0                             1.0
TRD        1.6                              0.2
ITS        0.8                              0.2
Others     0.5                              0.25
Total      22.9                             1.65
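
  • (Reading the table: the overall reduction from zero-suppressed to compressed data is about 22.9 / 1.65 ≈ 14, dominated by the factor-20 reduction of the TPC, 20.0 → 1.0 MByte.)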

SLIDE 15: TPC Data Reduction

  • First steps up to clustering run on the FEE/FLPs (RORC FPGA)
  • Further steps require full event reconstruction on the EPNs; the pattern recognition requires only coarse online calibration

Data Format                                         Data Reduction Factor   Event Size (MByte)
Raw Data                                            1                       700
FEE Zero Suppression                                35                      20
HLT Clustering & Compression                        5-7                     ~3
Remove clusters not associated to relevant tracks   2                       1.5
Data format optimization                            2-3                     <1
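
  • (The per-step factors compound to 35 × 5-7 × 2 × 2-3 ≈ 700-1500, consistent with the chain taking a raw TPC event from 700 MByte down to below 1 MByte, i.e. roughly three orders of magnitude.)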

SLIDE 16: Data Bandwidth

  • LHC luminosity variation during the fill and the running efficiency are taken into account for the average output to the computing center.

Detector   Input to Online System (GByte/s)   Peak Output to Local Data Storage (GByte/s)   Avg. Output to Computing Center (GByte/s)
TPC        1000                               50.0                                          8.0
TRD        81.5                               10.0                                          1.6
ITS        40                                 10.0                                          1.6
Others     25                                 12.5                                          2.0
Total      1146.5                             82.5                                          13.2
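
  • (The input-to-peak-output ratio, 1146.5 / 82.5 ≈ 14, matches the overall event-size compression factor of the Event Size slide; the average output to the computing center is a further factor ~6 lower, 82.5 → 13.2 GByte/s, once the luminosity decay during a fill and the running efficiency are folded in.)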

SLIDE 17: Online Upgrade Architecture

[Architecture diagram: the detectors (TPC, TRD, ITS, TOF, PHO, EMC, Muon) send data over ~2500 read-out links at 2 x 10 or 40 Gb/s to ~250 FLPs (detector read-out nodes); the FLPs connect at 10 Gb/s to the computing farm network (DAQ and HLT), which feeds ~1250 EPNs (Event Processing Nodes) at 10 or 40 Gb/s; the EPNs write to the data storage; the trigger detectors provide the FTP L0/L1 signals.]
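
A rough sketch of the dataflow implied by this diagram (this is not the O2 implementation; only the FLP/EPN counts come from the slide, everything else is assumed): every FLP builds a sub-time-frame from its own detector links, and all FLPs must ship the pieces of a given time frame to the same EPN, so the destination can only depend on the time-frame identifier, e.g. via a simple round-robin rule.

    // Illustrative only: choosing the destination EPN for each time frame.
    // All ~250 FLPs apply the same rule, so the sub-time-frames of one time
    // frame all converge on one EPN, which can then assemble and reconstruct it.
    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    constexpr std::size_t kNumFLPs = 250;   // detector read-out nodes (from the slide)
    constexpr std::size_t kNumEPNs = 1250;  // event processing nodes (from the slide)

    // The choice may only depend on the time-frame identifier, never on local
    // FLP state, otherwise the fragments of one frame would be scattered.
    std::size_t epnForTimeFrame(uint64_t timeFrameId) {
        return static_cast<std::size_t>(timeFrameId % kNumEPNs);
    }

    int main() {
        for (uint64_t tf = 0; tf < 5; ++tf) {
            std::cout << "time frame " << tf << " -> EPN " << epnForTimeFrame(tf)
                      << " (fed by all " << kNumFLPs << " FLPs)\n";
        }
    }

In the real system the dispatch rule would also have to account for EPN load and availability; the point here is only that time-frame building needs a deterministic mapping from frame to node.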