SLIDE 1

The IceCube data pipeline: from the South Pole to publication

Jakob van Santen (jakob.van.santen@desy.de)
PyData Berlin, 2016-05-21

SLIDE 2

Jakob van Santen - The IceCube data pipeline - jakob.van.santen@desy.de

Deutsches Elektronen-Synchrotron (DESY) Zeuthen


Helmholtz research institute with ~200 scientists, postdocs, and students studying high-energy astrophysics with gamma rays and neutrinos.

Kosmos

SLIDE 3

SLIDE 4

IceCube South Pole Neutrino Observatory

  • What’s a neutrino?
  • Why look for them at the South Pole?
  • What are we trying to learn?
  • How does IceCube find neutrinos?

SLIDE 5

What’s a neutrino?

  • Charged (electromagnetic interactions)
  • Neutral (weak interactions only)

2.5e6 times less massive

SLIDE 6

Sources of neutrinos

Image: Wikipedia

Radioactive decay Nuclear reactors

Image: chemistryviews.com

The Sun

Image: N. Svoboda

Man-made particle accelerators

Image: CERN

Cosmic accelerators

Energy scale: ~10^6 eV, ~10^9 eV, ~10^15 eV (higher energy →)

SLIDE 7

IceCube South Pole Neutrino Observatory

  • What’s a neutrino?
  • Why look for them at the South Pole?
  • What are we trying to learn?
  • How does IceCube find neutrinos?

SLIDE 8

Cosmic rays

[Figure: cosmic-ray energy spectrum, E^2.6 F(E) in GeV^1.6 m^-2 s^-1 sr^-1 versus E in eV (10^13 to 10^20 eV), with measurements from Grigorov, JACEE, MGU, Tien-Shan, Tibet07, Akeno, CASA-MIA, HEGRA, Fly’s Eye, Kascade, Kascade Grande, IceTop-73, HiRes 1, HiRes 2, Telescope Array, and Auger; spectral features labeled Knee, 2nd Knee, and Ankle]

PRD 86: 010001 (2013)

Something accelerates nuclei to macroscopic energies…

[Figure panels labeled IceCube-59 and Tibet-III; energies shown: 5 TeV and 20 TeV]

Abbasi et al., ApJ, 746, 33, 2012 Amenomori et al., ICRC 2011

…but we don’t know what, or where! (For scale: 1 joule ≈ 6.2 × 10^18 eV.)

Neutrinos can point back to the cosmic accelerators!
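To make “macroscopic energies” concrete, here is a small unit-conversion sketch. It uses the exact SI value of the elementary charge (1 eV = 1.602176634 × 10^-19 J); the function names are illustrative and not from any IceCube library.

```python
# 1 eV in joules (exact, by the 2019 SI definition of the elementary charge)
EV_IN_JOULES = 1.602176634e-19


def ev_to_joule(energy_ev):
    """Convert an energy in electronvolts to joules."""
    return energy_ev * EV_IN_JOULES


def joule_to_ev(energy_j):
    """Convert an energy in joules to electronvolts."""
    return energy_j / EV_IN_JOULES


# The upper end of the plotted spectrum, 10^20 eV, is about 16 J carried by a single nucleus
print(f"{ev_to_joule(1e20):.1f} J")
# One joule corresponds to roughly 6.2e18 eV
print(f"{joule_to_ev(1.0):.2e} eV")
```

That is roughly the kinetic energy of a thrown tennis ball, concentrated in one atomic nucleus.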

SLIDE 9

IceCube South Pole Neutrino Observatory

  • What’s a neutrino?
  • Why look for them at the South Pole?
  • What are we trying to learn?
  • How does IceCube find neutrinos?

SLIDE 10

Images: NASA, USAF

South Pole Station: 90° S, 2835 m above sea level

~2800 m of pure, clear ice

SLIDE 11

South Pole Station

Photo: Haley Buffman/NSF

Labeled in the photo: main station, IceCube Lab

SLIDE 12

SLIDE 13

IceCube: a cubic-kilometer neutrino telescope buried in ice

  • IceCube Lab (data center)
  • Digital Optical Module (single-pixel camera)

SLIDE 14

IceCube South Pole Neutrino Observatory

  • What’s a neutrino?
  • Why look for them at the South Pole?
  • What are we trying to learn?
  • How does IceCube find neutrinos?

SLIDE 15

IceCube data pipeline

Pipeline diagram stages: data acquisition, feature calculation & event selection, simulation, analysis, and Science! (regions labeled “South Pole (real time)” and “Offline”)

Challenges:

  • Getting data out of the South Pole
  • Generating simulated data
  • Allowing non-expert users to configure & extend the data pipeline for many distinct science topics
  • Distributing data to analyzers
SLIDE 16

A neutrino event in IceCube

Color ⇔ time, size ⇔ light intensity (labels in the image: neutrino, interaction, muon)

SLIDE 17

Raw data

  • 1 neutrino for every 1 million penetrating muons
  • ~10 high-energy neutrino events per year
  • Need features to select them!

10 milliseconds of raw data
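A back-of-the-envelope sketch of the selection problem. It borrows the 3000 events/s rate from the filtering slide and assumes that essentially every triggered event is a penetrating muon; both are simplifications for illustration, not the collaboration's numbers for this calculation.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

trigger_rate_hz = 3000          # events/s at the South Pole (from the filtering slide)
muons_per_neutrino = 1_000_000  # "1 neutrino for every 1 million penetrating muons"
high_energy_per_year = 10       # "~10 high-energy neutrino events per year"

# Assumption: every triggered event counts as a penetrating muon
events_per_year = trigger_rate_hz * SECONDS_PER_YEAR
neutrinos_per_year = events_per_year / muons_per_neutrino
# Fraction of all recorded events that are the high-energy neutrinos we want
needle_fraction = high_energy_per_year / events_per_year

print(f"events/year:       {events_per_year:.2e}")
print(f"neutrinos/year:    {neutrinos_per_year:.2e}")
print(f"high-energy share: {needle_fraction:.1e}")
```

Under these assumptions, roughly one recorded event in 10^10 is a high-energy neutrino, which is why the selection has to rely on well-chosen features.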

SLIDE 18

Feature calculation & data selection

SLIDE 19

IceTray: IceCube’s processing framework

  • Core written in ~20k lines of C++
  • User interface exposed via boost::python
  • Two main components:
      • I3Frame: container for event data
      • I3Module: manipulates I3Frames
  • Data storage in files

Images: boost.org

SLIDE 20

I3Frames

    In [1]: from icecube import icetray, dataio, dataclasses
    In [2]: f = dataio.I3File('hese.i3.bz2')
    In [3]: print(f.pop_frame(icetray.I3Frame.DAQ))
    [ I3Frame (DAQ):
      'CalibrationErrata' [DAQ] ==> I3Vector<OMKey> (137)
      'FilterMask' [DAQ] ==> I3Map<string, I3FilterResult> (749)
      'I3Geometry' [Geometry] ==> I3Geometry (401222)
      'I3TriggerHierarchy' [DAQ] ==> I3Tree<I3Trigger> (616)
      'OfflinePulses' [DAQ] ==> I3Map<OMKey, vector<I3RecoPulse> > (52917)
      'PoleCascadeLinefit' [DAQ] ==> I3Particle (150)
      'PoleMuonLlhFit' [DAQ] ==> I3Particle (150)
      'PoleMuonLlhFitFitParams' [DAQ] ==> I3LogLikelihoodFitParams (68)
      'PoleToIParams' [DAQ] ==> I3TensorOfInertiaFitParams (78)
    ]

  • An “I3” file is a stream of serialized I3Frames; boost::serialization provides load/save, object versioning, etc.
  • I3Frame: dictionary of [immutable] C++ objects related to a single event

Flexible! Schema can change from event to event
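The idea of a file as a stream of serialized, dictionary-like frames whose schema can vary from event to event can be mimicked in plain Python. This toy uses `pickle` in place of boost::serialization; `ToyFrame`, `write_stream`, and `read_stream` are hypothetical names, not part of IceTray, and refusing to overwrite a key is only meant to echo the immutability of stored objects.

```python
import io
import pickle


class ToyFrame(dict):
    """Hypothetical stand-in for an I3Frame: a dict whose entries cannot be overwritten."""

    def __setitem__(self, key, value):
        if key in self:
            raise KeyError(f"frame already contains {key!r}")
        super().__setitem__(key, value)


def write_stream(frames, fileobj):
    # Serialize frames back to back: the file is just a sequence of pickled dicts
    for frame in frames:
        pickle.dump(dict(frame), fileobj)


def read_stream(fileobj):
    # Deserialize one frame at a time until the stream runs out
    while True:
        try:
            yield ToyFrame(pickle.load(fileobj))
        except EOFError:
            return


# Two events with different schemas: the second one has no FilterMask
first = ToyFrame()
first["OfflinePulses"] = [1.0, 2.5, 0.3]
first["FilterMask"] = {"MuonFilter": True}
second = ToyFrame()
second["OfflinePulses"] = [0.7]

buffer = io.BytesIO()
write_stream([first, second], buffer)
buffer.seek(0)
frames = list(read_stream(buffer))
print([sorted(frame) for frame in frames])
```

Because frames are read one at a time, a reader never needs a global schema; the price is that pulling one column out of every event means deserializing every frame.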

SLIDE 21

I3Modules

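[Diagram: frames flow through a chain of I3Modules]

    from I3Tray import I3Tray
    from icecube import icetray, dataio  # dataio provides I3Reader and I3Writer
    from icecube import VHESelfVeto      # assumed to provide the HomogenizedQTot module

    tray = I3Tray()
    tray.Add("I3Reader", filenamelist=["foo.i3"])
    tray.Add('HomogenizedQTot', Output='HomogenizedQTot', Pulses='OfflinePulsesHLC')
    tray.Add("I3Writer", filename="bar.i3")
    tray.Execute()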

  • I3Module: single-purpose processing stage
  • User (physicist) configures module chain in Python
  • An I3Module can:
      • Add new objects to the frame
      • Remove objects from the frame
      • Drop the frame
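The module-chain model can be illustrated without IceTray. The sketch below is a hypothetical, pure-Python analogue (`Module`, `SumCharge`, `MinimumCharge`, `Delete`, and `run_chain` are invented names, not IceTray classes) showing the three things a module can do: add objects, remove objects, or drop the frame.

```python
class Module:
    """Hypothetical processing stage: return the frame to pass it on, or None to drop it."""

    def process(self, frame):
        return frame


class SumCharge(Module):
    # Adds a new object: the summed charge of a pulse list
    def __init__(self, pulses, output):
        self.pulses, self.output = pulses, output

    def process(self, frame):
        frame[self.output] = sum(frame[self.pulses])
        return frame


class MinimumCharge(Module):
    # Drops frames whose summed charge is below a threshold
    def __init__(self, key, minimum):
        self.key, self.minimum = key, minimum

    def process(self, frame):
        return frame if frame[self.key] >= self.minimum else None


class Delete(Module):
    # Removes an object from the frame
    def __init__(self, key):
        self.key = key

    def process(self, frame):
        frame.pop(self.key, None)
        return frame


def run_chain(frames, modules):
    """Push each frame through the modules in order, like frames through a tray."""
    for frame in frames:
        for module in modules:
            frame = module.process(frame)
            if frame is None:  # a module dropped the frame
                break
        else:
            yield frame


events = [{"Pulses": [1.0, 2.0]}, {"Pulses": [10.0, 5.0]}]
chain = [SumCharge("Pulses", "QTot"), MinimumCharge("QTot", 6.0), Delete("Pulses")]
print(list(run_chain(events, chain)))
```

Keeping each stage single-purpose is what lets a physicist reorder, swap, or insert stages purely by editing the chain.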
SLIDE 22

User-defined I3Modules

    from icecube import icetray

    class Counter(icetray.I3ConditionalModule):
        def __init__(self, context):
            super(Counter, self).__init__(context)
            self.AddParameter("Key", "Name of counter to put in the frame", "Count")

        def Configure(self):
            self.key = self.GetParameter("Key")
            self.counter = 0

        def Physics(self, frame):
            # Store the running count in the frame, then pass the frame on
            frame[self.key] = icetray.I3Int(self.counter)
            self.counter += 1
            self.PushFrame(frame)

    tray.Add(Counter, Name="CountCount")

Prototype rapidly in Python, rewrite in C++ as needed

SLIDE 23

Filtering at the South Pole

IceCube Lab: 3000 events/s, 1 TB/day → satellite relay: 300 events/s, 100 GB/day → IceCube Data Warehouse (Madison, WI): 4 PB and counting
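The two rates and volumes on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a steady rate over the day; `bytes_per_event` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 3600


def bytes_per_event(bytes_per_day, events_per_second):
    """Average event size implied by a daily data volume and an event rate."""
    return bytes_per_day / (events_per_second * SECONDS_PER_DAY)


# Decimal units assumed: 1 TB = 1e12 bytes, 1 GB = 1e9 bytes
at_pole = bytes_per_event(1e12, 3000)          # everything recorded in the IceCube Lab
over_satellite = bytes_per_event(100e9, 300)   # filtered stream sent north

print(f"{at_pole:.0f} bytes/event at the Pole")
print(f"{over_satellite:.0f} bytes/event over the satellite")
```

Both figures come out near 4 kB per event, so under these assumptions the tenfold reduction in volume comes from keeping fewer events rather than from shrinking each one.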

SLIDE 24

Aside: grid computing

  • Simulation requires tens of millions of CPU and GPU hours
  • Opportunistic computing on academic grids in US and Europe with HTCondor glide-ins, custom Python middleware
  • Some Linux flavor (usually a Red Hat variant)
  • Software provisioned on CVMFS (HTTP-based read-only filesystem)
SLIDE 25

Data formats for analysis

[Diagram: event data (I3Frame) → specific coercion for each object (I3ParticleConverter, I3FilterMaskConverter) → abstract table row (I3TableRow) → format-specific backend (HDF5, ROOT)]

  • I3Frame: flexible, but inefficient for partial reads
  • Analysis development means reading the same data over and over again → tabular formats
  • tableio: framework for turning irregular event data into table rows

pytables, pandas, h5py, etc.
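A minimal sketch of the idea behind tableio: a per-type converter coerces each object into named scalar columns, and every frame yields one row keyed by run and event number, with a flag recording whether the object was present. `Particle`, `particle_converter`, and `book_table` are hypothetical names, loosely modeled on the concept rather than on tableio's actual API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    """Hypothetical stand-in for a reconstructed particle (e.g. an I3Particle)."""
    zenith: float
    azimuth: float
    energy: float


def particle_converter(particle):
    # Coerce one object type into a flat dict of scalar columns
    return {"zenith": particle.zenith, "azimuth": particle.azimuth,
            "energy": particle.energy}


def book_table(frames, key, converter, columns):
    """Turn frame[key] from irregular frames into rows with a fixed set of columns."""
    rows = []
    for run, event, frame in frames:
        present = key in frame
        values = converter(frame[key]) if present else {}
        row = {"Run": run, "Event": event, "exists": present}
        # Missing objects become NaN so the table keeps one row per frame
        row.update({col: values.get(col, math.nan) for col in columns})
        rows.append(row)
    return rows


frames = [
    (1, 0, {"MuonFit": Particle(zenith=2.1, azimuth=0.4, energy=3.0e4)}),
    (1, 1, {}),  # this event has no fit result
]
table = book_table(frames, "MuonFit", particle_converter,
                   ["zenith", "azimuth", "energy"])
for row in table:
    print(row)
```

Rows with a fixed set of columns map directly onto an HDF5 table or a `pandas.DataFrame(table)`, where reading one column no longer requires touching the rest of the event.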

SLIDE 26

Analysis

SLIDE 27

Histogramming

[Figure: events per year (10^0 to 10^9) versus number of collected photons (10^1 to 10^4) for penetrating muons and atmospheric neutrinos, with labels “Pre-selection” and “Blindness”]

Most IceCube analyses use binned data.

Pro

  • Predicted mean in each bin is straightforward to calculate with Monte Carlo
  • Statistics are easy to understand

Con

  • Have to choose how to bin
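A minimal sketch of the binned approach, assuming simulated events carry per-event weights in events/year (a common Monte Carlo convention, assumed here). Histogramming the weights gives the predicted mean in each bin, and the observed counts are compared with a Poisson likelihood. The toy data and the function names are hypothetical.

```python
import math

import numpy as np


def expected_counts(mc_observable, mc_weights, bin_edges, livetime_years):
    """Predicted mean per bin: sum of per-event MC weights (events/year) times livetime."""
    rate_per_bin, _ = np.histogram(mc_observable, bins=bin_edges, weights=mc_weights)
    return rate_per_bin * livetime_years


def poisson_log_likelihood(observed, expected):
    """Sum over bins of log Poisson(k | mu); assumes every bin has mu > 0."""
    k = np.asarray(observed, dtype=float)
    mu = np.asarray(expected, dtype=float)
    log_k_factorial = np.array([math.lgamma(x + 1.0) for x in k])
    return float(np.sum(k * np.log(mu) - mu - log_k_factorial))


# Toy Monte Carlo: 10,000 simulated events, each standing for 1e-3 events/year
rng = np.random.default_rng(seed=1)
n_photons = 10 ** rng.uniform(1, 4, size=10_000)
weights = np.full(n_photons.size, 1e-3)
edges = np.logspace(1, 4, 7)  # 6 logarithmic bins in "number of collected photons"

mu = expected_counts(n_photons, weights, edges, livetime_years=2.0)
data = rng.poisson(mu)  # pseudo-data drawn from the prediction itself
print(mu.round(2), data, poisson_log_likelihood(data, mu))
```

The “Con” is visible here too: the result depends on `edges`, which the analyzer has to choose.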
SLIDE 28

dashi: histograms that do more

numpy.histogramdd()-backed histogram objects with built-in:

  • summary statistics
  • manipulation methods: add, multiply, slice, project, etc.
  • storage in HDF5 datasets

https://github.com/emiddell/dashi

    import dashi
    import tables
    from numpy import linspace

    # create & fill 3d histogram (get_3d_data() is a placeholder for your own data)
    h = dashi.histogram.histogram(3, (linspace(0, 1, 101),)*3)
    h.fill(get_3d_data())
    # project out dimension 1
    h.project([0,2])
    # plot a 1-d slice
    h[1,1,:].line(differential=True)
    # store for later
    with tables.open_file('foo.hdf5', 'a') as hdf:
        dashi.histsave(h, hdf, '/', 'my_histogram')
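The core idea of a histogram object wrapping `numpy.histogramdd`, where projecting out a dimension means summing over that axis, can be sketched with NumPy alone. `Hist` is a hypothetical class written for illustration; it is not dashi's implementation, and its method names only echo the operations listed above.

```python
import numpy as np


class Hist:
    """Minimal numpy.histogramdd-backed histogram (a sketch, not dashi's API)."""

    def __init__(self, edges):
        self.edges = tuple(np.asarray(e, dtype=float) for e in edges)
        self.bincontent = np.zeros(tuple(len(e) - 1 for e in self.edges))

    def fill(self, sample, weights=None):
        # sample has shape (n_events, n_dims); counts accumulate across calls
        counts, _ = np.histogramdd(sample, bins=self.edges, weights=weights)
        self.bincontent += counts
        return self

    def __add__(self, other):
        # Only histograms with identical binning can be added
        if not all(np.array_equal(a, b) for a, b in zip(self.edges, other.edges)):
            raise ValueError("binning differs")
        result = Hist(self.edges)
        result.bincontent = self.bincontent + other.bincontent
        return result

    def project(self, keep):
        # Sum over every axis not listed in `keep`
        keep = sorted(keep)
        dropped = tuple(i for i in range(len(self.edges)) if i not in keep)
        result = Hist([self.edges[i] for i in keep])
        result.bincontent = self.bincontent.sum(axis=dropped)
        return result


rng = np.random.default_rng(seed=0)
h = Hist((np.linspace(0, 1, 11),) * 3).fill(rng.random((1000, 3)))
projected = h.project([0, 2])  # "project out dimension 1"
print(projected.bincontent.shape, projected.bincontent.sum())
```

Carrying the bin edges alongside the counts is what makes operations like addition safe: mismatched binnings are rejected instead of silently combined.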

SLIDE 29

Example: discovering astrophysical neutrinos

[Figure: detector with an outer veto region; labels μ, νμ, and Veto]

Simple event selection based on 2 features:

  • > 6000 photon hits
  • hit pattern starts inside detector volume

28 events survived in 2 years of data
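The two-feature selection can be written as a simple predicate. This is a deliberately simplified sketch: each hit is a hypothetical `(time, in_veto_layer)` pair, and “starts inside” is reduced to checking that the earliest hit is not in the outer veto layer, which is far cruder than a real veto.

```python
def starts_inside(hits):
    """Toy veto: hits are (time, in_veto_layer) pairs; the earliest hit must lie inside."""
    if not hits:
        return False
    _, in_veto_layer = min(hits)  # tuples compare by time first
    return not in_veto_layer


def passes_selection(hits, min_hits=6000):
    # Feature 1: brightness (> 6000 photon hits); feature 2: starts inside the detector
    return len(hits) > min_hits and starts_inside(hits)


bright_contained = [(float(t), False) for t in range(7000)]
bright_vetoed = [(0.0, True)] + [(float(t), False) for t in range(1, 7000)]
dim_contained = [(float(t), False) for t in range(100)]
for name, hits in [("bright, contained", bright_contained),
                   ("bright, vetoed", bright_vetoed),
                   ("dim, contained", dim_contained)]:
    print(name, passes_selection(hits))
```

The logic behind the veto: a muon produced in the atmosphere must cross the outer layer before reaching the interior, while a neutrino can interact inside without leaving light on the way in.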

SLIDE 30

Analysis

[Figure (IceCube Preliminary): declination (degrees, −80 to 80) and deposited EM-equivalent energy in detector (TeV, 10^2 to 10^3 shown) of the selected events, with showers and tracks marked]

Bin data in observable space (energy, zenith angle) and compare counts to the predicted mean in each bin.

Q: What is the chance that the data is a fluctuation of the background?
A: < 5e-7 (discovery!) → doi:10.1126/science.1242856
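To make the question concrete for a single counting bin: if the background predicts a mean of μ events, the chance of a fluctuation at least as large as the observed n is the Poisson upper tail, often quoted as an equivalent one-sided Gaussian significance. This is an illustration only (the analysis compares many bins at once), and `poisson_sf` and `p_to_sigma` are hypothetical helpers built on the standard library; the example numbers are made up.

```python
import math
from statistics import NormalDist


def poisson_sf(n_obs, mu):
    """P(N >= n_obs) for N ~ Poisson(mu), mu > 0, summing the upper tail directly."""
    if n_obs <= 0:
        return 1.0
    # P(N = n_obs), computed in log space so large n_obs does not overflow
    term = math.exp(-mu + n_obs * math.log(mu) - math.lgamma(n_obs + 1))
    total, k = 0.0, n_obs
    while True:
        total += term
        k += 1
        term *= mu / k  # P(N = k) from P(N = k - 1)
        # Beyond the mode (k > mu) the terms only shrink; stop once negligible
        if k > mu and term <= 1e-16 * total:
            return min(total, 1.0)


def p_to_sigma(p):
    """One-sided Gaussian-equivalent significance of a p-value (0 < p < 1)."""
    return NormalDist().inv_cdf(1.0 - p)


# Hypothetical numbers: 2.0 background events expected, 10 observed
p = poisson_sf(10, 2.0)
print(f"p = {p:.2e}, about {p_to_sigma(p):.1f} sigma")
```

Summing the upper tail directly, rather than computing 1 minus the lower tail, keeps precision for the very small p-values that matter for a discovery claim.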

SLIDE 31

Summary

  • IceCube is a cubic-kilometer neutrino detector at the South Pole.
  • Python powers our data processing, simulation, and analysis tools.

SLIDE 32

icecube.wisc.edu

SLIDE 33

Want to learn more about the science?

Come to Berlin’s science open house in Adlershof on July 11! http://www.langenachtderwissenschaften.de

SLIDE 34

Backup
