LSST: Petascale opportunities and challenges (Tony Tyson, University of California, Davis)



SLIDE 1

LSST: Petascale opportunities and challenges

Tony Tyson, University of California, Davis

SLIDE 2

SLIDE 3

SLIDE 4

Relative data volume from survey telescopes & cameras

[Figure: chart with axis "Etendue (m² deg²)", ticks at 1, 10, 100, 1000; other labels: "Max", "Survey"]

SLIDE 5

The new sky

Probing Dark Matter and Dark Energy

Mapping the Milky Way

Finding Near Earth Asteroids

SLIDE 6

Data volumes & rates are unprecedented in astronomy

[Figure: Estimated Nightly Data Volume in GB (axis ticks 5000 to 20000), raw and catalog data, for LSST, Pan-STARRS 4, and SDSS]

LSST will make tens of trillions of photometric observations of tens of billions of objects
SLIDE 7

NSF Review, December 15-17, 2009, Tucson, AZ

DM System is widely distributed

  • Base Site: Base Center, co-located Data Access Center (DAC)
  • Archive Site: Archive Center, co-located Data Access Center (DAC)
  • Headquarters Site: Systems Operations Center (SOC), Education and Public Outreach Center (EPOC)

  • Site
    – A physical location/space that hosts DM centers
    – Connected via dedicated, protected fiber optic circuits
  • Center
    – A DM functional capability hosted at a Site

SLIDE 8

DM System relies on large-scale computational parallelism

  • With few exceptions, LSST pipeline processing is “embarrassingly parallel”
    – 3024 parallel image readouts
    – O(10^8) sky tiles
    – O(10^9) objects
  • Computational clusters are well matched to the available parallelism
    – 5000 cores at Base
    – 12000 (yr1) to 33000 (yr10) cores at Archive
  • Middleware implements flexible pipeline/production model of parallelism
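
As a rough illustration of what "embarrassingly parallel" means here, the sketch below maps a function over independent sky tiles with a worker pool. The tile layout, the `process_tile` function, and the worker count are hypothetical stand-ins and are not taken from the LSST middleware.

```python
from concurrent.futures import ThreadPoolExecutor

def process_tile(tile):
    """Hypothetical per-tile work: sum the pixel values of one tile."""
    tile_id, pixels = tile
    return tile_id, sum(pixels)

def process_all(tiles, workers=4):
    """Process every tile independently; no tile needs data from another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_tile, tiles))

# Toy input: 8 tiles of 4 pixels each (the slide quotes O(10^8) tiles).
tiles = [(i, [i, i + 1, i + 2, i + 3]) for i in range(8)]
results = process_all(tiles)
```

A thread pool keeps the sketch self-contained. CPU-bound work of this kind would more plausibly be spread over processes or cluster nodes, which the independence of the tiles makes straightforward.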
SLIDE 9

DATA PRODUCTS

CLASSIFICATION

SLIDE 10

IMAGE SIMULATIONS

[Flow diagram; labels recovered from the figure:]

  • DM Pipelines
  • Solar System, Cosmology, Defects, Milky Way, Extended Sources, Transients
  • Base Catalog
  • All Sky Database
  • Instance Catalog Generation
  • Generate the seed catalog as required for simulation. Includes: metadata, size, position, type, variability, color, brightness, proper motion
  • Operation Simulation
  • Source Image Generation
  • Introduce shear parameter from cosmology metadata
  • DM database load simulation
  • Generate per FOV
  • Photon Propagation
  • Atmosphere, Telescope, Camera, Defects, Formatting
  • Generate per sensor
  • Calibration Simulation
  • LSST Sample Images and Catalogs

SLIDE 11

Full end-to-end simulations

SLIDE 12

The Data Challenge

  • 3 Terabytes per hour that must be mined in real time.
  • 20 billion objects will be monitored for important variations in real time.
  • A new approach must be developed for knowledge extraction in real time.
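
To put the first bullet in more familiar units, the short calculation below converts 3 TB per hour into a sustained rate in MB/s. It assumes decimal terabytes (10^12 bytes), which the slide does not specify, and the function name is illustrative only.

```python
TB = 10**12  # assumption: decimal terabyte; the slide does not say TB vs TiB

def sustained_rate_mb_per_s(terabytes_per_hour):
    """Convert a data rate in TB/hour into MB/s (decimal units)."""
    bytes_per_second = terabytes_per_hour * TB / 3600
    return bytes_per_second / 10**6

rate = sustained_rate_mb_per_s(3)
print(f"{rate:.0f} MB/s")
```

Under that assumption the rate comes to roughly 833 MB/s, sustained for as long as data is being taken.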

SLIDE 13

The Data Challenge

  • ~3 Terabytes per hour that must be mined in real time.
  • 20 billion objects will be monitored for important variations in real time.
  • A new approach must be developed for knowledge extraction in real time.

SLIDE 14

LSST

SLIDE 15

LSST

SLIDE 16

Analytics

  • Complex computations
  • 100s of attributes per query
  • Iterative, successively more restrictive
  • Curiosity driven questions
  • 3 major query types
    – Needle in haystack
    – Correlations
    – Time series
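
The three query types can be sketched on a toy table. Everything below is an assumption made for illustration: the `obs` table, its columns, and the sample values are hypothetical and do not reflect the actual LSST schema. An in-memory SQLite database stands in for the real catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (object_id INTEGER, mjd REAL, mag REAL, color REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", [
    (1, 55000.0, 20.1, 0.5), (1, 55001.0, 20.2, 0.5),
    (2, 55000.0, 18.0, 1.2), (2, 55001.0, 15.5, 1.2),  # sudden brightening
    (3, 55000.0, 22.0, 0.1), (3, 55001.0, 22.1, 0.1),
])

# 1. Needle in haystack: the few rows that satisfy a restrictive predicate.
needles = conn.execute(
    "SELECT object_id, mjd FROM obs WHERE mag < 16.0").fetchall()

# 2. Correlations: relate two attributes across all objects.
per_object = conn.execute(
    "SELECT AVG(mag), AVG(color) FROM obs GROUP BY object_id ORDER BY object_id"
).fetchall()

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)

mag_color_corr = pearson(per_object)

# 3. Time series: one object's measurements in time order.
light_curve = conn.execute(
    "SELECT mjd, mag FROM obs WHERE object_id = ? ORDER BY mjd", (2,)).fetchall()
```

At survey scale the same three patterns stress a database differently: the first depends on selective indexing, the second on scanning many attributes, and the third on keeping each object's observations clustered together.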
SLIDE 17

Science at the Limit

  • Much of the breakthrough science using surveys (imaging or spectroscopy) has occurred at the limits of the surveys
  • Sample incompleteness
  • Systematic errors

SLIDE 18

LSST Wide-Fast-Deep survey

  • 4 billion galaxies with redshifts
  • Time domain:
    – 1 million supernovae
    – 1 million galaxy lenses
    – 1 billion moving objects
    – new phenomena

SLIDE 19

LSST Wide-Fast-Deep survey

  • 4 billion galaxies with redshifts
  • Time domain:
    – 1 million supernovae
    – 1 million galaxy lenses
    – 1 billion moving objects
    – new phenomena

SLIDE 20

Major opportunity and challenge:

SLIDE 21

  • Characterize the known (clustering)
  • Assign the new (classification)
  • Discover the unknown (outlier detection)

Benefits of very large data sets:

  • best statistical analysis of “typical” events
  • automated search for “rare” events
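
As one concrete example of an automated search for "rare" events, the sketch below flags outliers with a modified z-score based on the median absolute deviation. This is a common robust rule chosen here for illustration; the slide does not say which method is intended, and the sample magnitudes and the 3.5 threshold are assumptions.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be ranked as unusual
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical magnitudes for one object; index 5 is a sudden brightening.
mags = [20.1, 20.0, 20.2, 19.9, 20.1, 15.0, 20.0]
outliers = mad_outliers(mags)
```

Using the median rather than the mean keeps the "typical" baseline from being dragged toward the very events being searched for.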

Tom Vestrand

SLIDE 22

The dimension reduction problem:

Finding correlations and “fundamental planes” of parameters

  • The Curse of High Dimensionality!
    – Are there combinations (linear or non-linear functions) of observational parameters that correlate strongly with one another?
    – Are there eigenvectors or condensed representations (e.g., basis sets) that represent the full set of properties?
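
The eigenvector question can be illustrated with a small principal-component sketch. It uses plain power iteration on a covariance matrix so that it needs no external libraries; in practice a library eigen-solver or SVD would normally be used. The three synthetic "observables" and their dependence on one hidden parameter `t` are assumptions made purely for the example.

```python
import random

def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def leading_component(cov, iterations=100):
    """Power iteration: dominant eigenvalue and unit eigenvector of cov."""
    d = len(cov)
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
    return value, vec

random.seed(0)
rows = []
for _ in range(200):
    t = random.gauss(0.0, 1.0)  # hidden parameter driving all three observables
    rows.append([t + random.gauss(0, 0.01),
                 2.0 * t + random.gauss(0, 0.01),
                 -t + random.gauss(0, 0.01)])

cov = covariance(rows)
value, vec = leading_component(cov)
# Fraction of the total variance captured by the single leading direction.
explained = value / sum(cov[i][i] for i in range(3))
```

Here one direction carries nearly all of the variance, so three measured parameters condense to a single effective one. Real catalogs with hundreds of attributes would need more components and possibly non-linear methods.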

SLIDE 23

Automated discovery

Data exploration

This is required also for automated Data Quality Assessment

SLIDE 24

How To Learn More / Get Involved?

  • LSST lsst.org
    – Check out LSST database trac at http://dev.lsstcorp.org/trac/wiki/LSSTDatabase
  • XLDB
    – XLDB4 (Oct 6-7 @ SLAC)
    – Read past XLDB reports http://www-conf.slac.stanford.edu/xldb
    – Share your use cases, join the community
  • SciDB
    – Check out http://scidb.org
    – Try it out

Open conference starting this year

1st public release