Current and Future Data Intensive Computing at DOE BES User Facilities



SLIDE 1

Current and Future Data Intensive Computing at DOE BES User Facilities

Steve Miller, Scientific Computing Group Leader, Neutron Scattering Science Division
Mark L. Green, Tech-X Corporation

SLIDE 2

2 Managed by UT-Battelle for the U.S. Department of Energy NSSD – Data Analysis

DOE Science Programs Org Chart

http://www.er.doe.gov/about/Organization/Organization_chart/OneSC-org.pdf#page=2

SLIDE 3

DOE Neutron and X-Ray User Facilities

  • National Synchrotron Light Source (NSLS) – BNL, Brookhaven NY
  • National Synchrotron Light Source II (NSLS-II) – BNL, Brookhaven NY
  • Stanford Synchrotron Radiation Lightsource (SSRL) – SLAC, Stanford CA
  • Advanced Light Source (ALS) – LBNL, Berkeley CA
  • Advanced Photon Source (APS) – ANL, Chicago IL
  • Linac Coherent Light Source (LCLS) – SLAC, Stanford CA
  • Spallation Neutron Source (SNS) – ORNL, Oak Ridge TN
  • High Flux Isotope Reactor (HFIR) – ORNL, Oak Ridge TN
  • Manuel Lujan Jr. Neutron Scattering Center (Lujan Center) – LANL, Los Alamos NM

“As part of its mission, the Office of Basic Energy Sciences (BES) plans, constructs, and operates major scientific user facilities to serve researchers from universities, national laboratories, and industry.”

http://www.sc.doe.gov/bes/BESfacilities.htm

SLIDE 4

Facility Users and Their Research

  • BES user facilities serve over 10,000 scientists and researchers annually
      – Facility users are largely NSF funded to perform their research
      – Diverse science areas include materials science, biology, chemistry, physics, crystallography, geology, and more
  • Scientific techniques used for collecting data include:
      – spectroscopy
      – diffraction
      – scattering
      – imaging
      – tomography

http://www.sc.doe.gov/bes/BESfacilities.htm

SLIDE 5

ORNL Neutron Scattering Facilities SNS and HFIR

  • SNS construction completed May 2006, user operations December 2007
  • SNS has achieved ~800 kW beam power
  • 7 SNS instruments fully operational, 4 commissioning, and 2 more being added in FY10 – total of 23 instruments
  • Second SNS target station being planned, which will add another ~24 instruments
  • HFIR cold source operational May 2007 (one of the brightest in the world)
  • 7 HFIR operational instruments (2 SANS on cold source)
  • 2 HFIR instruments commissioning

SLIDE 6

Instruments are Big!

Sequoia Instrument:

  • 187 detector banks
  • 8 tubes per bank, 128 pixels per tube, 8333 TOF samples per pixel, 4 bytes each
  • Total histogram size per “run”: ~6 GB; some instruments will produce 15 GB per run
  • Experiments consist of multiple runs
  • Data value estimated at approximately $32M/TB to produce

[Figure: the Sequoia instrument and its detector banks, with a person shown for scale]
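The ~6 GB histogram figure follows directly from the detector numbers quoted above. The short sketch below multiplies them out. It assumes a dense histogram with one 4-byte value per pixel per TOF sample and no file-format overhead; the function and constant names are illustrative, not taken from any SNS software.

```python
# Detector geometry and sampling as quoted on the slide.
DETECTOR_BANKS = 187
TUBES_PER_BANK = 8
PIXELS_PER_TUBE = 128
TOF_SAMPLES_PER_PIXEL = 8333
BYTES_PER_SAMPLE = 4


def histogram_bytes(banks, tubes, pixels, tof_samples, bytes_per_sample):
    """Size in bytes of a dense pixel-by-TOF histogram (assumed layout)."""
    return banks * tubes * pixels * tof_samples * bytes_per_sample


total_pixels = DETECTOR_BANKS * TUBES_PER_BANK * PIXELS_PER_TUBE
size_bytes = histogram_bytes(
    DETECTOR_BANKS,
    TUBES_PER_BANK,
    PIXELS_PER_TUBE,
    TOF_SAMPLES_PER_PIXEL,
    BYTES_PER_SAMPLE,
)

print(f"pixels: {total_pixels:,}")
print(f"histogram size: {size_bytes / 1e9:.2f} GB")
```

Under these assumptions the product is about 6.4 GB in decimal units, consistent with the "~6 GB" quoted for a single run.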

SLIDE 7

Data Production

  • SNS
      – 100 MB to 15 GB per measurement, depending on instrument
      – Data collected in “event mode”, which can be streamed
      – 1.3 GB/day/instrument average data rate
  • APS
      – 60 beamlines
      – Data collected ranges from a few KB to 100 GB
      – Tomography beamlines can produce 10 TB per experiment
      – Diffraction instruments can collect 300 MB/sec continuously
  • LCLS
      – Up to tens of GB/sec peak data rate
      – 1 TB/day
      – First data to be collected in September
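The rates above mix per-second and per-day figures, which makes them hard to compare at a glance. The sketch below converts between the two. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and uninterrupted 24-hour operation, so the daily volume is an upper bound rather than a measured value; the helper names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s):
    """Volume in TB/day if a rate in MB/s were sustained for a full day."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_DAY / 1e12


def average_rate_mb_per_s(volume_tb_per_day):
    """Average rate in MB/s implied by a daily volume in TB."""
    return volume_tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


# APS diffraction figure from the slide: 300 MB/sec continuous.
aps_tb_per_day = daily_volume_tb(300)
# LCLS figure from the slide: 1 TB/day.
lcls_mb_per_s = average_rate_mb_per_s(1.0)

print(f"300 MB/s sustained for a day: {aps_tb_per_day:.2f} TB")
print(f"1 TB/day averages to: {lcls_mb_per_s:.2f} MB/s")
```

On these assumptions, a diffraction instrument running flat out would produce roughly 26 TB in a day, while the LCLS daily volume averages to about 12 MB/s, far below its quoted peak rate.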

SLIDE 8

Data Access

  • For some instruments, users bring their own USB drives and copy data during their experiment time
  • FTP
  • Web-based data portal
  • File formats:
      – NeXus HDF5 data format for neutron and some X-ray instruments
      – Proprietary or non-standard data formats for some instruments

SLIDE 9

SLIDE 10

ISSUES

  • User authentication and cybersecurity
  • Metadata definition, capture, and data association
  • Facility infrastructure capacity
  • Data policy
  • Computing culture
      – “the reduction software and data format for these large instruments needs rethinking or papers will not be published. My data files are too big (over 1.6Gb) and even with 4Gb of ram on my computer and looking at my data is painful.”
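One common way to ease the memory pressure described in the user quote is to read event data in fixed-size chunks and accumulate a histogram incrementally, rather than loading a whole file at once. The sketch below illustrates the idea with the Python standard library only. It is not the facility's reduction software, and the event record layout (pixel id and time-of-flight in microseconds, each an unsigned 32-bit little-endian integer) is a hypothetical stand-in for a real event format.

```python
import io
import struct
from collections import Counter

# Hypothetical event record: pixel id and time-of-flight (microseconds),
# both unsigned 32-bit little-endian integers. Not the real SNS layout.
EVENT = struct.Struct("<II")


def histogram_events(stream, tof_bin_width_us, events_per_chunk=65536):
    """Accumulate (pixel, TOF bin) counts while reading fixed-size chunks.

    Assumes a buffered binary stream, so a short read only occurs at
    end of file.
    """
    counts = Counter()
    chunk_bytes = EVENT.size * events_per_chunk
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        # Drop a trailing partial record, if any.
        usable = len(chunk) - (len(chunk) % EVENT.size)
        for pixel_id, tof_us in EVENT.iter_unpack(chunk[:usable]):
            counts[(pixel_id, tof_us // tof_bin_width_us)] += 1
    return counts


# Tiny in-memory example standing in for a large event file.
events = [(0, 100), (0, 150), (1, 1200), (0, 1999)]
data = b"".join(EVENT.pack(pixel, tof) for pixel, tof in events)
counts = histogram_events(io.BytesIO(data), tof_bin_width_us=1000,
                          events_per_chunk=2)
print(sorted(counts.items()))
```

Only one chunk of raw events is held in memory at a time, so peak memory depends on the chunk size and on the number of occupied histogram bins, not on the size of the input file.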

SLIDE 11

Emerging Vision – Interconnected Facilities

  • Facilitate new scientific discoveries via multi-technique data analyses
  • Thick client applications
      – run local or remote
      – coordinate jobs and data movement
      – give users the autonomy they want
  • Orbiter thick client platform under construction via DOE SBIR with Tech-X
  • Ad-hoc Virtual Organizations
      – user defined
      – enable data sharing
  • Facilities seek to provide users inter-facility data movement
      – SNS, HFIR, APS, Lujan, and LCLS already collaborating
  • User Facility network (UFnet) needed on top of ESnet
  • Each facility needs a “Network Node” abstracting operation from analysis, which would be enabled via “Nodeware”

SLIDE 12

Software Development Strategy

  • Facility concentrates on science software development
  • Provide facility data management infrastructure
  • Leverage where possible!
      – SciDAC tools
      – ESnet
      – TeraGrid
      – Collaborations and partnerships

SLIDE 13

Data Intensive Computing for Users

  • Want to process collections of data composed of multiple runs
  • Willing to wait seconds to minutes for results
  • Need to visualize data
      – Processed
      – Live streaming
  • Want the same access at home as at the instrument
  • Growing interest in collaborating and in using multi-technique data

SLIDE 14

Questions?