Current and Future Data Intensive Computing at DOE BES User Facilities
Steve Miller, Scientific Computing Group Leader, Neutron Scattering Science Division
Mark L. Green, Tech-X Corporation
2 Managed by UT-Battelle for the U.S. Department of Energy NSSD – Data Analysis
DOE Science Programs Org Chart
http://www.er.doe.gov/about/Organization/Organization_chart/OneSC-org.pdf#page=2
- National Synchrotron Light Source (NSLS) – BNL, Brookhaven NY
- National Synchrotron Light Source II (NSLS-II) – BNL, Brookhaven NY
- Stanford Synchrotron Radiation Lightsource (SSRL) – SLAC, Stanford CA
- Advanced Light Source (ALS) – LBNL, Berkeley CA
- Advanced Photon Source (APS) – ANL, Chicago IL
- Linac Coherent Light Source (LCLS) – SLAC, Stanford CA
- Spallation Neutron Source (SNS) – ORNL, Oak Ridge TN
- High Flux Isotope Reactor (HFIR) – ORNL, Oak Ridge TN
- Manuel Lujan Jr. Neutron Scattering Center (Lujan Center) – LANL, Los Alamos NM
DOE Neutron and X-Ray User Facilities
“As part of its mission, the Office of Basic Energy Sciences (BES) plans, constructs, and operates major scientific user facilities to serve researchers from universities, national laboratories, and industry.”
http://www.sc.doe.gov/bes/BESfacilities.htm
Facility Users and Their Research
- BES user facilities serve over 10,000 scientists and researchers annually
– Facility users are largely NSF funded to perform their research
– Diverse science areas include: materials science, biology, chemistry, physics, crystallography, geology, and more.
- Scientific techniques utilized for collecting data in the areas of:
– spectroscopy
– diffraction
– scattering
– imaging
– tomography
http://www.sc.doe.gov/bes/BESfacilities.htm
ORNL Neutron Scattering Facilities SNS and HFIR
- SNS construction completed May 2006, user ops December 2007
- SNS has achieved ~800 kW beam power
- 7 SNS fully operational instruments, 4 commissioning, and adding 2 more in FY10 – total of 23 instruments
- Second SNS target station being planned, which will add another ~24 instruments
- HFIR cold source operational May 2007 (one of the brightest in the world)
- 7 HFIR operational instruments (2 SANS on cold source)
- 2 HFIR instruments commissioning
Sequoia Instrument:
- 187 detector banks
- 8 tubes per bank, 128 pixels per tube, 8333 TOF samples per pixel, 4 Bytes each
- Total histogram size per “run”: ~6GB, some instruments will produce 15GB per run
- Experiments comprised of multiple runs
- Data value estimated at approximately $32M/TB to produce
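The ~6GB figure can be checked directly from the detector numbers quoted above. The following is a minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes) and that every pixel stores every TOF sample; the function and constant names are illustrative only and do not come from any SNS software.

```python
# Sketch: estimate the histogram size of one Sequoia run from the
# per-detector figures quoted on the slide. Names are hypothetical.
BANKS = 187
TUBES_PER_BANK = 8
PIXELS_PER_TUBE = 128
TOF_SAMPLES_PER_PIXEL = 8333
BYTES_PER_SAMPLE = 4


def histogram_bytes(banks=BANKS,
                    tubes=TUBES_PER_BANK,
                    pixels=PIXELS_PER_TUBE,
                    tof_samples=TOF_SAMPLES_PER_PIXEL,
                    bytes_per_sample=BYTES_PER_SAMPLE):
    """Return the size in bytes of a dense histogram covering all pixels."""
    return banks * tubes * pixels * tof_samples * bytes_per_sample


total = histogram_bytes()
# Assumes decimal units: 1 GB = 1e9 bytes.
print(f"{total:,} bytes = {total / 1e9:.2f} GB")
```

Under these assumptions the product comes to roughly 6.4 GB, consistent with the "~6GB" per run stated on the slide.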
Instruments are Big!
[Figure: Sequoia instrument and its detector banks, with a person shown for scale]
- SNS
– 100MB to 15GB per measurement depending upon instrument
– Data collected in “event mode” which can be streamed
– 1.3GB/day/instrument average data rate
- APS
– 60 beam lines
– Data collected ranges from a few KB to 100GB
– Tomography beamlines can produce 10TB per experiment
– Diffraction instruments can collect 300MB/sec continuous
- LCLS
– Up to tens of GB/sec peak data rate
– 1TB/day
– First data to be collected in September
Data Production
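To put the quoted rates on a common footing, the sketch below converts between a sustained rate and a daily volume. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and, for the APS figure, that the 300MB/sec rate were held for a full 24 hours, which is an upper-bound assumption rather than something the slide states. The function names are illustrative only.

```python
# Sketch: convert between sustained data rates and daily volumes.
# Decimal units are assumed throughout (1 MB = 1e6 B, 1 TB = 1e12 B).
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s):
    """TB produced in one day at a sustained rate given in MB/s."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_DAY / 1e12


def average_rate_mb_per_s(volume_tb_per_day):
    """Average MB/s implied by a daily volume given in TB."""
    return volume_tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6


# APS diffraction: 300 MB/s, assumed (hypothetically) sustained all day.
print(f"APS diffraction: {daily_volume_tb(300):.2f} TB/day")
# LCLS: 1 TB/day expressed as an average rate.
print(f"LCLS average: {average_rate_mb_per_s(1):.2f} MB/s")
```

The gap between the LCLS average implied by 1TB/day and its peak of tens of GB/sec illustrates why peak and average rates need to be planned for separately.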
- For some instruments, users bring their own USB drives and copy data during their experiment time
- FTP
- Web based data portal
- File formats:
– NeXus HDF5 data format for neutron and some X-ray instruments
– Proprietary or non-standard data formats for some instruments
Data Access
- User authentication and cybersecurity
- Metadata definition, capture, and data association
- Facility infrastructure capacity
- Data policy
- Computing culture
– “the reduction software and data format for these large instruments needs rethinking or papers will not be published. My data files are too big (over 1.6Gb) and even with 4Gb of ram on my computer and looking at my data is painful.”
ISSUES
Emerging Vision – Interconnected Facilities
- Facilitate new scientific discoveries via multi-technique data analyses
- Thick Client applications
- run local or remote
- coordinate jobs and data movement
- gives users the autonomy they want
- Orbiter thick client platform under construction via DOE SBIR with Tech-X
- Ad-hoc Virtual Organizations
- user defined
- enable data sharing
- Facilities seek to provide users inter-facility data movement
- SNS, HFIR, APS, Lujan, and LCLS already collaborating
- User Facility network (UFnet) needed on top of ESnet
- Each facility needs a “Network Node” abstracting operation from analysis, which would be enabled via “Nodeware”
- Facility concentrates on science software development
- Provide facility data management infrastructure
- Leverage where possible!
– SciDAC tools
– ESnet
– TeraGrid
– Collaborations and partnerships
Software Development Strategy
- Want to process collections of data comprised of multiple runs
- Willing to wait seconds to minutes for results
- Need to visualize data
– Processed
– Live streaming
- Want the same access at home as well as at the instrument
- Growing interest in collaborating and in using multi-technique data
Data Intensive Computing for Users
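As an illustration of processing a collection of runs from streamed event data, the sketch below bins neutron events into a sparse time-of-flight histogram and then merges the histograms of several runs. The event layout used here, a `(pixel_id, tof_us)` pair, the bin width, and the function names are all assumptions made for the example; they are not the SNS event format or any facility API.

```python
from collections import Counter


def bin_events(events, tof_bin_width_us):
    """Accumulate streamed (pixel_id, tof_us) events into a sparse histogram.

    The event layout is a hypothetical stand-in for real event-mode data.
    Events are consumed one at a time, so `events` may be a live stream.
    """
    hist = Counter()
    for pixel_id, tof_us in events:
        tof_bin = int(tof_us // tof_bin_width_us)
        hist[(pixel_id, tof_bin)] += 1
    return hist


def merge_runs(run_histograms):
    """Sum the per-run histograms of one experiment into a single histogram."""
    total = Counter()
    for hist in run_histograms:
        total.update(hist)  # Counter.update adds counts
    return total


# Two small made-up runs, binned at an assumed 10 microsecond width.
run_a = [(0, 12.0), (0, 14.5), (3, 101.0)]
run_b = [(0, 13.0), (3, 250.0)]
combined = merge_runs(bin_events(run, 10.0) for run in (run_a, run_b))
print(sorted(combined.items()))
```

A sparse histogram only stores bins that received counts, which is one way to keep memory use well below the dense per-run sizes quoted earlier; whether that trade-off suits a given instrument depends on its count rates.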