Profs. M. Halem and Ye Yesha "Geoscience and Aerospace Applications
on a Potential Heterogenous Cell BE Cluster
at UMBC"
Presentation at the STI Cell BE Workshop
At Georgia Tech Univ. June 18-19, 2007 halem@umbc.edu
6/18/07 Page 2
in the University of Maryland System
extensive university -- Carnegie Foundation
between Baltimore and Washington DC.
and public policy with ~$85M in external research funding
Remaking Science Education”
undergraduates/year; 90% matriculate to advanced degrees in science.
~1000 undergrad, ~200 grad students
NSF data) among research universities
Security and Assurance, Center for Photonics, Lab For Advanced Information Technology, interdisciplinary Computational Lab for Autonomous Systems and Services (iCLASS), VLSI Lab, …
Science, graphics, databases, VLSI, …
Laboratory for Autonomic Systems and Services’ (iCLASS) within the UMBC School of Engineering that supports high productivity computational research across campus
research in partnership with universities, industry and government by enabling an optical based network cyberinfrastructure for prototyping of large compute and data intensive computational systems in the areas of aerospace, geosciences, defense and medical imaging
science modeling, web-based service oriented science and engineering, pervasive computing, communicating devices, visualization and data preservation
“bluegrit” and an Intel-based IBM cluster of PCs, “matisse”, driving a Hyperwall for visualization. In addition, “bluegrit” is part of the University System of Maryland Grid and a member of SURAgrid.
Head Node
Compute Nodes
4 x 2.5 GHz PowerPC 970 Processors
heterogeneous multicore processing system for exploratory research.
Storage
Internal connectivity
External Connectivity
Circuit Service from UMBC to UMCP/UMIACS, NASA/GSFC, NSA/LTS, NGIT/McLean, Va., NLR
Tiled Display
Matisse
drive an upgraded Tile display and explore other uses.
many temperature and moisture sounding instruments flown over several
spectral resolutions and temporal frequencies, and in different formats. Data transformations such as gridding, statistical sampling, subsetting, convolving, etc. are needed to produce long climate data records to study global warming. These transformations lend themselves well to the data-parallel model, which is ideal for the Cell BE processor.
Figure: one month of gridded AIRS data for a single channel.
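The gridding transformation mentioned above can be illustrated with a minimal sketch, assuming each observation is a (latitude, longitude, value) tuple and the target is a regular latitude/longitude grid of cell means. The function names, the 1-degree resolution and the sample values are illustrative assumptions, not taken from the actual AIRS processing code. Each granule is binned independently and the partial sums are merged afterwards, which is the property that makes the operation data parallel.

```python
from collections import defaultdict


def grid_cell(lat, lon, res_deg=1.0):
    """Return the (row, col) index of the grid cell containing a point."""
    row = int((lat + 90.0) // res_deg)
    col = int((lon + 180.0) // res_deg)
    return row, col


def bin_granule(observations, res_deg=1.0):
    """Accumulate per-cell [sum, count] for one granule of (lat, lon, value)."""
    acc = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in observations:
        cell = grid_cell(lat, lon, res_deg)
        acc[cell][0] += value
        acc[cell][1] += 1
    return acc


def merge_and_average(partials):
    """Merge per-granule accumulators and return the mean value per cell."""
    total = defaultdict(lambda: [0.0, 0])
    for acc in partials:
        for cell, (s, n) in acc.items():
            total[cell][0] += s
            total[cell][1] += n
    return {cell: s / n for cell, (s, n) in total.items()}


# Two hypothetical granules; each could be binned by a separate worker.
granule_a = [(10.2, 20.7, 250.0), (10.8, 20.1, 254.0)]
granule_b = [(10.5, 20.5, 252.0), (-45.3, 100.9, 230.0)]
gridded = merge_and_average([bin_granule(g) for g in (granule_a, granule_b)])
```

Keeping sums and counts separate until the final merge means the per-granule work needs no shared state, so the same structure would map onto independent SPUs or cluster nodes.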
2520 km (6 min.) per granule; 135 scan lines per granule; 240 granules per day
demand for atmospheric radiance data sets utilizing the IBM JS20/21 cluster. However, reprocessing will consume the entire current system.
processors for gridding different instruments:
FFT convolutions to match spectral radiances of lower resolution instruments
satellite observations (L1) for user specified spatial-temporal regions with selected statistical aggregations
Motivation for Cell Processing:
E.g., each SPU can project a separate granule in parallel with all other granules. Convolutions can be performed with 1-D FFTs in single precision to give performance factors of 10X to 15X over leading superscalar and vector CPUs (Williams et al., CF’06, May 2006, Ischia, Italy).
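As a rough illustration of the FFT-based convolution mentioned above, the following sketch smooths a short spectrum with a response function using the convolution theorem and checks the result against a direct convolution. It is a plain-Python, double-precision toy (the slide refers to single precision on SPUs); the radix-2 FFT, the function names and the sample values are all assumptions chosen for brevity.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via zero-padded FFTs (convolution theorem)."""
    m = len(signal) + len(kernel) - 1
    size = 1
    while size < m:
        size *= 2
    fa = fft(list(signal) + [0.0] * (size - len(signal)))
    fb = fft(list(kernel) + [0.0] * (size - len(kernel)))
    prod = [a * b for a, b in zip(fa, fb)]
    # The inverse transform above is unnormalized, so divide by size here.
    return [v.real / size for v in fft(prod, inverse=True)[:m]]


def direct_convolve(signal, kernel):
    """Reference O(n*m) convolution used to check the FFT result."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Hypothetical high-resolution spectrum and 3-point response function.
spectrum = [1.0, 4.0, 2.0, 8.0, 5.0]
response = [0.25, 0.5, 0.25]
smoothed = fft_convolve(spectrum, response)
```

Because each spectrum is convolved independently of the others, many such transforms can run side by side, one per granule or per footprint.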
spacecraft on July 15, 2004.
and a 13 x 24 km² footprint, guaranteeing daily global coverage.
technique [C.D. Rodgers, 2000] that has become standard in the field.
to improve the vertical resolution of the ozone profile below 20 km compared to those from the SBUV instruments that have flown on NASA and NOAA satellites since 1970.
and Jacobians efficiently and to correct for rotational Raman scattering
unprecedented spatial resolution.
minutes per pixel on off-the-shelf commodity Intel processors. There are 1.3 million pixels per day, which would take about 8000 processors to keep up with real-time processing of every pixel.
be well suited to parallelize the code using independent SPUs. It should be possible to use the SPU SIMD instructions to compute multiple profiles simultaneously within each SPU at 5X over Intel processors.
the proposed modest sized Cell blade cluster.
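The processor estimate on this slide can be worked through as simple arithmetic. The sketch below derives the per-pixel cost implied by the two stated figures (1.3 million pixels per day, about 8000 processors) and then applies the 5X factor mentioned above as if it were a per-SPU speedup; that interpretation, and the helper names, are assumptions made only for illustration.

```python
import math

MINUTES_PER_DAY = 24 * 60


def implied_minutes_per_pixel(pixels_per_day, processors):
    """Per-pixel cost at which `processors` units exactly keep up."""
    return processors * MINUTES_PER_DAY / pixels_per_day


def processors_needed(pixels_per_day, minutes_per_pixel, speedup=1.0):
    """Units needed for real-time processing at a given relative speedup."""
    raw = pixels_per_day * minutes_per_pixel / (MINUTES_PER_DAY * speedup)
    # Round first so floating-point noise does not push ceil() up by one.
    return math.ceil(round(raw, 6))


per_pixel = implied_minutes_per_pixel(1.3e6, 8000)          # roughly 8.9 minutes
spus = processors_needed(1.3e6, per_pixel, speedup=5.0)     # assumed 5X per SPU
```

Under these assumptions the implied cost is a little under nine minutes per pixel, and a 5X per-unit speedup reduces the count from 8000 to 1600 units.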
Computations limit prediction:
frequently being accessed from disk
scales well for cluster use of 128 to 256 nodes. Five-day forecast results are required within 6 hours for transmission to the National Hurricane Forecast Center for inclusion in ensemble forecasts.
[Figure: distributed modeling architecture. Sites: NASA Ames; Northrop Grumman (McLean, Virginia); NASA Goddard (Greenbelt, Maryland). Network: National Lambda Rail (10 - 40 Gb/s). Labels: distributed computing nodes; next-generation networks; UMBC SOC modeling services; science analysis; GEOS5/OGCM/MOM4; GEOS5/OGCM/POSEIDON; UMBC and GSFC data portals; NOAA/NESDIS data and model portals; external collaborators.]
Model to model & model to data validation / comparisons
speedups of 20X and 10X in SP and 10X and 5X in DP
store memory
transfer large data sets from a remote TB memory cache over a 10 GbE optical network instead of from local disk
updated by neighbors
Ray Tracing Algorithm
9x performance increase
PS3: 12 FPS @ 1280 x 720p with 16 spheres
PS3: 20 FPS @ 720 x 480p with 16 spheres
Intel 1.73 GHz: 0.3 FPS @ 1280 x 720p with 16 spheres
(No assembly optimization, SPU code)
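The core operation behind a sphere benchmark of this kind is the ray-sphere intersection test. The following is a minimal sketch, assuming unit-length ray directions and a brute-force loop over all spheres; the scene layout and function names are hypothetical and this is not the benchmarked SPU code.

```python
import math


def ray_sphere_t(origin, direction, center, radius):
    """Return the nearest positive hit distance along a unit-length ray, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c  # discriminant of t^2 + 2bt + c = 0
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-9:
            return t
    return None


def nearest_hit(origin, direction, spheres):
    """Test the ray against every sphere and keep the closest hit."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = ray_sphere_t(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best


# 16 hypothetical unit spheres spaced along the z axis.
spheres = [((0.0, 0.0, 5.0 + 3.0 * i), 1.0) for i in range(16)]
hit = nearest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres)
miss = nearest_hit((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), spheres)
```

Every pixel's primary ray runs this test independently, so the per-ray work can be distributed across SPUs and, within an SPU, batched across SIMD lanes.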
at different levels of detail, and producing coarser regions from the merging of regions of finer segmentations, while maintaining region boundaries at the full image spatial resolution. Image segmentation transforms pixel-based analysis into region-based analysis.
together with spectral clustering – controlled by a spclust_wght parameter. (J.
Intelligence, vol. 11, no. 2, pp. 150-163, 1989.)
such that the number of regions handled at any point in the program is restrained.
Recursive HSEG (RHSEG) facilitates a highly efficient
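The idea of producing coarser segmentations by merging regions of finer ones can be shown with a toy example. The sketch below performs greedy best-merge region growing on a 1-D signal and records the labeling at requested region counts; it deliberately omits the spectral clustering step and the recursive subdivision, and it is an illustrative assumption rather than the HSEG or RHSEG implementation.

```python
def merge_hierarchy(pixels, levels):
    """Greedy best-merge region growing on a 1-D signal.

    Returns {region_count: labels} for each requested region count.
    """
    # Each region is [sum, count, first_index, last_index].
    regions = [[float(v), 1, i, i] for i, v in enumerate(pixels)]
    snapshots = {}
    while True:
        if len(regions) in levels:
            labels = [0] * len(pixels)
            for label, (_, _, lo, hi) in enumerate(regions):
                for i in range(lo, hi + 1):
                    labels[i] = label
            snapshots[len(regions)] = labels
        if len(regions) == 1:
            break
        # Find the most similar pair of adjacent regions (smallest mean gap).
        best = min(
            range(len(regions) - 1),
            key=lambda k: abs(
                regions[k][0] / regions[k][1]
                - regions[k + 1][0] / regions[k + 1][1]
            ),
        )
        a, b = regions[best], regions[best + 1]
        regions[best:best + 2] = [[a[0] + b[0], a[1] + b[1], a[2], b[3]]]
    return snapshots


# Hypothetical signal with three visually distinct plateaus.
signal = [10, 11, 10, 50, 52, 51, 90, 91]
hierarchy = merge_hierarchy(signal, levels={3, 2})
```

Each coarser level reuses the boundaries of the finer one, so region edges stay at full pixel resolution throughout the hierarchy.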
like data (e.g., IMAGE spacecraft Radio Plasma Imager data)
effective parallel implementation. E.g., a full Landsat TM scene (6500 x 6500 pixels by 6 bands) can be processed in two to eight minutes with 256 2.1 GHz CPUs (Thunderhead Beowulf Cluster).
results, and allows a user to extract useful segmentation results from the HSEG segmentation hierarchy
project between NASA and Bartron Medical Imaging has been initiated for the development of an extension to three-dimensional medical image analysis. Completed basic extension of HSEGViewer to enable viewing of arbitrary 2-D data slices along a selected row, column or depth plane.
View of segmentation of skull for depth plane 86 (out of 172). Panels: Grey Scale Image; Four Region Segmentation.
View of segmentation of skull for row plane 128 (out of 256). Panels: Grey Scale Image; Four Region Segmentation.
View of segmentation of skull for column plane 128 (out of 256). Panels: Grey Scale Image; Four Region Segmentation.
Bartron’s Med-Seg applied to body CAT scan. Panels: CAT Scan; Segmented Scan.
Advantages and Potential of Med-Seg
information not normally seen with the human eye, which can normally differentiate only 8 to 10 bits
quickly, but is longer for large images. Proposing to test the Med-Seg algorithm
segmentation results.
CAT scans currently under study.
Agriculture and the Indian Health Service
Medical Simulation Center, Uniformed Services University
chance and necessity in the early evolution of life on Earth. We expect to derive a better understanding of the potential for expanding biochemistry through synthetic biology and biochemical engineering.
variation in the structure and function of proteins, which is then ‘filtered’ by natural
computing can realistically reflect complex biochemical processes that are expensive and difficult to explore in a laboratory.
amino acids, to ask whether the very building blocks of life are non-randomly selected from the chemistry of this universe.
become apparent over recent decades that the ‘algorithm’ by which a linear, linked sequence of amino acids folds into a 3-dimensional structure imbued with biochemical function is one of enormous complexity.
would provide an enormous increase in the scale and flexibility of the research that we are currently undertaking.
trends, bursts and events
and negative reviews using statistical NLP and ML
social structure, opinions, biases and temporal information
between two unknown blogs using sentiment or link polarity
the social network of the blogs and finding the communities corresponding to topics of interest
Architectural Issues
processing
Commendations, accusations (to other devices)
Routing attacks, disruptions
Unfair contention, jamming
Packet dropping, mangling, injection
Trust evolution, reputation management, recourse
heterogeneous and dynamic
pervasive computing, MANETs, etc.
trust and reputation.
(Lohr & Chapman)
12 FPS @ 1280 x 720p with 16 spheres
No acceleration or frustum checking
20 FPS @ 720 x 480p with 16 spheres
Theoretical 26 Gflops