High Performance Computing & Scientific Discovery


 
Author: Katherine Yelick, NERSC Director

Science at NERSC
Katherine Yelick, NERSC Director

NERSC Mission
The mission of the National Energy Research Scientific Computing Center (NERSC) is to accelerate the pace of scientific discovery by providing high performance computing, information, data, and communications services for all DOE Office of Science (SC) research.

2

NERSC is the Production Facility for DOE SC
• NERSC serves a large population of users
  - ~3,000 users, ~400 projects, ~500 codes
• Allocations managed by DOE
  - 10% INCITE awards:
    - Created at NERSC; now used throughout SC
    - Open to all of science, not just the DOE or DOE/SC mission
    - Large allocations, extra service
  - 70% Production (ERCAP) awards:
    - From 10K hours (startup) to 5M hours; only at NERSC, not the LCFs
  - 10% each NERSC and DOE/SC reserve
• Award mixture offers
  - High impact through large awards
  - Broad impact across science domains
3

NERSC Serves DOE Mission Needs
• DOE's SciDAC Program
  - Brings together interdisciplinary teams
  - 55 NERSC projects tied to SciDAC
• Focus on high-end computing
  - DOE/OMB measure of concurrency: percent of time spent on jobs using at least 1/8th of the machine (see the sketch below)
  - Target set to 40% in 2008
• Aside: a mid-range computing workshop is being organized
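A minimal sketch of how such a concurrency metric could be computed from job accounting records; the record format, job mix, and threshold handling below are illustrative assumptions, not NERSC's actual accounting data.

```python
# Hypothetical sketch of the DOE/OMB concurrency metric described above:
# the fraction of delivered core-hours spent on jobs that used at least
# 1/8 of the machine. Job records and field layout are illustrative.

MACHINE_CORES = 19_480          # Franklin (Cray XT4) core count from this deck
THRESHOLD = MACHINE_CORES / 8   # "large job" cutoff

# (cores_used, wallclock_hours) for a handful of made-up jobs
jobs = [(256, 12.0), (4_096, 6.0), (9_740, 3.0), (19_480, 1.5), (64, 48.0)]

large = sum(c * h for c, h in jobs if c >= THRESHOLD)
total = sum(c * h for c, h in jobs)
print(f"Concurrency metric: {100.0 * large / total:.1f}% (2008 target: 40%)")
```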
4

NERSC Serves Broad and Varying DOE Science Priorities
[Chart: Usage by science type as a percent of total usage, 2002-2008. Science areas: Accelerator Physics, Applied Math, Astrophysics, Chemistry, Climate Research, Combustion, Computer Sciences, Engineering, Environmental Sciences, Fusion Energy, Geosciences, Lattice Gauge Theory, Materials Sciences, Nuclear Physics.]

5

NERSC 2008 Configuration
Large-Scale Computing System: Franklin (NERSC-5), Cray XT4
• 9,740 nodes; 19,480 cores
• 13 Tflop/s sustained SSP (100 Tflop/s peak)
• Upgrading to quad-core: ~25 Tflop/s sustained SSP (355 Tflop/s peak)
NERSC-6 planned for 2010 production
• 3-4x NERSC-5 in application performance
Clusters
• Bassi (NCSb): IBM Power5 (888 cores)
• Jacquard (NCSa): LNXI Opteron (712 cores)
• PDSF (HEP/NP): Linux cluster (~1K cores)
NERSC Global Filesystem (NGF)
• 230 TB; 5.5 GB/s
HPSS Archival Storage
• 44 PB capacity; 10 Sun robots; 130 TB disk cache
Analytics / Visualization
• Davinci (SGI Altix)

6

DOE Demand for Computing is Growing
Compute Hours Requested vs Allocated

• Each year DOE users request 2x as many hours as can be allocated
• This 2x is artificially constrained by perceived availability
• Unfulfilled allocation requests amounted to hundreds of millions of compute hours in 2008
7

Science Over the Years

NERSC is enabling new science in all disciplines, with over 1,500 refereed publications in 2007
8

Nuclear Physics
Calculation: High-accuracy ab initio calculations on O-16 using the no-core shell model and the no-core full configuration interaction model
• PI: James Vary, Iowa State
• Science Results:
  - Most accurate calculations to date on nuclei of this size
  - Can be used to parametrize new density functionals for nuclear structure simulations
• Scaling Results:
  - 4M hours used vs. 200K allocated (uncharged early time on Franklin)
  - 12K cores, vs. 2-4K before Franklin
  - Diagonalize matrices of dimension up to 1 billion
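Matrices of that dimension are handled with iterative eigensolvers rather than dense diagonalization. The sketch below is an assumption about the general technique (a Lanczos-type solver applied to a small sparse stand-in matrix), not the project's actual code.

```python
# Illustrative sketch: extract the lowest eigenvalues of a large sparse
# symmetric matrix with a Lanczos-type iterative solver, the standard
# approach for configuration-interaction Hamiltonians. The matrix here is
# a small random stand-in, not an actual nuclear Hamiltonian.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 20_000                      # toy dimension (production runs reach ~1e9)
rng = np.random.default_rng(0)
h = sp.random(n, n, density=1e-4, random_state=rng, format="csr")
h = 0.5 * (h + h.T)             # symmetrize so eigsh applies
h = h + sp.diags(rng.normal(size=n))

# The lowest few eigenvalues approximate ground and low-lying excited states.
evals = eigsh(h, k=5, which="SA", return_eigenvectors=False)
print(np.sort(evals))
```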

9

Validating Climate Models
• INCITE Award for "20th Century Reanalysis": using an Ensemble Kalman filter to fill in missing climate data since 1892
• PI: G. Compo, U. Colorado Boulder
• Science Results:
  - Reproduced the 1922 Knickerbocker storm
  - Data can be used to validate climate and weather models
• Scaling Results:
  - 3.1M CPU hours in allocation
  - Scales to 2.4K cores
  - Switched to a higher-resolution algorithm with Franklin access
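As a rough illustration of the ensemble Kalman filter idea behind the reanalysis, the sketch below performs a generic, textbook-style analysis step that assimilates one scalar observation into an ensemble of model states; all dimensions, operators, and noise levels are made up, and this is not the project's code.

```python
# Generic (stochastic) ensemble Kalman filter analysis step: blend an
# ensemble of model states with a noisy observation. Purely illustrative.
import numpy as np

rng = np.random.default_rng(42)
n_state, n_ens = 100, 50                      # state size, ensemble members
ensemble = rng.normal(size=(n_state, n_ens))  # prior ensemble (columns = members)

H = np.zeros((1, n_state)); H[0, 10] = 1.0    # observe state element 10
obs, obs_var = 1.5, 0.2**2                    # observed value and its error variance

# Sample covariance from the ensemble
anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
P = anomalies @ anomalies.T / (n_ens - 1)

# Kalman gain and perturbed-observation update of each member
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + obs_var)
perturbed_obs = obs + rng.normal(scale=0.2, size=n_ens)
ensemble += K @ (perturbed_obs[None, :] - H @ ensemble)

print("Analysis mean at observed element:", ensemble[10].mean())
```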

Sea-level pressures, with color showing uncertainty (a & b); precipitation (c); temperature (d). Dots indicate measurement locations (a).
10

Middle Users Capable of Large-Scale Computational Science
• Calculation: AstroGK gyrokinetic code for astrophysical plasmas
• PIs: Dorland (U. of Maryland), Howes, Tatsuno
• Science Results:
  - Shows how magnetic turbulence leads to particle heating
• Scaling Results:
  - Runs on 16K cores
  - Combines implicit and explicit methods
11

Modeling Dynamically and Spatially Complex Materials for Geoscience
• Calculation: Simulation of seismic waves through silicates, which make up 80% of the Earth's mantle
• PI: John Wilkins, Ohio State University
• Science Result:
  - Seismic analysis shows jumps in wave velocity due to structural changes in silicates under pressure
• Scaling Result:
  - First use of Quantum Monte Carlo (QMC) for computing elastic constants
  - 8K cores, vs. 128 on allocated time
12

Nanoscience Calculations and Scalable Algorithms
• Calculation: Linear Scaling 3D Fragment (LS3DF), a Density Functional Theory (DFT) calculation numerically equivalent to the more common algorithm but scaling as O(n) in the number of atoms rather than O(n³) (see the cost sketch below)
• PI: L.-W. Wang, LBNL
• Science Results:
  - Calculated the dipole moment of a 2,633-atom CdSe quantum rod, Cd961Se724H948
• Scaling Results:
  - Ran on 2,560 cores
  - Took 30 hours vs. many months for the O(n³) algorithm
  - Good parallel efficiency (80% on 1,024 cores relative to 64)
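To make the scaling claim concrete, the toy cost model below (an illustration with arbitrary constants, not taken from the LS3DF work) compares a conventional cubic-scaling solve with a divide-and-conquer scheme that solves many fixed-size fragments, whose total cost grows only linearly with the number of atoms.

```python
# Toy cost model contrasting cubic-scaling DFT with a fragment-based
# O(n) approach. Constants are arbitrary; only the scaling trend matters.

def cost_cubic(n_atoms, c=1.0):
    """Nominal cost of a conventional DFT solve, ~ c * n^3."""
    return c * n_atoms ** 3

def cost_fragments(n_atoms, frag_size=100, c=1.0):
    """Divide-and-conquer: ~ n/frag_size fragments, each a fixed-size solve."""
    n_frags = n_atoms / frag_size
    return n_frags * cost_cubic(frag_size, c)

for n in (1_000, 2_633, 10_000, 100_000):
    ratio = cost_cubic(n) / cost_fragments(n)
    print(f"{n:>7} atoms: cubic/fragment cost ratio ~ {ratio:,.0f}x")
```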
13

Simulation of a Low Swirl Burner Fueled with Hydrogen
• Calculation: Numerical simulation of the flame surface of an ultra-lean premixed hydrogen flame in a laboratory-scale low-swirl burner. The burner is being developed for fuel-flexible, near-zero-emission gas turbines.
• PI: John Bell, LBNL
• Science Result:
  - Detailed transport and chemical kinetics using an adaptive low Mach number algorithm for reacting flow
• Scaling Results:
  - Adaptive mesh refinement used to save memory and time
  - Scales to 6K cores; typically run at 2K
  - Used 2.2M early-science hours on Franklin
14

Image illustrates the cellular burning structures in hydrogen flames

NERSC User George Smoot wins 2006 Nobel Prize in Physics

Mather and Smoot's 1992 COBE experiment showed the anisotropy of the CMB

Cosmic Microwave Background Radiation (CMB): an image of the universe at 400,000 years

Impact Of High Performance Computing at NERSC
• Calculation: Planck full focal plane simulation
  - 1-year simulation of the CMB (temperature and polarization), detector noise, and foregrounds
  - 74 detectors at 9 frequencies; 750 billion observations; 54,000 files; 3 TB of data
• PI: J. Borrill, LBNL
• Science Result:
  - 9 "routine" single-frequency maps
  - Unprecedented 9-frequency map from the entire simulated Planck data set
• Scaling Results:
  - The 9-frequency problem ran in under 1 hour on 16K cores

NERSC Vision

18

NERSC Computing

19

New Model for Collecting Requirements
• Modeled after the ESnet activity rather than the Greenbook
• Two workshops per year, starting with BER and BES

Sources of Requirements
• Office of Science (SC) Program Managers
• Direct gathering through interaction with science users of the network
• Case studies, e.g., from ESnet:
  - Magnetic Fusion
  - Large Hadron Collider (LHC)
  - Climate Modeling
  - Spallation Neutron Source
• Observation of computing use and technology
• Other requirements
• Requirements aggregation

Moore's Law is Alive and Well

• 2x transistors per chip every 1.5 years, called "Moore's Law"
• Microprocessors have become smaller, denser, and more powerful
• Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months
Slide source: Jack Dongarra
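A quick back-of-the-envelope rendering of the doubling rule quoted above; the code is purely illustrative arithmetic, not data about any particular chip.

```python
# Moore's Law as stated on the slide: transistor count doubles every
# 1.5 years, i.e. N(t) = N0 * 2**(t / 1.5). Illustrative arithmetic only.

def growth_factor(years, doubling_period=1.5):
    """Multiplicative growth in transistor count after `years`."""
    return 2 ** (years / doubling_period)

for span in (1.5, 3, 6, 10, 15):
    print(f"After {span:>4} years: {growth_factor(span):,.0f}x more transistors")
```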

21

Old: Transistors are expensive; power is free.
New (the "Power Wall"): You can put more transistors on a chip than you can afford to turn on.
Scaling clock speed (business as usual) will not work.

[Chart: Power density (W/cm²) of Intel processors from the 4004 (1970s) through the Pentium line (2000s), approaching the levels of a hot plate, nuclear reactor, rocket nozzle, and the Sun's surface. Source: Patrick Gelsinger, Intel]

22

2005: Clock speed 2x every 2 years
[Chart: Clock rate (GHz) vs. year, showing the 2005 International Technology Roadmap for Semiconductors (ITRS) projection alongside Intel single-core parts.]

23

2007: Cores/chip 2x every 2 years
[Chart: Clock rate (GHz) vs. year, comparing the 2005 and revised 2007 ITRS projections with Intel single-core and multicore parts.]

24

Parallelism is "Green"
• Highly concurrent systems are more power-efficient (see the worked example below)
  - Dynamic power is proportional to V²fC
  - Increasing frequency (f) also increases the required supply voltage (V): a more-than-linear effect
  - Increasing the number of cores increases capacitance (C), but only linearly
• Hidden concurrency burns power
  - Speculation, dynamic dependence checking, etc.
  - Push parallelism discovery to software (compilers and application programmers) to save power
• Challenge: Can you double the concurrency in your software every 2 years?
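As a worked illustration of the V²fC argument above, the sketch below compares doubling throughput by raising the clock rate versus doubling the core count; the assumed 1.5x voltage increase accompanying the frequency bump is an illustrative number, not from the slide.

```python
# Compare two ways to double throughput under the dynamic-power model
# P ~ V^2 * f * C. Assumes (illustratively) that doubling frequency also
# requires raising supply voltage ~1.5x, while doubling cores doubles C
# at unchanged V and f.

def dynamic_power(v, f, c):
    return v * v * f * c

V, F, C = 1.0, 1.0, 1.0                            # normalized baseline
base = dynamic_power(V, F, C)

faster_clock = dynamic_power(1.5 * V, 2 * F, C)    # 2x frequency, higher voltage
more_cores   = dynamic_power(V, F, 2 * C)          # 2x cores, same V and f

print(f"2x clock : {faster_clock / base:.1f}x power for ~2x throughput")
print(f"2x cores : {more_cores  / base:.1f}x power for ~2x throughput")
```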
25

NERSC Power Efficiency

26

Computational Requirements of the Office of Science Are Clear
The report "Modeling and Simulation at the Exascale for Energy and the Environment" (E3) identifies significant requirements for exascale computing.

27

Power Demands Threaten to Limit the Future Growth of Computational Science
• LBNL study for climate modeling in 2008 (Shalf, Wehner, Oliker)
  - Extrapolation of Blue Gene and AMD design trends
  - Estimate: 20 MW for Blue Gene, 179 MW for AMD
• DOE E3 Report
  - Extrapolation of existing design trends
  - Estimate: 130 MW
• DARPA Exascale Study
  - More detailed assessment of component technologies
  - Power-constrained design for 2014 technology: 3 TF/chip, new memory technology, optical interconnect
  - Estimate: 20 MW for memory alone, 60 MW aggregate so far
• NRC Study
  - Power and multicore challenges are not just an HPC problem

NERSC will use an innovative approach to address this challenge

28

Evidence of Waste
• Power5 (server): 389 mm², 120 W @ 1900 MHz
• Intel Core2 sc (laptop): 130 mm², 15 W @ 1000 MHz
• PowerPC 450 (BlueGene/P): 8 mm², 3 W @ 850 MHz
• Tensilica DP (cell phones): 0.8 mm², 0.09 W @ 650 MHz
Each core operates at 1/3 to 1/10th the efficiency of the largest chip, but you can pack 100x more cores onto a chip and consume 1/20th the power!
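Using the die sizes, power figures, and clock rates listed above, a quick clock-per-watt and clock-per-area comparison; the metric is a crude proxy chosen here for illustration, since the slide does not define how its efficiency figures were measured.

```python
# Crude efficiency comparison of the chips listed above: clock frequency per
# watt and per mm^2 of silicon. Illustrative proxy only.

chips = {                            # name: (area_mm2, watts, MHz)
    "Power5 (server)":         (389.0, 120.0, 1900),
    "Core2 sc (laptop)":       (130.0,  15.0, 1000),
    "PPC450 (BlueGene/P)":     (  8.0,   3.0,  850),
    "Tensilica DP (embedded)": (  0.8,   0.09, 650),
}

for name, (area, watts, mhz) in chips.items():
    print(f"{name:<26} {mhz / watts:8.0f} MHz/W   {mhz / area:8.0f} MHz/mm^2")
```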
29

Example: Cloud Resolving Climate Simulation
� An "Exascale size" challenge is a ~1 km horizontal resolution, cloud resolving, climate simulation � Use massive concurrency for better simulation efficiency: parallelism is powerefficient
� Requires significant algorithm work

� E.g., Dave Randall's SciDAC work on Icosahedral grids � Anticipate algorithm and data structure scaling limits
� Circa 2008 estimate: 179 MW on AMD, 20 on BG/P, and 3 on Tensilicabased system tailored for Climate � Other examples: MHD, Astro, Nano, ...

30

NERSC Data

31

Data Tsunami
• Soon it will no longer be sufficient for NERSC to rely solely on center balance and HPSS to address the massive volumes of data on the horizon
• The volume and complexity of experimental data will overshadow data from simulation:
  - LHC
  - ITER
  - JDEM/SNAP
  - PLANCK
  - SciDAC
  - JGI
  - Earth Systems Grid

32

NERSC Global Filesystem (NGF)
• In production since early 2006, after a thorough evaluation and testing phase
• Based on IBM GPFS
• Seamless data access from all of NERSC's computational and analysis resources
• A single unified namespace makes it easier for users to manage their data across multiple systems
• First production global filesystem spanning five platforms, three architectures, and four different vendors

33

Large Storage Environment (HPSS)

• 61+ million files
• 44 PB capacity
• 1.7x per year data growth

34

Example: SUNFALL Interface

• Astrophysics data analytics
• Successful multidisciplinary team
• Uses machine learning for discovery

35

"Google" for Science: Access to Data Accelerates Science
• Data helps science
  - Neanderthal genome example
  - Google has similar examples outside of science; Arabic translation: Google with more data beats others with more specialists
• Google uses MapReduce for over 10K applications (see the sketch below)
  - Clustering, grep, machine learning, ...
  - Hides load balancing, data layout, disk failures, etc.
• NERSC will do this for science
  - Domain-specific analysis (by scientists)
  - Domain-independent infrastructure
  - Efficient use of wide-area bandwidth
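As a minimal illustration of the MapReduce model referenced in the list above: a toy, single-process word count. It assumes nothing about Google's actual implementation and omits the distributed runtime that provides the load balancing and fault handling mentioned on the slide.

```python
# Toy MapReduce-style word count in a single process. The framework-like
# pieces (map phase, shuffle/group-by-key, reduce phase) are spelled out
# explicitly; a real system would distribute them and handle failures.
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (word, 1) pairs, like a MapReduce map() over one input record."""
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    """Sum all counts for one key, like a MapReduce reduce()."""
    return word, sum(counts)

documents = {
    "d1": "data helps science",
    "d2": "more data beats more specialists",
}

# Shuffle: group intermediate values by key
grouped = defaultdict(list)
for doc_id, text in documents.items():
    for word, count in map_phase(doc_id, text):
        grouped[word].append(count)

results = dict(reduce_phase(w, c) for w, c in grouped.items())
print(results)   # e.g. {'data': 2, 'more': 2, ...}
```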

36

National Energy Research Scientific Computing (NERSC) Division
NERSC Division Director: Katherine Yelick

Systems Department (Howard Walter, Department Head)
• Computational Systems: James Craw, Group Leader
• Mass Storage: Jason Hick, Group Leader
• Network, Security & Servers: Brent Draney, Group Leader
• Computer Operations & ESnet Support: Steven Lowe, Group Leader
• Data Systems: Shane Canon, Group Leader

NERSC Center (William Kramer, General Manager)
• Science Driven System Architecture: John Shalf, Group Leader

Services Department (Francesca Verdier, Department Head)
• User Services: Jonathan Carter, Group Leader
• Analytics: Wes Bethel, Team Leader (Matrixed - CRD)
• Open Software & Programming: David Skinner, Group Leader

37
