Keynote II
David Kirk from NVIDIA
Multicore, Multipipes, Multithreads -- too much parallelism to handle?
I'm just going to post a small summary and whatever I thought stood out as interesting.
* Processor / System Parallelism
-- Single vs. Multi core
-- Fine vs. Coarse grained
-- Single vs. Multi pipeline
-- Vector vs. Scalar math
-- Data vs. Thread vs. Instruction level parallelism
-- Single vs. Multithreaded processors
-- Message passing vs. Shared Memory communication
-- SISD, SIMD, MIMD ...
-- Tightly vs. Loosely connected cores & threads
* Application / Problem Parallelism
-- No parallelism in workload means System/Processor parallelism is irrelevant
-- Large problems can more easily be parallelized
-- Good Parallel Behavior:
*-- Many inputs/results
*-- Parallel Structure -- Many similar computation paths
*-- Little Interaction between data/threads
-- Data parallelism is easy to map to the machine "automagically"
-- Task parallelism requires programmer forethought
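To make the data-vs-task distinction concrete, here's a tiny sketch (mine, not from the talk): a data-parallel workload is just an independent map over elements, which a thread pool can distribute without any coordination from the programmer.

```python
# Data parallelism: the same function applied to each element independently.
# Because there is no interaction between items, a runtime can spread the
# work across threads "automagically".
from multiprocessing.dummy import Pool  # thread-backed pool, illustrative

def per_element_work(x):
    return x * x  # stand-in for any independent per-element computation

data = [1, 2, 3, 4]
with Pool(4) as pool:
    results = pool.map(per_element_work, data)  # order-preserving map
print(results)  # [1, 4, 9, 16]
```

Task parallelism, by contrast, would mean different threads doing different jobs and handing results to each other, which the programmer has to structure by hand.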
* Adoption of parallel software programming models will rely on university education.
* Cell Processor Approach to Parallelism (What is the programming model?)
-- Stuff we've seen so far
* Geforce 7800
-- 302M transistors, all computation-oriented; no transistors spent on cache or other non-computation elements.
* Translucency demo: the hand creatures with light sources behind them. This demo is much longer than the E3 version, with more of an intro. He froze the demo with the girl in the body suit, moving the picture around and showing the geometry density of the models. (Sorry, I didn't get any of the stats, but it's all the GeForce 7800 stats.) Up to 30 rendering passes to layer everything in. It's highly parallel computation.
* Life of a Triangle through the graphics pipeline (Slide 2, page 8), simplified:
Vertex Fetch
Vertex Processing
Primitive Assembly Setup
Rasterize & Z-cull (throwing away what isn't seen, keeping what is)
Pixel Shader
Texture <-> Frame Buffer
Pixel Shader
Pixel Engine (ROP)
Vertices are independent, so they are highly parallelizable.
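Since vertices don't interact, the per-vertex work is an embarrassingly parallel map. A toy 2-D "vertex transform" (my illustration, not NVIDIA's code):

```python
# Each vertex is transformed independently: no data is shared between
# vertices, so a GPU can run one thread per vertex in parallel.
def transform(vertex, dx, dy):
    x, y = vertex
    return (x + dx, y + dy)  # translate; any per-vertex math works here

vertices = [(0, 0), (1, 0), (0, 1)]
# A sequential map here; the GPU's vertex units do the same map in parallel.
transformed = [transform(v, 2, 3) for v in vertices]
print(transformed)  # [(2, 3), (3, 3), (2, 4)]
```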
Block Diagrams of Shaders and 7800 (Slide 1,2 page 9)
* Big GFlop #s
GeForce 6800 Ultra
- Clock: 425 MHz
- Vec4 MAD Ginstructions/s: 6.7568
- GFlops: 54.0544
GeForce 7800 GTX
- Clock: 430 MHz
- Vec4 MAD Ginstructions/s: 20.6331
- GFlops: 165.0648
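The GFlops figures follow from the instruction rates if each Vec4 MAD counts as 8 floating-point ops (4 multiplies + 4 adds). A quick check (my arithmetic, not stated on the slide):

```python
# A Vec4 MAD (multiply-add over a 4-wide vector) = 4 muls + 4 adds = 8 flops.
FLOPS_PER_VEC4_MAD = 8

for name, ginstr_per_s in [("GeForce 6800 Ultra", 6.7568),
                           ("GeForce 7800 GTX", 20.6331)]:
    gflops = ginstr_per_s * FLOPS_PER_VEC4_MAD
    print(f"{name}: {gflops} GFlops")
# 6.7568 * 8 = 54.0544 and 20.6331 * 8 = 165.0648, matching the slide.
```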
*GPU Approach to Parallelism
-- single core
-- multipipeline
-- multithreaded
-- fine grained
-- Vector
-- Explicitly and Implicitly threaded (programmer threads code for shaders, instances spawned automatically)
-- Data Parallel (no communication between threads)
* Multi Pipeline App Improvement
-- Multithreaded applications: up to X-times speedup, where X is the number of pipelines, thanks to data parallelism
-- Still requires a lot of software development effort (programming languages lack expressiveness for parallelism)
-- CPU is also a bottleneck
* Dual Core Processors
-- No improvement in feeding GPUs, because apps are single-threaded.
* GPU Programming Languages
-- DX, OGL 1.3, Brook for GPUs (Stanford), Sh for GPUs (for GPGPU)
-- various languages have parallelism baked into language
* The benefit of parallelism in GPUs comes from data that is parallel in nature, i.e., elements are independent of one another (vertices can be processed independently).
* Problems for widespread adoption of parallel programming.
-- Programming languages do not have parallelism baked into their semantics.
-- writing parallel code in single threaded language is INSANE
-- Language development is critical
* Design Strategies for CPU/GPU
-- CPU: Make workload (one thread) run as fast as possible
*-- Caching
*-- Instruction/Data Prefetch
*-- "Hyperthreading"
*-- Speculative Execution
*-- limited by "perimeter" - communication bandwidth
*-- multicore will help... a little
-- GPU: Make the workload (as many threads as possible) run as fast as possible.
*-- Parallelism (1000s of threads)
*-- Pipelining
*-- limited by "area" -- compute capability
* Implementable programs on a GPU (versus the CPU)
-- Graphics (of course)
-- Image Processing and Analysis
-- Correlations - Radio telescope, SETI
-- Monte Carlo Simulation - Neutron Transport
-- Neural Networks (speech recognition, handwriting recognition)
-- Ray Tracing
-- Physical Modeling and Simulation
-- Video Processing
-- Black-Scholes Option Pricing
(NVIDIA declined to speculate about a future where graphics plateaus -- would they expand into generic processing?)
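Of those, Monte Carlo option pricing shows nicely why these workloads fit the GPU: every simulated path is independent. A CPU-side sketch in Python (parameters are illustrative and mine, not from the talk; a GPU version would run one path per thread):

```python
# Monte Carlo pricing of a European call under Black-Scholes dynamics.
# Each simulated price path is independent of the others, which is exactly
# the data-parallel shape described above.
import math
import random

def mc_call_price(s0, k, r, sigma, t, n_paths, seed=42):
    rng = random.Random(seed)  # fixed seed for reproducibility
    payoff_sum = 0.0
    for _ in range(n_paths):   # each iteration could be its own GPU thread
        z = rng.gauss(0.0, 1.0)
        st = s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * math.sqrt(t) * z)
        payoff_sum += max(st - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths

price = mc_call_price(s0=100, k=100, r=0.05, sigma=0.2, t=1.0, n_paths=50_000)
print(round(price, 2))  # near the closed-form value of about 10.45
```

The only cross-thread step is the final sum, which is a cheap reduction; everything else is independent per-path work.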
*The Good news and the Not-so-good news
-- The Good
*-- increasing Parallelism in CPU/GPU
*-- Workloads -- graphics and GP -- are highly parallel
*-- Moore's law and the "capability curve" are still our friends
-- The Bad
* -- Parallel Programming is HARD (especially in a serial language/environment)
* -- Language and Tool support for parallelism is poor
* -- Computer Science education is not focused on parallel programming (it needs to happen at the undergrad level, not the grad level)
* Solution: more research into multithreaded development, especially language design.
-------------------------------------------------------------------------
(Guys if i suck at taking notes, let me know and tell me how to improve!)
I don't think anything hugely significant was said in the talk. I'm a language guy, so I was thrilled to hear some acknowledgement that we need better language design.