David Kirk passes the torch

trinibwoy

http://www.nvidia.com/object/io_1233142016114.html

NVIDIA Corporation today announced that Bill Dally, the chairman of Stanford University’s computer science department, will join the company as Chief Scientist and Vice President of NVIDIA Research. The company also announced that longtime Chief Scientist David Kirk has been appointed “NVIDIA Fellow.”

“I am thrilled to welcome Bill to NVIDIA at such a pivotal time for our company,” said Jen-Hsun Huang, president and CEO, NVIDIA. “His pioneering work in stream processors at Stanford greatly influenced the work we are doing at NVIDIA today. As one of the world’s founding visionaries in parallel computing, he shares our passion for the GPU’s evolution into a general purpose parallel processor and how it is increasingly becoming the soul of the new PC.”
 
Here's Bill's webpage at Stanford: http://cva.stanford.edu/billd_webpage_new.html

From his website and from what I've heard from friends, Bill's extremely well known as a systems / network / memory guy. He's very much a big-picture system architect, not a branch-predictor / ALU / out-of-order reorder buffer CPU architect. The interesting thing, and a huge departure from Kirk, is that I'm fairly certain he has zero graphics experience (OpenGL/DX or GPGPU/CUDA).
 
Kinda weird. Any ideas why Kirk is leaving? I don't think he has been doing anything wrong over the last few years, and he's not too old, so why a new chief scientist? Plus he has much more experience with graphics hardware than anyone else out there ...
 
So, over what kind of timescale should we expect Dally to have an impact? I dare say I get the feeling he'll be setting out NVidia's next great architectural revolution, which would be 5+ years away.

At the same time, is there much that will need revolutionising by then?

Looking at his Stanford page it strikes me he's more of an infrastructure guy - streamlining and innovating the glue that makes systems work.

Jawed
 
[image: jtkirkmj3.png]
 
Davros: I assume it's something along the lines of an IBM or Microsoft Fellow. The fact that he's made a huge contribution and deserves a title like this is obvious. What's less obvious is what it means practically.
 
So what D3D features/concepts are needed?
Jawed

IMO all that is needed is a really fast OpenCL implementation. Fast means hardware which can do efficient atomic operations, scatter, and gather. Larrabee, perhaps GT3xx, and perhaps whatever ATI is doing will get us there.
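
To make "efficient atomic operations, scatter, and gather" a bit more concrete, here's a minimal sketch in CUDA (CUDA rather than OpenCL only because that's what the rest of the thread revolves around; the kernel and its names are made up for illustration). A histogram touches all three at once: each thread gathers one input element and scatters an increment into an arbitrary bin with an atomic add.

// Hypothetical example: per-thread gather, data-dependent scatter, atomic update.
__global__ void histogram(const unsigned int *in, unsigned int *bins,
                          int n, int numBins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int bin = in[i] % numBins;   // gather: read one input element
        atomicAdd(&bins[bin], 1u);            // scatter + atomic: bump an arbitrary bin
    }
}

How quickly the hardware resolves those colliding atomic writes is pretty much the whole game.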

Seems as if this guy brings some creative background in the low-power and cross-chip networking department, which seems of more long-term importance once current single-chip solutions reach beyond acceptable limits in power and/or size.
 
Among the interesting things at the Great Lakes Consortium last summer was Kirk's presentation on the future of CUDA, listed at 4pm on Tuesday on this page:

http://www.greatlakesconsortium.org/events/GPUMulticore/agenda.html

Here's a direct link to the presentation:

http://www.greatlakesconsortium.org/events/GPUMulticore/kirk.pdf

One of the nice things he talks about (in the streaming video) is the speed-up of the optical proximity correction used to develop the masks for chip-manufacturing lithography.

It's interesting that he says it takes months for a server farm of hundreds of CPUs to generate the optically-corrected mask they use for chips the size of GT200. Months :oops: 200x faster with GPU acceleration :p

The entire programme of presentations is pretty interesting, though I haven't gone through it in any detail and some of it looks like repeats of presentations you might have seen before.

Jawed
 
Interestingly, though only as a side note: on page 32 of the linked PDF there's a Tesla GPU (GT200, I know) dissected with a TPC referring to only a third of the whole TPC cluster I was aware of.

Some redefinition for Tesla or simply a mistake?

BTW:
n-Body in Astrophysics:
http://progrape.jp/
 
Interestingly, though only as a side note: on page 32 of the linked PDF there's a Tesla GPU (GT200, I know) dissected with a TPC referring to only a third of the whole TPC cluster I was aware of.
My interpretation is that what's shown as a cluster consists of 3 "blocks" out of a total of 6 blocks per cluster. "Block" appears to correspond with what's indicated by TPA.

I've always interpreted this die picture to indicate that each cluster's 6 blocks are arranged in pairs, i.e. 2 blocks make up a SIMD.

But if we take TPA as the baseline SIMD, then what we actually have are two sets of 3 SIMDs. Each of the two sets has a dedicated TMU. There's a symmetry there that's appealing.

Some redefinition for Tesla or simply a mistake?
This isn't the first time I've seen an NVidia slide break down the die in this fashion. I've always thought of it as a mistake, just a simplification for the sake of presentation.

But now I'm wondering.

Perhaps GT200's SIMDs are really 4 wide.

If so, could this have been done to narrow operand collection bandwidth per SIMD? Perhaps that's why the MUL is easier to get at (it doesn't entirely make sense, but I still don't get why the MUL is hard to get at in GPUs before GT200).

So a cluster consists of 4 basic units:
  • 6x SIMD-4 ALUs
  • 6x MI ALUs (1 lane transcendental and 4 lanes interpolation)
  • 2x TMUs
  • 1x double precision ALU
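
For what it's worth, both readings add up to the same number of ALU lanes, so the slide needn't be wrong, just a different slicing of the cluster. A quick back-of-envelope check (plain host-side arithmetic, assuming the usual 10-cluster GT200):

#include <stdio.h>

int main(void)
{
    // Two ways of slicing a GT200 cluster; both describe the same 240 ALU lanes.
    const int clusters = 10;                  // TPCs on GT200
    int simd8_view = clusters * 3 * 8;        // 3 SIMD-8 blocks per cluster
    int simd4_view = clusters * 6 * 4;        // 6 SIMD-4 "TPA" blocks per cluster
    printf("%d vs %d lanes\n", simd8_view, simd4_view);   // 240 vs 240
    return 0;
}

So it could just be two valid groupings of the same hardware.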
BTW:
n-Body in Astrophysics:
http://progrape.jp/
From the talk that David gave it appears the astrophysics community is in the process of dumping its supercomputers. It's a fairly small community of scientists, he says, and CUDA is something of a big bang phenomenon.

Mixing metaphors, they'll soon be referring to the years before 2009 as years BC - before CUDA :LOL:
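
For anyone wondering what these astrophysics codes actually boil down to on the GPU: the core of a brute-force n-body step maps onto CUDA almost verbatim. A hedged sketch (not the ProGRAPE code; the names and parameters are made up), one thread per body, softened gravity:

// Brute-force O(N^2) gravitational acceleration, one thread per body i.
// pos.xyz = position, pos.w = mass; eps2 softens the r=0 singularity.
__global__ void nbody_accel(const float4 *pos, float3 *accel, int n, float eps2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 a = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        float4 pj = pos[j];
        float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        float r2 = dx*dx + dy*dy + dz*dz + eps2;
        float invR = rsqrtf(r2);
        float s = pj.w * invR * invR * invR;   // m_j / r^3
        a.x += dx * s;  a.y += dy * s;  a.z += dz * s;
    }
    accel[i] = a;
}

Production codes tile the inner loop through shared memory so each position is fetched once per block rather than once per thread, but the structure is the same.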

The oscillating fan is hilarious. Mate, you do know you can stop it oscillating! And the red-bordered LCD, wow.

Jawed
 