NVIDIA GT200 Rumours & Speculation Thread

From the horse's mouth:

http://www.wcm.bull.com/internet/pr/rend.jsp?DocId=350329&lang=en

The new Bull NovaScale supercomputer consists of a cluster of 1,068 8-core 'generalist' computing nodes (Intel® processors), delivering some 103 Teraflops of power, and 48 specialist 512-core GPU nodes, providing additional theoretical power of up to 192 Teraflops.
"Per core", ahem, SP, erm lane, that's 7.8GFLOPs. As compared with 3.4GFLOPs in G92 per lane.

Jawed
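A quick back-of-the-envelope check of that per-lane figure (just arithmetic, in Python; the G92 comparison assumes a ~1.7GHz shader clock counted at 2 FLOPs per lane per clock, which is how the 3.4GFLOPs number above appears to be derived):

```python
# Bull's claimed GPU contribution, spread across 48 nodes of 512 "cores" each.
gpu_flops_total = 192e12
nodes, cores_per_node = 48, 512

per_lane = gpu_flops_total / (nodes * cores_per_node)
print(per_lane / 1e9)        # ~7.81 GFLOPs per "core"/lane

# G92 for comparison: ~1.7 GHz shader clock * 2 FLOPs (MAD) per lane.
print(1.7e9 * 2 / 1e9)       # 3.4 GFLOPs per lane
```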
 
3.9GHz per lane, or each lane has more ALU capability (significantly more) - e.g. if each lane is counted as having 3 FLOPs (instead of 2 FLOPs in G92), then that's a 2.6GHz ALU clock.

As to the config of the GPU, well, take your pick. 2.6GHz implies to me that this is a 128-lane GPU. Or, if you believe it's a 2TFLOP GPU, then 256 lanes.

Or it could just be that Bull is shitting us.

Jawed
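Putting those two readings side by side (a sketch, assuming the 7.8GFLOPs-per-lane figure from the previous post):

```python
per_lane = 192e12 / (48 * 512)      # 7.8125 GFLOPs per lane

print(per_lane / 2 / 1e9)           # ~3.91 GHz if a lane counts as 2 FLOPs/clock
print(per_lane / 3 / 1e9)           # ~2.60 GHz if a lane counts as 3 FLOPs/clock

# Lane counts per GPU for the two GPU sizes mentioned above:
print(1e12 / per_lane)              # ~128 lanes for a 1 TFLOP GPU
print(2e12 / per_lane)              # ~256 lanes for a 2 TFLOP GPU
```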
 
From the horse's mouth:

http://www.wcm.bull.com/internet/pr/rend.jsp?DocId=350329&lang=en
The new Bull NovaScale supercomputer consists of a cluster of 1,068 8-core 'generalist' computing nodes (Intel® processors), delivering some 103 Teraflops of power, and 48 specialist 512-core GPU nodes, providing additional theoretical power of up to 192 Teraflops.


"Per core", ahem, SP, erm lane, that's 7.8GFLOPs. As compared with 3.4GFLOPs in G92 per lane.

Jawed
Perfect fit.
That's 375 GFLOPS per GPU - if you read it as "48 specialist GPU nodes with a total of 512 GPU cores". That'd make the most sense to me, at least. After all, AFAIK in the SC space a "node" doesn't refer to a single board or a single chip. So I'd expect 48 Tesla S870s sitting in the room, each with 4 G80s.
 
3.9GHz per lane, or each lane has more ALU capability (significantly more) - e.g. if each lane is counted as having 3 FLOPs (instead of 2 FLOPs in G92), then that's a 2.6GHz ALU clock.

As to the config of the GPU, well, take your pick. 2.6GHz implies to me that this is a 128-lane GPU. Or, if you believe it's a 2TFLOP GPU, then 256 lanes.

Or it could just be that Bull is shitting us.

Jawed

You guys are leaving out the most obvious option:

256 ALUs x 2 FLOPs per ALU x 2 GHz operating frequency x 192 GPUs.

AKA 192 GT200 Ultras, each with 256 ALUs compared to 240 in the GTX config. That would make the GT200 16 pipes x 16 ALUs, which seems reasonable. Disabling 1 pipe for yield recovery in the consumer space gets you 240 ALUs.

Aaron spink
speaking for myself inc.
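Plugging those numbers in (a sketch; the last line is just the clock that would hit 192 TFLOPs exactly with the same ALU and GPU counts, not a claim about the actual part):

```python
alus, flops_per_alu, clock_hz, gpus = 256, 2, 2e9, 192

total = alus * flops_per_alu * clock_hz * gpus
print(total / 1e12)                                   # ~196.6 TFLOPs, in the ballpark of 192

print(192e12 / (alus * flops_per_alu * gpus) / 1e9)   # ~1.95 GHz for exactly 192 TFLOPs
```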
 
Perfect fit.
That's 375 GFLOPS per GPU - if you read it as "48 specialist GPU nodes with a total of 512 GPU cores". That'd make the most sense to me, at least. After all, AFAIK in the SC space a "node" doesn't refer to a single board or a single chip. So I'd expect 48 Tesla S870s sitting in the room, each with 4 G80s.

But 48*4*375GFLOPs != 192TFLOPs...
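Making that inequality concrete (assuming the "Perfect fit" reading of 48 S870-style nodes with 4 G80s at 375 GFLOPs each):

```python
print(48 * 4 * 375e9 / 1e12)   # 72 TFLOPs - well short of the 192 claimed
print(512 * 375e9 / 1e12)      # 192 TFLOPs - you'd need 512 G80-class GPUs at 375 GFLOPs
```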
 
AKA 192 GT200 Ultras, each with 256 ALUs compared to 240 in the GTX config. That would make the GT200 16 pipes x 16 ALUs, which seems reasonable. Disabling 1 pipe for yield recovery in the consumer space gets you 240 ALUs.
Heh, how stupid do you think GPU architects are? :) They don't need to be able to support 256 ALUs, even theoretically, on any SKU in order to have an extra multiprocessor for redundancy.

If GT200 has 10 clusters with 3x8 ALUs each, then clearly it's not possible to have any config with 256 ALUs; after 240, you'd go to 264. However, it is easy to have an extra 8-wide ALU/processor (or more) for redundancy purposes even then.
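The cluster arithmetic behind that (a sketch, using the hypothetical 10 clusters of 3x8 lanes):

```python
lanes_per_cluster = 3 * 8          # 24 lanes per cluster
print(10 * lanes_per_cluster)      # 240 lanes
print(11 * lanes_per_cluster)      # 264 lanes - the next step up, so no 256-ALU config
```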

EDIT: My guess is that Bull's PR is a bunch of bull****. The Nehalem variant used is clearly a 4-core CPU, and they claim it's 8-core; why should I believe their GPU numbers then if they can't even get the CPU stuff right?
 
You guys are leaving out the most obvious option:

256 ALUs x 2 FLOPs per ALU x 2 GHz operating frequency x 192 GPUs.

AKA 192 GT200 Ultras, each with 256 ALUs compared to 240 in the GTX config. That would make the GT200 16 pipes x 16 ALUs, which seems reasonable. Disabling 1 pipe for yield recovery in the consumer space gets you 240 ALUs.
I agree this is an "obvious" configuration, but Bull's statement implies only half of the 49152 lanes that configuration would require.

48*512=24576 per Bull versus 192*256=49152 for the obvious configuration.

Maybe error-correcting redundancy (2 GPUs for 1 result) is confusing things?

Jawed
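The lane counts, and the halving that the redundancy idea would require (a sketch, not a claim that Bull actually did this):

```python
bull_lanes = 48 * 512          # 24576, as per the press release
obvious_lanes = 192 * 256      # 49152, for 192 GPUs with 256 ALUs each

print(obvious_lanes // 2 == bull_lanes)   # True: pairing 2 GPUs per result halves the count
```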
 
EDIT: My guess is that Bull's PR is a bunch of bull****. The Nehalem variant used is clearly a 4-core CPU, and they claim it's 8-core; why should I believe their GPU numbers then if they can't even get the CPU stuff right?
8 logical cores aka hyperthreading?
 
Or then again it's eight cores per node.
Also, 48x4 GT200s would make sense with 1 TFLOP per GPU (ahem, still doesn't add up with the "512 cores" bullshit).
 
8 logical cores aka hyperthreading?
I can accept that... if it means Aaron gets to complain about it for the next few posts! ;) Just kidding, you get my point though: that's really ridiculous marketing if true, and it doesn't give me much more confidence that the PR is factually accurate or precise.
 
And you reach 192 TFlops with 192 G80s... how?
Not at all - my bad - but maybe a "node" is something other than a standard-sized Tesla configuration? Maybe a "node" is a rack with a 1U controller and two to four 1U S870s?

512 G80-cores are perfectly capable of reaching a market(ing)-value of 192 TFLOPs.

But 48*4*375GFLOPs != 192TFLOPs...
True, but 512*0.375=192. Maybe not all Tesla boxes are S870 racks.
 
We'll see. PC INpact, which did have an inside scoop apparently, claimed that it was a next-generation Tesla solution, so likely based on GT200. Anyway, I already wrote the news item for this (Rys/Geo just didn't have the time to proofread it yet) and I claim in it that it's likely GT200-based, so stop contradicting me here, damnit! ;) Plus, if they're using Nehalem, that's coming out after the GT200-based Tesla. So why in the world would they want to stick with G80 or even G92?
 
Interestingly, the French version reads slightly differently to me than the English version (I find it clearer). It says:

"The new Bull Novascale supercomputer is a cluster associating 1068 'generalist' nodes with 8 cores each - Intel processors - delivering 103 teraflops of power, and 48 GPU nodes with 512 cores each, providing an additional theoretical 192 teraflops."

Unfortunately, even in the French version it's hard to say whether the plural "Intel processors" pertains to the ensemble or to an individual node. French makes more parsimonious use of commas than English, so I have added two commas where they would make sense in English. What leaves no doubt, though, is that the 512 cores are attributed to a single GPU node.
 
We'll see. PC INpact, which did have an inside scoop apparently, claimed that it was a next-generation Tesla solution, so likely based on GT200. Anyway, I already wrote the news item for this (Rys/Geo just didn't have the time to proofread it yet) and I claim in it that it's likely GT200-based, so stop contradicting me here, damnit! ;) Plus, if they're using Nehalem, that's coming out after the GT200-based Tesla. So why in the world would they want to stick with G80 or even G92?


Exactly. This supercomputer is coming out in 2009, so why not GT200, if not a tweaked GT200 respin (à la G92 vs G80)?
 