NVIDIA GT200 Rumours & Speculation Thread

trinibwoy · Jun 9, 2008

Jawed said:
Something I've been told: each SM has a dedicated double-precision MAD unit, so there's 30 in total. That's a surprise, 1/12th of single-precision, way less than I was expecting, 78 GFLOPs

Maybe that's what Rys was referring to.

That's another benefit of VLIW vs "scalar" -> Much easier to reconfigure the ALUs to do different things on the fly.

Either way, isn't 78 GFlops kinda weak even compared to CPUs?
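As a quick sanity check on the 78 GFLOPs figure quoted above: it falls out of the unit count if each of the 30 DP MAD units retires one multiply-add (2 flops) per clock. The 1.296 GHz shader clock below is an assumption; nobody in the thread states it. This is only a back-of-the-envelope sketch, not a statement about the real hardware.

```python
# Peak double-precision throughput from the unit count in the quoted post.
# ASSUMPTION: the shader clock is not given in the thread; 1.296 GHz is a
# hypothetical value chosen for illustration.

def peak_gflops(units: int, flops_per_unit_per_clock: int, clock_ghz: float) -> float:
    """Peak GFLOPs = units * flops per unit per clock * clock in GHz."""
    return units * flops_per_unit_per_clock * clock_ghz

DP_MAD_UNITS = 30                  # one per SM, per the quoted post
FLOPS_PER_MAD = 2                  # a multiply-add counts as two flops
ASSUMED_SHADER_CLOCK_GHZ = 1.296   # assumption, not from the thread

dp_peak = peak_gflops(DP_MAD_UNITS, FLOPS_PER_MAD, ASSUMED_SHADER_CLOCK_GHZ)
print(f"DP peak: {dp_peak:.2f} GFLOPs")
```

With a different clock the result scales linearly, so the 78 GFLOPs number is only as good as the clock assumption behind it.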

no-X · Jun 9, 2008

AnarchX said:
Read the right, latest documents.

Aren't they a bit ambiguous when it comes to exact numbers about texture addressing? At least the leaked slides are. The diagrams lack TAUs, the specifications lack the exact number of TAUs... I just recollected these interesting posts:

http://forum.beyond3d.com/showpost.php?p=1163557&postcount=1124

http://forum.beyond3d.com/showpost.php?p=1163570&postcount=1126

Jawed · Jun 9, 2008

Love_In_Rio said:
One dummy question: is double precision used for gaming graphics or only for GPGPU tasks?

Just GPGPU effectively.

It will eventually make it into D3D, but it seems like a low priority as there really isn't much (any?) use for it.

Jawed

Jawed · Jun 9, 2008

trinibwoy said:
Maybe that's what Rys was referring to.

That's another benefit of VLIW vs "scalar" -> Much easier to reconfigure the ALUs to do different things on the fly.

I think in ATI's case the implementation of dot-product (which naturally requires four ALU lanes to work together) is a key part of this. A lot of the wiring "was already there", I guess.

Either way, isn't 78 GFlops kinda weak even compared to CPUs?

Yeah, though maybe NVidia has a bandwidth advantage.

I don't know what Nehalem is meant to be capable of.

But that'll be what's on people's radar when they ponder NVidia for double-precision scientific GPGPU.

Larrabee, if it's 2 TFLOPs single-precision, could be half that in double-precision.

Jawed

ShaidarHaran · Jun 9, 2008

Jawed said:
Just GPGPU effectively.

It will eventually make it into D3D, but it seems like a low priority as there really isn't much (any?) use for it.

Jawed

I'm no EE (nor Dev) so I may be way out in left field on this one, but is it possible that DP math may be more beneficial to physics performance and/or complexity?

ShaidarHaran · Jun 9, 2008

Jawed said:
I think in ATI's case the implementation of dot-product (which naturally requires four ALU lanes to work together) is a key part of this. A lot of the wiring "was already there", I guess.

Yeah, though maybe NVidia has a bandwidth advantage.

I don't know what Nehalem is meant to be capable of. But that'll be what's on people's radar when they ponder NVidia for double-precision scientific GPGPU.

Larrabee, if it's 2 TFLOPs single-precision, could be half that in double-precision.

Jawed

IIRC Sandy Bridge is the next big jump in CPU DP compute power, and it's been pegged at approximately 200GFLOPs.

Jawed · Jun 9, 2008

ShaidarHaran said:
I'm no EE (nor Dev) so I may be way out in left field on this one, but is it possible that DP math may be more beneficial to physics performance and/or complexity?

What, like the precise impact location of a bullet over a distance of 1km? Seems unlikely.

Avoidance of rounding errors during multi-body collision? I suspect you'd need an incomputably large number of objects (in real time) interacting for that to be the case. At least for a few years.

Jawed

ShaidarHaran · Jun 9, 2008

Jawed said:
What, like the precise impact location of a bullet over a distance of 1km? Seems unlikely.

Avoidance of rounding errors during multi-body collision? I suspect you'd need an incomputably large number of objects (in real time) interacting for that to be the case. At least for a few years.

Jawed

LOL, ok, thanks for clearing that one up

I'm grasping at straws here, hoping for a way to put all that new compute power to use inside our own home PCs...

nAo · Jun 9, 2008

DP would be useful for SATs and to filter exponential shadow maps on 'long' ranges without employing log-filtering.

Jawed · Jun 9, 2008

nAo said:
DP would be useful for SATs and to filter exponential shadow maps on 'long' ranges without employing log-filtering.

Interesting.

What sort of DP-ALU:TEX ratio would you need for decent performance?

Presumably the creation of a SAT requires only a very short shader with a couple of DP ops. Filtering would be pretty hairy though, wouldn't it?

Jawed

Rangers · Jun 9, 2008

Okay, seems we've had a dearth of performance leaks the last couple days. Anybody want to call the first leaked review??

If I recall past launches, we usually get one the weekend before NDA expires.

MfA · Jun 9, 2008

Larrabee is an area efficient design, not a desktop processor ... so I wouldn't expect half speed DP, expect quarter speed DP (at least as far as MAD is concerned). The 8 core Sandy Bridge would probably just get to 200 GFLOPs, assuming half speed DP and 2 AVX pipelines per core, in 2010 on a 32 nm process ... that's not really going to close any gaps.

If it's true Nvidia has dedicated DP engines I wonder what happened ... did they just judge ATI as not much of a threat and thought they could simply do it like this and save on development or did they find out about ATI's DP support and felt they had to follow but didn't have enough time to do it efficiently this generation? Regardless, I'm sure their next part will do it just fine without wasting area.
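The Sandy Bridge estimate above can be unpacked as follows. This is a rough sketch under the post's own assumptions (8 cores, 2 AVX pipelines per core, DP at half the SP lane count). The added assumptions are that AVX registers are 256 bits wide and that each pipeline retires one operation per lane per clock; the clock needed to reach 200 GFLOPs is then solved for rather than assumed.

```python
# Unpacking the "8-core Sandy Bridge ~ 200 GFLOPs DP" estimate.
# ASSUMPTIONS: 256-bit AVX registers; each pipeline retires one operation
# per lane per clock (e.g. one multiply pipe and one add pipe).

AVX_REGISTER_BITS = 256
DP_BITS = 64
SP_BITS = 32

CORES = 8                 # from the post
PIPELINES_PER_CORE = 2    # from the post
TARGET_DP_GFLOPS = 200.0  # from the post

dp_lanes = AVX_REGISTER_BITS // DP_BITS   # DP values per register
sp_lanes = AVX_REGISTER_BITS // SP_BITS   # SP values per register

# "Half speed DP" here simply means half as many lanes per register.
dp_to_sp_ratio = dp_lanes / sp_lanes

dp_flops_per_clock = CORES * PIPELINES_PER_CORE * dp_lanes
required_clock_ghz = TARGET_DP_GFLOPS / dp_flops_per_clock

print(f"DP flops per clock (whole chip): {dp_flops_per_clock}")
print(f"Clock needed for {TARGET_DP_GFLOPS:.0f} GFLOPs: {required_clock_ghz:.3f} GHz")
```

Under these assumptions the chip would need to run at a little over 3 GHz to "just get to" the quoted figure, which is why the post treats 200 GFLOPs as an upper bound rather than a comfortable target.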

3dilettante · Jun 9, 2008

MfA said:
Larrabee is an area efficient design, not a desktop processor ... so I wouldn't expect half speed DP, expect quarter speed DP (at least as far as MAD is concerned). The 8 core Sandy Bridge would probably just get to 200 GFLOPs, assuming half speed DP and 2 AVX pipelines per core, in 2010 on a 32 nm process ... that's not really going to close any gaps.

Larrabee was projected to hit 1 TFLOP DP in Intel's slides, though the numbers pointed to an FMAC to hit it.
Most of the discussion points to just one vector unit (per core) with registers wide enough for 16 SP values.

If DP is quarter speed, Larrabee would hit 4 TFLOPs SP, though nothing so far indicates it has the massive operand bandwidth or the necessary number of vector units necessary for hitting that.

MfA · Jun 9, 2008

Sorry, with that kind of roundabout reasoning based on slides from a very high level presentation on a design which probably wasn't even finalized in any way, shape or form, I'd rather just trust my intuition.

I wouldn't expect half speed DP.

PS. of course I didn't expect 1/12th speed DP for NVIDIA either ...

V3 · Jun 10, 2008

MfA said:
Sorry, with that kind of roundabout reasoning based on slides from a very high level presentation on a design which probably wasn't even finalized in any way, shape or form, I'd rather just trust my intuition. I wouldn't expect half speed DP.

PS. of course I didn't expect 1/12th speed DP for NVIDIA either ...

If IBM can get Cell's DP to half the speed of its SP, I thought Intel would aim for the same. I mean, Larrabee is aimed at that sector. If NVIDIA and AMD weren't taunting Intel with all this GPGPU stuff, I'm sure Intel would be content with just CPUs.

78 GFLOPS is pretty poor peak for the reported size and power consumption of the board. It couldn't even beat the Cell board.

Jawed · Jun 10, 2008

MfA said:
PS. of course I didn't expect 1/12th speed DP for NVIDIA either ...

It might be fairer to think of it as 1/8th of the MAD throughput.

Now that NVidia has decided to count the MUL, though, I'm inclined not to. After all we're counting 1/5th MAD rate on ATI. Though it's 2/5th for ADD.

Sigh, and I said 249 GFLOPs for RV770 based on 777MHz, not 750MHz, so it's 240 GFLOPs.

Jawed
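The 1/8 versus 1/12 distinction in the post above depends only on what gets counted, not on clocks. A minimal sketch, assuming 240 single-precision lanes (30 SMs with 8 SP MADs each, the per-SM figure mczak uses later in the thread) and an extra MUL per lane counted as a third SP flop. The RV770 unit count is not stated in the thread; it is inferred here from the post's own 240 GFLOPs at 750MHz.

```python
from fractions import Fraction

# GT200 as described in the thread (rumoured figures, not confirmed specs).
DP_MAD_UNITS = 30
SP_LANES = 30 * 8           # 30 SMs x 8 SP MADs per SM

# Counting MAD units only: DP is 1/8 of SP.
mad_ratio = Fraction(DP_MAD_UNITS, SP_LANES)

# Counting flops, with SP credited for MAD + MUL (3 flops) and DP for MAD (2 flops).
flops_ratio = Fraction(DP_MAD_UNITS * 2, SP_LANES * 3)

print(f"DP:SP by MAD units: {mad_ratio}")     # 1/8
print(f"DP:SP by flops:     {flops_ratio}")   # 1/12

# RV770 correction from the post: same unit count, two candidate clocks.
# INFERRED: DP MAD units = 240 GFLOPs / (2 flops * 0.750 GHz).
rv770_dp_mads = round(240 / (2 * 0.750))
at_750 = rv770_dp_mads * 2 * 0.750
at_777 = rv770_dp_mads * 2 * 0.777
print(f"RV770 DP: {at_750:.0f} GFLOPs at 750MHz, {at_777:.0f} GFLOPs at 777MHz")
```

So both fractions describe the same rumoured hardware; 1/12 is simply what you get once the SP side is credited with the extra MUL.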

suryad · Jun 10, 2008

trinibwoy said:
http://www.forum-3dcenter.org/vbulletin/showpost.php?p=6568820&postcount=523

Nothing revolutionary. Certainly nothing that should get Rys so excited. Wonder what he's talking about....

I may be gullible when it comes to Nvidia's marketing, but from that slide I am quite interested in finding out what exactly has been improved in SLI...

mczak · Jun 10, 2008

trinibwoy said:
Either way, isn't 78 GFlops kinda weak even compared to CPUs?

Depends how you define "weak". Current Core2Quad is 2 (vec2 fp64) * 4 (cores) * 2 (mul+add) * clock flops. That's 48 gflops for a 3ghz c2q. So the presumed GTX280 would still be faster - but OTOH it would be in the same ballpark so you could certainly consider it weak (and it would look like a loser in the flops/power category).
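The Core2Quad arithmetic in the post above, written out as a small sketch. All inputs come from the post (vec2 fp64, 4 cores, mul+add, 3GHz) and from the 78 GFLOPs rumour; the function and parameter names are made up for illustration.

```python
def cpu_peak_dp_gflops(clock_ghz: float, cores: int = 4,
                       vector_width: int = 2, ops_per_lane: int = 2) -> float:
    """Peak DP GFLOPs = vector width * cores * ops per lane per clock * clock (GHz)."""
    return vector_width * cores * ops_per_lane * clock_ghz

c2q_peak = cpu_peak_dp_gflops(3.0)   # 3GHz Core2Quad, as in the post
gt200_rumoured = 78.0                # rumoured GT200 DP peak from the thread

print(f"Core2Quad @ 3GHz: {c2q_peak:.0f} GFLOPs")
print(f"Rumoured GT200 advantage: {gt200_rumoured / c2q_peak:.3f}x")
```

That puts the rumoured part at well under 2x the CPU's peak, which is the "same ballpark" the post is pointing at.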

Pete · Jun 10, 2008

V3 said:
78 GFLOPS is pretty poor peak for the reported size and power consumption of the board. It couldn't even beat the Cell board.

Well, are we sure it'd draw full power doing DP? I remember it was said G80's texture units drew a lot of power, so maybe if GT200's workload is DP-focused it won't get close to the expected gaming power draw.

This should make for an interesting page in any review's power draw stats, if they test multiple scenarios (even just power draw during different 3D Mark feature tests) rather than aim for peak draw.

mczak · Jun 10, 2008

Pete said:
Well, are we sure it'd draw full power doing DP? I remember it was said G80's texture units drew a lot of power, so maybe if GT200's workload is DP-focused it won't get close to the expected gaming power draw.

You're probably right (assuming all those SP units are just idling along), but power draw would probably still be quite a lot. Btw, is this supposed separate DP MAD in addition to the 8 single-precision MADs (thus helping with the single-precision flops too?) or replacing one of them?
