Nvidia GT300 core: Speculation

An nVidia beta tester has confirmed on the hardware.no forum that the GT300 has been taped out and the first samples are up and running in nVidia's labs at 700/1600/2100.

Er what? Looking here:
http://www.diskusjon.no/index.php?session=30f4aca6320c877e855d3202f632a00b&showtopic=873626&st=3620

My Norwegian is non-existent, but it looks like someone quoted the hw-infos story (...am still waiting since January to buy my 40nm RV790), then said below that they hoped it would be available for them to beta test in the summer.

(Can someone confirm that he said "hope" rather than "will be" available for testing?)
 
He is pointing out that the stories about the tape-out are true.

He is hoping he will receive a GT300 card to test during the course of the summer.

I can translate an earlier post he made:
Talking about future products from Nvidia: "We will begin to see the first 40nm products around Computex at the beginning of June, with the main focus on notebook products. The last I heard about GT300 is October/November, with the launch of CUDA 3.0. They (Nvidia) have said it will be a "very different" architecture. It remains to be seen what that means."
 
The last I heard about GT300 is October/November, with the launch of CUDA 3.0. They (Nvidia) have said it will be a "very different" architecture. It remains to be seen what that means.

Better PhysX support, better and faster programming and computational work.

Should be interesting.

US
 
Well, if the specs are true, this radical new architecture obviously isn't doing anything to reduce bandwidth requirements :D Though I've almost convinced myself that they're going to expand the shared-memory concept to be an integral part of graphics rendering too. It just doesn't make sense to have all that hardware sit idle on consumer parts. And who knows what other bits they could delegate to the shader core if the rumoured 10-fold increase in DP performance is really there.
 
Shared memory might be used to hold vertex attribute data and barycentrics for interpolation on demand.

Jawed
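For what it's worth, a minimal sketch (plain Python, purely illustrative) of what "interpolation on demand" could look like: per-triangle vertex attributes sit in some shared storage, and each pixel's attribute value is reconstructed from its barycentric weights with a couple of MADs instead of coming out of dedicated interpolator hardware.

```python
# Purely illustrative sketch of on-demand attribute interpolation.
# a0, a1, a2: the attribute value (e.g. one texcoord channel) at the
# triangle's three vertices; w0, w1, w2: the pixel's barycentric weights.
def interpolate_attribute(a0, a1, a2, w0, w1, w2):
    # Perspective correction of the weights is assumed to have happened
    # already; the interpolation itself is just a weighted sum (two MADs).
    return w0 * a0 + w1 * a1 + w2 * a2

# Example: a pixel with barycentrics (0.2, 0.3, 0.5) inside a triangle
# whose vertices carry attribute values 0.0, 1.0 and 0.5.
print(interpolate_attribute(0.0, 1.0, 0.5, 0.2, 0.3, 0.5))  # 0.55
```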
 
DP performance is so low in GT200 that 10x isn't a big deal, particularly if SP capability increases by 2x+ and DP scales with both an architectural change and a change in the SP:DP ratio.

Jawed
 
Let's see. On GT200, SP is 624 GFLOPS (the MUL doesn't count). Assume a 2.5x increase and that comes to about 1.56 TFLOPS. 10x DP means ~780 GFLOPS, roughly half of peak SP perf. Now if LRB has DP:SP of 1:2, it won't be surprising; all modern CPUs (SSE2 onwards) have it.

But for GPUs it's a total waste. Remember, RV770 has 1:4 DP:SP (not counting the t unit), and that's with the SP units leveraged to do DP math. AMD's 1:4 DP:SP ratio seems much more balanced from a die-area vs. money-making-utility POV.
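Redoing that arithmetic explicitly, assuming the ~1.3 GHz shader clock implied by the 624 GFLOPS figure, 30 DP units in GT200, and the thread's purely speculative 2.5x SP / 10x DP scaling:

```python
# Peak-rate arithmetic only; the GT300 numbers are speculation from the thread.
shader_clock = 1.3e9                      # approx. GT200 shader clock in Hz
gt200_sp = 240 * shader_clock * 2         # 240 SPs, 1 MAD = 2 flops -> ~624 GFLOPS
gt200_dp = 30 * shader_clock * 2          # 30 DP units -> ~78 GFLOPS

gt300_sp = 2.5 * gt200_sp                 # speculated 2.5x -> ~1.56 TFLOPS
gt300_dp = 10 * gt200_dp                  # speculated 10x  -> ~780 GFLOPS

print(f"GT200: SP {gt200_sp / 1e9:.0f} GFLOPS, DP {gt200_dp / 1e9:.0f} GFLOPS")
print(f"GT300 (speculative): SP {gt300_sp / 1e12:.2f} TFLOPS, DP {gt300_dp / 1e9:.0f} GFLOPS")
print(f"Speculative DP:SP ratio: 1:{gt300_sp / gt300_dp:.0f}")  # roughly 1:2
```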
 
Larrabee's DP is a waste by the same measure. I still don't have a good idea how x86 DP is implemented and how much extra cost it has in comparison with no DP at all or things like ATI's DP.

As for money-making, well this is supposed to be a reassuringly expensive GPU, particularly in Tesla. Other GT3xx GPUs prolly won't have DP so won't carry the burden.

Jawed
 
IMO, as long as they're GTxx, they're gonna have DP. But the question is: are we going to see a GT300 as a high-end consumer card for Xmas, or rather a G300?

My money is on the latter.
 
Yes, I've noticed your previous musings on this. It'd be pretty amazing if true.

It might reflect the supposedly abysmal state of 40nm or other factors to do with timeliness/re-design, I suppose. Leave DP until after the new architecture is shipping?

A DP variant to follow on 28nm by summer/autumn 2010?

Jawed
 
Larrabee's DP is a waste by the same measure.
Absolutely. But then LRB isn't a GPU, nor is it meant to be. It's a CPU which also-does-graphics.

I still don't have a good idea how x86 DP is implemented and how much extra cost it has in comparison with no DP at all or things like ATI's DP.

I guess the x86 implementation is closer to ATI's DP implementation than it is to NV's present DP implementation.
 

What are the differences between the two? I know ATI has more DP units than NV, but that's not a big difference.

DK
 
For a start, ATI reuses the FP multiplier units to perform DP multiplication; the SP units are, of course, expanded to 27 bits. NV has separate ALUs for SP and DP. For ATI, the registers become just like SSE2: either a float4 or a double2.
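One way to picture the "same register, viewed as float4 or double2" point, using NumPy to reinterpret the same 128 bits of storage (nothing here is vendor-specific; it's just the SSE2-style view being described):

```python
import numpy as np

# One 128-bit "register" holding four single-precision floats...
reg_float4 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
print(reg_float4.nbytes)            # 16 bytes

# ...and the very same 16 bytes reinterpreted as two doubles.
reg_double2 = reg_float4.view(np.float64)
print(reg_double2.shape)            # (2,) -- a double2 over the same storage
```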
 

Gotcha, thanks for that clarification. I think Intel supports DP as a native data type, and ends up with half the throughput that they'd have for SP. But I'm not 100% sure about that.

I don't know what the power trade-offs for the two approaches are, but shared SP/DP obviously is more area efficient.

DK
 
Absolutely. But then LRB isn't a GPU, nor is it meant to be. It's a CPU which also-does-graphics.

Wouldn't you say that when engineers start with a chalkboard design of an architecture, they have already defined in their minds whether they're about to design a CPU or a GPU? The fact that LRB is based on x86 doesn't make it a CPU or a hybrid CPU+GPU. Unless, of course, you can use LRB in a future system w/o a CPU and have the unit take over general processing as well as graphics.

If you oversimplify things that much, even today's GPUs are basically a bunch of arithmetic logic units at the heart of the chip, "decorated" with a number of fixed-function units around them. How each IHV implements X, Y, Z is another chapter, but the outcome can still be either a CPU or a GPU.

I guess the x86 implementation is closer to ATI's DP implementation than it is to NV's present DP implementation.
That's what I would figure too, and I don't think LRB will in practice do better than the theoretical DP = 1/4 SP of today's Radeons (where, if I recall correctly, reality is more in the 1/5 ballpark, but that's beside the point).

I haven't dug into what Intel is claiming about double-precision performance, but if they mention anywhere something along the lines of "half the rate", and that should imply half the vector rate, then the result is more likely to be as speculated in the previous paragraph.
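To make the "half the vector rate" reading concrete, a toy calculation under that assumption: a 16-wide SP vector unit doing one MAD per lane per clock, with DP running at half the vector width. The core count and clock below are placeholders, not Intel figures; only the resulting 1:2 ratio is the point.

```python
# Hypothetical figures purely to show the ratio; not Intel specs.
cores, clock_hz = 32, 2.0e9
sp_lanes, flops_per_mad = 16, 2

sp_peak = cores * clock_hz * sp_lanes * flops_per_mad            # single precision
dp_peak = cores * clock_hz * (sp_lanes // 2) * flops_per_mad     # "half the vector rate"

print(f"SP peak: {sp_peak / 1e9:.0f} GFLOPS, DP peak: {dp_peak / 1e9:.0f} GFLOPS")
print(f"DP:SP = 1:{sp_peak / dp_peak:.0f}")                      # 1:2
```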
 
Unless, of course, you can use LRB in a future system w/o a CPU and have the unit take over general processing as well as graphics.
Is it unlikely? I think LRB would be quite a good solution for a game console. Intel hasn't been very successful in this segment (I can remember only Intel's CPU in the original Xbox), and LRB could change that...
 

That in particular is highly unlikely, since AFAIK console manufacturers aren't interested due to the power consumption being too high for the performance they're targeting.
 