DegustatoR:
>> Maybe not at full speed, but via the same FP32 units, which would generally be faster and more efficient than what we have now in GT200.
> Honestly, that would make the chip too big.
There are other ways of doing this...
> There are other ways of doing this...
John Nickolls of NVIDIA insisted to me back in June (wrt Tesla) that their approach was very efficient - of course, that's what you'd expect them to say. However, I could believe that 1xDP+8xSP, where the SP is a true 24-bit mantissa unit, is less costly than 8xSP with a loop mechanism for DP, where the SP needs the 27-bit mantissa required for DP, if not 32-bit.
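To make the 27-bit remark concrete, here is a minimal C sketch (my own illustration, not NVIDIA's actual datapath; it uses the GCC/Clang unsigned __int128 extension only to check the result) of how the 53-bit double-precision significand product decomposes into four partial products that a multiplier only ~27 bits wide could produce over several passes, which is essentially what looping the SP unit for DP amounts to.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Two arbitrary 53-bit significands (leading bit set, as in normalised DP). */
    uint64_t a = (1ULL << 52) | 0x000F123456789ABULL;
    uint64_t b = (1ULL << 52) | 0x0003FEDCBA98765ULL;

    /* Split each into a 27-bit high half and a 26-bit low half. */
    uint64_t a_hi = a >> 26, a_lo = a & ((1ULL << 26) - 1);
    uint64_t b_hi = b >> 26, b_lo = b & ((1ULL << 26) - 1);

    /* Four partial products, each within reach of a 27x27-bit multiplier,
       recombined with shifts and adds - one pass per partial product. */
    unsigned __int128 p = ((unsigned __int128)(a_hi * b_hi) << 52)
                        + ((unsigned __int128)(a_hi * b_lo) << 26)
                        + ((unsigned __int128)(a_lo * b_hi) << 26)
                        +  (unsigned __int128)(a_lo * b_lo);

    unsigned __int128 ref = (unsigned __int128)a * b;  /* direct 106-bit product */
    printf("partial-product sum matches direct multiply: %s\n", p == ref ? "yes" : "no");
    return 0;
}
```

A true 24-bit SP multiplier would need either widening to 27 bits (as above) or extra passes, which is the cost trade-off being described.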
> Of course, given that RV770 has a 32-bit mantissa (!!) and single-cycle INT32 MULs for every unit, the devil is in the details...
Can you confirm this?
> Can you confirm this?
Yeah.
I was under the impression that INT MUL was still relegated to the fat ALU.
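On the INT MUL point, the usual way to see why a full-width multiplier per lane matters is the workaround needed when only a 24-bit multiply is fast (as with CUDA's __umul24 on G8x/GT200-class parts). Below is a hypothetical C sketch, with helper names of my own, that builds the low 32 bits of a 32x32 product from three narrower multiplies; hardware with a native single-cycle INT32 MUL per unit simply skips all of this.

```c
#include <stdint.h>
#include <stdio.h>

/* Model of a 24-bit multiplier: only the low 24 bits of each operand count. */
static uint32_t mul24(uint32_t a, uint32_t b) {
    return (a & 0xFFFFFFu) * (b & 0xFFFFFFu);
}

/* Low 32 bits of a 32x32 multiply, assembled from three 24-bit multiplies
   on 16-bit halves (the a_hi*b_hi term only affects bits above 31). */
static uint32_t umul32_lo(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    return mul24(a_lo, b_lo) + ((mul24(a_lo, b_hi) + mul24(a_hi, b_lo)) << 16);
}

int main(void) {
    uint32_t a = 0xDEADBEEFu, b = 0x12345678u;
    printf("emulated: %08x  native: %08x\n", umul32_lo(a, b), a * b);
    return 0;
}
```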
> John Nickolls of NVIDIA insisted to me back in June (wrt Tesla) that their approach was very efficient - of course, that's what you'd expect them to say. However, I could believe that 1xDP+8xSP, where the SP is a true 24-bit mantissa unit, is less costly than 8xSP with a loop mechanism for DP, where the SP needs the 27-bit mantissa required for DP, if not 32-bit.
I think the key point is that NVidia's implementation of DP is far richer than the basic approach AMD's taken. All that richness costs, but it presumably makes use of the existing register-file etc. infrastructure, so it is relatively efficient, and being slow (because it's not used much) makes it even more efficient.
GT300 expectations? Well, I would like to see another revolution going from GT2xx to GT3xx, like G7x --> G80 was. So a 50-100% performance bump in real-world computations over the fastest GPUs of today.
> About specs, I have no idea what architecture changes will happen, so I can't put any numbers on SPs, ROPs etc.
Assuming the 1st paragraph is in the right direction, it would be wise not to make any estimates on unit counts at all before defining what each unit could be capable of, or where it has vanished to.
> Moreover, I don't think NVIDIA will make the same mistake as with GT200, and GT300 won't be as big as GT200 has been. It will probably be a big chip, but I expect a die size of around 400-450 mm^2 at worst on a 40nm process, and I wouldn't be surprised if it comes in below 400 mm^2.
Below 400 mm^2 doesn't sound to me like something that would have a chance of such performance increases as above. Compared to G70, the increase in transistors for G80 was over 2x (and obviously a lot more compared to G71). Unless NV started a design from absolute zero years ago and designed every transistor from scratch (highly unlikely), any significant performance increase (which is usual, minus exceptions, for each new technology generation) will not come for free.
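As a rough sanity check on those numbers (my own back-of-envelope figures, not from the thread): GT200 is usually quoted at about 1.4B transistors on roughly 576 mm^2 at 65nm, and ideal area scaling from 65nm to 40nm is about (40/65)^2, so doubling the transistor budget would land at roughly

$$576\ \mathrm{mm^2} \times 2 \times \left(\tfrac{40}{65}\right)^2 \approx 436\ \mathrm{mm^2},$$

i.e. a 400-450 mm^2 target and a ~2x transistor count are more or less the same assumption, before allowing for the fact that real designs rarely hit ideal scaling.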
> The last thing I would like to say is that maybe NVIDIA is doing well with GT300 and maybe (and I hope so) we will see it earlier than we expect? Why? In the last few weeks we could read that GT212 has probably been canceled. Maybe they decided to cancel GT212 (IF it's true, of course) because of GT300?
If there's a problem with high-complexity chips and 40nm, it doesn't sound likely for the next generation to arrive earlier than planned.
> Moreover, I don't think NVIDIA will make the same mistake as with GT200, and GT300 won't be as big as GT200 has been.
I don't see what's wrong with GT200's size. The problem of GT200 is in performance, not size.
OK, but as you know, between G7x and G8x there were major architecture changes because of unified shaders.
> I think that between GT300 and GT200 there won't be such big architectural changes, so there is a chance that the transistor count won't increase as much as it did from G70 to G80.
The performance increase between G8x and G7x was up to >3x in total. If you now want to speculate in the direction of at least twice the performance between GT3x0 and GT200, I have the feeling it won't be possible to reach that goal without doubling the transistor count. IHVs don't have magic wands, on the other hand, and you can easily compare transistor counts from past architectures against their relative performance increases.
> That's why I think the die size won't be as big as GT200's; more likely it will be G80/GT200B die size or even smaller. Whatever they say, they know that this isn't the right way to make huge chips (GT200's problems give you the answer). IMO 400-450 mm^2 is a critical size for GPUs.
If NVIDIA wanted or wants to avoid "GT200 problems" they would or should have abandoned the monolithic single high-end chip strategy. The question is: have they?
> I don't see what's wrong with GT200's size. The problem of GT200 is in performance, not size.
The performance per TMU and ROP is pretty poor, so there's an opportunity there to make some radical space savings.
> Yes and no, since the performance/mm^2 ratio rather stinks, unlike the performance/Watt ratio.
That's the performance problem; size has nothing to do with it.
> NVidia could well wow us with a "density increment" similar to that seen with RV770 over RV670. I doubt the ALUs will offer up much of a gain, but simply increasing ALU:TEX will provide a "free" density increment.
Yes, but that doesn't mean that they can't wow us double time, not only in density but in density+size =)
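For reference on the size of that "density increment" (commonly quoted figures, so treat them as approximate): RV670 packs about 666M transistors into ~192 mm^2 and RV770 about 956M into ~256 mm^2, both on 55nm, i.e.

$$\frac{666}{192} \approx 3.5\ \mathrm{M/mm^2} \quad\text{vs.}\quad \frac{956}{256} \approx 3.7\ \mathrm{M/mm^2},$$

roughly an 8% gain in transistors per mm^2 on the same node.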