View Full Version : ELSA hints GT206 and GT212


Arun
20-Oct-2008, 12:57
DegustatoR: And so does G92b AFAICT... what's your point? :) Of course, that doesn't answer the GT206 mystery...

BTW, tentative GT21x line-up possibilities:
1T|40A|1R -> 0.2TFlops+ -> GT218/???
3T|120A|3R -> 0.6TFlops+ -> GT216/Late March
6T|240A|6R -> 1.2TFlops+ -> GT214/Early May
12T|480A|12R -> 2.8TFlops+ -> GT212/Late June

OR

1T|32A|1R
4T|128A|3R
8T|256A|6R
16T|512A|12R

OR

2T|48A|2R
4T|128A|3R
8T|256A|6R
12T|480A|12R

OR

...

In the first possibility, the ALU ratio might seem high until you add this little catch.
GT212 ALUs: 8 MADDs, 8 MULs/2SFU/2DP
GT214+ ALUs: 8 MADDs, 4 MULs/1SFU/1DP
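
As a rough sanity check, the TFlops figures in the first possibility appear to follow from ALU count times flops per ALU per clock times shader clock. The sketch below assumes a 2.0 GHz shader clock, which is not stated in the post; the function and variable names are also made up for illustration. A MADD is counted as 2 flops and a MUL as 1, using the per-SM MADD/MUL split listed above.

```python
# Peak-throughput sketch for the first GT21x line-up possibility.
# Assumption (not stated in the post): a 2.0 GHz shader clock on every chip.
SHADER_CLOCK_GHZ = 2.0


def flops_per_alu(madds: int, muls: int) -> float:
    """Flops per ALU per clock; a MADD counts as 2 flops, a MUL as 1."""
    return (2 * madds + muls) / madds


def peak_tflops(alus: int, madds: int, muls: int,
                clock_ghz: float = SHADER_CLOCK_GHZ) -> float:
    """Peak TFlops = ALUs x flops per ALU per clock x clock (GHz) / 1000."""
    return alus * flops_per_alu(madds, muls) * clock_ghz / 1000.0


# (ALU count, MADDs per SM, MULs per SM), taken from the lists above.
LINEUP = {
    "GT218": (40, 8, 4),
    "GT216": (120, 8, 4),
    "GT214": (240, 8, 4),
    "GT212": (480, 8, 8),
}

for chip, (alus, madds, muls) in LINEUP.items():
    print(f"{chip}: {peak_tflops(alus, madds, muls):.2f} TFlops")
```

Under that clock assumption the sketch gives 0.20, 0.60, 1.20 and 2.88 TFlops, which lines up with the "0.2/0.6/1.2/2.8TFlops+" figures only if the halved MUL rate applies to everything below GT212.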

David Kirk said very explicitly in one of his uni courses that they could fiddle with the MADD vs SFU ratio as they saw fit, and as the ALU-TMU ratio increases it makes a lot of sense to reduce it in my mind. This would also be fairly simple, and if you truly tied the SFU & DP units together it would only result in completely negligible limitations. This would also result in the MUL not being wasted *at all* in graphics, since for register file access reasons I won't go into here it is not realistic to expect more than half a MUL to be exposed anyway.

I also think the first scenario is more likely because it is likely more practical for them to tweak that rather than the SP-TMU ratio. As for why they'd make such big one-generation jumps in the ALU ratio, remember it's much easier to be pad-limited on 40nm for a given amount of memory bandwidth, and bandwidth requirements obviously scale faster with TMUs/ROPs than ALUs. A wider memory bus also increases the ratio of non-digital functionality which doesn't scale, and that's obviously not a good thing.

As usual, I expect to be horribly wrong here and to be massively disappointed by NVIDIA's frequent failure to come up with a coherent roadmap and by their fundamental misunderstanding of the difference between gross margins and gross profit. Oh well! ;)

DegustatoR
21-Oct-2008, 06:56
DegustatoR: And so does G92b AFAICT... what's your point? :) Of course, that doesn't answer the GT206 mystery...
The point is that it's the same chip -)
It could be "GT200b", sure, but it's still 10 24/8 TPCs, 512-bit bus whatever you want to call it. Otherwise it would have another device id in the drivers.

BTW, tentative GT21x line-up possibilities:
1T|40A|1R -> 0.2TFlops+ -> GT218/???
3T|120A|3R -> 0.6TFlops+ -> GT216/Late March
6T|240A|6R -> 1.2TFlops+ -> GT214/Early May
12T|480A|12R -> 2.8TFlops+ -> GT212/Late June
I think you're running a bit ahead of time =)

From my point of view if NV wants to be competitive with their GT21x parts (presuming that's GT20x architecture on 40nm) they'll need to do some rethinking and rebalancing of G8x architecture. Otherwise they'll end up being slower with the same complexity or even with more transistors.
I don't think that we'll see the return of 384-bit bus in GT21x chips -- 256-bit GDDR5 should be enough for them.
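
To put the bus-width argument in numbers, here is a minimal sketch of peak memory bandwidth as bus width times per-pin data rate. The GDDR3 clock is the GTX280 figure quoted elsewhere in this thread (1107MHz, double data rate); the two GDDR5 per-pin rates are illustrative assumptions, not known specs for any GT21x part, and the function name is hypothetical.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: (bus width in bytes) x (Gbit/s per pin)."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


# GTX280: 512-bit GDDR3 at 1107 MHz, two transfers per clock.
gtx280 = bandwidth_gb_s(512, 2 * 1.107)

# Hypothetical 256-bit GDDR5 configurations (per-pin rates are assumptions).
gddr5_low = bandwidth_gb_s(256, 3.6)
gddr5_high = bandwidth_gb_s(256, 4.5)

print(f"512-bit GDDR3 @ 1107 MHz: {gtx280:.1f} GB/s")
print(f"256-bit GDDR5 @ 3.6 Gbps: {gddr5_low:.1f} GB/s")
print(f"256-bit GDDR5 @ 4.5 Gbps: {gddr5_high:.1f} GB/s")
```

With those assumed rates, a 256-bit GDDR5 bus lands in roughly the same range as the 512-bit GDDR3 bus, which is the gist of the "should be enough" argument.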

GT216 is probably a G94 replacement (128-bit GDDR5, 128 SP / 32 TMU?), GT212 is a GT200 replacement (256-bit GDDR5, 384 SP / 96 TMU?) and GT206 is a G92b/GT200 replacement for 9800GTX+/GTX260 parts (55nm, 256-bit GDDR5, 192 SP / 64 TMU?). Plus GT200b which should replace GTX280 and add GTX290 on top of the line.

But as i've said they'd probably want to do something with their SMs in GT21x parts otherwise they'll end up slower per transistor than RV8x0 line. Plus they need to fix AA and add 10.1 support maybe?

In the first possibility, the ALU ratio might seem high until you add this little catch.
GT212 ALUs: 8 MADDs, 8 MULs/2SFU/2DP
GT214+ ALUs: 8 MADDs, 4 MULs/1SFU/1DP
I really, REALLY don't think that going forward with separate DP units is the right thing to do. They may do this for the top-end part to be used in Teslas but for anything below that they'll probably go with version 1.2 CUDA compute capability without DP support.

igg
21-Oct-2008, 12:47
NordicHardware has a new article about GT206/212/216 (http://www.nordichardware.com/news,8261.html):
You have all heard about the GT206, also known as GT200b in the press. As we predicted back in May, this would be a part of the Fall/Winter refresh and it is due any day now. Specifications are of course sparse but expect a tweaked GT200 core running cooler and faster, it's all good.
After GT206, there is a GT212 core on roadmaps. There are rumors that this will be the first NVIDIA chip with GDDR5, but I wouldn't bet on it. It should arrive in Q2 2009.
Comment: According to Elsa slides GT212 will be produced in 40 nm already.
Sources have informed us of another chip that is in the works. NVIDIA is also working on GT216 which is said to be the first chip to reach the market using TSMC's 40nm process. It will go up against AMD's RV870 chip and should hit the market at about the same time. That time is late Q2 2009, or early Q3. There are rumors that this chip will be DirectX 11 compatible, which would put it even with RV870 in terms of DX support.
However, Arun kind of disagrees (http://forum.beyond3d.com/showpost.php?p=1229112&postcount=173) in his post:
Nice, that cens article is the first one I've seen mention this correctly, although they got most of their other facts wrong: GT216 will be the first NVIDIA 40nm GPU. However, it won't replace GT200, and it should be out in late Q1, not late Q2 (the latter is actually for the chip replacing GT200, so they likely just got some stuff confused). To get back on topic, presumably the roadmap calls for AMD's first 40nm chip to come out before that GT200 replacement too but after GT216, however who the hell knows at this point.

Arun
21-Oct-2008, 15:12
The point is that it's the same chip -)
It could be "GT200b", sure, but it's still 10 24/8 TPCs, 512-bit bus whatever you want to call it. Otherwise it would have another device id in the drivers.
Yes, it's still the same thing fundamentally, no reason for it to have major changes. Whether GT200b and GT206 are the same thing is another debate completely, I still think it's more likely that they are not and GT206 is an ultra-low-end chip aimed at replacing G98/G86 in the Montevina Refresh timeframe but we'll see.

I think you're running a bit ahead of time =)
Is that really a problem? :)

From my point of view if NV wants to be competitive with their GT21x parts (presuming that's GT20x architecture on 40nm) they'll need to do some rethinking and rebalancing of G8x architecture. Otherwise they'll end up being slower with the same complexity or even with more transistors.
Okay let me put it this way: NVIDIA's perf/[transistor*mhz] is quite fine. What isn't fine is their transistor density and their clock speeds; the latter is in part because of the monstrous size of the chip which causes variability issues, but the former is very much both a failure *and* a design decision.

Part of their goal very very likely was to improve yields by reducing density, but I am skeptical they've got above-average yields for a chip of that size. I think they either screwed up at the implementation level or just overestimated the effect of density on variability (which clearly is still a big problem given GT200's awful clocks) and thus missed their original clock targets.

AMD's approach is to have a much denser but also more regular layout, combined with fine-grained redundancy to reduce the average impact of defects. Seemingly practice has proven their approach superior (although obviously they can't apply it universally). I would also be very interested in knowing the leakage vs performance characteristics of NVIDIA's gates versus ATI's; I quite suspect NV is lower on the performance curve, which is the kind of thing that seems to make sense on paper but might not in practice.

I don't think that we'll see the return of 384-bit bus in GT21x chips -- 256-bit GDDR5 should be enough for them.
It seems GT212 might be the only GDDR5 chip, unfortunately.

But as i've said they'd probably want to do something with their SMs in GT21x parts otherwise they'll end up slower per transistor than RV8x0 line. Plus they need to fix AA and add 10.1 support maybe?
Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations). Regarding SM density, I think the RTL itself is fine, it's more of a density issue. I also think my proposed half-SFU solution would be a good way to improve perf/mm² slightly.

I really, REALLY don't think that going forward with separate DP units is the right thing to do. They may do this for the top-end part to be used in Teslas but for anything below that they'll probably go with version 1.2 CUDA compute capability without DP support.
In the grand scheme of things, a single FP64 MADD unit is pretty damn cheap. And changing your 24x24 MADD units into 27x27 or 32x32 ones isn't free either, so for basic DP support along with proper denormal support etc. this isn't such an awful solution. I agree however that there is no good reason to keep it on the low-end parts such as GT218, and I wouldn't be surprised if their approach changed in the DX11 generation anyway (which is where most of the R&D dollars are right now obviously).

DegustatoR
21-Oct-2008, 15:36
GT206 is an ultra-low-end chip aimed at replacing G98/G86 in the Montevina Refresh timeframe but we'll see
Why would they want to replace ultra-low-end now?

Okay let me put it this way: NVIDIA's perf/[transistor*mhz] is quite fine. What isn't fine is their transistor density and their clock speeds; the latter is in part because of the monstrous size of the chip which causes variability issues, but the former is very much both a failure *and* a design decision.
Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.
Considering that RV770 is dangerously close to GT200 in performance I'd say that they definitely have an issue with their perf/transistor ratio right now.
But maybe it's an issue of GT200 more than an issue of G8x architecture.

It seems GT212 might be the only GDDR5 chip, unfortunately.
Well, if GT206 isn't a mainstream part then the next candidate is GT212, yeah.

Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations).
The thing is that if GT212 (GT216 is probably a low-end chip being the first on 40nm) will be out in 2Q09 then there's no point for them to support anything less than DX11 in it. I even think that they probably should scrap any hi-end part they have planned in the GT21x line and use it as a guinea pig for 40nm process while bringing GT30x DX11 stuff closer.
They're late on almost every front and it's time to do some roadmap rearranging imho.

Jawed
21-Oct-2008, 22:11
AMD's approach is to have a much denser but also more regular layout,
What does "more regular" mean? What's the advantage? What's more regular?

Jawed

Jawed
21-Oct-2008, 22:14
Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.
Those sizes sound too large comparing 954M versus 1.4B transistors. So, how did you work that out?

Jawed

marllt2
22-Oct-2008, 03:52
Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations).

Could you remind us what Mr Hara said about that please ?

DegustatoR
22-Oct-2008, 06:21
Those sizes sound too large comparing 954M versus 1.4B transistors. So, how did you work that out?
Yeah, my math is probably wrong there.
530 is correct, but at 55nm it would be 380mm^2 not 450.
Still pretty big compared to RV770 considering their performance figures.

Arun
22-Oct-2008, 16:18
Why would they want to replace ultra-low-end now?
Because G98 is a POS and they need to compete against RV710?

Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.
What? If we exclude I/O & Analogue, I think it's pretty clear that ~260 * (1400+ / 965) = 380mm²+ on 55nm. This could be combined with 384-bit GDDR5 and higher clocks, which would result in similar perf/mm² (or, more accurately, similar perf/mm² to a hypothetical ATI part with the same performance target!)
[Pre-Publish EDIT: Oh, just noticed you corrected that yourself now, okay then]
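
The die-size estimate above can be reproduced in a few lines. This sketch uses the rounded figures from the post (~260mm² and ~965M transistors for RV770, ~1400M for GT200) and, as an added assumption, ideal area scaling with the square of the feature size to get back to 65nm. It ignores I/O and analogue, as the post does, and the function names are made up for illustration.

```python
RV770_AREA_MM2 = 260.0        # approximate RV770 die area at 55nm (from the post)
RV770_TRANSISTORS_M = 965.0   # millions of transistors (figure used in the post)
GT200_TRANSISTORS_M = 1400.0  # millions of transistors (figure used in the post)


def area_at_reference_density(transistors_m: float) -> float:
    """Area a chip would need at RV770's 55nm transistor density."""
    return RV770_AREA_MM2 * transistors_m / RV770_TRANSISTORS_M


def rescale_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Assumption: area scales ideally with the square of the feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


gt200_55nm = area_at_reference_density(GT200_TRANSISTORS_M)
gt200_65nm = rescale_area(gt200_55nm, from_nm=55, to_nm=65)

print(f"GT200 at RV770 density, 55nm: ~{gt200_55nm:.0f} mm^2")
print(f"GT200 at RV770 density, 65nm: ~{gt200_65nm:.0f} mm^2")
```

Under ideal scaling the 55nm result rounds to the ~380mm² quoted above, and scaling it back to 65nm lands close to the 530mm² figure, so the two numbers are at least consistent with each other.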

Considering that RV770 is dangerously close to GT200 in performance I'd say that they definitely have an issue with their perf/transistor ratio right now.
Notice that I said [transistor*mhz]... G92b can reach clocks very near HD4870, so I feel it's fair to say that's not a bad metric to consider.

But maybe it's an issue of GT200 more than an issue of G8x architecture.
My point is it's an issue of synthesis, not the actual RTL-level architecture (although the ALU-TEX ratio and the choice to stick to GDDR5 obviously don't help perf/mm² much either).

The thing is that if GT212 (GT216 is probably a low-end chip being the first on 40nm) will be out in 2Q09 then there's no point for them to support anything less than DX11 in it. I even think that they probably should scrap any hi-end part they have planned in the GT21x line and use it as a guinea pig for 40nm process while bringing GT30x DX11 stuff closer.
Good idea: let's quadruple risk for a company that badly needs to improve their position and can't afford any more screw ups! Given how a certain semi-risky decision on G96/G98 turned out, I'm sure Jen-Hsun will LOVE that idea! :p
They're late on almost every front and it's time to do some roadmap rearranging imho.
It's easy to forget that moving boxes on pieces of paper doesn't allow you to change reality. If your kind of strategy was pursued, NVIDIA could have canned G71/G72/G73 since G80 was originally scheduled to come up in a very similar timeframe. But new architectures are very prone to delays, and that kind of risk is absolutely senseless IMO. I think their current roadmap is pretty much as follows:
Q1: First DX10 or DX10.1 40nm chip (mid-range).
Q2: Other DX10 or DX10.1 40nm chips (family).
Q3: First DX11 40nm chip (ultra-high-end).
Q4: Other DX11 40nm chips (family).

It's perfectly plausible that each of these steps gets delayed by one quarter or more though, and it is not predictable which will be most delayed. By trying to get DX11 out of the door quicker, you just risk not having a competitive product in the market for an extra 6 months. These teams are already parallel and we're near enough tape-out even for DX11 that more people suddenly dedicated to the project just risks delaying everything, so I'm not sure I see the point.

Could you remind us what Mr Hara said about that please ?
He indicated 40nm would be in H1 while the new arch would be in H2.

What does "more regular" mean? What's the advantage? What's more regular?
I wasn't thinking specifically of this company or that approach, but this is not a bad start to see the kind of thing I mean: http://www.tela-inc.com/ - what I find particularly cool with Tela's tech, BTW, is if you can reduce leakage you can use transistors higher on the performance-leakage curve, which also means you can improve your perf/mm² more than the raw density impact of the approach! :)

Jawed
22-Oct-2008, 18:15
It's perfectly plausible that each of these steps gets delayed by one quarter or more though, and it is not predictable which will be most delayed.
It seems 40nm has already been delayed. 65nm was a bit sickly at birth for the IHVs - it seems to have affected NVidia more.

Why have 65nm/55nm apparently given NVidia so much grief, comparatively speaking?

I wasn't thinking specifically of this company or that approach, but this is not a bad start to see the kind of thing I mean: http://www.tela-inc.com/
So you're saying that AMD is maybe using a library of regularised, "guaranteed compatible" structures as opposed to the "semi- or fully-custom" design that NVidia is using?

Jawed

Jawed
26-Oct-2008, 23:31
http://translate.google.com/translate?u=http%3A%2F%2Fwww.hardware-infos.com%2Fnews.php%3Fnews%3D2473&sl=de&tl=en&hl=en&ie=UTF-8
GT206: Q4/2008, 55 nm
GT212: Q1/2009, 40 nm
GT216: Q2/2009, 40 nm
GT300: Q4/2009, 40 nm
Jawed

suryad
28-Oct-2008, 13:25
http://translate.google.com/translate?u=http%3A%2F%2Fwww.hardware-infos.com%2Fnews.php%3Fnews%3D2473&sl=de&tl=en&hl=en&ie=UTF-8
GT206: Q4/2008, 55 nm
GT212: Q1/2009, 40 nm
GT216: Q2/2009, 40 nm
GT300: Q4/2009, 40 nm
Jawed

Thanks Jawed. Looks like I am holding off on upgrading my graphics card for the foreseeable future, until my will to resist gives in.

igg
28-Oct-2008, 15:47
Based on the unofficial roadmap I think the GTX260 (maybe GT206->GTX270) should be the best bang for the buck until they introduce GT300.

Let's hope they'll announce GTX270/290 soon :)

rpg.314
28-Oct-2008, 16:09
http://translate.google.com/translate?u=http%3A%2F%2Fwww.hardware-infos.com%2Fnews.php%3Fnews%3D2473&sl=de&tl=en&hl=en&ie=UTF-8
GT206: Q4/2008, 55 nm
GT212: Q1/2009, 40 nm
GT216: Q2/2009, 40 nm
GT300: Q4/2009, 40 nm
Jawed

thanks for that jawed.

That seems to make a lot of sense. Arun broadly agreed with it as well. Though I am surprised that gt206 has been delayed so much.

As an aside, I wonder what other folks at these forums feel, but I think that AMD's approach here has been vindicated. One chip and they can launch 4 cards from $200 to top. (if they wanted to release 4850x2 with gddr5 that is) plus the yield salvage 4830. It could be a one off thing (as gt200 was delayed), but for 4-5 months nv has had nothing to hide behind and AMD's entire line up(next gen that is) from v cheap to v costly, is out in the market.

Cookie Monster
28-Oct-2008, 21:12
So many refreshes.

Possibly the GT216 is the 40nm version of GT206. What is GT212 then? possibly a 192SP/384bit (or 256bit) mid range card to replace all the "filler" G92b based cards?

It's strange that nothing has been leaked from the green camp.

igg
30-Oct-2008, 16:37
According to Fud:

Unknown 40 nm chip in March (http://www.fudzilla.com/index.php?option=com_content&task=view&id=10206&Itemid=34)
DualGPU card in December (http://www.fudzilla.com/index.php?option=com_content&task=view&id=10207&Itemid=34)

My guess is the DualGPU card will be based on two GT206.

Oushi
31-Oct-2008, 08:50
http://rumorfeed.blogspot.com/2008/10/ex-nvidia-employee-tells-all-news-on.html

I think GT212 and GT300 is too late !

Arun
31-Oct-2008, 09:55
http://rumorfeed.blogspot.com/2008/10/ex-nvidia-employee-tells-all-news-on.html
I think GT212 and GT300 is too late !
And that 'leak' is too false. (note that I'm not contesting the timeframes; delays can happen, but AFAIK that's clearly NOT their current roadmap, and at least one of GT212 or GT216 doesn't have GDDR5. Everything in that leak is so fishy it's ridiculous!)
igg: March should be GT216 if I'm right, see above: "3T|120A|3R -> 0.6TFlops+ -> GT216/Late March"

igg
31-Oct-2008, 11:05
March should be GT216 if I'm right, see above: "3T|120A|3R -> 0.6TFlops+ -> GT216/Late March"
0,6 TF, this should be the performance/mainstream chip, right?

Arun
31-Oct-2008, 12:03
Yes, I presume the idea would be to replace G94 with G92-like performance (probably better in some ALU-limited cases, worse in others especially with AA/AF). If those specs are right that is; either way, it won't be ultra-high-end, that at least I know for sure from more than one source :)

CarstenS
31-Oct-2008, 18:10
I suspect it's also the foundation of the Quadro CX (http://www.nvidia.com/object/product_quadro_cx_us.html). In fact, I wouldn't be surprised if that was the exact same GPU bin as for a potential GX2... (384-bit & 150W TDP are pretty strong hints towards that) - too bad the clocks aren't public, if shaders were at >1300MHz we'd have our answer...
Oh, forget about it - didn't look at the date of that posting. Since it's >10 days old, pls disregard the following.
Two things in your very own evidence might contradict that: First, the very low memory bw, indicating the use of 800 MHz GDDR3 and an additional disabled Quad-ROP/ROP partition, whatever you prefer.

Second, the Tesla C1060, featuring a fully armed and operational battle station... the force runs equally strong with this one... *shush darth, shut up*. What I mean is that this product is also quite a bit lower specced than the corresponding desktop part GTX280 (<200 vs 236 Watts), although it's carrying the same amount of functional units, as opposed to Quadro CX.

Domell
02-Nov-2008, 08:36
And that 'leak' is too false. (note that I'm not contesting the timeframes; delays can happen, but AFAIK that's clearly NOT their current roadmap, and at least one of GT212 or GT216 doesn't have GDDR5. Everything in that leak is so fishy it's ridiculous!)
igg: March should be GT216 if I'm right, see above: "3T|120A|3R -> 0.6TFlops+ -> GT216/Late March"

But there is NO GT214 and GT218 on their roadmap right now. There are only GT212 and GT216, which are supposed to be released next year (most likely Q2).
...and don't the possible GT212 specs sound more sensible if we take something like this:
96TMU/384SP/32ROP or 80TMU/320SP/32ROP? I think 480SP or 512SP isn't a good choice. 384SP or 320SP should be enough when clocks are about 2GHz. I think on 40nm it's possible.
Another way is to not increase the number of TMUs and ROPs but significantly increase the number of Shader Processors.

Jawed
02-Nov-2008, 12:28
Anyone any idea what kind of shrinkage NVidia will get with 40nm process, when compared against either 65nm or 55nm?

The reason I ask is that Arun thinks that NVidia is not squishing features as closely as possible - preferring to space them out. If that methodology is kept for 40nm, what kind of shrink will occur?

Another thing is that Windows 7 makes D3D10.1 a first class citizen for the desktop UI. Does this increase the likelihood that NVidia will be introducing a top-to-bottom 10.1 line-up before the D3D11 cards arrive?

Jawed

Arun
02-Nov-2008, 12:57
Two things in your very own evidence might contradict that: First, the very low memory bw, indicating the use of 800 MHz GDDR3 and an additional disabled Quad-ROP/ROP partition, whatever you prefer.
The S1070 also has 800MHz GDDR3, but it really uses the same memory chips as the GTX 280; it's just clocked down to improve reliability. Sorry for not disregarding that point, couldn't just let it pass! ;)

But there is NO GT214 and GT218 on their roadmap right now. There are only GT212 and GT216, which are supposed to be released next year (most likely Q2).
GT212/GT214/GT216 all very much do exist and are on NVIDIA's roadmap. GT218, I haven't heard in a while but I wouldn't really expect it before the others anyway. Surely you don't think the public leaks always perfectly represent NVIDIA's internal roadmap?

Another way is to not increase the number of TMUs and ROPs but significantly increase the number of Shader Processors.
Isn't that exactly what I proposed? :grin:
G94: 32 TMUs, 16 ROPs, 64 SPs
G92: 64 TMUs, 16 ROPs, 128 SPs
GT214: 24 TMUs, 12 ROPs, 120 SPs
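
The shift in balance is easier to see as ratios. This small sketch just divides the SP counts listed above by the TMU and ROP counts; the unit counts come from the post, while the helper name is made up for illustration.

```python
# (TMUs, ROPs, SPs) as listed above.
CHIPS = {
    "G94": (32, 16, 64),
    "G92": (64, 16, 128),
    "GT214": (24, 12, 120),
}


def alu_ratios(tmus: int, rops: int, sps: int) -> tuple:
    """Return (SPs per TMU, SPs per ROP)."""
    return sps / tmus, sps / rops


for chip, (tmus, rops, sps) in CHIPS.items():
    per_tmu, per_rop = alu_ratios(tmus, rops, sps)
    print(f"{chip}: {per_tmu:.1f} SPs/TMU, {per_rop:.1f} SPs/ROP")
```

G94 and G92 both come out at 2 SPs per TMU, while the proposed part sits at 5, which is the "significantly more shaders without more TMUs/ROPs" idea expressed as a number.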

Anyone any idea what kind of shrinkage NVidia will get with 40nm process, when compared against either 65nm or 55nm?
2x compared to 55nm, excluding non-digital stuff which should shrink very little, is a fair bet. That's assuming a slight increase in transistors/mm² in addition to the process' natural shrink (so as to compensate the lower SRAM shrink), which seems like a reasonable bet to make in my mind. Of course, if the feature set/arch isn't 100% the same or if they optimized noticeably more for power (as I've suggested everyone *should* do on 40nm) it's harder to estimate.
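
For reference, the "natural" shrink can be approximated by assuming area scales with the square of the node name. That is an idealisation (real processes rarely hit it, and SRAM and analogue shrink less), and the function name below is hypothetical.

```python
def ideal_density_gain(from_nm: float, to_nm: float) -> float:
    """Idealised density gain, assuming area scales with feature size squared."""
    return (from_nm / to_nm) ** 2


print(f"55nm -> 40nm: {ideal_density_gain(55, 40):.2f}x")
print(f"65nm -> 40nm: {ideal_density_gain(65, 40):.2f}x")
```

The ideal 55nm to 40nm figure is about 1.89x, so a 2x estimate does imply a small density improvement on top of the process shrink, as the post says.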

Honestly, I'm more interested in the kinds of clocks they could achieve. Obviously the 90->65/55 transition for NV has been awful both in terms of density *and* performance, so if we assume some of those are fixable internal issues and 40nm allows for higher-than-traditional improvements at the same time (although who knows at what cost), it could get interesting. Not that the same (i.e. interesting) isn't true for AMD also, of course! :)

The reason I ask is that Arun thinks that NVidia is not squishing features as closely as possible - preferring to space them out. If that methodology is kept for 40nm, what kind of shrink will occur?
Just to be clear, it's certainly not the only factor; I'm just arguing it's very likely to be one of them. Whether the voluntary desire to have that is the largest one or the smallest one, who knows!

Another thing is that Windows 7 makes D3D10.1 a first class citizen for the desktop UI. Does this increase the likelihood that NVidia will be introducing a top-to-bottom 10.1 line-up before the D3D11 cards arrive?
I've heard the possibility that GT21x is D3D10.1 a few times, but who knows, there's enough FUD flying around that I don't think it makes a lot of sense to speculate about it at this point.

Blazkowicz
03-Nov-2008, 00:52
no D3D 10.1 would be the safer bet? as we are talking about another generation of G80 derivatives.

CarstenS
03-Nov-2008, 08:22
The S1070 also has 800MHz GDDR3, but it really uses the same memory chips as the GTX 280; it's just clocked down to improve reliability. Sorry for not disregarding that point, couldn't just let it pass! ;)
Really - not even the option for lower voltages being used?

Arun
03-Nov-2008, 09:50
Really - not even the option for lower voltages being used?
I really don't know, but I would *presume* it to be a combination of both; lower voltage than max in order to improve lifetime, and lower clocks for that given voltage in order to improve reliability.

bowman
03-Nov-2008, 14:13
DDR3 is catching up to GDDR3 now - seeing as they are sacrificing bandwidth for reliability anyway, why not roll GPUs with DDR3 IMCs that support ECC for Tesla? Seems to me that's one of the biggest complaints to GPGPU, no ECC, GPUs are intended for 'sloppy' operations (real-time rasterization) etc.

Wouldn't this open it up to a bigger market?

Jawed
04-Nov-2008, 15:18
Another thing is that Windows 7 makes D3D10.1 a first class citizen for the desktop UI. Does this increase the likelihood that NVidia will be introducing a top-to-bottom 10.1 line-up before the D3D11 cards arrive?
Allison Klein in her PDC presentation very briefly said that it's D3D10 not 10.1.

Jawed

Kaotik
06-Nov-2008, 15:46
Allison Klein in her PDC presentation very briefly said that it's D3D10 not 10.1.

Jawed

http://www.microsoft.com/whdc/device/display/GraphicsGuide_Win7.mspx
Microsoft released a paper regarding graphics in Windows7 which confirms that the desktop is D3D10, but it also states that 10.1 or 11 support is required for "best windows 7 experience"

Blazkowicz
08-Nov-2008, 03:52
it's irrelevant!
who really gives a shit about that 3D desktop? DX9 is more than enough, and 3D desktops available on gnu-linux do more with DX7-level hardware.

Kaotik
08-Nov-2008, 06:02
it's irrelevant!
who really gives a shit about that 3D desktop? DX9 is more than enough, and 3D desktops available on gnu-linux do more with DX7-level hardware.

But DX10 (10.1, 11, who knows) might be a bit lighter load than DX9 based one. Also, the OGL desktop comparison is irrelevant, who ever said that for example Aero couldn't be done on a bloody DX5 card if they wanted to, it's a matter of is there point to do it so, or is it easier, lighter and what not when you use the capabilities of modern day cards (which would translate those DX7 or 5 or whatever calls to be done on shader units anyway, OGL is another thing of course, but same units would still do the calculations)
And do more? A rolling cube? Didn't you just say who gives a shit about 3D desktop? :roll: "can be done" and "should be done" are two different things, what makes you think one couldn't twist Aero to do that damn fancy cube with virtual desktops on all sides? (in fact, i think there was some project like this already somewhere?)

Kaotik
09-Nov-2008, 00:36
Allison Klein in her PDC presentation very briefly said that it's D3D10 not 10.1.

Jawed

http://www.istartedsomething.com/20081029/windows-7-dwm-cuts-memory-consumption-by-50/
http://farm4.static.flickr.com/3238/2983005443_dc5b1cf034.jpg

Jawed
09-Nov-2008, 01:11
I watched a video a few days back, I think it was to do with the Colour Hot Tracking in the taskbar icons, and the presenter said this is only available on 10.1 - I think...

This isn't that video:

http://au.youtube.com/watch?v=DLZcGDyacHo

So I suppose 10.1 adds some eye-candy to the eye-candy and I guess it makes more efficient use of render-targets/textures which could be how the memory savings come about.

Jawed

BRiT
09-Nov-2008, 02:19
Currently in Vista all windows are double-buffered. In Windows 7, they no longer need to do the double buffering. They mentioned this during the Keynote, at least that's what I recall. I might be misremembering since I did attend a lot of PDC sessions.

Humus
09-Nov-2008, 13:51
Any word on what changes were made so that double buffering isn't needed anymore? Or why it was required in the first place?

fellix
09-Nov-2008, 15:12
I know that Apple's OSX does double-buffering of the GUI elements (windows) in both system memory and the frame buffer (RT), but some folks are arguing, that this is not the exact case in Vista Aero?!
We need more clarification on this.

DmitryKo
10-Nov-2008, 06:24
FYI, the sessions and PPX presentations are available at Channel 9.

http://channel9.msdn.com/pdc2008/

Click "Windows 7 (http://channel9.msdn.com/tags/pdc2008.windows+7/)" in the tag cloud.

DmitryKo
10-Nov-2008, 07:47
Any word on what changes were made so that double buffering isn't needed anymore? Or why it was required in the first place?

This is a DirectX/GDI interoperability issue that has been resolved in Windows 7.

http://blogs.msdn.com/greg_schechter/archive/2006/05/02/588934.aspx

Dual buffers per window - yes, it's true that GDI windows have both a system memory and a video memory representation. There is without doubt a memory cost to doing this. One obvious alternative is to simply have a video memory representation and have the GDI redirection mechanism render to that format. There are two primary problems with this. The first is that the formats are not the same, and GDI doesn't support rendering into the DirectX format. Even if that were resolved, the more fundamental issue remains. Many GDI operations (XORs, alpha blending, and text are examples) are read-modify-write operations. To do that to a native video memory surface would involve reading back from video memory into the CPU (and thus into system memory), performing the operation, and then writing back. This is typically a horribly slow and pipeline-stalling operation.


WDDM v1.1 features 2D GUI acceleration DDI, so the system memory copy can now be eliminated.

http://www.microsoft.com/whdc/device/display/GraphicsGuide_Win7.mspx
Guidelines for Graphics in Windows 7
(this paper has been temporarily removed, but I have a local copy).

N00b
10-Nov-2008, 12:24
Xbitlabs reports (http://www.xbitlabs.com/news/video/display/20081107235337_Nvidia_Reports_Successful_Transition_to_New_Process_Technology.html) that Nvidia has made the transition to 55nm, which may mean that we will see GT200b / GT206 / GTX 270 / GTX 290 soon.


“Improving gross margin while managing operating expenses enabled us to significantly improve our operating fundamentals. We transitioned our performance segment GPUs to 55nm and are now poised to recapture lost share,” said Jen-Hsun Huang, president and chief executive at Nvidia, during the most recent conference call with financial analysts.

“We have 65nm inventory remaining, but everything we are ramping now is 55nm and everything on the high-end that we are shipping now is 55nm,” added Mr. Huang, implying that Nvidia is ready with its code-named GT200b/GT206 graphics processor, which is a lower-cost version of the GT200/G200 that powers GeForce GTX 260 and GTX 280 graphics cards.

suryad
10-Nov-2008, 19:29
Xbitlabs reports (http://www.xbitlabs.com/news/video/display/20081107235337_Nvidia_Reports_Successful_Transition_to_New_Process_Technology.html) that Nvidia has made the transition to 55nm, which may mean that we will see GT200b / GT206 / GTX 270 / GTX 290 soon.

Yeah I just saw that as well. Good news but with the rumored ATI 4xxx series refresh it will possibly be status quo?

homerdog
10-Nov-2008, 21:29
Latest Quadro FX graphics card has 4GB of RAM (http://techreport.com/discussions.x/15866)

Quadro FX 5800
52 billion texels/s
4GB GDDR3 @ 800MHz
189W TDP

GTX280
48.2 billion texels/s
1GB GDDR3 @ 1107MHz
236W TDP


So there's a slightly increased core clock - 650MHz if my maths are correct - and a much lower TDP. Sure the GDDR3 is clocked lower but there's four times(!) as much of it.
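The 650MHz figure can be checked from the two quoted texel rates. A minimal sketch, assuming GT200 has 80 texture units that each deliver one filtered texel per core clock (the unit count is not stated in the post, and `core_clock_mhz` is a made-up helper name):

```python
# Assumption (not stated in the post): GT200-based boards have 80 texture
# units, each producing one bilinear-filtered texel per core clock.
GT200_TEXTURE_UNITS = 80


def core_clock_mhz(texels_per_second, texture_units=GT200_TEXTURE_UNITS):
    """Core clock in MHz implied by a peak texel fill rate."""
    return texels_per_second / texture_units / 1e6


quadro_fx_5800 = core_clock_mhz(52e9)   # 52 billion texels/s
gtx_280 = core_clock_mhz(48.2e9)        # 48.2 billion texels/s

print(f"Quadro FX 5800: {quadro_fx_5800:.1f} MHz")
print(f"GTX 280:        {gtx_280:.1f} MHz")
print(f"Clock increase: {quadro_fx_5800 / gtx_280 - 1:.1%}")
```

Under that assumption the quoted rates work out to 650MHz for the Quadro FX 5800 and 602.5MHz for the GTX280.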

Psycho
10-Nov-2008, 23:50
The TDP for the gtx280 is quite high compared to the actual load power - maybe they have tightened the specifications a bit for this one? (I guess that would also be a natural thing to do for a refresh, when you have better control of the process etc).
edit: it's most probably the 55nm part, the power saving based on TDP numbers just seems much better than usual for the 65->55 transition.

Tchock
11-Nov-2008, 03:38
Maybe just a new revision of GT200 instead?

Reviewers always seemed to have good samples of the GTX 280, but in the retail wild I don't think thermal output was that consistent.

They might have just binned it easily, but still I'm inclined to think that this is the 55nm shrink, especially after the 4GB RAM power consumption delta.

Domell
11-Nov-2008, 09:08
Well, I wonder what NVIDIA will do with the names of their future GPUs based on the GT2xx architecture. According to rumours, cards based on GT206 aka GT200B will be GTX290/GTX270. In Q1/Q2 next year there are GT212/216. One of them is supposedly a mainstream GPU which could be named GTX250/GTX240/GTX230, but the second is most likely their next "Big thing" until GT300 is released. So it should be significantly faster than any current GT200 version and even GT206. It is said to have many more shaders and faster clocks thanks to 40nm. So if it is supposed to be much faster than GT200 and GT206 (which are around the same performance level IMO), what name will this GPU have? GTX295 or what? All we know is that GTX380 and the other GTX3xx names are reserved for GT300 GPUs.

Still a complete mess with names.

PS. And another thing is how NVIDIA wants to compete with ATI with "only" DX10 GPU (GT212/GT216) next year when ATI is going to have DX11 GPUs (Rv870)?

igg
11-Nov-2008, 10:24
@Domell: I don't expect RV870 to be a DX11 GPU. IMO it's more likely they'll increase performance to compete with GT212 and release the DX11 GPU at the end of 2009.

Naming: Nvidia introduced a series called GeForce GT1xx (GeForce GT150 and so on). Maybe the new GPUs will be released as ULTRA290 or something like that.

Kaotik
11-Nov-2008, 11:09
Well, I wonder what NVIDIA will do with the names of their future GPUs based on the GT2xx architecture. According to rumours, cards based on GT206 aka GT200B will be GTX290/GTX270. In Q1/Q2 next year there are GT212/216. One of them is supposedly a mainstream GPU which could be named GTX250/GTX240/GTX230, but the second is most likely their next "Big thing" until GT300 is released. So it should be significantly faster than any current GT200 version and even GT206. It is said to have many more shaders and faster clocks thanks to 40nm. So if it is supposed to be much faster than GT200 and GT206 (which are around the same performance level IMO), what name will this GPU have? GTX295 or what? All we know is that GTX380 and the other GTX3xx names are reserved for GT300 GPUs.

Still a complete mess with names.

PS. And another thing is how NVIDIA wants to compete with ATI with "only" DX10 GPU (GT212/GT216) next year when ATI is going to have DX11 GPUs (Rv870)?

Or maybe GT206/200b won't be named GTX270/290 but rather 260/280+? Or maybe there will be just 270 & 280(+) with the new 280 having 206 too

DegustatoR
12-Nov-2008, 13:30
There won't be anything new in the NV's desktop videocard line this year AFAIK.

Domell
12-Nov-2008, 19:08
http://www.fudzilla.com/index.php?option=com_content&task=view&id=10421&Itemid=1

...interesting but what he was talking about? Because according to leaked unofficial roadmap there will be GT212/216 in 40nm and GT300 too. So could it be possible that NVIDIA is going to do some architectural improvements in GT212/216 over current GT200s? (i mean something like G80-->G9x improvements).

N00b
17-Nov-2008, 09:09
Digitimes is reporting (http://www.digitimes.com/news/a20081114PD208.html) that Nvidia will reduce prices in order to increase market share.

Does that mean they are clearing inventory to make room for the GTX206/GT200b or does it mean that GTX206/GT200b is already here (thus enabling lower prices), or neither? Any insights?

Arun
17-Nov-2008, 09:20
Might also be because 40nm is ahead of schedule. I don't know if it's NV, ATI or Altera... but either way that would seem to bode well for the first 40nm GPUs to be available in late Q1: http://eetimes.com/news/semi/showArticle.jhtml;jsessionid=WKFIO0MPANTNEQSNDLOSKHSCJUNN2JVN?articleID=212100139

Given that NV is barely starting to ship 55nm GPUs in the G94-G98 area, and that this new GPU will likely phase out G94 (and maybe G92 too), if they were the ones in the lead you'd expect them to try to minimize inventory beyond historical levels. Of course, if it's ATI...

DegustatoR
17-Nov-2008, 10:12
AFAIK current yields on GT200 are very good (quite a bit better than was planned), G200-103 and -203 are chips working on GTX260/280 frequencies on lower voltage levels.
GT200b (i'm still not sure whether it's the same as GT206 or not) is likely shipping in QFX5800 right now. But there won't be any desktop line-up updates this year. They may silently switch GT200 to 55nm in GTX260/280 but won't introduce new names/positions.

Jawed
17-Nov-2008, 10:25
I suppose a possibility is that there are so many GT200s in the channel that they've halted production on them entirely - 55nm variants are only going into non-consumer products.

Jawed

Cookie Monster
17-Nov-2008, 11:32
So will they be doing an "8800GTS", and release a 2nd revision of GTX260 and 280 with the new GT206?

Also maybe they've decided to reserve the GTX270/290 monikers for the 40nm refresh since the 40nm parts could be ahead of schedule since its ready for volume production right now.

suryad
17-Nov-2008, 15:15
So for the layman who is looking to upgrade his 8800 ultras SLI setup, a refresh is not expected this year then?

trinibwoy
23-Dec-2008, 22:25
From the RV740 thread:

And I roughly stick to it, and I think the range of specs I'd find plausible for GT216 is all the way from 3T/72A to 4T/160A.

I'd be very surprised if Nvidia stuck to 3 SIMDs in the TPC's for the lower end GT2xx stuff. It just wouldnt match up well with what's out there now. 3T/72A will probably not impress vs the current 4T/64A of G94b and I don't think they'll jump all the way to 5 SIMDs just yet.

If I were a betting man my completely unfounded guess for a 40nm GT2xx lineup using your notation would be:

GT212: 10T/320A @ ~ 1650Mhz (Q2/Q3 09) with 8T/256A yield-enhancing SKU.
GT214: 5T/160A @ ~ 1650Mhz (Q1/Q2 09)
GT216: 2T/64A @ ~ 1650Mhz (Q1/Q2 09)
GT218: 1T/32A @ 1500Mhz (Q3/Q4 09)

I'm betting Nvidia will try to go small and fast this time around like they did with G71 and G92. Which would mean their first big 40nm part will be GT3xx ~ Q1 2010. Every generation most people bet too high, so I'm gonna play devil's advocate and bet low this time :)

Whatever their plans in the $100-$200 segment, I'm sure RV740 screwed them all up though.

Arnold Beckenbauer
23-Dec-2008, 23:12
I'm lost here: T = TPC so it's Cluster or SIMD, too?
A SIMD with 5 Cores (or multi-processors)? Possible? Yes. Usefull and Efficient? Hm...
Dual-Issue would be a real fun with 5 cores per Cluster/SIMD.

Arun
23-Dec-2008, 23:12
Hmmm - what are those clocks? If those are core clocks, what do the shader clock look like - 5GHz? :) If those are shader clocks, isn't that a bit low given the claimed 40nm performance boosts by TSMC?
Arnold: Yeah, that comes from people using AMD nomenclature for NV products, since in AMD's case it makes sense there are 4 SIMDs in R600 and 10 in RV770, while in NVIDIA's case clearly there are multiple instructions per cluster. I meant 5 multi-processors per cluster personally, with the "half-MUL/SFU" catch I explained in another thread.

Domell
23-Dec-2008, 23:38
From the RV740 thread:



I'd be very surprised if Nvidia stuck to 3 SIMDs in the TPC's for the lower end GT2xx stuff. It just wouldnt match up well with what's out there now. 3T/72A will probably not impress vs the current 4T/64A of G94b and I don't think they'll jump all the way to 5 SIMDs just yet.

If I were a betting man my completely unfounded guess for a 40nm GT2xx lineup using your notation would be:

GT212: 10T/320A @ ~ 1650Mhz (Q2/Q3 09) with 8T/256A yield-enhancing SKU.
GT214: 5T/160A @ ~ 1650Mhz (Q1/Q2 09)
GT216: 2T/64A @ ~ 1650Mhz (Q1/Q2 09)
GT218: 1T/32A @ 1500Mhz (Q3/Q4 09)

I'm betting Nvidia will try to go small and fast this time around like they did with G71 and G92. Which would mean their first big 40nm part will be GT3xx ~ Q1 2010. Every generation most people bet too high, so I'm gonna play devil's advocate and bet low this time :)

Whatever their plans in the $100-$200 segment, I'm sure RV740 screwed them all up though.

Well, i think those specs sounds very reasonable and they are very possible IMO. I think NVIDIA wants to do the same thing with GT2xx 40nm like they have done with G71 and G92 as well. That could be definitely right move.

PS. How do you think how big could be GT212 with specs like above in 40 nm? Is there any chance to have below 300mm^2?

Arnold Beckenbauer
23-Dec-2008, 23:44
Hmmm - what are those clocks? If those are core clocks, what do the shader clock look like - 5GHz? :) If those are shader clocks, isn't that a bit low given the claimed 40nm performance boosts by TSMC?
Arnold: Yeah, that comes from people using AMD nomenclature for NV products, since in AMD's case it makes sense there are 4 SIMDs in R600 and 10 in RV770, while in NVIDIA's case clearly there are multiple instructions per cluster. I meant 5 multi-processors per cluster personally, with the "half-MUL/SFU" catch I explained in another thread.

I'm really a lucky guy being so far away, that you can't hit me by a frying pan.
http://www.beyond3d.com/content/reviews/51/3

My understanding: The multi-processors work on their own warps, but they all execute the same instruction (the same TPC of course)?

trinibwoy
24-Dec-2008, 01:01
I'm lost here: T = TPC so it's Cluster or SIMD, too?
A SIMD with 5 Cores (or multi-processors)? Possible? Yes. Usefull and Efficient? Hm...
Dual-Issue would be a real fun with 5 cores per Cluster/SIMD.

I meant it the way Arun did. A "T" is a TPC or thread-processing cluster. When I referred to SIMD count earlier I was referring to the number of SIMDs per cluster, not the width of each SIMD (which presumably stays at 8). Nvidia's SIMDs are also referred to as Streaming Multi-processors (SM). I probably didn't make this clear but the configs I laid out assumed 4 SIMDs per cluster for a total of 32 processors per cluster, up from 24 on GT200 and 16 on G8x.
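The T/A notation above reduces to a simple product. A rough sketch, assuming the SIMD width stays at 8 as the post presumes; the function names are made up for illustration, and the SM counts for G8x and GT200 are just the post's 16 and 24 divided by 8:

```python
SIMD_WIDTH = 8  # processors per SIMD (SM), presumed unchanged in the post


def alus_per_cluster(sms_per_cluster, simd_width=SIMD_WIDTH):
    """Processors in one TPC: SMs per cluster times the width of each SM."""
    return sms_per_cluster * simd_width


def total_alus(clusters, sms_per_cluster, simd_width=SIMD_WIDTH):
    """Processors in a whole chip with the given number of TPCs."""
    return clusters * alus_per_cluster(sms_per_cluster, simd_width)


# Per-cluster counts for existing parts and for the guessed 40nm parts.
for name, sms in [("G8x", 2), ("GT200", 3), ("guessed 40nm GT2xx", 4)]:
    print(f"{name}: {alus_per_cluster(sms)} processors per cluster")

# The guessed line-up, rebuilt from cluster counts at 4 SMs per cluster.
for chip, clusters in [("GT212", 10), ("GT214", 5), ("GT216", 2), ("GT218", 1)]:
    print(f"{chip}: {clusters}T/{total_alus(clusters, 4)}A")
```

With 4 SMs per cluster this reproduces the 10T/320A, 5T/160A, 2T/64A and 1T/32A figures from the guess.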

I'm really a lucky guy being so far away, that you can't hit me by a frying pan.
http://www.beyond3d.com/content/reviews/51/3

My understanding: The multi-processors work on their own warps, but they all execute the same instruction (the same TPC of course)?

From the article ;)

Each SP in each SM runs the same instruction per clock as the others, but each SM in a cluster can run its own instruction. Therefore in any given cycle, SMs in a cluster are potentially executing a different instruction in a shader program in SIMD fashion.

trinibwoy
24-Dec-2008, 01:10
Hmmm - what are those clocks? If those are core clocks, what do the shader clock look like - 5GHz? :) If those are shader clocks, isn't that a bit low given the claimed 40nm performance boosts by TSMC?


Shader clocks of course!! It might seem low but I'm aiming low - promises never seem to pan out with these things :) And even with those clocks GT212 could see a healthy 50% advantage over GTX 285.
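The 50% figure is roughly what you get by comparing ALU count times shader clock. A back-of-the-envelope sketch, assuming GTX 285 has 240 ALUs at a 1476MHz shader clock (those two numbers are not in this thread) and that performance scales with peak ALU throughput, which is a simplification:

```python
def relative_alu_throughput(alus_a, mhz_a, alus_b, mhz_b):
    """Ratio of peak ALU throughput of part A over part B (ALUs x clock)."""
    return (alus_a * mhz_a) / (alus_b * mhz_b)


# Guessed GT212: 320 ALUs @ ~1650MHz.
# Assumed GTX 285: 240 ALUs @ 1476MHz shader clock (not stated in the thread).
ratio = relative_alu_throughput(320, 1650, 240, 1476)
print(f"GT212 guess vs GTX 285: {ratio - 1:.0%} more peak ALU throughput")
```

This ignores texturing, ROPs and bandwidth entirely, so it is only a sanity check on the shader side.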

PS. How do you think how big could be GT212 with specs like above in 40 nm? Is there any chance to have below 300mm^2?

Yeah I think there's a good chance of ending up < 300mm^2. The shrink to 55nm wasn't nearly linear so there's a good chance that the move to a full node at 40nm cuts a lot of fat. In addition to any pipeline stage reductions Nvidia is able to pull off on the new process a la G71.

AnarchX
24-Dec-2008, 08:33
Seems that they are really working on GT214 and even considering GDDR5:
http://www.linkedin.com/pub/3/255/6b3

fellix
24-Dec-2008, 09:59
My understanding: The multi-processors work on their own warps, but they all execute the same instruction (the same TPC of course)?
Wasn't that the SPMD model (Single Program/Process, Multiple Data), or in a shader "wording" -- single kernel on multitude of batches? :shock:

trinibwoy
24-Dec-2008, 12:13
Seems that they are really working on GT214 and even considering GDDR5:
http://www.linkedin.com/pub/3/255/6b3

He's gonna be in so much trouble for that :lol:

Lukfi
24-Dec-2008, 13:09
Poor guy is soooo fired :twisted: (or maybe it's a smoke screen)
I wonder if nVidia keeps the naming scheme that would put GT214 in the lower midrange, GT212 into upper midrange and so on... if so, then why do we have rumours about GT212, GT214 and GT216 but no GT210 as a high-end monolithic monster chip?

Arun
24-Dec-2008, 13:20
Poor guy is soooo fired :twisted: (or maybe it's a smoke screen)
Meh, GT212/GT214/GT216/GT218 codenames were leaked, what, in 3Q07? If he gets any real trouble for putting that on his resume, his boss should seriously reconsider his value to humanity.

As for considering GDDR5, it's strange that he mentions that for GT214 - of course he says "simulation" so it really doesn't mean anything concrete as to the final part... And you'd expect them to experiment with all possibilities at one stage or another anyway, so once again doesn't tell us all that much.

I wonder if nVidia keeps the naming scheme that would put GT214 in the lower midrange, GT212 into upper midrange and so on... if so, then why do we have rumours about GT212, GT214 and GT216 but no GT210 as a high-end monolithic monster chip?
GT300-or-whatever-it's-called was still slated for 4Q09 last time NV talked about it I think - clearly they want to be able to showcase a performance boost there. I wouldn't be surprised if GT212 was 384-bit GDDR5 and GT300 was 512-bit GDDR5, but then again I really have no idea about the memory config of either TBH.

Jawed
24-Dec-2008, 13:34
I think:

Did frame buffer simulation (DDR2/GDDR3/GDDR5) for various boards for G96 and GT214
(my emphasis) is pretty interesting there :grin:

Jawed

DegustatoR
24-Dec-2008, 13:38
I'm still thinking more in lines of...
- 256-bit GDDR5 40nm GT212 @ ~150% GT200 performance, two of those for AFR top end
- 384-bit GDDR5 40nm GT300 @ ~300% GT200 performance
512-bit bus probably won't come back until the second DX11 generation line-ups.

Oh, yeah, i heard that GT200 has GDDR5 support in the MCs but NV doesn't see any reason to use GDDR5 on GT200 boards (which is understandable considering that AFAIK GDDR5 costs 3-3.5 times more than GDDR3 right now while GT200 isn't nearly bandwidth starved even with GDDR3).

Rayne
24-Dec-2008, 14:16
... GT200 isn't nearly bandwidth starved even with GDDR3 ...

If you own a GTX 280, i can provide a sample where the frame rate is almost linear with the bandwidth of the card.

I'm waiting for a 512bit GDDR5 monster. That day, i'll say 'goodbye' to my performance problems.

Arun
24-Dec-2008, 14:33
Oh, yeah, i heard that GT200 has GDDR5 support in the MCs but NV doesn't see any reason to use GDDR5 on GT200 boards (which is understandable considering that AFAIK GDDR5 costs 3-3.5 times more than GDDR3 right now while GT200 isn't nearly bandwidth starved even with GDDR3).
In the MCs? I don't know, but I'm pretty damn sure the PHYs don't support GDDR5 so that'd seem strange although not impossible. Also, 3-3.5 times? I more than a little bit doubt that to say the least but heh :)

INKster
24-Dec-2008, 15:22
I think:


(my emphasis) is pretty interesting there :grin:

Jawed

He again mentions G96 and GT214 together in this:

•Did loadline analysis and core power transient simulations for G96 and GT214.

A 9600 GT/G94b at higher core/shader clocks, coupled with high-speed GDDR5 on a 256bit bus sounds plausible, especially if it's intended to replace the G92-based 9800 GT/9800 GTX/GTX+ boards, using a smaller, simpler core.
It can get close to the RV770 LE/HD 4830, if not HD 4850 levels.


BTW, did i mention the awesomeness of the name "Zhenggang" ? :D

Arun
24-Dec-2008, 15:28
I swear all of you guys are just trying to figure out what roadmap would guarantee NVIDIA's bankruptcy despite their huge amounts of cash in the bank :p I know G92 and GT200 deflated expectations, but come on...

Jawed
24-Dec-2008, 15:30
I took that to mean that G96 will never see GDDR5 (since it's out there). It seems to me more likely that NVidia was evaluating the timing/pricing of GDDR5 and considering whether G96 or GT214 would be the chip that introduces it.

But, well, we're unlikely ever to find out about G96-specific stuff. Any revision of G96 in the same performance bracket is surely going to be GT21x, perhaps GT214, not G9x based.

But I dare say the future's looking bright for GT214...

Jawed

INKster
24-Dec-2008, 15:35
So..., can we settle on the "GT214" vs "RV740" codename war scheme for Q1'09 in the performance midrange segment ?

Arun, it's not Lehman Brothers, is it ? :D

rpg.314
24-Dec-2008, 15:41
Meh, GT212/GT214/GT216/GT218 codenames were leaked, what, in 3Q07? If he gets any real trouble for putting that on his resume, his boss should seriously reconsider his value to humanity.

Nice:)

trinibwoy
24-Dec-2008, 16:10
So..., can we settle on the "GT214" vs "RV740" codename war scheme for Q1'09 in the performance midrange segment ?

Looks like it. But I suspect Nvidia will be pissed if AMD tries to push RV740 for ~ $100. The 9600GT debuted at closer to $200 IIRC.

trinibwoy
24-Dec-2008, 16:18
I'm still thinking more in lines of...
- 256-bit GDDR5 40nm GT212 @ ~150% GT200 performance, two of those for AFR top end
- 384-bit GDDR5 40nm GT300 @ ~300% GT200 performance
512-bit bus probably won't come back until the second DX11 generation line-ups.

Hmmm, I don't know. I think we're at the point where the flagship needs to have at least 1GB RAM (see GTX 295 tanking at high resolutions in modern games). A 384-bit bus would imply 1.5GB of GDDR5 - not a cheap proposition.
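The 1.5GB figure follows from the chip count a bus width forces on you. A minimal sketch, assuming each memory chip has a 32-bit interface, one chip per channel, and 1Gbit chips (the helper name and defaults are illustrative, not anything NVIDIA has stated):

```python
CHIP_INTERFACE_BITS = 32  # assumption: every GDDR chip is 32 bits wide


def framebuffer_gb(bus_width_bits, chip_density_mbit, chips_per_channel=1):
    """Framebuffer size in GB for a bus populated with identical chips."""
    channels = bus_width_bits // CHIP_INTERFACE_BITS
    total_mbit = channels * chips_per_channel * chip_density_mbit
    return total_mbit / 8 / 1024  # Mbit -> MB -> GB


for bus in (256, 384, 512):
    chips = bus // CHIP_INTERFACE_BITS
    print(f"{bus}-bit bus: {chips} x 1Gbit chips = {framebuffer_gb(bus, 1024)} GB")
```

So a 384-bit bus with 1Gbit chips lands on 12 chips and 1.5GB, while 512Mbit chips on the same bus would only give 0.75GB.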

Domell
24-Dec-2008, 22:54
I think the chip that NVIDIA needs the most is a worthy successor to G92 as a performance chip, and a chip like NV43 or G73.
Mainstream chip with 192SP/48TMU/16ROP/256-bit Mem. Bus could have a chance to be "the second GF6600GT". Die size in 40nm wouldn`t be bigger than 200mm^2.

I don`t think that any of new GT2xx in 40nm will be "next REAL highend" chip. Why? Because they don`t need another highend chip when GT300 is going to be released in next 10-12 months.

DegustatoR
25-Dec-2008, 09:34
If you own a GTX 280, i can provide a sample where the frame rate is almost linear with the bandwidth of the card.
Anybody can create a sample which will show you that any card is limited by anything. I'm talking about average usage situations.
According to what i heard NV's prototypes of GT200 with GDDR5 showed +10% performance increase on average which kinda confirms my point.

In the MCs? I don't know, but I'm pretty damn sure the PHYs don't support GDDR5 so that'd seem strange although not impossible. Also, 3-3.5 times? I more than a little bit doubt that to say the least but heh :)
Well, that's what i heard. Is it true or not -- i don't know myself.

Hmmm, I don't know. I think we're at the point where the flagship needs to have at least 1GB RAM (see GTX 295 tanking at high resolutions in modern games). A 384-bit bus would imply 1.5GB of GDDR5 - not a cheap proposition.
GT300 will probably show up in big quantities at MSRP price level in the beginning of 2010. 4870 X2 has 2 GB of GDDR5 right now and it's selling (if i'm not mistaken) for $499. I don't see what's not cheap in having 1.5 GB of GDDR5 in more than a year.
And as for GTX295 @ high resolutions -- i'm not really sure that the dip in performance is related to having only 896 MB of on-board memory. Similar dip can occur on a regular GTX280 with 1 GB VRAM. I think that there's something different going on.

Rayne
25-Dec-2008, 14:56
Anybody can create a sample which will show you that any card is limited by anything. I'm talking about average usage situations.
According to what i heard NV's prototypes of GT200 with GDDR5 showed +10% performance increase on average which kinda confirms my point.

Yes, i see your point, but, try to understand mine too. I'm trying to run old titles with SSAA @ 1920x1200, and there is still no way to run them, after several gens of gfx cards. The bandwidth has not grown linearly with the shading power, and yes, the new games do run much better, but damn it, some old titles that i want to run, do not scale properly. As reference, my own BR2 patch runs at 50fps on the 100GB/s 8800GTX (oc'ed), but, it runs at 70fps on the 140GB/s GTX 280. 140/100 = 40% -> 70/50 = 40%. Now you can understand why i wish a revolution at the memory bandwidth department. And SLI/CF do not run properly with my code, because there are too many dependencies between the frames. :sad:
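The "linear with bandwidth" claim can be put as a single number: the frame rate gain divided by the bandwidth gain. A small sketch using only the figures from the post (the function name is made up):

```python
def scaling_efficiency(fps_old, fps_new, bw_old, bw_new):
    """Fraction of a bandwidth increase that shows up as frame rate.

    1.0 means frame rate scales linearly with bandwidth (fully
    bandwidth-bound); values near 0 mean bandwidth hardly matters.
    """
    fps_gain = fps_new / fps_old - 1
    bw_gain = bw_new / bw_old - 1
    return fps_gain / bw_gain


# 8800GTX (oc'ed): 50fps at 100GB/s; GTX 280: 70fps at 140GB/s.
efficiency = scaling_efficiency(50, 70, 100, 140)
print(f"Scaling efficiency: {efficiency:.2f}")
```

For comparison, the "+10% on average" hearsay for GDDR5 prototypes would give a much lower value for whatever bandwidth increase those prototypes had, which is the gap between the two positions here.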

Arun
25-Dec-2008, 15:01
Hmmm, I don't know. I think we're at the point where the flagship needs to have at least 1GB RAM (see GTX 295 tanking at high resolutions in modern games). A 384-bit bus would imply 1.5GB of GDDR5 - not a cheap proposition.
Just thought I'd quickly comment on that: just like 256Mbit GDDR3 probably wasn't such a great idea in the R600 timeframe, 512Mbit GDDR5 in 2H2009 likely wouldn't be a good idea either; it would be substantially more expensive than just half the price of a 1Gbit chip.

So if you need that amount of bandwidth, I don't think you can get away from such monstrous levels of memory. At the same time the level of performance of these cards will hopefully be astonishing, and so unless 1H09's games are much more performance intensive than I think they will be, they will be tested in many cases at very high resolutions with massive textures and tons of AA. Also remember GT212 might be used for CUDA even if GT300 hits its ETA, so being able to easily support massive amounts of memory (3-6GB) makes some sense there too.

Arnold Beckenbauer
25-Dec-2008, 17:20
AMD RV740 GPU 40nm tape-out finished, Nvidia targeting GT212 for 2009, says paper (http://www.digitimes.com/news/a20081224PB202.html)

In other news, Nvidia's 40nm GT212 GPU is expected to complete its tape-out at the beginning of 2009, and mass production is scheduled for the second quarter. Nvidia's next-generation GT216 and GT300 GPUs will also transition to a 40nm process in mid or late 2009, added the paper.

I'm lost again.
GT216 is the next gen GPU from Santa Clause?

DegustatoR
25-Dec-2008, 17:27
I think they switched G212 and G216 in there...

Lukfi
26-Dec-2008, 10:48
Just to let you know, our dear Mr. Zhenggang Cheng deleted the part about GT214 from his LinkedIn profile :-D

Salvadore
26-Dec-2008, 18:50
Here the specifications of the GT216:

http://news.ati-forum.de/index.php/news/38-andere-hardware/210-gt216-spezifikationen

Kaotik
26-Dec-2008, 21:43
Here the specifications of the GT216:

http://news.ati-forum.de/index.php/news/38-andere-hardware/210-gt216-spezifikationen

Has ati-forum.de actually ever had any real specs in their 'news' for ANY chip?:lol:

Salvadore
26-Dec-2008, 21:49
Has ati-forum.de actually ever had any real specs in their 'news' for ANY chip?:lol:

We haven't got any information from insiders about new GPUs in the last 6 months. The GT216 specs are the only ones!
And the RV740 specs (http://news.ati-forum.de/index.php/news/34-amdati-grafikkarten/189-rv740-spezifikationen-aufgetaucht) are written down from fudzilla (256bit memory interface (http://news.ati-forum.de/index.php/news/34-amdati-grafikkarten/186-rv740-mit-256bit-speicherinterface)) and vr-zone (the rest). :smile:

Domell
27-Dec-2008, 00:06
Hmm if GT216 has this specs then i wonder what specs will get GT214 (performance-mainstream chip) which is supposed to be positioned between GT212 (40nm highend chip) and GT216 (40nm mainstream chip).

Salvadore
27-Dec-2008, 00:10
The GT214 will have GDDR5:
http://www.hardware-infos.com/news.php?news=2609

Domell
27-Dec-2008, 00:17
Ok, but is this the only difference between GT214 and GT216? GDDR5 vs GDDR3? I don't think so. Maybe GT214 will have 192SP and a 256-bit mem bus, but then NVIDIA wouldn't have to release a GPU with specs like the GT216 linked above, because getting 160SP with a 192-bit mem bus is very simple - disable one cluster (probably 32SP/8TMU/64-bit MC/4 ROPs).

Salvadore
27-Dec-2008, 00:35
I think the TMUs, ROPs and SPs will be different too. And also the clocks!

trinibwoy
27-Dec-2008, 02:44
The GT214 will have GDDR5:
http://www.hardware-infos.com/news.php?news=2609

You really need to stop posting circumstantial evidence in a factual way. They are simply pointing to that Zhenggang guy who had GT214 in his linkedin profile. That is hardly proof of anything.

Salvadore
27-Dec-2008, 03:08
You're right, but it's the only thing about GT214 I've found.

CarstenS
27-Dec-2008, 13:32
The GT214 will have GDDR5:
http://www.hardware-infos.com/news.php?news=2609
I wonder how HW-Infos arrived at that conclusion? Their source, which is the infamous LinkedIn entry, does not state that. Only that Mr. Zhenggang did FB-sims for various SKUs (G96- and GT214-based) with different memory configurations - among them GDDR5.

But G96-boards did not use GDDR5 either, as we all know...

Salvadore
27-Dec-2008, 13:39
I don't know, but in the future, I think, they'll have to bet on GDDR5 in higher-mainstream cards.

trinibwoy
28-Dec-2008, 15:25
Also remember GT212 might be used for CUDA even if GT300 hits its ETA, so being able to easily support massive amounts of memory (3-6GB) makes some sense there too.

Yeah that's why sticking with a 512-bit bus makes sense to me - for those massive Tesla framebuffers. Also, how would you get 6GB on a 384-bit bus? I thought the maximum configuration was 2 chips per 32-bit channel? I suspect Nvidia will stick to 512-bit GDDR5 on GT300.
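The 6GB question can be turned around: given a bus width and at most 2 chips per 32-bit channel, what chip density would a target capacity need? A rough sketch under exactly those assumptions (the function name is illustrative, and whether a given density actually ships in GDDR5 is a separate question):

```python
def required_density_mbit(target_gb, bus_width_bits,
                          chips_per_channel=2, channel_bits=32):
    """Per-chip density in Mbit needed to reach target_gb on the given bus."""
    chips = (bus_width_bits // channel_bits) * chips_per_channel
    return target_gb * 1024 * 8 / chips  # GB -> MB -> Mbit, split over chips


for bus in (384, 512):
    for target_gb in (3, 4, 6):
        density = required_density_mbit(target_gb, bus)
        print(f"{bus}-bit, {target_gb}GB: {density:.0f} Mbit per chip")
```

Under these assumptions 6GB on a 384-bit bus needs 2Gbit chips, while 3GB works with 1Gbit chips. On a 512-bit bus, 1Gbit chips at two per channel give 4GB, which lines up with the Quadro FX 5800 capacity quoted earlier in the thread, and 6GB would need an odd 1.5Gbit density.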

Jawed
29-Dec-2008, 09:31
As posted by Shtal in the GT300 thread:

http://vr-zone.com/articles/nvidia-40nm-desktop-gpus-line-up-for-2009/6359.html?doc=6359

This transition to 40nm will first take place with their high end GT212 GPU in Q2 follows by the mainstream GT214 and GT216 as well as value GT218 in Q3. GT212 will be replacing the 55nm GT200 so you can expect pretty short lifespan for the upcoming GTX295 and GTX285 cards.
Hmm, this is quite a different sequence on 40nm than was generally expected :razz:

Jawed

CarstenS
29-Dec-2008, 10:13
Makes sense to me.

Since june timeframe i guess, they realized that they need to carry over GT200-performance on a cheaper price per die as fast as they can.

Plus, with larger chips you usually do not have such a high risk (financially) when switching to/testing a state of the art fabrication process.

You won't need as much units shortly after launch like in the volume segments, usually you do not have to meet strict OEM-cycles for high-end-parts, so a delay for a respin is not that crucial and you have the option to sell binned-down versions of a GPU - a luxury, an entry-level-product usually cannot afford.

trinibwoy
29-Dec-2008, 15:10
Or maybe after two generations straight of AMD having a significant process advantage they've decided to shed their paranoia of another 130nm era debacle to avoid being left behind yet again.

DegustatoR
29-Dec-2008, 15:26
Two generations? What generations are those? Significant? 65nm vs 55nm with 55 being a version of 65? Or 90nm vs 80nm with 80nm chip coming half year late and gaining nothing from 80nm?
The reason why NV was avoiding 55nm in RV670 timeframe has nothing to do with them being scared of a new process. It was more of an availability thing.
With 40nm being the main and only TSMC node for the time being and general economy slow down nothing is stopping them from going to 40nm with AMD.
But i have severe doubts about GT212 being the first 40nm chip from NV. Even ATI's engineers prefer to go with the simpler chip first now. And for NV it's like a tradition of sorts since NV43. So i'm still pretty sure that we'll see GT216 or GT214 before GT212.

Btw, what the hell is GT206? =)

Arun
29-Dec-2008, 16:37
Two generations? What generations are those? Significant? 65nm vs 55nm with 55 being a version of 65? Or 90nm vs 80nm with 80nm chip coming half year late and gaining nothing from 80nm?
Very nice theory indeed, in practice on 65/55nm things weren't so pretty because of a variety of factors that resulted in the 80nm G84/G86 competing with 55nm chips for nearly 6 months (although from an OEM design cycle perspective it wasn't as big of a problem).

In this specific case, I think what needs to be realized is that while NV could justify lagging behind if they hit all their milestones, if they get delayed then by the time their part comes out it would have been more attractive not to be so conservative on process technology out of fear for wafer cost/yields.

On the other side of the coin, might have to be added the possibility that TSMC gave more attractive pricing to NVIDIA on older nodes in order to amortize them further. Remember much of the reason why MS took so long to transition the XBox360 GPU to 65nm is that TSMC's pricing just wasn't sufficiently attractive because they weren't as big of a customer as ATI. So while companies like ATI might get preferential pricing to go first on a process node for TSMC to be able to justify investment, NVIDIA might have gotten preferential pricing for sticking to an older node to amortize it further.

TSMC is not a "dumb" entity that just creates naive roadmaps and pricing schemes not based on customer relationships. Both capacity and the different pricing models for different customers are dependent on complex feedback loops, and anything that doesn't take that into account is unlikely to be a very useful theory IMO.

The reason why NV was avoiding 55nm in RV670 timeframe has nothing to do with them being scared of a new process. It was more of an availability thing.
Indeed. The fundamental problem however is they could not easily at the same time work on some chips being 65nm and others 55nm; the "optimal" line-up both for NVIDIA and TSMC would have had a mix of both from the start, but this was not an option apparently.

With 40nm being the main and only TSMC node for the time being and general economy slow down nothing is stopping them from going to 40nm with AMD.
Pretty much, although I think it has been the plan for a long time that NVIDIA/AMD would both be very aggressive with 40nm. Nearly all handheld capacity will remain on 65 until well into 2010, and companies like CSR are only going to ramp 90nm for products like Bluecore7 in 2H09, so capacity reductions in older nodes shouldn't be catastrophic because of the shift to 40nm.

I would suspect that while NVIDIA/AMD's pricing for 40nm must be high, that of the likes of Broadcom and Marvell must be even higher for 2009 to encourage them not to shift too quickly in the few product line-ups they have with short design cycles. I also suspect TSMC sees large early investments in 40nm as a way to steal some customers from UMC/Chartered and encourage the likes of NV not to dual-source with them again this generation, or at least not as much.

But i have severe doubts about GT212 being the first 40nm chip from NV.
So do I, my expectations for GT212's die size are too large for it to make much sense in my mind.

Btw, what the hell is GT206? =)
My guess, FWIW, is that it is a G98 replacement that got canned. The fact there was a 'i' (i.e. integrated) version of the same is a strong hint in that direction; given the debacle that is NVIDIA's chipset division, it probably got killed in favour of focusing on future 40nm products.

AnarchX
29-Dec-2008, 16:54
My guess, FWIW, is that it is a G98 replacement that got canned. The fact there was a 'i' (i.e. integrated) version of the same is a strong hint in that direction; given the debacle that is NVIDIA's chipset division, it probably got killed in favour of focusing on future 40nm products.

Look at #1, ELSA saw GT206 as high-end part.

Probably someone just misread GT200b as GT206, since b looks a bit like the 6, especially for asians?

trinibwoy
29-Dec-2008, 16:57
Two generations? What generations are those? Significant?

I was thinking of RV670 and RV770. And I should have said process/die-size advantage. They went up against considerably larger 80/90nm and 65nm parts from Nvidia.

The reason why NV was avoiding 55nm in RV670 timeframe has nothing to do with them being scared of a new process. It was more of an availability thing.
With 40nm being the main and only TSMC node for the time being and general economy slow down nothing is stopping them from going to 40nm with AMD.

I don't think Nvidia's conservative stance on process adoption is debatable. They've openly been willing to take their time moving to new nodes.

But i have severe doubts about GT212 being the first 40nm chip from NV. Even ATIs engineers prefer to go with the simpler chip first now. And for NV it's like a tradition of sorts since NV43. So i'm still pretty sure that we'll see GT216 or GT214 before GT212.

Perhaps, but remember G92 and G94 hit around the same time with the 8800GT actually making it to market months before the 9600GT so there's still a possibility. But I agree Q2 seems too aggressive for a big 40nm part.

KonKort
29-Dec-2008, 19:56
So do I, my expectations for GT212's die size are too large for it to make much sense in my mind.

What are your expectations for GT212's die size?

Domell
29-Dec-2008, 20:26
Wasn`t G92 (the fastest G9x chip) first GPU in 65nm process from NVIDIA? So IMO GT212 (the fastest GT2xx chip) could be first GPU from NVIDIA made in 40nm as well.

KonKort
29-Dec-2008, 21:28
Yes, that's right. But G92 did not have more SPs or TMUs - well 64 instead of 32 TAUs ;).

It looks like GT212 will have more SPs and TMUs than GT200.
The last time Nvidia brought out a new high-end chip on a new manufacturing process was in 2003: NV30, aka GeForce FX 5800.

Domell
29-Dec-2008, 22:03
Well, it doesn`t look like NV30 syndrome IMO. Why? Because NV30 was a completely new architecture compared to NV25, a totally new generation, and it had about 2X more transistors than NV25.

Between GT212 and GT200 (even GT200B) there is not such a big difference. There is NO new architecture and NO significant increase in the number of transistors. Moreover, i think that GT212 will only have more ALUs than GT200 and the number of TMUs will be the same as GT200 has. I think NVIDIA will do 32SP per cluster (24SP now), so then we could see something like this - 320ALU,80TMU,32ROP,512-bit MC. This is my opinion about GT212.

trinibwoy
29-Dec-2008, 22:36
I think NVIDIA will do 32SP per cluster (24SP at now) so then we could see something like this - 320ALU,80TMU,32ROP,512-bit MC. This is my opinion about GT212.

Hey, stop stealing my opinions and claiming them as your own! :razz:

Domell
29-Dec-2008, 22:45
I`m not saying it`s my own opinion and i have said it first but only agree with it. ;) This is most reasonable move which NVIDIA could do with their GT2xx 40nm highend chip.

KonKort
30-Dec-2008, 00:04
Well, it doesn`t look like NV30 syndrome IMO. Why? Because NV30 was a completely new architecture compared to NV25, a totally new generation, and it had about 2X more transistors than NV25.

Between GT212 and GT200 (even GT200B) there is not such a big difference. There is NO new architecture and NO significant increase in the number of transistors. Moreover, i think that GT212 will only have more ALUs than GT200 and the number of TMUs will be the same as GT200 has. I think NVIDIA will do 32SP per cluster (24SP now), so then we could see something like this - 320ALU,80TMU,32ROP,512-bit MC. This is my opinion about GT212.

You are right. The difference between NV25 and NV30 was much bigger than the one between GT200 and GT212 will be.
And you are right, too, that Nvidia will go to 32 SPs per cluster with GT212. But that is not the only thing that changes. It is definitely a bigger step than G80 to G92.

CarstenS
30-Dec-2008, 07:15
So do I, my expectations for GT212's die size are too large for it to make much sense in my mind.
That'd be one of the things amendable with 40nm technology.

Since I am no chip production/design expert: Isn't it the case, that you usually get a better shrinkage the more logic and cache, i.e. digital ICs, you have on a chip? Perfect target: Large Dies.

DegustatoR
30-Dec-2008, 11:43
In this specific case, I think what needs to be realized is that while NV could justify lagging behind if they hit all their milestones, if they get delayed then by the time their part comes out it would have been more attractive not to be so conservative on process technology out of fear for wafer cost/yields.
I completely agree -- and that's almost exactly what happened to GT200 which probably is the only real example of NV choosing the wrong process since the NV30/130nm fiasco.
Another problem lies in the low transistor density of 65/55 NV GPUs -- G92b is bigger than RV770 on the same 55nm process while having 160M less transistors. I think that's the real problem for NV in 65/55nm generation -- simply put NVs 65/55 process usage sucks and they need to improve it considerably on 40nm node.

TSMC is not a "dumb" entity that just creates naive roadmaps and pricing schemes not based on customer relationships. Both capacity and the different pricing models for different customers are dependent on complex feedback loops, and anything that doesn't take that into account is unlikely to be a very useful theory IMO.
Exactly. And that's why it's pretty pointless to try and 'guess' die production cost from its size alone. And that's why being "slow" to smaller TSMC nodes doesn't mean paying more for the GPUs. Especially when we're talking about node -> half-node transitions.

My guess, FWIW, is that it is a G98 replacement that got canned. The fact there was a 'i' (i.e. integrated) version of the same is a strong hint in that direction; given the debacle that is NVIDIA's chipset division, it probably got killed in favour of focusing on future 40nm products.
That's a valid theory =)
GT206 MCP77 iGT206 MCP79 iGT209
So iGT209 is killed too? 8)

I was thinking of RV670 and RV770. And I should have said process/die-size advantage. They went up against considerably larger 80/90nm and 65nm parts from Nvidia.
55nm RV670 went against 65nm G92 (although it's worth mentioning that NV's tactical mistake here made them do it -- they should've put G94 ahead of G92 and against RV670 instead).
As for RV770 -- NV's roadmap was in such a mess at the point of RV770 launch that it doesn't really matter whether they used 65 or 55nm for GT200 -- it would look worse than RV770 anyway. For NV it would have been wise to use 55nm/256-bit GDDR5 of course, but they originally planned to launch GT200 when neither was available (well, 55nm was available since autumn '07, but migrating to 55nm with such a complex chip as GT200 probably wasn't an option).

I don't think Nvidia's conservative stance on process adoption is debatable. They've openly been willing to take their time moving to new nodes.
I don't think it's 'conservative', i think it's 'strategical'. They first 'try' the process with a simple chip and then transition the more complex ones. This 'simple chip' from NV was, for the most part, available as soon as the process allowed. So it's not like they're waiting for half a year before switching to a new process, they're simply beginning the switch in the low-end segment (which nobody cares about here anyway). And this strategy mostly paid off.
If you think about it, NV was never that late with process transitions compared to ATI/AMD:
130 - NV31/1Q03 - RV360/4Q03
110 - RV370/2Q04 - NV43/3Q04
90 - R520/4Q05 - G7(1/2/3)/1Q06
80 - RV535/3Q06 - G86/2Q07
65 - RV630/2Q07 - G92/4Q07
55 - RV670/4Q07 - G92b/2Q08
So it's a 1-2 quarters difference mostly with the exception of 80nm (probably for the same reasons why they were slow with 55nm transition).
Plus you have to consider that RV670 turned out to be good in its first revision -- and that's a rare thing. Had they needed another spin, RV670 would have shown up at retail at the end of 1Q08, with G92b launching at the end of 2Q08.
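
The "1-2 quarters" claim can be checked with a small sketch. The dates are copied from the list above as posted (later replies point out that RV350 and G73b are missing), and the helper names are made up for illustration:

```python
# Quarter-level dates copied from the list above:
# (node in nm, ATI/AMD chip, ATI/AMD date, NVIDIA chip, NVIDIA date).
# Note the 130nm row in the post lists NVIDIA first; it is reordered here.
TRANSITIONS = [
    (130, "RV360", "4Q03", "NV31", "1Q03"),
    (110, "RV370", "2Q04", "NV43", "3Q04"),
    (90, "R520", "4Q05", "G7(1/2/3)", "1Q06"),
    (80, "RV535", "3Q06", "G86", "2Q07"),
    (65, "RV630", "2Q07", "G92", "4Q07"),
    (55, "RV670", "4Q07", "G92b", "2Q08"),
]


def quarter_index(label):
    """Convert a label such as '4Q07' into a running quarter count."""
    quarter, year = label.split("Q")
    return (2000 + int(year)) * 4 + (int(quarter) - 1)


def nv_lag_in_quarters(ati_date, nv_date):
    """Positive: NVIDIA shipped later than ATI/AMD. Negative: earlier."""
    return quarter_index(nv_date) - quarter_index(ati_date)


LAGS = {node: nv_lag_in_quarters(ati, nv) for node, _, ati, _, nv in TRANSITIONS}

for node, lag in LAGS.items():
    print(f"{node}nm: NVIDIA lag = {lag:+d} quarter(s)")
```

With these dates NVIDIA is three quarters ahead at 130nm, one or two quarters behind at 110, 90, 65 and 55nm, and three quarters behind at 80nm.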

Perhaps, but remember G92 and G94 hit around the same time with the 8800GT actually making it to market months before the 9600GT so there's still a possibility.
Yeah, exactly -- and we all see how that turned out.
G94 would've been a much better competitor to RV670 and -- who knows? -- maybe they would have had better luck with the 65nm transition with a simpler and smaller G94? I hope they learn from their mistakes from the previous generation.

Wasn`t G92 (the fastest G9x chip) first GPU in 65nm process from NVIDIA? So IMO GT212 (the fastest GT2xx chip) could be first GPU from NVIDIA made in 40nm as well.
If it's a straight GT200 shrink, yes.
But it's most likely quite a bit more than GT200 (12 or 15 32/8 TPCs, 256-bit GDDR5 bus, DX10.1 support maybe?). Plus it looks like G92 role in this cycle will be performed by 55nm GT200b.

CJ
30-Dec-2008, 12:04
But i have severe doubts about GT212 being the first 40nm chip from NV. Even ATIs engineers prefer to go with the simpler chip first now. And for NV it's like a tradition of sorts since NV43. So i'm still pretty sure that we'll see GT216 or GT214 before GT212.

I'm pretty sure GT212 is the last one of the GT21x series to tape out too.

Arun
30-Dec-2008, 12:20
I'm starting to think about something... When both I and possibly other sites heard about GT214, it certainly hadn't taped-out. Assuming they left the possibility open until the end depending on market conditions, which is a big if, maybe they did switch to GDDR5 for GT214 and that LinkedIn entry means more than I thought (not that it really reveals much either way)

After all, this would be a quite impressive roadmap:
GT218: 64-bit GDDR3 [~15GB/s]
GT216: 192-bit GDDR3 [~60GB/s]
GT214: 192-bit GDDR5 [~110GB/s]
GT212: 384-bit GDDR5 [~240GB/s]
GT300: 512-bit GDDR5 [~320GB/s]

Nothing AMD couldn't counter of course, but it'd make for a more exciting competition than this one (in retrospect, that is; of course when you don't know what's going to happen it can always be exciting...)
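
For reference, the bandwidth figures in that roadmap imply the following per-pin data rates. This is only a sketch using the usual relation bandwidth = bus width / 8 x per-pin rate; the bus widths and GB/s values are the speculative ones from the list, and the function name is illustrative:

```python
# (chip, bus width in bits, memory type, approximate bandwidth in GB/s),
# all taken from the speculative roadmap above.
ROADMAP = [
    ("GT218", 64, "GDDR3", 15),
    ("GT216", 192, "GDDR3", 60),
    ("GT214", 192, "GDDR5", 110),
    ("GT212", 384, "GDDR5", 240),
    ("GT300", 512, "GDDR5", 320),
]


def implied_gbps_per_pin(bus_bits, bandwidth_gb_s):
    """Per-pin data rate (Gbit/s) needed to reach the given bandwidth."""
    return bandwidth_gb_s * 8 / bus_bits


for chip, bus_bits, mem, bandwidth in ROADMAP:
    rate = implied_gbps_per_pin(bus_bits, bandwidth)
    print(f"{chip}: {bus_bits}-bit {mem} at ~{bandwidth}GB/s -> ~{rate:.2f} Gbps per pin")
```

So the GDDR3 entries assume roughly 1.9-2.5 Gbps per pin and the GDDR5 entries roughly 4.6-5.0 Gbps per pin.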

Another problem lies in the low transistor density of 65/55 NV GPUs -- G92b is bigger than RV770 on the same 55nm process while having 160M less transistors. I think that's the real problem for NV in 65/55nm generation -- simply put NVs 65/55 process usage sucks and they need to improve it considerably on 40nm node.
It's a known problem, expect it to be fixed in a firmware revision... *waits for VR-Zone, Expreview, and/or Fudzilla to link to this :D j/k* - more seriously, expect them to be much more aggressive on 40nm.

So iGT209 is killed too? 8)
Maybe; if it was a 55nm product, nearly certainly. If it's 40nm, nearly certainly not.

If you think about it, NV was never that late with process transitions compared to ATI/AMD:
You forgot RV350; of course everyone seems to always forget that one and how smoothly it went, poor ATI! :) You also forgot G73b, so NVIDIA wasn't that late to 80nm in fact.

Since I am no chip production/design expert: Isn't it the case, that you usually get a better shrinkage the more logic and cache, i.e. digital ICs, you have on a chip? Perfect target: Large Dies.
The cost benefit is likely to be smaller for small chips if they include a lot of I/O or analogue (i.e. this doesn't apply to handheld chips in the same way etc.) - however very big chips are riskier and will suffer from yield problems. This is not just catastrophic defects like coarse redundancy would partially prevent; it's also variability among other things. Chips like GT216/RV740 in the 120-150mm² range are likely to be a relatively good compromise, IMO.

Jawed
30-Dec-2008, 13:09
For wafer yield purposes, isn't some degree of NVidia's "oversized" design deliberate? Spacing things out so there's less chance of lithography-related malfunctions? If so, wouldn't this explain the "less than expected" gains from the more advanced nodes?

Jawed

DegustatoR
30-Dec-2008, 13:39
I'm starting to think about something... When both I and possibly other sites heard about GT214, it certainly hadn't taped-out. Assuming they left the possibility open until the end depending on market conditions, which is a big if, maybe they did switch to GDDR5 for GT214 and that LinkedIn entry means more than I thought (not that it really reveals much either way)

After all, this would be a quite impressive roadmap:
GT218: 64-bit GDDR3 [~15GB/s]
GT216: 192-bit GDDR3 [~60GB/s]
GT214: 192-bit GDDR5 [~110GB/s]
GT212: 384-bit GDDR5 [~240GB/s]
GT300: 512-bit GDDR5 [~320GB/s]
I think you should consider inserting GDDR3-based GT212 solution (a la 4850) in there. And that kinda kills the idea of having GDDR5-based 192-bit solution (especially if GT212 is using 256-bit bus as i expect).
Plus -- do we really need 4 chips again after G98/96 fiasco? I always thought that having 4 chips for 0-400 price range is a bit too much.
So the question is -- what's faster -- GT216 or GT214? That LinkedIn thingie was about two chips -- G96 and GT214. Maybe we should consider the possibility of a GT214->GT216->GT212 three-chip line-up?
But if it's GT216->GT214->GT212, then GT212 will most probably have 15 32/8 TPCs and 384-bit bus, yes, GT216 -- something like 7 24/8 TPCs -- and GT214 is starting to look like GT200@40nm with 256-bit GDDR5...

expect them to be much more aggressive on 40nm.
Well i sure hope they will be -- for their own sake.

You forgot RV350; of course everyone seems to always forget that one and how smoothly it went, poor ATI! :) You also forgot G73b, so NVIDIA wasn't that late to 80nm in fact.
Sorry, i was using B3D 3D Tables time line 8)

Rayne
30-Dec-2008, 14:13
...
GT300: 512-bit GDDR5 [~320GB/s]
...

This would solve all my problems with the SSAA performance. :)

I cross my fingers. :yes:

AnarchX
01-Jan-2009, 14:53
Hello GT215!

NVIDIA_DEV.06A0.01 = "NVIDIA GT214"
NVIDIA_DEV.06B0.01 = "NVIDIA GT214 "
NVIDIA_DEV.0A00.01 = "NVIDIA GT212"
NVIDIA_DEV.0A10.01 = "NVIDIA GT212 "
NVIDIA_DEV.0A30.01 = "NVIDIA GT216"
NVIDIA_DEV.0A60.01 = "NVIDIA GT218"
NVIDIA_DEV.0A70.01 = "NVIDIA GT218 "
NVIDIA_DEV.0A7D.01 = "NVIDIA GT218 "
NVIDIA_DEV.0A7F.01 = "NVIDIA GT218 "
NVIDIA_DEV.0CA0.01 = "NVIDIA GT215"
NVIDIA_DEV.0CB0.01 = "NVIDIA GT215 "
http://www.xfastest.com/viewthread.php?tid=17608&extra=page%3D1
:wink:
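
A quick way to group those driver strings by chip, as a minimal sketch. The regular expression is written for the lines exactly as pasted above (note the trailing spaces in some names) and is not taken from any NVIDIA tool:

```python
import re

# Device strings as pasted above; some names carry a trailing space.
DRIVER_STRINGS = '''
NVIDIA_DEV.06A0.01 = "NVIDIA GT214"
NVIDIA_DEV.06B0.01 = "NVIDIA GT214 "
NVIDIA_DEV.0A00.01 = "NVIDIA GT212"
NVIDIA_DEV.0A10.01 = "NVIDIA GT212 "
NVIDIA_DEV.0A30.01 = "NVIDIA GT216"
NVIDIA_DEV.0A60.01 = "NVIDIA GT218"
NVIDIA_DEV.0A70.01 = "NVIDIA GT218 "
NVIDIA_DEV.0A7D.01 = "NVIDIA GT218 "
NVIDIA_DEV.0A7F.01 = "NVIDIA GT218 "
NVIDIA_DEV.0CA0.01 = "NVIDIA GT215"
NVIDIA_DEV.0CB0.01 = "NVIDIA GT215 "
'''

# Captures the four-digit hex device ID and the chip name without padding.
ENTRY = re.compile(r'NVIDIA_DEV\.([0-9A-F]{4})\.\d+\s*=\s*"NVIDIA\s+([^"]*?)\s*"')


def group_ids_by_chip(text):
    """Return {chip name: [device IDs]} in order of appearance."""
    chips = {}
    for device_id, chip in ENTRY.findall(text):
        chips.setdefault(chip, []).append(device_id)
    return chips


CHIPS = group_ids_by_chip(DRIVER_STRINGS)
for chip in sorted(CHIPS):
    print(chip, CHIPS[chip])
```

Grouped this way, the list shows five distinct chip names (GT212, GT214, GT215, GT216, GT218), with GT218 carrying the most device IDs.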

Domell
01-Jan-2009, 15:00
Does it mean 40nm chips are pretty close?

trinibwoy
01-Jan-2009, 15:31
New option to force ambient occlusion? Wonder how that works. Seems like a very application specific thing.

Pic stolen from http://www.hardforum.com/showthread.php?t=1380556

http://img84.imageshack.us/img84/1441/ambientocclusionxo1.png

INKster
01-Jan-2009, 15:35
NVIDIA_DEV.0A20.01 = "NVIDIA D10M2-30
NVIDIA_DEV.06FF.01 = "NVIDIA HICx16 + Graphics

Intriguing... Hadn't heard those before. :???:

Jawed
01-Jan-2009, 16:09
New option to force ambient occlusion? Wonder how that works. Seems like a very application specific thing.
Ooh, interesting, wonder if they're doing screen space ambient occlusion? That could be quite widely applicable - erm, though my understanding of the algorithm is not exactly in-depth :lol:

Jawed

Cookie Monster
02-Jan-2009, 08:42
NVIDIA_DEV.0A20.01 = "NVIDIA D10M2-30
NVIDIA_DEV.06FF.01 = "NVIDIA HICx16 + Graphics

Intriguing... Hadn't heard those before. :???:

Think the first is the mobile version of GT200 hence the D10M2-30. The latter sounds like some sort of a bridge chip? nVIDIA... Hydra IC?? :lol:

Kaotik
02-Jan-2009, 17:27
Ooh, interesting, wonder if they're doing screen space ambient occlusion? That could be quite widely applicable - erm, though my understanding of the algorithm is not exactly in-depth :lol:

Jawed

Supported games:
Games supported by AO:
Assassin Creed
Bioshock
COD4
COD:WAW
CS:Source
Company of heroes
crysis (warhead not supported)
devil may cry 4
Fallout 3
Far Cry 2
Half-life 2: Episode Two (only!!)
Left 4 Dead
Lost Planet
Mirror's Edge
The Call of Juarez
World in conflict
World Of Warcraft

Neeyik did some testing over at FM forums
Crysis No AO - 19FPS
http://www.neeyik.info/pix/crysis%20with%20no%20driver%20AO.png
Crysis High AO - 9FPS
http://www.neeyik.info/pix/crysis%20with%20High%20AO.png
Bioshock no AO - 63 FPS
http://www.neeyik.info/pix/bioshocknoAO.png
Bioshock High AO - 14 FPS
http://www.neeyik.info/pix/bioshockHighAO.png
Fallout 3 no AO - 60 FPS
http://www.neeyik.info/pix/fallout3noAO.png
Fallout 3 High AO - 21 FPS

In short, 50-75% loss in FPS, but looks quite tasty gfx wise
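
The percentages behind that summary, worked out from the three before/after pairs quoted above (a small sketch; only the FPS numbers come from the post):

```python
# (FPS without driver AO, FPS with High AO), as reported above.
RESULTS = {
    "Crysis": (19, 9),
    "Bioshock": (63, 14),
    "Fallout 3": (60, 21),
}


def fps_loss_percent(base_fps, ao_fps):
    """Relative frame-rate loss, in percent, when AO is enabled."""
    return (base_fps - ao_fps) / base_fps * 100


for game, (base_fps, ao_fps) in RESULTS.items():
    print(f"{game}: {fps_loss_percent(base_fps, ao_fps):.1f}% loss")
```

That gives roughly 53% for Crysis, 65% for Fallout 3 and 78% for Bioshock, so the upper end is slightly above the quoted 75%.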

Psycho
02-Jan-2009, 23:09
It's most probably SSAO as a pure post process and just darkening (+maybe some gain to compensate), so goodbye specular etc. And if the games don't write everything (particles etc) to the zbuffer it would be a problem. But if it works with the game it's a quite obvious driver thing to do for maybe a better IQ/performance compromise. A bit funny that Crysis (which pioneered the SSAO technique and is doing it already) is on the list.
The performance hit is surprisingly high, must be a pretty big filter.
It seems to be vista only btw, so couldn't investigate it yet.

Salvadore
03-Jan-2009, 00:47
Does everybody knows everything about GT215?

XMAN26
03-Jan-2009, 01:46
What res was Crysis being played at? Was it 32 or 64bit? What game settings? What video card?

I run Vistau x64 with a eVGA GTX260 SC and get 40+ average @ 1680x1050 with game setting at high no AA in 64bit mode and 32 or so in 32bit mode with same settings. CPU is a Q6600@ 3.0 and 4GB ram.

Kaotik
03-Jan-2009, 02:24
I can't remember his exact setup, but I think the video card is 8800GTX

XMAN26
03-Jan-2009, 02:39
I can't remember his exact setup, but I think the video card is 8800GTX


If you can get his resolution and settings, I might be able to do a comparison with my rig.

CarstenS
03-Jan-2009, 08:45
Does everybody knows everything about GT215?

Actually, yes. http://forum.beyond3d.com/showpost.php?p=1253065&postcount=370

Salvadore
03-Jan-2009, 14:42
Actually, yes. http://forum.beyond3d.com/showpost.php?p=1253065&postcount=370

Thanks, but this I know too! :smile:
http://news.ati-forum.de/index.php/news/38-andere-hardware/219-nvidias-gt212215216-a-gt218-bald

CarstenS
03-Jan-2009, 15:02
Thanks, but this I know too! :smile:
http://news.ati-forum.de/index.php/news/38-andere-hardware/219-nvidias-gt212215216-a-gt218-bald
You didn't even bother to compare the information contained in AnarchXs Posting to the redundant info you're trying to spread, right?

edit:
In other words: There's no new substantial information in the link to your webpage.

a.tard.living.in.his.own
06-Jan-2009, 22:25
Supported games:
Games supported by AO:
crysis (warhead not suported)


wrong and btw. totally senseless:

ssao=1

http://img145.imageshack.us/img145/4406/crysis2009010623161268se4.th.jpg (http://img145.imageshack.us/img145/4406/crysis2009010623161268se4.jpg)

ssao=0

http://img145.imageshack.us/img145/9279/crysis2009010623161907bt6.th.jpg (http://img145.imageshack.us/img145/9279/crysis2009010623161907bt6.jpg)

MarcVenice
06-Jan-2009, 22:50
The GT215 codename also popped up here: http://vr-zone.com/articles/nvidia-40nm-mobile-gpus-line-up-for-2009/6378.html?doc=6378

Not sure if that was mentioned before. I disregarded it as a spelling error. What is an odd number doing there? A mobile GPU is based on the desktop GPU, and no GT215 has been announced for the desktop. Until it was discovered in the drivers, of course, no one knew of its existence.

Does someone have a conclusive answer ?

DegustatoR
07-Jan-2009, 12:04
It's interesting that there's no GT214 but GT215.
Maybe they're making a separate mobile GPU this time? A little bit slower and quite a bit cooler than desktop GT214...

KonKort
07-Jan-2009, 13:38
It was speculated here that a GT21x chip would come to market with 160 SPs and 40 TMUs. This is not 1/4 of GT212 (GT216) and not 1/2 of GT212 (GT214).

Perhaps it is GT215, isn't it? ;)

trinibwoy
07-Jan-2009, 19:25
If GT212 is indeed a 12-cluster part, and GT214 is half that why would they design and tape-out both a 5-cluster (GT215) and 6-cluster (GT214) chip?

I've always assumed in the back of my head that if GT212 is 12 clusters then the others would drop as follows:

GT212: 12 and 10 clusters, 320-384 SP's
GT214: 8 clusters, 256 SP's to go up against an RV770 refresh
GT216: 4 clusters, 128 SP's
GT218: 1/2 clusters 32/64 SP's

In which case if GT215 does exist it could be a 6 cluster, 192 SP part. Still seems very redundant though. Unless Nvidia drops a bomb and somehow squeezes 16 clusters into GT212 at 300mm^2.
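
The SP counts above follow directly from the cluster counts if every cluster has 32 SPs. A minimal sketch; the 8 TMUs per cluster is an assumption borrowed from the 320ALU/80TMU guess earlier in the thread, not something this post states:

```python
SP_PER_CLUSTER = 32   # assumed GT21x cluster width, per the speculation above
TMU_PER_CLUSTER = 8   # assumption, not stated in this post


def cluster_config(clusters):
    """Return (SP count, TMU count) for a given number of clusters."""
    return clusters * SP_PER_CLUSTER, clusters * TMU_PER_CLUSTER


for clusters in (12, 10, 8, 6, 4, 2, 1):
    sps, tmus = cluster_config(clusters)
    print(f"{clusters:2d} clusters -> {sps} SPs, {tmus} TMUs")
```

Under those assumptions 12 clusters give 384 SPs, 10 give 320, 8 give 256, 6 give 192 and 4 give 128, which matches the line-up listed above.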

AnarchX
07-Jan-2009, 20:02
Maybe:
GT214 - only GDDR5-MC
GT215 - GT214 with DDR2-GDDR3 MC
... because with both it would be pad limited and too big?

5 different chips, GT212 to GT218, would be a bit too much, wouldn't it?

trinibwoy
07-Jan-2009, 20:30
Well AMD supported GDDR5 and GDDR3 on the same chip. Don't see why Nvidia couldn't as well.

Domell
07-Jan-2009, 21:07
It was speculated here that a GT21x chip would come to market with 160 SPs and 40 TMUs. This is not 1/4 of GT212 (GT216) and not 1/2 of GT212 (GT214).

Perhaps it is GT215, isn't it? ;)

How do you know what specs GT212 will have, and that GT214 is 1/2 of GT212 and GT216 is 1/4 of GT212?

KonKort
07-Jan-2009, 21:47
Here you can read the likely specs of GT212 (http://www.hardware-infos.com/news.php?news=2629). I have got it from Nvidia.

I do not know if GT214 is 1/2 of GT212 and GT216 is 1/4 of GT212.
I am not even sure if the chips ever come. Perhaps only GT215.
Look in the past, G94 was 1/2 of G92 and G96 was 1/4 of G92. So I came to the following conclusion.

Domell
07-Jan-2009, 22:01
You are saying that someone from NVIDIA told you GT212's real specs about 4-5 months before its release date? Every time specs of their new GPUs leaked, it was about 3-4 weeks before release.

If you have got a internal source could you tell us when GT212 is supposed to be released? :)

BTW It would be great if this is real GT212 specs. Then it will be a real monster...until GT300 is released.

KonKort
08-Jan-2009, 15:54
GT212 will be released in 5-6 months - Q2/2009.
Yes, I can tell you a lot, but not in a public forum. When I have some good information, I will post it on my site first. And I do have some good information; I am only waiting until the Phenom and GTX 285/295 hubbub has died down. ;)

homerdog
08-Jan-2009, 16:21
Why wait?

Domell
08-Jan-2009, 16:32
Well, all seems great for everyone because GT212 is on the best way to be the second G92 if these specs are true but there is another very important thing - price. IF NVIDIA will price it about 300 $ then this chip could have the best price/performance ratio on the market like GF8800GT had a year ago.

I think with these specs GT212 (plus significantly faster clocks i think) has a chance to be faster than GTX285 about 20-30% :)

trinibwoy
08-Jan-2009, 17:24
It better be 50%+ faster with those specs.

Domell
08-Jan-2009, 17:54
Well, 50% faster in real world applications is a big thing. I mean 20-30% on average and sometimes could it be 50% as well.

This year is going to bring a huge performance bump between current top highend GPUs and GPUs which are supposed to be released in Q3/Q4 this year. If GT212 will be faster than GTX285 about 20-50% then i think that GT300 will be faster than GT212 at least 50% too (some major - or maybe totally new - architecture changes and improvements because of DX11 support).

G71-->G80 performance leap anyone? ;)

compres
08-Jan-2009, 18:11
G71-->G80 performance leap anyone? ;)

Not more than a snowball's chance in hell :)

Domell
08-Jan-2009, 18:31
Why? GT300 is most likely a totally new architecture or GT2xx architecture but with major improvements and as we know the biggest performance jumps are caused by introducing a new architecture. :)

suryad
08-Jan-2009, 18:33
Not more than a snowball's chance in hell :)

I agree. But from the specs it looks like it will be a monster performer. Cant wait to upgrade to that from my dual aging 8800 Ultras :)

KonKort
23-Jan-2009, 13:34
It looks like the first Nvidia 40nm chips have taped out. And of course these products are not GT212; Nvidia is starting its new 40nm line-up with entry-level and mainstream chips first.

Source: Fudzilla (http://www.fudzilla.com/index.php?option=com_content&task=view&id=11632&Itemid=1)

I am sure we will see the first cards in April.

CarstenS
23-Jan-2009, 13:55
So, that means Fudo's finally stumbled upon this thread? ;)

Arun
23-Jan-2009, 14:09
March was the *old* ETA, I think they missed that one already and it's very likely to be outdated info. April, who knows... :)

trinibwoy
23-Jan-2009, 17:49
Ah, so there is an ETA then?

suryad
25-Jan-2009, 02:16
March was the *old* ETA, I think they missed that one already and it's very likely to be outdated info. April, who knows... :)

I would like that since I think my Ultras have served me well but grown severely long in the tooth and its time I replaced them and laid them to rest.

KonKort
29-Jan-2009, 13:42
It looks like GT218 will be Nvidia's first 40nm chip. Probably with 32 SPs, 8 TMUs, 4 ROPs and 64 Bit memory interface.

Source: Hardware-Infos (http://www.hardware-infos.com/news.php?news=2700&sprache=1) (English)

PS: I hope it is the right thread for that information.

Lukfi
29-Jan-2009, 13:50
Any sources to back those specs up, or is it just speculation? I find these unlikely, since that would mean a physical 4:1 ALU:TEX ratio while G8x/G9x all have 2:1 and G200 has 3:1. Judging from ATI's current lineup, I guess that higher ALU:TEX ratios are desirable in the enthusiast segment, while lower ratios better fit low-end chips. So unless nVidia developed a SIMD block with 16 SPs and 4 TMUs (or 32:8 since G200 has 24:8), which they'd use for every 40nm GT21x chip coming (which might just be the case), I think it's unlikely that GT218 has those specs.

trinibwoy
29-Jan-2009, 13:56
A 32 SP, 8 TMU 4:1 cluster is what many people including myself expect from GT21x. So these specs make complete sense in that they follow the historical trend of single cluster entry level parts.

Nvidia uses the same 2:1 ratio across all their G9x parts. It's only ATi that switched things up with RV710 in order to get 8 TMUs. A single RV7xx cluster is 80/4 and they wanted 80/8 so had to go with two 40/4 clusters.

DegustatoR
29-Jan-2009, 13:58
I find these unlikely, since that would mean a physical 4:1 ALU:TEX ratio while G8x/G9x all have 2:1 and G200 has 3:1. Judging from ATI's current lineup, I guess that higher ALU:TEX ratios are desirable in the enthusiast segment, while lower ratios better fit low-end chips.
ALU:TU ratios are higher than they should be in case of AMD's top range chips. 4:1 ratio through the whole GT21x line is quite possible.
That gives us 384/96 GT212, 256/64 GT214, 192/48 GT216 and 128/32 or 96/24 GT218 (somehow i doubt that one 4:1 TPC GPU is going to be feasible at all on 40nm so this 32/8/4 for GT218 is highly unlikely i guess).
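
The ALU:TEX ratios being argued about reduce as follows. A small sketch; the per-cluster figures (16:8 for G8x/G9x, 24:8 for GT200, 32:8 for the rumoured GT21x cluster) are the ones quoted in the posts above, and the function name is illustrative:

```python
from math import gcd


def alu_tex_ratio(sps, tmus):
    """Reduce an SP/TMU pair to its simplest 'a:b' form."""
    divisor = gcd(sps, tmus)
    return f"{sps // divisor}:{tmus // divisor}"


# Per-cluster configurations mentioned in the thread.
CLUSTERS = {"G8x/G9x": (16, 8), "GT200": (24, 8), "GT21x (rumoured)": (32, 8)}

# Whole-chip SP/TMU guesses from the post above.
LINEUP = [(384, 96), (256, 64), (192, 48), (128, 32), (96, 24)]

for name, (sps, tmus) in CLUSTERS.items():
    print(f"{name}: {sps}/{tmus} -> {alu_tex_ratio(sps, tmus)}")

for sps, tmus in LINEUP:
    print(f"{sps}/{tmus} -> {alu_tex_ratio(sps, tmus)}")
```

Every entry in the proposed line-up reduces to 4:1, compared with 2:1 for G8x/G9x and 3:1 for GT200.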

KonKort
29-Jan-2009, 14:11
384/96 GT212

Right.

256/64 GT214, 192/48 GT216 and 128/32 or 96/24 GT218

Wrong, wrong and wrong or wrong. IMHO. :lol:

Arun
29-Jan-2009, 14:25
I'll honestly be rather disappointed unless it's a 8 TMU/40 SP ratio with only one SFU unit per multiprocessor (instead of two). But oh well, what hasn't disappointed me in NV's roadmap lately anyway? ;)

DegustatoR
29-Jan-2009, 14:34
Wrong, wrong and wrong or wrong. IMHO. :lol:
At least 192/48 is right whether it's GT216 or not :p
As for the GT218 being 32/8/4 -- G98 being 16/8/4 on 65nm has 86mm^2 die size. It would have ~30mm^2 die size on the 40nm. Even if there will be 2 times more ALUs the die size probably won't exceed 50mm^2. And I'm not sure you can have even 64 bit bus on a die this small. Plus i'm hearing that they'll be more aggressive with packing transistors on 40nm. So i'm thinking that the lowest GT21x chip will probably have at least two TPCs.

KonKort
29-Jan-2009, 14:50
30mm^2? Well, that sounds too good. :smile:

(45²/65²) * 0,92² = 0,4057
86mm^2 * 0,4057 = 34,89 mm^2

So, that shows a perfect shrink of G98 from 65 to 40nm. But do not forget: GT218 will have 4x more SPs and every SP needs more transistors than on G9x. Besides, GT2xx includes other new features that use transistors (I do not mean the DP-ALUs ;))

I do not know the exact die size. I estimate 45-65mm^2. I hope this is big enough to use a 64 bit memory interface. ;)
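
The ideal-shrink arithmetic above, as a sketch. It treats 40nm as a 45nm node with an extra 0.92 linear half-node factor, exactly as the post does, and assumes every part of the die scales perfectly, which later replies dispute for small chips. The function name is made up for illustration:

```python
def shrink_factor(old_nm, new_nm, half_node_factor=1.0):
    """Ideal area scale factor when every feature shrinks linearly."""
    return (new_nm / old_nm) ** 2 * half_node_factor ** 2


G98_AREA_MM2 = 86  # 65nm G98 die size quoted earlier in the thread

# The post's variant: 65nm -> 45nm, then a 0.92 linear half-node step.
factor = shrink_factor(65, 45, half_node_factor=0.92)
print(f"scale factor: {factor:.4f}")
print(f"ideal G98 area: {G98_AREA_MM2 * factor:.2f} mm^2")

# For comparison: a direct 65nm -> 40nm linear scaling.
direct = shrink_factor(65, 40)
print(f"direct 65->40 area: {G98_AREA_MM2 * direct:.2f} mm^2")
```

The first variant reproduces the 0.4057 factor and 34.89 mm^2 above; the direct 65-to-40 scaling lands a little lower, at about 32.6 mm^2.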

DegustatoR
29-Jan-2009, 14:57
GT218 will have 4x more SPs and every SP needs more transistors than G9x.
Why is that?
And 32 is only 2 times more than 16. 4 times more SPs than G98 gives us 64/16/4 GT218 which i can believe in.

Besides GT2xx includes other new features, who uses transistors (I do not mean the DP-ALUs ;))
What features could this be?

KonKort
29-Jan-2009, 15:07
Why is that?
And 32 is only 2 times more than 16. 4 times more SPs than G98 gives us 64/16/4 GT218 which i can believe in.

G98 has got only 8 SPs. 0.5 shader cluster.


What features could this be?
Well, there are many changes, like doubling the cache between the streaming multiprocessors, the MUL being usable for general shading too, a better thread scheduler, more efficient TMUs (93% instead of 76% of theoretical performance), better blending performance in the ROPs, a 2D power-saving mode and so on...

DegustatoR
29-Jan-2009, 15:16
G98 has got only 8 SPs. 0.5 shader cluster.
I thought this was G86?

Well, there are many changes, like doubling the cache between the streaming multiprocessors, the MUL being usable for general shading too, a better thread scheduler, more efficient TMUs (93% instead of 76% of theoretical performance), better blending performance in the ROPs, a 2D power-saving mode and so on...
That's GT200 compared to G92. Who said that GT21x will have the same changes that GT200 had? Who said that GT21x low end will have the same changes as GT200 top-end? Plus -- most of this has nothing to do with ALUs, it's control logic and on-chip memory changes.

trinibwoy
29-Jan-2009, 15:22
G98 has got only 8 SPs. 0.5 shader cluster.

Doesn't G98 have 16 SPs just like G86, except 1 SIMD is disabled for 8400GS parts? I haven't seen any references to the contrary.

I'll honestly be rather disappointed unless it's only one SFU unit per multiprocessor (instead of two).

Why?

Arun
29-Jan-2009, 15:23
G98 is 4 TMUs/16 SPs. As for those scaling numbers, uhhh, are you guys high? :D Small chips have sufficiently high ratios of analogue & I/O that scaling isn't anywhere near that good...

KonKort
29-Jan-2009, 15:26
I thought this was G86?

No, G86 has got 16 SPs. ;)

That's GT200 compared to G92. Who said that GT21x will have the same changes that GT200 had? Who said that GT21x low end will have the same changes as GT200 top-end?

I have only listed those changes that I am convinced will also show up in the entry-level segment.

Can we agree that GT218, with the same number of processing units, needs more transistors than G9x? That was my only intention. ;)

G98 is 4 TMUs/16 SPs. As for those scaling numbers, uhhh, are you guys high? Small chips have sufficiently high ratios of analogue & I/O that scaling isn't anywhere near that good...

G86: 16 SPs, 8 TMUs, 8 ROPs, 128 Bit
G98: 8 SPs, 8 TMUs, 4 ROPs, 64 Bit
G96: 32 SPs, 16 TMUs, 8 ROPs, 128 Bit

Arun
29-Jan-2009, 16:50
G86: 16 SPs, 8 TMUs, 8 ROPs, 128 Bit
G98: 8 SPs, 8 TMUs, 4 ROPs, 64 Bit
G96: 32 SPs, 16 TMUs, 8 ROPs, 128 Bit
Yes, no, yes. G98 is the same config as MCP78, fwiw...

KonKort
29-Jan-2009, 17:11
G98-based products are the 8400 GS (not all of them), 9300 GE and 9300 GS, and all of them have only 8 SPs.
Do you have a link that shows G98 has 16 instead of 8 SPs? According to Wikipedia.de (http://de.wikipedia.org/wiki/Geforce_9), G98 has 8 SPs too.

XMAN26
29-Jan-2009, 17:19
G98-based products are the 8400 GS (not all of them), 9300 GE and 9300 GS, and all of them have only 8 SPs.
Do you have a link that shows G98 has 16 instead of 8 SPs? According to Wikipedia.de (http://de.wikipedia.org/wiki/Geforce_9), G98 has 8 SPs too.

http://laptoping.com/nvidia-geforce-9500m-gs-9300m-g.html
http://en.wikipedia.org/wiki/GeForce_9_Series#GeForce_9200GS.2F9400GT

Arun
29-Jan-2009, 17:22
Woah, you're actually right - I apologize. G98 is 8 TMU/8 SP, and MCP78 is 4 TMU/16 SP. And MCP79 is 8 TMU/16 SP. And they all have differences beyond that, too! Hopefully the genius who came up with NV's roadmap in that timeframe has already been fired by now... Especially given the other consequences that insanity had, which I won't go into here.

XMAN26: Those are, believe it or not, rebranded G84/G86s. And now that I think about it, it's indeed mobile GPU branding which confused me too, I suspect (along with MCP78)...

DegustatoR
30-Jan-2009, 11:21
...The chip itself is using 29mm packaging...
What? (http://www.fudzilla.com/index.php?option=com_content&task=view&id=11755&Itemid=34)
Anyway, if Fuad's right then GT216 looks like low end chip. (And GT218 is probably a 'zero end' chip then...)

Jawed
30-Jan-2009, 11:56
29mm per side for the package upon which the die is mounted? Do we know the sizes for prior GPUs in this class, e.g. G96?

http://www.gpureview.com/show_cards.php?card1=574&card2=513

Jawed

KonKort
30-Jan-2009, 14:49
Yes.

G96: 144 sqmm
G96b: 119 sqmm

Silent_Buddha
30-Jan-2009, 15:25
That can't possibly be 29mm per side...that'd be absolutely Huge. 841 sqmm?

Regards,
SB

Jawed
30-Jan-2009, 15:26
I'm talking about the packages, not the dies! Just a minor curiosity, to find out if the cards I linked have packages that are 29mm on each side.

Jawed

Lukfi
30-Jan-2009, 15:49
You mean that the package size hints 128-bit bus? With GDDR5, why not...

DegustatoR
30-Jan-2009, 16:02
You mean that the package size hints 128-bit bus? With GDDR5, why not...
GDDR5 on a card cheaper than $100? Not likely for another year or so.

Lukfi
30-Jan-2009, 17:24
=>Jawed: I did some quick measuring on GF 8600 and 9500 photos I found on the net, using PCI Express slot length for reference. G96 package seems to be about 30 mm, G84 came out to ~35 mm.

Arun
30-Jan-2009, 17:58
I wouldn't read too much into package sizes, although it would definitely seem to exclude a 200mm² chip... I don't think we can conclude much about the number of TMUs based on this either, since the difference between 24 and 32 TMUs on 40nm is literally ~7mm²(!!!) - this number comes from TMUs being ~1/4th of GT200, which has 80 TMUs on a 583mm² die on 65nm, so ~145mm² for 80 TMUs or ~14.5mm² per 8, which becomes <7mm² on 40nm.
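
For anyone who wants to check that arithmetic, here is a rough sketch. It assumes ideal area scaling with the square of the feature size (which real shrinks never quite reach), and the function and constant names are just made up for illustration; the 583mm², 1/4 and 80 TMU figures are the ones from the post.

```python
# Back-of-the-envelope check of the TMU area estimate.
# Assumption: area scales ideally with (new node / old node)^2.

def ideal_shrink(area_mm2, from_nm, to_nm):
    """Scale a block's area by the ideal (to/from)^2 factor."""
    return area_mm2 * (to_nm / from_nm) ** 2

GT200_DIE_MM2 = 583.0   # die size quoted in the post
TMU_SHARE = 0.25        # "~1/4th of GT200"
GT200_TMUS = 80

tmu_area_65 = GT200_DIE_MM2 * TMU_SHARE          # area of all 80 TMUs at 65nm
per_8_tmus_65 = tmu_area_65 / GT200_TMUS * 8     # one group of 8 TMUs at 65nm
per_8_tmus_40 = ideal_shrink(per_8_tmus_65, 65, 40)

print(f"80 TMUs at 65nm: {tmu_area_65:.1f} mm^2")
print(f"8 TMUs at 65nm:  {per_8_tmus_65:.1f} mm^2")
print(f"8 TMUs at 40nm:  {per_8_tmus_40:.1f} mm^2 (ideal scaling)")
```

With ideal scaling the 8-TMU step comes out around 5.5mm², so "<7mm²" leaves some margin for imperfect scaling.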


I'm still betting on 192-bit GDDR3/32 TMUs/128-160 SPs personally. The only truly strange factor would be the 45W TDP according to Fudzilla, although perhaps that is just for the chip and doesn't include the DRAM... In which case it'd still be <75W, which is nice.

AnarchX
02-Feb-2009, 08:51
Core i7 DTR monster-notebook (11.8 lbs) with GT212-based G280 DDR5?

Launch Date May 1, 2009

Special Feature(s) Intel i7 Processor; RAID 0/1/5 support; nVidia G280 DDR5

Video and Graphics
Video Memory/Type 1GB GDDR3/GDDR5
Video Architecture/Chipset 16xPCI-E; Nvidia G280; 9800GTX; Quadro FX3700
http://www.eurocom.com/products/future/specselectfuture.cfm?model_id=201

Blazkowicz
02-Feb-2009, 17:25
It makes no sense to me anyway; that laptop is going to melt or severely burn you.

Lukfi
02-Feb-2009, 17:28
I'm still betting on 192-bit GDDR3/32 TMUs/128-160 SPs personally.
Isn't that quite a lot, if we're talking about GT216? If the "codename +2 = half specs" system is still valid, then GT212 would have to have approximately four times the units - 512-640 SPs and ~128 TMUs.
The specs you're proposing better fit GT214.

Arun
02-Feb-2009, 18:04
Isn't that quite a lot, if we're talking about GT216? If the "codename +2 = half specs" system is still valid
I suspect it roughly is (remember G98, for example, isn't quite exactly 1/2 of G96 either), but more in terms of bus width than the number of SPs this time around; i.e. 192-bit GDDR3->384-bit GDDR3->384-bit GDDR5 for example... Of course, roadmaps can change and what was very simple and seemingly elegant on a piece of paper can turn out quite differently in the end.

Another argument in favour of GT216 having 32 TMUs is that both G84 and G96 had 16 TMUs; therefore if they stuck to the same kind of nomenclature, you'd expect GT216 to be more similar to G94 - which certainly had 32 TMUs, didn't it?

In the end, I wouldn't expect codenames to be such a reliable indicator of, well, anything at all - so we'll have to wait until more reliable leaks come out, IMO.

Lukfi
02-Feb-2009, 18:36
I suspect it roughly is (remember G98, for example, isn't quite exactly 1/2 of G96 either), but more in terms of bus width than the number of SPs this time around; i.e. 192-bit GDDR3->384-bit GDDR3->384-bit GDDR5 for example... Of course, roadmaps can change and what was very simple and seemingly elegant on a piece of paper can turn out quite differently in the end.
Those bus widths sound quite plausible, except for one thing: there was only one chip in history with an unconventionally wide bus, that being G80. Other than that, all chips had 128, 256 or 512 bits, although for G92 a wider interface would have made sense.
Another argument in favour of GT216 having 32 TMUs is that both G84 and G96 had 16 TMUs; therefore if they stuck to the same kind of nomenclature, you'd expect GT216 to be more similar to G94 - which certainly had 32 TMUs, didn't it?
Not necessarily. I think that number will depend on whether nVidia keeps the 3:1 physical ALU:TEX ratio from G200, or goes for 4:1 as you suggested with the 128 SPs, 32 TMUs educated guess. So what if TMUs are not the reference point? Maybe SPs aren't either.
So GT216 could end up having 24 TMUs (it's still an increase) and 72-96 SPs. Not an asphalt grinder in Q3'09, but it's a value chip anyway.
Then GT218 could have exactly half these specs (36-48 SPs, 12 TMUs, 64-bit bus) and GT214 double them, like 144-192 SPs, 48 TMUs, 192 or 256-bit bus.
I don't know whether such specs are possible for those interface widths, but the 384 SPs/96 TMUs rumour has been around for some time concerning GT212.

And don't think I don't hate having to guess from codenames and package sizes, but it seems we've hit a drought this season, so I'm quite happy even for the Fudzilla info.

Arun
02-Feb-2009, 19:14
Sigh, why don't people get it? 384 SPs/96 TMUs is probably wrong. It was one of the original rumours for GT200, which also claimed G80-Ultra was a 80nm chip and GT200 would be 55nm, IIRC. The odds that it suddenly becomes spot-on one generation later aren't all that high IMO, although it's obviously not impossible either...

I don't necessarily disagree with you on anything (except for the fact NV's TMUs are grouped in groups of 8 nowadays and so they're unlikely to have a chip with 12), I'm just constantly shocked at how dubious the foundations of most discussions are in the speculation thread nowadays. If a rumour is probably wrong, what's the point of basing your analysis on it instead of, you know, your own speculation that you can try to justify somehow? It probably won't be right, but at least it won't necessarily be wrong.

trinibwoy
02-Feb-2009, 19:53
Sigh, why don't people get it? 384 SPs/96 TMUs is probably wrong.

I'm harboring some pretty optimistic wishes for GT212 but that's all going to come down to how much Nvidia has improved their transistor density. The other thing to consider is that DX11 is obviously going to be more expensive per flop so they can't do too much with GT212 without causing collateral damage to GT300.

Lukfi
02-Feb-2009, 20:29
Alright, let's forget the rumours. What do you think of these?

GT212 | 288 SPs | 96 TMUs | 4 ROP blocks, 256-bit interface, GDDR5
GT214 | 144 SPs | 48 TMUs | 3 ROP blocks, 192-bit interface
GT216 | 72 SPs | 24 TMUs | 2 ROP blocks, 128-bit interface

I'm basing it on :yep2:
- the ALU:TEX ratio being carried over from G200, as it was carried over from G8x to G9x
- TMU count still being higher compared to G92/G94/G96, even though not two times higher
- GT214 and GT216 could produce similar performance as G92 and G94
I'm not sure about :embarrased:
The bandwidth needs of such chips, the viability of proposed interface widths on 40nm, and DDR3/GDDR3/GDDR5 prices compared to the cost of adding a wider interface.

The initial 40nm line-up would include only GT214, GT216 and perhaps GT218 - these would obsolete G92, G94, G96 and G98. G200b would stay for some time, to be later obsoleted by GT212. GT212 would be launched when the 40nm process is mature enough so it makes sense instead of 55nm for G200b.

What I don't like about my speculative roadmap is the gap between GT214 and GT212. Maybe GT212 could be toned down a bit, or GT214 toned up - probably depending on what die size nVidia will need to fit the pads for their chosen bus width. And there used to be an even greater gap between G80 and G84...

Now about GT218. You're right that the basic building block is 8 TMUs + 16 SPs on G8x/G9x and 24 SPs on G200. I wonder, though… several sources claim the Quadro NVS 420 to have two times 8 SPs - is that possible or is it a mistake?

Anyway, GT218 would be more of a video decoding GPU than a low-cost gaming GPU, so it could very well have just one SIMD:
GT218 | 24 SPs | 8 TMUs | ???
The problem is, I'm not sure whether the chip could support a 64-bit interface. 32-bit with GDDR5 would probably create sufficient bandwidth, but nobody will stick expensive GDDR5 onto a $50 card and 32-bit bus sounds very unlikely. It would only make sense if GDDR5 was the standard choice for GT21x cards and nVidia designed the ROP/MC blocks with 32-bit channels (along with something similar to ATI's ring-bus or hub, so the crossbar doesn't get huge), to offset GDDR5's slower command rate. But something tells me that won't be the case and the majority of the cards will use GDDR3, those will be exchanged for DDR3 as time goes, and GDDR5 will be used only where really needed.

Jawed
02-Feb-2009, 20:55
G94 was such a stupid configuration in comparison with G92 that it does make a mockery of any speculation.

http://www.gpureview.com/show_cards.php?card1=557&card2=548

Jawed

Lukfi
02-Feb-2009, 21:17
Why? It has half the SPs and TMUs and looks terribly underpowered on paper, but a 9600 GT comes very close to 8800/9800 GT in real scenarios. So G92's memory capacity/bandwidth limitation seems more stupid, in my lame opinion.

trinibwoy
03-Feb-2009, 00:22
Gotta go with Lukfi here. The 9600GT was and still is a great card. It's G92 that was bandwidth starved.

I'm still holding out for 4:1 on GT2xx though.

Jawed
03-Feb-2009, 00:49
The two are so close in performance (most games seem mostly dominated by ROP/BW) that it looks like a mistake - I shouldn't have implied that 9600GT's design was inherently broken, it's the line-up formed by these chips that looks farcical. Remember the smoke'n'mirrors with GSOs and GSs and umpteen variants of G92...

8800GT was great value before Christmas 2007 and 9600GT was even better when it came out a few months later, just making the more expensive NVidia GPUs really poor value.

Though it's tempting to speculate NVidia will continue the farce, it's pretty unlikely. Isn't it?

Anyway, Lukfi, the stratification you've drawn up seems reasonable to me

Jawed

DegustatoR
03-Feb-2009, 09:07
G92 was the really stupid configuration. They should've gone with 192 SP / 64 TU / 384-bit GDDR3 instead of G92.

no-X
03-Feb-2009, 09:13
The line-up was strange, but very competitive. I still believe that GF8800GT was more of a marketing tool than a real product. It was hardly available, and GF9600GT appeared before nVidia "solved" this issue. It seems that GF9600GT was simply late and nVidia needed anything to catch users' eyes - otherwise RV670 would have had no competitor.

I think GF8800GT was initially a purposely limited product - it was meant to hold the place for GF9600GT (which was much cheaper to produce, but late).

GF8800GT was very good for many brands - they sold them to suppliers under the condition that they'd buy their overpriced 8400/8600, which nobody wanted.

It worked well.

Lukfi
03-Feb-2009, 09:24
8800GT was great value before Christmas 2007 and 9600GT was even better when it came out a few months later, just making the more expensive NVidia GPUs really poor value.

Though it's tempting to speculate NVidia will continue the farce, it's pretty unlikely. Isn't it?
The lineup was a farce because G92 was BW/memory starved. As I said, I have no idea about the number of ROPs and memory throughput needed for the GT21x chips I'm speculating about, so I expect nVidia not to make the same mistake as with G92 and use an adequate amount of memory, adequately fast.

Arun
03-Feb-2009, 13:47
Alright, let's forget the rumours. What do you think of these?

GT212 | 288 SPs | 96 TMUs | 4 ROP blocks, 256-bit interface, GDDR5
GT214 | 144 SPs | 48 TMUs | 3 ROP blocks, 192-bit interface
GT216 | 72 SPs | 24 TMUs | 2 ROP blocks, 128-bit interface
My first thought is that line-up is too dense; if you looked at the die sizes and board costs of GT214/GT216, I'm not 100% sure it'd be worth the trouble.

I really get the impression most people are still underestimating 40nm. From a density, performance and power perspective it is a very important process node - this is, however, offset by wafer price increases that are greater than historical norms.

I also get the feeling most people are overestimating the die size of NVIDIA's SPs; you do realize they only take 25% of GT200's die size, right? So let's say ~150mm² for 240 SPs, or ~5mm² per group of 8 SPs. On 40nm, that goes down to less than 2.5mm²... So the difference between 72 SPs and 96 SPs is certainly less than 8mm². Given the likely performance boost, does it really make sense to keep the SP ratio so low?

I would argue that it makes no sense at all. Of course, NVIDIA's 65nm line-up didn't make much sense either, so I fully understand people's skepticism.

Regarding G94, it's easy to say it doesn't make sense but the problem is if you cut the memory bus down to 192-bit, you had to use either 384MiB or 768MiB of DRAM; the former is too little in that market segment, the latter was too much in that timeframe. Fun stuff! :) I still do believe the most significant 'basic' mistake of the G9x line-up by far is not going for a 320-bit memory bus on G92, though.
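
To spell out where the 384MiB/768MiB constraint comes from, here is a small sketch. The assumptions are mine, not from the post: each GDDR3 device has a 32-bit interface, one device sits on each 32-bit slice of the bus, and the devices available in that timeframe were 512Mbit (64MiB) and 1Gbit (128MiB). The names are purely illustrative.

```python
# Memory capacity options implied by a given bus width.
# Assumptions: 32-bit GDDR3 devices, one per 32-bit slice of the bus,
# available in 64 MiB (512 Mbit) and 128 MiB (1 Gbit) densities.
DEVICE_WIDTH_BITS = 32
DEVICE_SIZES_MIB = (64, 128)

def capacity_options(bus_width_bits):
    """Return the total capacities (MiB) reachable with one device per slice."""
    chips = bus_width_bits // DEVICE_WIDTH_BITS
    return [chips * size for size in DEVICE_SIZES_MIB]

for bus in (192, 256, 320):
    print(f"{bus}-bit: {capacity_options(bus)} MiB")
```

Under those assumptions a 192-bit bus lands on 384 or 768MiB, a 256-bit bus on 512 or 1024MiB, and the 320-bit option mentioned for G92 on 640 or 1280MiB.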

Now about GT218. You're right that the basic building block is 8 TMUs + 16 SPs on G8x/G9x and 24 SPs on G200. I wonder, though… several sources claim the Quadro NVS 420 to have two times 8 SPs - is that possible or is it a mistake?
It's not a mistake, G86 was 8 TMUs/16 SPs, MCP78 was 4 TMUs/16 SPs, but amusingly G98 was 8 TMUs/8 SPs as discussed in a recent thread (and very much to my surprise). MCP7A is 8 TMUs/16 SPs... There also was a SKU of G86 way back in the day that had 8 of its 16 SPs disabled.

---

BTW, one small comment - I think we're all assuming NV can't easily change the TMU-SP ratio in GPUs since, except for G98/MCP78, they seemingly never did so. I'm not sure that's right, and if so it'd make speculation about the ratio in different chips much more complex...

Lukfi
03-Feb-2009, 14:19
I still do believe the most significant 'basic' mistake of the G9x line-up by far is not going for a 320-bit memory bus on G92, though.
You'll get no argument from me there. If G80 had the same SP/TMU count and was designed with a 384-bit interface, there were bound to be problems with G92. Will nVidia make the same mistake twice?
It's not a mistake, G86 was 8 TMUs/16 SPs, MCP78 was 4 TMUs/16 SPs, but amusingly G98 was 8 TMUs/8 SPs as discussed in a recent thread (and very much to my surprise). MCP7A is 8 TMUs/16 SPs... There also was a SKU of G86 way back in the day that had 8 of its 16 SPs disabled.

BTW, one small comment - I think we're all assuming NV can't easily change the TMU-SP ratio in GPUs since, except for G98/MCP78, they seemingly never did so. I'm not sure that's right, and if so it'd make speculation about the ratio in different chips much more complex...
Hmm, my speculation was quite heavily dependent on the constant SP:TMU ratio. I didn't know that G98 and the MCP78 IGP were different. But they never did change the ratio on any other chips, that's strange. ATI has different ALU:TEX ratios for different market segments, so I guess this approach does make sense.

Unfortunately, without at least the SP:TMU ratio, we have close to nothing to fall back on :frown:

INKster
03-Feb-2009, 15:50
Could this be a GT218 ? (http://vr-zone.com/articles/nvidia-gt218-card--specs-surfaced/6529.html?doc=6529)

Arun
03-Feb-2009, 16:18
Interesting, based on the PCIe slot that package is indeed 23mm², which is what Fudzilla claimed in this post: http://www.fudzilla.com/index.php?option=com_content&task=view&id=11715&Itemid=34

So at least the two match. "The core is clocked at 550MHz and shader clock at 1375MHz" though? what the hell? Not only is this surprisingly low, these are the EXACT same clocks as the G96b-based G9400GT. Very dubious indeed...

KonKort
03-Feb-2009, 18:08
That's okay. Compared to G98, GT218 has 32 instead of 8 SPs, so the shader power is much higher (132 instead of 34 GFLOPS).
In theory the TMU/ROP performance is the same as G98's, but GT2xx has several optimizations in these units.

Let me remind you: 22 W TDP. ;)
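
The 132 vs. 34 GFLOPS figures can be reproduced with a quick sketch, under two assumptions that are not stated in the post: 3 flops per SP per clock (MADD plus MUL), and a 1400MHz shader clock for the G98 card being compared. The 1375MHz shader clock is the one quoted from VR-Zone above.

```python
# Peak shader throughput from SP count and shader clock.
# Assumption: each SP issues a MADD (2 flops) plus a MUL (1 flop) per clock.
FLOPS_PER_SP_PER_CLOCK = 3

def peak_gflops(sps, shader_mhz):
    """Theoretical peak in GFLOPS for a given SP count and shader clock."""
    return sps * shader_mhz * FLOPS_PER_SP_PER_CLOCK / 1000.0

gt218 = peak_gflops(32, 1375)  # rumoured GT218 config
g98 = peak_gflops(8, 1400)     # G98 shader clock is an assumed value

print(f"GT218: {gt218:.1f} GFLOPS, G98: {g98:.1f} GFLOPS")
```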

CarstenS
03-Feb-2009, 19:35
Interesting, based on the PCIe slot that package is indeed 23mm², which is what Fudzilla claimed in this post: http://www.fudzilla.com/index.php?option=com_content&task=view&id=11715&Itemid=34

So at least the two match. "The core is clocked at 550MHz and shader clock at 1375MHz" though? what the hell? Not only is this surprisingly low, these are the EXACT same clocks as the G96b-based G9400GT. Very dubious indeed...
23mm²? Are you really sure Nvidia dared to make four small round GPUs and position them around a nearly quadratic centerpiece which only serves stability reasons? At least, IF I read VR-Zone's diagram correctly. ;)

*SCNR*

BTT: In this matter i concur with konkort that they must have realized by now the importance of shaders, thus having made the switch from the 2:1 Ratio of G80 and the like to 4:1 on GT21x with G(T)2x0 being an intermediary with 3:1.

It'd only make sense after all, given the massive amount of die-space their scheduling logic and associated bulkhead takes, to go for the slightly bigger die and try and outperform AMD. It's not like they've got any other choice, given the enormous FLOPS/mm² AMD already has achieved with their 2nd 55nm generation.

Arun
03-Feb-2009, 19:45
23mm²? Are you really sure, Nvidia did dare to make four small round GPUs and positioned them around a nearly quadratic centerpiece which only serves stability reasons? At least, IF i read VR-Zones diagramm correctly. ;)
Sorry, I meant 23mm, not 23mm². So 23x23; this is the package size, not the die size.

ninelven
03-Feb-2009, 19:55
thus having made the switch from the 2:1 Ratio of G80 and the like to 4:1 on GT21x with G(T)2x0 being an intermediary with 3:1.
4:1 or 5:1 on GT21X wouldn't surprise me... though I'd be a little disappointed with the former. Where I will be really disappointed is if GT3xx isn't 6:1 (which is where Nvidia said they were going 5 years ago... and still haven't made it).

suryad
03-Feb-2009, 19:56
What interests me is that I don't recall Nvidia ever releasing so many different products in such a short time. I mean, we have the 285, the 295, and the upcoming 212 and 218 etc... it seems like they are flooding the market with a lot of products with not much that is different between them. I don't think there is a precedent for this...

Lukfi
03-Feb-2009, 19:58
=>ninelven: What you are talking about are physical SP vs. TMU counts, but that way you're ignoring that SPs are running at a much higher frequency. So while for example G92 is physically 2:1, effectively it is >4:1.

=>suryad: I think G200b was delayed a bit. And the rest is closely connected with the availability of 40nm, which seems to be kind of a holy grail these days...

Jawed
03-Feb-2009, 20:10
Sorry, I meant 23mm, not 23mm². So 23x23; this is the package size, not the die size.
I failed miserably trying to explain this earlier - hopefully peeps get it now!

Jawed

CarstenS
03-Feb-2009, 20:12
Sorry for failing to be a bit funny. :(
Of course, i know that Arun was talking about package.

Jawed
03-Feb-2009, 20:20
It'd only make sense after all, given the massive amount of die-space their scheduling logic and associated bulkhead takes, to go for the slightly bigger die and try and outperform AMD. It's not like they've got any other choice, given the enormous FLOPS/mm² AMD already has achieved with their 2nd 55nm generation.
GTX260-216 performs quite happily against HD4870 despite its FLOP shortfall.

Oh, and I don't include you in that group of peeps who didn't get the package thing! :lol: I thought that was quite funny.

Jawed

Arun
03-Feb-2009, 20:21
Sorry for failing to be a bit funny. :(
Of course, i know that Arun was talking about package.
hah! :) Well that's partially my fault, given how many people seriously make that mistake all the time (and then tons of people don't spot it) you can't blame me for being a tad paranoid here... ;) I did realize you were kidding about the latter part (lol @ quadratic reasons), but I kinda assumed you were implying my measurement might be wrong. Oops.

ninelven
03-Feb-2009, 20:24
What you are talking about are physical SP vs. TMU counts, but that way you're ignoring that SPs are running at a much higher frequency. So while for example G92 is physically 2:1, effectively it is >4:1.
No. I think I know what I am talking about. When Nvidia made this statement, they were referring to 3 Vec4's per TMU (equal clocks). This would be equivalent to 12 SPs per TMU if they were clocked the same, but I took into account that the SPs are clocked 2x higher, which is why I said and meant 6:1.
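
The two ways of counting can be put side by side in a short sketch. The function names are made up for illustration, and the 1500MHz shader / 600MHz core clocks used for G92 are my assumption (8800 GT reference clocks), not numbers from this thread.

```python
# Two conversions between "physical" and "clock-adjusted" ALU:TEX ratios.

def effective_ratio(physical_sp_per_tmu, shader_mhz, core_mhz):
    """Physical SP:TMU ratio scaled by how much faster the SPs are clocked."""
    return physical_sp_per_tmu * shader_mhz / core_mhz

def sp_equivalent(vec4_per_tmu, sp_clock_multiplier):
    """Scalar SPs per TMU needed to match N vec4 ALUs running at TMU clock."""
    scalar_lanes = vec4_per_tmu * 4          # one vec4 ALU = 4 scalar lanes
    return scalar_lanes / sp_clock_multiplier

# Lukfi's direction: G92 is 2:1 physically, more once clocks are counted.
g92_effective = effective_ratio(2, shader_mhz=1500, core_mhz=600)

# ninelven's direction: 3 vec4 per TMU, with SPs clocked 2x higher.
target_physical = sp_equivalent(3, sp_clock_multiplier=2)

print(f"G92 effective ratio: {g92_effective}:1")
print(f"Physical SP:TMU needed for 3 vec4 per TMU: {target_physical}:1")
```

So both posts are consistent with each other: 6:1 is a physical ratio that already has the 2x clock folded in, while ">4:1" is the clock-adjusted view of a 2:1 physical ratio.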

trinibwoy
03-Feb-2009, 22:57
Why is 3x4 = 8x1 ?

ninelven
03-Feb-2009, 23:36
Would you prefer a Flops to Texture Fill Rate comparison? I think you will end up in the same place...

CarstenS
04-Feb-2009, 08:42
GTX260-216 performs quite happily against HD4870 despite its FLOP shortfall.
They do. But I was extrapolating the given routes for both. If NVidia and AMD were just adding further (40nm-shrunk) SIMDs like there was no tomorrow, AMD would surely outperform Nvidia, and with a much smaller chip to boot.

So my take would be - disregarding any IPC improvements in the individual SMs - that Nvidia is bound to increase not only the total # of FLOPS but also the FLOPS per TPC/clk.

Jawed
04-Feb-2009, 09:41
So my take would be - disregarding any IPC improvements in the individual SMs - that Nvidia is bound to increase not only the total # of FLOPS but also the FLOPS per TPC/clk.
Arguably GT21x GPUs were designed before the shock and awe of RV770 hit, so what are the chances NVidia will respond specifically by increasing ALU:TEX?

Also, like G71 was to G70, NVidia will, if anything, be concentrating on performance per mm and per watt, not absolute performance. Trying to deliver the most performance at each price point, but trying to avoid making GPUs larger than necessary.

Overall it seems to me that increasing ALU:TEX in the shadow of GT300 is unlikely. Sure there was prolly supposed to have been a longer gap between GT21x stragglers and GT300 than the single quarter it looks like it'll be, but I still think GT300's shadow is long enough to de-prioritise ALU:TEX increases.

Also, NVidia's architecture gets a free, implicit, ALU:TEX ratio increase from the more complex shaders because the less texturing-per-clock in a shader the more cycles are left over for math, due to the multifunction interpolator's duty to generate texel coordinates.

NVidia's recent problem has been its GPUs are much larger than the bus-size requires (excluding GT200 which appears to be that big solely because of its 512-bit bus), e.g. G94's 240mm² is quite a bit larger than the ~190mm² required for a 256-bit bus (subject to the vagaries of power pads...). I imagine NVidia would prefer to deliver smaller chips for a given bus size and 40nm is the node to do it, if ever there was one. Additionally, most of GT21x could have an 18-month lifetime, like G92, which increases the importance of minimal die size.

Jawed

Lukfi
04-Feb-2009, 09:42
No. I think I know what I am talking about. When Nvidia made this statement, they were referring to 3 Vec4's per TMU (equal clocks). This would be equivalent to 12 SPs per TMU if they were clocked the same, but I took into account that the SPs being clocked 2x higher, which is why I said and meant 6:1.
I am pretty sure CarstenS was talking about the SP:TMU ratios, which is 2:1 on G8x/G9x, 3:1 on G200 and he expects it to be 4:1 on GT21x. Since you were quoting his words, I supposed (and probably everyone else did as well) you're talking about the same thing. I still don't see how the 6:1 number is relevant here...
NVidia's recent problem has been its GPUs are much larger than the bus-size requires (excluding GT200 which appears to be that big solely because of its 512-bit bus)
Wasn't R600 smaller than G200b?

trinibwoy
04-Feb-2009, 14:37
Would you prefer a Flops to Texture Fill Rate comparison? I think you will end up in the same place...

Sure, that's what I've been doing since G80 dropped. But even that is still a very rough estimation as different architectures exhibit different levels of effective flops throughput in real texturing scenarios.

DegustatoR
04-Feb-2009, 14:46
Alright, let's forget the rumours. What do you think of these?

GT212 | 288 SPs | 96 TMUs | 4 ROP blocks, 256-bit interface, GDDR5
GT214 | 144 SPs | 48 TMUs | 3 ROP blocks, 192-bit interface
GT216 | 72 SPs | 24 TMUs | 2 ROP blocks, 128-bit interface
It starts to look like this:

GT218 | 32 SP / 8 TU | 64-bit DDR3
GT216 | 96 SP / 24 TU | 128-bit G/DDR3
GT214 | 192 SP / 48 TU | 192-bit GDDR3
GT212 | 384 SP / 96 TU | 256-bit GDDR3/5

With GT212 covering everything above $200 with GDDR3 and GDDR5 and 2-chip AFR configs.
Not very impressive from where i'm standing. The only really good GPU in such line-up would be GT212, everything below it are just cost-cutting versions of G9x chips with less texturing power and hotter and noisier because of this.
I hope that's not how it'll turn out to be in the end =(

Jawed
04-Feb-2009, 14:59
Wasn't R600 smaller than G200b?
Thinking about it, GT200b is smaller but still has a 512-bit bus, so how big is GT200b?

Jawed

Arun
04-Feb-2009, 15:15
Quick point - why is everyone forgetting about GT215? :o There are two ways to consider that chip: either they want a 5-chip line-up (wtf?) or they canned one or two of the four chips and replaced that with a new one. I'd bet on the latter...

My current guess (it changes about every 30 minutes :P) is their line-up will look roughly like this, using DegustatoR's nomenclature for simplicity's sake:
GT218: 32 SP / 8 TU | 64-bit G/DDR3 | 0 DP & 1 SFU per SM [Early Q2]
GT216: 120 SP / 24 TU | 192-bit GDDR3 | 1 DP & 1 SFU per SM [Late Q2]
GT215: 320 SP / 64 TU | 256-bit GDDR5 | 1 DP & 1 SFU per SM [Q3]
GT300: 512-bit GDDR5 [Q4]

I know that might sound counter-intuitive for GT215, but I'd argue after a certain point the goal is to confuse your competitor more than anything else - especially with late roadmap changes! AMD has done an amazing job misinforming NV about their late changes, so I wouldn't exclude anything on either side. Plus, maybe someone at NV is superstitious and didn't want a chip that would be half-way between GT214 & GT212 performance to be named GT213... ;)

Jawed: R600 was ~420mm², GT200b is ~470mm² + NVIO...

KonKort
04-Feb-2009, 15:29
GT218: 32 SP / 8 TU | 64-bit G/DDR3 | 0 DP & 1 SFU per SM [Early Q2]
GT216: 120 SP / 24 TU | 192-bit GDDR3 | 1 DP & 1 SFU per SM [Late Q2]
GT215: 320 SP / 64 TU | 256-bit GDDR5 | 1 DP & 1 SFU per SM [Q3]
GT300: 512-bit GDDR5 [Q4]


Right, wrong, wrong, right.

How do you get to 120 SP/24 TU or 320 SP/64 TU? Are you thinking of 40 SPs and 8 TUs per cluster, or 20 SPs and 4 TUs per cluster?
I know it is wrong, but I am interested in how you are thinking. ;)

Jawed
04-Feb-2009, 15:34
Jawed: R600 was ~420mm², GT200b is ~470mm² + NVIO...
So GT200b is about 2.5mm smaller on each side than GT200 - about 21.5x21.5mm versus 24x24mm. Looking at the die picture for GT200 (65nm), it doesn't look like there's much room to shave off the sides due to the sheer quantity of what appears to be the GDDR3 physical interface.

Clearly, with a rectangular die the area could be much smaller for a given perimeter (bus width) - but GPUs with more than a 64-bit bus seem to be square-ish as a rule.
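
A quick sketch of both points, treating the dies as perfect squares (an assumption; real dies are only square-ish) and using the ~470mm² and 583mm² figures quoted in this thread. The 2:1 rectangle is a made-up example.

```python
import math

def square_side(area_mm2):
    """Side length of a square die with the given area."""
    return math.sqrt(area_mm2)

def rect_area(perimeter_mm, aspect):
    """Area of a rectangle with the given perimeter and long:short aspect."""
    short = perimeter_mm / (2 * (1 + aspect))
    return short * (short * aspect)

gt200 = square_side(583.0)    # 65nm GT200
gt200b = square_side(470.0)   # 55nm GT200b
print(f"GT200: {gt200:.1f}mm per side, GT200b: {gt200b:.1f}mm per side")
print(f"Difference: {gt200 - gt200b:.1f}mm per side")

# Same perimeter (same room for pads), but a 2:1 rectangle instead of a square.
perimeter = 4 * gt200b
print(f"2:1 die with GT200b's perimeter: {rect_area(perimeter, 2.0):.0f}mm^2")
```

For the same perimeter, a 2:1 rectangle ends up with 8/9 of the square's area, which is the saving the rectangular-die remark refers to.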

Jawed

DegustatoR
04-Feb-2009, 15:45
Quick point - why is everyone forgetting about GT215? :o There are two ways to consider that chip: either they want a 5-chip line-up (wtf?) or they canned one or two of the four chips and replaced that with a new one. I'd bet on the latter...
There is another possibility which i consider to be more probable: some of these 5 GT21x chips will show up only as mobile GPUs.
And GT215 is probably a chip with 128-bit GDDR3/5 and with 4 or 5 or 8 TPCs -- probably 8 because GT212 isn't the best candidate for notebook usage and GT214 if it's really 6TPC/192-bit with GDDR3 only configs in mind is a bit crap for notebooks also.
Hey, maybe GT215 is a late reaction to RV740? 8)

And i still think that GT300 with 512-bit GDDR5 is a bit extreme.

DegustatoR
04-Feb-2009, 15:47
[delete this]

Jawed
04-Feb-2009, 16:00
And i still think that GT300 with 512-bit GDDR5 is a bit extreme.
But AMD, with a 256-bit bus, will be completely unable to compete with "GTX360" if it has a 448-bit GDDR5 bus - which is quite unlike HD4870 versus GTX260.

So GTX380 could easily be $650 and GTX260 $450 and HD5870's performance would price it at around $300 again, this time putting no pressure on NVidia.

I hope AMD goes for more than 16xZ per 64-bit channel - 16xZ per 32-bit channel would be cool. It's the only way to compete with NVidia, who'll prolly double Z per channel in GT300 in comparison with GT200. AMD could use the highest clocking GDDR5 to compensate too - higher-clocked than NVidia. That would give 16xZ per 32-bit channel bandwidth to breathe.

Jawed

DegustatoR
04-Feb-2009, 16:16
But AMD, with a 256-bit bus, will be completely unable to compete with "GTX360" if it has a 448-bit GDDR5 bus - which is quite unlike HD4870 versus GTX260.
GT300 is (supposedly) a top-end GPU a la G80 and GT200. RV870 is (supposedly) a middle-class GPU a la RV670 and RV770. Any competition between one GT300 and one RV870 is a problem for NV -- in much the same way it is now between GT200 and RV770.
As for the competition between GT300 and an RV870-based AFR top-end from AMD, I'm not so sure that you have to have the same bandwidth on the one-chip top-end to be able to counter an AFR system which is quite ineffective in its memory usage. A smarter way is to have a less costly solution with the same performance -- and 384-bit GDDR5 might do the trick here.
But who knows what they've planned for GT300...

Razor1
04-Feb-2009, 16:19
But AMD, with a 256-bit bus, will be completely unable to compete with "GTX360" if it has a 448-bit GDDR5 bus - which is quite unlike HD4870 versus GTX260.

So GTX380 could easily be $650 and GTX360 $450 and HD5870's performance would price it at around $300 again, this time putting no pressure on NVidia.
Jawed


Well nV probably will adjust for their shortcomings from the past launch, that or they are just stupid :razz:.

trinibwoy
04-Feb-2009, 16:39
Arun, what's your basis for 1 SFU per SM? I don't get the benefit there.

When I first heard of GT215 I assumed it was needed to fill a large gap between GT216 and GT214. I don't see how confusing the competition comes into play. It's not like the competition knows what GT214 is gonna look like, and if they already do then obviously confusing them is the least of your worries!!

Lukfi
04-Feb-2009, 16:42
On the contrary, if the competition does know, it's necessary to confuse them by making them think you've changed the design :)

trinibwoy
04-Feb-2009, 16:43
On the contrary, if the competition does know, it's necessary to confuse them by making them think you've changed the design :)

Well I meant that if they have the means to know in the first place, via those same means they will see through any diversionary tactics too :)

Arun
04-Feb-2009, 16:46
Arun, what's your basis for 1 SFU per SM? I don't get the benefit there.
Die size? :o Remember the SFU is really the 'SFU/Interpolator/MUL' unit. Since in graphics only half the MUL can be used anyway for scheduling reasons, this would also reduce the waste on that front to zero, which is very efficient.

When I first heard of GT215 I assumed it was needed to fill a large gap between GT216 and GT214. I don't see how confusing the competition comes into play. It's not like the competition knows what GT214 is gonna look like, and if they already do then obviously confusing them is the least of your worries!!
If they don't have an 'educated' hunch at this point, there's a problem. Look at RV670: NV knew it was 12 TMUs. And it was for some time; then AMD changed it, and NV never knew about it before it was much too late.

trinibwoy
04-Feb-2009, 16:51
Well it's nice to know what the competition is up to but how does that change anything? You should always strive to extract the most performance out of a given transistor budget. Your ability to do so isn't impacted in any way by what the competition is doing. That's exactly what ATi did with RV770 and it worked out great for them.

Arun
04-Feb-2009, 16:55
Well it's nice to know what the competition is up to but how does that change anything? You should always strive to extract the most performance out of a given transistor budget. Your ability to do so isn't impacted in any way by what the competition is doing. That's exactly what ATi did with RV770 and it worked out great for them.
No, but it can help you know which chips to prioritize if they're on roughly the same schedule and, very very importantly, it helps you tremendously to manage your inventory situation. NVIDIA's G80 inventory surplus in late 2007 was because they thought they could keep selling the 8800 GT at $299 and wait longer to launch the 8800 GTS 512MB. We all know how that turned out, but better competitive info would have helped them save a lot of money.

It also helps plan SKUs, since those can be changed quite a bit late into the design cycle, but not so late either that competitive info isn't useful. And when you get your info really early and if your team is really dynamic, you can even change your chip a tiny bit, but that isn't really the point generally.

Jawed
04-Feb-2009, 17:19
As for the competition between GT300 and an RV870-based AFR top-end from AMD, i'm not so sure that you have to have the same bandwidth on the one-chip top-end to be able to counter an AFR system which is quite ineffective in its memory usage.
Agreed.

A smarter way is to have less costly solution with the same performance -- and 384-bit GDDR5 might do the trick here.
I think that's a step too far though, as GTX280 is outclassed by HD4850X2 - they're both using roughly the same grade of GDDR3, 512-bit versus 2x256-bit, 141.7 against 127.1GB/s,

Jawed
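
For reference, the two bandwidth figures quoted above (141.7 and 127.1GB/s) can be turned back into an implied per-pin data rate, which is one way to check the "roughly the same grade of GDDR3" remark. This is only a sketch: the function names are made up for illustration, and the only inputs are the bus widths and bandwidths already given in the post.

```python
def implied_data_rate_gtps(bandwidth_gb_s, bus_bits):
    """Per-pin data rate (GT/s) implied by a total bandwidth and bus width."""
    bytes_per_transfer = bus_bits / 8  # the whole bus moves this many bytes per transfer
    return bandwidth_gb_s / bytes_per_transfer


def bandwidth_gb_s(bus_bits, data_rate_gtps):
    """Total bandwidth (GB/s) for a given bus width and per-pin data rate."""
    return bus_bits / 8 * data_rate_gtps


# Figures from the post: GTX280 is 512-bit, HD4850X2 is 2 x 256-bit.
gtx280_rate = implied_data_rate_gtps(141.7, 512)
hd4850x2_rate = implied_data_rate_gtps(127.1, 2 * 256)

print(f"GTX280   implied rate: {gtx280_rate:.3f} GT/s")
print(f"HD4850X2 implied rate: {hd4850x2_rate:.3f} GT/s")
```

The same bandwidth_gb_s helper can be used to try a narrower bus at a higher data rate, which is the trade-off being argued over for a 384-bit GDDR5 part; no GDDR5 rate is given in the thread, so any number plugged in there is a guess.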

DegustatoR
04-Feb-2009, 18:58
I think that's a step too far though, as GTX280 is outclassed by HD4850X2 - they're both using roughly the same grade of GDDR3, 512-bit versus 2x256-bit, 141.7 against 127.1GB/s,
I'm not sure that GT200(b) vs RV770 situation is a typical one. GT200 is too slow for its die size and not because of bandwidth shortage but more because of inefficient design (especially for MSAA 8x) and low transistor density.
While RV870 may be more or less RV770+DX11+more units (not much left to do there actually since it's already 10.1 and has a tessellator), GT300 supposedly has a massively updated architecture (if it doesn't then i'll be wondering where all that R&D money went between G80 and GT300). So it's probably useless to try to guess GT300's performance from GT2xx numbers.

trinibwoy
04-Feb-2009, 20:25
Don't be so sure that AMD's DX11 architecture would resemble RV770 that closely. Who knows how they're going to rejig their shader core to better handle general computing workloads. They're definitely suffering in that regard at the moment in anything that isn't very coherent and multi-component to nicely fill those 5-way ALUs (It does quite well in FFT's for example).

ninelven
04-Feb-2009, 20:43
I am pretty sure CarstenS was talking about the SP:TMU ratio, which is 2:1 on G8x/G9x, 3:1 on G200 and he expects it to be 4:1 on GT21x. Since you were quoting his words, I supposed (and probably everyone else did as well) you're talking about the same thing. I still don't see how the 6:1 number is relevant here...
He is saying he expects 32 SPs per TPC for GT21x (32/8 = 4:1). I simply said 32 or 40 does seem likely, but that I will be disappointed if GT3xx does not have 48 SPs per TPC (48/8 = 6:1). Clear enough for you?
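
The per-TPC arithmetic above is easy to tabulate. A minimal sketch, assuming 8 TMUs per TPC throughout (that is the "/8" in the post) and using a made-up helper name:

```python
from fractions import Fraction

TMUS_PER_TPC = 8  # assumption carried over from the "/8" in the post


def sp_tmu_ratio(sps_per_tpc, tmus_per_tpc=TMUS_PER_TPC):
    """Return the SP:TMU ratio as a reduced 'a:b' string."""
    ratio = Fraction(sps_per_tpc, tmus_per_tpc)
    return f"{ratio.numerator}:{ratio.denominator}"


# The three SP-per-TPC counts mentioned in the post.
for sps in (32, 40, 48):
    print(f"{sps} SPs per TPC -> {sp_tmu_ratio(sps)}")
```

So the 40 SP case sits at 5:1, between the 4:1 expected for GT21x and the 6:1 hoped for in GT3xx.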

Jawed
04-Feb-2009, 20:48
I'm not sure that GT200(b) vs RV770 situation is a typical one. GT200 is too slow for its die size and not because of bandwidth shortage but more because of inefficient design (especially for MSAA 8x) and low transistor density.
Yeah, the ROPs are in dire need of an overhaul - been saying it for years now.

Jawed

trinibwoy
04-Feb-2009, 20:50
He is saying he expects 32 SPs per TPC for GT21x (32/8 = 4:1). I simply said 32 or 40 does seem likely, but that I will be disappointed if GT3xx does not have 48 SPs per TPC (48/8 = 6:1). Clear enough for you?

Disappointed from a technical or practical sense? Are games sufficiently shader bound to warrant that kind of increase in the ratio?

btw - looks like Hardware-info has put up a nice little table with KonKort's GT218 info (http://www.hardware-infos.com/news.php?news=2715). So basically a 9400GT with twice the shaders.

ninelven
04-Feb-2009, 21:07
Disappointed from a technical or practical sense? Are games sufficiently shader bound to warrant that kind of increase in the ratio?
Both really.... For the high-end I would say yes it is warranted... 2560x1600 and 1920x1200 are quite demanding. Even then, it would be a less severe ratio than what AMD is currently using.

Jawed
04-Feb-2009, 21:41
Don't be so sure that AMD's DX11 architecture would resemble RV770 that closely. Who knows how they're going to rejig their shader core to better handle general computing workloads. They're definitely suffering in that regard at the moment in anything that isn't very coherent and multi-component to nicely fill those 5-way ALUs (It does quite well in FFT's for example).
I seriously doubt AMD will be changing the 5-lane configuration any time soon.

Comparing performance per mm2:

http://forum.beyond3d.com/showpost.php?p=1260895&postcount=67
http://forum.beyond3d.com/showpost.php?p=1260970&postcount=69

HD4870 as a percentage of GTX285, both on 55nm:

float MAD serial - 68%
float4 MAD parallel - 327%
float SQRT serial - 265%
Float 5-inst. Issue - 287%
int MAD serial - 164%
int4 MAD parallel - 335%

Then there's double-precision.

I'm certainly intrigued to find out if LDS/GDS are enough for "high performance" in D3D11 CS and OpenCL shared memory. I suspect more work's needed, but since we know so damn little about these things...

And dynamic branching performance is still an open question when comparing the two architectures, as there's so little data :cry:

---

Rather than increasing the MAD:MI ratio as Arun keeps suggesting, I think (somewhat idly, of course) NVidia would be better off just deleting MI entirely and doing those calculations (transcendentals and attribute interpolation) in software. It would reduce the register file bandwidth problems they have and remove a whole load of instruction dependency complexity from both the compiler and the scoreboards. Use the area saved to add SIMDs...

Jawed

trinibwoy
04-Feb-2009, 22:15
Both really.... For the high-end I would say yes it is warranted... 2560x1600 and 1920x1200 are quite demanding. Even then, it would be a less severe ratio than what AMD is currently using.

I've always considered an increase in resolution to be a linear increase in both shading and texturing workload. But maybe you're right.

I seriously doubt AMD will be changing the 5-lane configuration any time soon.

Comparing performance per mm2:

http://forum.beyond3d.com/showpost.php?p=1260895&postcount=67
http://forum.beyond3d.com/showpost.php?p=1260970&postcount=69


HD4870 as a percentage of GTX285, both on 55nm:
float MAD serial - 68%
float4 MAD parallel - 327%
float SQRT serial - 265%
Float 5-inst. Issue - 287%
int MAD serial - 164%
int4 MAD parallel - 335%

Well a wide SIMD architecture is always going to look good in pure throughput tests like those. But what about more realistic workloads like these (http://www.bit-tech.net/hardware/graphics/2008/09/02/ati-radeon-4850-4870-architecture-review/9)? And doesn't more general code have a lot more scalar dependencies by nature since it's not working against vectorized data as much as a typical 3D process would?


I'm certainly intrigued to find out if LDS/GDS are enough for "high performance" in D3D11 CS and OpenCL shared memory. I suspect more work's needed, but since we know so damn little about these things...

Well F@H performance seems to say that they're not enough....

And dynamic branching performance is still an open question when comparing the two architectures, as there's so little data :cry:

True, it's just that right now a branch costs AMD at least 5x what it costs Nvidia in terms of idle resources. That's gotta catch up to them at some point.

Lukfi
04-Feb-2009, 22:42
He is saying he expects 32 SPs per TPC for GT21x (32/8 = 4:1). I simply said 32 or 40 does seem likely, but that I will be disappointed if GT3xx does not have 48 SPs per TPC (48/8 = 6:1). Clear enough for you?
It is now. Sorry for misunderstanding you the first time :embarassed:

=>Jawed & trinibwoy: Shouldn't ATI and nVidia focus on graphics in the first place, GPGPU in the second? Or did I miss the GPU transforming itself from "a graphics processor that can do some general computing by the way" into "a general purpose processor that can do graphics by the way"?

Arun
04-Feb-2009, 22:56
Rather than increasing the MAD:MI ratio as Arun keeps suggesting, I think (somewhat idly, of course) NVidia would be better off just deleting MI entirely and doing those calculations (transcendentals and attribute interpolation) in software.
Have you seen the Larrabee graphs indicating how much of the processing power would go to interpolation (and I think that included transcendentals?) - it's terrifying. Something like 25% of the entire workload... So I'm not really convinced that's an option! ;) We discussed that combo SFU/Interpolator patent a lot way back in the day, and I really don't think it's easy to do that with sufficient quality and low enough cost without a dedicated unit.

However...
It would reduce the register file bandwidth problems they have and remove a whole load of instruction dependency complexity from both the compiler and the scoreboards. Use the area saved to add SIMDs...
Remember the register bandwidth problem isn't related to interpolation or SFU. In that case, it's just fine; that's what it was designed for! The problem is for the MUL which requires *two* register reads, instead of 0 (!!) for interpolations and theoretically 1 for the SFU.

An argument could easily be made for the removal of the MUL from the SFU/Interpolation unit, offloading that to the main ALU. Whether that is actually an ideal use of resources given the overhead of a programmable processor, I'm not sure. It depends on how expensive tricks to somehow still expose that unit are, and I have no idea there.

Jawed
04-Feb-2009, 23:47
Well a wide SIMD architecture is always going to look good in pure throughput tests like those. But what about more realistic workloads like these (http://www.bit-tech.net/hardware/graphics/2008/09/02/ati-radeon-4850-4870-architecture-review/9)?
:?:

Which of those is an ALU-specific test? I know 3DMark06 Perlin Noise is ALU-bound (just about).

And doesn't more general code have a lot more scalar dependencies by nature since it's not working against vectorized data as much as a typical 3D process would?
HD4870 can't get any slower than the serial MAD test I linked, i.e. 68% performance per mm2 or 37% of the absolute performance of GTX285.

As to the "nature" of more general code, the issue is really about the memory system. Some general code is so compute bound it barely uses any kind of memory resources, either video RAM or on-die shared RAM - just registers, basically. That code will be quite happy in naive scalar form.

But any time bandwidth/latency are part of performance you have to forget about programming a scalar machine in purely scalar terms. You're now programming a vector memory architecture. Gathers should be maximally coherent, you don't want to induce waterfalls in register/constant fetches and the memory system needs nice aligned operations to maximise memory controller and cache performance.

The SIMDness of the GPU, the 32-wide batches, is simply not enough to save you. By vectorising your use of data you're naturally making it work well on a vector GPU. It's why texturing is in quads, because the cost of not doing so is terrible.
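
To put a rough number on the "nice aligned operations" point: a sketch that counts how many memory segments one 32-wide batch touches for a given access stride. The 4-byte element size and the 64-byte segment size are assumptions picked for illustration, not figures for any particular chip, and the function name is made up.

```python
def segments_touched(stride_elems, threads=32, elem_bytes=4, segment_bytes=64):
    """Count distinct memory segments hit when each thread t of a batch
    reads element t * stride_elems from a base address of 0."""
    addresses = (t * stride_elems * elem_bytes for t in range(threads))
    return len({addr // segment_bytes for addr in addresses})


# Contiguous reads versus increasingly scattered ones.
for stride in (1, 2, 16):
    print(f"stride {stride:2d}: {segments_touched(stride)} segments per batch")
```

With these assumed sizes a contiguous read touches 2 segments, while a stride of 16 elements puts every thread in its own segment, so the same batch needs 16 times as many fetches for the same amount of useful data.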

Well F@H performance seems to say that they're not enough....
Eh? Until AMD re-writes the core to use LDS/GDS, F@H tells us precisely nothing.

True, it's just that right now a branch costs AMD at least 5x what it costs Nvidia in terms of idle resources. That's gotta catch up to them at some point.
I think large batch sizes are a far more pressing problem. Oh, by the way, I've realised that the batch size of RV770 is really double what I've been thinking. Because a pair of batches runs together in the ALUs in AAAABBBBAAAABBBB etc., any incoherency in either batch kills the other batch, too.
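
A toy model of why batch size matters so much here. It assumes every thread takes the branch independently with the same probability, which real pixel workloads do not (neighbouring pixels tend to agree), so treat it as a pessimistic sketch only. The sizes are 32 (the batch width mentioned earlier), 64 (RV770's wavefront size) and 128 (the effective doubling from running a pair of batches interleaved); the function name is made up.

```python
def divergent_fraction(p_taken, batch_size):
    """Probability that a batch contains threads on both sides of a branch,
    assuming independent per-thread outcomes (a simplifying assumption)."""
    all_taken = p_taken ** batch_size
    none_taken = (1.0 - p_taken) ** batch_size
    return 1.0 - all_taken - none_taken


# How often a batch has to execute both paths when 1% of threads take the branch.
for size in (32, 64, 128):
    print(f"batch of {size:3d}: {divergent_fraction(0.01, size):.1%} divergent")
```

Even with only 1% of threads taking the rare path, the share of batches that have to run both sides climbs quickly as the batch gets bigger, which is the sense in which incoherency in one batch of an interleaved pair hurts the other.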

---

If GTX285's ALUs are ~25% of the die, that's about 118mm2. Meanwhile HD4870's ALUs are ~30% of the die, about 77mm2.



So a purely ALU-based comparison of performance per mm2 for HD4870 against GTX285:
float MAD serial - 57%
float4 MAD parallel - 273%
float SQRT serial - 221%
Float 5-inst. Issue - 239%
int MAD serial - 137%
int4 MAD parallel - 279%

Worst case, AMD's ALUs are 76% bigger than NVidia's when running serial scalar code. Most of the time they're effectively 50% of the size in terms of performance per mm2.
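
For anyone checking the numbers, the ALU-only figures follow from the whole-die percentages by rescaling with the two ALU area shares (~25% for GTX285, ~30% for HD4870). A sketch using only the figures quoted in this thread; the variable and function names are made up:

```python
import math

# HD4870 performance per mm2 as a percentage of GTX285, whole die (from the earlier post).
WHOLE_DIE_PCT = {
    "float MAD serial": 68,
    "float4 MAD parallel": 327,
    "float SQRT serial": 265,
    "Float 5-inst. Issue": 287,
    "int MAD serial": 164,
    "int4 MAD parallel": 335,
}

NV_ALU_SHARE_PCT = 25   # GTX285 ALUs as a share of the die, about 118mm2
AMD_ALU_SHARE_PCT = 30  # HD4870 ALUs as a share of the die, about 77mm2


def alu_only_pct(whole_die_pct):
    """Rescale a whole-die perf/mm2 percentage to an ALU-area-only one."""
    return whole_die_pct * NV_ALU_SHARE_PCT / AMD_ALU_SHARE_PCT


def round_half_up(x):
    """Round to the nearest integer, with .5 going up."""
    return math.floor(x + 0.5)


alu_only = {name: round_half_up(alu_only_pct(p)) for name, p in WHOLE_DIE_PCT.items()}
for name, pct in alu_only.items():
    print(f"{name} - {pct}%")

# Absolute performance for the serial MAD case, using die areas implied by the ALU shares.
amd_die_mm2 = 77 / 0.30
nv_die_mm2 = 118 / 0.25
absolute_serial_mad_pct = WHOLE_DIE_PCT["float MAD serial"] * amd_die_mm2 / nv_die_mm2

# How much more ALU area AMD needs per unit of serial scalar throughput.
serial_area_penalty = 100 / alu_only_pct(WHOLE_DIE_PCT["float MAD serial"]) - 1
print(f"absolute serial MAD: {absolute_serial_mad_pct:.0f}%")
print(f"serial ALU area penalty: {serial_area_penalty:.0%}")
```

The rescaling factor is just 25/30, so every entry in the ALU-only list is the whole-die entry multiplied by about 0.83.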

Jawed

Jawed
05-Feb-2009, 00:00
=>Jawed & trinibwoy: Shouldn't ATI and nVidia focus on graphics in the first place, GPGPU in the second? Or did I miss the GPU transforming itself from "a graphics processor that can do some general computing by the way" into "a general purpose processor that can do graphics by the way"?
GPUs as we know them are in a losing race with things like Larrabee. Just working out where the losing line is the fun bit :lol:

2012?

Meanwhile I'm hoping that the irregular data-structures and read-modify-write pixel shader functionality of D3D11 will force GPUs to rapidly junk a load of fixed function hardware: let's get rid of colour operations in the ROPs, pretty please. I admit, I dunno how much space that'd save (or how many extra FLOPs the GPU would gain using that space), but the infrastructure requirements, i.e. caching and data-paths that reach further into the GPU should speed-up the generalisation of GPUs.

Jawed

Jawed
05-Feb-2009, 01:32
Have you seen the Larrabee graphs indicating how much of the processing power would go to interpolation (and I think that included transcendentals?) - it's terrifying. Something like 25% of the entire workload...
Are you referring to the "Pixel Setup" data point in figures 13 and 14 in the Siggraph paper? That's about 10%.

And, I'm still looking, high and low, for any sign of a transcendental ALU in Larrabee. I'm assuming the Pixel Setup cost is running on a simulation of Larrabee's vector ALU without any dedicated interpolation/transcendental functionality.

So I'm not really convinced that's an option! ;) We discussed that combo SFU/Interpolator patent a lot way back in the day, and I really don't think it's easy to do that with sufficient quality and low enough cost without a dedicated unit.
I should go digging through Nick's description of his software renderer to see what he said about this stuff running on CPUs - but not tonight...

However...
Remember the register bandwidth problem isn't related to interpolation or SFU. In that case, it's just fine; that's what it was designed for! The problem is for the MUL which requires *two* register reads, instead of 0 (!!) for interpolations and theoretically 1 for the SFU.
Well, forgetting the register file for a second, all ALU operands have to come through the operand collectors, whether they're from the register file, shared memory, the constant cache, video memory or attribute parameter buffer.

Regardless, the operand collector is still bigger simply to deal with the increased bandwidth of a MAD+MI configuration.

An argument could easily be made for the removal of the MUL from the SFU/Interpolation unit, offloading that to the main ALU. Whether that is actually an ideal use of resources given the overhead of a programmable processor, I'm not sure. It depends on how expensive tricks to somehow still expose that unit are, and I have no idea there.
Really the argument comes down to how often graphics is bottlenecked on interpolation or transcendental operations. Currently NVidia has a 4:1 MAD:SF ratio - you're proposing an 8:1 ratio. ATI's ratio is lower since interpolation has dedicated ALUs, while transcendentals are 1/4 rate.

The way I see it both are legacies of GPU history: accelerated interpolation was a key part of getting texturing to work when most rendering cycles were texturing bottlenecked, and fast transcendentals were needed to get vertex shading at decent speeds (especially given how few vertex pipes there were). I wonder if there's any analysis of this stuff out there?

http://www.crhc.illinois.edu/TechReports/2008reports/08-2208.visarch.pdf

Will read properly tomorrow.

Jawed