NVIDIA Maxwell Speculation Thread

Is there even any 7.4Gbps GDDR5 available? (honest question). Let's just assume there is:

GTX680 = 195W TDP, 6.0 Gbps GDDR5
GTX770 = 230W TDP, 7.0 Gbps GDDR5

Real time power consumption between the two is fairly close, but it's definitely not a coincidence that the latter has such a high TDP despite core frequency differences being small.

Well, the GDDR5 speed should play a role, but there are two major differences between the 680 and 770 related to core speed: the base clock is 40 MHz higher and the turbo clock is only ~30 MHz higher. In practice, though, the 770 is rarely running between its base and boost clocks; nearly all the 770s I have seen work well above 1100 MHz. The higher TDP rating from Nvidia is there simply to allow the turbo to clock higher, which is why on the 770 you hit the temperature limit set by Nvidia before you hit the TDP limit (I'm talking about the reference design, of course).
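Just to put rough numbers on that exchange, here's a quick back-of-the-envelope sketch (Python; the 256-bit bus width is the standard GK104 configuration, everything else is taken from the figures quoted above):

# Rough GTX 680 vs GTX 770 comparison, reference boards, using the figures
# quoted above. Bandwidth is simply data rate x bus width.

def gddr5_bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    # effective per-pin data rate (Gbps) times bus width, converted to GB/s
    return data_rate_gbps * bus_width_bits / 8

cards = {
    # name: (rated TDP in W, memory data rate in Gbps, bus width in bits)
    "GTX 680": (195, 6.0, 256),
    "GTX 770": (230, 7.0, 256),
}

for name, (tdp, rate, bus) in cards.items():
    print(f"{name}: {tdp} W TDP, {gddr5_bandwidth_gb_s(rate, bus):.0f} GB/s")

# GTX 680: 195 W TDP, 192 GB/s
# GTX 770: 230 W TDP, 224 GB/s
# i.e. ~17% more memory bandwidth and ~18% (35 W) higher rated TDP, while the
# base/boost clocks only differ by roughly 30-40 MHz.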
 
Rationale for a 20nm GPU : you run into an area concern with GM200, as posters previously said here.
The GM204 hypothesized above is Itanium-sized already, so GM200 is getting impossible or area limited.

If GM200 is made on 20nm instead, it makes sense for the 204 / 206 GPUs to be made on 20nm as well, so that experience, development, qualification, debugging and other such things can happen before fabbing the really big chip.
 
That would be true, and I wouldn't be questioning anything, if 20nm process development were going smoothly - which we know it's not.
In any case, I don't believe those specs are real. What I do believe is that some kind of GMx04 chip is coming on 28nm.
 
GF114 was at 350-something mm²; as for the rest, not bad, but I prefer round GPC counts for that type of chip category.

According to some benchmarks, GM107 has a far stronger front end than the previous generation. In Unigine Heaven, GM107 places itself between the GTX 650 Ti (2 GPCs, 128-bit) and the GTX 660 (3 GPCs, 192-bit), but closer to the GTX 660. IMO Maxwell's GPCs are far more powerful than the GPCs on Kepler. Three should be enough, but four is, of course, a nicer number and the same as GK104.
 
Hmm, and you would also think each GPC would have a higher number of SMMs (6?) than GM107? Because 3 GM107 GPCs would give a total of 15 SMMs, and thus 1920 cores, not the 2304 you wrote before?
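For what it's worth, the arithmetic behind that question is just GPCs x SMMs per GPC x 128 cores per SMM (a minimal sketch; the layouts below are only the hypothetical ones being tossed around here):

# Core counts for hypothetical GM204 layouts, assuming GM107-style SMMs with
# 128 cores each. The GPC/SMM splits below are just the ones discussed here.

CORES_PER_SMM = 128

def core_count(gpcs, smms_per_gpc):
    return gpcs * smms_per_gpc * CORES_PER_SMM

print(core_count(3, 5))   # 3 GM107-like GPCs (5 SMMs each) -> 1920
print(core_count(3, 6))   # 3 GPCs with 6 SMMs each         -> 2304
print(core_count(4, 5))   # 4 GPCs with 5 SMMs each         -> 2560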
 
GK106 has 3 GPCs, but 5 SMX.

I don't believe that SMs are forced directly into one GPC; there should be some load-balancing functionality. For 28nm, more than 20 SMMs for GM204 is unrealistic, especially since no performance chip from NV has ever really exceeded the ~350mm² range.

Edit: Every generation's performance chip has been able to outclass the previous generation's high-end monster by a small margin. I don't think 1920 SPs would be enough.
 
Ok, I was asking for clarification, not doubt :)
 
Ailuros, my point is that this card is supposedly replacing GK104, not GK110. Yes, the GTX 770 has a 230W TDP, but that's already pushing GK104 a lot. If they do the same to GM204, on 20nm, what does that tell us about Maxwell perf/watt? I know we still don't know the chip's performance, but why would they push the TDP so hard from the beginning? It doesn't make much sense to me, unless Maxwell is another Fermi-type event...

My point was and is that for some reason anything >6.0 Gbps GDDR5 seems to pump up peak power consumption way too much. I never said anywhere that those specs are real or believable; it's just that, given all the other data, the TDP is the last thing about them that's absurd.

3200 cores is only roughly 10% more than GK110's 2880. GM107 packed roughly 66% more cores than GK107 while increasing the die size much less and with no big increase in TDP. Ignoring for the moment the conspiracy theories about no interconnect on GM107, why couldn't they do the same for another Maxwell chip on 28nm?

Who says that GM204 ISN'T on 28HP? :p For the record's sake, with perfect scaling 20 SMMs should hypothetically get you GK110 +20%.
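One way to arrive at that figure - purely a sketch, and the ~1.35x per-core factor below is my own assumption read off the GM107 vs GTX 650 Ti / GTX 660 comparison earlier in the thread, not anything official:

# "GK110 + 20%" from 20 SMMs, assuming perfect scaling and that GM107's
# apparent per-core advantage over Kepler (~1.35x, an assumption based on
# 640 Maxwell cores landing between the 650 Ti's 768 and the 660's 960
# Kepler cores) carries over.

GK110_CORES = 2880
CORES_PER_SMM = 128
MAXWELL_PER_CORE_FACTOR = 1.35   # assumed, not an official figure

maxwell_cores = 20 * CORES_PER_SMM                           # 2560
kepler_equivalent = maxwell_cores * MAXWELL_PER_CORE_FACTOR  # ~3456

print(kepler_equivalent / GK110_CORES)   # ~1.2, i.e. roughly GK110 + 20%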

GK110: 7.1B transistors in 561 mm².
GM204 (28nm) would have roughly 11% more transistors, so the die would be about 5.83% larger at 593 mm². That is quite large, but it doesn't include further optimizations NVIDIA could eventually do to pack in more transistors, plus we are talking about a 256-bit memory controller versus GK110's 384-bit.

551 mm² for GK110, if we're hairsplitting.

In short, I still think this chip could maybe be doable on 28nm, which would justify the 230W TDP. Additionally, IF these specs are real, they normally leak not that far from the chip's launch - say, 3 months out. Would it be reasonable for a reasonably large 20nm chip to be coming in June/July?
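(To sanity-check that die-size arithmetic, a small sketch; the ~11% transistor bump and the 561/551 mm² figures are the ones quoted above, and the density is simply assumed to match GK110's own, which is of course not a given:)

# Sanity check on the 28nm GM204 die-size estimate. Density is assumed equal
# to GK110's own transistors/mm2, which is an assumption - Maxwell may well
# pack a bit denser, as GM107 suggests.

GK110_TRANSISTORS = 7.1e9
TRANSISTOR_BUMP = 1.11            # "roughly 11% more transistors"

for gk110_area_mm2 in (561, 551): # both GK110 figures quoted above
    density = GK110_TRANSISTORS / gk110_area_mm2
    gm204_area = GK110_TRANSISTORS * TRANSISTOR_BUMP / density
    print(f"GK110 at {gk110_area_mm2} mm2 -> GM204 at ~{gm204_area:.0f} mm2")

# -> ~623 mm2 and ~612 mm2 at GK110 density; the 593 mm2 figure above would
#    therefore need the extra transistors to be packed a few percent denser.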

Again I never said or implied they're real ;)
 
According to some benchmarks, the GM107 has far stronger front end, than previous generation. In Unigine Heaven the GM107 places itself between GTX650Ti(2GPCs; 128Bit) and GTX660(3GPCs;192Bit), but closer to the GTX660. IMO Maxwells GPCs are far more powerful, than the GPCs on Kepler. 3 should be enough, but 4 is, of course, a nicer number and the same as GK104.
Don't forget, the GPCs don't do all that much; most of the logic for handling tris is in the Polymorph Engines in the SMX / SMM. (The GPCs will determine the max rendered tris/clock, but it has to be said they don't actually scale all that well: according to benchmarks, 4 GPCs don't even really push twice the rendered tris/clock of 1.)
An SMX can do 1 tri per 2 clocks, and an SMM can do 1 tri per 3 clocks - but of course the GTX 750 Ti has 5 SMMs (5/3 tris/clock) whereas the GTX 650 had just 2 SMXs (1 tri/clock). So if tris are culled, GM107 can reach more than 1 tri/clock despite having only 1 GPC. GK106 is actually quite imbalanced, since it has 3 GPCs but its max theoretical tri rate is just 2.5 tris/clock anyway due to having only 5 SMX.
(As a side note, gk20a having 0.5 tris / clock is actually not an indication of a changed frontend, though it probably is different in any case, since a gk107/gk208 with one SMX disabled would also have that same rate.)
Damien had some good numbers and explanations for this: http://www.hardware.fr/articles/916-5/performances-theoriques-geometrie.html
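To put those numbers side by side, a quick sketch (the per-SM rates are the ones just stated - 1 tri per 2 clocks for an SMX, 1 per 3 for an SMM - and the SM counts are the known configurations of those parts):

# Theoretical peak (culled) triangle rates implied by the per-SM setup rates
# above: SMX = 1 tri / 2 clocks, SMM = 1 tri / 3 clocks.

SMX_RATE = 1 / 2   # tris per clock per SMX
SMM_RATE = 1 / 3   # tris per clock per SMM

chips = {
    # name: (number of SMs, tris/clock per SM)
    "GTX 650 (GK107, 2 SMX)":    (2, SMX_RATE),
    "GK106 (5 SMX, 3 GPCs)":     (5, SMX_RATE),
    "GTX 750 (GM107, 4 SMM)":    (4, SMM_RATE),
    "GTX 750 Ti (GM107, 5 SMM)": (5, SMM_RATE),
}

for name, (sms, rate) in chips.items():
    print(f"{name}: {sms * rate:.2f} tris/clock peak")

# -> 1.00, 2.50, 1.33 and 1.67 tris/clock respectively, matching the figures
#    discussed in this thread.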
 
Good to know. For some reason I thought triangle rate was GPC-based instead of SMX-based.
Well, that's not completely untrue - the hw cannot exceed 1 tri/clock per rasterizer for actually drawn tris. For Kepler (everything non-GK110), though, the GPCs are never the only limit, since there are only 2 SMX per GPC, which cannot handle more tris either. For GK110 and plain non-culled tris it _should_ be GPC limited; I'm not sure if there are any tests where GK110 gets anywhere close to its theoretical peak throughput of 5 tris/clock, however.

It makes sense that you can exceed rasterizer limits with culled tris, since culling doesn't require synchronization (it doesn't depend on other tris), so this distributed scheme (introduced with Fermi, essentially), where everything that can be done locally is done locally, makes a lot of sense in theory.

With Maxwell it should definitely be more GPC limited if most of the tris aren't culled (as long as nvidia sticks to 5 SMM per GPC - and I can't see any reason why nvidia would want fewer SMM per GPC in other chips, even more so because the rasterizer is actually improved: still one tri/clock, but 16 pixels instead of 8 pixels per clock).
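A compact way to state that split - just formalizing the limits described above (1 tri/clock per GPC rasterizer for drawn tris, Polymorph/SM-limited for culled tris); the chip configurations are the usual ones:

# Drawn tris are capped at 1 tri / clock per rasterizer (one per GPC), while
# culled tris are only limited by the per-SM Polymorph rate, as described above.

def peak_tris_per_clock(gpcs, sms, per_sm_rate):
    culled = sms * per_sm_rate
    drawn = min(gpcs, culled)     # rasterizer cap only applies to drawn tris
    return drawn, culled

print(peak_tris_per_clock(4, 8, 1 / 2))    # GK104: drawn 4, culled 4 - limits coincide
print(peak_tris_per_clock(5, 15, 1 / 2))   # GK110: drawn 5, culled 7.5 - GPC limited when drawing
print(peak_tris_per_clock(1, 5, 1 / 3))    # GM107: drawn 1, culled ~1.67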
 
Haven't been keeping up with the latest, so roughly what timeframe is GM200 or "Big Maxwell" expected in - maybe around fall 2015?

I think it largely depends on the performance of GCN 2.0. If AMD pushes something spectacular that NVidia cannot counter with lower-specced GPUs, then they will need to use the biggest die.

Fall 2015 is ~18 months from now. I hope they release it in H1 2015 at the latest.
 
The big Maxwell will come as soon as it is fiscally viable. The HPC crowd will eat it up.

You seem to think that AMD is driving Nvidia's decisions on releasing products, which is far, far off the mark, especially where Big Maxwell is concerned. Case in point: the Maxwell 1.0 GM107 was released not because of something AMD did, but to hit a specific notebook build cycle. Since it was also much better than the then-current GK106, that part was retired, the GTX 750 Ti was released, and it was AMD who had to respond to that release.
 
I got this from Damien's excellent article: http://www.hardware.fr/articles/916-2/maxwell-1st-gen-28nm-apercu-global.html
I don't know if this is confirmed by nvidia or just based on measurements. I'd say, though, that the measurements definitely support that theory.
It's a bit complex. I've previously asked NV about the issue and have been told that the per-Polymorph rate is similar to Kepler, but there are other bottlenecks that cap the peak culling rate for GM107. Ultimately GM107 supports a sustained culling rate of 1.66 polygons/clock (which as NV likes to note, is nearly 2x that of GK107).
 
I am not sure what else could cap the peak culling rate if not the Polymorph Engines, so I suspect nvidia is being a bit creative with the term "similar rate" compared to Kepler (could it be similar per flop, maybe?). It is obvious the peak culling rate is not dependent on the GPCs (even GK110 already exceeds 5 tris/clock easily). Also, apart from the fewer ALUs, the SMMs themselves don't seem to be cut down in any other obvious way (things like shared memory access etc.). Damien's article actually specifically mentions the GTX 750 non-Ti having a 1.33 tris/clock peak culling rate (but he didn't actually measure it, or if he did, at least it isn't shown). Of course, there are a lot more things the Polymorph Engines do, and it's possible most of them aren't slower at all.
(In any case, even if they are only 2/3 the performance in general, I wouldn't really see that as a disadvantage at all, since obviously you've got more SMMs in a GPC; if that saved some die space and power, all the better.)
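(For what it's worth, the quoted figures line up exactly with a pure per-SM Polymorph limit, which is what makes the "other bottlenecks" wording a bit puzzling - a trivial check:)

# The quoted culling rates vs. a pure per-SM Polymorph limit:
print(5 * (1 / 3))   # GTX 750 Ti, 5 SMM -> 1.67 (NV quotes a sustained 1.66)
print(4 * (1 / 3))   # GTX 750, 4 SMM    -> 1.33 (the figure in Damien's article)
print(2 * (1 / 2))   # GK107, 2 SMX      -> 1.00 (so 1.66 is indeed "nearly 2x" GK107)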
 