NVIDIA Maxwell Speculation Thread

So, some sources (PCTuning, Tyden.cz, Expreview) claim that GTX 880 will be based on a 20nm GM204 with the following specs:

7.9 billion transistors
3200 CUDA cores
200 TMUs
40 ROPs
5.7 TFLOP/s single-precision floating-point throughput
256-bit wide GDDR5 memory interface
4 GB standard memory amount
238 GB/s memory bandwidth
Clock speeds of 900 MHz core, 950 MHz GPU Boost, 7.40 GHz memory
230W board power
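
A quick sanity check of the derived figures in that list, as a minimal sketch. It assumes the usual rules of thumb: bandwidth = bus width × per-pin data rate / 8, and single-precision throughput = cores × 2 FLOPs per clock (one fused multiply-add) × clock. The function names are only for illustration.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


def fp32_tflops(cuda_cores, clock_mhz):
    """Peak FP32 throughput in TFLOP/s, assuming 2 FLOPs per core per clock."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12


bandwidth = memory_bandwidth_gbs(256, 7.40)   # rumored bus width and memory speed
tflops_base = fp32_tflops(3200, 900)          # rumored core clock
tflops_boost = fp32_tflops(3200, 950)         # rumored GPU Boost clock

print(f"bandwidth: {bandwidth:.1f} GB/s")
print(f"FP32 at base clock: {tflops_base:.2f} TFLOP/s")
print(f"FP32 at boost clock: {tflops_boost:.2f} TFLOP/s")
```

Under those assumptions the rumored 5.7 TFLOP/s matches the 900 MHz base clock, and the rumored 238 GB/s is within about 1 GB/s of what a 256-bit bus at 7.40 Gbps gives.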

20nm, 7.9 billion transistors and 230W board power??? I would believe it more if it was 28nm... Also, only 40 ROPs? They must be very powerful then to perform at 4K resolutions.
 
20nm, 7.9 billion transistors and 230W board power??? I would believe it more if it was 28nm... Also, only 40 ROPs? They must be very powerful then to perform at 4K resolutions.

I wouldn't be surprised at all if 4K is not a priority yet. :LOL:

See, they have even prices for it. :LOL: Might be all wrong all over the place!
 
20nm, 7.9 billion transistors and 230W board power??? I would believe it more if it was 28nm... Also, only 40 ROPs? They must be very powerful then to perform at 4K resolutions.

TSMC's website advertises 20nm as only a 25% power improvement over 28nm. My guess is that real-world savings are worse, so this would be in line with that.
 
TSMC's website advertises 20nm as only a 25% power improvement over 28nm. My guess is that real-world savings are worse, so this would be in line with that.

So, given that 20nm is more expensive than 28nm, what would be the point for nVIDIA to do it in 20nm? Yes, it can be a smaller chip, but since it is also more expensive, where are the advantages of going with it? IMO, I think that, if there is any truth to this data, they only had the chip specs and extrapolated the process, just "because".

Plus, GM204 is the successor to GK104, not GK110. So there would be ZERO power saved, quite the contrary, since GK104 is a sub-200W TDP chip. This would be a huge regression in perf/watt, not the advance Maxwell is supposed to bring on that front.

I still bet this is a 28nm chip, and a big one at that, although smaller than GK110, given the absence of many DP units like on GK104.

EDIT - Most people saw the fact that the chip is called GM204, instead of GM104, as a sign that it should be a 20nm chip. What IF there was an original GM104, targeted for 20nm, which was canned/backported for 28nm, and thus the rename to GM204?
 
The good news is that 3200 is divisible by 128. 3200 is the smallest number that's both a multiple of 100 and 128! What's more, multiply it by three and then consecutively by two, you get 9600, 19200, 38400. Closely related are the numbers 14400, 28800, 57600 and 115200 : these latter numbers are divisible by 360, which is an awesomely divisible number.

Titan Z's advantage is it has 5760 little things, and that's the least common multiple of 360 and 128.
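
For anyone who wants to check the arithmetic in this post, here is a small sketch. The `lcm` helper is written out by hand so it does not depend on a recent Python version.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Smallest number that is a multiple of both 100 and 128.
lcm_100_128 = lcm(100, 128)

# Multiply by three, then keep doubling.
chain = [lcm_100_128 * 3 * 2**k for k in range(3)]

# The "closely related" numbers, checked for divisibility by 360.
related = [14400, 28800, 57600, 115200]
all_divisible_by_360 = all(n % 360 == 0 for n in related)

# The Titan Z core count as the least common multiple of 360 and 128.
lcm_360_128 = lcm(360, 128)

print(lcm_100_128, chain, all_divisible_by_360, lcm_360_128)
```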
 
It seems that you should really [strike]use Mantle[/strike] push the CPU utilization when developing your games if you want the best performance on NVIDIA hardware… :D

Fixed.

Mantle, ironically, only helps highlight one of the strong points of NV's DirectX driver: more efficient CPU utilization. This was evident long before the 'graphics API gets in the way' discourse surfaced, but few took notice.

This old thread, which blames the developer instead of the IHV that implements the D3D API inefficiently in its driver, comes to mind.
 
Those specs seem not to be very legit.

GM204 should be around 300 mm² (GF114: 332 mm²; GK104: 294 mm²).

I'll take a 28nm approach:
3 GPCs
18 SMMs
~2304SPs
4-6MB L2 Cache
256-384Bit MI
~320mm²
~Titan + ~15%
 
The good news is that 3200 is divisible by 128. 3200 is the smallest number that's both a multiple of 100 and 128! What's more, multiply it by three and then consecutively by two, you get 9600, 19200, 38400. Closely related are the numbers 14400, 28800, 57600 and 115200 : these latter numbers are divisible by 360, which is an awesomely divisible number.

Titan Z's advantage is it has 5760 little things, and that's the least common multiple of 360 and 128.

I am hesitating between considering your post serious or irony/parody. :???:
 
So, given that 20nm is more expensive than 28nm, what would be the point for nVIDIA to do it in 20nm? Yes, it can be a smaller chip, but since it is also more expensive, where are the advantages of going with it? IMO, I think that, if there is any truth to this data, they only had the chip specs and extrapolated the process, just "because".

Performance. If you're at the limit of die size and performance at 28nm and you need to increase that, what do you do? You have to go to the lower process even if it's more expensive and offers little power savings. You can offset the cost of the process by charging more and they seem to have no problem doing that at the high end where performance is the driver.

EDIT - Most people saw the fact that the chip is called GM204, instead of GM104, as a sign that it should be a 20nm chip. What IF there was an original GM104, targeted for 20nm, which was canned/backported for 28nm, and thus the rename to GM204?

It's a possibility, but I'm not sure how likely. GK110 is already 561 mm² / 7.1B transistors (at least according to Wikipedia). A 7.9B transistor chip would be over 600 mm² at 28nm, and I'm not sure that's feasible. The fact that it's supposed to be on a 256-bit bus may offer some area savings, but still that's a very, very large chip. At 20nm, it would probably be in the 250-350 mm² die size range.
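
The "over 600 mm² at 28nm" figure can be reproduced with a rough sketch that assumes the rumored chip would have the same transistor density as GK110. That is a simplification: density varies with the mix of logic, cache, and I/O on the die.

```python
# GK110 figures as quoted in the post (Wikipedia numbers).
GK110_TRANSISTORS = 7.1e9
GK110_AREA_MM2 = 561.0

# Rumored GM204 transistor count.
GM204_TRANSISTORS = 7.9e9

# Transistors per mm², assumed constant between the two chips.
gk110_density = GK110_TRANSISTORS / GK110_AREA_MM2

# Estimated GM204 die area at 28nm under that assumption.
gm204_area_28nm = GM204_TRANSISTORS / gk110_density

print(f"GK110 density: {gk110_density / 1e6:.2f} M transistors/mm^2")
print(f"GM204 at GK110 density: {gm204_area_28nm:.0f} mm^2")
```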
 
So there are 10 ROPs per memory controller (or some related configuration)? That's a bit of an odd number.
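
The 10-per-controller figure follows if the 256-bit interface is built from 64-bit memory controllers, which is an assumption here. A minimal sketch, with GK104 and GK110 included for comparison using their commonly listed ROP counts and bus widths:

```python
CONTROLLER_WIDTH_BITS = 64  # assumed width of one memory controller


def rops_per_controller(rops, bus_width_bits):
    """ROPs per memory controller, assuming 64-bit controllers."""
    controllers = bus_width_bits // CONTROLLER_WIDTH_BITS
    return rops / controllers


# (ROPs, bus width in bits); the GM204 entry is the rumored configuration.
chips = {
    "GK104": (32, 256),
    "GK110": (48, 384),
    "GM204 (rumor)": (40, 256),
}

ratios = {name: rops_per_controller(*cfg) for name, cfg in chips.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:g} ROPs per controller")
```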

Plus, GM204 is the successor to GK104, not GK110. So there would be ZERO power saved, quite the contrary, since GK104 is a sub-200W TDP chip. This would be a huge regression in perf/watt, not the advance Maxwell is supposed to bring on that front.
Yes in general but remember that the 770 is also a 230 W part, although I got the impression that it was stretched in that area.

EDIT - Most people saw the fact that the chip is called GM204, instead of GM104, as a sign that it should be a 20nm chip. What IF there was an original GM104, targeted for 20nm, which was canned/backported for 28nm, and thus the rename to GM204?
I think the "1" vs. "2" refers to the 1st gen vs. 2nd gen* Maxwell chips, and they don't automatically indicate which process they are on. I wouldn't be surprised if there was a Maxwell performance chip originally planned for 20 nm. I wonder when they decided on multiple generations of Maxwell.

* The "1st gen" Maxwell generally implies at least the plan of a 2nd gen, but off the top of my head I don't recall anything straight from NVIDIA about the existence of any 2nd gen.
 
The good news is that 3200 is divisible by 128. 3200 is the smallest number that's both a multiple of 100 and 128! What's more, multiply it by three and then consecutively by two, you get 9600, 19200, 38400. Closely related are the numbers 14400, 28800, 57600 and 115200 : these latter numbers are divisible by 360, which is an awesomely divisible number.

Titan Z's advantage is it has 5760 little things, and that's the least common multiple of 360 and 128.


Reading this post I couldn't get Dial-up from my mind! Wonder why :devilish:
 
Performance. If you're at the limit of die size and performance at 28nm and you need to increase that, what do you do? You have to go to the lower process even if it's more expensive and offers little power savings. You can offset the cost of the process by charging more and they seem to have no problem doing that at the high end where performance is the driver.

It's a possibility, but I'm not sure how likely. GK110 is already 561 mm² / 7.1B transistors (at least according to Wikipedia). A 7.9B transistor chip would be over 600 mm² at 28nm, and I'm not sure that's feasible. The fact that it's supposed to be on a 256-bit bus may offer some area savings, but still that's a very, very large chip. At 20nm, it would probably be in the 250-350 mm² die size range.

If GM107 is any indication, they managed to increase density a bit. It might allow them to have a decently sized chip on 28nm with those specs. Plus, there might be far fewer DP units, which saves die space.

GK104 is 294 mm2 with 1536 cores. Double that for 3072 cores and it gives 588 mm2. Take out the other 256 bit of memory bus and some ROPs/TMUs, factor in the improved transistor density from GM107 and it might not be so far fetched, IMO.
 
20nm, 7.9 billion transistors and 230W board power??? I would believe it more if it was 28nm... Also, only 40 ROPs? They must be very powerful then to perform at 4K resolutions.

Is there even any 7.4Gbps GDDR5 available? (honest question). Let's just assume there is:

GTX680 = 195W TDP, 6.0 Gbps GDDR5
GTX770 = 230W TDP, 7.0 Gbps GDDR5

Real time power consumption between the two is fairly close, but it's definitely not a coincidence that the latter has such a high TDP despite core frequency differences being small.

Now that doesn't verify or dispel anything, of course. I think, but am not sure, that the original text under that table also claimed a number of Denver cores... *lifts shoulders* :rolleyes:

Not that it is of any importance, but your 28 vs. 20nm point is a bit weird. Whether real or not, the config for that hypothetical GM204 suggests 7.9b transistors; considering the TDP is set at 230W and even a GTX780Ti with 7.1b transistors under 28HP is at 250W, there's nothing absurd about that, at least not from the few miles of distance I'd like to keep from that table :devilish:
 
Those specs seem not to be very legit.

GM204 should be around 300 mm² (GF114: 332 mm²; GK104: 294 mm²).

I'll take a 28nm approach:
3 GPCs
18 SMMs
~2304SPs
4-6MB L2 Cache
256-384Bit MI
~320mm²
~Titan + ~15%

GF114 was at 350 something; as for the rest not bad but I prefer round GPC amounts for that type of chip category.
 
Not that it is of any importance, but your 28 vs. 20nm point is a bit weird. Whether real or not, the config for that hypothetical GM204 suggests 7.9b transistors; considering the TDP is set at 230W and even a GTX780Ti with 7.1b transistors under 28HP is at 250W, there's nothing absurd about that, at least not from the few miles of distance I'd like to keep from that table :devilish:

Ailuros, my point is that this card is supposedly replacing GK104, not GK110. Yes, GTX770 has a 230W TDP, but that is already pushing GK104 a lot. If they do the same to GM204, on 20nm, what does that tell us about Maxwell perf/watt? I know we still have to know the chip's performance, but why would they push the TDP so hard from the beginning? It all doesn't make much sense to me, unless Maxwell is another Fermi type event...

I know you are giving relevance to GDDR5 power consumption, but as I think we have seen with GM107, the larger L2 cache has helped the chip somewhat overcome the 128-bit memory bus and low-speed memory. GM107 can approach GTX650 Ti Boost performance levels with 60% of the latter's memory bandwidth. Does nVidia really need 7.4 Gbps memory on GM204? A rough extrapolation would put 60% of GK110's memory bandwidth at about 200 GB/s, not that far off from GTX770's 224 GB/s.
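
The 60% figure and the extrapolation can be sketched as below. The bus widths and memory speeds are not from this thread; they are the reference-card numbers as I recall them (GTX 750 Ti for GM107, GTX 780 Ti for GK110), so treat them as assumptions.

```python
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8


# Assumed reference specs: (bus width in bits, memory data rate in Gbps).
gtx_750_ti = bandwidth_gbs(128, 5.4)        # GM107
gtx_650_ti_boost = bandwidth_gbs(192, 6.0)
gtx_780_ti = bandwidth_gbs(384, 7.0)        # GK110
gtx_770 = bandwidth_gbs(256, 7.0)           # GK104

# Fraction of GTX 650 Ti Boost bandwidth that GM107 gets by with.
gm107_fraction = gtx_750_ti / gtx_650_ti_boost

# Apply the same fraction to GK110 as a rough target for GM204.
gm204_estimate = gm107_fraction * gtx_780_ti

print(f"GM107 fraction: {gm107_fraction:.0%}")
print(f"GM204 estimate: {gm204_estimate:.1f} GB/s vs GTX 770 at {gtx_770:.0f} GB/s")
```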

3200 cores is only roughly 10% more than GK110 2880. Gm107 packed roughly 66% more cores than gk107, while increasing the die size much less and no big increase in TDP. Ignoring for the moment the conspiracy theories about no interconnect on gm107, why couldn't they do the same for another Maxwell chip on 28nm?

GK107 - 1.3B Transistors in 118mm2
GM107 - 1.87B Transistors in 146mm2

So, 43% more Transistors in 23% larger die size, or a 0.53% increase in die size per each 1% increase in Transistors number.

GK110 - 7.1B Transistors in 561mm2.
GM204 (28nm) would have roughly 11% more transistors, so die size would be 5.83% larger at 593 mm2. It is quite large, but this does not include further optimizations nVIDIA could eventually do to pack more transistors, plus we are talking about a 256-bit memory controller, versus GK110's 384-bit.
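
The same extrapolation as a small sketch, without the intermediate rounding. It assumes, as the estimate above does, that die area grows linearly with transistor count at the ratio seen going from GK107 to GM107, which is a simplification.

```python
def growth(new, old):
    """Fractional increase from old to new."""
    return new / old - 1


# GK107 -> GM107, figures as quoted above.
transistor_growth = growth(1.87e9, 1.3e9)
area_growth = growth(146.0, 118.0)

# Percent of area growth per percent of transistor growth.
area_per_transistor = area_growth / transistor_growth

# Apply that ratio to GK110 -> rumored GM204.
gm204_transistor_growth = growth(7.9e9, 7.1e9)
gm204_area = 561.0 * (1 + gm204_transistor_growth * area_per_transistor)

print(f"area growth per unit of transistor growth: {area_per_transistor:.2f}")
print(f"estimated GM204 die size at 28nm: {gm204_area:.0f} mm^2")
```

Unrounded, this lands at roughly 595 mm², a couple of mm² above the 593 mm² obtained with the rounded 43%, 23% and 11% figures.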

In short, I still think this chip could, maybe, be doable in 28nm, justifying the 230W TDP. Additionally, IF these specs are real, they normally leak not so far from the chip launch, say 3 months. Would it be reasonable for a reasonably large 20nm chip to be coming in June/July?
 
Nvidia could improve the ROP performance like they did with Fermi over Tesla. Less units but same or better performance. Obviously the Maxwell 2nd gen ROPs will be tuned for performance & power efficiency and 4K res.

http://www.anandtech.com/show/3973/nvidias-geforce-gt-430/16

The end result is that GT 430 is effectively tied with these previous-generation cards, which is actually quite a remarkable feat for having half the ROPs.

NVIDIA worked on making the Fermi ROPs more efficient and it has paid off by letting them use 4 ROPs to do what took 8 in the last generation.

Same with the Polymorph Engine 2.0 tessellator, less units in GK104, 8 tessellators but outperforms GF110 16 tessellators significantly.

http://www.geeks3d.com/20140409/asus-geforce-gtx-750-gtx750-phoc-1gd5-review/

3.3 – GpuTest: TessMark X64

Settings: 1920×1080 fullscreen, no AA.

GTX 480: 4814 points, 80 FPS
GTX 750: 5740 points, 95 FPS

Interesting, with high level of tessellation, the GTX 750 is faster than the GTX 480.
 