NVIDIA Maxwell Speculation Thread

Picao84 · Feb 12, 2014

DSC said:
http://videocardz.com/49557/exclusive-nvidia-maxwell-gm107-architecture-unveiled

Thanks.

Is anyone else finding this tidbit a bit too much for a 28nm 60W chip?

GM107 will replace GK107 with a performance of GeForce GTX 480
You should find this particularly interesting. While GM107 utilizes 4 times less power than Fermi GF100, it will offer the same performance (actually even slightly better).

EDIT 1 - If it is true, WOW, having the power of a GTX 480 in a decent, not-so-expensive laptop!

WANT!!!!

EDIT 2 - However, with such low memory bandwidth, it will probably be quite a bit slower than the GTX 480 at high resolutions with 4x AA.
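To put rough numbers on the bandwidth concern: peak GDDR5 bandwidth is the bus width in bytes times the effective data rate. The GTX 480 figures below (384-bit bus, 3696 MT/s) are its shipping specs; the GM107 configuration (128-bit bus like GK107, 5400 MT/s as a "sub-6GHz" guess) is a hypothetical assumption, not a confirmed spec.

```python
def gddr5_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times effective transfers per second."""
    return bus_width_bits / 8 * data_rate_mtps * 1e6 / 1e9

# GTX 480 (GF100): 384-bit bus, GDDR5 at 3696 MT/s effective.
gtx480 = gddr5_bandwidth_gbs(384, 3696)

# Hypothetical GM107 board: 128-bit bus and 5400 MT/s are assumptions for illustration.
gm107_guess = gddr5_bandwidth_gbs(128, 5400)

print(f"GTX 480:       {gtx480:.1f} GB/s")       # 177.4 GB/s
print(f"GM107 (guess): {gm107_guess:.1f} GB/s")  # 86.4 GB/s
print(f"ratio:         {gm107_guess / gtx480:.2f}")  # 0.49
```

Under these assumptions the new chip would have roughly half the GTX 480's raw bandwidth, which is why any parity would have to come from caching or compression rather than the memory bus.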

fellix · Feb 12, 2014

Weird.

So, the new GPC configuration is 5 multiprocessors now? How much for the big Maxwell -- 4xGPC & 2560 ALUs? That would make for 450~480mm² die on 28nm, if the GPC share is roughly 50% of the whole IC logic.
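fellix's figures can be reproduced with simple arithmetic. The sketch assumes 5 SMMs per GPC (from the leaked diagram) and 128 ALUs per SMM (four 32-wide processing blocks, an assumption based on the diagrams); it then backs out how large one GPC would have to be for his 450~480mm² estimate to hold at a 50% GPC share.

```python
ALUS_PER_BLOCK = 32   # assumption: each processing block is 32 ALUs wide
BLOCKS_PER_SMM = 4    # "SM has been redesigned into four processing blocks"
SMMS_PER_GPC = 5      # per the leaked GM107 diagram

def alu_count(gpcs: int) -> int:
    """Total ALUs for a chip with the given number of GPCs."""
    return gpcs * SMMS_PER_GPC * BLOCKS_PER_SMM * ALUS_PER_BLOCK

def implied_gpc_area_mm2(die_mm2: float, gpc_share: float, gpcs: int) -> float:
    """Area of one GPC if all GPCs together take `gpc_share` of a `die_mm2` die."""
    return die_mm2 * gpc_share / gpcs

print(alu_count(4))                       # 2560
print(implied_gpc_area_mm2(450, 0.5, 4))  # 56.25
print(implied_gpc_area_mm2(480, 0.5, 4))  # 60.0
```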

mczak · Feb 12, 2014

Picao84 said:
Is anyone else finding this tidbit a bit too much for a 28nm 60W chip?

Yes. I don't think it can quite reach GTX 480 performance in general; the numbers just don't add up. There might be some benchmarks where it's really close, though.

And I really had to laugh about this:

Larger L2 cache.
This is the main difference between Kepler and Maxwell. Larger L2 cache will limit the queries to the GPU. GM107 L2 cache has 2MB. GK107's cache has 256KB.

You'd think the SMX reorganization would be a much bigger change than a (rather trivial) increase in cache size (compared to GK208, which already has quadrupled the L2 size per MC over GK107, it's only a doubling anyway)... It does seem to imply, though, that GPUs are going the way of CPUs: traditionally GPUs had tiny L2 caches (but lots of "cache" in the form of registers). Maybe it really helps with some new framebuffer compression tricks (I'm still amazed if these products really use sub-6GHz GDDR5 memory). GF100 had just 768KB (and that was already considered a lot, as GT200 only had 256KB), so seeing 2MB in a midrange part certainly ups the stakes. Heck, even Hawaii only has 1MB... That of course assumes the 2MB figure is actually true (I have no idea if this source is trustworthy; it certainly sounds like a lot!).
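The "quadrupled per MC, then only a doubling" remark can be checked by dividing L2 size by the number of memory controllers. This sketch assumes one 64-bit memory controller partition per 64 bits of bus width, and bus widths of 128-bit for GK107, 64-bit for GK208, and 128-bit for the rumoured GM107 (the GM107 width is an assumption, not from the leak).

```python
MC_WIDTH_BITS = 64  # assumption: one 64-bit memory controller partition per 64 bits of bus

# name: (L2 size in KB, memory bus width in bits); GM107 bus width is assumed
chips = {
    "GK107": (256, 128),
    "GK208": (512, 64),
    "GM107 (rumoured)": (2048, 128),
}

def l2_per_mc_kb(l2_kb: int, bus_bits: int) -> float:
    """L2 capacity attached to each memory controller partition."""
    controllers = bus_bits // MC_WIDTH_BITS
    return l2_kb / controllers

for name, (l2, bus) in chips.items():
    print(f"{name}: {l2_per_mc_kb(l2, bus):.0f} KB per MC")
# GK107: 128, GK208: 512 (4x GK107), GM107: 1024 (2x GK208)
```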

dnavas · Feb 12, 2014

DSC said:

The hierarchy is logical, considering they care about data locality.
If the light blue is cache, the darker blue is tex/ROP, red is dispatch and orange/yellow is the scheduler, where are the SFUs?

iMacmatician · Feb 12, 2014

fellix said:
Weird.

So, the new GPC configuration is 5 multiprocessors now? How much for the big Maxwell -- 4xGPC & 2560 ALUs? That would make for 450~480mm² die on 28nm, if the GPC share is roughly 50% of the whole IC logic.

My guess is 2 GPCs for GM206, 4 for GM204, and around 6 for GM200, depending on the number of SMMs per GPC (it changed from GK104 to GK110).

mczak said:
And I really had to laugh about this:

You'd think the SMX reorganization would be a much bigger change than a (rather trivial) increase in cache size (compared to GK208, which already has quadrupled the L2 size per MC over GK107, it's only a doubling anyway)...

How significant architecturally would the SMX reorganization be?

dnavas said:
The hierarchy is logical, considering they care about data locality.
If the light blue is cache, the darker blue is tex/ROP, red is dispatch and orange/yellow is the scheduler, where are the SFUs?

Is it possible that there are no SFUs anymore?

DSC · Feb 12, 2014

256KB - GK107
384KB - GK106
512KB - GK208
512KB - GK104
768KB - GF100/GF110
1536KB - GK110

Is it really possible for 2MB L2 in a low-end GPU on the same 28nm without ballooning the die size?

silent_guy · Feb 12, 2014

Larger L2 cache will limit the queries to the GPU.

You never know with these journalists, but I'll charitably assume that this is a typo and that it should be 'memory' instead of 'GPU'.

mczak · Feb 12, 2014

iMacmatician said:
How significant architecturally would the SMX reorganization be?

Well, if they changed as much as those diagrams imply, that looks like quite a significant architectural change to me.

Is it possible that there are no SFUs anymore?

I hope so, as I already predicted that for Kepler.

DSC said:
Is it really possible for 2MB L2 in a low-end GPU on the same 28nm without ballooning the die size?

I can't see why not. Kabini's 2MB L2 cache (which I don't think is anything special or particularly dense) is below 20mm² including tags, I believe (I never saw a number for that; it's just a guess from the die shot). Granted, you'd probably need more cache bandwidth than Kabini provides, but I don't think size should be a particular problem. The problem with large L2 caches in GPUs has presumably just been that they didn't offer much of a performance benefit (so instead of a larger L2 they would rather put one more SMX on the die, or something along those lines). But maybe they are good for perf/W...
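Scaling that Kabini guess linearly gives a rough upper bound for the extra area. This is only a sketch: it assumes area scales with capacity and ignores the extra banking and ports a GPU L2 would likely need (the cache-bandwidth caveat above); the 20mm² reference is mczak's die-shot guess, not a published figure.

```python
def extra_l2_area_upper_bound_mm2(new_kb: int, old_kb: int,
                                  ref_kb: int = 2048, ref_area_mm2: float = 20.0) -> float:
    """Scale a reference cache's area linearly by the added capacity.

    ref_kb / ref_area_mm2 default to the Kabini guess (2MB in under 20mm2).
    """
    return (new_kb - old_kb) / ref_kb * ref_area_mm2

# GM107 (2MB, rumoured) versus GK107 (256KB): 1792KB extra
print(extra_l2_area_upper_bound_mm2(2048, 256))  # 17.5 (mm2, upper bound)
```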

dnavas · Feb 12, 2014

iMacmatician said:
Is it possible that there are no SFUs anymore?

Seems more likely that they just aren't shown, but who knows. I'm also intrigued by "the number of instructions per clock cycle has been increased" because "holy hot clock cycle reincarnation" and "wait, this is like Fermi++, wth was Kepler then"....

AnarchX · Feb 12, 2014

At this high level view, SFUs were also not visible on Kepler GPUs: http://anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2

So ALUs are now "SIMD16" instead of "SIMD32" in Kepler? Maybe some hot-clocking is also back?
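As a rough illustration of why a SIMD16 reading raises the hot-clock question: a 32-thread warp needs two passes through a 16-lane unit but only one through a 32-lane unit. Fermi covered the two passes by running its ALUs at twice the scheduler clock (the "hot clock"); whether Maxwell does anything similar is exactly what is being speculated here.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def passes_per_warp_instruction(simd_width: int) -> int:
    """Passes a SIMD unit with `simd_width` lanes needs to execute one warp instruction."""
    return -(-WARP_SIZE // simd_width)  # ceiling division

for width in (16, 32):
    print(width, passes_per_warp_instruction(width))
# 16 lanes -> 2 passes (hot clock or half-rate issue); 32 lanes -> 1 pass
```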

Alexko · Feb 12, 2014

silent_guy said:
You never know with these journalists, but I'll charitably assume that this is a typo and that it should be 'memory' instead of 'GPU'.

Or "from" instead of "to".

DSC said:
256KB - GK107
384KB - GK106
512KB - GK208
512KB - GK104
768KB - GF100/GF110
1536KB - GK110

Is it really possible for 2MB L2 in a low-end GPU on the same 28nm without ballooning the die size?

I don't think it's that big of a deal. If you look at a Kaveri die shot you'll see that the 4MB of L2 don't take up a very large part of the die, and that's for cache with tighter latency requirements than what you'd need in a GPU.

Granted it's GloFo's 28nm process instead of TSMC's, but they probably have similar SRAM densities.

http://cdn2.wccftech.com/wp-content/uploads/2014/01/AMD-Kaveri-Die-Shot1.jpg

iMacmatician · Feb 12, 2014

AnarchX said:
At this high level view, SFUs were also not visible on Kepler GPUs: http://anandtech.com/show/5699/nvidia-geforce-gtx-680-review/2

Ah, I forgot about that.

Picao84 · Feb 12, 2014

What about:

SM has been redesigned into four processing blocks (as explained above).

Was this not the case on Fermi and Kepler? Was each SM a "monoblock"?

If this is true, any idea about the consequences?

Arun · Feb 12, 2014

Since they support 48KB Shared Memory + 16KB L1 per 192 ALU SMX on Kepler, each 32 ALU SM will need to have at least 48KB Shared Memory for backwards compatibility. That's a LOT more shared memory (and associated bandwidth) than on Kepler!
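Arun's point, restated per ALU: Kepler offers 48KB of shared memory per 192-ALU SMX, while his scenario needs 48KB for every 32-ALU SM. The sketch just divides those figures; treating each 32-ALU block as a separate SM is his assumption, not a confirmed detail.

```python
KB = 1024

def shared_bytes_per_alu(shared_kb: int, alus: int) -> float:
    """Shared memory available per ALU, in bytes."""
    return shared_kb * KB / alus

kepler_smx = shared_bytes_per_alu(48, 192)  # 256.0 bytes per ALU
arun_case = shared_bytes_per_alu(48, 32)    # 1536.0 bytes per ALU
print(kepler_smx, arun_case, arun_case / kepler_smx)  # 256.0 1536.0 6.0
```

That is six times the shared memory per ALU, before counting the extra bandwidth needed to feed it.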

OlegSH · Feb 12, 2014

DSC said:
Is it really possible for 2MB L2 in a low-end GPU on the same 28nm without ballooning the die size?

Of course it is. With 6T SRAM and the same bank/port count (as in GK107), the additional 1792 KB of cache will require 88,080,384 transistors. 88 million transistors are very cheap on 28nm, and they are even cheaper in terms of area since SRAM can have a denser layout than the rest of the chip; it's likely just 7-8 mm² of additional area on 28nm.
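The transistor count above follows from 1792 KB × 8192 bits/KB × 6 transistors per bit. This sketch counts only the data-array cells (tags, decoders and sense amplifiers are ignored, as in the estimate) and prints the density that the 7-8 mm² figure would imply.

```python
TRANSISTORS_PER_BIT = 6  # 6T SRAM cell; tags, decoders, sense amps ignored

def sram_transistors(kbytes: int) -> int:
    """Transistors in the data array of an SRAM of `kbytes` kilobytes."""
    bits = kbytes * 1024 * 8
    return bits * TRANSISTORS_PER_BIT

extra_kb = 2048 - 256               # GM107 (2MB, rumoured) minus GK107 (256KB)
extra = sram_transistors(extra_kb)
print(extra_kb, extra)              # 1792 88080384

# Density implied by the 7-8 mm2 estimate, in millions of transistors per mm2
for area_mm2 in (7, 8):
    print(area_mm2, round(extra / area_mm2 / 1e6, 1))
```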

tviceman · Feb 13, 2014

Maxwell is going to be a beast. If it's this good on 28nm, I can't wait to see it shrunk down to 20nm, or 16nm FinFET.

tviceman · Feb 13, 2014

fellix said:
Weird.

So, the new GPC configuration is 5 multiprocessors now? How much for the big Maxwell -- 4xGPC & 2560 ALUs? That would make for 450~480mm² die on 28nm, if the GPC share is roughly 50% of the whole IC logic.

I highly doubt we'll get big Maxwell on 28nm. However, if 20nm isn't worth the trouble and FinFETs are still a ways off, then we'll definitely get GM104 on 28nm.

trinibwoy · Feb 13, 2014

AnarchX said:
So ALUs are now "SIMD16" instead of "SIMD32" in Kepler? Maybe some hot-clocking is also back?

It does seem like a throwback to simpler times.

UniversalTruth · Feb 13, 2014

If the GTX 480 is as fast as a modern GTX 660, then by launch time the card (GTX 750 Ti) would actually be showing higher performance than what we have already seen, which was below the GTX 650 Ti Boost... Looks goodie...

tviceman · Feb 13, 2014

NEVERMIND
