NVIDIA Maxwell Speculation Thread

Ailuros · Feb 18, 2014

mczak said:
I thought the only 28nm Maxwell products would be GM107/GM108. But who knows.

I always read Damien Triolet's reviews first (despite online translation being painful, since I don't speak French) and this time was no different. If monsieur Triolet is correct (which I have no real reason to doubt, since he's usually very well informed) then I'm not so sure there will be any further Maxwell chips on 28nm.

Dave Baumann · Feb 18, 2014

I think the branding would give a definite indicator as to timing of other parts. In many respects this is very similar to Bonaire's launch.

A1xLLcqAgt0qc2RyMz0y said:
It will if the GM106/GM204 & GM200 turn out to be better than any other solution available in the future including ASICs.

While the claims need to be proven (though they were with SHA-256/Bitcoin mining) dedicated Scrypt miners are quoting 5M Hash/sec for 70W and I've seen others quoting similar (0.9Mh/s for 5W); by comparison Hawaii is ~0.8-0.9Mh/s.
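To put the quoted miner figures on a common footing, here is a rough per-watt sketch. The ASIC numbers are the vendor claims quoted above and are unverified. The post gives no power figure for Hawaii, so the 250 W used below is purely an assumption for illustration.

```python
def hashes_per_watt(mhash_per_s: float, watts: float) -> float:
    """Return Scrypt throughput per watt, in kH/s per W."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return mhash_per_s * 1000.0 / watts


# Figures quoted in the post (vendor claims, unverified).
asic_large = hashes_per_watt(5.0, 70.0)
asic_small = hashes_per_watt(0.9, 5.0)

# ASSUMPTION: 250 W board power for Hawaii; the post states no power figure.
HAWAII_ASSUMED_WATTS = 250.0
# 0.85 MH/s is the midpoint of the 0.8-0.9 MH/s range quoted in the post.
hawaii = hashes_per_watt(0.85, HAWAII_ASSUMED_WATTS)

for name, value in [
    ("ASIC, 5 MH/s at 70 W", asic_large),
    ("ASIC, 0.9 MH/s at 5 W", asic_small),
    ("Hawaii (assumed 250 W)", hawaii),
]:
    print(f"{name}: {value:.1f} kH/s per W")
```

If the claims hold and the assumed board power is anywhere near right, the gap is well over an order of magnitude, which is the point being made about GPUs versus dedicated miners.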

Picao84 · Feb 18, 2014

Ailuros said:
I always read Damien Triolet's reviews first (despite online translation being painful, since I don't speak French) and this time was no different. If monsieur Triolet is correct (which I have no real reason to doubt, since he's usually very well informed) then I'm not so sure there will be any further Maxwell chips on 28nm.

I was reading it, both through google translate and using my mostly rudimentary knowledge of French and could not find that hint. Could you quote it please?

Ailuros · Feb 18, 2014

Dave Baumann said:
I think the branding would give a definite indicator as to timing of other parts. In many respects this is very similar to Bonaire's launch.

I've figured as much.

Picao84 said:
I was reading it, both through google translate and using my mostly rudimentary knowledge of French and could not find that hint. Could you quote it please?

The GM107 GPU integrates a set of 5 SMMs grouped in a single GPC. This implies a throughput limited to 1 triangle rendered per cycle, but it makes it possible to simplify the intercommunication fabric as much as possible, as was done on Tegra K1.

Superb

Picao84 · Feb 18, 2014

Ailuros said:
I've figured as much.

Superb

I still cannot get why that means we won't see more 28nm Maxwells... :???:

Isn't it just a remark about it being limited to 1 triangle per clock, as expected from having just one GPC?

Ailuros · Feb 18, 2014

Picao84 said:
I still cannot get why that means we won't see more 28nm Maxwells...
Isn't it just a remark about it being limited to 1 triangle per clock, as expected from having just one GPC?

http://forum.beyond3d.com/showpost.php?p=1827815&postcount=961

Don't get too hung up on my ramblings, since I was essentially wrong, as Maxwell still has dedicated FP units; however, the entire 2x efficiency thing sounded a wee bit strange and I was reconsidering tviceman's questions about the interdie connect.

Now what Mr. Triolet implies here is that when you have 1 GPC (i.e. 1 raster/1 trisetup) you don't need any highly complex interdie connect and can cut back severely in that department. I assume that difference could be absorbed by a smaller process like 20SoC (+30% improvement over 28HP at best?), hence my gut feeling that there will be no bigger Maxwell cores on 28nm, since GM206 obviously cannot have just one GPC, can it?
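A back-of-the-envelope sketch of the 1 GPC point, assuming (as the quoted passage says) one rendered triangle per clock per GPC. The function name and the 1000 MHz clock are illustrative assumptions, not specs from the article.

```python
def peak_triangle_rate(num_gpcs: int, core_clock_mhz: float,
                       tris_per_gpc_per_clock: float = 1.0) -> float:
    """Peak rendered-triangle rate in millions of triangles per second."""
    if num_gpcs < 1:
        raise ValueError("need at least one GPC")
    return num_gpcs * tris_per_gpc_per_clock * core_clock_mhz


# ASSUMPTION: 1000 MHz is a round illustrative clock, not a quoted spec.
CLOCK_MHZ = 1000.0

one_gpc = peak_triangle_rate(1, CLOCK_MHZ)   # GM107-style layout: a single GPC
two_gpc = peak_triangle_rate(2, CLOCK_MHZ)   # hypothetical larger chip

print(f"1 GPC: {one_gpc:.0f} Mtris/s, 2 GPCs: {two_gpc:.0f} Mtris/s")
```

Scaling triangle rate this way means adding GPCs, and more than one GPC is exactly where the more complex interconnect comes back in.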

iMacmatician · Feb 18, 2014

A1xLLcqAgt0qc2RyMz0y said:
Correct, the GM106 will be next.

So it's GM106 not GM206? (or will there be both?)

mczak · Feb 18, 2014

Well no larger Maxwell 28nm chips would mean no GM106 though there were rumors about GM108 which would still be possible. Though I think it's a bit too far fetched to conclude there would be no bigger 28nm Maxwell chips based on that bit (not that I think there really will be a GM106 28nm chip).
btw I'm wondering how large an SMM is vs. a gk1xx and gk2xx SMX? Presumably it ought to be a good bit smaller, but I wonder how large the difference really is.

Picao84 · Feb 18, 2014

Ailuros said:
http://forum.beyond3d.com/showpost.php?p=1827815&postcount=961

Don't get too hung up on my ramblings, since I was essentially wrong, as Maxwell still has dedicated FP units; however, the entire 2x efficiency thing sounded a wee bit strange and I was reconsidering tviceman's questions about the interdie connect.

Now what Mr. Triolet implies here is that when you have 1 GPC (i.e. 1 raster/1 trisetup) you don't need any highly complex interdie connect and can cut back severely in that department. I assume that difference could be absorbed by a smaller process like 20SoC (+30% improvement over 28HP at best?), hence my gut feeling that there will be no bigger Maxwell cores on 28nm, since GM206 obviously cannot have just one GPC, can it?

OK, true.

mczak · Feb 18, 2014

DSC said:
http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/5

Ah good catch I missed that.
computerbase.de tried to get that information, but nvidia refused to tell apparently (http://www.computerbase.de/artikel/grafikkarten/2014/nvidia-geforce-gtx-750-ti-maxwell-im-test/), so I wonder what anandtech's source is.

DSC · Feb 18, 2014

Anandtech said:
The increased efficiency it affords improves performance alongside the other IPC improvements NVIDIA has worked in, plus it means that some of GK110’s more exotic features such as dynamic parallelism and HyperQ are now a baseline feature.

Dynamic Parallelism and HyperQ are supported in GM107? Interesting......

http://images.anandtech.com/doci/5840/DyPar.png

http://images.anandtech.com/doci/5840/HyperQ2.png

mczak · Feb 18, 2014

DSC said:
Dynamic Parallelism and HyperQ are supported in GM107? Interesting......

Why is that interesting? These features were already present in the gk20x series (gk208/gk20a, though the latter is just a guess), just like the higher bit shifter capability.

DSC · Feb 18, 2014

http://international.download.nvidi...tional/pdfs/GeForce-GTX-750-Ti-Whitepaper.pdf

Whitepaper doesn't really reveal much, might have to wait until real Maxwell aka 2nd Generation is launched.

mczak · Feb 18, 2014

I'm wondering what kind of new compression scheme nvidia uses. Anandtech's pixel fill number (http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/20) is probably the most impressive benchmark I've seen yet. This is 100% bandwidth limited on all amd cards you see there, and I'm pretty sure that's the case for GTX 750 / 750Ti as well. Yet the 750 manages a score which is 40% higher than the r7 260x, even though the latter has 20% more bandwidth available (and yes kepler was more efficient there already than SI/CI but certainly not to that extent). I dunno maybe the large cache helps, but surely this has to be the reason why this card doesn't seem to need all that much memory bandwidth to perform still quite well.
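Taking the two percentages in the post at face value (a score 40% higher, with the competing card having 20% more bandwidth), the implied advantage in fill rate per unit of bandwidth works out as below. This is only a sketch of that arithmetic; the function name is made up for illustration.

```python
def relative_fill_per_bandwidth(score_ratio: float, bandwidth_ratio: float) -> float:
    """Ratio of (fill rate / bandwidth) between card A and card B.

    score_ratio     = fill score of A divided by fill score of B
    bandwidth_ratio = memory bandwidth of A divided by that of B
    """
    if bandwidth_ratio <= 0:
        raise ValueError("bandwidth_ratio must be positive")
    return score_ratio / bandwidth_ratio


# From the post: the 750 scores 1.40x the 260X, and the 260X has 1.20x the
# bandwidth, so the 750 has 1/1.20 of the 260X's bandwidth.
advantage = relative_fill_per_bandwidth(1.40, 1 / 1.20)

print(f"Fill rate per unit of bandwidth: {advantage:.2f}x")
```

So on those numbers the 750 would be getting roughly two thirds more pixels out of each byte of bandwidth, which is why compression or the larger L2 gets suspected.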

fellix · Feb 18, 2014

mczak said:
I'm wondering what kind of new compression scheme nvidia uses. Anandtech's pixel fill number (http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/20) is probably the most impressive benchmark I've seen yet. This is 100% bandwidth limited on all amd cards you see there, and I'm pretty sure that's the case for GTX 750 / 750Ti as well. Yet the 750 manages a score which is 40% higher than the r7 260x, even though the latter has 20% more bandwidth available (and yes kepler was more efficient there already than SI/CI but certainly not to that extent). I dunno maybe the large cache helps, but surely this has to be the reason why this card doesn't seem to need all that much memory bandwidth to perform still quite well.

It could be the effect of the 8-fold increase of the L2 size. :???:

It's simply more efficient, but still only gets a bit closer to its theoretical maximum.

This particular 3DMark test measures FP16 blending rate, so it hammers memory writes quite heavily on all architectures. It's a simple bandwidth deficit issue.

dnavas · Feb 18, 2014

Ailuros said:
Don't get too hung up on my ramblings, since I was essentially wrong, as Maxwell still has dedicated FP units;

Yeah, well, I took both sides of the issue and was wrong twice, so ;^/
I find it interesting how the dp units are hanging out with the tmus. It's difficult for me to imagine that they'll adopt that for hpc...?

As I'm in the apparent target market for this (I currently have a 5750, and I AM looking to upgrade), I should probably say a little something. As someone who upgrades infrequently, one of the things I look for are feature sets that will survive the next four years. No HDMI 2.0, no hevc encode OR decode, optional displayport, and 128bit bus made me sad. Low idle power and noise make me happy. :shrug: YMMV. I don't play games, I do video editing, so I'm not a perfect match.

mczak · Feb 18, 2014

fellix said:
It could be the effect of the 8-fold increase of the L2 size.
It's simply more efficient, but still only gets a bit closer to its theoretical maximum.

This particular 3DMark test measures FP16 blending rate, so it hammers memory writes quite heavily on all architectures. It's a simple bandwidth deficit issue.

Yes, that's why this test is so impressive. It requires 16 bytes / clock (8 read / 8 write). Hence gm107 reaching a bandwidth efficiency of 140% or so (40% over theoretical maximum). That's what I call efficient.
It's possible l2 cache helps, though traditionally it does not seem to do much. Or there's some rather impressive lossless compression going on. But maybe it really is just L2 cache size - kepler also had higher bandwidth efficiency (in this test at least) compared to GCN, and it could be for this same reason (since GCN does not use L2 cache for ROPs, and the ROP caches themselves are tiny).
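The "140%" figure can be reproduced with a small sketch, reading the 16 bytes above as traffic per blended FP16 pixel (8 read plus 8 written). The input numbers below are illustrative assumptions chosen to land on 140%, not measured results from the review.

```python
BYTES_PER_PIXEL = 16  # 8 bytes read + 8 bytes written per blended pixel, per the post


def bandwidth_efficiency(fill_gpixels_per_s: float,
                         mem_bandwidth_gb_per_s: float) -> float:
    """Uncompressed, uncached traffic the fill rate would need, as a fraction
    of the theoretical memory bandwidth. Values above 1.0 mean the card moves
    less data over the bus than the naive 16 bytes per pixel."""
    if mem_bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    required_gb_per_s = fill_gpixels_per_s * BYTES_PER_PIXEL
    return required_gb_per_s / mem_bandwidth_gb_per_s


# ASSUMPTION: 7.0 GPixels/s and 80 GB/s are illustrative inputs only.
eff = bandwidth_efficiency(7.0, 80.0)

print(f"Bandwidth efficiency: {eff:.0%}")
```

Anything above 100% cannot come from raw DRAM traffic alone, so some of those bytes have to be absorbed by cache hits or removed by compression.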

revan · Feb 18, 2014

Hold your green horses for a moment

http://www.hardware.fr/news/13568/nvidia-lance-geforce-gtx-750-ti-750-maxwell.html

psurge · Feb 18, 2014

I'm not sure I believe hardware.fr's diagrams on that point. I don't see any justification for their claims in the article, and they've also got the texture cache/L1 size at 24KB per SMM (half the amount per SMX), despite the fact that it is now apparently servicing memory reads/writes from the shader cores. Hopefully they follow up with details on how they came to their conclusions.

fellix · Feb 18, 2014

Since the L1 is now part of the texture cache, there could be some major changes to it that we don't know for sure yet. It could be a larger unified pool or a split design again. If it is the former, that means the texture units in Maxwell can now access the cache for both read and write ops.
