NVIDIA Maxwell Speculation Thread

I thought the only 28nm Maxwell products would be GM107/GM108. But who knows.

I always read Damien Triolet's reviews first (despite the fact that online translation is painful since I don't speak French) and it wasn't different this time. If monsieur Triolet is correct (which I have no real reason to doubt, since he's usually very well informed) then I'm not so sure there will be any further Maxwell chips on 28nm.
 
I think the branding would give a definite indicator as to timing of other parts. In many respects this is very similar to Bonaire's launch.

It will if the GM106/GM204 & GM200 turn out to be better than any other solution available in the future, including ASICs.

While the claims need to be proven (though they were with SHA-256/Bitcoin mining), dedicated Scrypt miners are quoting 5 MH/s for 70W, and I've seen others quoting similar figures (0.9 MH/s for 5W); by comparison Hawaii manages ~0.8-0.9 MH/s.
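To put those quoted numbers on a common footing, here's a rough perf/watt sketch. The ~250 W figure for Hawaii is my own assumption for illustration, not something stated above:

```python
# Rough Scrypt efficiency arithmetic for the figures quoted above.
# Hawaii's ~250 W board power is an assumption, not stated in the post.
def mhs_per_watt(mhs: float, watts: float) -> float:
    """Hashrate efficiency in MH/s per watt."""
    return mhs / watts

asic_large = mhs_per_watt(5.0, 70.0)    # quoted dedicated miner
asic_small = mhs_per_watt(0.9, 5.0)     # quoted smaller miner
hawaii     = mhs_per_watt(0.85, 250.0)  # ~0.8-0.9 MH/s, assumed ~250 W

print(f"large ASIC: {asic_large:.4f} MH/s/W")
print(f"small ASIC: {asic_small:.4f} MH/s/W")
print(f"Hawaii:     {hawaii:.4f} MH/s/W")
print(f"small ASIC vs Hawaii: {asic_small / hawaii:.0f}x")
```

Even with generous assumptions for the GPU, the quoted ASIC figures come out roughly 50x more efficient per watt.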
 
I always read Damien Triolet's reviews first (despite the fact that online translation is painful since I don't speak French) and it wasn't different this time. If monsieur Triolet is correct (which I have no real reason to doubt, since he's usually very well informed) then I'm not so sure there will be any further Maxwell chips on 28nm.

I was reading it, both through Google Translate and using my rather rudimentary knowledge of French, and could not find that hint. Could you quote it please?
 
I think the branding would give a definite indicator as to timing of other parts. In many respects this is very similar to Bonaire's launch.

I figured as much.

I was reading it, both through Google Translate and using my rather rudimentary knowledge of French, and could not find that hint. Could you quote it please?

The GM107 GPU integrates a set of 5 SMMs grouped into a single GPC. This implies a throughput limited to 1 rendered triangle per cycle, but allows the intercommunication fabric to be simplified as much as possible, as was done on Tegra K1.

Superb :p
 
I still cannot get why that means we won't see more 28nm Maxwells... :???:
It's just a remark about it being limited to 1 triangle per clock, as expected from having just one GPC?

http://forum.beyond3d.com/showpost.php?p=1827815&postcount=961

Don't get too hung up on my ramblings, since I was essentially wrong: Maxwell still has dedicated FP units. However, the entire 2x efficiency thing sounded a wee bit strange, and I was reconsidering tviceman's questions about the inter-die connect.

Now what Mr. Triolet implies here is that when you have 1 GPC (i.e. 1 raster/1 trisetup) you don't need any highly complex inter-die connect and can cut back severely in that department. I assume that difference could be absorbed by a smaller process like 20SoC (+30% improvement over 28HP at best?), hence my gut feeling that there will be no bigger Maxwell cores on 28nm, since GM206 obviously cannot have just one GPC, can it?
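The 1-triangle-per-clock point above is just setup rate scaling with GPC count. A quick sketch, using the GTX 750 Ti's 1020 MHz base clock and the GTX 680's 1006 MHz / 4 GPCs as illustrative comparison points:

```python
# Peak triangle setup rate scales with GPC count, since each GPC carries
# one raster/trisetup unit good for 1 triangle per clock.
def peak_tris_per_sec(num_gpcs: int, clock_mhz: float) -> float:
    return num_gpcs * clock_mhz * 1e6

gm107 = peak_tris_per_sec(1, 1020.0)   # single GPC, GTX 750 Ti base clock
gk104 = peak_tris_per_sec(4, 1006.0)   # four GPCs, GTX 680 base clock

print(f"GM107: {gm107 / 1e9:.2f} Gtris/s")
print(f"GK104: {gk104 / 1e9:.2f} Gtris/s")
```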
 
Well, no larger 28nm Maxwell chips would mean no GM106, though there were rumors about GM108, which would still be possible. Still, I think it's a bit too far-fetched to conclude there will be no bigger 28nm Maxwell chips based on that bit (not that I think there really will be a 28nm GM106 chip).
btw I'm wondering how large an SMM is vs. a gk1xx and gk2xx SMX? Presumably it ought to be a good bit smaller, but I wonder how large the difference really is.
 
http://forum.beyond3d.com/showpost.php?p=1827815&postcount=961

Don't get too hung up on my ramblings, since I was essentially wrong: Maxwell still has dedicated FP units. However, the entire 2x efficiency thing sounded a wee bit strange, and I was reconsidering tviceman's questions about the inter-die connect.

Now what Mr. Triolet implies here is that when you have 1 GPC (i.e. 1 raster/1 trisetup) you don't need any highly complex inter-die connect and can cut back severely in that department. I assume that difference could be absorbed by a smaller process like 20SoC (+30% improvement over 28HP at best?), hence my gut feeling that there will be no bigger Maxwell cores on 28nm, since GM206 obviously cannot have just one GPC, can it?

OK, true.
 
Dynamic Parallelism and HyperQ are supported in GM107? Interesting......
Why is that interesting? These features were already present in the gk20x series (gk208/gk20a, though the latter is just a guess), just like the higher bit-shifter capability.
 
I'm wondering what kind of new compression scheme nvidia uses. Anandtech's pixel fill number (http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/20) is probably the most impressive benchmark result I've seen yet. This test is 100% bandwidth limited on all the amd cards you see there, and I'm pretty sure that's the case for the GTX 750 / 750 Ti as well. Yet the 750 manages a score 40% higher than the r7 260x, even though the latter has 20% more bandwidth available (and yes, kepler was already more efficient there than SI/CI, but certainly not to that extent). Maybe the large cache helps, but surely this has to be the reason why this card doesn't seem to need all that much memory bandwidth to perform quite well.
 
I'm wondering what kind of new compression scheme nvidia uses. Anandtech's pixel fill number (http://www.anandtech.com/show/7764/the-nvidia-geforce-gtx-750-ti-and-gtx-750-review-maxwell/20) is probably the most impressive benchmark result I've seen yet. This test is 100% bandwidth limited on all the amd cards you see there, and I'm pretty sure that's the case for the GTX 750 / 750 Ti as well. Yet the 750 manages a score 40% higher than the r7 260x, even though the latter has 20% more bandwidth available (and yes, kepler was already more efficient there than SI/CI, but certainly not to that extent). Maybe the large cache helps, but surely this has to be the reason why this card doesn't seem to need all that much memory bandwidth to perform quite well.
It could be the effect of the 8-fold increase of the L2 size. :???:
It's simply more efficient, but still only gets a bit closer to its theoretical maximum.

This particular 3DMark test measures FP16 blending rate, so it hammers memory writes quite heavily on all architectures. It's a simple bandwidth deficit issue.
 
Don't get too hung up in my ramblings since I essentially was wrong since Maxwell still has dedicated FP units;

Yeah, well, I took both sides of the issue and was wrong twice, so ;^/
I find it interesting how the DP units are hanging out with the TMUs. It's difficult for me to imagine that they'll adopt that for HPC...?

As I'm in the apparent target market for this (I currently have a 5750, and I AM looking to upgrade), I should probably say a little something. As someone who upgrades infrequently, one of the things I look for is a feature set that will survive the next four years. No HDMI 2.0, no HEVC encode OR decode, optional DisplayPort, and a 128-bit bus made me sad. Low idle power and noise make me happy. :shrug: YMMV. I don't play games, I do video editing, so I'm not a perfect match.
 
It could be the effect of the 8-fold increase of the L2 size. :???:
It's simply more efficient, but still only gets a bit closer to its theoretical maximum.

This particular 3DMark test measures FP16 blending rate, so it hammers memory writes quite heavily on all architectures. It's a simple bandwidth deficit issue.
Yes, that's why this test is so impressive. It requires 16 bytes per pixel per clock (8 read / 8 write). Hence gm107 reaching a bandwidth efficiency of 140% or so (40% over its theoretical maximum). That's what I call efficient :).
It's possible the L2 cache helps, though traditionally it does not seem to do much. Or there's some rather impressive lossless compression going on. But maybe it really is just L2 cache size - kepler also had higher bandwidth efficiency than GCN (in this test at least), and it could be for this same reason (since GCN does not use the L2 cache for ROPs, and the ROP caches themselves are tiny).
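The 140% figure follows from simple arithmetic: an RGBA FP16 pixel is 8 bytes, and blending both reads the old value and writes the new one. A sketch of that calculation, using the GTX 750 Ti's 86.4 GB/s theoretical bandwidth; the 7.56 GPix/s fill number is a placeholder chosen to match the ~140% figure quoted above, not Anandtech's measured value:

```python
# FP16 blending: an RGBA16F pixel is 8 bytes; blending reads the old
# value and writes the new one, so each blended pixel implies 16 bytes
# of memory traffic (absent any compression or caching).
BYTES_PER_BLENDED_PIXEL = 8 + 8

def bandwidth_efficiency(fill_gpix_s: float, bw_gb_s: float) -> float:
    """Apparent memory traffic divided by theoretical bandwidth."""
    return fill_gpix_s * BYTES_PER_BLENDED_PIXEL / bw_gb_s

# 86.4 GB/s = GTX 750 Ti theoretical bandwidth (5.4 Gbps x 128-bit).
# 7.56 GPix/s is an illustrative placeholder, not a measured result.
eff = bandwidth_efficiency(7.56, 86.4)
print(f"apparent efficiency: {eff:.0%}")
```

Anything above 100% here means the chip is moving less real DRAM traffic than the naive 16 bytes/pixel model predicts, i.e. compression and/or the L2 is absorbing part of it.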
 
I'm not sure I believe hardware.fr's diagrams on that point. I don't see any justification for their claims in the article, and they've also got the texture cache/L1 size at 24KB per SMM (half the amount per SMX), despite the fact that it is now apparently servicing memory reads/writes from the shader cores. Hopefully they follow up with details on how they came to their conclusions.
 
Since the L1 is now part of the texture cache, there could be some major changes to it that we don't know about for sure yet. It could be a larger unified pool or a split design again. If it's the former, that means the texture units in Maxwell can now access the cache for both read and write ops.
 