NVIDIA Maxwell Speculation Thread

Latest update from Guru3D:

Update #3 - The issue that is behind the issue
New info surfaces: Nvidia messed up quite a bit when they sent out specs to press and media like ourselves. As we now know, the GeForce GTX 970 has 56 ROPs, not 64 as listed in their reviewers' guides. Having fewer ROPs is not a massive thing here, but it exposes a thing or two about effects in the memory subsystem and L2 cache. Combined with some new features in the Maxwell architecture, herein we can find the answer to why the card's memory is split up into 3.5GB/0.5GB partitions as noted above.
[Diagram: GTX 970 block layout with SMs at the top, L2 cache blocks and 32-bit memory controllers at the bottom]

Look above (and I am truly sorry to make this so complicated, as it really is just that... complicated). You'll notice that for the GTX 970 compared to the 980 there are three disabled SMs, giving the GTX 970 13 active SMs (clusters with things like shader processors). The SMs shown at the top are followed by 256KB L2 caches, which then pair with 32-bit memory controllers located at the bottom. The crossbar is responsible for communication between the SMs, caches and memory controllers.
You will notice the greyed-out right-hand L2 for this GPU, right? That is a disabled L2 block, and since each L2 block is tied to ROPs, the GTX 970 does not have 2,048KB but instead 1,792KB of L2 cache. Disabling ROPs and thus L2 like that is actually new and Maxwell-exclusive; on Kepler, disabling an L2/ROP segment would disable the entire section, including a memory controller. So while the L2/ROP unit is disabled, that 8th memory controller to the right is still active and in use.
Now that we know that Maxwell can disable smaller segments and keep the rest active, we have just learned that the memory controllers and associated DRAM can still be used, but the final 1/8th of the L2 cache is missing/disabled. As you can see, that DRAM controller has to buddy up with the 7th L2 unit, and that is the root cause of a big performance issue. The GeForce GTX 970 has a 256-bit bus over a 4GB framebuffer and the memory controllers are all active and in use, but disabling the L2 segment tied to the 8th memory controller would mean the overall L2 operates at half its normal performance.
Nvidia needed to tackle that problem and did so by splitting the total 4GB of memory into a primary (196 GB/sec) 3.5GB partition that makes use of the first seven memory controllers and associated DRAM, and a (28 GB/sec) 0.5GB partition tied to the 8th memory controller. Nvidia could have, and probably should have, marketed the card as 3.5GB; they could even have deactivated an entire right-side quad and gone for a 192-bit memory interface tied to just 3GB of memory, but did not pursue that alternative, as this solution offers better performance. Nvidia claims that games hardly suffer from this design/workaround.
In a rough, simplified explanation: the disabled L2 unit causes a challenge, a performance hit tied to one of the memory controllers. To divert that performance hit, the memory is split up into two segments, bypassing the issue at hand, a tweak to get the most out of a lesser situation. Both memory partitions are active and in use; the primary 3.5GB partition is very fast, the 512MB secondary partition is much slower.
Thing is, the quantifying fact is that nobody really has massive issues; dozens and dozens of media outlets have tested the card with in-depth reviews like the ones here on my site. As for replicating the stutters and such that you see in some of the videos, well, to date I have not been able to reproduce them unless you do crazy stuff, and I've been on this all weekend. Overall scores are good, and sure, if you run out of memory at one point you will see performance drops. But then you drop from 8x to, like, 4x AA, right?
Nvidia messed up badly here... no doubt about it. The ROP/L2 cache count was goofed up, slipped through the cracks and ended up in their reviewers' guides and spec sheets, and really... they should have called this a 3.5GB card with an extra layer of L3 cache memory or something. Right now Nvidia is in full damage control; however, I will stick to my recommendations. The GeForce GTX 970 is still a card we like very much in the up-to-2560x1440 (WQHD) domain, but it probably should have been called a 3.5GB product with an added 512MB L3 cache.
To answer my own title question, does Nvidia have a memory allocation bug? Nope, this was all done by design; Nvidia, however, failed to communicate it properly to the tech media and thus, in the end, to the people who buy the product.
http://www.guru3d.com/news-story/does-the-geforce-gtx-970-have-a-memory-allocation-bug.html
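As a quick sanity check, the two per-segment figures in that quote line up with straightforward arithmetic. Here is a minimal sketch of that maths, assuming 7 Gbps effective GDDR5 (the reference GTX 970 memory speed) and eight 32-bit memory controllers; those inputs are my assumptions, not something taken from the article:

```cpp
// Back-of-the-envelope check of the GTX 970 per-segment bandwidth figures.
// Assumes 7 Gbps effective GDDR5 and eight 32-bit memory controllers
// (reference-card figures, not quoted from the Guru3D piece).
#include <cstdio>

int main() {
    const double gbpsPerPin = 7.0;                           // effective GDDR5 data rate per pin
    const int    bitsPerMc  = 32;                            // width of one memory controller
    const double gbPerMc    = gbpsPerPin * bitsPerMc / 8.0;  // GB/s per controller

    printf("per controller      : %3.0f GB/s\n", gbPerMc);      // 28 GB/s
    printf("3.5 GB segment (7x) : %3.0f GB/s\n", gbPerMc * 7);  // 196 GB/s
    printf("0.5 GB segment (1x) : %3.0f GB/s\n", gbPerMc * 1);  // 28 GB/s
    printf("full 256-bit bus    : %3.0f GB/s\n", gbPerMc * 8);  // 224 GB/s peak
    return 0;
}
```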
 
And if they did, where should they send the email? For example, there's an error on the first page (hint: 3/8). Do you assume your colleagues messed up, or do you assume the reviewer made a mistake?
If I find something is wrong with the product I support (and we distribute far, far more data/information than Nvidia), whether it is brought to my attention or I find it myself, it is on "us" to prove it isn't "our" fault.
 
Occam says: they lied.

You mean this version needs the fewest assumptions? If they wanted to lie, shouldn't they have tried harder to succeed? Like hiding the real specs in their own tools and hoping that nobody involved will talk? And even then, why take such risks with a product that does not really need to look better?
 
AnandTech's article says both memory segments can't be read at once. Is it the same for writes too? You don't want the slower segment accessed much, so writes may not go to both segments at once either (unless it is a low-priority kind of access). So, going by their description, if 8KB needs to be written and the slower segment is low priority, one of the fast segments will have to carry that work too. Does that mean one of the MCs/memory chips has to write twice (i.e. instead of a 1KB x 8 write access, since the slow segment can't be accessed, the 8KB has to be written via 7 MCs)?

If so, might this explain why some 970 owners observe swapping after 3.2-3.5GB, even before all 4GB is consumed (i.e. the algorithm/infrastructure tries to allocate more VRAM, but since the 3.5GB mark is hit and a high-priority read/write is needed, the slower segment is blocked because it would need a low-priority access, so more VRAM can't be allocated and swapping starts)?
 
It's pretty obvious they knew, and chose to not disclose because not doing so would obviously make their product look better. Convoluted hypothetical explanations after the fact are hard to believe, because usually, NV marketing does get the specs of their hardware right, and the one instance they don't is when it hides a deficiency in their chip.

Occam says: they lied.

Hanlon says: they be stupid.
 
Agreed, but I can easily believe Technical Marketing didn't notice, while the actual engineers wrongly assumed that, since it was intentional, there was no need to explain it - i.e. marketing would have wanted to talk about this in minimal detail to avoid future drama, but they honestly didn't know, because engineering didn't think they needed to know or didn't realise they didn't know.

If true, it is unfortunate that technical marketing at NVIDIA doesn't have the time/desire/ability to look into that kind of information in sufficient depth; it does reveal a problem there, however.

Arun..I find it hard to believe that even Technical marketing missed it. If you think about it..this chip taped out sometime in April 2014. Are we expected to believe that right through testing, validation, product development, driver development, planning marketing..and through the course of meetings between the cross-functional teams between all the departments involved..that this never came up? This reeks of a cover up. If they had been open about it from the start..this would have been a non-issue. Heck..the press may have lauded the 970's performance even more if they knew how hobbled it was.
 
Too much sugar-coating here. They didn't know? Damn sure they did. And they lied; nobody in a court would buy the "marketing team messed up" excuse, and we shouldn't either.

Time for the class action suit.

That said, my 970 is the best card I've had in a long time!
 
Arun..I find it hard to believe that even Technical marketing missed it. If you think about it..this chip taped out sometime in April 2014. Are we expected to believe that right through testing, validation, product development, driver development, planning marketing..and through the course of meetings between the cross-functional teams between all the departments involved..that this never came up? This reeks of a cover up. If they had been open about it from the start..this would have been a non-issue. Heck..the press may have lauded the 970's performance even more if they knew how hobbled it was.
Why would Nvidia stage an elaborate coverup for such a non-issue? It's far more likely the marketing team just made a mistake. This is a minor second order issue - we're quibbling over a few percent memory capacity.
 
Why would Nvidia stage an elaborate coverup for such a non-issue? It's far more likely the marketing team just made a mistake. This is a minor second order issue - we're quibbling over a few percent memory capacity.

It seems fairly likely that NVIDIA's marketing team made a mistake, because lying about such a small detail would be stupid, but it's also pretty certain that someone noticed it afterwards. NVIDIA could have cleared things up but apparently decided not to, perhaps to avoid the embarrassment and associated hassle.
 
We've gone into extra-time pursuing this issue and trying to find tests to reveal whether or not it has major performance implications in the bracket between 3.5 and 4.0 GiBytes really used. It wasn't easy. If you indulge in this, take a look here.
 
We've gone into extra-time pursuing this issue and trying to find tests to reveal whether or not it has major performance implications in the bracket between 3.5 and 4.0 GiBytes really used. It wasn't easy. If you indulge in this, take a look here.

Thanks for taking the time to do this. From what I understand you underclocked the GTX 980 to match the GTX 970's theoretical performance and compared the two in memory-constrained cases. Based on your results for Watch Dogs at 4K, the 970's reduced bandwidth beyond 3.5GB really does hurt quite a bit. Of course, even the 980 can't get playable framerates, so it's not ideal, but then again few games really require more than 3.5GB at playable framerates, at least for now.
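For anyone who wants to poke at this on their own card, below is a rough sketch of the kind of slice-by-slice probe that could expose the slow segment. To be clear, this is my own illustration under assumptions (128 MiB slices, device-to-device copies timed with CUDA events), not the benchmark used in the linked tests, and how far the allocation loop gets depends on how much VRAM the OS already holds:

```cpp
// Sketch: allocate VRAM in 128 MiB slices, then time a device-to-device copy
// inside each slice. On a GTX 970, slices that land in the 0.5 GB segment
// should report noticeably lower bandwidth than the rest.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t chunkBytes = 128ull << 20;  // 128 MiB per slice
    std::vector<void*> chunks;

    // Grab as much VRAM as the driver will hand out, one slice at a time.
    for (void* p = nullptr; cudaMalloc(&p, chunkBytes) == cudaSuccess; )
        chunks.push_back(p);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (size_t c = 0; c < chunks.size(); ++c) {
        char* base = static_cast<char*>(chunks[c]);

        cudaEventRecord(start);
        // Copy the first half of the slice onto its second half.
        cudaMemcpy(base + chunkBytes / 2, base, chunkBytes / 2,
                   cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // The copy reads and writes half a slice each, i.e. chunkBytes of traffic.
        double gbPerSec = (chunkBytes / (ms * 1e-3)) / 1e9;
        printf("slice %2zu (up to %5.2f GiB): %7.1f GB/s\n",
               c, (c + 1) * chunkBytes / double(1ull << 30), gbPerSec);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    for (void* p : chunks) cudaFree(p);
    return 0;
}
```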
 
I hope no one sues nVidia over this relatively little omission and causes them to mimic Apple or Qualcomm in their reluctance to disclose technical details.
 
It seems fairly likely that NVIDIA's marketing team made a mistake, because lying about such a small detail would be stupid, but it's also pretty certain that someone noticed it afterwards. NVIDIA could have cleared things up but apparently decided not to, perhaps to avoid the embarrassment and associated hassle.
I agree with you generally. However, I bet most articles about GPUs contain similar errors. GPUs are complicated, there's a lot of misunderstanding out there. The technical people that understand these issues are likely too busy making new GPUs to talk to marketing people. =)
 
I hope no one sues nVidia over this relatively little omission and causes them to mimic Apple or Qualcomm in their reluctance to disclose technical details.

If someone filing a suit is all it takes to break a commitment to openness, it wouldn't be much of a commitment. Winning such a suit would be more damaging, but this is a really arcane technical distinction in a marketing field with some seriously big whoppers. (GPU core counts anybody?)

Any such reluctance would have to be teased out of the general trend of increasing opacity that was underway or already established. Mobile/embedded components did not start with a culture of openness or good disclosure, and we see that vendors that make forays towards that realm tend to regress towards that mean (Intel's SDP, use of turbo clocks as headline clocks for chips facing off against tablet chips, AMD's sparse optimization information for Bobcat, limited disclosure of console APU details, and so on).
Nvidia's Denver architecture is sparse on details, although that may very well be a legacy of its Transmeta link.
In general, the transparency of the PC-compatible desktop and server space for architectural details and errata was something of an anomaly, and even there the level of obfuscation has slowly ratcheted up as time has gone on.

Perhaps a closer parallel here is AMD's slowness to admit the R600 did not have hardware UVD.
 
I agree with you generally. However, I bet most articles about GPUs contain similar errors. GPUs are complicated, there's a lot of misunderstanding out there. The technical people that understand these issues are likely too busy making new GPUs to talk to marketing people. =)

Oh please, the number of ROPs is not complicated. Every article cites the ROP count of a GPU; even before a GPU is released it is a major focus of the fake leaks we get every week (do I need to cite the leaked or fake specs we have for the GM200, AMD Fiji, etc.?). The memory controller speed and the amount of memory are not complicated either.

Of course, for the general public, who mostly follow the smartphone business and know what 4G is, this will not mean much. But since when has a forum discussion about a new architecture not covered the ROP count, memory controller speed and amount of RAM?

Why do you think every fake leak talks about HBM memory when speculating about AMD's 380-390X?

Technically, I get the feeling that without this modification the 970 would have ended up with only a 192-bit bus and 3GB of memory (something similar to the mobile version). Nvidia seems to have decided to go further and try to keep as much bandwidth as possible, so basically you end up with a 224-bit bus and 3.5GB at full speed instead, which allows the use of 512MB more on the last 32-bit bus at a slower rate in certain situations. It can be used, it is not fast, and it is only usable under certain data conditions, but it is there.

I don't think anyone should sue Nvidia; it is more of a marketing failure than anything else, but well, maybe now we will pay more attention to these technical questions.

More seriously, in recent years a lot of pointed technical questions have been put in front of the masses - frame times, smoothness in dual-GPU solutions, how frames are displayed on the monitor - so why should we not look at particular specifications the way they deserve to be looked at?

A 56-ROP GPU is not a 64-ROP GPU; it has fewer ROPs and will be less capable in certain situations.

Thankfully we are not talking about a problem with the full GTX 980 chip; it is only the cut-down version.

Is it a problem for the little brother? Of course not; it is not meant to have the capacity or the performance of its big brother.

As for the technical question of the 3.5GB of fast memory and the "not much used" slow 512MB left over: the difference in memory partitioning and allocation between the 970 and the 980 may not mean much to the public (who look more at average fps and price), and it should not make much difference in practice, but the difference is there, and shortening this to "they are the same, just in a different way" is a bit too easy.

I remember that here, in this forum, people were a bit surprised to see that this GPU had the full ROP/MC setup. We all went over it, thinking Nvidia had decoupled them from the SMs in one way or another, and that we had been missing information on the architecture.

We had, and I was completely dumb to believe it. If we had known that the ROPs were not decoupled, we might have read the story differently and started asking ourselves how it was possible for Nvidia to keep the MCs and DRAM fully available.

3Dilettante, you are right, but I remember some recent blog posts from OpenGL developers complaining that Nvidia has not communicated at all about their architectures since Kepler and that it is becoming really complicated to work with them, in contrast to AMD, where you have countless pieces of research on GCN floating around the net (how to do trigonometric inversion on GCN, how to do this and that, how to speed this and that up, etc.). This has increased even more since they won the console designs, as more developers who were interested in and worked with the old "architecture" are now trying to understand the new possibilities offered to them. Just following a few Twitter accounts (not run by AMD people) is a real source of information about how GCN works; every week they post some finding, some assumption, some code, etc.

Don't get me wrong, for me this has no real importance. It is not something that changes my mind about the 970; it is a good GPU performance-wise, and it is not a hardware problem as if the GPU could die in the next minute...

It only changes a little the perspective I have on Nvidia's communication and their leadership, and not in a good way... (again......)
 
A class action suit here would be the epitome of ridiculousness. All you have to do is decide if you'd still buy the SKU with its measured price/performance if you'd known about it up front. How could anyone that bought it originally now say they wouldn't, with a straight face? It's an incredible discrete graphics card with excellent performance for the money and frankly we've never had it better.

They've handled it fairly farcically, but given the state of the technology press today they had few options other than the ridiculous diagram and the call with Jonah. I feel sorry for their technical marketing team, who were thrown right under the bus for no reason.
 
A class action suit here would be the epitome of ridiculousness. All you have to do is decide if you'd still buy the SKU with its measured price/performance if you'd known about it up front. How could anyone that bought it originally now say they wouldn't, with a straight face? It's an incredible discrete graphics card with excellent performance for the money and frankly we've never had it better.

They've handled it fairly farcically, but given the state of the technology press today they had few options other than the ridiculous diagram and the call with Jonah. I feel sorry for their technical marketing team, who were thrown right under the bus for no reason.

Completely agree..

In another way, if this story could bring brands like Nvidia, AMD, Intel - all the IT brands - back to reality, and make them stop communicating and doing business as if they were smartphone vendors, I would be really happy. (I don't really have any hope of that.)

Because Nvidia is the one in the hot seat here, but this is only the latest story in a long, long list from recent years, and every IT company is concerned.
 
A class action suit here would be the epitome of ridiculousness. All you have to do is decide if you'd still buy the SKU with its measured price/performance if you'd known about it up front. How could anyone that bought it originally now say they wouldn't, with a straight face? It's an incredible discrete graphics card with excellent performance for the money and frankly we've never had it better.

They've handled it fairly farcically, but given the state of the technology press today they had few options other than the ridiculous diagram and the call with Jonah. I feel sorry for their technical marketing team, who were thrown right under the bus for no reason.

I disagree thoroughly.
 
I disagree thoroughly.

About the class action question, or about how they have handled this whole story?

Because let's be honest, with or without a class action, Nvidia is now in damage-control mode as far as their image goes.

Maybe this was a mistake at first, but in the end nobody wanted to correct it (I absolutely don't believe that nobody saw it; Nvidia is not Apple, lol).
 