Deleted member 2197
Guest
Latest update from Guru3D:
Update #3 - The issue that is behind the issue
http://www.guru3d.com/news-story/does-the-geforce-gtx-970-have-a-memory-allocation-bug.html
New info surfaces: Nvidia messed up quite a bit when it sent out specs to press and media like ourselves. As we now know, the GeForce GTX 970 has 56 ROPs, not 64 as listed in their reviewer's guides. Having fewer ROPs is not a massive thing in itself, but it exposes a thing or two about the memory subsystem and L2 cache. Combined with some new features in the Maxwell architecture, herein we can find the answer to why the card's memory is split into the 3.5GB/0.5GB partitions noted above.
Look above (and I am truly sorry to make this so complicated, as it really is just that .. complicated). You'll notice that, compared to the GTX 980, the GTX 970 has three disabled SMs, leaving 13 active SMs (clusters holding things like the shader processors). The SMs shown at the top are followed by 256KB L2 cache blocks, each paired with a 32-bit memory controller located at the bottom. The crossbar is responsible for communication between the SMs, caches and memory controllers.
You will notice the greyed-out right-hand L2 block for this GPU, right? That is a disabled L2 block, and each L2 block is tied to ROPs; the GTX 970 therefore does not have 2,048KB of L2 cache but 1,792KB. Disabling ROPs and L2 like this is actually new and exclusive to Maxwell; on Kepler, disabling an L2/ROP segment would disable the entire section, including a memory controller. So while the L2/ROP unit is disabled, that 8th memory controller to the right is still active and in use.
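A quick back-of-the-envelope sketch, my own summary rather than anything from the article, that restates the unit counts discussed above (GTX 980 as the full chip with 16 SMs, 64 ROPs and eight 256KB L2 blocks; GTX 970 with 13 SMs, 56 ROPs and seven L2 blocks, all eight 32-bit memory controllers still active):

```python
# Unit counts as described above; the GTX 970 loses three SMs and one L2/ROP block,
# but keeps all eight 32-bit memory controllers active.
specs = {
    "GTX 980": {"SMs": 16, "ROPs": 64, "L2_blocks": 8, "mem_controllers_32bit": 8},
    "GTX 970": {"SMs": 13, "ROPs": 56, "L2_blocks": 7, "mem_controllers_32bit": 8},
}

for card, s in specs.items():
    l2_kb = s["L2_blocks"] * 256                      # each L2 block is 256 KB
    print(f"{card}: {s['SMs']} SMs, {s['ROPs']} ROPs, "
          f"{l2_kb} KB L2 ({s['L2_blocks']} x 256 KB), "
          f"{s['mem_controllers_32bit']} x 32-bit memory controllers")

# GTX 980: 16 SMs, 64 ROPs, 2048 KB L2 (8 x 256 KB), 8 x 32-bit memory controllers
# GTX 970: 13 SMs, 56 ROPs, 1792 KB L2 (7 x 256 KB), 8 x 32-bit memory controllers
```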
Now that we know Maxwell can disable smaller segments while keeping the rest active, it follows that all eight memory controllers and their associated DRAM can still be used, but the final 1/8th of the L2 cache is missing/disabled. As you can see, the 8th DRAM controller has to buddy up with the 7th L2 unit, and that is the root cause of a big performance issue. The GeForce GTX 970 has a 256-bit bus over a 4GB framebuffer and all memory controllers are active and in use, but with the L2 segment tied to the 8th memory controller disabled, overall L2 performance would operate at half of its normal rate.
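To make that buddy-up concrete, here is a rough illustration (my own framing, not Nvidia's documentation) of the mapping: eight active memory controllers but only seven L2 slices, so the eighth controller has to share the seventh slice:

```python
# Hypothetical controller-to-L2-slice mapping for the GTX 970: controllers 0..6 each
# get their own L2 slice, while controller 7 has to buddy up with slice 6.
l2_slice_for_controller = {mc: min(mc, 6) for mc in range(8)}
print(l2_slice_for_controller)
# {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 6}
# If all 4 GB were treated as one pool, traffic from controllers 6 and 7 would fight
# over the same L2 slice, which is the "half of normal L2 performance" problem above.
```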
Nvidia needed to tackle that problem and did so by splitting the total 4GB of memory into a primary 3.5GB partition (196 GB/sec) that makes use of the first seven memory controllers and their associated DRAM, plus a 0.5GB partition (28 GB/sec) tied to the 8th memory controller. Nvidia could have, and probably should have, marketed the card as 3.5GB; they probably even could have deactivated an entire right-side quad and gone for a 192-bit memory interface tied to just 3GB of memory, but did not pursue that alternative because this solution offers better performance. Nvidia claims that games hardly suffer from this design/workaround.
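Those bandwidth figures follow from the bus width: each 32-bit controller moves 4 bytes per transfer, and at the GTX 970's 7 Gbps effective GDDR5 data rate (an assumption on my part, but that is the card's stock memory speed) that works out to 28 GB/s per controller. A small sketch of the arithmetic:

```python
# Per-partition bandwidth from the numbers above: 32-bit controllers at an assumed
# 7 Gbps effective GDDR5 data rate, seven controllers behind the 3.5 GB partition
# and one controller behind the 0.5 GB partition.
bytes_per_transfer = 32 // 8          # a 32-bit controller moves 4 bytes per transfer
data_rate_gtps = 7.0                  # effective GDDR5 transfer rate, GT/s (assumed)
per_controller_gbps = bytes_per_transfer * data_rate_gtps   # 28 GB/s

print(f"3.5 GB partition: {7 * per_controller_gbps:.0f} GB/s")   # 196 GB/s
print(f"0.5 GB partition: {1 * per_controller_gbps:.0f} GB/s")   # 28 GB/s
print(f"full 256-bit bus: {8 * per_controller_gbps:.0f} GB/s")   # 224 GB/s
```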
In a roughly simplified explanation: the disabled L2 unit causes a challenge, a performance hit tied to one of the memory controllers. To divert that performance hit, the memory is split up into two segments, bypassing the issue at hand, a tweak to get the most out of a lesser situation. Both memory partitions are active and in use; the primary 3.5GB partition is very fast, while the 512MB secondary partition is much slower.
Thing is, the quantifiable fact is that nobody really has massive issues; dozens and dozens of media outlets have tested the card with in-depth reviews like the ones here on my site. As for replicating the stutters and such that you see in some of the videos, to date I have not been able to reproduce them unless you do crazy stuff, and I've been on this all weekend. Overall scores are good, and sure, if you run out of memory at some point you will see performance drops. But then you drop from 8x to, like, 4x AA, right?
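For readers who want to poke at this themselves, here is a hypothetical probe sketch, not the methodology Guru3D or Nvidia used, just my own illustration of the kind of test circulating in the community: fill VRAM in fixed-size blocks and time a simple write to each one. It assumes PyTorch with CUDA is available, and real results depend heavily on how the driver actually places the allocations:

```python
# Hypothetical VRAM probe: allocate ~256 MB blocks until the card is nearly full,
# then time a plain write to each block. On a GTX 970, blocks that land in the last
# 0.5 GB segment would be expected to show much lower write throughput.
import torch

block_bytes = 256 * 1024 * 1024            # 256 MB per allocation
n_floats = block_bytes // 4                # float32 elements per block
blocks = []

for i in range(15):                        # up to ~3.75 GB; adjust for your card
    try:
        blocks.append(torch.empty(n_floats, dtype=torch.float32, device="cuda"))
    except RuntimeError:                   # out of memory, stop allocating
        break

for i, b in enumerate(blocks):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    b.fill_(1.0)                           # simple write over the whole block
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end)
    gbps = block_bytes / (ms / 1000) / 1e9
    print(f"block {i}: {ms:.2f} ms  (~{gbps:.1f} GB/s write)")
```

On a card without the split you would expect the per-block numbers to stay roughly flat; a sharp drop on the last blocks would be consistent with hitting the slow 0.5GB segment, though allocator behavior can easily muddy the picture.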
Nvidia messed up badly here .. no doubt about it. The ROP/L2 cache count was goofed up, slipped through the cracks, and ended up in their reviewer's guides and spec sheets, and really ... they should have called this a 3.5GB card with an extra layer of L3 cache memory or something. Right now Nvidia is in full damage control; however, I will stick to my recommendations: the GeForce GTX 970 is still a card we like very much in the up-to 2560x1440 (WQHD) domain, but it probably should have been called a 3.5GB product with an added 512MB L3 cache.
To answer my own title question: does Nvidia have a memory allocation bug? Nope, this was all done by design; Nvidia, however, failed to communicate this completely to the tech media and thus, in the end, to the people that buy the product.