Care to run this one as well? It's a slightly modified version of the slightly modified original. This one reports the amount of free memory after each allocation. I still think that parts of the later-allocated buffers end up in pinned memory.
This is more of a mystery to me: since the L2 cache is tied to the MCs (and ROPs), how can they disable L2 but not the ROPs/MCs? It's true that the GTX 970 has always shown some odd results compared to the GTX 980 in some synthetic tests, which is likely related (check the hardware.fr fillrate results, for instance), but I still fail to understand how this actually works. I guess the nice diagrams need some more detail to make sense there...

On the 12 SMM GTX 980M (2MiB L2 according to SiSoft, 256-bit) you get a full 4/8GB memory pool. It's probably this additional deactivation of L2 cache on the GTX 970 (GM204-200) that causes this problem. SiSoft has reported through CUDA since day one that the GTX 970 has only 1.8MB of L2.
It now seems likely that GM200's die is 600+ mm^2. That very large size gives room for extra pins, possibly enough to allow a 512-bit memory bus. NVidia hasn't used such a wide bus since GT200's similarly sized 572 mm^2 die. Fermi's and Kepler's largest chips were both smaller (529 and 561 mm^2 respectively) and both had room only for 384-bit buses. A 512-bit bus would give big Maxwell a rather dramatic and welcome memory bandwidth boost.
A 512-bit bus would give big Maxwell memory capacities of 4, 8, or even 16 GB. A 384-bit bus gives 3, 6, or 12 GB.
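Here's a rough sketch of where those numbers come from, and why bandwidth and capacity end up linked. It assumes (not stated above) that every GDDR5 chip has a 32-bit interface, one chip per 32-bit channel (no clamshell mode, which pairs two chips on one channel and doubles capacity), chip densities of 2, 4 and 8 Gbit, and a 7 Gbps data rate picked only for illustration.

```python
# Sketch: bus width fixes the chip count, and the chip count times the chip
# density fixes the "natural" capacities. Assumptions: 32-bit GDDR5 chips,
# one chip per channel (no clamshell), 2/4/8 Gbit densities.

CHIP_WIDTH_BITS = 32
CHIP_DENSITIES_GBIT = (2, 4, 8)

def chip_count(bus_width_bits: int) -> int:
    """Number of 32-bit GDDR5 chips needed to populate the bus."""
    if bus_width_bits % CHIP_WIDTH_BITS:
        raise ValueError("bus width must be a multiple of 32 bits")
    return bus_width_bits // CHIP_WIDTH_BITS

def capacities_gb(bus_width_bits: int) -> list[int]:
    """Total capacity in GB for each chip density (8 Gbit = 1 GB)."""
    chips = chip_count(bus_width_bits)
    return [chips * density // 8 for density in CHIP_DENSITIES_GBIT]

def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth: bus width times per-pin data rate, converted to bytes."""
    return bus_width_bits * data_rate_gbps / 8

for width in (384, 512):
    # 7.0 Gbps is an illustrative assumption, not a GM200 spec
    print(width, capacities_gb(width), bandwidth_gb_s(width, 7.0))
```

So a wider bus means more chips, and more chips means both more bandwidth and more capacity at any given chip density; that's the link.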
GT200 was using GDDR3, not GDDR5, but I doubt that change limited bus widths.
I never understood why bandwidth and memory capacity are linked to each other. Can somebody explain it to me?
(2nd page) said:This in turn is why the 224GB/sec memory bandwidth number for the GTX 970 is technically correct and yet still not entirely useful as we move past the memory controllers, as it is not possible to actually get that much bandwidth at once on the read side. GTX 970 can read the 3.5GB segment at 196GB/sec (7GHz * 7 ports * 32-bits), or it can read the 512MB segment at 28GB/sec, but not both at once; it is a true XOR situation. Furthermore because the 512MB segment cannot be read at the same time as the 3.5GB segment, reading this segment blocks accessing the 3.5GB segment for that cycle, further reducing the effective memory bandwidth of the card. The larger the percentage of the time the crossbar is reading the 512MB segment, the lower the effective memory bandwidth from the 3.5GB segment.
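The arithmetic in that quote can be reproduced with a few lines. The time-sharing model in `effective_bandwidth()` is my own simplification (each cycle the crossbar reads either the 3.5GB segment or the 512MB segment, never both), not something taken from the article, so treat the blended figure as a rough illustration only.

```python
# Sketch of the GTX 970 read-side segment arithmetic quoted above.
DATA_RATE_GHZ = 7.0   # effective GDDR5 data rate per pin
PORT_WIDTH_BITS = 32  # one 32-bit memory controller port

def segment_bandwidth_gb_s(ports: int) -> float:
    """Peak read bandwidth of a segment served by `ports` 32-bit ports."""
    return DATA_RATE_GHZ * ports * PORT_WIDTH_BITS / 8

FAST = segment_bandwidth_gb_s(7)  # 3.5GB segment, 7 ports
SLOW = segment_bandwidth_gb_s(1)  # 512MB segment, 1 port

def effective_bandwidth(slow_fraction: float) -> float:
    """Simplified XOR model: average read bandwidth when a fraction of
    cycles is spent reading the slow segment instead of the fast one."""
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be in [0, 1]")
    return (1.0 - slow_fraction) * FAST + slow_fraction * SLOW

print(FAST, SLOW, FAST + SLOW)  # 196.0 28.0 224.0
print(effective_bandwidth(0.1))
```

The 224GB/sec headline figure is just FAST + SLOW, which under this model is never reachable on reads, because any cycle spent on the slow segment is a cycle not spent on the fast one.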
Now this makes sense (AnandTech's article is pretty good too). So the L2 cache is still directly tied to the ROPs, but instead of 4x16 ROPs as initially believed, it is really still octo-ROPs like previous generations (8x8). And of course the new ability to have only one L2/ROP partition active per 2x32-bit MC channel is pretty interesting.
http://www.pcper.com/reviews/Graphi...Full-Memory-Structure-and-Limitations-GTX-970
There's more on it
The GTX 970 was advertised as having 64 ROPs and 2048KB of L2 (in the reviewer's guide, at least); in reality it has 56 ROPs and 1792KB of L2 (and of those 56 ROPs, only 52 are effectively usable due to limitations on the SMM side).
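A quick sketch of how those numbers fall out of the 8x8 layout. The 256KB per partition follows from 2048KB / 8; the other inputs are assumptions not stated in this thread: 13 active SMMs on the GTX 970 (1664 cores / 128 per SMM) and each SMM feeding at most 4 pixels per clock to the ROPs.

```python
# Sketch of GM204 ROP/L2 partition arithmetic (8 partitions x 8 ROPs).
ROPS_PER_PARTITION = 8
L2_KB_PER_PARTITION = 256      # 2048KB / 8 partitions
PIXELS_PER_SMM_PER_CLK = 4     # assumption: per-SMM pixel output limit

def backend(partitions: int) -> tuple[int, int]:
    """ROP count and L2 size (KB) for a given number of active partitions."""
    return partitions * ROPS_PER_PARTITION, partitions * L2_KB_PER_PARTITION

def usable_rops(partitions: int, smms: int) -> int:
    """ROPs that can actually be fed, capped by SMM pixel output."""
    rops, _ = backend(partitions)
    return min(rops, smms * PIXELS_PER_SMM_PER_CLK)

print(backend(8))           # (64, 2048): as advertised
print(backend(7))           # (56, 1792): actual GTX 970
print(usable_rops(7, 13))   # 52
```

1792KB is 1.75MiB, which would round to the 1.8MB that SiSoft reports.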
So, does that mean the access to the smaller segment is essentially "uncached" (or partially cached)?
So, what it boils down to is that NVidia engineering found a way to make a <256-bit bus chip, after de-activation, that could be advertised as a 256-bit chip. Pretty useful for the marketing department, eh? "Same bus width as the GTX980."
What about a chip where half the L2s are turned off?... Still 256-bit...
That AnandTech article was good and seems to have answered most questions.
But what I take away from this is that Nvidia has a lot of options for releasing new GM204-based products in the future.
For example, a GTX970Ti and a GTX960Ti...
GTX980 2048cores, 64rops, 256bit, 224GB/s, 4GB, $549
GTX970Ti 1792cores, 64rops, 256bit, 224GB/s, 4GB, $399-449
GTX970 1664cores, 56rops, 224+32bit, 196+28GB/s, 3.5+0.5GB, $329
GTX960Ti 1536cores, 56rops, 224bit, 196GB/s, 3.5GB, $249-279
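The list above can be sanity-checked against itself. This sketch assumes 128 CUDA cores per Maxwell SMM, 7 Gbps GDDR5 on every SKU, 8 ROPs per ROP/L2 partition and 32 bits of bus per MC channel; the GTX970Ti and GTX960Ti entries are the speculative SKUs from the list, not announced products.

```python
# Sketch: derive SMMs, ROP/L2 partitions, MC channels and bandwidth per
# segment for each entry in the lineup above.
CORES_PER_SMM = 128      # assumption: Maxwell SMM width
DATA_RATE_GBPS = 7.0     # assumption: same GDDR5 speed on every SKU
ROPS_PER_PARTITION = 8
CHANNEL_BITS = 32

# name: (cores, ROPs, bus segments in bits), copied from the list above
lineup = {
    "GTX980":   (2048, 64, [256]),
    "GTX970Ti": (1792, 64, [256]),
    "GTX970":   (1664, 56, [224, 32]),
    "GTX960Ti": (1536, 56, [224]),
}

def summarize(cores: int, rops: int, segments: list[int]) -> dict:
    smms, leftover = divmod(cores, CORES_PER_SMM)
    if leftover:
        raise ValueError("core count is not a whole number of SMMs")
    return {
        "smms": smms,
        "rop_l2_partitions": rops // ROPS_PER_PARTITION,
        "mc_channels": sum(segments) // CHANNEL_BITS,
        "bandwidth_gb_s": [bits * DATA_RATE_GBPS / 8 for bits in segments],
    }

for name, spec in lineup.items():
    print(name, summarize(*spec))
```

The GTX970 is the only entry with more MC channels (8) than ROP/L2 partitions (7), which is exactly what produces the split 196+28GB/s figure.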