NVIDIA Maxwell Speculation Thread

When I read that, I seriously saw in my head Jen-Hsun on the stage showing the "puppy Fermi"
 
and our software engineers can keep less frequently used data in the 512MB segment
D3D12 gives control of all GPU memory to the developer (apart from the chunk the OS won't let go) - one good reason being that developers are sick of IHV software engineers moving the goalposts with every driver release.
 
Did Jen-Hsun really try to pull the "it's a feature" card?
Does he even surf the internet enough to know that people have been vaccinated against that move for a while now?
 
D3D12 gives control of all GPU memory to the developer (apart from the chunk the OS won't let go) - one good reason being that developers are sick of IHV software engineers moving the goalposts with every driver release.
That does not in any way prevent strange memory layouts. It also does not give the developer any control over how resources get laid out within GPU memory. What it does is give the developer the option to control the lifetime of resources, so that the driver/D3D does not have to figure that out on its own.
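To make that distinction concrete, here is a minimal D3D12 sketch (error handling omitted; `DemoExplicitHeap` is a hypothetical helper, assuming a valid `ID3D12Device*`). The app picks the heap size and drives residency and lifetime explicitly, but where the driver places that heap inside physical VRAM is still not something the app controls:

```cpp
// Minimal sketch, error handling omitted. The app chooses the heap size and
// drives residency explicitly; the physical placement inside VRAM is still
// the driver's call.
#include <windows.h>
#include <d3d12.h>

void DemoExplicitHeap(ID3D12Device* device)
{
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes     = 64ull * 1024 * 1024;         // 64MB, an app-chosen budget
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;     // GPU-local memory
    heapDesc.Alignment       = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
    heapDesc.Flags           = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

    ID3D12Heap* heap = nullptr;
    device->CreateHeap(&heapDesc, IID_PPV_ARGS(&heap));     // lifetime is now the app's responsibility

    // The app can evict the whole heap when it knows the contents are idle and
    // bring it back before use, instead of the driver guessing on its own.
    ID3D12Pageable* pageable = heap;
    device->Evict(1, &pageable);
    device->MakeResident(1, &pageable);

    heap->Release();
}
```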

Did Jen-Hsun really try to pull the "it's a feature" card?
Does he even surf the internet enough to know that people have been vaccinated against that move for a while now?
Well, people are vaccinated against many things. That does not mean this isn't a feature, though. It has been noted in this very thread that without this feature the card would have to be sold either as advertised (4GB, 64 ROPs, 2MB L2) or cut down to 3GB, 48 ROPs, 1.5MB L2. Given that this is quite a technical forum, and that NV obviously lied and the GTX 970 is not as advertised, it would be nice to figure out how much this lie actually hurts GTX 970 owners. I think we can all agree that the 3GB option would be worse, and since we can't have a GTX 980 with 13 SMs, this requires some creativity.
All the creativity seems to end at looking at GPU-Z memory usage graphs, though, where people seem to believe that they directly indicate how many resources a particular game creates. And the conclusion is that since a GTX 980 will often go to the full 4GB while a GTX 970 will stay at 3.5GB at the same settings, there must be 512MB of spilled-over resources. Which is false. The few actual comparisons with the GTX 980 that we do have online don't show drastic differences in behaviour between the two cards.
This does not excuse NV for lying about the specs, but this whole jihad is getting way beyond what we saw with the GeForce FX, and that was an issue that affected users much more than this one does.
 
First of all, in case I didn't make it clear earlier in this thread, I agree that the reactions all around the web are exaggerated.
But to call this a "consumer-oriented" feature is quite the stretch.

Back in Kepler days, nVidia sold both the GTX 680 and the 670 with all the ROPs, L2 cache, memory channels and memory amount.
The GK104 chips with presumably damaged ROP, memory controller and L2 partitions disabled were sold as a third card, the GTX 660 Ti.

Maxwell came and they now have a GTX x70 with disabled partitions in the backend and L2 cache, probably so they can harvest more chips for the x70's price point, ultimately making more money from the same wafer.
In the end, the GTX 970 is actually midway between what would be a GTX 960 Ti and a "full" 970. Similarly, the 970's release price is also halfway between the release prices they had for the 660 Ti and the 670.
I guess the 670 proved to be a lot more popular, so they probably figured they would make more money selling a watered-down and cheaper GTX x70 than two different cards around it this time.

Is it a feature? Yes. But make no mistake: it's not a feature for the end consumer. It's a feature for nVidia to make more money out of the same GM204 wafers.
 
If NVIDIA hadn't lied about the specs of the 970 and released two versions, one with the full L2 at $400 and one without at $330, I wonder how many folks would spring for the $400 version. I'm a bit OCD when it comes to this stuff, so I probably would have, lol.
 
First of all, in case I didn't make it clear earlier in this thread, I agree that the reactions all around the web are exaggerated.
But to call this a "consumer-oriented" feature is quite the stretch.

Back in Kepler days, nVidia sold both the GTX 680 and the 670 with all the ROPs, L2 cache, memory channels and memory amount.
The GK104 chips with presumably damaged ROP, memory controller and L2 partitions disabled were sold as a third card, the GTX 660 Ti.

Maxwell came and they now have a GTX x70 with disabled partitions in the backend and L2 cache, probably so they can harvest more chips for the x70's price point, ultimately making more money from the same wafer.
In the end, the GTX 970 is actually midway between what would be a GTX 960 Ti and a "full" 970. Similarly, the 970's release price is also halfway between the release prices they had for the 660 Ti and the 670.
I guess the 670 proved to be a lot more popular, so they probably figured they would make more money selling a watered-down and cheaper GTX x70 than two different cards around it this time.

Is it a feature? Yes. But make no mistake: it's not a feature for the end consumer. It's a feature for nVidia to make more money out of the same GM204 wafers.
That I mostly agree with.

I still have a problem with the "midway" point though. Technically, yes, the GTX 970 is about as close to midway between a "GTX 960 Ti" and a "full GTX 970" as you could possibly get. However, as was already pointed out, the 8 missing ROPs are essentially dead weight, as the shader core can't possibly keep up anyway. The memory controllers are still all there, and though the last ROP block is missing half its L2 cache, the remaining cache is still twice the size of what a Kepler ROP block had. It all comes down to how well the partially disabled block can handle simultaneous reads and writes over its one crossbar connection.

We already know that a single resource can seamlessly span across the 3.5GB border (the unintentional side effect of that CUDA benchmark), so the driver probably doesn't have to restructure a bunch of resources when this happens. We're left with the fact that the GTX 970 on paper should still be able to write 28 bytes into the striped partition and read 4 bytes from the slow partition in one cycle. And while this ratio might sound like a lot, we do live in a world of deferred renderers, where writing 32 bytes per pixel into the g-buffer is quite normal.
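Just to put rough numbers on that (a back-of-the-envelope sketch, assuming the 970's usual 7 Gbps effective GDDR5 rate; the 1080p/60 g-buffer figures below are only an illustration, not a measurement):

```cpp
// Back-of-the-envelope numbers only: illustrates the 28-byte / 4-byte split
// discussed above, assuming a 7 Gbps effective GDDR5 data rate per pin.
#include <cstdio>

int main()
{
    const double gbps_per_pin  = 7.0;    // effective GDDR5 rate (assumed)
    const int    fast_bus_bits = 224;    // 7 x 32-bit controllers (striped segment)
    const int    slow_bus_bits = 32;     // 1 x 32-bit controller (slow segment)

    const double fast_gbs = fast_bus_bits / 8.0 * gbps_per_pin;   // ~196 GB/s
    const double slow_gbs = slow_bus_bits / 8.0 * gbps_per_pin;   // ~28 GB/s
    std::printf("fast segment: %.0f GB/s, slow segment: %.0f GB/s, total: %.0f GB/s\n",
                fast_gbs, slow_gbs, fast_gbs + slow_gbs);

    // A deferred renderer writing a 32-byte-per-pixel g-buffer at 1080p, 60 fps:
    const double gbuffer_gbs = 1920.0 * 1080.0 * 32.0 * 60.0 / 1e9;   // ~4 GB/s of writes
    std::printf("1080p g-buffer fill: ~%.1f GB/s of writes per pass\n", gbuffer_gbs);
    return 0;
}
```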

If this works "as advertised" it could open some interesting options for future GM200 salvage parts (for example 1 fully disabled ROP cluster vs. 2 partially disabled).
 
Technically, Nvidia did provide a feature: after disabling SMs in the 970, they were still able to keep a 224-bit memory controller and 3.5GB at full speed. Since they can't disable the extra partition by laser cut, they decided to do it in software. That said, the last 512MB and the last 32-bit memory controller can still be accessed, but in practice, if you want to use both partitions at once, performance will be dramatically bad. So even if you can access it, you don't want to.

Now, the real problem is not that they were able to use a 224-bit bus instead of a 192-bit one and save 512MB of memory (for a total of 3.5GB that is really usable); the problem is that they sold it as a GPU with a 256-bit bus and 4GB of memory available "normally" all the time.

If Nvidia had explained from day one the work they did to increase the capability of the 970, which, with its units disabled, should have ended up as a 192-bit bus, 3GB GPU, fine. But that was not the case when they released the card.

The memory bus specification is wrong, the usable memory amount is wrong, the ROP count is wrong, and the L2 cache size is wrong. The problem is not what the 970 is or how it performs, but the marketing and the specifications of the card at launch.

Could they use this last 32-bit memory controller and the last 512MB? Yes. But show me a game right now where that is the case. Why would Nvidia driver developers start doing complex optimizations to use this last partition of memory? Spend twice as long optimizing a game for the 970 as for the 980?

Please, no company with half a brain will do this.


If this works "as advertised" it could open some interesting options for future GM200 salvage parts (for example 1 fully disabled ROP cluster vs. 2 partially disabled).

Explain to me what you would store in a "partition segment that is 32x slower than the rest of the memory on the GPU", and why, on the driver side, you would want to waste developer time on it.

Nvidia did a good thing, real engineering work: looking at the configuration of Fermi, Kepler, and Maxwell, they were finally able to end up with a 224-bit memory bus and 3.5GB of memory.

The problem lies in the last segment, which they can't laser cut; it is there, but it will never be usable under optimized conditions.

I still don't understand why they haven't done what AMD did and made the memory controllers independent from the SMs. It's a damned limitation of Nvidia's base design; they are so tied to it.

AMD can disable any compute unit in a chip and keep the full ROP partition, the memory controller and the "memory partition" related to it.

Seriously, your "if it works as advertised" makes me wonder why they aren't able to provide something similar to AMD, instead of trying to find "solutions" to a problem that shouldn't exist in the first place.
 
Technically, Nvidia did provide a feature: after disabling SMs in the 970, they were still able to keep a 224-bit memory controller and 3.5GB at full speed. Since they can't disable the extra partition by laser cut, they decided to do it in software. That said, the last 512MB and the last 32-bit memory controller can still be accessed, but in practice, if you want to use both partitions at once, performance will be dramatically bad. So even if you can access it, you don't want to.
It's arguably faster than PCI-E and much lower latency. Using the same heuristics as for what you spill into host memory and read/write via PCI-E seems sensible to me (just don't spill to PCI-E before you run out of this memory pool). The one place where this is a real problem and *might* NOT work reliably is for compute (i.e. what many people are using to test, via CUDA), where it's much harder to predict access patterns.

As an extension to this mechanism, it would be possible for the driver to change what is in that memory pool *dynamically*, e.g. if the driver detects a texture is never used, put it in that pool instead of something else (realistically games are going to have plenty of rarely used textures). The main complexity would be avoiding any slowdown while copying those textures from one pool to the other, but assuming there is a limit to this per frame, it should be fairly negligible (e.g. maybe 1 second to completely change what is in that pool with no slowdown).
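As a toy illustration of that idea (not anything NVIDIA's driver actually does; the structure, names and the 300-frame "cold" threshold are all made up), a least-recently-used rebalance with a per-frame copy budget could look something like this:

```cpp
// Toy illustration of the heuristic described above, not actual driver code:
// move the least-recently-used allocations into a hypothetical "slow" pool,
// capped by a per-frame copy budget so the shuffling itself never causes a hitch.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Allocation {
    uint64_t sizeBytes;
    uint64_t lastUsedFrame;
    bool     inSlowPool;
};

void RebalancePools(std::vector<Allocation>& allocs,
                    uint64_t currentFrame,
                    uint64_t slowPoolBytesFree,
                    uint64_t perFrameCopyBudget)   // e.g. a few MB per frame
{
    // Oldest (least recently used) first.
    std::sort(allocs.begin(), allocs.end(),
              [](const Allocation& a, const Allocation& b) {
                  return a.lastUsedFrame < b.lastUsedFrame;
              });

    uint64_t copiedThisFrame = 0;
    for (Allocation& a : allocs) {
        const bool coldEnough = currentFrame - a.lastUsedFrame > 300;  // ~5s at 60 fps
        if (a.inSlowPool || !coldEnough) continue;
        if (a.sizeBytes > slowPoolBytesFree) continue;
        if (copiedThisFrame + a.sizeBytes > perFrameCopyBudget) break;

        // Real code would issue a GPU copy here; we only track the bookkeeping.
        a.inSlowPool = true;
        slowPoolBytesFree -= a.sizeBytes;
        copiedThisFrame   += a.sizeBytes;
    }
}
```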

That doesn't change the fact that it is NOT as good as a true 4GB GPU, and NVIDIA downright lied about the presence of 64 ROPs (or was misinformed, supposedly) - unlike the memory controller story, there is no way to spin that one: there are only 56 ROPs enabled on the die and that's that. Given the excellent performance of the GTX 970, and the number of crazy, absurd low-level hardware bugs on *ANY* modern GPU that require obscure workarounds to keep the chip from creating small black holes (and that reduce performance in ways mere mortals may never comprehend), I feel all of this is a severe overreaction, and I have a hard time sympathising with the greedy lawyers who will likely be the only ones getting any significant money out of this charade.

I hope this leads to NVIDIA finally getting more honest and precise with their technical marketing. They should realise that this scandal could only have been avoided by being more open from the start, rather than disclosing less information so there's less rope to hang them with - after all, it would have been quite hard *not* to specify the amount of video memory on the card...
 
It's arguably faster than PCI-E and much lower latency. Using the same heuristics as for what you spill into host memory and read/write via PCI-E seems sensible to me (just don't spill to PCI-E before you run out of this memory pool). The one place where this is a real problem and *might* NOT work reliably is for compute (i.e. what many people are using to test, via CUDA), where it's much harder to predict access patterns.
And this is precisely why advanced techniques under D3D12 (mixed compute/traditional graphics pipeline, developer-managed sub-buffers of persistent memory allocations) are going to fall foul of this.

NVidia wants a future with, say, 30 SKUs, each of which has 2 or more regions of memory of varying sizes and varying performance levels (rare will be the SKUs that have "flat" memory, they'll be the premium pieces). And NVidia expects developers to trust them to get the best from this nightmare maze of performance considerations...

Alternatively, one could argue that within 18 months we could be looking at graphics cards equipped with 4 or 8GB of HBM, backed by, say, 16 or 32GB of cheap-as-chips memory, so we'd better get used to "tiered-performance VRAM". Well, I'm not going to argue for that, but maybe someone else wants to run with it...
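For what it's worth, the OS already exposes a coarse two-tier view to applications today: DXGI reports separate budgets for local (VRAM) and non-local (system) memory. A minimal sketch, assuming you have already obtained an `IDXGIAdapter3*`:

```cpp
// Minimal sketch, assuming an already-obtained IDXGIAdapter3* (dxgi1_4.h):
// the OS reports separate budgets for local (VRAM) and non-local (system)
// memory, the coarse form of "tiered" memory apps can already see today.
#include <dxgi1_4.h>
#include <cstdio>

void PrintMemoryBudgets(IDXGIAdapter3* adapter)
{
    DXGI_QUERY_VIDEO_MEMORY_INFO local = {}, nonLocal = {};
    adapter->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &local);
    adapter->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_NON_LOCAL, &nonLocal);

    std::printf("local budget:     %llu MB (usage %llu MB)\n",
                (unsigned long long)(local.Budget >> 20),
                (unsigned long long)(local.CurrentUsage >> 20));
    std::printf("non-local budget: %llu MB (usage %llu MB)\n",
                (unsigned long long)(nonLocal.Budget >> 20),
                (unsigned long long)(nonLocal.CurrentUsage >> 20));
}
```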
 

http://www.pcgameshardware.de/Total-War-Attila-PC-259548/Specials/Test-Benchmarks-1151602/

Nothing to see here folks, move along now. GTX970 is working as intended.
[attached image: benchmark chart]
 
And this is precisely why advanced techniques under D3D12 (mixed compute/traditional graphics pipeline, developer-managed sub-buffers of persistent memory allocations) are going to fall foul of this.
Yup. Personally, I would never mind having only 3.5GB instead of 4GB. Or 224bit instead of 256bit. Seriously, who cares?
Jittery performance because of games not being aware that some of the memory is slower (as evidenced in the previous post) is the real problem.
Nvidia is being a bit dishonest – but only in the sense that with today's games and APIs, it would be best to just expose 3.5GB/224bit and nothing more. It is still better than 3GB/192bit, while avoiding potential trouble. Having those last 512MB is more about marketing, less about allowing maximum performance, as there is no way to be certain that future games won't have problems with this.
 
Yup. Personally, I would never mind having only 3.5GB instead of 4GB. Or 224bit instead of 256bit. Seriously, who cares?
Jittery performance because of games not being aware that some of the memory is slower (as evidenced in the previous post) is the real problem.
Nvidia is being a bit dishonest – but only in the sense that with today's games and APIs, it would be best to just expose 3.5GB/224bit and nothing more. It is still better than 3GB/192bit, while avoiding potential trouble. Having those last 512MB is more about marketing, less about allowing maximum performance, as there is no way to be certain that future games won't have problems with this.

The problem is they don't sell you the GPU with 56 ROPs, a 224-bit bus and 3.5GB; they sell you the GPU as a 256-bit bus, 64 ROPs and 4GB of memory. Personally, if I had a 970 I surely wouldn't care about the difference.

But the problem is, if we are OK with that, then next time they will sell you an 8GB GPU with only 6GB available, or a 4096-shader GPU with only 3072 SPs available, with a funny slide telling you that they designed the GPU for 8K gaming...
 
The problem is they don't sell you the GPU with 56 ROPs, a 224-bit bus and 3.5GB; they sell you the GPU as a 256-bit bus, 64 ROPs and 4GB of memory. Personally, if I had a 970 I surely wouldn't care about the difference.

But the problem is, if we are OK with that, then next time they will sell you an 8GB GPU with only 6GB available, or a 4096-shader GPU with only 3072 SPs available, with a funny slide telling you that they designed the GPU for 8K gaming...
That's what reviews are for. One determines the performance from reviews and buys if it's what one needs. Whether the card has 3.5 or 4GB changes almost nothing, even in terms of futureproofing.
If need be, I'll just lower some performance settings. With the weird memory layout, there's no certainty that lowering quality settings will help.
 
That's what reviews are for. One determines the performance from reviews and buys if it's what one needs. Whether the card has 3.5 or 4GB changes almost nothing, even in terms of futureproofing.
If need be, I'll just lower some performance settings. With the weird memory layout, there's no certainty that lowering quality settings will help.

I completely agree that it will not change much on the performance side; the reviews were done with the same GPU. But I don't agree that we should let a company do this without worrying about it, or we open the door to every marketing trick possible in the future...
It's just my personal opinion, and it involves only me.

That said, I think it is really time we close this case and go back to more interesting and constructive discussion.
 
In the future NVIDIA can claim their card has up to 4GB of memory serving up to 64 ROPs via an up to 256-bit bus, then everybody should be happy.
 