NVIDIA Maxwell Speculation Thread

32 bits is the maximum for the last 512MB because of the way they have disabled the ROP/L2 block it should be connected to (which normally would make them also disable that MC). That MC operates separately from the other MCs, so while reading/writing to that portion the maximum bandwidth is just one MC (32 bits), while the other 3.5GB can use the full 224 bits from the 7 MCs combined (and the 980 can use all 8 MCs = 256 bits).
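To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python (assuming the stock 7 Gbps effective GDDR5 data rate; the figures are illustrative, not measurements):

```python
# Back-of-the-envelope bandwidth for the GTX 970 memory partitioning,
# assuming the stock 7.0 Gbps effective GDDR5 data rate per pin.
gbps_per_pin = 7.0      # Gb/s per data pin (GTX 970/980 reference memory clock)
bits_per_mc = 32        # each memory controller is 32 bits wide

def bandwidth_gbs(num_mcs):
    """Peak bandwidth in GB/s for a group of 32-bit memory controllers."""
    return num_mcs * bits_per_mc * gbps_per_pin / 8

print(f"8 MCs combined (GTX 980):        {bandwidth_gbs(8):.0f} GB/s")  # 224 GB/s
print(f"7 MCs combined (970, 3.5 GB):    {bandwidth_gbs(7):.0f} GB/s")  # 196 GB/s
print(f"isolated MC (970, last 0.5 GB):  {bandwidth_gbs(1):.0f} GB/s")  #  28 GB/s
```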

I can see why you think the (effective) bus width is between 224 and 256 bits. But each 32-bit MC is connected "only" to a 512 MB memory chip. The problem with the 970 is that a read or a write cannot be performed by all eight of them at once. They can still claim 256-bit bus performance if they ensure the last partition does the opposite operation. Whether that is possible I cannot say. But if they can handle it, then it also implies all the memory chips can be in use at once... aren't 4 GB still possible?
 
7 can be combined into an effective 224-bit bus, which is the best case (ignoring the isolated 32-bit 512MB, i.e. how this card works under 3.5GB usage); 1 cannot. It looks pretty simple to me: this card is not a 256-bit card like the 980, which can combine all 8 for a read or whatever. It's 224 at best, and if you want to use the other 512MB you are limited to 32 bits for that portion; you cannot read both at the same time like a 256-bit bus on the 980!?
 
As pointed out by hardware.fr shortly after the GTX 970 introduction (and subsequently picked up by TechReport): even with a GTX 980 memory system and ROPs, the 13 SMs are not sufficient to feed the ROPs at full speed for regular read and write operations. The full rate can only be reached for some ROP blending operations. So the peak BW reduction is much more academic than the slow access to the top 0.5GB.

(http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980)
 
The article you posted was written in October, before they knew the card's real specs. Don't you think the testing is also showing the effect of the disabled stuff we know about now (ROP/L2/mem bandwidth), and not just the SM limitation to 52? It looks like they assumed the difference was just because of the SM limitation, and were wrong.
 
7 can be combined into an effective 224-bit bus, which is the best case (ignoring the isolated 32-bit 512MB, i.e. how this card works under 3.5GB usage); 1 cannot. It looks pretty simple to me: this card is not a 256-bit card like the 980, which can combine all 8 for a read or whatever. It's 224 at best, and if you want to use the other 512MB you are limited to 32 bits for that portion; you cannot read both at the same time like a 256-bit bus on the 980!?

If you want all reads or all writes, then 224 bits really is the best case. Because two controllers are managed by one ROP partition, there is only one read and one write path to the crossbar. But if one MC does a read and the other a write, you can pass both through at once and get a 256-bit transfer. It is a big if, seeing how the halved ROP and cache block drives twice as many controllers, and it needs the workload to behave consistently so it doesn't kill overall performance. But it is still not disproved.
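A toy sketch of that port constraint (my assumption of how the impaired partition behaves, not anything NVIDIA has confirmed): the partition's two 32-bit MCs share one read path and one write path to the crossbar, so both can only be active in the same cycle when one reads while the other writes.

```python
# Toy model of the impaired GTX 970 L2/ROP partition (assumed behaviour):
# its two 32-bit MCs share one read port and one write port to the crossbar.
def partition_throughput_bits(op_a, op_b):
    """Bits moved per transfer by the impaired partition for a pair of MC ops."""
    ports = {"read": 1, "write": 1}     # one path per direction to the crossbar
    used = {"read": 0, "write": 0}
    total = 0
    for op in (op_a, op_b):
        if op is not None and used[op] < ports[op]:  # a port is still free
            used[op] += 1
            total += 32
    return total

print(partition_throughput_bits("read", "read"))   # 32 -> 224-bit card total
print(partition_throughput_bits("read", "write"))  # 64 -> 256-bit card total
```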
 
The article you posted was written in October, before they knew the card's real specs. Don't you think the testing is also showing the effect of the disabled stuff we know about now (ROP/L2/mem bandwidth), and not just the SM limitation to 52? It looks like they assumed the difference was just because of the SM limitation, and were wrong.
If you look at the ratios pointed out in Damien's email, they'd change from 64/52/64 to 64/52/56. That would still keep the bottleneck in the SMs for operations that are limited by this bottleneck. That last part of my sentence is an important detail, of course. My speculation is that pure BW tests, such as the CUDA program that exposed the issue with the upper 0.5GB, would still be limited?
Pure ROP operations such as blending and MSAA would not. Whether or not those last ones would matter is a different story: you'd need a very simple shader to reach peak BW in order to observe it, but how common is that in games these days?
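
To make those ratios concrete, a rough per-clock sketch (assuming each Maxwell SM can export 4 pixels per clock to the ROPs; treat the numbers as illustrative):

```python
# Per-clock pixel throughput behind the 64/52/64 vs 64/52/56 figures,
# assuming each Maxwell SM exports 4 pixels per clock to the ROPs.
pixels_per_sm = 4

gtx980_sm_output = 16 * pixels_per_sm   # 64 pixels/clk from 16 SMs
gtx980_rops      = 64
gtx970_sm_output = 13 * pixels_per_sm   # 52 pixels/clk from 13 SMs
gtx970_rops_spec = 64                   # as originally advertised
gtx970_rops_real = 56                   # with one ROP/L2 block disabled

# The SMs (52/clk) stay the limiter for plain read/write fill either way;
# only blend/MSAA paths running at full ROP rate could notice the missing ROPs.
print(min(gtx980_sm_output, gtx980_rops))       # 64
print(min(gtx970_sm_output, gtx970_rops_spec))  # 52
print(min(gtx970_sm_output, gtx970_rops_real))  # 52
```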

Edit: another factor is the clock domain in which the various units are located...
 
They are making the consumer whole again if that's what they want.

As to having a right to know what you're buying, nobody is obliged to disclose the architectural details of their processor up front. You're within your rights to buy it and return it if it doesn't do what you were expecting for the money you paid, but having the details NV are in trouble over, as part of that expectation, is a luxury.

The details are disclosed to reviewers because the enthusiast segment "sells" boards to other segments via recommendation. As soon as that stops being profitable for the vendor, disclosure will go away. This furore puts that in danger, in my opinion anyway.

I agree that no company is obliged to disclose all the architectural details of their products. But if the company chooses to do so (and Nvidia did), it is responsible for the accuracy of the information.
 
So you want them to stop disclosing information? That's the likely outcome.
 
There needs to be some discussion of the material impact of the inaccuracy, and just how far you want that responsibility to go.
Nvidia has had differently disabled chips go into the same salvage boards, review guides are not advertisement copy (as close as they get these days), and architectural disclosure comes in many forms.

How should we characterize inaccuracies in ISA documents or architectural whitepapers that frequently get referenced in tech articles but not in advertisements, and how strong is that responsibility?
 
Review guides can be seen as a marketing tool, since they provide the information to reviewers, who in turn provide it to customers.

In most (if not all) of the EU, at least, customers are entitled to return their cards over false advertising, which has already been put into effect for GTX 970s by many.
http://www.techpowerup.com/209409/p...s-being-returned-over-memory-controversy.html
 
I gather from the article that customers are returning their cards to retailers citing false advertising, and board sellers are not contesting it. At least one contributing reason is that they are not getting any help from Nvidia in explaining what is going on with an arcane ASIC design parameter.

I find it somewhat ironic that, besides not seeing an indication of any official finding, the article is citing regulations allowing the return of a product due to defects--for a SKU that is for chips with defects.
At least from the standpoint of winning a US case of false advertising, this is a weaker point than the more obvious "GPU core" count lie.
 
Is it really possible to treat the 512MiB pool like some memory tier between VRAM and RAM over PCIe? If the application is allowed to use the full 4GiB of memory, it should not be aware that some of this memory is 7 times slower than other parts?
 
Right, bandwidth arguably doesn't even matter at that point. There are very few situations where, if the GPU needs something that isn't in its directly addressable memory, latency isn't what actually murders performance. It's nuanced (the two are inextricably linked), but finding things to do for thousands of clocks is very hard and is almost guaranteed to starve the chip.

Have there been recent tests of PCIe latency?
Some tests a few years back measured transfer latency in microseconds, sometimes over 10 depending on driver and device settings.
Thousands of clocks might be generous by 1-2 orders of magnitude for a bus transfer, and that is in the case of straightforward microbenchmarks rather than a less structured real-world environment.
 
They are making the consumer whole again if that's what they want.

As to having a right to know what you're buying, nobody is obliged to disclose the architectural details of their processor up front. You're within your rights to buy it and return it if it doesn't do what you were expecting for the money you paid, but having the details NV are in trouble over, as part of that expectation, is a luxury.

The details are disclosed to reviewers because the enthusiast segment "sells" boards to other segments via recommendation. As soon as that stops being profitable for the vendor, disclosure will go away. This furore puts that in danger, in my opinion anyway.

Just because it is optional doesn't mean that you can't be sued for false advertising if you use it as part of the advertising of your product. For example, it isn't mandatory to label gluten-free food products as gluten-free. It's completely optional. However, if you do market a product as gluten-free because you'll sell more due to that fact, but it isn't gluten-free, you're going to be in a world of trouble. Similarly, if you market a car (because everyone loves a car analogy, right? :p) as having 8 cylinders but it actually only has 6 (optional disclosure), you're going to be in trouble with various marketing laws.

I'm not saying Nvidia should or shouldn't be sued over this. Just that being optionally disclosed does not grant it immunity from having to follow the laws within any given country. It is pretty trivial, but I've seen companies get sued successfully over even more trivial things (Nutella in the US, for example).

Regards,
SB
 
Is it really possible to treat the 512MiB pool like some memory tier between VRAM and RAM over PCIe? If the application is allowed to use the full 4GiB of memory, it should not be aware that some of this memory is 7 times slower than other parts?
Since you can't read from both pools at once, and copying from one to the other is as slow as the 32-bit MC and most likely blocks the whole card from accessing VRAM, I don't see how that would make an effective cache.
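
For a sense of scale, a quick sketch of what shuffling the whole slow pool would cost if the copy really is capped by the lone 32-bit MC (the ~28 GB/s figure is my assumption from 7 Gbps GDDR5, purely illustrative):

```python
# Rough cost of moving data between the 3.5 GB and 0.5 GB pools, assuming the
# transfer is capped by the isolated 32-bit controller (~28 GB/s at 7 Gbps).
slow_pool_bw_gbs = 28.0   # GB/s for the lone 32-bit MC (assumed)
copy_mb = 512             # evicting or refilling the entire slow pool

copy_time_ms = copy_mb / 1024 / slow_pool_bw_gbs * 1000
print(f"{copy_time_ms:.1f} ms to move 512 MB")  # ~17.9 ms, longer than a 60 Hz frame
```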
 
Just because it is optional doesn't mean that you can't be sued for false advertising if you use it as part of the advertising of your product. For example, it isn't mandatory to label gluten-free food products as gluten-free. It's completely optional. However, if you do market a product as gluten-free because you'll sell more due to that fact, but it isn't gluten-free, you're going to be in a world of trouble.
For the US, food marketing is monitored by the FDA, so there is a legal framework and a governmental agency that sets down a large number of rules for what can be labelled one thing or another.
It's more than just a personal lawsuit or class action if you break those limits. However, the reality is that gluten-free is not "absolutely no molecules of gluten".
There is leeway even then.
http://www.usatoday.com/story/news/...ten-free-labeling-rules-take-effect/13618741/

If a food with a significant amount of gluten is labeled gluten-free, there would very likely be a number of personal-injury or possibly some wrongful death lawsuits, given why such foods came to be products in the first place.

I do not know of a legal framework for whether a discrete board uses its RAM appropriately. The RAM is there, and the device's peak numbers, besides the ROP count, can be hit. The scenarios where it can do so are limited, but they exist.

Similarly, if you market a car (because everyone loves a car analogy, right? :p) as having 8 cylinders but it actually only has 6 (optional disclosure), you're going to be in trouble with various marketing laws.
What about how hybrid cars advertise having Atkinson cycle engines, even though their use of valve timing on otherwise unmodified engines is more of a Miller cycle?

It can be harder to make a successful suit when it comes down to technical distinctions with unclear material impact.
 
I don't remember that.
What about the video engines on anything pre GeForce 8600GT/8500GT/8400GS: a video engine was advertised but it was pretty useless, offering MPEG2 assistance in specific video players and partial assistance in H.264 and VC-1 that maybe hardly anyone ever used. It took years for video decoders to get support in video players, the Flash plugin etc. (DXVA and VDPAU), by which time the early video engines were 100% useless and deprecated.
Those old midrange NV chips do support full H.264 and Flash, and most of VC1. Maybe you're thinking of G80 since it had the same video processing as G7x.
 
Have there been recent tests of PCIe latency?
Some tests a few years back measured transfer latency in microseconds, sometimes over 10 depending on driver and device settings.
Thousands of clocks might be generous by 1-2 orders of magnitude for a bus transfer, and that is in the case of straightforward microbenchmarks rather than a less structured real-world environment.
I don't know what it is on modern systems; it could be interesting to test. Modern GPUs have ~250 cycles of latency tolerance, which at ~1GHz is around 0.25µs before the chip is starved. So you're right: if PCIe transfer latency is at least as high as a microsecond and you have to wait for it, then you stall your chip.
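
Putting those numbers side by side (same rough figures as above, purely illustrative):

```python
# Quick numbers behind the stall argument (illustrative values only).
gpu_clock_hz = 1.0e9             # ~1 GHz core clock
latency_tolerance_cycles = 250   # rough latency-hiding capacity quoted above
pcie_latency_s = 10e-6           # ~10 us round trip from the older measurements

tolerance_us = latency_tolerance_cycles / gpu_clock_hz * 1e6
stall_cycles = pcie_latency_s * gpu_clock_hz - latency_tolerance_cycles

print(f"latency the GPU can hide: {tolerance_us:.2f} us")    # ~0.25 us
print(f"cycles stalled on a PCIe fetch: {stall_cycles:.0f}")  # ~9750 cycles
```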
 
I gather from the article that customers are returning their cards to retailers citing false advertising, and board sellers are not contesting it. At least one contributing reason is that they are not getting any help from Nvidia in explaining what is going on with an arcane ASIC design parameter.

I find it somewhat ironic that, besides not seeing an indication of any official finding, the article is citing regulations allowing the return of a product due to defects--for a SKU that is for chips with defects.
At least from the standpoint of winning a US case of false advertising, this is a weaker point than the more obvious "GPU core" count lie.

I'm not so sure, since NVIDIA only talks about "CUDA cores"—a concept they invented and probably never formally defined. It's a bit like saying their GPUs have billions of thingamajigs.
 
True, the fluff does allow for them to argue their claim is technically correct, which is the best kind of correct--some would say.
 