NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. Putas

    Regular Newcomer

    Joined:
    Nov 7, 2004
    Messages:
    428
    Likes Received:
    90
    I can see why you think the (effective) bus width is between 224 and 256 bits. But each 32-bit MC is connected "only" to a 512 MB memory chip. The problem with the 970 is that all eight of them cannot perform a read or a write at once. They could still claim 256-bit bus performance if they ensure the last partition does the opposite operation. I cannot tell whether that is possible. But if they can handle it, then it also implies all the memory chips will be in use at once... so aren't 4 GB still possible?
     
  2. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    417
    Likes Received:
    105
    7 can be combined into an effective 224-bit bus, which is the best case (ignoring the isolated 32-bit 512MB, i.e. how this card works under 3.5GB usage); 1 cannot. It looks pretty simple to me: this card is not a 256-bit card like the 980, which can combine all 8 for a read or whatever. It's 224-bit at best, and if you want to use the other 512MB you are limited to 32 bits for that portion. You cannot read both at the same time as a 256-bit bus, like the 980 can!?
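    A quick sketch of the arithmetic behind the 224-bit vs 256-bit discussion, assuming the 7 Gbps effective GDDR5 data rate the GTX 970 and 980 are rated for. The constant names and the `peak_bandwidth_gbs` helper are illustrative only; the calculation multiplies bus width in bytes by the per-pin data rate and ignores ROP/L2 limits entirely.

```python
# Peak-bandwidth arithmetic for the GTX 970 memory segments vs the GTX 980.
# Assumption: 7 Gbps effective GDDR5 data rate per pin (the rated speed).
DATA_RATE_GBPS = 7.0   # gigabits per second per pin (assumed)
MC_BITS = 32           # width of each memory controller
CHIP_MB = 512          # each 32-bit controller drives one 512 MB chip


def peak_bandwidth_gbs(bus_bits: int, data_rate_gbps: float = DATA_RATE_GBPS) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_bits / 8 * data_rate_gbps


segments = {
    "GTX 980, all 8 controllers": 8 * MC_BITS,
    "GTX 970 fast segment, 7 controllers": 7 * MC_BITS,
    "GTX 970 slow segment, 1 controller": 1 * MC_BITS,
}

for name, bits in segments.items():
    chips = bits // MC_BITS
    print(f"{name}: {bits}-bit, {chips * CHIP_MB} MB, "
          f"{peak_bandwidth_gbs(bits):.0f} GB/s peak")
```

    The 7:1 ratio between the two 970 segments falls straight out of the controller count; nothing here models whether both segments can be driven in the same cycle, which is exactly what the thread is arguing about.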
     
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    As pointed out by hardware.fr shortly after the GTX 970 introduction (and subsequently picked up by TechReport): even with a GTX 980 memory system and ROPs, the 13 SMs are not sufficient to feed the ROPs at full speed for regular read and write operations. They can only do so for some ROP blending operations. So the peak BW reduction is much more academic than the slow access to the top 0.5GB.

    (http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980)
     
  4. HMBR

    Regular

    Joined:
    Mar 24, 2009
    Messages:
    417
    Likes Received:
    105

    The article you posted was written in October, before they knew the card's real specs. Don't you think the testing is also showing the effect of the disabled stuff we know about now (ROP/L2/memory bandwidth), and not just the SMs' 52 pixels/clock limit? It looks like they assumed the difference was just because of the SM limitation, and were wrong.
     
  5. Putas

    Regular Newcomer

    Joined:
    Nov 7, 2004
    Messages:
    428
    Likes Received:
    90
    If you want all reads or all writes, then 224 bits really is the best case, because two controllers are managed by one ROP partition and there is only one read path and one write path to the crossbar. But if one MC does a read and the other a write, you can pass both through at once and get a 256-bit transfer. It is a big if, seeing how a halved ROP/cache partition drives twice as many controllers, and it needs the workload to behave consistently so as not to kill overall performance. But it is still not disproved.
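    A toy model of the hypothesis above, under the assumption (not confirmed anywhere in this thread) that each L2/ROP slice has exactly one read path and one write path to the crossbar. It uses the disclosed layout of seven slices for eight 32-bit controllers, i.e. six slices with one controller each and one slice driving two. `bits_per_slot` is a hypothetical helper that only counts bits per transfer slot; it ignores timing, bank conflicts and bus turnaround.

```python
# Toy model of the "one read path + one write path per slice" hypothesis.
# Not a description of the real hardware; every 32-bit controller moves
# 32 bits per transfer slot, either a read ("R") or a write ("W").
from typing import List

MC_BITS = 32


def bits_per_slot(slices: List[List[str]]) -> int:
    """slices: one list per L2/ROP slice, holding 'R' or 'W' for each controller it drives."""
    total = 0
    for ops in slices:
        # At most one read and one write can pass through a slice per slot.
        reads = min(ops.count("R"), 1)
        writes = min(ops.count("W"), 1)
        total += (reads + writes) * MC_BITS
    return total


# GTX 970 layout: six slices with one controller, one slice driving two.
all_reads = [["R"] for _ in range(6)] + [["R", "R"]]
mixed_last = [["R"] for _ in range(6)] + [["R", "W"]]
# GTX 980 layout for comparison: eight slices, one controller each.
full_chip = [["R"] for _ in range(8)]

print(bits_per_slot(all_reads), bits_per_slot(mixed_last), bits_per_slot(full_chip))
```

    Under these assumptions, all reads give 224 bits per slot, while pairing a read with a write on the shared slice gives 256, matching the best case argued above.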
     
  6. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    If you look at the ratios pointed out in Damien's email, they'd change from 64/52/64 to 64/52/56. That would still keep the bottleneck in the SMs for operations that are limited by this bottleneck. That last part of my sentence is an important detail, of course. My speculation is that pure BW tests, such as the CUDA program that exposed the issue with the upper 0.5GB, would still be SM-limited?
    Pure ROP operations such as blending and MSAA would not. Whether or not those would matter is a different story: you'd need a very simple shader to reach peak BW to observe it, but how common is that in games these days?

    Edit: another factor is the clock domain in which the various units are located...
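    A minimal sketch of the bottleneck argument: in a pipeline, the slowest stage sets the rate. It assumes the three figures are rasterizer, SM export and ROP throughput in pixels per clock (4 GPCs × 16, 13 SMs × 4, and the ROP count); that reading of Damien's numbers is an assumption. The half-rate case is a hypothetical ROP operation running below full speed, which is where the 64 vs 56 difference would start to show.

```python
# Pixel-pipeline bottleneck: the slowest stage sets the achievable rate.
# Assumption: figures are rasterizer / SM export / ROP rates in pixels per clock.
from typing import Dict, Tuple


def bottleneck(stages: Dict[str, int]) -> Tuple[str, int]:
    """Return the (stage name, rate) of the slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]


as_reviewed = {"raster": 4 * 16, "sm_export": 13 * 4, "rop": 64}
as_corrected = {"raster": 4 * 16, "sm_export": 13 * 4, "rop": 56}
# Hypothetical ROP operation that runs at half rate (e.g. some blend modes).
half_rate_blend = {"raster": 4 * 16, "sm_export": 13 * 4, "rop": 56 // 2}

for label, stages in [("as reviewed", as_reviewed),
                      ("as corrected", as_corrected),
                      ("half-rate ROP op", half_rate_blend)]:
    print(label, bottleneck(stages))
```

    In the first two cases the SMs stay the limit at 52, so the ROP correction is invisible; only once the ROP stage drops below 52 does the ROP count decide the rate.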
     
    #2906 silent_guy, Jan 29, 2015
    Last edited: Jan 29, 2015
  7. fbomber

    Newcomer

    Joined:
    Jun 9, 2004
    Messages:
    156
    Likes Received:
    17
    I agree that no company is obliged to disclose all the architectural details of its products. But if a company chooses to do so (and Nvidia did), it is responsible for the accuracy of the information.
     
  8. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    501
    Likes Received:
    178
    So you want them to stop disclosing information? That's the likely outcome.
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    There needs to be some discussion of the material impact of the inaccuracy, and just how far you want that responsibility to go.
    Nvidia has had differently disabled chips go into the same salvage boards, review guides are not advertising copy (as close as they get these days), and architectural disclosure comes in many forms.

    How should we characterize inaccuracies in ISA documents or architectural whitepapers that frequently get referenced in tech articles but not in advertisements, and how strong is that responsibility?
     
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,128
    Likes Received:
    3,018
    Location:
    Finland
    Review guides can be seen as a marketing tool, since they provide the information to reviewers, who in turn pass it on to customers.

    In most, if not all, of the EU at least, customers are eligible to return their cards over false advertising, and many have already done so with GTX 970s.
    http://www.techpowerup.com/209409/p...s-being-returned-over-memory-controversy.html
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    I gather from the article that customers are returning their cards to retailers citing false advertising, and board sellers are not contesting it. At least one contributing reason is that they are not getting any help from Nvidia in explaining what is going on with an arcane ASIC design parameter.

    I find it somewhat ironic that, besides there being no indication of any official finding, the article cites regulations allowing the return of a product due to defects, for a SKU built from chips with defects.
    At least from the standpoint of winning a US case of false advertising, this is a weaker point than the more obvious "GPU core" count lie.
     
  12. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Is it really possible to treat the 512MiB pool as some kind of memory tier between VRAM and system RAM over PCIe? If the application is allowed to use the full 4GiB, isn't it unaware that some of this memory is 7 times slower than the rest?
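    One way a driver could treat the slow segment as an in-between tier, sketched as a hypothetical first-fit placement policy. This is not NVIDIA's actual driver logic; `Tier`, `place` and the 8 GiB system-memory budget are made up for illustration. Allocations go to the fast 3.5 GiB first, then to the 512 MiB segment, and only then spill to system RAM over PCIe.

```python
# Hypothetical tiered placement policy (not NVIDIA's driver): fill the fast
# segment first, then the slow 512 MiB segment, then system RAM over PCIe.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tier:
    name: str
    capacity_mib: int
    used_mib: int = 0

    def try_alloc(self, size_mib: int) -> bool:
        """Reserve size_mib in this tier if it fits; report success."""
        if self.used_mib + size_mib <= self.capacity_mib:
            self.used_mib += size_mib
            return True
        return False


def place(tiers: List[Tier], size_mib: int) -> Optional[str]:
    """Return the name of the first (fastest) tier that can hold the allocation."""
    for tier in tiers:
        if tier.try_alloc(size_mib):
            return tier.name
    return None  # out of memory everywhere


tiers = [
    Tier("vram_fast", 3584),     # 7 x 512 MiB behind the 224-bit segment
    Tier("vram_slow", 512),      # 1 x 512 MiB behind a single 32-bit controller
    Tier("sysmem_pcie", 8192),   # assumed system-memory budget
]

placements = [place(tiers, size) for size in (2048, 1024, 512, 400, 300)]
print(placements)
```

    The weak point is the ordering: first-fit puts whatever arrives last into the slow tier regardless of how often it is touched, and an application that only sees one flat 4 GiB pool would have no way to say which of its resources are cold.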
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    Have there been recent tests of PCIe latency?
    Some tests a few years back measured transfer latency in microseconds, sometimes over 10, depending on driver and device settings.
    Thousands of clocks might be generous by 1-2 orders of magnitude for a bus transfer, and that is in the case of straightforward microbenchmarks rather than a less structured real-world environment.
     
  14. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    17,082
    Likes Received:
    6,418
    Just because it is optional doesn't mean that you can't be sued for false advertising if you use it as part of the advertising of your product. For example, it isn't mandatory to label gluten-free food products as gluten-free. It's completely optional. However, if you do market it as gluten-free because you'll sell more product that way, and it isn't gluten-free, you're going to be in a world of trouble. Similarly, if you market a car (because everyone loves a car analogy, right? :p) as having 8 cylinders but it actually only has 6 (optional disclosure), you're going to be in trouble under various marketing laws.

    I'm not saying Nvidia should or shouldn't be sued over this. Just that being optionally disclosed does not grant it immunity from having to follow the laws of any given country. It is pretty trivial. But I've seen companies sued successfully over even more trivial things (Nutella in the US, for example.)

    Regards,
    SB
     
  15. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    768
    Likes Received:
    532
    Since you can't read from both pools at once, and copying from one to the other is as slow as the 32-bit MC and most likely blocks the whole card from accessing VRAM, I don't see how that would make an effective cache.
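    Rough numbers for the copy-cost point, assuming the copy is bounded by the single 32-bit controller at its ~28 GB/s peak (7 Gbps × 32 bits / 8, with 7 Gbps being an assumed data rate) and ignoring any contention with the rest of the card. `copy_ms` is an illustrative helper. Even in that best case, filling the whole 512 MiB segment takes longer than one frame at 60 fps.

```python
# Best-case time to move data into the slow segment, assuming the copy is
# limited only by the single 32-bit controller at an assumed 7 Gbps.
SLOW_SEGMENT_GBS = 32 / 8 * 7.0   # 28 GB/s peak
FRAME_MS_60FPS = 1000 / 60        # frame budget at 60 fps


def copy_ms(size_bytes: int, bandwidth_gbs: float = SLOW_SEGMENT_GBS) -> float:
    """Milliseconds to transfer size_bytes at bandwidth_gbs (decimal GB/s)."""
    return size_bytes / (bandwidth_gbs * 1e9) * 1e3


full_slow_segment = 512 * 2**20   # 512 MiB
t = copy_ms(full_slow_segment)
print(f"{t:.1f} ms to fill the slow segment vs "
      f"{FRAME_MS_60FPS:.1f} ms per 60 fps frame")
```

    Real copies would also read from the fast segment and compete with rendering traffic, so this is a lower bound on the cost, not an estimate of it.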
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    For the US, food labeling is regulated by the FDA, so there is a legal framework and a governmental agency that set down a large number of rules for what can go on a label.
    Breaking those limits invites more than just a personal lawsuit or class action. However, the reality is that gluten-free is not "absolutely no molecules of gluten".
    There is leeway even then.
    http://www.usatoday.com/story/news/...ten-free-labeling-rules-take-effect/13618741/

    If a food with a significant amount of gluten is labeled gluten-free, there would very likely be a number of personal-injury or possibly some wrongful death lawsuits, given why such foods came to be products in the first place.

    I do not know of the legal framework for whether a discrete board uses its RAM appropriately. The RAM is there, and the device's peak numbers besides the ROP count can be hit. The scenarios where it can do so are limited, but they exist.

    What about how hybrid cars advertise Atkinson cycle engines, even though their use of valve timing on otherwise unmodified engines is closer to a Miller cycle?

    It can be harder to make a successful suit when it comes down to technical distinctions with unclear material impact.
     
  17. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    8,591
    Likes Received:
    673
    Location:
    WI, USA
    Those old midrange NV chips do support full H.264 and Flash, and most of VC1. Maybe you're thinking of G80 since it had the same video processing as G7x.
     
    #2917 swaaye, Jan 29, 2015
    Last edited: Jan 29, 2015
  18. Rys

    Rys Moderator
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,165
    Likes Received:
    1,466
    Location:
    Beyond3D HQ
    I don't know what it is on modern systems; it could be interesting to test. Modern GPUs have ~250 cycles of latency tolerance, which at ~1GHz is around 0.25µs before the chip is starved. So you're right: if PCIe transfer latency is a microsecond or more and you have to wait for it, you stall your chip.
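    The latency-tolerance arithmetic from this post and 3dilettante's earlier one, as a small sketch. The 250-cycle tolerance, ~1 GHz clock and 1 to 10 µs PCIe latencies are the figures quoted in the thread, not new measurements; `cycles` is an illustrative helper.

```python
# Compare the quoted GPU latency tolerance with quoted PCIe transfer latencies.
CLOCK_GHZ = 1.0          # ~1 GHz core clock (figure from the thread)
TOLERANCE_CYCLES = 250   # ~250 cycles of latency tolerance (figure from the thread)


def cycles(latency_us: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Convert a latency in microseconds to core clock cycles."""
    return latency_us * 1e3 * clock_ghz   # us -> ns, then ns * GHz = cycles


tolerance_us = TOLERANCE_CYCLES / (CLOCK_GHZ * 1e3)
print(f"tolerance window: {tolerance_us:.2f} us")
for lat_us in (1.0, 10.0):
    c = cycles(lat_us)
    print(f"{lat_us:>4} us = {c:,.0f} cycles, "
          f"{c / TOLERANCE_CYCLES:.0f}x the tolerance window")
```

    At these figures a 1 µs PCIe round trip is already 4x what the chip can hide, and the 10 µs cases measured in older tests are 40x.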
     
  19. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,515
    Likes Received:
    934
    I'm not so sure, since NVIDIA only talks about "CUDA cores"—a concept they invented and probably never formally defined. It's a bit like saying their GPUs have billions of thingamajigs.
     
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    True, the fluff does allow for them to argue their claim is technically correct, which is the best kind of correct--some would say.
     