NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    Care to run this one as well? It's a slightly modified version of the slightly modified original. :) This one will report the amount of free memory after each allocation. I still think that parts of the later-allocated buffers end up in pinned memory.
     

    Attached Files:

    • Rec.zip
      File size:
      7.5 KB
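A test like this can be pictured with a toy model. The sketch below rests on hypothetical assumptions. It assumes the driver fills a 3.5 GiB fast segment first and then a 0.5 GiB slow segment. It also assumes that anything allocated beyond that lands in pinned system memory, which is the suspicion voiced above. `SimulatedDevice` and `run_test` are made-up names, and no real driver or API is modeled.

```python
# Toy model of a chunked-allocation test: allocate fixed-size buffers,
# record free device memory after each one, and label where each would live.
# All capacities and the linear placement policy are ASSUMPTIONS for
# illustration (GTX 970-like 3.5 GiB + 0.5 GiB), not measured behaviour.
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class SimulatedDevice:
    fast_bytes: int = 3584 * MIB  # assumed fast segment (3.5 GiB)
    slow_bytes: int = 512 * MIB   # assumed slow segment (0.5 GiB)
    used: int = 0

    @property
    def total(self) -> int:
        return self.fast_bytes + self.slow_bytes

    def free(self) -> int:
        return self.total - self.used

    def allocate(self, size: int) -> str:
        """Place a buffer linearly; spill to pinned host memory when full."""
        if self.free() < size:
            return "pinned host memory"  # hypothesis from the post above
        start = self.used
        self.used += size
        if start + size <= self.fast_bytes:
            return "fast segment"
        if start >= self.fast_bytes:
            return "slow segment"
        return "straddles both segments"


def run_test(chunk_bytes: int, count: int) -> list[tuple[int, str, int]]:
    """Return (chunk index, placement, free MiB after allocation) per chunk."""
    dev = SimulatedDevice()
    report = []
    for i in range(count):
        where = dev.allocate(chunk_bytes)
        report.append((i, where, dev.free() // MIB))
    return report


if __name__ == "__main__":
    for i, where, free_mib in run_test(128 * MIB, 34):
        print(f"chunk {i:2d}: {where:<24} free after: {free_mib} MiB")
```

With 128 MiB chunks, the model puts chunks 0-27 in the fast segment and chunks 28-31 in the slow one. Chunks 32 onward spill to host memory while reported free memory stays at 0. A real driver may place buffers quite differently.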
  2. kukreknecmi

    Newcomer

    Joined:
    Nov 14, 2013
    Messages:
    7
    Likes Received:
    0
    By crippling the 970, they may have reduced the available crossbar connections, so that after a certain amount of RAM is consumed/addressed, the crossbar(s) don't have enough free connections to link the SMMs with L2, causing stalls/delays.
     
  3. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    This is more of a mystery to me: since L2 cache is tied to the MCs (and ROPs), how can they disable L2 but not the ROPs/MCs? It's true that the GTX 970 has always shown some odd results compared to the GTX 980 in some synthetic tests, which is likely related to that (check hardware.fr's fillrate results, for instance), but I still fail to understand how this actually works. I guess the nice diagrams need some more detail to make sense there...
     
  4. elect

    Newcomer

    Joined:
    Mar 21, 2012
    Messages:
    50
    Likes Received:
    5
    I never understood why bandwidth and memory capacity are linked to each other. Can somebody explain it to me?
     
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Bus width is tied to the number of memory chips, because a typical GDDR5 chip has a bus width of 32 bits. So a 512-bit bus would lead to 512/32 = 16 memory chips, while a 384-bit bus would lead to 12 chips.

    Memory chips usually (always?) have a capacity that is a power of two, e.g. 256MB or 512MB. So if the number of chips is a power of two, you'll have a memory capacity that's a power of two as well. And since 32 = 2^5, it really depends on whether the bus width is a power of two.

    Some GPUs support mixed capacities, e.g. with 12 chips (384-bit bus) you might have 8 256MB chips and 4 512MB chips, for a total of 8×256 + 4×512 = 4096MB or 4GiB, so it's not a hard rule. But memory controllers usually can't handle this optimally, so it may lead to reduced effective memory throughput.
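The arithmetic above can be checked with a short sketch. It assumes 32-bit-wide GDDR5 chips as stated in the post, and the helper names are made up for illustration. It derives the chip count from the bus width, checks whether the resulting capacity is a power of two, and totals a mixed-capacity configuration.

```python
# Relating GDDR5 bus width to chip count and total capacity.
# Assumes every chip has a 32-bit interface, as described above.
CHIP_WIDTH_BITS = 32


def chip_count(bus_width_bits: int) -> int:
    """Number of 32-bit GDDR5 chips needed to populate a bus."""
    if bus_width_bits % CHIP_WIDTH_BITS:
        raise ValueError("bus width must be a multiple of 32 bits")
    return bus_width_bits // CHIP_WIDTH_BITS


def total_capacity_mb(chip_sizes_mb: list[int]) -> int:
    """Total capacity when chips may have mixed sizes."""
    return sum(chip_sizes_mb)


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


if __name__ == "__main__":
    for width in (256, 384, 512):
        n = chip_count(width)
        cap = n * 512  # uniform 512 MB chips
        print(f"{width}-bit: {n} chips, {cap} MB, power of two: {is_power_of_two(cap)}")
    mixed = [256] * 8 + [512] * 4  # the 384-bit mixed example above
    print("mixed 384-bit:", total_capacity_mb(mixed), "MB")
```

With uniform 512 MB chips, the 256-bit and 512-bit buses land on power-of-two totals (4096 MB and 8192 MB). The 384-bit bus gives 6144 MB. The mixed 384-bit layout reaches 4096 MB only by combining two chip sizes.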
     
  6. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
  7. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    @fellix
    That explanation doesn't jibe with what NV has said officially. Also, the guy doesn't seem to be an NV employee, so his credentials are unknown to me - why should I trust what he says? Has he really any idea at all what he's talking about? ;)
     
  8. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    My GTX970 is fantastic. Does this mean that once NVIDIA fixes this, it will get even better :?:
     
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
  10. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    AnandTech also explores the memory of the GTX 970.

    I found the following note on memory bandwidth particularly interesting:

     
  11. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    So, does that mean the access to the smaller segment is essentially "uncached" (or partially cached)?
     
  12. Pressure

    Veteran

    Joined:
    Mar 30, 2004
    Messages:
    1,655
    Likes Received:
    593
    Hello GeForce FX (8x1 or 4x2), is that you?
     
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Now this makes sense (AnandTech's article is pretty good too). So L2 cache is still directly tied to ROPs, but instead of 4x16 ROPs, as was initially believed, it is really still octo-ROPs like previous generations (8x8). And of course the new ability to have only one L2/ROP partition active per 2x32-bit MC channel is pretty interesting.
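The 8x8 layout and its cut-down variant can be sketched numerically. The model assumes 4 memory controllers, each driving 2 x 32-bit channels, with one L2 slice plus an 8-ROP block per channel. The 256 KB-per-slice figure is inferred from GM204's 2 MB total L2 rather than quoted from NVIDIA. `gm204_config` is a hypothetical helper.

```python
# Sketch of the L2/ROP partition layout discussed above (hypothetical names).
# Assumptions: 4 memory controllers x 2 channels x 32 bits, one L2 slice plus
# 8 ROPs per channel; 256 KB per slice is inferred from a 2 MB total L2.
MEMORY_CONTROLLERS = 4
CHANNELS_PER_MC = 2
CHANNEL_BITS = 32
ROPS_PER_PARTITION = 8
L2_KB_PER_PARTITION = 256


def gm204_config(disabled_partitions: int) -> dict[str, int]:
    """Active resources when some L2/ROP partitions are fused off.

    The memory channels stay populated, so the bus width does not shrink;
    only the ROP count and L2 capacity do.
    """
    partitions = MEMORY_CONTROLLERS * CHANNELS_PER_MC
    active = partitions - disabled_partitions
    return {
        "partitions": active,
        "rops": active * ROPS_PER_PARTITION,
        "l2_kb": active * L2_KB_PER_PARTITION,
        "bus_bits": partitions * CHANNEL_BITS,
    }


if __name__ == "__main__":
    print("full GM204:", gm204_config(0))
    print("one partition off:", gm204_config(1))
```

Under these assumptions, disabling one partition leaves 56 ROPs and 1792 KB of L2 while the bus stays 256-bit. That is the combination being discussed for the GTX 970.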
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The AnandTech article states that the 7th crossbar port and its L2/ROP slice can access both memory controllers. That seems to indicate the ability to cache from the 8th channel.
    The way accesses to the 8th channel block the rest of the crossbar is interesting.
    Some kind of hash would be used to stripe addresses across the clients, but it's as if some portion of the pipeline is wedded to the assumption that the memory partitions are an all-or-nothing affair.

    I wonder what would happen if the 7th channel were not included in the high-performance stride. If there's a routing concern at the partition level, perhaps the serialization penalty goes away if a strided access doesn't run into an ambiguous situation where a request needs to be interpreted with one hash function versus the other.
    There would be a larger hit to capacity (an unintended consequence of being able to get away with a narrower bus is that yield measures become more expensive in capacity and bandwidth), and possibly the ROPs in that partition may not be flexible enough to coordinate with the main stride.

    It's curious as well because AMD's Tahiti added a secondary crossbar between ROPs and memory channels (gone with Hawaii, if I recall correctly), but it may be the case that it was easier because those did not tie so tightly to the L2, and the assumed 1:1 link between L2 and controller was not broken.
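The striping idea can be illustrated with a toy address-to-channel map for a split 3.5 GiB + 0.5 GiB pool. It rests on two assumptions that are not disclosed figures: a hypothetical 1 KiB interleave granularity, and a plain modulo standing in for the real (likely XOR-based) address hash. The two branches loosely mirror the "one hash function versus the other" situation described above.

```python
# Toy address-to-channel mapping for a split 3.5 GiB + 0.5 GiB pool.
# ASSUMPTIONS: 1 KiB interleave granularity and a modulo "hash"; the real
# hardware hash and stride are not public.
GIB = 1024 ** 3
STRIDE = 1024               # hypothetical interleave granularity, bytes
FAST_CHANNELS = 7           # channels 0..6 back the 3.5 GiB pool
FAST_LIMIT = 7 * GIB // 2   # 3.5 GiB
SLOW_CHANNEL = 7            # the 8th channel backs the last 0.5 GiB


def channel_for(addr: int) -> int:
    """Return the memory channel that would service a byte address."""
    if addr < FAST_LIMIT:
        return (addr // STRIDE) % FAST_CHANNELS  # stripe across 7 channels
    return SLOW_CHANNEL                           # no striping at all


def channels_touched(start: int, length: int) -> set[int]:
    """Channels hit by a sequential sweep over [start, start + length)."""
    return {channel_for(a) for a in range(start, start + length, STRIDE)}


if __name__ == "__main__":
    sweep = 64 * 1024
    print("fast pool sweep:", sorted(channels_touched(0, sweep)))
    print("slow pool sweep:", sorted(channels_touched(FAST_LIMIT, sweep)))
```

A sequential sweep in the fast pool spreads over seven channels, while the same sweep above 3.5 GiB hits only one. The model ignores the crossbar blocking described in the post, which would make the slow region behave worse still.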
     
  15. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Looks like the disabling of an L2 partition is the reason for the split memory pools. That would mean the particular L2 partition in this case can't be shared between two channels with the short link interface at the same time, so there goes the XOR access switch.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    So, what it boils down to is that NVidia engineering found a way to make a <256-bit bus chip, after de-activation, that could be advertised as a 256-bit chip. Pretty useful for the marketing department, eh? "Same bus width as the GTX980."
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    3GB is straddling an awkward threshold for available memory capacity before resorting to bus transfers, so there is a decent benefit to keeping the extra 25% of capacity (1GB of 4GB), even if half of that remainder is significantly behind primary bandwidth.

    It does seem like tiptoeing around the limits of the secondary link is a fair amount of hassle for what appears to be, of all things, a cache yield recovery measure. It might not have so much benefit if the bus widths hadn't been so narrow and kept the card right at the cusp of capacity issues.
    A more flexible memory system may have been deferred for after Nvidia starts transitioning to newer memory types.
     
  18. nomedo

    Newcomer

    Joined:
    Jan 2, 2007
    Messages:
    5
    Likes Received:
    0
    That AnandTech article was good and seems to have answered most questions.
    But what I take away from this is that Nvidia has a lot of options for releasing new products based on GM204 in the future.
    For example, a GTX970Ti and a GTX960Ti:

    GTX980: 2048 cores, 64 ROPs, 256-bit, 224 GB/s, 4 GB, $549
    GTX970Ti: 1792 cores, 64 ROPs, 256-bit, 224 GB/s, 4 GB, $399-449
    GTX970: 1664 cores, 56 ROPs, 224+32-bit, 196+28 GB/s, 3.5+0.5 GB, $329
    GTX960Ti: 1536 cores, 56 ROPs, 224-bit, 196 GB/s, 3.5 GB, $249-279
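The bandwidth column above follows from bus width times per-pin data rate. This sketch assumes the 7 Gbps effective GDDR5 rate used on the GTX 980/970. The GTX970Ti and GTX960Ti entries are the poster's hypothetical SKUs, and `bandwidth_gbs` is an illustrative helper.

```python
# Peak bandwidth for the configurations listed above.
# Assumes a 7 Gbps effective GDDR5 data rate per pin (GTX 980/970 memory).
DATA_RATE_GBPS = 7.0


def bandwidth_gbs(bus_bits: int, rate_gbps: float = DATA_RATE_GBPS) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_bits / 8 * rate_gbps


SKUS = {
    "GTX980": [256],
    "GTX970Ti": [256],      # hypothetical SKU from the post
    "GTX970": [224, 32],    # 3.5 GB segment + 0.5 GB segment
    "GTX960Ti": [224],      # hypothetical SKU from the post
}

if __name__ == "__main__":
    for name, segments in SKUS.items():
        parts = " + ".join(f"{bandwidth_gbs(bits):g}" for bits in segments)
        print(f"{name}: {parts} GB/s")
```

The GTX970 line prints as 196 + 28 GB/s rather than a single 224 GB/s figure. The posts above suggest that accesses to the 8th channel block the rest of the crossbar, so the two segments should not simply be added.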
     
  19. revan

    Newcomer

    Joined:
    Nov 9, 2007
    Messages:
    55
    Likes Received:
    18
    Location:
    look in the sunrise ..will find me
    Wonderful! They (NVidia) pulled off such a marvel with the GTX 970's memory subsystem from a tech point of view! But what about the fact that they are damn liars (and persisted with the lies for months!), falsely advertising and misleading their own customers and even the IT press? Isn't that marvelous too?
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    What about a chip where half the L2s are turned off?... Still 256-bit...

    Why is L2 so prone to failure that a SKU arises with one broken? Surely L2 should be easy to keep working. e.g. with Bulldozer, a 6-core processor has the full 8MB of cache.

    I wonder if this is NVidia's strategy to keep people/AIBs from overclocking 970 so that it exceeds 980? Hobble an L2 arbitrarily (it isn't actually broken) and the chips will always be slower than 980...
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.