Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Discussion in 'Console Technology' started by TheAlSpark, Dec 31, 2018.

Thread Status:
Not open for further replies.
  1. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    It’s something AMD has talked about before too.

    https://gpuopen.com/texel-shading/
     
    chris1515, pharma and milk like this.
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    My recollection of kkrieger's process was that it ran through the asset-creation steps during the game's load time. The constructed game would then have much more than 96KB in RAM, which allowed it to reuse those results in successive frames. An algorithm fetching instructions, inputs, and looping over intermediate buffers is going to generate more accesses than it would take to later read back those results, and the load time seemed significant enough that the larger serial component of running through a compressed list of steps is unlikely to be hidden if done on the fly. It would be a net loss if those results were discarded almost immediately and regenerated in the next frame.

    The effectiveness of a top-level cache would depend on how many accesses hit the top-level versus the acceleration structure in the bottom-level. It seems like the majority of accesses in decently complex objects would be in the bottom level.
    TLBs and page walker buffers tend to store a limited set of most recently used entries. The higher levels tend to change less frequently than the lower ones, and the buffers can leverage temporal locality to save misses to cache or memory.
    A large table of top-level instances may still be too big for the storage available in the L1 or local buffers of an RT core, but if it can get some level of spatial or temporal locality, a buffer of the current object and some of the most recently traversed BVH nodes could be applied to multiple rays.
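    The TLB-style reuse described above can be illustrated with a toy MRU buffer. Everything here is made up for illustration — the capacity, node names, and traversal paths are assumptions, not how any real RT core works — but it shows why two coherent rays that share the upper part of the BVH save accesses to the lower memory levels.

```python
from collections import OrderedDict

class NodeBuffer:
    """Toy MRU buffer of recently traversed BVH nodes (hypothetical)."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, node_id):
        if node_id in self.entries:
            self.hits += 1
            self.entries.move_to_end(node_id)  # mark most recently used
            return True
        self.misses += 1
        self.entries[node_id] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return False

buf = NodeBuffer()
# Two coherent rays share the upper part of their traversal path,
# so the second ray hits the buffered top-level nodes.
for node in ["root", "A", "A1", "leaf3"]:
    buf.access(node)
for node in ["root", "A", "A2", "leaf7"]:
    buf.access(node)
print(buf.hits, buf.misses)  # 2 hits (root, A), 6 misses
```

    As the post notes, the hit rate collapses for divergent rays: if the second ray had entered a different top-level instance entirely, it would have missed on every node.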

    RT does increase shading load and has a compute burden from BVH construction/update as well. Bandwidth use can increase, though it's apparently early days in finding out how games in general behave with it. There may be future optimizations beyond just conserving raw bandwidth, such as finding better ways of controlling the divergence of accesses. Disjoint accesses could potentially lead to stalls in the RT hardware or memory subsystem, which would in raw bandwidth terms look deceptively low.

    Perhaps at some point, but the most recent products and announcements still had endurance falling short of SRAM and DRAM, which would make it less viable for on-die caches operating in the GHz range. Another trade-off for removing the standby current of SRAM, besides endurance, is write energy, which has historically been significantly higher.

    What access granularity problem would there be? HBM has 8 independent 128-bit channels with a burst length of 2, so 256 bits per burst. GDDR5 has a 32-bit channel and burst length of 8, so 256 bits as well.
    GDDR5X was the one that doubled prefetch on a 32-bit channel and got bursts of 512 bits.
    One of the reasons cited for GDDR6's transition to two channels was to stop the increase in the width of the internal array accesses, so GDDR6 drops back down to 256 bits per access.

    The other kind of granularity is the page width of the DRAM, which is usually around 2KB. GDDR5X in some configurations also doubles this, whereas HBM's pseudo-channel mode can actually halve the page width.
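    A quick sanity check of the burst arithmetic above — bits per burst is just channel width times burst length, using the figures cited in the post:

```python
# Bits per burst = channel width x burst length, per channel,
# using the organizations cited above.
def bits_per_burst(channel_width_bits, burst_length):
    return channel_width_bits * burst_length

assert bits_per_burst(128, 2) == 256   # HBM: 128-bit channel, BL2
assert bits_per_burst(32, 8) == 256    # GDDR5: 32-bit channel, BL8
assert bits_per_burst(32, 16) == 512   # GDDR5X: doubled prefetch
assert bits_per_burst(16, 16) == 256   # GDDR6: two 16-bit channels, BL16
```

    The GDDR6 line shows the point about the two-channel split: halving the channel width while doubling the burst length returns the per-access size to 256 bits.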
     
  3. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    https://www.eetimes.com/document.asp?doc_id=1332180
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    My interpretation is that it's not access granularity but granularity in how readily the designers could scale bandwidth up or down. HBM was expensive and could only be scaled up at the granularity of full stacks. If the bandwidth was in excess of the console's need, they'd be paying for the integration and full stack anyway. If insufficient, their next step could only be multiplying the expense by an integer factor.
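    The coarse scaling step can be put in one line of arithmetic. The 256 GB/s per-stack figure below is an assumption (roughly HBM2 at 2.0 Gbit/s per pin), purely to illustrate the integer-multiple problem:

```python
import math

# Sketch of the stack-granularity problem: HBM bandwidth (and its
# integration cost) can only be added one full stack at a time.
# The per-stack figure is an assumption, not a spec value.
def stacks_needed(target_gbs, per_stack_gbs=256):
    return math.ceil(target_gbs / per_stack_gbs)

print(stacks_needed(260))  # a console needing 4 GB/s more pays for a 2nd stack
print(stacks_needed(512))  # exactly two stacks
```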
     
    anexanhume likes this.
  5. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    That seems a fair point in that context.
     
  6. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    man, this thread is circular :runaway:

    I'm going to quote myself from like March :-D
    I am probably less bullish on HBM for the next consoles, but I still think the logic applies.
     
  7. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    https://www.extremetech.com/computi...double-hbm2-manufacturing-fail-to-meet-demand

    HBM density is going to be subject to the same costs that GDDR6 is, as they're fundamentally the same basic cells. Complicating things by making it a local cache rather than adding to overall capacity doesn't seem to be the way to go.
     
  8. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    It's not meant to be a cache. You could choose to expose it as developer-managed, or not expose it directly and have the memory system manage it (just like on the latest Xeon Phi), and make that option developer-controlled. The point is to keep the GPU's write-heavy addresses in the place with the most bandwidth, and to keep the cost of that high bandwidth in check by limiting its size.
     
  9. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    This would be the original Xbox One approach. The capacity is certainly higher, so it would eliminate that developer complaint at least.
     
    egoless likes this.
  10. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    Actually, the revised JEDEC doc has now added a clear designation of HBM1 and HBM2. The front page also changed the document title to "High Bandwidth Memory DRAM (HBM1, HBM2)". Further down it explains that HBM1 covers devices supporting legacy mode only (full 128-bit channel, BL2), while HBM2 covers devices supporting the split pseudo-channel mode (2x 64-bit, BL4, at twice the clock). Maybe HBM3 will be a simple update with four pseudo channels: 32-bit, burst of 8, and a doubled clock again.
     
    BRiT and anexanhume like this.
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Interesting, perhaps my last review of the specs fell in a window between the addition of HBM2 and a later clarification. From what I recall, the separation hadn't been made at the time.
    One of the scaling challenges predicted back when Nvidia first started talking about future memory standards was the cost of DRAM array access. Post-HBM2 bandwidths came with corresponding increases in the power consumed by row and column activation, to the point that the arrays themselves would consume more power than the interfaces HBM replaced.

    HBM2's row access consumption was lower, which may have been a nod to pseudo-channel mode or the array organization that permits it. There wasn't much clarity on what was expected to be done for later HBM versions. A theoretical HBMx standard that tried to reduce array power would still be pulling prohibitive amounts of power if bandwidth scaled as wanted.
     
  12. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Vega 64 is down to $400 at retail and has 8GB of HBM2. Is it approaching economic viability for consoles, supply issues aside?
     
  13. metacore

    Newcomer

    Joined:
    Sep 30, 2011
    Messages:
    111
    Likes Received:
    86
    If Samsung said that even doubling production capacity would still leave shortages for anyone interested, and meanwhile Nvidia opted to use GDDR6 even in $1200 cards... ehh.
     
  14. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    By the time consoles launch, that Samsung quote could be over 2 years old. Taken in conjunction with high pricing it helps make the case, but it becomes a little weaker if viewed in isolation.

    Nvidia went with GDDR6 because they likely wanted a common memory controller design across their consumer products. Also, GDDR6 on a 352-bit interface can match the bandwidth of two HBM stacks or fewer. Once higher-speed modules are available, it can approach 3-stack bandwidth levels. It's definitely more cost effective.
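    A rough peak-bandwidth comparison behind that claim. The per-pin rates are assumptions (14 Gbit/s for launch GDDR6, 16 Gbit/s for later parts, HBM2 at 2.0 Gbit/s per pin over a 1024-bit stack interface), not figures from the post:

```python
# Peak bandwidth in GB/s = bus width (bits) x per-pin rate (Gbit/s) / 8.
# Per-pin rates below are assumptions for illustration.
def peak_gbs(bus_width_bits, pin_rate_gbps):
    return bus_width_bits * pin_rate_gbps / 8

gddr6_now   = peak_gbs(352, 14)    # 616 GB/s on a 352-bit bus
gddr6_later = peak_gbs(352, 16)    # 704 GB/s with faster modules
hbm2_stack  = peak_gbs(1024, 2.0)  # 256 GB/s per stack
print(gddr6_now, 2 * hbm2_stack)   # 616.0 vs 512.0 (two stacks)
print(gddr6_later, 3 * hbm2_stack) # 704.0 vs 768.0 (three stacks)
```

    Under these assumed rates, 352-bit GDDR6 already beats two stacks at launch speeds and gets within sight of three stacks with faster modules.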
     
  15. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    Yeah, it's very recent: from the Nov 2018 revision B.

    It's unclear whether HBM1 has been or is being used in other products, like custom networking silicon or high-end FPGA stuff.

    Some companies making controller IP have unofficially labeled compatibility with the latest revision as HBM2E. Before that it was just a list of supported features. Hynix calls it HBM2, and only Samsung is being a dick by calling theirs a trademarked name like Aquabolt.

    I suppose the rise in array power would be somewhat countered by HBM3 being on 7nm (HBM2 is 14nm-ish?). Do DRAM arrays scale that way?

    For the PHY, there was Rambus saying HBM3 would use 3D stacking instead of 2.5D. There's no data on the power savings of 3D integration, but maybe it could keep array power the dominant figure.
     
    #115 MrFox, Jan 8, 2019
    Last edited: Jan 8, 2019
  16. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    This post is a bit old at this point, but it explains DRAM scaling fairly well, as 1xnm and 1ynm are rather inscrutable.

    https://semiengineering.com/whats-next-for-dram/
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    DRAM processes don't use the same metrics as logic ones, and generally aren't marketed that clearly. The most recent notable transition was from roughly above 20 to some number somewhat below it. DRAM isn't as concerned with the sort of gate contact pitch measurements logic nodes used to be bound by, so there's not much clarity on how to compare the marketing name of a logic node to the marketing name of a DRAM node.

    If there were an apples to apples comparison, it seems like the consensus is that DRAM is stalling above 10nm.
     
  18. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,158
    Likes Received:
    7,966
    Location:
    Barcelona Spain
  19. DieH@rd

    Legend

    Joined:
    Sep 20, 2006
    Messages:
    6,387
    Likes Received:
    2,411
    Console games that fully take advantage of 16 threads of Zen 2 [ok, 2 will be left for the OS] just to produce 30fps gameplay will lead to massive upgrade waves on PC.

    I miss the times of old, when Crytek had the balls to create a game that actually took unreleased hardware into consideration [and the game still looked phenomenal on medium settings]. Even before that game was released, there were reports that gamers had invested $1B in upgrades just for that game.
     
    egoless and McHuj like this.
  20. JPT

    JPT
    Veteran

    Joined:
    Apr 15, 2007
    Messages:
    2,505
    Likes Received:
    943
    Location:
    Oslo, Norway
    But did Crytek earn any extra money by doing it? Or was it just Crytek tooting their own horn?
     
    milk likes this.