The ESRAM in Durango as a possible performance aid

Discussion in 'Console Technology' started by Rangers, May 4, 2013.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The question is related to the particulars of how those links work in relation to the eSRAM, which doesn't exist in those documents.
    Some of the glaring performance deficiencies should hopefully be improved since Llano.
     
  2. loekf

    Regular

    Joined:
    Jun 29, 2003
    Messages:
    617
    Likes Received:
    65
    Location:
    Nijmegen, The Netherlands
Some sites write 102 Gbit/s and not 102 Gbyte/s.

It's the same old discussion about whether "b" means bit or byte. I'm used to a capital "B" for byte.

Maybe I'm stating the obvious (an "open door", as we say in Dutch), but 102 Gbit/s divided by 64 bits = 1.6 GHz... close to
the actual clock speed of the CPU cores...

(assuming there's a 64-bit on-chip bus going to an array of single-port SRAM)

BTW... 32 MB of SRAM in 28 nm is, according to some info on the web:

26 mm2 without routing overhead, approx. 36 mm2 with routing overhead (75% utilisation). So that's approx. 10% of the total die?
I would assume there's some repair overhead and test logic to compensate for yield losses, so maybe closer to 40 mm2 is more realistic.
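Those two back-of-envelope calculations can be sketched in a few lines (every figure here is this thread's assumption, not a confirmed spec):

```python
# If the quoted bandwidth is really 102.4 Gbit/s over a 64-bit
# single-port bus, the implied clock matches the 1.6 GHz CPU cores:
bandwidth_gbit_s = 102.4
bus_width_bits = 64
implied_clock_ghz = bandwidth_gbit_s / bus_width_bits  # -> 1.6 GHz

# 32 MB of SRAM in 28 nm, assuming ~26 mm^2 of raw cell array and
# ~75% placement utilisation to account for routing:
raw_cell_area_mm2 = 26.0
utilisation = 0.75
routed_area_mm2 = raw_cell_area_mm2 / utilisation  # ~34.7 mm^2

print(f"implied bus clock: {implied_clock_ghz:.1f} GHz")
print(f"routed SRAM area: {routed_area_mm2:.1f} mm^2")
```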
     
    #62 loekf, May 23, 2013
    Last edited by a moderator: May 23, 2013
  3. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
The 102 GB/s write bandwidth is likely determined by the 16 ROPs writing a maximum of 8 bytes per cycle at 800 MHz.

Both ROPs and texture units can read data, so the requirement for read bandwidth is always going to be larger.
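That estimate works out as follows (a sketch, assuming the leaked 16-ROP, 800 MHz figures are accurate):

```python
# 16 ROPs, each writing up to 8 bytes per cycle, at an 800 MHz GPU clock.
rops = 16
bytes_per_rop_per_cycle = 8
gpu_clock_hz = 800e6

write_bandwidth_gb_s = rops * bytes_per_rop_per_cycle * gpu_clock_hz / 1e9
print(write_bandwidth_gb_s)  # 102.4
```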

    Cheers
     
  4. loekf

    Regular

    Joined:
    Jun 29, 2003
    Messages:
    617
    Likes Received:
    65
    Location:
    Nijmegen, The Netherlands
Are you implying that the SRAM can only be accessed from the GPU side?

Thought it was both, but the numbers do match up:

GPU: 800 MHz x 8 bytes x 16 ROPs = ~102 GB/s

CPU: 64-bit data bus x 1.6 GHz (or a 64-bit read bus plus a 64-bit write bus at 800 MHz) = ~102 Gbit/s
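Putting the two calculations side by side makes the unit ambiguity from earlier in the thread explicit (figures as assumed here, not confirmed specs):

```python
# The GPU figure is in gigaBYTES/s, the CPU bus figure in gigaBITS/s --
# both come out as "102", which is exactly the Gbit-vs-Gbyte confusion.
gpu_bw_GB_s = 800e6 * 8 * 16 / 1e9   # 102.4 GB/s from the ROPs
cpu_bus_Gbit_s = 64 * 1.6            # 102.4 Gbit/s on a 64-bit bus
cpu_bus_GB_s = cpu_bus_Gbit_s / 8    # i.e. only 12.8 GB/s in bytes

print(gpu_bw_GB_s, cpu_bus_Gbit_s, cpu_bus_GB_s)
```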
     
  5. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
I don't remember reading that the CPU has a data bus that goes to the eSRAM; it looks like it has to go through the north bridge, and each CPU module (there are 2 modules) has a max speed of 20.8 GB/s read/write. Or are you talking about something else? I can't tell.
     
    #65 Betanumerical, May 23, 2013
    Last edited by a moderator: May 23, 2013
  6. Strange

    Veteran

    Joined:
    May 16, 2007
    Messages:
    1,698
    Likes Received:
    428
    Location:
    Somewhere out there
The problem is: how low is its latency compared to the DDR3, and how much of a scratchpad is it?
Does the current understanding of the eSRAM fall safely under sebbbi's criteria for bandwidth savings?
     
  7. XpiderMX

    Veteran

    Joined:
    Mar 14, 2012
    Messages:
    1,768
    Likes Received:
    0
    From AnandTech:

     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Does XBox One have ROPs?

The EDRAM in the Xbox 360 was there to support fixed-function hardware, i.e. the ROPs.

Since it's quite possible to write a pixel shader that doesn't output pixels (instead it simply reads/writes data from/to memory), it's possible this architecture is ROP-less.

    This would be so cool.
     
  9. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    Yes, 8 of them, for a peak bandwidth of 102GB/s.
     
  10. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    isn't it 16?
     
  11. mosen

    Regular

    Joined:
    Mar 30, 2013
    Messages:
    452
    Likes Received:
    152

    4 DB and 4 CB

    http://www.vgleaks.com/durango-gpu-2/3/
     
  12. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    Yeah it is, my bad. I remembered 8 for some reason.
     
  13. Urian

    Regular

    Joined:
    Aug 23, 2003
    Messages:
    622
    Likes Received:
    55
    Well I suppose that we are talking about 4 RBE units then
     
  14. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
  15. loekf

    Regular

    Joined:
    Jun 29, 2003
    Messages:
    617
    Likes Received:
    65
    Location:
    Nijmegen, The Netherlands
Hmm... any idea why they didn't go for a kind of system cache? A kind of L3 with configurable consumers, sitting between the CPUs' L2s, the GPU, and DRAM.
     
  16. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Takes a lot more space.
     
  17. Mandrion

    Newcomer

    Joined:
    May 15, 2013
    Messages:
    14
    Likes Received:
    0
The whole system-reservation article from Kotaku made me think about the memory bandwidth.

Doesn't the system also need to reserve some bandwidth from the main RAM?
I'd guess the same share as for the GPU, leaving roughly 90% for the game.

So I think 60 GB/s for the game seems reasonable?

Edit: argh, wrong thread...
     
    #77 Mandrion, May 27, 2013
    Last edited by a moderator: May 27, 2013
  18. Homeles

    Newcomer

    Joined:
    May 25, 2012
    Messages:
    234
    Likes Received:
    0
    Say we have 100 GB/s eDRAM with 1/10th the latency (say 1ns vs 10ns) of 100 GB/s GDDR5.

    Where can low latency be useful with both frame rendering and with GPGPU? Could someone cite specific examples? I.e., what tasks would benefit?

I just can't imagine that the Xbox One's memory subsystem doesn't outperform the PS4's GDDR5 in at least one metric. With graphics, bandwidth is far and away the more important resource to have, but there has to be some significant scenario where the 32MB of static eDRAM holds an advantage. Is there really no reason for Microsoft to choose DDR3 + eDRAM over GDDR5 other than cost and power/heat?

    Disclaimer: from here on out I ramble a bit. If you are able to answer the questions above, it'd be much appreciated. The following isn't as important to me.
    ________________________________________

Microsoft may be assuming that the cost of 32MB of static eDRAM will decrease significantly as it moves to 20nm and 14nm in the coming years. There has been an awful lot of noise about 20nm and 14nm offering little improvement, no improvement, or even regression in cost per transistor compared to 28nm. I've heard that 14nm is of particular concern, because R&D and wafer costs will rise to the point that only the very largest companies will be able to profit, leading to some crazy semiconductor mass-extinction event. Perhaps SRAM scales well enough to overcome the relatively higher cost? Nvidia's claims of cost regression on TSMC's 20nm process are presumably based on average GPU transistor cost, while SRAM should fare better.

    GloFo's a bit of an oddball, though. They're moving to gate last with 20nm, which may help them (or their customers) in the cost department. Am I wrong on this? I understand gate last means lower density, but the higher yield would mean lower cost. I suppose it could end up meaning higher cost if the yield improvement is not large enough to counteract the density hit, or it could result in no ground being made in the cost department at all. Performance will move forward, of course, but I can't help but wonder whether it's density or yield that wins out.

    Their 14nm process is also some hybrid contraption. Does anyone know how their decision to shrink the transistors while standing pat on the interconnect will turn out? Which would benefit more, cost or the electrical performance? I really wish I understood more about the subject, but to me it seems like their decision would result in lowered cost, while performance would not be moving forward much.

    Does anyone know how the cost of 32 MB static eDRAM on a 28nm process + 8GB DDR3 2133 compares to 8GB GDDR5? Which is the more expensive memory solution on today's market and manufacturing processes? I doubt a measly 32MB of memory is enough to fully negate the cost gap between DDR3 and GDDR5. I wish there was more transparency when it came to part costs, so there would be some way to estimate how much Microsoft is saving by choosing eDRAM.

    Anand stated that eDRAM would be relatively inexpensive in the long run, thanks to node advancements. He also stated that Sony is married to the more expensive GDDR5, but is there any reason why Sony couldn't get GDDR5 that's been ported to a newer process? I'd imagine that their volume would be high enough to warrant such a move. I suppose it'd be up to either Samsung or Hynix to conduct the R&D for it, though.

One final bit: obviously one of the biggest reasons consoles work so well is that they have a fixed set of hardware. Is there not any room for performance to improve over each console's respective lifespan? Surely something simple like increasing the clock speed wouldn't throw things off, would it?

    If we ignore the cost of doing so, would moving to DDR4 break compatibility between the Xbox One and a theoretical Xbox DDR4? If not, I suppose Sony could theoretically implement GDDR6. Would stacked DRAM require a revised memory controller? I'm sure GDDR6 and stacked DRAM would be hilarious overkill, but it's fun to imagine.

    How about doubling the eDRAM size? I've seen some questioning over the usefulness of such a small frame buffer. I believe I saw some criticism for Haswell GT3e's 128MB eDRAM as well. What would 64MB allow us to do, where 32MB would fall short? 128MB? 256MB? At what point does the advantage disappear, and we simply have too much memory to do anything interesting with?
     
  19. dobwal

    Legend

    Joined:
    Oct 26, 2005
    Messages:
    5,955
    Likes Received:
    2,326
Layperson here. But isn't the latency gap between on-chip eDRAM/eSRAM and off-chip GDDR5/DDR3 much larger than is being discussed here?

The latency of video DRAM (the time it takes to service a memory request) can be the same regardless of whether it's in your PC or in a server hundreds of miles away. However, a memory request from your PC to that server isn't going to be met as quickly as a request to your PC's own memory. The temporal latency may be the same, but the spatial latency involved is drastically different.

Isn't on-chip memory much faster simply because data has a shorter distance to travel, as well as because it can service requests faster (requiring fewer cycles)?
     
    #79 dobwal, May 28, 2013
    Last edited by a moderator: May 28, 2013
  20. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I haven't heard the terms spatial and temporal latency before. Maybe you're thinking of latency vs bandwidth?

Propagation delay does play a role in adding to memory latency, but it's very small: around 0.1 ns per cm at most. The memory on the PS4 shouldn't be more than a few cm away from the APU, so I don't think it'll make a big difference.
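For scale, a rough sketch (the 5 cm trace length and 50 ns DRAM latency are illustrative assumptions, not measured figures):

```python
# ~0.1 ns per cm of trace, versus DRAM access latencies of tens of ns.
trace_length_cm = 5
prop_delay_ns = 0.1 * trace_length_cm  # 0.5 ns one way
dram_latency_ns = 50

# Round trip (request out, data back) as a fraction of total latency:
round_trip_fraction = 2 * prop_delay_ns / dram_latency_ns
print(round_trip_fraction)  # 0.02 -> about 2% of the access time
```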
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.