The pros and cons of eDRAM/ESRAM in next-gen

Discussion in 'Console Technology' started by Shifty Geezer, Jan 8, 2012.

  1. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    We know the details, they are normal DMA + LZ encode / decode + jpeg decode.
     
  2. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,680
    I was just too lazy to look it up ;)
     
  3. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    I think the principal customisation was to change the A to an E... ;)
     
  4. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,680
    Aren't the swizzle and JPEG decode new? Are they extra DMA units? You're right though. It's not like a huge piece of new silicon.
     
  5. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    So can I assert the following:

    In general this setup has higher theoretical bandwidth; it is harder to program for, harder to master, and slow to maximize, but with the pro of a higher ceiling than a simpler architecture. If you were to pinpoint a true weakness, it would be not having monumentally more bandwidth than the competition; instead the ESRAM has ~25% more bandwidth (over a simpler competing external architecture). They likely could have gone with higher bandwidth (eDRAM) and more CUs, but with less than 32 MB of working space, which may have been much harder to program for.

    DMEs are there to help saturate the bus, much as one would over PCIe.
     
  6. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,680
    The DMEs are there to keep the ESRAM filled with useful data as much as possible. Because you can read and write the ESRAM concurrently, you can DMA data into it from DDR3 while the GPU is reading from ESRAM at the same time. You could also do the opposite: while the GPU writes to ESRAM, the DMEs can copy data out to DDR3.
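    A crude way to see the benefit of that overlap is a two-stage pipeline model: one stage is the DME copy into ESRAM, the other is the GPU consuming the previously copied tile. The numbers below are made up purely for illustration; nothing here comes from the actual hardware or SDK.

    ```python
    # Toy model: serial vs. pipelined streaming of tiles through ESRAM.
    # copy_ms = hypothetical DME copy time per tile, read_ms = hypothetical
    # GPU consume time per tile.

    def serial_time(n_tiles, copy_ms, read_ms):
        # No overlap: each tile is copied in, then read, one after another.
        return n_tiles * (copy_ms + read_ms)

    def pipelined_time(n_tiles, copy_ms, read_ms):
        # Overlap: after the first copy fills the pipeline, each further tile
        # costs only the slower of the two stages; the last read drains it.
        if n_tiles == 0:
            return 0.0
        return copy_ms + (n_tiles - 1) * max(copy_ms, read_ms) + read_ms

    n, copy_ms, read_ms = 8, 1.0, 1.5
    print(serial_time(n, copy_ms, read_ms))     # 20.0
    print(pipelined_time(n, copy_ms, read_ms))  # 13.0
    ```

    With these made-up stage times, overlapping hides most of the copy cost; that is the whole point of being able to DMA into ESRAM while the GPU reads from it.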
     
  7. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    eDRAM is denser than ESRAM, so why would it be smaller? If anything it should be bigger.
     
  8. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North

    I am assuming that the idea is to use a much smaller amount.
     
  9. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    But if the density numbers I am looking at are correct (roughly 3x), then even using the same amount would shrink it drastically. Also, the eDRAM doesn't have to be on the same die; you should be able to do an MCM with it and have more of it.
     
  10. steveOrino

    Regular

    Joined:
    Feb 11, 2010
    Messages:
    549
    Likes Received:
    242
    That's what Intel did in the end. Large pools of SRAM just didn't make economic sense. Apparently scaling 6T SRAM is really difficult.
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    SRAM doesn't have to be on the same die either, but a second custom die and an MCM adds complexity and cost to the whole project.
    The DF interview indicated that Microsoft wanted a single-chip solution anyway.

    I'm not up on how many foundries are offering new eDRAM products, so I'm also not sure about the supply situation for even discrete components. A few of the known sources of non-bleeding-edge eDRAM did not do so well in the last year or so.

    Intel modified a version of its leading-edge process and put its best-in-class manufacturing resources behind it. Intel was basically its own answer to all the questions going with eDRAM would have posed to a foundry customer, but even then it didn't target systems with cost targets as lean as the consoles.
     
  12. Lalaland

    Regular

    Joined:
    Feb 24, 2013
    Messages:
    864
    Likes Received:
    693
    And even with their considerable process advantages, the eDRAM pool only ships in Intel's most expensive SKUs. I would hazard a guess that the MCM + eDRAM alone is a significant proportion of the entire XB1 die cost.
     
  13. Pixel

    Veteran

    Joined:
    Sep 16, 2013
    Messages:
    1,008
    Likes Received:
    477
    They like to talk about energy consumption/efficiency being a huge factor in all areas of hardware design, including ESRAM/eDRAM.
    If energy consumption was a factor, it was a very, very wise decision that the power savings of ESRAM over the superior bandwidth and superior size of eDRAM weighed significantly in the choice and the overall development of the hardware, as I don't think gamers anywhere who spend $400-500 on a console, $60 a year for online, and a hundred or more a year on games would tolerate a $2-a-year jump in their energy bill from a console with 10% higher power consumption during gameplay.

    http://energyusecalculator.com/electricity_gameconsole.htm
     
    #813 Pixel, Sep 5, 2014
    Last edited by a moderator: Sep 5, 2014
  14. oldschoolnerd

    Newcomer

    Joined:
    Sep 13, 2013
    Messages:
    65
    Likes Received:
    8
    One of the major benefits over and above the higher bandwidth is that there will be a significant reduction in contention for the system RAM. Removing contention makes everything better.
     
  15. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    And even with their 1.6 GHz eDRAM, Intel are topping out at less bandwidth from their off-die memory than MS's on-die ESRAM.

    I think the pie-in-the-sky "1000 GB/s" PS4 slide has been pretty successful at convincing people that MS's esram sucks. It's easier to make a powerpoint slide than engineer a processor.

    Electricity bill is only one factor, as AMD and Intel's continued focus on processor power draw shows. MS chose a power envelope and engineered a fast solution to fit within it, to be manufactured within their budget constraints.

    For better or worse, they wanted a silent console, and they could only spend so much on cooling. The heatsink in the Xbox One is already more expensive than both of the heatsinks in the original 360 combined. I'd wager it's more expensive than the one in the more power-hungry PS4, too.

    How much more performance would MS be able to extract from "monumentally more" ESRAM bandwidth? Within the ESRAM they already have much more than +25% peak bandwidth per CU, as they have fewer CUs. And there will be many situations where even this doesn't add much.

    More bandwidth from an off-die eDRAM pool would have required a very wide off-chip path - much wider than the 360 used (and MS were specifically trying to get away from this design), and wider than even Intel use on their 22nm Iris-enabled uber processors.

    And if you're talking on-chip, then who's going to make that for them ....? Intel? Nope. Renesas on their 45 nm node? Nope. IBM on their 32nm node (at probably a larger die size and goodness knows what engineering cost)?

    I would assert that edram was not a realistic option within their constraints, and that off-die edram would have probably netted them less BW but possibly higher power draw, and that on-die would have been difficult to source from the possibly no-one that could have manufactured it for them.

    The esram's one real weakness is apparently its small size. Even another 16 MB (~40 mm^2) would significantly alter the proposition of using large g-buffers or texturing from it. And at an extra ~ $10 that still seems more attractive than current on or off die edram.

    Using DMEs to saturate the esram would likely also saturate the main memory bus and kill CPU performance through contention.

    DMEs are there to allow "processor free" transfer of data between memory pools, and copies within the same pool (most likely main RAM).
     
  16. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    ROFL. Agreed, and I'm guilty of feeling that way too. I often assume that two companies will likely arrive at the same options but ultimately select different ones. There is a reason neither company went with eDRAM: it likely just wasn't the best solution at the time.


    Electricity bill is only one factor, as AMD and Intel's continued focus on processor power draw shows. MS chose a power envelope and engineered a fast solution to fit within it, to be manufactured within their budget constraints.

    Agreed, it's quite luxurious.

    Agreed, on the number of CUs you are correct. I believe I read a paper from AMD indicating 32 CUs require ~700 GB/s to be fully saturated; if linear scaling applies, 12 CUs could be fully saturated by approximately 260 GB/s (very close to the system's entire theoretical bandwidth).
    You've more or less summarized everything that has been brought up quite well here.
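    The linear scaling quoted above works out as follows (a big simplification, since bandwidth demand doesn't really scale perfectly with CU count):

    ```python
    # If 32 CUs need ~700 GB/s to saturate, a linear per-CU estimate gives:
    per_cu = 700 / 32        # ~21.9 GB/s per CU
    need_12 = 12 * per_cu    # bandwidth to saturate 12 CUs
    print(round(need_12, 1)) # 262.5
    ```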

    Yep, this definitely makes sense to me here. I'm feeling guilty about that PS4 slide now haha.

    This is the part that gets me: this is very much acting as a solution to emulate what one would do with hUMA, correct? Either you have a fully shared address space, so you don't waste additional CPU or GPU cycles copying data to two separate locations, or you have these DMAs whose job it is to move data without taking up those cycles.

    I'm not understanding exactly why having the DMAs go full tilt is necessarily a bad thing. The four DMEs peak at ~25.6 GB/s combined, which is a sizeable chunk of the DDR3 bandwidth but only ~12% of the ESRAM's. According to this, it doesn't always need to cause contention? RAM -> RAM and ESRAM -> ESRAM copies should never contend with each other, right?

    From VG Leaks.
    Copy operation    Peak (move engines)   Peak (shader)
    RAM -> RAM        25.6 GB/s             34 GB/s
    RAM -> ESRAM      25.6 GB/s             68 GB/s
    ESRAM -> RAM      25.6 GB/s             68 GB/s
    ESRAM -> ESRAM    25.6 GB/s             51.2 GB/s


    Read more at: http://www.vgleaks.com/world-exclusive-durangos-move-engines
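    As a quick sanity check on those leaked figures: how long would one full 32 MB ESRAM's worth of data take to move at a few of the peak rates? Purely illustrative arithmetic, nothing beyond the numbers in the table.

    ```python
    # Time to move 32 MB (the full ESRAM pool) at selected peak rates
    # from the VG Leaks table above.
    peak_gbs = {
        "RAM -> RAM (DME)":        25.6,
        "RAM -> ESRAM (shader)":   68.0,
        "ESRAM -> ESRAM (shader)": 51.2,
    }

    size_gb = 32 / 1024  # 32 MB in GB (binary units)
    for path, gbs in peak_gbs.items():
        ms = size_gb / gbs * 1000
        print(f"{path}: {ms:.2f} ms")
    ```

    At 60 fps a frame is ~16.7 ms, so even a full-pool DME copy at 25.6 GB/s (~1.2 ms) is a meaningful but not ruinous slice of a frame.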
     
    #816 iroboto, Sep 5, 2014
    Last edited by a moderator: Sep 5, 2014
  17. HTupolev

    Regular

    Joined:
    Dec 8, 2012
    Messages:
    936
    Likes Received:
    564
    I'm not sure what you mean, but there's no reason that DMAs wouldn't contend with other things on a bus.
     
  18. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,033
    Likes Received:
    3,428
    Why would you do either of those in the first place?
    The only reason would be to copy it from GPU to CPU memory space and vice versa.
    But then you may as well have just flagged it as shared and use the shared coherency.
     
  19. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    Right. Wait yes I'm stupid.

    As for copying back to the same pool of memory, yes, I imagine those scenarios aren't common, but someone is certainly going to tell me I'm wrong. As for your second point, it makes sense to me, but I don't think it's applicable in every scenario, right?
     
  20. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596

    Yeah, Anandtech talked about this. Basically, a lot of Intel's design decisions come down to the fact that they own a lot of foundries that are sitting there, and they need to use them for something rather than let them go to waste. This is different from a lot of other players, and all the cost complexities can be very different. Intel needs lots of things to fab; they may use eDRAM partly because of this. If they didn't use eDRAM, some capacity might have gone to waste.

    Intel even got into graphics originally for extraneous reasons. They had extra die space because they were perimeter-IO limited from shrinking their CPUs below a certain size, so they started throwing GPUs on them to do something rather than waste the space. Of course it's very important now, but originally it was more of an oddball decision.
     