The ESRAM in Durango as a possible performance aid

Discussion in 'Console Technology' started by Rangers, May 4, 2013.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I don't know enough about UK retail sites to know how important this random non-technical pre-order PR text is, but I have serious reservations about having a "lightening fast" supercomputer.

    It seems dangerous. It will either lose mass too quickly and float away, losing my investment, or it will cause me to stop eating and waste away in mere nanoseconds.
     
  2. Ketto

    Newcomer

    Joined:
    Jul 30, 2012
    Messages:
    39
    Likes Received:
    0
    Location:
    Winter Park, Florida; and London UK.
    Just marketing words for the masses to gobble up. PS3 was a super computer at one time.

    Regardless, back to the discussion of the eSRAM, I'm actually wondering: would it be helpful to the majority of GPGPU functions in lieu of ACEs, in terms of granularity between compute functions and rendering functions? Not to make this a comparison, but I'd imagine a game that relied heavily on compute functions on PS4 might cause some problems when ported to Xbox One due to 18CU/8ACE/8CL vs 12CU/2ACE/2CL.
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Enhancing the memory pipeline and expanding the throughput of the compute front end lie along two semi-independent axes.

    There are scenarios where having one can make up for the lack of the other, such as finishing latency-sensitive compute jobs faster and obviating the need to queue up so many kernels, or having more kernels to handle the case when on-die storage runs out.

    On the other hand, having a flexible and low-latency memory pool can also increase the utility of having many small compute jobs, meaning there are also situations where more of one could enhance the effectiveness of the other.
     
  4. Michellstar

    Regular

    Joined:
    Mar 5, 2013
    Messages:
    662
    Likes Received:
    380
    Yes, that's right, like the eDRAM in the 360: it was a node behind in almost every iteration and they couldn't integrate it on the same die.
    We'll see with Oban.


    Is the Wii U eDRAM on the same die? Is the whole SoC on 32nm?
     
  5. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    He he, I thought I had read lightning fast but it's obvious I am wrong.

    On a different note, something has been puzzling me as of late.

    The eSRAM is nice and all, but... why is the bandwidth only 102GB/s? :smile:

    I mean, the EDRAM on the Xbox 360 was truly lightening fast -256GB/s- and Sony said that they could attach a small amount of EDRAM to the PS4 GPU featuring 1TB/s :shock: of bandwidth if they wanted to.

    I could have sworn that I read here, if I recall correctly, that 102GB/s is the perfect amount for the eSRAM to fill the ROPs with data on the Xbox One and that's why they didn't increase the speed / bandwidth.

    It seemed a scant amount of bandwidth compared to the X360, but it is all in the name of the much greater efficiency of modern technology.

    Am I right? Or did I misunderstand a little detail and I am just missing it? Is 102GB/s the perfect amount of bandwidth in order to fill the 16 ROPs with data?
     
    #125 Cyan, Jun 5, 2013
    Last edited by a moderator: Jun 5, 2013
  6. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    I believe the 360 lacked the hardware color and Z compression that are a standard part of modern GPU ROPs. It needed a lot more bandwidth comparatively. If I had to guess I'd say the ESRAM's bandwidth is a function of them not wanting to make the die bigger than it was already going to be just to make the memory faster than it strictly needed to be.
     
  7. Ketto

    Newcomer

    Joined:
    Jul 30, 2012
    Messages:
    39
    Likes Received:
    0
    Location:
    Winter Park, Florida; and London UK.
    #127 Ketto, Jun 5, 2013
    Last edited by a moderator: Jun 5, 2013
  8. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    I don't know much, but I'd say this is almost impossible. It's a completely ridiculous amount of downclock. It means they'd clock it between 480 and 600, instead of 800. Why? From what I read, SRAM is among the easiest things to make.
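
For a sense of scale, here is a rough sketch of what those clocks would do to the eSRAM figure. It assumes bandwidth scales linearly with clock and the bus width stays the same, and it takes the thread's "102GB/s" as 102.4 GB/s at 800 MHz; both are assumptions for illustration, not confirmed specs.

```python
# Sketch: eSRAM bandwidth under a hypothetical downclock.
# Assumption: bandwidth is proportional to clock (bus width unchanged).
BASE_CLOCK_MHZ = 800         # nominal clock quoted in the thread
BASE_BANDWIDTH_GBS = 102.4   # assumed exact value behind the quoted "102GB/s"


def scaled_bandwidth(clock_mhz, base_clock_mhz=BASE_CLOCK_MHZ,
                     base_bandwidth_gbs=BASE_BANDWIDTH_GBS):
    """Return the bandwidth in GB/s at clock_mhz, scaled linearly from the base."""
    return base_bandwidth_gbs * clock_mhz / base_clock_mhz


if __name__ == "__main__":
    for clock in (480, 600, 800):
        share = clock / BASE_CLOCK_MHZ
        print(f"{clock} MHz -> {scaled_bandwidth(clock):.2f} GB/s "
              f"({share:.0%} of nominal)")
```

Under those assumptions, the rumoured range would mean giving up a quarter to two fifths of the bandwidth.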
     
  9. (((interference)))

    Veteran

    Joined:
    Sep 10, 2009
    Messages:
    2,499
    Likes Received:
    70
    Yeah, I don't think a downclock is likely, especially such a large downclock.

    Would be pretty bad if true though.
     
  10. Bagel seed

    Veteran

    Joined:
    Jul 23, 2005
    Messages:
    1,533
    Likes Received:
    16
    Supposing this rumor is true, which would be preferable? Lowering clocks at the cost of performance to get yields up, or staying the course and just launching in 1 or 2 fewer territories this year?
     
  11. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,727
    Well, the yield issue could last months, or a respin or something else they are working on could fix it before launch even happens.

    Lowering clocks would boost yields, but they would be at a huge disadvantage for the entire console cycle.


    So I'm assuming taking the yield hit would be best.
     
  12. (((interference)))

    Veteran

    Joined:
    Sep 10, 2009
    Messages:
    2,499
    Likes Received:
    70
    Depends on how much money MS wants to lose on yields.
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d

    256 GB/s is the internal bandwidth of the Xenos daughter die when performing color/Z/blend/MSAA operations, at least when a title is able to exercise all those at once. The link between the GPU and the daughter die is 32 GB/s, and the eDRAM is usable for a restricted set of roles.

    For Durango, the ROPs can amplify their bandwidth by going through their color and Z caches first, and then may output to eSRAM.
    No longer on the other side of a dedicated ROP partition, the storage can be used for more general read/write workloads, and in that scenario the eSRAM has three times the bandwidth.

    As for why there's not even more bandwidth, it doesn't seem like the rest of the APU is capable of utilizing much more, and there is a complexity and power cost to having even more connections or higher clocks to the eSRAM.
    ROPs performing Z writes seem to be the largest single client, and the bus is sized to match it.
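
As a back-of-the-envelope check on the bus sizing, the 102 GB/s figure falls out of a simple bytes-per-clock calculation. This is only a sketch: the 8 bytes per ROP per clock (a 128-byte-per-clock bus in total) is an assumption chosen because it reproduces the quoted number, not a confirmed spec.

```python
# Sketch: peak bandwidth = bytes moved per clock * clock rate.
# ROPS and CLOCK_MHZ come from figures quoted in the thread;
# BYTES_PER_ROP_PER_CLOCK is an assumption for illustration.
ROPS = 16
BYTES_PER_ROP_PER_CLOCK = 8
CLOCK_MHZ = 800
XENOS_LINK_GBS = 32.0  # GPU <-> daughter die link on the 360, from the post


def peak_bandwidth_gbs(bytes_per_clock, clock_mhz):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_clock * clock_mhz / 1000.0


esram_gbs = peak_bandwidth_gbs(ROPS * BYTES_PER_ROP_PER_CLOCK, CLOCK_MHZ)

if __name__ == "__main__":
    print(f"eSRAM peak: {esram_gbs:.1f} GB/s")
    print(f"vs. 360 GPU-to-eDRAM link: {esram_gbs / XENOS_LINK_GBS:.1f}x")
```

The second line is where the "three times the bandwidth" comparison for general read/write traffic comes from.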
     
  14. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    So you're saying that the ROPs can fill to something other than the eSRAM? I'm not sure I understand the amplifying bandwidth bit fully.
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    One of the scenarios had the ROPs writing to DDR3 memory; the system is described as having the flexibility to send data in either direction.

    The caches that lie between the ROPs and the rest of the pipeline can service memory requests themselves. With at least some reuse of data, fewer accesses need to move on to the next level of storage, freeing that bus for other uses.
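
A minimal sketch of that amplification effect, using a simple model where only cache misses travel over the bus. The hit rates are made-up illustrative values, not measurements of any real ROP cache.

```python
# Sketch: effective bandwidth seen by a client sitting behind a cache.
# Model assumption: hits are served by the cache, only misses use the bus,
# so the bus can sustain 1 / (1 - hit_rate) times as many client requests.


def effective_bandwidth_gbs(bus_bandwidth_gbs, hit_rate):
    """Bandwidth the client can consume before the bus saturates."""
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return bus_bandwidth_gbs / (1.0 - hit_rate)


if __name__ == "__main__":
    bus = 102.4  # GB/s, assumed eSRAM peak
    for hit_rate in (0.0, 0.25, 0.5, 0.75):  # hypothetical hit rates
        print(f"hit rate {hit_rate:.0%}: "
              f"{effective_bandwidth_gbs(bus, hit_rate):.1f} GB/s effective")
```

With no reuse the client just sees the raw bus; the more reuse, the more bus bandwidth is left over for other clients.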
     
  16. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    The frame buffer is compressed, and there are caches between the ROPs and the memory.
    So you can actually exceed the peak memory bandwidth.
    This is true of any modern GPU, though not the 360.
    It's also worth noting that the 360 can only use all of its bandwidth with 4xMSAA enabled.
    There are some additional advantages to the eSRAM when doing frame buffer operations: the relatively short read/write transition time might be a win.
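
The 4xMSAA point can be illustrated with the usual breakdown of the 256 GB/s figure. The parameters below (8 ROPs, 500 MHz, 32-bit color plus 32-bit Z per sample, read-modify-write) are the commonly cited Xenos numbers and are assumptions here, not something stated in this thread.

```python
# Sketch: how much of the 360's eDRAM-internal bandwidth is exercised
# at a given MSAA level, under commonly cited (assumed) Xenos parameters.
ROPS = 8                # pixels per clock
CLOCK_MHZ = 500
BYTES_PER_SAMPLE = 4 + 4  # 32-bit color + 32-bit Z
READ_AND_WRITE = 2      # blending and Z test read, then write


def xenos_edram_bandwidth_gbs(msaa_samples):
    """Internal daughter-die bandwidth in GB/s used at a given sample count."""
    bytes_per_clock = ROPS * msaa_samples * BYTES_PER_SAMPLE * READ_AND_WRITE
    return bytes_per_clock * CLOCK_MHZ / 1000.0


if __name__ == "__main__":
    for samples in (1, 2, 4):
        print(f"{samples}x: {xenos_edram_bandwidth_gbs(samples):.0f} GB/s")
```

Under those assumptions only the 4x case reaches the full 256 GB/s, and a title rendering without MSAA would touch a quarter of it.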
     
  17. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    Looking at those figures, it should be possible to calculate the aggregate bandwidth figures for a 32 or 16 ROPs GPU, right?
    I'm not sure how those data paths add up, but I would think that the ROPs "internally" (speaking of aggregate bandwidth) have at least the same amount of bandwidth as the L2 to L1 link (450GB/s at Cayman clock speed).
    So lots of bandwidth in case of data reuse?
     
  18. Hornet

    Newcomer

    Joined:
    Nov 28, 2009
    Messages:
    120
    Likes Received:
    0
    Location:
    Italy
    Aggregate bandwidth numbers for any recent AMD GPU are available in their OpenCL documentation.
    For instance, for Pitcairn XT:
    - 15360 GB/s to the register files
    - 2560 GB/s to the local memories
    - 320 GB/s to the constant memory
    - 1280 GB/s to the L1 caches
    - 512 GB/s to the L2 cache
    Bandwidth to the L2 cache in GCN is tied to the number of memory channels, so I expect it to be 512 GB/s in the Xbox One.
    Caches not relevant to GPGPU such as the ones inside the ROPs are not described in detail.
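
For anyone wanting to see how those totals add up, here is a sketch that rebuilds them from per-unit widths. The per-clock byte counts, the 20 CUs, the 1000 MHz clock and the 8 L2 channels are assumptions back-derived so that they reproduce the quoted totals; check them against the AMD documentation before relying on them.

```python
# Sketch: aggregate bandwidth = bytes per clock per unit * unit count * clock.
# All widths and counts below are assumptions that reproduce the quoted
# Pitcairn XT totals; they are not taken from the thread itself.
CLOCK_MHZ = 1000
COMPUTE_UNITS = 20
L2_CHANNELS = 8

# (bytes per clock per unit, number of units)
PATHS = {
    "register files": (768, COMPUTE_UNITS),  # 64 lanes * 3 operands * 4 bytes
    "local memories": (128, COMPUTE_UNITS),
    "constant memory": (16, COMPUTE_UNITS),
    "L1 caches": (64, COMPUTE_UNITS),
    "L2 cache": (64, L2_CHANNELS),
}


def aggregate_bandwidth_gbs(bytes_per_clock, units, clock_mhz=CLOCK_MHZ):
    """Aggregate bandwidth in GB/s across all units of one kind."""
    return bytes_per_clock * units * clock_mhz / 1000.0


if __name__ == "__main__":
    for name, (width, units) in PATHS.items():
        print(f"{aggregate_bandwidth_gbs(width, units):>8.0f} GB/s to the {name}")
```

The same function can be fed a different CU count or clock to estimate other GCN parts, with the caveat that the channel count for the L2 would need to be known.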
     
  19. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    Thanks for your research, though those numbers do not include the "inner" ROPs bandwidth (as you are pointing out).
    I don't think that ROPs changed much from Cayman to GCN, though they have been completely decoupled from the L2 iirc (which doesn't happen that often sadly... :lol: ).
     
    #139 liolio, Jun 5, 2013
    Last edited by a moderator: Jun 5, 2013
  20. Michellstar

    Regular

    Joined:
    Mar 5, 2013
    Messages:
    662
    Likes Received:
    380

    How can it be?

    AMD is producing L3 caches that clock past 2GHz, though not that big.
    Durango's ESRAM is below a gigahertz.

    If they have to downclock the APU to 800/900 GFLOPS, they would be better off scrapping Durango altogether and going with a discrete config. The SoC might come out cool at 22nm.


    What strikes me is all those rumours of Durango running hot (dev kits, actually), now the APU itself, 6-month delays, and not a word or rumour from the Sony camp.

    When they share process, foundry, design, an almost copycat config apart from the ESRAM...

    What is going on?
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.