The pros and cons of eDRAM/ESRAM in next-gen

Discussion in 'Console Technology' started by Shifty Geezer, Jan 8, 2012.

  1. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
The decision was supposedly down to ease of fabbing; ESRAM supposedly gives you more foundry options. I'm also not sure how easy on-die EDRAM is to get at this point.

For 2 more CUs they could have just enabled the two redundant ones, which, word is, they strongly considered but rejected, perhaps in favor of an upclock, though I see no reason for the two to be exclusive other than penny pinching.
     
  2. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    973
    Likes Received:
    129
    Location:
    On my rock
Regardless of EDRAM vs ESRAM or 14 vs 16 CUs, it was their choice to do what they did. I mean, the die would only have been nominally bigger anyway.

Here is 64MB ESRAM @ 408GB/s, 32 ROPs, and 16 CUs on the same process for comparison. It's only slightly larger, so you get a few fewer chips per wafer, and at the end of the day it would have cost them a couple of bucks more per chip. Hindsight being what it is, they probably could have afforded it and should have done it. It could have been done even more efficiently than what I've shown here by spreading the memory controllers out along the edge, keeping the redundant ESRAM in place to improve yields. Just trying to provide a sense of it.

[Image: mock die layout of the 64MB ESRAM / 32 ROP / 16 CU variant]
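The "couple bucks more per chip" claim is easy to sanity-check with the standard dies-per-wafer approximation. The wafer cost and both die areas below are illustrative assumptions, not actual Durango figures:

```python
# Rough sketch of how a modestly larger die affects per-chip cost.
# Assumed numbers: a 300mm wafer, ~$5000/wafer at 28nm, and a
# hypothetical bump from ~363mm^2 to ~400mm^2 for the 64MB/16CU variant.
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Classic approximation: wafer area / die area, minus an
    edge-loss term proportional to the wafer circumference."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

wafer_cost = 5000.0  # assumed $/wafer, illustrative only
for area in (363.0, 400.0):
    n = dies_per_wafer(area)
    print(f"{area:.0f} mm^2 -> {n} dies/wafer, ${wafer_cost / n:.2f}/die")
```

Under these assumptions the difference works out to a few dollars per (good) die, consistent with the post's estimate, before accounting for yield.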
     
  3. Pixel

    Veteran

    Joined:
    Sep 16, 2013
    Messages:
    1,008
    Likes Received:
    477
It's the same node; the APUs are on 28nm. I'm not talking about increasing the size of the ESRAM scratchpad, I'm talking about switching to eDRAM in the hypothetical situation that it were a mature process at the foundries. eDRAM takes roughly 3x less real estate and transistor count than 8T SRAM. BTW, it's not a mistake, and it's not an 8:1 difference.

According to the Digital Foundry interview, the distinguished engineer said it wasn't an available option at the foundries at that node, and it sounds like they might have preferred eDRAM. There are also articles backing up that foundries were transitioning away from maturing their eDRAM processes.
Thanks.
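The ~3x density claim can be put in die-area terms with a back-of-envelope calculation. The bit-cell sizes below are rough public ballpark figures for a 28/32nm-class process and are assumptions, not vendor numbers:

```python
# Back-of-envelope area comparison for a 32MB on-die store.
BITS = 32 * 1024 * 1024 * 8          # 32MB in bits

sram_8t_cell_um2 = 0.3               # assumed 8T SRAM bit cell (um^2)
edram_cell_um2   = 0.1               # assumed eDRAM bit cell (~3x denser)

for name, cell in (("8T SRAM", sram_8t_cell_um2), ("eDRAM", edram_cell_um2)):
    mm2 = BITS * cell / 1e6          # um^2 -> mm^2
    print(f"{name}: ~{mm2:.0f} mm^2 of raw bit cells")
```

Even ignoring decoders, sense amps, and redundancy, a 3x cell-density gap turns into tens of mm² at the 32MB scale, which is why the choice matters for die cost.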
     
    #243 Pixel, Mar 19, 2014
    Last edited by a moderator: Mar 19, 2014
  4. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    973
    Likes Received:
    129
    Location:
    On my rock
To me the disappointing part was that there was no developer this time around pushing them to go the extra mile. Around the 360 launch, Epic screamed for more RAM and demoed what Gears would look like with 256MB vs 512MB. Sony was likewise forced to take stock of itself after meeting with developers and went all in on 8GB of GDDR5.

MS was simply too comfortable with their position as the best dev environment and shot for "good enough" rather than the best they could possibly muster. A slight alteration to their silicon budget would have made all the difference in the world, and they wouldn't be in the situation they are in now. If PS4 games didn't look and run better, and devs had more wiggle room for render targets in ESRAM, then Kinect, multitasking, etc. would all look much better, everything else being equal.

    64MB ESRAM @ 408GB/s, 32ROPS and 16CU's. Think about it.
     
  5. kots

    Regular

    Joined:
    Oct 30, 2008
    Messages:
    394
    Likes Received:
    0
It wouldn't have changed anything. From the various PDFs about the future of Xbox, it is quite obvious that they knew they'd have the weaker console (based on the projected prices of both consoles) and they didn't care.
     
  6. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    973
    Likes Received:
    129
    Location:
    On my rock
    Things would be different if they were cheaper as per that preso. Trust me. If they knew then what they know now, the Xbox One would not be the same as it is.
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d

    Changing RAM capacity goes as far as changing the chips put into an otherwise identical final console.
    Mess with the silicon and you mess with the end product of a process that is far longer and far more expensive.

RAM capacity is a number that can be readily given, and it has direct bearing on things developers work with and can measure.
Arbitrary silicon parameters on a chip that was designed years before devs can see it, with a billion unknown variables, provide nothing concrete or rational for developers to push on.

    How is that simple?
     
  8. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    Yup, that's major surgery. They were already at 5 billion transistors, that would have pushed them well over 7. Does even Titan/GK110 have 7?
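For reference, GK110 is commonly cited at ~7.1 billion transistors. A quick, hedged sanity check on the arithmetic above (6T bit cells assumed; decoders, sense amps, and redundancy ignored, so this is a floor):

```python
# Quick sanity check on the transistor arithmetic in the post above.
bits_32mb = 32 * 1024 * 1024 * 8
sram_transistors = bits_32mb * 6          # ~1.6 billion for 32MB of 6T SRAM
xbox_one_total = 5.0e9                    # widely reported total for the APU

# Doubling the ESRAM to 64MB adds another ~1.6B; two more CUs and the
# extra ROPs would add more on top of that.
estimate = xbox_one_total + sram_transistors
print(f"32MB of 6T SRAM: ~{sram_transistors / 1e9:.2f}B transistors")
print(f"5B + another 32MB: ~{estimate / 1e9:.2f}B, before extra CUs/ROPs")
```

So the raw cells alone land the design around 6.6B; "well over 7" assumes the overheads and extra logic make up the rest, which is plausible but not certain.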
     
  9. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
I dunno, but my dream machine was (still is) a sick Xbox One called Xbox One 3D, or just Talisman (after the old Project Talisman, a TBDR graphics architecture created by Microsoft).

It would feature the exact same GPU it has now, BUT one GPU for each eye (I miss the days when you could name a GPU, like, say... Xenos).

    So it would be dual GPU, exactly the same SoC, with each GPU featuring 32 MB of eSRAM.
     
  10. Barbarian

    Regular

    Joined:
    Jun 27, 2005
    Messages:
    289
    Likes Received:
    15
    Location:
    California, USA
Actually, a much cheaper/smarter modification would've been to turn the ESRAM into a full-on L3 cache, similar to what Intel did with Iris Pro. It's pretty shocking they didn't think of this, given that (allegedly) the ESRAM is full-blown 6-transistor on-chip memory. It's just missing the cache controller!
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    An L3 cache would be CPU coherent, however.
    Coherent bandwidth for the chip is 30 GB/s, and that is the high water mark for every other design using AMD's current architecture.
     
  12. Barbarian

    Regular

    Joined:
    Jun 27, 2005
    Messages:
    289
    Likes Received:
    15
    Location:
    California, USA
The GPU already has an L2 cache that I believe is non-coherent with the CPU. Couldn't the ESRAM just be an L3 cache behind that L2?
I admit I don't fully understand the hardware implications, so perhaps what I'm suggesting is not feasible or cheap to accomplish, but it does seem somewhat of a waste to have a billion SRAM transistors sit on a chip and only be manually addressable.
     
  13. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    hm... where does the ROP cache fit into the hierarchy :?:

The L2 on the GPU is for the shader/texture path.
     
  14. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
Would this, on balance, have been a good or bad thing?

You would lose the ability to program/manually tune it.

I don't see Iris Pro necessarily tearing up the performance benchmarks.
     
  15. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Iris Pro is in a different performance bracket altogether, and the L4 eDRAM bandwidth is ~100GB/s aggregate.
     
  16. Barbarian

    Regular

    Joined:
    Jun 27, 2005
    Messages:
    289
    Likes Received:
    15
    Location:
    California, USA
Indeed, Iris Pro is meant as a laptop-level graphics chip. It has about 50% of the ALU and bandwidth of the Xbox One, I believe. But the embedded RAM is a better design all around: there's 128MB of it, it's eDRAM (so way more compact), and, more importantly, it's part of the cache hierarchy.
In general, most cache memories have some ability to lock a portion so it's manually addressable if needed, but being a true cache makes it just work on everything out of the box.
     
  17. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
Well, Iris Pro is 852 GFLOPS, which is ~2/3 of the XOne GPU. That makes it as much in XOne's class as XOne is in PS4's, then.

I just mean I don't see the 128MB of L4 cache in Iris Pro making it somehow perform like a 1.5 TFLOP GPU, or even punch above its weight really, other than in bandwidth.

I guess if the goal is just ultimate ease of use, maybe cache is the way to go. But would I be wrong in thinking manual control is the best way to chase performance, which is what you really want in a console that's stuck with fixed specs for 6 years?

But I'm out of my knowledge league here. Just my ill-informed ideas from 10,000 feet.
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The GPU's memory hierarchy appears to be too primitive to extend in that way.
    The L2 is already physically sliced to match memory controllers, and it serves as the common coherence client for the CUs so that the GPU can be at least weakly coherent within itself.

    The GPU's idea of coherence works because the CU L1s are write-through and the physically partitioned L2 means data can only spill to one place. No coherence checking is needed because there is only one place data can be cached.
    An L3 being pasted on creates another place data could be cached, and that would break the GPU as it is.

    The eSRAM basically stands on the other side of a crossbar as if it's a sort-of memory controller, so by manually addressing it and treating it like a spot of main memory, it's a unique non-cached piece of memory that basically means most of the GPU can operate without a redesign.

    The ROP caches are separate from the vector/texture cache path, and they get their data over an export bus from the CUs instead of the load/store units. They're similarly aligned with memory channels like the L2 slices, but the two cache types don't really operate together.
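The "data can only spill to one place" property can be illustrated with a toy address-to-slice mapping. The slice count, line size, and interleave granularity below are illustrative assumptions, not documented Durango parameters:

```python
# Toy model of why a physically sliced GPU L2 needs no coherence
# protocol between slices: every physical address deterministically
# maps to exactly one slice, so a line can only ever be cached in
# one place.
NUM_SLICES = 4          # assumed: one L2 slice per memory channel
INTERLEAVE = 256        # assumed channel-interleave granularity (bytes)

def l2_slice(addr):
    """Return the single slice that 'owns' this address."""
    return (addr // INTERLEAVE) % NUM_SLICES

# Any two clients touching the same address always reach the same
# slice, so there is never a second copy to keep coherent.
for addr in (0x0, 0x100, 0x200, 0x12345680):
    print(f"addr {addr:#010x} -> slice {l2_slice(addr)}")
```

Pasting an L3 behind this would create a second possible home for a line, which is exactly the "another place data could be cached" problem the post describes.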
     
  19. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
And now the obvious question :p: why not increase the size of both the L2s and the ROP caches? Or is it that any practical size increase wouldn't be anywhere near as useful as just having the scratchpad?
     
  20. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
I would avoid basing the comparison completely on FLOPS counts; the latest Nvidia GPUs show that FLOPS tell a very limited part of the story.
Not that I expect Iris Pro to punch above its weight, but I'm not sure it performs as it does purely because of FLOPS count or bandwidth.
Intel GPUs usually do terribly with AA; I suspect other issues hold performance back, like weak ROPs or fixed-function hardware.

If the question is ESRAM vs eDRAM, I think MSFT's answer is pretty clear: they had no choice because of costs and available tech.
Crystalwell costs Intel "peanuts," and they sell it with huge margins. For anybody else, I could see those 80mm² of silicon on a pretty advanced lithography costing a lot of money.
Leaving price aside for a second, could AMD even have done something worthy with it? I wonder: they have yet to fix the L3 on their main-line CPUs, the "unification" of the memory subsystem in their APUs is only coming with their next round of products, etc.

My POV is that the Core i7-4770R is a better chip than Durango or Liverpool overall: the CPU performance is not in the same ballpark, power consumption is better, yet it is still not a decent gaming rig and the price sucks relative to the gaming performance it provides.

Intel's solution is great, but for other actors I wonder if GDDR5 is a better bet, as the cost of eDRAM and the R&D associated with something like Crystalwell might well outweigh the extra pennies GDDR5 costs.

Microsoft's solution, using ESRAM, could be a good one, though I wonder about the implementation. Even once devs get their heads wrapped around the size limitation of the ESRAM, we are still looking at something that in some regards performs like a 16-ROP GPU attached to GDDR5 through a 128-bit bus, though with a lot more RAM. I think it will be a while before the 2GB of RAM on cards like the R7 260X and GTX 750 Ti turns into a severe limitation.

I guess it's going to work, yet the whole thing looks costly: lots of silicon dedicated to ESRAM, a 256-bit bus to main RAM, fast DDR3.
    ------------------------------------

Overall I wonder if the real issue is not ESRAM vs eDRAM but UMA vs NUMA design.
The innards of AMD's APUs still seem a bit messy to me; even Sony stated that the bandwidth available to the GPU dropped significantly when the CPU accessed the RAM (IIRC, and I don't know to what extent it affects performance in real-world usage; over my head).
Was UMA ready for prime time, especially for MSFT, for which an all-GDDR5 system was out of the picture?

Looking at how low/mid-range GPUs fare against this generation of consoles, I wonder if NUMA would have turned into such an issue.

I think of something like this:
6/8 GB of DDR3 on a 128-bit bus, standard/cheap plain 1600.
1/2 GB of fast GDDR5 on a 128-bit bus.
A single chip, but with the CPU and GPU connected through a fast on-chip PCI-Express-type link: a "discrete GPU on chip" kind of set-up rather than a "not quite ready for prime time heterogeneous processor wannabe."

The chip would have been a lot tinier, and cheaper. Depending on the memory set-up selected, they might also have saved on memory price; DDR3-2133 still comes at a nice premium over its vanilla 1600 ancestor.
If they were willing to cut corners, 1GB could have done, at the cost of enforcing the use of virtual texturing / tiled resources (maybe not a great idea).
It might also have saved quite some R&D expense.
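A quick sketch of the peak bandwidth behind the set-ups discussed above. The formula is just effective transfer rate times bus width; the speed grades chosen (DDR3-1600, DDR3-2133, 5.5Gbps GDDR5) are assumptions for illustration:

```python
# Peak-bandwidth comparison: the Xbox One's actual main-RAM bus
# vs the proposed split DDR3 + GDDR5 (NUMA) configuration.
def peak_gbps(mt_per_s, bus_bits):
    """Peak bandwidth in GB/s: transfers/s * bytes per transfer."""
    return mt_per_s * (bus_bits / 8) / 1000

configs = {
    "Xbox One DDR3-2133, 256-bit":     peak_gbps(2133, 256),
    "Proposed DDR3-1600, 128-bit":     peak_gbps(1600, 128),
    "Proposed GDDR5 5.5Gbps, 128-bit": peak_gbps(5500, 128),
}
for name, bw in configs.items():
    print(f"{name}: ~{bw:.1f} GB/s")
```

Under these assumptions the proposed GDDR5 pool alone exceeds the Xbox One's main-RAM bandwidth on half the bus width, which is the appeal of the NUMA split; the cost is the small fast pool and the software burden of managing two memory spaces.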
     
    #260 liolio, Mar 21, 2014
    Last edited by a moderator: Mar 21, 2014