The ESRAM in Durango as a possible performance aid

Discussion in 'Console Technology' started by Rangers, May 4, 2013.

  1. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,563
    Likes Received:
    1,387
    So, yeah. I've been wondering about this for a while, spurred by some of ERP's posts and a post or two by fafalada (a programmer who posts on NeoGAF and here as well, I believe).

    The crux seems to be that the low latency of the ESRAM can help the GPU be more efficient.

    Of course, I'm not a programmer and don't know that much about tech, so I hope some others can weigh in.

    Basically, what kind of cool things could be done PS2-style with the ESRAM?
     
  2. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,563
    Likes Received:
    1,387
    To start, I have read a post somewhere on here that Nvidia GPUs often outperform AMD ones per flop because they have lower latency caches. I did some rough research comparing some Nvidia GPUs to AMD GCN ones (GCN only, because VLIW would have skewed the efficiency too much), and I was able to calculate that the GTX 660 seems about 18% more efficient per flop than Southern Islands (you have to be careful here and account for average boost clocks, mind you).

    But the real star was Fermi. I calculated that the GTX 570 (rated at 1.41 TF), compared to the 1.8 TF 7850, was 12% faster at 1080p (570 > 7850) according to TechPowerUp's performance summary across a suite of games. So the 570 is 43% faster than the 7850 per flop! I specifically chose the 570 because it features near-identical memory bandwidth to the 7850.

    Is that because of low latency caches, or why are Fermi, and to a lesser extent Kepler, so much more efficient?
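
    For anyone who wants to check the arithmetic, here's the per-flop calculation spelled out (a quick sanity check using the rounded TFLOPS figures above):

    ```cpp
    #include <cstdio>

    // Back-of-the-envelope check of the 43% figure: the 7850 has ~28% more
    // raw flops (1.8 vs 1.41 TFLOPS), yet the 570 measures 12% faster.
    int main()
    {
        double flopRatio = 1.8 / 1.41;   // 7850's raw-flop advantage: ~1.28x
        double perfRatio = 1.12;         // 570's measured frame-rate advantage
        double perFlop   = perfRatio * flopRatio;
        printf("570 per-flop advantage: %.0f%%\n", (perFlop - 1.0) * 100.0); // ~43%
        return 0;
    }
    ```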
     
  3. Silenti

    Regular

    Joined:
    May 25, 2005
    Messages:
    498
    Likes Received:
    87
    Forgive the ignorance of my question, but hasn't this already been discussed in the tech threads, with the possible efficiency gains determined to be so low that they didn't really matter? As in, it was already running about as well as could be expected, so any gains would be minimal and relatively inconsequential?
     
  4. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    SiSoft has done GPU latency measurements for various AMD/Intel/Nvidia GPUs. It's an interesting read: http://www.sisoftware.net/?d=qa&f=gpu_mem_latency. Unfortunately the charts do not include GCN-based Radeons (7000 series). Fermi was the first Nvidia GPU to have a proper cache hierarchy, and GCN is the first AMD architecture to have one. It would be interesting to see both of them compared (and Kepler as well).

    How much does latency affect performance? It depends entirely on data access patterns and slackness (how many extra threads you have ready to run if the current ones stall). Hiding latency is rather easy in traditional rendering tasks (access patterns are simple -> cache hit ratio is high, and shaders have low GPR counts -> you have lots of threads to run if a subset of them stalls). Memory latency is more important for compute tasks (more complex access patterns + higher GPR count in shaders = less slackness).
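
    To make the slackness point concrete, here's a rough sketch assuming GCN-like limits (a 256-entry per-lane VGPR budget and a 10-wavefront cap per SIMD, from AMD's public GCN material; allocation granularity ignored):

    ```cpp
    #include <algorithm>
    #include <cstdio>

    // Wavefronts a SIMD can keep in flight, given each thread's VGPR usage.
    // More waves in flight = more slack to hide memory latency with.
    int wavesPerSimd(int vgprsPerThread)
    {
        return std::min(10, 256 / vgprsPerThread);
    }

    int main()
    {
        // A lean pixel shader vs. a register-hungry compute shader:
        printf("24 VGPRs -> %d waves\n", wavesPerSimd(24)); // 10: stalls easily hidden
        printf("84 VGPRs -> %d waves\n", wavesPerSimd(84)); // 3: latency shows through
        return 0;
    }
    ```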
     
    #4 sebbbi, May 4, 2013
    Last edited by a moderator: May 4, 2013
    Pixel likes this.
  5. Kameradschaft

    Regular

    Joined:
    Jan 3, 2010
    Messages:
    310
    Likes Received:
    0
    Location:
    Europa?
    I was wondering about that too lately. Is ESRAM something that will bring benefits to Durango compared to the PS4, like eDRAM did (more and higher-res particles, for example), or is the ESRAM simply there to aid the overall bandwidth of the system, with no advantages at all over the PS4 architecture?

    I guess it's too early to say any of that with certainty, but it'd be cool if any of the guys in here shed some light on this.
     
  6. XpiderMX

    Veteran

    Joined:
    Mar 14, 2012
    Messages:
    1,768
    Likes Received:
    0
    If the ESRAM is there to help with bandwidth, wouldn't it be "cheaper" to just use GDDR5?
     
  7. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,957
    Likes Received:
    967
    Location:
    Treading Water
    Nope. ESRAM is on-die and will cost-reduce with node shrinks.

    And anyway, this thread is about better latencies improving efficiency.
     
  8. XpiderMX

    Veteran

    Joined:
    Mar 14, 2012
    Messages:
    1,768
    Likes Received:
    0
    Isn't HSA supposed to improve efficiency? I have read that both APUs (PS4 and Durango) will have similar improvements.
     
  9. Sonic

    Sonic Senior Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,883
    Likes Received:
    88
    Location:
    San Francisco, CA
    What puzzles me is why such anemic bandwidth for the 32 MB of ESRAM? Does being limited to 102 GB/s have something to do with it being very low latency? When I heard rumors of eDRAM/ESRAM in the X720, I had fantasies of 256 GB/s of bandwidth or greater, all with lower latency than main RAM. So why such low bandwidth?
     
  10. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,957
    Likes Received:
    967
    Location:
    Treading Water
    More than one thing can improve efficiency. Whether RAM latency does or not, I'm unsure, but I think that's really the subject of the thread.
     
  11. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    42,970
    Likes Received:
    15,077
    Location:
    Under my bridge
    The OP is asking about lower latency RAM access. That's the topic.

    I'm not suitably well informed to know the answer, but I expect latency not to make a huge difference. In GCN, a wavefront is presumably prepared ahead of the ALUs needing it (prefetch textures and line up instructions in the pipe). Everything should be working from caches, same as a CPU, and the occasions when the caches fail should be fairly few and far between, especially with GPU workloads. Latency only impacts your processing if you want data and it's not to hand, and then you have to wait for it, which means your data management is failing.
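
    To put toy numbers on that (assumed round figures, nothing Durango-specific): if a memory access costs some number of cycles and each wavefront has a certain amount of independent math per fetch, you need roughly latency/work wavefronts in flight before anyone actually sits idle.

    ```cpp
    #include <cstdio>

    // Minimum wavefronts needed so ALU work from other waves covers the
    // wait for one wave's outstanding memory access (ceiling division).
    int wavesNeeded(int latencyCycles, int aluCyclesPerFetch)
    {
        return (latencyCycles + aluCyclesPerFetch - 1) / aluCyclesPerFetch;
    }

    int main()
    {
        printf("400-cycle miss, 50 ALU cycles/fetch -> %d waves\n", wavesNeeded(400, 50)); // 8
        printf("100-cycle hit, 50 ALU cycles/fetch -> %d waves\n",  wavesNeeded(100, 50)); // 2
        return 0;
    }
    ```

    Lower latency shrinks the number of spare waves you need, which only matters when you don't have enough of them.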
     
  12. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    Yes, it largely comes down to how good the caches are and how much pending work there is that can be done.
    It's hard to know how much effect the lower latency would have. It's my impression that even in the best performing PC games, GPU utilization isn't very high overall (I've heard numbers like 40-50% thrown around), but that's for many different reasons.
    For example, the ALUs are generally massively underutilized when processing vertex wavefronts, and that has nothing to do with latency. It's one of the reasons Sony has talked about using compute to process vertices.
    Obviously the ALUs are also massively underutilized when rendering shadows or other simple geometry; again, reduced latency doesn't help you there either.
    What you'd need to know is how much of the underutilization is due to cache misses, and how much having the resource in low-latency memory reduces that. I doubt most PC GPUs have enough performance counters to even answer the first part of that question.
    You could get a bullshit estimate by replacing every texture in a game with a 1x1 to all but eliminate texture cache misses, but that doesn't answer the question in isolation, as it also greatly reduces the bandwidth being consumed.
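
    In D3D11 terms, that swap would look something like the sketch below (my illustration of the experiment, assuming a PC renderer; makeDummyTexture is a made-up helper name). Same caveat as above: it collapses bandwidth use too, so it can't isolate latency.

    ```cpp
    #include <d3d11.h>

    // Create a single mid-grey 1x1 texture to bind in place of every real
    // texture, so nearly all fetches hit the texture cache.
    ID3D11Texture2D* makeDummyTexture(ID3D11Device* device)
    {
        const UINT32 grey = 0xFF808080;   // one RGBA texel
        D3D11_SUBRESOURCE_DATA init = { &grey, sizeof(grey), 0 };

        D3D11_TEXTURE2D_DESC desc = {};
        desc.Width            = 1;
        desc.Height           = 1;
        desc.MipLevels        = 1;
        desc.ArraySize        = 1;
        desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage            = D3D11_USAGE_IMMUTABLE;
        desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;

        ID3D11Texture2D* tex = nullptr;
        device->CreateTexture2D(&desc, &init, &tex);
        return tex;
    }
    ```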
     
  13. Urian

    Regular

    Joined:
    Aug 23, 2003
    Messages:
    622
    Likes Received:
    55
    I'd prefer the ESRAM on an interposer with the SoC, like the current Xbox 360's eDRAM, using the local memory channel from the GPU.
     
  14. Brad Grenz

    Brad Grenz Philosopher & Poet
    Veteran

    Joined:
    Mar 3, 2005
    Messages:
    2,531
    Likes Received:
    2
    Location:
    Oregon
    That is a bit curious, and Mark Cerny's interview with Gamasutra included something of a hip check at that figure, when he suggested that had Sony gone for embedded memory, they would have used something an order of magnitude faster, in the 1 TB/s range. That would have been more like the PS2, where the VRAM is 15 times as fast as main memory.

    It's also important to note that the ESRAM in Durango isn't a cache, so it's up to the programmer to make sure data is there, and not off-chip in the main DDR3 pool, for any latency advantage to manifest in the first place, and there will be trade-offs in terms of where you're storing your framebuffer, etc. If you keep the framebuffer in ESRAM to avoid saturating the main memory bus, you limit the utility of the ESRAM for compute; if you store the framebuffer in DDR3, you could end up limiting the kinds of framebuffer effects you can manage. I don't know, maybe you could earmark a few megabytes and write a software caching algorithm SPE-style, but would the benefits actually justify that kind of effort, or would anyone even bother?
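
    If anyone did bother, I imagine it would look vaguely like this (a purely hypothetical sketch: esramAlloc/dmaCopy/dmaWait are made-up names standing in for whatever the real SDK exposes, stubbed here with plain malloc so it compiles; rumours mention dedicated move engines for the real thing):

    ```cpp
    #include <cstdint>
    #include <cstdlib>
    #include <cstring>

    // Hypothetical stand-ins for SDK calls; the stubs just simulate the pool.
    static void* esramAlloc(size_t bytes)                  { return std::malloc(bytes); }
    static void  dmaCopy(void* d, const void* s, size_t n) { std::memcpy(d, s, n); }
    static void  dmaWait()                                 { /* no-op in the stub */ }

    // Stage a hot buffer into the (simulated) 32 MB pool before running
    // compute on it. Every byte parked here is a byte the framebuffer loses.
    void* stageForCompute(const uint8_t* ddr3Data, size_t bytes)
    {
        void* hot = esramAlloc(bytes);
        dmaCopy(hot, ddr3Data, bytes);  // would be asynchronous on real hardware
        dmaWait();                      // then dispatch shaders that read from hot
        return hot;
    }

    int main()
    {
        uint8_t source[256] = {};
        void* staged = stageForCompute(source, sizeof(source));
        std::free(staged);
        return 0;
    }
    ```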
     
  15. Cjail

    Cjail Fool
    Veteran

    Joined:
    Feb 1, 2013
    Messages:
    2,027
    Likes Received:
    210

    This is what Cerny said:

    "I think you can appreciate how large our commitment to having a developer friendly architecture is in light of the fact that we could have made hardware with as much as a terabyte of bandwidth to a small internal RAM, and still did not adopt that strategy."

    It's just an example of how PS4 could have been if Sony had privileged BW over memory amount and opted for an "unfriendly" architecture.

    This is what he says about eDRAM:

    "The memory is not on the chip, however. Via a 256-bit bus, it communicates with the shared pool of RAM at 176 GB per second.
    One thing we could have done is drop it down to a 128-bit bus, which would drop the bandwidth to 88 gigabytes per second, and then have eDRAM on chip to bring the performance back up again...
    We did not want to create some kind of puzzle that the development community would have to solve in order to create their games. And so we stayed true to the philosophy of unified memory."
     
    #15 Cjail, May 5, 2013
    Last edited by a moderator: May 5, 2013
  16. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,174
    Location:
    La-la land
    The only reason I can think of is that having a very fast eDRAM pool would require the interface to the actual DRAM banks to be extremely wide, since apparently DRAM itself won't clock very high. I.e., the GDDR5 DRAM core runs at a couple hundred MHz at most, with wide read/write ports on-chip, and then extra logic funnels the data to the narrow interconnect to the memory controller... from what I understand of it.

    And a very wide on-chip interface would mean a tremendous amount of wiring on that chip, which on an already very complicated ASIC could be troublesome to route, perhaps. We're up to as many as 11 metal layers on a high-end microprocessor already; would a 2-4 kbit wide DRAM interface push that even further?
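
    For what it's worth, the rumoured 102.4 GB/s figure drops straight out of a 1024-bit on-chip interface at an assumed 800 MHz (both numbers leak-derived, not confirmed specs):

    ```cpp
    #include <cstdio>

    int main()
    {
        double busBits = 1024.0;                            // assumed interface width
        double clockHz = 800e6;                             // assumed GPU/ESRAM clock
        double gbPerSec = (busBits / 8.0) * clockHz / 1e9;  // 128 bytes per cycle
        printf("%.1f GB/s\n", gbPerSec);                    // prints 102.4 GB/s
        return 0;
    }
    ```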

    ...That's the only reason I can think of off-hand, anyway. I.e., a "weaker" eDRAM implementation would be easier, and thus less costly, to implement/build. Or else MS is toning down the importance of multisample antialiasing (in favor of post-process blurring, for example), and extra bandwidth is thus simply considered unnecessary. But that also implies a cost-saving measure...

    *shrug*

    Hopefully we get some answers at/around the 21st (hopefully without too much PR spin, but maybe that's too much to hope for). Anyway, it's not THAT far away now. *takes a deeeeeeep breath*
     
  17. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,011
    Likes Received:
    537
    IMO it's a Nintendo-like design decision...

    Sony mentioned they could have had a TB/s of eDRAM bandwidth... and really, that's what you need for it to be interesting.
     
  18. (((interference)))

    Veteran

    Joined:
    Sep 10, 2009
    Messages:
    2,499
    Likes Received:
    70
    Rather than anything else, it just seems like a way of having a lot of cheap memory while maintaining some degree of performance.

    So it's a performance aid over just having DDR3, not special sauce.

    Though it'd be interesting to see the latency figures for ESRAM vs. DDR3 and GDDR5.
    I was asking in the Orbis technical thread, and the answer was that the difference between DDR3 and GDDR5 latency was not significant, but I never got actual figures.
     
  19. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,321
    Likes Received:
    1,347
    AMD has a couple of patents on embedding memory into a GPU, and they're not cost-saving related.

    http://appft1.uspto.gov/netacgi/nph...&RS=(AN/"advanced+micro+devices"+AND+embedded)

    http://appft1.uspto.gov/netacgi/nph...&RS=(AN/"advanced+micro+devices"+AND+embedded)

    I haven't fully read the patents, but one is basically a way to facilitate a high-bandwidth, low-latency memory system, plus a way to use embedded memory to allow GPUs in a multi-GPU configuration to create a large unified on-chip memory pool.

    The second seems to describe a more efficient approach for when wavefronts output data needed by subsequent wavefronts of GPGPU-based tasks. I'm guessing in this case it's only applicable when the L2 isn't enough.

    IBM has this patent where the I/O interconnect is used to inject data into caches to avoid the latency of main memory.

    http://www.google.com/patents/US20090157977

    Here is an AMD patent for using a split memory system for AA.

    http://www.google.com/patents/US201...a=X&ei=XtyGUc_nLKnb4AOFsYCACw&ved=0CDoQ6AEwAQ
     
    #19 dobwal, May 5, 2013
    Last edited by a moderator: May 5, 2013
  20. Thowllly

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    551
    Likes Received:
    4
    Location:
    Norway
    http://forum.beyond3d.com/showpost.php?p=1714988&postcount=132
     