Thanks, so GDDR5 and DDR3 are not that different.
What about 6T SRAM and 1T SRAM?
You can't stack dies with the thermal profile of an APU-type chip; it would cook itself with the heat trapped inside. Haswell and the various revisions of the 360 all use dies set side by side in a traditional MCM manner.

If I were MS, I would go for the SiP option (stack a DRAM die on top of your main die, like Intel seems to be doing for Haswell, and what they did, of course, in the original Xbox 360).
Rather than anything else, it just seems like a way of having a lot of cheap memory while maintaining some degree of performance.
So it's a performance aid over just having DDR3, not special sauce.
Though it'd be interesting to see the latency figures for ESRAM vs DDR3 and GDDR5.
I was asking in the Orbis technical thread, and the answer was that the difference between DDR3 and GDDR5 latency was not significant, but I never got actual figures.
So how big would the difference be between say the PS4's GDDR5 setup and the on die ESRAM in Durango?
EDRAM locks you into certain foundries since it requires specialized manufacturing techniques while SRAM is the same as the rest of the logic on the die and can be made anywhere.
The ESRAM can be a coherent cache for both CPU and GPU, and useful in HSA.
If PRT can leverage eSRAM, perhaps there's a much coarser level of coherence, at a page or texture-chunk level?

According to whom? A cache needs a lot of extra die area for tags and a controller, and adds the headache of sitting inline on the memory bus instead of on a separate bus.
Not to minimize the importance of bandwidth at all, but what other benefit does esram provide the console? Wouldn't that budget have been better used on GPU CUs? Sony has faster memory and more compute for the same cost I take it?
It's only useful to have more CU's if you can feed them, and that requires bandwidth and the ability to hide memory latency.
Caches help with the latter, but CU's are idle a lot of the time waiting on memory, or other parts of the render pipeline.
Doubling the CU count does not make a part twice as fast except in artificial tests.
On Xbox 360, the EDRAM helps a lot with backbuffer bandwidth. For example, in our last Xbox 360 game we had a 2 MRT g-buffer (deferred rendering, depth + 2x8888 buffers, same bit depth as in CryEngine 3). The g-buffer writes require 12 bytes of bandwidth per pixel, and all that bandwidth is fully provided by EDRAM.

For each rendered pixel we sample three textures. Textures are block compressed (2xDXT5 + 1xDXN), so they take a total of 3 bytes per sampled texel. Assuming a coherent access pattern and trilinear filtering, we multiply that cost by 1.25 (25% extra memory touched by trilinear), and we get a texture bandwidth requirement of 3.75 bytes per rendered pixel.

Without EDRAM, the external memory bandwidth requirement is 12 + 3.75 bytes = 15.75 bytes per pixel. With EDRAM it is only 3.75 bytes. That is a 76% saving (over 4x the external memory bandwidth cost without EDRAM).

Deferred rendering is a widely used technique in high-end AAA games. It is often criticized as bandwidth inefficient, but developers still love to use it because it has lots of benefits. On Xbox 360, the EDRAM enables efficient usage of deferred rendering.
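The arithmetic above can be sketched in a few lines; this is just a back-of-the-envelope check of the numbers in the post (g-buffer layout, compressed texture sizes, and the 25% trilinear overhead are all taken from it, not measured):

```python
# Per-pixel bandwidth estimate from the post above.
# Assumptions: depth (4 B) + 2x 8888 render targets (4 B each) = 12 B written;
# 3 block-compressed textures (2x DXT5 + 1x DXN) at ~1 byte per texel each;
# trilinear filtering touches ~25% extra memory.

gbuffer_write = 4 + 4 + 4        # bytes written to the g-buffer per pixel
texel_bytes = 3 * 1.0            # 3 compressed textures, ~1 byte/texel each
trilinear = 1.25                 # 25% extra memory touched by trilinear

texture_read = texel_bytes * trilinear       # 3.75 bytes per rendered pixel
without_edram = gbuffer_write + texture_read  # all traffic hits external memory
with_edram = texture_read                     # g-buffer writes stay on chip

saving = 1 - with_edram / without_edram
print(without_edram, with_edram, round(saving * 100))  # 15.75 3.75 76
```

The 15.75 / 3.75 ratio is where the "over 4x" figure comes from.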
Also, a fast read/write on-chip memory scratchpad (or a big cache) would help a lot with image post processing. Most image post-process algorithms need no (or just a little) extra memory in addition to the processed backbuffer. With a large enough on-chip memory (or cache), most post-processing algorithms become completely free of external memory bandwidth. Examples: HDR bloom, lens flares/streaks, bokeh/DOF, motion blur (per-pixel motion vectors), SSAO/SSDO, post-AA filters, color correction, etc.

The screen space local reflection (SSLR) algorithm (in Killzone Shadow Fall) would benefit the most from fast on-chip local memory, since tracing those secondary rays from the min/max quadtree acceleration structure has quite an incoherent memory access pattern. Incoherent accesses are latency sensitive (lots of cache misses), and on-chip memories tend to have smaller latencies (of course it's implementation specific, but that is usually true, since the memory is closer to the execution units; for example, Haswell's 128 MB L4 should be lower latency than the external memory).

I would expect to see a lot more post-process effects in the future as developers target cinematic rendering with their new engines. A fast on-chip memory scratchpad (or a big cache) would reduce the bandwidth requirement a lot.
I mean, it just really seems odd. I saw one post ballparking it: well, 5B transistors, must break down as roughly 2B GPU, 2B ESRAM, 1B CPU/everything else...

I mean, he just dedicated 4B to the GPU; that's a 7970. Why not use an actual 7970 then?

Something just doesn't add up to me. Options:
- Microsoft screwed up big time, making a weak console that's also expensive (certainly this will be a popular opinion).
- The ESRAM is somehow much less weighty than its transistor count indicates, i.e. it can be packed into a much smaller space, so the transistor count is not a true indication of its cost, which is much less.
- The ESRAM is really useful / gives the GPU a major power-efficiency boost? (What this thread was about, with an inconclusive conclusion as far as I can tell, but overall not seeming greatly positive.)
Also, I'm not sure that "locking you into certain foundries", while intriguing info, matters so much. What about the EDRAM in the 360? Seemed to work out OK.

There's much talk about the advantage of being able to fab anywhere, but it always just ends up being TSMC, or at best one or two other big options, doesn't it? Always found that a bit odd, lol.