The ESRAM in Durango as a possible performance aid

How can it be?

AMD is producing L3 caches that clock past 2 GHz; not that big though.
Durango's ESRAM is below a gigahertz.

If they have to downclock the APU to 800/900 GFLOPS, they would be better off scrapping Durango altogether and going with a discrete config. The SoC might come off cool at 22 nm.


What strikes me is all those rumours of Durango running hot (the dev kits, actually), now the APU itself, six-month delays, and not a word or rumour from the Sony camp.

When they share process, foundry and design, with an almost copycat config apart from the ESRAM...

What is going on?

AMD is producing caches at 32 MB and above now?
 
AMD is producing caches at 32 MB and above now?

"not that big though."

Anyway, is it 8 MB, but on 32 nm SOI?
And what is the largest SRAM part made at TSMC? It's 28 nm bulk there, isn't it?

Or more precisely, how much SRAM does a high-end discrete AMD GPU have?
 
I do not want to give much credit to what at this point looks like a lot of FUD, but I guess that, as in other fields involving a fair amount of complexity, black swans are bound to happen.
 
AMD is producing caches at 32 MB and above now?

If Intel, with Iris Pro, has gone with off-die EDRAM instead of on-die ESRAM, and they are already at 22 nm and have the best process engineers in the world, we should wonder why MS wouldn't have problems producing a stable and functional 32 MB ESRAM configuration, not the contrary. What is the maximum amount of SRAM cache Intel has put in its CPUs? 12-16 MB in Xeon server processors? And how much do those cost?

MS should scrap the X1 altogether, go with a discrete CPU + discrete GPU, and go for GDDR5 like Sony. The X1 has more transistors in its chip than a 7970, and its performance could end up being worse than a 7750's!
 
I honestly am still very reserved about the entire thing. I doubt it's anywhere near as big as GAF is making it out to be, but that's my personal take on the matter.
 
I think that choosing ESRAM was more a matter of staying with AMD tech and within a single APU.
We need more sources, or better-quality ones, to give credit to this mess.
 
I think that choosing ESRAM was more a matter of staying with AMD tech and within a single APU.
We need more sources, or better-quality ones, to give credit to this mess.

Well, we still have no official confirmation of the PS3 GPU clock, despite it having been rumoured eons ago to have been downclocked to 500 MHz. So go figure...
 
I didn't know that Sony had never stated the GPU clock of the RSX.

Anyway, if this fiasco turns out to be true, can we change the topic of this thread?

"The ESRAM in Durango as a possible performance hindrance" :???:
 
This is a technical thread in the technical forum. The subject of the ESRAM's impact on GPU performance is independent of the final performance of the XB1, so let's keep the final-performance discussion in the rumour thread.
 
256 GB/s is the internal bandwidth of the Xenos daughter die when performing color/Z/blend/MSAA operations, at least when a title is able to exercise all those at once. The link between the GPU and the daughter die is 32 GB/s, and the eDRAM is usable for a restricted set of roles.

For Durango, the ROPs can amplify their bandwidth by going through their color and Z caches first, and then may output to eSRAM.
No longer on the other side of a dedicated ROP partition, the storage can be used for more general read/write workloads, and in that scenario the eSRAM has three times the bandwidth.

As for why there's not even more bandwidth, it doesn't seem like the rest of the APU is capable of utilizing much more, and there is a complexity and power cost to having even more connections or higher clocks to the eSRAM.
ROPs performing Z writes seem to be the largest single client, and the bus is sized to match it.
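
For a rough sense of scale, here is the back-of-the-envelope arithmetic in Python; the 800 MHz clock and 1024-bit (128-byte) eSRAM interface are assumptions taken from the rumoured figures, not confirmed specs:

```python
# Rough bandwidth comparison: Xenos eDRAM setup vs. the rumoured Durango eSRAM.
# The 800 MHz / 128 bytes-per-cycle figures are assumptions from the rumours.

GB = 1e9  # decimal GB, as the headline numbers use

# Xbox 360 (Xenos)
xenos_internal_bw = 256 * GB   # ROPs <-> eDRAM, inside the daughter die
xenos_link_bw     = 32 * GB    # GPU -> daughter die link

# Durango (rumoured)
esram_clock_hz  = 800e6        # assumed 800 MHz
esram_bus_bytes = 128          # assumed 1024-bit interface = 128 bytes/cycle
esram_bw        = esram_clock_hz * esram_bus_bytes

print(f"Xenos internal eDRAM bandwidth: {xenos_internal_bw / GB:.0f} GB/s (colour/Z/blend/MSAA only)")
print(f"Durango eSRAM bandwidth: {esram_bw / GB:.1f} GB/s")            # ~102.4 GB/s
print(f"vs. the old GPU->eDRAM link: {esram_bw / xenos_link_bw:.1f}x") # ~3.2x
```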
Ah okay, so with the arrival of the new technology the apparent disadvantage isn't there then. I knew the general read/write capability was a big difference favouring the new embedded RAM, but the advantages in other areas are striking as well, efficiency-wise. Eight years is a lot of time in the computer space.

The frame buffer is compressed, and there are caches between the ROPs and the memory.
So you can actually exceed the peak memory bandwidth.
This is true of any modern GPU, though not the 360.
It's also worth noting that the 360 can only use all of its bandwidth with 4xMSAA enabled.
There are some additional advantages to the eSRAM when doing frame buffer operations; the relatively short read/write turnaround time might be a win.
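
To put a number on it (purely illustrative; the 2:1 ratio below is a made-up example, not a measurement), if compression and the ROP caches mean only half of the logically written bytes actually cross the bus, the renderer effectively sees roughly double the raw peak:

```python
# Illustrative only: how compression/caching can make effective bandwidth
# exceed the raw peak. The 2:1 ratio is an assumed example figure.

raw_peak_bw_gbs = 102.4        # rumoured eSRAM raw peak
avg_compression_ratio = 2.0    # assume 2 logical bytes per byte actually transferred

effective_bw_gbs = raw_peak_bw_gbs * avg_compression_ratio
print(f"Effective bandwidth for compressible traffic: ~{effective_bw_gbs:.0f} GB/s")
```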
Thanks a lot ERP, best explanation ever! :) Now I'm going to show off with that when talking to people on the net about the subject.

This post reminds me why I love Beyond3D so much.

What would be the approximate equivalent bandwidth to 102 GB/s in comparison with the technology of the past? I mean, could 102 GB/s be like 200+ GB/s of fully uncompressed data? A 100% gain could be considered really good.

Additionally, when you say "you can actually exceed the peak memory bandwidth", I understand you to be saying that you can exceed that theoretical peak within the constraints of the 102 GB/s of the eSRAM, in the sense that the same data uncompressed would correspond to a much larger framebuffer, not that you can actually move more than 102 GB/s of raw data, right? Excuse me if I am wrong.
 
If Intel, with Iris Pro, has gone with off-die EDRAM instead of on-die ESRAM, and they are already at 22 nm and have the best process engineers in the world, we should wonder why MS wouldn't have problems producing a stable and functional 32 MB ESRAM configuration, not the contrary. What is the maximum amount of SRAM cache Intel has put in its CPUs? 12-16 MB in Xeon server processors? And how much do those cost?

MS should scrap the X1 altogether, go with a discrete CPU + discrete GPU, and go for GDDR5 like Sony. The X1 has more transistors in its chip than a 7970, and its performance could end up being worse than a 7750's!
I hope not.

I kinda loved the architectural design of the machine since the very beginning and accepted the specs as they were.

Adding GDDR5 at this point seems impossible.

But you never know, after what happened with the eSRAM and the last-minute changes, everything seems unpredictable. :???:

I wonder though: after ALL this time, only now have they found out that the eSRAM is causing problems?? Seriously??!!! :oops:

After the disassemblies and teardowns of the machine, we've seen its insides, and now, when it should be in production? :oops:

Darn... wouldn't it be better to just remove a couple of CUs? What were Microsoft's engineers thinking?? :???:
 
Or more precisely, how much SRAM does a high-end discrete AMD GPU have?
About 12 MB, probably a bit more, as there are a lot of smaller buffers unaccounted for in that number.

The only thing I could imagine (save for a major planning mistake) would be some kind of timing problem in a large array (as opposed to the high number of smaller arrays used in GPUs), roughly along the lines of what 3dilettante (I think) wrote in another thread. SRAM should neither consume a lot of power (compared to the additional 6 CUs and 16 ROPs Orbis has over Durango), nor should there be serious yield issues (in the sense of nonfunctional dies) if they planned for enough redundancy. SRAM is regularly one of the first functional things produced on a new process (Intel is well known for this; they bragged about their 364 Mbit SRAM chips with >2.9 billion transistors on 22 nm in 2009, roughly two years before any CPU using that process came close to market). And even fundamental timing issues should have been caught early on and fixed in a respin. So I'm a bit puzzled about the alleged reasons for the alleged massive downclock so late in the game.
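
For scale, the cell transistor count alone for 32 MB of standard 6T SRAM (ignoring redundancy, ECC and peripheral logic, which add more on top) works out to roughly 1.6 billion:

```python
# Cell-transistor count for 32 MB of 6T SRAM (cells only; sense amps, decoders,
# redundancy etc. are not included, so the real total is somewhat higher).

bits = 32 * 8 * 1024 * 1024        # 32 MB expressed in bits
transistors_per_cell = 6           # standard 6T SRAM cell
cell_transistors = bits * transistors_per_cell

print(f"{cell_transistors / 1e9:.2f} billion transistors")   # ~1.61 billion
```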
 
Could ESRAM be a hindrance to developers like the EDRAM in the 360 was?

Will developers have to use another form of predictive tiling to maximise space?

Will using this kind of thing cause overlapping tiles?
 
How exactly was EDRAM a hindrance to developers if they didn't want to utilize it?

Causing developers to have to run funny native resolutions because of lack of space...

And that's the whole point: most developers didn't use it, making it useless... So why use a similar thing again?
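
For reference, a quick back-of-the-envelope check of why the eDRAM size pushed some titles to odd resolutions; the 4 bytes colour + 4 bytes depth/stencil per sample and the example resolutions are assumptions for illustration, not figures from any particular game:

```python
# Fitting colour + depth/stencil (assumed 4 + 4 bytes per sample) into the
# Xbox 360's 10 MB of eDRAM without resorting to tiling.

EDRAM_BYTES = 10 * 1024 * 1024     # treating "10 MB" as 10 MiB

def rt_bytes(width, height, msaa_samples):
    return width * height * msaa_samples * (4 + 4)

for name, w, h, msaa in [("1280x720, 2xMSAA", 1280, 720, 2),
                         ("1024x600, 2xMSAA", 1024, 600, 2),
                         ("1280x720, no AA",  1280, 720, 1)]:
    size = rt_bytes(w, h, msaa)
    verdict = "fits" if size <= EDRAM_BYTES else "needs tiling (or a lower resolution)"
    print(f"{name}: {size / 2**20:.1f} MiB -> {verdict}")
```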
 
And that's the whole point: most developers didn't use it, making it useless
That is impossible, as everything that was written out through the ROPs ended up in the eDRAM. AFAIK, one couldn't render directly to memory (Durango can, the Xbox 360 can't).
 
That is impossible, as everything that was written out through the ROPs ended up in the eDRAM. AFAIK, one couldn't render directly to memory
Yes, the Xbox 360 ROPs are inside the EDRAM die and write directly to it. If your game needs ROP functionality (depth buffering, stencil, triangle inside test, blending), you need to use EDRAM on Xbox 360. I haven't heard of a single current-gen game that doesn't render polygons (even Minecraft does)...

You can also write directly to memory with MEMEXPORT. But that's similar to compute shader writes: it is a direct raw memory write, with no ROP functionality. So it's not good for traditional triangle rendering algorithms, but very handy for many other purposes (and thus one of my favourite features of the Xbox 360 platform).
Causing developers to have to run funny native resolutions because of lack of space...
And that's the whole point: most developers didn't use it, making it useless... So why use a similar thing again?
10 MB of EDRAM is more than enough for 720p forward rendering. However, three years after the Xbox 360 was released, deferred rendering was becoming really popular, and most AAA game engines now use deferred rendering. Nobody could have guessed this development in 2005. Researchers invent new algorithms every year that stress GPUs in ways they were not stressed before. Deferred rendering was one of these things: it allowed developers to use a much higher number of local light sources and a much lower number of shader permutations (fewer GPU state changes = better GPU efficiency = better performance). A fully optimized g-buffer layout (CryEngine 3) uses 12 bytes per pixel; at 720p that is roughly 10.5 MB. Had Microsoft seen 7 years into the future, they would have chosen to include 12 MB of EDRAM in the Xbox 360 instead, and we wouldn't be having this discussion :)

EDRAM was one of the key things that allowed the Xbox 360 GPU to outperform the PS3 GPU. PS3 developers needed to use Cell CPU resources to compensate for the GPU performance/bandwidth difference. This required lots of extra development time, and basically removed any CPU performance advantage (better physics / AI / etc.) the PS3 could have had in the long run. The Xbox 360 GPU architecture plus EDRAM were among the best engineering decisions made for last-generation hardware.
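
To spell out the arithmetic behind the 12 bytes/pixel figure above (the exact number depends on whether you count binary or decimal megabytes, roughly 10.5 MiB or 11.1 MB, but either way it lands above 10 MB and below 12 MB):

```python
# Back-of-the-envelope: does a 12 bytes/pixel g-buffer fit in 10 MB of eDRAM at 720p?

width, height = 1280, 720
bytes_per_pixel = 12               # the g-buffer footprint quoted above

gbuffer_bytes = width * height * bytes_per_pixel
print(f"{gbuffer_bytes / 2**20:.1f} MiB ({gbuffer_bytes / 1e6:.1f} MB)")
print("Fits in 10 MB of eDRAM:", gbuffer_bytes <= 10 * 2**20)   # False
print("Fits in 12 MB of eDRAM:", gbuffer_bytes <= 12 * 2**20)   # True
```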
 