Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.

But IBM is using eDRAM. Isn't that a whole different story than using eSRAM? I think it is, because IBM's POWER7 chip with 80MB of eDRAM has only slightly more transistors* than the whole eSRAM block in the XBOne.

* IBM POWER7: 2.1B transistors. The rumoured XBOne eSRAM alone takes 1.6B to 2B transistors.
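For what it's worth, that rumoured range falls straight out of the bit-cell count. A quick back-of-the-envelope check (cell transistors only; peripheral circuitry such as sense amps, decoders, and redundancy logic would add more on top):

```python
# Bit-cell transistors for 32MB of embedded memory (cells only;
# sense amps, decoders, and redundancy are ignored)
BITS = 32 * 1024 * 1024 * 8  # 32 MB expressed in bits

for cell, t_per_bit in [("1T (eDRAM-style)", 1), ("6T SRAM", 6), ("8T SRAM", 8)]:
    print(f"{cell}: {BITS * t_per_bit / 1e9:.2f}B transistors")
# 1T comes to ~0.27B; 6T and 8T bracket the rumoured 1.6B-2B range
```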

I do not know if AMD is using "true" SRAM or 1T-SRAM, which is really eDRAM. There are many types of DRAM that have additional support circuitry so that they "look" much like SRAM.


I doubt anyone posting here knows. But the 1T-SRAM that has been publicly announced for the likely processes/fabs sounds like a good bet. In other words, eDRAM in truth.



It has been covered in other posts, but:

Embedding memory on the ASIC or processor allows for much wider buses and higher operation speeds, and due to much higher density of DRAM in comparison to SRAM, larger amounts of memory can be installed on smaller chips if eDRAM is used instead of eSRAM. eDRAM requires additional fab process steps compared with embedded SRAM, which raises cost, but the 3X area savings of eDRAM memory offsets the process cost when a significant amount of memory is used in the design.

eDRAM memories, like all DRAM memories, require periodic refreshing of the memory cells, which adds complexity. However, if the memory refresh controller is embedded along with the eDRAM memory, the remainder of the ASIC can treat the memory like a simple SRAM type, such as in 1T-SRAM.


eDRAM is used in IBM's POWER7 processor[1] and in many game consoles and other devices, including Sony's PlayStation 2 and PlayStation Portable, Nintendo's GameCube, Wii, and Wii U, Apple's iPhone, Microsoft's Zune HD, and Microsoft's Xbox 360.



1T-SRAM is a pseudo-static random-access memory (PSRAM) technology introduced by MoSys, Inc., which offers a high-density alternative to traditional static random access memory (SRAM) in embedded memory applications. MoSys uses a single-transistor storage cell (bit cell) like dynamic random access memory (DRAM), but surrounds the bit cell with control circuitry that makes the memory functionally equivalent to SRAM (the controller hides all DRAM-specific operations such as precharging and refresh). 1T-SRAM (and PSRAM in general) has a standard single-cycle SRAM interface and appears to the surrounding logic just as an SRAM would.

Due to its one-transistor bit cell, 1T-SRAM is smaller than conventional (six-transistor, or “6T”) SRAM, and closer in size and density to embedded DRAM (eDRAM). At the same time, 1T-SRAM has performance comparable to SRAM at multi-megabit densities, uses less power than eDRAM and is manufactured in a standard CMOS logic process like conventional SRAM.

MoSys markets 1T-SRAM as physical IP for embedded (on-die) use in system-on-a-chip (SoC) applications. It is available on a variety of foundry processes, including Chartered, SMIC, TSMC, and UMC. Some engineers use the terms 1T-SRAM and "embedded DRAM" interchangeably, as some foundries provide MoSys's 1T-SRAM as "eDRAM". However, other foundries provide 1T-SRAM as a distinct offering.



1T-SRAM is built as an array of small banks (typically 128 rows × 256 bits/row, 32 kilobits in total) coupled to a bank-sized SRAM cache and an intelligent controller. Although space-inefficient compared to regular DRAM, the short word lines allow much higher speeds, so the array can do a full sense and precharge (RAS cycle) per access, providing high-speed random access. Each access is to one bank, allowing unused banks to be refreshed at the same time. Additionally, each row read out of the active bank is copied to the bank-sized SRAM cache. In the event of repeated accesses to one bank, which would not allow time for refresh cycles, there are two options: either the accesses are all to different rows, in which case all rows will be refreshed automatically, or some rows are accessed repeatedly. In the latter case, the cache provides the data and allows time for an unused row of the active bank to be refreshed.
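As a rough sketch of how that refresh-hiding scheme works, here is a toy model. The bank count, geometry, and class names are made up for illustration; this is not MoSys's actual controller design:

```python
class OneTSramModel:
    """Toy model of 1T-SRAM's hidden refresh; geometry is illustrative."""

    def __init__(self, banks=4, rows=128):
        self.banks, self.rows = banks, rows
        self.cached_row = {}            # bank -> row currently in its SRAM cache
        self.refresh_ptr = [0] * banks  # next row each bank will refresh

    def access(self, bank, row):
        # While one bank is active, every idle bank refreshes a row for free.
        for b in range(self.banks):
            if b != bank:
                self._refresh(b)
        if self.cached_row.get(bank) == row:
            # Repeated access is served from the cache, so the active bank's
            # array is free this cycle and can refresh one of its own rows.
            self._refresh(bank)
            return "cache"
        # Full sense + precharge; reading the row also refreshes it,
        # and a copy of the row lands in the bank-sized cache.
        self.cached_row[bank] = row
        return "array"

    def _refresh(self, bank):
        self.refresh_ptr[bank] = (self.refresh_ptr[bank] + 1) % self.rows


m = OneTSramModel()
print(m.access(0, 5))   # "array" - first touch goes to the bit-cell array
print(m.access(0, 5))   # "cache" - repeat hit is hidden, bank 0 refreshes itself
```

The point of the sketch is that every access cycle is a refresh opportunity for every bank except the one actively sensing, which is how the DRAM refresh disappears behind the SRAM-like interface.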



There have been four generations of 1T-SRAM:

Original 1T-SRAM
About half the size of 6T-SRAM, less than half the power.


1T-SRAM-M
Variant with lower standby power consumption, for applications such as cell phones.


1T-SRAM-R
Incorporates ECC for lower soft error rates. To avoid an area penalty, it uses smaller bit cells, which have an inherently higher error rate, but the ECC more than makes up for that.

1T-SRAM-Q
This "quad-density" version uses a slightly non-standard fabrication process to produce a smaller folded capacitor, allowing the memory size to be halved again relative to 1T-SRAM-R. This adds slightly to wafer production costs, but does not interfere with logic transistor fabrication the way conventional DRAM capacitor construction does.



I think the truth is that the eSRAM people are talking about is the exact same 1T-SRAM (eDRAM) that IBM is talking about. I expect it is the same cells in the same foundries in the same process. I might be wrong, but I don't think so.
 
Add to that the point that server chips don't need to yield that well for the prices they sell at (for the systems they're sold in*)...

That, and factor in the different manufacturing processes. IBM's processes are heavily tweaked for performance while TSMC's bulk process is much more general purpose. Things could go really wrong with the ESRAM; I'm very surprised that there's no mention at all of redundancy mechanisms!
 
Yes. IBM switched to embedded DRAM instead of SRAM for their large caches. Likewise, the Xbox 360's daughter die was embedded DRAM. The Xbox One is using SRAM, which is composed of 6-8 times as many transistors per bit as embedded DRAM, and there has never been a commercial product with such a large contiguous block of SRAM.

If you know for a fact that it is 6T or 8T please provide a real reference as I would like to know.

It might just be 1T-SRAM which is actually eDRAM at the core just as used in Power 7, Power 7+ and Power 8.
 
I'm pretty sure it's 6T; the transistor count is too large for it to be just 1T. They could have gone with a far greater amount if it was just 1T - even the Wuu has 32MB of eDRAM.
 
Add to that the point that server chips don't need to yield that well for the prices they sell at (for the systems they're sold in*)...

That, and factor in the different manufacturing processes. IBM's processes are heavily tweaked for performance while TSMC's bulk process is much more general purpose. Things could go really wrong with the ESRAM; I'm very surprised that there's no mention at all of redundancy mechanisms!

Are you talking about no mention by IBM of redundancy mechanisms or Xbox One?

Because IBM does: Please see slide 17:

https://www-950.ibm.com/events/wwe/grp/grp017.nsf/vLookupPDFs/PatO'RourkeDeep%20Dive/$file/PatO'RourkeDeep%20Dive.pdf

And memory does all the time. The memory industry lives and breathes that stuff.



So IBM's SiGe processes are highly tweaked for performance. But as for the Common Platform alliance processes, I am pretty sure IBM's processes are not tweaked at all, as the whole idea is portability on a common platform, not a tweaked one-of-a-kind platform:

http://www.commonplatform.com/
 
I'm pretty sure it's 6T; the transistor count is too large for it to be just 1T. They could have gone with a far greater amount if it was just 1T - even the Wuu has 32MB of eDRAM.

There may not be a need for a far greater amount though. It's a sweet spot thing.

Hell, Intel said 32MB was fine with Haswell; they just doubled it twice to be super safe and future proof. In a console every penny counts. ED/SRAM is already something you need to use very sparingly imo for it to make sense.

10MB worked OK for the 360. 1080p is 2.25X the pixels and 32MB is 3.2X the ESRAM, so you have more relative room in the Xbone than you did in the 360.

I still assume it's 6T though because of the 5 billion transistor callout.
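A quick sanity check on that per-pixel arithmetic (resolution scaling only; it ignores fatter render targets, G-buffers, etc.):

```python
# Pixels vs. embedded memory, 360 -> Xbone (rough; resolution only)
pixel_scale = (1920 * 1080) / (1280 * 720)  # 1080p vs 720p
esram_scale = 32 / 10                       # 32MB ESRAM vs 10MB EDRAM

print(f"pixels: {pixel_scale:.2f}x, embedded memory: {esram_scale:.2f}x")
print(f"room per pixel: {esram_scale / pixel_scale:.2f}x")  # ~1.42x
```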
 
I'm pretty sure it's 6T; the transistor count is too large for it to be just 1T. They could have gone with a far greater amount if it was just 1T - even the Wuu has 32MB of eDRAM.

But where did you get the transistor count?

I read that number too, but it sounded like someone heard 32MB and then assumed 6T and multiplied out. In fact I remember reading such a post, but not sure if that was the origin.

But I don't have a real reference. If you do have a *real* reference please post as I am interested in finding out.



I think this set of rumors might turn out to be nonsense started by someone assuming 6T and multiplying it out, with all the implications then coming out of the woodwork. (Combined with MS bashing season.)

Heck, the Power 7+ slide from IBM says just 2.1B transistors for the whole chip, which includes 10MB of eDRAM per core on an eight-core part (80MB total).

We are talking about 32MB, not 80MB, and tiny cores (per the rumors), not Power 7+, so I claim the rumors are bullstuff as it just doesn't all add up.
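Running the quoted Power 7+ numbers supports that reading: at one transistor per bit, the 80MB of eDRAM accounts for only a fraction of the 2.1B total.

```python
# POWER7+ sanity check: 80MB of 1T eDRAM vs the 2.1B chip total
CHIP_TRANSISTORS = 2.1e9            # IBM's figure for the whole chip
edram_bits = 80 * 1024 * 1024 * 8   # 80 MB in bits
edram_cells = edram_bits * 1        # one transistor per bit cell

print(f"eDRAM cells: {edram_cells / 1e9:.2f}B "
      f"({edram_cells / CHIP_TRANSISTORS:.0%} of the chip)")
```

That is roughly 0.67B cell transistors, about a third of the die's budget, leaving the rest for eight big cores and uncore.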
 
But where did you get the transistor count?

I read that number too, but it sounded like someone heard 32MB and then assumed 6T and multiplied out. In fact I remember reading such a post, but not sure if that was the origin.

But I don't have a real reference. If you do have a *real* reference please post as I am interested in finding out.



I think this set of rumors might turn out to be nonsense started by someone assuming 6T and multiplying it out, with all the implications then coming out of the woodwork. (Combined with MS bashing season.)

Heck, the Power 7+ slide from IBM says just 2.1B transistors for the whole chip, which includes 10MB of eDRAM per core on an eight-core part (80MB total).

We are talking about 32MB, not 80MB, and tiny cores (per the rumors), not Power 7+, so I claim the rumors are bullstuff as it just doesn't all add up.

It's hard to get to 5B transistors in the Xbone (if what's assumed is true) without a whole lot of SRAM transistors.

HD 7790 = 14 CUs = only 2.1B transistors. Small CPU cores.

Assume 1.6B+ for the ESRAM and now you only need to justify 3.4B more, which is much easier.
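Making that budget explicit (every input here is a rumoured or rough figure from the thread, not a confirmed spec):

```python
# Rumoured Xbone transistor budget, spelled out (all figures rough)
total_b = 5.0                                # MS's stated ~5B figure
esram_b = 32 * 1024 * 1024 * 8 * 6 / 1e9     # 32MB at 6T/bit, ~1.61B
gpu_b = 2.1                                  # HD 7790-class 14-CU GPU

remainder = total_b - esram_b - gpu_b
print(f"left for CPU cores, uncore, I/O: {remainder:.2f}B")  # ~1.29B
```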
 
I'm pretty sure it's 6T; the transistor count is too large for it to be just 1T. They could have gone with a far greater amount if it was just 1T - even the Wuu has 32MB of eDRAM.

So if the Wii U has 32MB why is it crazy to consider that MS does too?

I have not heard of horrific Wii U yield, TDP, or downclock nightmares. Not that they aimed for a clock or TDP that I would approve of.



I think this is all crowd-sourced MS and Xbox One bashing/smear.
 
It's hard to get to 5B transistors in the Xbone (if what's assumed is true) without a whole lot of SRAM transistors.

HD 7790 = 14 CUs = only 2.1B transistors. Small CPU cores.

But it is assumptions on top of assumptions.

I don't see any real references for the majority of the rumors.

I think I will sign out for a few days and hope that E3 and more real data comes really really soon.



I don't buy that 5B transistors were just squandered. The team that did the Xbox 360 Xenos+Xenon APU shrink is better than that.
 
So if the Wii U has 32MB why is it crazy to consider that MS does too?

I have not heard of horrific Wii U yield, TDP, or downclock nightmares. Not that they aimed for a clock or TDP that I would approve of.



I think this is all crowd-sourced MS and Xbox One bashing/smear.

Can you show me something that is actually eSRAM at the scale of 32MB?

The Wii U's 32MB cache is eDRAM, IBM's massive 80MB is eDRAM, and Haswell's 128MB is eDRAM.

Do you not see a pattern? There's a reason they are all eDRAM.


But it is assumptions on top of assumptions.

I don't see any real references for the majority of the rumors.

I think I will sign out for a few days and hope that E3 and more real data comes really really soon.



I don't buy that 5B transistors were just squandered. The team that did the Xbox 360 Xenos+Xenon APU shrink is better than that.

You're making a large assumption that the IBM guys are actually working on the XBONE. Isn't it more likely they are being used for Oban, the 360 shrink?
 
There may not be a need for a far greater amount though. It's a sweet spot thing.

Hell, Intel said 32MB was fine with Haswell; they just doubled it twice to be super safe and future proof. In a console every penny counts. ED/SRAM is already something you need to use very sparingly imo for it to make sense.

10MB worked OK for the 360. 1080p is 2.25X the pixels and 32MB is 3.2X the ESRAM, so you have more relative room in the Xbone than you did in the 360.

I still assume it's 6T though because of the 5 billion transistor callout.

OT-ish, but I think that Intel went for 128MB so they don't waste the effort spent optimizing their drivers with their next generation of products. That is on the graphics side; I would think the same applies for the CPU - they want to provide a stable platform for the devs in the broad sense.

I wonder if with Broadwell Intel will try to compete with discrete parts in the mid-range segment; with the performance ceiling set by the consoles it looks like an achievable target to me.

Wrt Crystalwell, there is a difference between Durango's scratchpad and Intel's approach. Intel opted for a cache; they have a pretty high cache hit rate at 32MB but pushed further. The considerations for Durango were different; imho the reasons why Intel could have gone with only 32MB and why MSFT opted for it are different, even putting production costs aside for a second.

I wish AMD had more engineers; Crystalwell would have been great for Durango. If Intel's figures are not too far off, a 128-bit set-up + Crystalwell acts like a 100+GB/s GDDR5 set-up.
MSFT would not have needed more than 32MB, could have passed on a 256-bit bus, and the main chip would have been significantly smaller (max 250mm^2) at the cost of a second chip - but even @ 40nm, 32MB of eDRAM should not be that expensive.
Either way... they should have gone with Intel (I know Intel is unwilling to let things out for cheap...).
 
But it is assumptions on top of assumptions.

I don't see any real references for the majority of the rumors.



The 5 billion transistor figure is directly from Microsoft, stated during the reveal event and later confirmed in press materials and official round tables.

The only way that figure makes sense is if the ESRAM that has always been referenced in the leaked developer documentation (verified as genuine) is indeed a 6-transistor version.
 
OT-ish, but I think that Intel went for 128MB so they don't waste the effort spent optimizing their drivers with their next generation of products. That is on the graphics side; I would think the same applies for the CPU - they want to provide a stable platform for the devs in the broad sense.

I wonder if with Broadwell Intel will try to compete with discrete parts in the mid-range segment; with the performance ceiling set by the consoles it looks like an achievable target to me.

Wrt Crystalwell, there is a difference between Durango's scratchpad and Intel's approach. Intel opted for a cache; they have a pretty high cache hit rate at 32MB but pushed further. The considerations for Durango were different; imho the reasons why Intel could have gone with only 32MB and why MSFT opted for it are different, even putting production costs aside for a second.

I wish AMD had more engineers; Crystalwell would have been great for Durango. If Intel's figures are not too far off, a 128-bit set-up + Crystalwell acts like a 100+GB/s GDDR5 set-up.
MSFT would not have needed more than 32MB, could have passed on a 256-bit bus, and the main chip would have been significantly smaller (max 250mm^2) at the cost of a second chip - but even @ 40nm, 32MB of eDRAM should not be that expensive.
Either way... they should have gone with Intel (I know Intel is unwilling to let things out for cheap...).

Well, I bet it all being one chip is a big advantage.

They just need to get over these teething pains; over time the yields will inevitably become a walk in the park and the thing should get very efficient cost-wise from then on. One APU on a DDR3-based system should become very cheap, and the APU/ESRAM will scale great to lower nodes too.
 
So if the Wii U has 32MB why is it crazy to consider that MS does too?
eDRAM versus ESRAM, although in this thread we aren't saying 32MB is crazy yet. There's no place for high emotions in a technical investigation. ;)

I have not heard of horrific Wii U yield, TDP and down clock nightmares. Not that they aimed for a clock or TDP that I would approve of.
Not to go OT, but that reasoning is flawed. We never knew what the target clocks for the Wii U were prior to release, and for all we know it did have a massive downclock because of the eDRAM. There's not enough information to understand the background and make fair comparisons, plus the Wii U and XB1 are using different techs AFAWK.

I think this is all crowd-sourced MS and Xbox One bashing/smear.
Whatever anyone thinks, this thread addresses the issue through technical analysis without regard for the consequences.

But it is assumptions on top of assumptions.
The discussion of whether it's eDRAM, SRAM, or 1T-SRAM under another name has been had on this board. All the discussion pointed to it being SRAM, and all the pieces fit. It's not assumption on assumption but bits of info here and there that point to it being eSRAM. A 32MB SRAM cache is something new, and perhaps it did come with issues no-one expected? Had MS used eDRAM, we'd be looking at a very different proposition, with established tech suggesting it's highly unlikely there'd be issues, but MS went with a novel approach.

I don't buy that 5B transistors were just squandered. The team that did the Xbox shrink Xenos+Xenon APU is better than that.
It wasn't squandered. It was chosen for a purpose: to aid the GPU in a way eDRAM could not. The issue here is whether that choice came with unexpected, unpredictable engineering challenges. As massive SRAM is an untested field, we don't have a point of reference and can only speculate based on whatever any of us knows about SRAM.
 
Well, I bet it all being one chip is a big advantage.

They just need to get over these teething pains; over time the yields will inevitably become a walk in the park and the thing should get very efficient cost-wise from then on. One APU on a DDR3-based system should become very cheap, and the APU/ESRAM will scale great to lower nodes too.
It should, especially as, starting from ~400mm^2, fitting the 256-bit bus should not be an issue anytime soon. Though even with an extra chip (eDRAM), I'm not sure how a system using a far smaller chip and a 128-bit bus would compare as far as costs are concerned. I wonder what the smart eDRAM in the 360 cost.

Let's hope all this noise is just FUD because, as customers, we need proper competition, especially when the temptation is strong to go for pretty strong DRM policies. Whatever MSFT's or Sony's position on those policies, they can move depending on customers' reactions; if one comes out on top by quite a significant margin (and pretty early in the generation), well, business is business...

EDIT
Anyway, it's probably pointless; only Intel has so far managed to pull off that L4 cache trick.
 

10MB worked OK for the 360. 1080p is 2.25X the pixels and 32MB is 3.2X the ESRAM, so you have more relative room in the Xbone than you did in the 360.

I still assume it's 6T though because of the 5 billion transistor callout.

10MB wasn't really enough: you couldn't use the 'free' MSAA on a 720p image without tiling, or have multiple 720p buffers for HDR (which was why H3 was 640p with no AA).
I think you would need 64MB eDRAM to satisfy devs next gen, especially given engines moving to deferred rendering and multiple framebuffers.
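To put rough numbers on that, here is a quick sketch assuming a 32-bit colour target plus a 32-bit depth buffer per sample (real formats and any compression will vary):

```python
def target_mb(width, height, samples, bytes_per_sample=8):
    """Colour (4B) + depth (4B) per sample, in MB."""
    return width * height * samples * bytes_per_sample / (1024 ** 2)

for label, samples in [("no AA", 1), ("2xMSAA", 2), ("4xMSAA", 4)]:
    mb = target_mb(1280, 720, samples)
    verdict = "fits" if mb <= 10 else "needs tiling"
    print(f"720p {label}: {mb:.1f} MB -> {verdict} in 10 MB")
```

Under these assumptions a plain 720p target just squeezes into 10MB, but any MSAA blows past it, which is the tiling problem the post describes.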
 
It wasn't squandered. It was chosen for a purpose: to aid the GPU in a way eDRAM could not.

Maybe not even that, but from what I have read here, ESRAM can be combined in one APU while EDRAM has to be separate, and ESRAM can be fabbed anywhere versus the limited fab options for EDRAM. Those seem to be favorable cost points for ESRAM that may negate, or at least mitigate, the unfavorable size issue; combined with being more flexible technically, it could be the better overall choice by MS's reasoning.
 
10MB wasn't really enough: you couldn't use the 'free' MSAA on a 720p image without tiling, or have multiple 720p buffers for HDR (which was why H3 was 640p with no AA).
I think you would need 64MB eDRAM to satisfy devs next gen, especially given engines moving to deferred rendering and multiple framebuffers.
Well, it is unclear how much devs would want; using big render targets might have its advantages, as Guerrilla are not using 800MB of memory just for the sake of using memory. It is unclear how Durango would cope with that type of set-up / whether it could keep the most relevant buffers/RTs in the scratchpad (though the bandwidth to main RAM is not anemic either).
 
10MB wasn't really enough: you couldn't use the 'free' MSAA on a 720p image without tiling, or have multiple 720p buffers for HDR (which was why H3 was 640p with no AA).
I think you would need 64MB eDRAM to satisfy devs next gen, especially given engines moving to deferred rendering and multiple framebuffers.

Yeah, it was a tight squeeze, but remember 32MB is more per pixel. IIRC just a little more (per pixel) would have helped the 360 a lot (it was discussed that 10MB rather than a little more seemed an odd choice for this reason; I think there was speculation the 360 was originally designed for SD?). The Xbone has that little more.

Whatever the case, I can say 10MB worked very well; we saw seemingly fewer bandwidth constraints than on the PS3 (more full-res particles), just as many or more full 720p games, etc.

The MSAA thing is kind of pointless anyway; afaics everybody agrees MSAA is basically dead, and it's all post-processing AA going forward. Maybe in an ideal world you'd want that option, but it's nothing to lose sleep over giving up.
 