Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

It is also limited to just 4GB with its bigass 4096-bit wide memory controller. MS would need to use HBM2 to reach 8GB.

So, not likely for an Xbox One.5 in the short term, but probably realistic for Xbox Two in the longer term. Maybe it'll be ESRAM + (G)DDR again.
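To put rough numbers on that (per-stack figures are from the public HBM1/HBM2 specs; the two-stack HBM2 layout below is just a hypothetical example, not anything MS has confirmed):

```python
# Back-of-envelope HBM capacity/width/bandwidth, using the public per-stack
# figures (HBM1: 1 GB and ~1 Gbps/pin per 1024-bit stack; HBM2: more GB per
# stack and ~2 Gbps/pin). The HBM2 layout below is purely hypothetical.

def hbm_config(stacks, gb_per_stack, gbps_per_pin, bits_per_stack=1024):
    """Return (capacity in GB, total bus width in bits, bandwidth in GB/s)."""
    capacity = stacks * gb_per_stack
    width = stacks * bits_per_stack
    bandwidth = width * gbps_per_pin / 8  # Gbit/s across the bus -> GB/s
    return capacity, width, bandwidth

print(hbm_config(4, 1, 1.0))  # HBM1 a la Fury X: (4 GB, 4096-bit, 512 GB/s)
print(hbm_config(2, 4, 2.0))  # hypothetical HBM2 setup: (8 GB, 2048-bit, 512 GB/s)
```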
 
If it was a big deal, would leaving it on there for b/c be a huge waste of space, even considering the node it would be on?
When not in b/c mode it could be used as a general scratchpad/cache, and then once memory tech can replace it, replace it.
Maybe the overall speed improvement of the chip would be able to mask the latency difference anyway.

If it's smaller, it's still taking up space that could be used for CUs, but I don't know whether there's some amount of die area small enough that it effectively comes "for free".
 
HBM2 also has (much like HBM) very low frequencies (1GHz), which doesn't make it any easier to get to the latencies of SRAM.
1GHz is faster than the GPU clock of the Xbone... ;) Anyhow! From what I understand, all current DRAM tech has basically the same base clock frequency, at around 100-150 MHz (because the way DRAM works on a fundamental level does not scale up in performance well with silicon process shrinks). To reach high performance, the internal DRAM layout is arranged into a very wide array that transfers a lot of data at a low clock speed, which is then transferred to the host through a narrower bus at a higher clock speed (very high in the case of GDDR5...).

So because of the internal functionality of DRAM, different memory technologies have similar latencies: DDR3, DDR4, GDDR5 (and presumably HBM1/2) all show roughly the same latency on access. It's been a long time since DRAM hooked straight up to the memory controller; there's an interface in between these days, and the clock speed of that interface won't radically affect latency. ...Or so I've read anyway. :)

A gearhead could probably give a more detailed and accurate explanation!
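For what it's worth, here's the peak-bandwidth arithmetic that falls out of that (bus widths and data rates are the public console figures, so treat it as a rough sketch):

```python
# Peak-bandwidth arithmetic for the "wide/slow core, narrow/fast interface"
# point above. Bus widths and data rates are the public figures for the two
# consoles; this is only a rough sketch.

def peak_bandwidth_gbps(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s = (bus width in bytes) x (transfers per second)."""
    return (bus_width_bits / 8) * data_rate_mtps / 1000

# Xbox One main memory: 256-bit DDR3 at 2133 MT/s
print(peak_bandwidth_gbps(256, 2133))   # ~68 GB/s

# PS4 main memory: 256-bit GDDR5 at 5500 MT/s
print(peak_bandwidth_gbps(256, 5500))   # ~176 GB/s

# Both use an 8n prefetch internally, so the DRAM core array only cycles at a
# fraction of the interface data rate, which is why absolute latency (in
# nanoseconds) barely improves from one memory generation to the next.
```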
 
On the general subject of esram, esram only takes die area away from CUs if that die area would have been used for CUs had the esram not been there.

For example, if X1 had not had esram, and had used the die area for CUs as forumites commonly suggest, the machine would have been effectively useless unless MS had gone for 256-bit GDDR5 at significantly higher cost (not just increased cost for the memory chips, but also for power, board etc.).

You can't just throw a ton more CUs onto a chip while simultaneously slashing the BW down to unusable levels.
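To put a rough number on the die cost (assuming plain 6T SRAM cells and ignoring the surrounding logic; MS never published an official per-block breakdown, so this is purely back-of-envelope):

```python
# Rough estimate of the transistor budget of the XB1's 32 MB ESRAM, assuming
# classic 6T SRAM cells and ignoring decoders/sense amps/redundancy.

ESRAM_BYTES = 32 * 1024 * 1024        # 32 MB
TRANSISTORS_PER_BIT = 6               # 6T SRAM cell (assumption)

esram_transistors = ESRAM_BYTES * 8 * TRANSISTORS_PER_BIT
print(f"{esram_transistors / 1e9:.2f} billion transistors")   # ~1.61 billion

# Against the ~5 billion transistors MS quoted for the whole SoC, that's
# roughly a third of the chip's transistor count in the ESRAM arrays alone.
```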
 
Has anyone spoken about the impact and importance of the low latency? When it was raised in the past, it was basically dismissed as an advantage (MS themselves said graphics work was latency tolerant). So unless we have evidence that the latency is a factor, I'm inclined to think it can just be ignored. Gains in processing rate might offset any latency disadvantage.
If latency were no factor, they could have easily used embedded DRAM ;), which would have been much cheaper and had more theoretical bandwidth. For graphics alone, latency isn't a factor (that's fine), but if you want to do more with it (e.g. GPGPU / async compute) you need low latency to quickly change some states without losing your time window. High latencies would cut those windows into pieces (if you really want to go for high efficiency).
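For what it's worth, the usual latency-tolerance argument is just Little's law: keep enough requests in flight and the latency hides behind bandwidth. A minimal sketch, with purely illustrative latency numbers (not measured console figures):

```python
# Little's law sketch of why GPUs tolerate DRAM latency: to keep the memory
# bus busy you just need enough requests in flight. Numbers below are
# illustrative assumptions, not measured console figures.

def requests_in_flight(bandwidth_gbps: float, latency_ns: float,
                       request_bytes: int = 64) -> float:
    """Outstanding requests needed to sustain a bandwidth at a given latency."""
    bytes_in_flight = bandwidth_gbps * latency_ns  # GB/s * ns = bytes
    return bytes_in_flight / request_bytes

# e.g. ~68 GB/s of DRAM at an assumed ~300 ns round trip
print(requests_in_flight(68, 300))    # ~320 outstanding 64-byte requests

# Halving latency (as a fast on-die memory might) halves the concurrency
# needed, which matters more for bursty compute work than for pixel streaming.
print(requests_in_flight(68, 150))    # ~160
```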
 
The choice of ESRAM was for production; eDRAM limits the fabs they could work with. No-one's ever given figures for the latency AFAIK, only that it's 'fast', so we've no idea what the latency benefits are in real-world situations. If there really are some noteworthy benefits, real enablers, there should be some papers/talks explaining them. And MS, I'd have thought, would have talked up the compute advantages etc. during 2012 alongside the whole Balance PR message.
 
Would eDRAM affect which foundries can be used? I thought that was one of the reasons for going with esram.

Latencies may be nice, but I've never heard any dev actually say it's a benefit, let alone enough of one to be a concern when moving to a hugely faster chip.

- ninjered
 
On the general subject of esram, esram only takes die area away from CUs if that die area would have been used for CUs had the esram not been there.

For example, if X1 had not had esram, and had used the die area for CUs as forumites commonly suggest, the machine would have been effectively useless unless MS had gone for 256-bit GDDR5 at significantly higher cost (not just increased cost for the memory chips, but also for power, board etc.).

You can't just throw a ton more CUs onto a chip while simultaneously slashing the BW down to unusable levels.

The question becomes balancing the cost of off-die RAM against the yields, cost and performance of the APU. I'm obviously not privy to that info, but it seems like the complexity of ESRAM may not be worth the price you save on DDR3 vs GDDR5. Not to mention, eliminating ESRAM basically puts development in line with video cards and PS4, where ESRAM is not something you have to manage. If devs want ESRAM gone (I have no idea if they do), it'll probably go.
 
The question becomes balancing the cost of off-die RAM against the yields, cost and performance of the APU. I'm obviously not privy to that info, but it seems like the complexity of ESRAM may not be worth the price you save on DDR3 vs GDDR5. Not to mention, eliminating ESRAM basically puts development in line with video cards and PS4, where ESRAM is not something you have to manage. If devs want ESRAM gone (I have no idea if they do), it'll probably go.

The esram is supposed to be very defect tolerant, so I'd guess it has little effect on yields.

I imagine that devs would only like to see esram gone if they're faced with a better option. In the case of PS4, I think it's fair to say they had a better option. With GDDR5X not ready yet, and HBM2 not arriving for consumers till next year (and likely being very expensive), it could be that embedded memory is the only cost-effective way to exceed a certain performance threshold.

With esram tools now maturing and developers seemingly getting to grips with it, perhaps there's still a role for it in a next gen system. It could offer HBM-like BW where you need it most, for a far lower cost.
 
If, from now on, all Xbox One exclusives become cross-platform with PC, backwards compatibility won't pose a problem for XO games; they'll just get the PC version.
 
Considering that interposers are just passive, isn't it possible that, down the road, one or more PCB layers will be used instead to decrease costs?
 
The choice of ESRAM was for production; eDRAM limits the fabs they could work with. No-one's ever given figures for the latency AFAIK, only that it's 'fast', so we've no idea what the latency benefits are in real-world situations. If there really are some noteworthy benefits, real enablers, there should be some papers/talks explaining them. And MS, I'd have thought, would have talked up the compute advantages etc. during 2012 alongside the whole Balance PR message.
I tried to cross-reference the SDK leak with various presentations on the console APUs.
https://forum.beyond3d.com/threads/...are-investigation.53537/page-407#post-1816112

If they're at least in the ballpark of reality, the numbers indicate that the ESRAM takes roughly half the time to service a request.
In terms of the DRAM mattering, HBM is indicated to be roughly equal to GDDR5 in latency, and GDDR5 seems to give 50-60 cycles of latency out of the hundreds experienced.
The time it takes to traverse the APU and get through the memory pipeline appears to dominate.

The latency difference might matter if something leans on it heavily, although the GPU is generally tolerant of longer latency.
The more predictable minimum bandwidth figure is a significant difference. DRAM's various penalties can drop bandwidth by an order of magnitude, whereas the ESRAM can reliably promise its baseline bandwidth. This does have knock-on effects on the memory pipeline, which doesn't need to take additional time to collect accesses to minimize penalties like read/write turnaround.
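A quick sketch of that 'predictable minimum bandwidth' point, with made-up penalty and burst figures purely for illustration:

```python
# Illustration of the 'predictable minimum bandwidth' point: DRAM loses bus
# cycles every time it turns around between reads and writes, while on-die
# SRAM doesn't. Penalty/burst figures here are illustrative assumptions only.

def effective_bandwidth(peak_gbps: float, burst_cycles: int,
                        turnaround_cycles: int, turnarounds_per_burst: float):
    """Peak bandwidth derated by bus cycles lost to read/write turnaround."""
    lost = turnaround_cycles * turnarounds_per_burst
    return peak_gbps * burst_cycles / (burst_cycles + lost)

PEAK = 68.0  # GB/s, XB1 DDR3 peak

# Well-behaved streaming: long same-direction bursts, rare turnarounds
print(effective_bandwidth(PEAK, burst_cycles=64, turnaround_cycles=10,
                          turnarounds_per_burst=0.1))   # ~67 GB/s

# Badly interleaved read/write traffic: a turnaround nearly every short burst
print(effective_bandwidth(PEAK, burst_cycles=4, turnaround_cycles=10,
                          turnarounds_per_burst=1.0))   # ~19 GB/s
```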
 
I was replying to Egmon, who seems to think MS are disabling the ESRAM in XB1 in software and leaving devs with only 60 GB/s to work with. A single large pool as you suggest would work as long as latency isn't an issue, and I for one believe it isn't.
The high read/write capability of esram might be the only hurdle. From what we know today latency is not an issue, but with async compute becoming more popular in games perhaps it could become one?

I guess a big factor is how much developers actually used this to their advantage, or not. So far I'm guessing not that much.
 
X1 seems to be doing remarkably well for a system with only half the ROPs. Perhaps the leaked docs were telling the truth, and low latency does help with ROP efficiency (so it's merely a disadvantage instead of a massacre)?
 
X1 seems to be doing remarkably well for a system with only half the ROPs. Perhaps the leaked docs were telling the truth, and low latency does help with ROP efficiency (so it's merely a disadvantage instead of a massacre)?
This may actually have to do with the 'balanced' statement they made. They only have so many CUs and so much bandwidth, so it probably lines up well with the ROP count too. It would appear that, with every other section maxed out, the XBO can only keep so many ROPs busy anyway.
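Rough numbers on that (the ROP count and GPU clock are the public figures; the 8 bytes per blended pixel is my own illustrative assumption):

```python
# Rough check on the 'balance' argument: how much bandwidth would the XB1's
# 16 ROPs want if they ran flat out? Clock and ROP count are public figures;
# 8 bytes/pixel (RGBA8 read + write for blending) is an illustrative assumption.

ROPS = 16
GPU_CLOCK_GHZ = 0.853
BYTES_PER_PIXEL = 8          # 4B read + 4B write for alpha blending (assumed)

fill_rate_gpix = ROPS * GPU_CLOCK_GHZ                  # ~13.6 Gpixels/s
rop_bandwidth_gbps = fill_rate_gpix * BYTES_PER_PIXEL  # ~109 GB/s

print(fill_rate_gpix, rop_bandwidth_gbps)
# ~109 GB/s is beyond the ~68 GB/s DDR3 pool but within what the ESRAM offers,
# so the ROPs only really stretch their legs on render targets in ESRAM.
# Doubling the ROP count without more bandwidth would mostly leave them idle.
```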
 
I was replying to Egmon, who seems to think MS are disabling the ESRAM in XB1 in software and leaving devs with only 60 GB/s to work with. A single large pool as you suggest would work as long as latency isn't an issue, and I for one believe it isn't.

You are putting words in my mouth.
They probably have the option to bypass ESRAM when compiling games with the newer SDKs, for Xbox 2.
Xbox 2 will be backwards compatible; otherwise even the most devoted of Xbox fans would be upset.
The ESRAM, again, is not that special. Xbox 2 will not use DDR3 RAM.

MS learned from Xbox 360 that an unusual memory setup would be hard to emulate on future hardware. So with Xbox One they were prepared, and even though they shipped an unorthodox machine (severely underpowered for its transistor count), you can be certain that they made sure games would be programmed in a way that allowed for simple recompilation on future hardware setups.
Xbox One 'devkits', like the Nvidia Forza 5 demos, obviously didn't use ESRAM. Take that as a hint.
 
So, here's a question about Xbox Two, assuming there's an Xbox One.5. If they don't drop ESRAM now, does it get even harder to drop it later?

Scenario one:
They keep ESRAM as part of the design for One.5. If they do that, they pretty much have to increase the amount, because 32MB is not big enough for 1080p in a lot of engines, like Frostbite and UE4 (some rough render target numbers below). If they increase the capacity of ESRAM for a mid-cycle upgrade, does that force them to go all-in on that design with Xbox Two, or would HBM2 be a possible replacement? How much of the die do they sacrifice doing this? What's the next logical bump in ESRAM after 32MB? It sounds like Polaris is going to have 36 and 40 CU variants, and with PS4 Neo going with 36 CUs we can roughly guess another Xbox wouldn't be able to do much better, considering the power budget and thermal limits of a home console. Could they even fit 36 CUs on a die with ESRAM?

Scenario Two:
They drop ESRAM for Xbox One.5, but it sounds like they'd pretty much have to go with HBM2 right away. This scenario sounds less realistic to me, at this point.
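For reference, the 32MB problem in scenario one is easy to see with some rough render target arithmetic (the G-buffer layout below is an illustrative assumption, not Frostbite's or UE4's actual format):

```python
# Why 32 MB gets tight at 1080p for deferred renderers: rough size of an
# illustrative G-buffer. This layout is an assumption, not any engine's actual one.

WIDTH, HEIGHT = 1920, 1080
pixels = WIDTH * HEIGHT

def mb(byte_count):
    return byte_count / (1024 * 1024)

gbuffer_targets = 4          # e.g. albedo, normals, material params, motion
bytes_per_target = 4         # 32bpp each (assumed)
depth_bytes = 4              # depth-stencil (assumed)

gbuffer = pixels * (gbuffer_targets * bytes_per_target + depth_bytes)
hdr_target = pixels * 8      # 16-bit-per-channel HDR colour buffer (assumed)

print(mb(gbuffer))              # ~39.6 MB for the G-buffer + depth alone
print(mb(gbuffer + hdr_target)) # ~55 MB with an HDR lighting target on top
# Well past 32 MB, which is why engines end up juggling what lives in ESRAM.
```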
 
Scott_Arm, they won't use ESRAM again, look at this:

[Image: XB1SOC-2.jpg (Xbox One SoC die shot)]


That's right. Can you imagine how 64MB (which is still too small for any real improvements) would look?
They will have HBM3 before ESRAM, count on it.

Scenario 2 is most likely, but with GDDR5 instead of HBM2
 
You are putting words in my mouth.

He's really not.

They probably have the option to bypass ESRAM when compiling games with the newer SDKs, for Xbox 2.

Developers have the option of not using esram, should they be stupid enough. You made no mention of the unknown "Xbox 2"; indeed, you talked about "a compile option in newer SDKs to bypass it", meaning X1. You are full of JUICE (ModEdit).

The ESRAM, again, is not that special. Xbox 2 will not use DDR3 RAM.

Whether the ESRAM is special is a completely different question to whether "Xbox Two" will use DDR3. Your attempts to dig yourself out of the mess you are drowning in will not be helped by your efforts here.

MS learned from Xbox 360 that an unusual memory setup would be hard to emulate on future hardware.

You made no mention of emulation. You're trying to retrofit your awful failed argument to make yourself look like less of a mess. MS had many years with the X360. Its memory arrangement was less unusual than the PS3's, and MS still managed to emulate it on a less powerful system than the PS4.

After your awful arguments that Nvidia's next gen graphics card won't be DX12 capable, I don't know how you still have the gall to be posting.
 
Xbox One 'devkits', like the Nvidia Forza 5 demos, obviously didn't use ESRAM. Take that as a hint.
I'm not sure if you are serious. It took them 4 man-months to port the game over to DX12 suitable for an Nvidia GPU.
 