Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

That's a great, informative reply. Just to clarify: I was speculating about the next Xbox, not deriding the engineers for the current ESRAM choice.

I was wondering whether a different pool of memory would work for the next console while still allowing backwards compatibility with launch titles as-is. That is, is ESRAM different enough from HBM/GDDR5 that it wouldn't work with only hypervisor changes to force its use, or would there be other compromises in backwards compatibility, like having to download pre-configured versions that will work rather than installing from the disc, similar to the 360 solution?

ESRAM as a larger, faster pool seems like a great evolution, but would that still not mean a third less ALU than the competition, assuming a similar die size?
 
I am just happy, I guess, that Sony didn't go with 1TB/s EDRAM, as it seems it could have severely limited future plans. Mark Cerny made the best choice.
 
I'm not sure if I will answer this right, so bear with me. I'm likely wrong, but:

It would likely take a larger amount of memory than 8GB to sustain the read/write speed of ESRAM. 8GB of DDR can read 2x faster and write 2x faster than ESRAM, but when required to do both operations at the same time, that bandwidth gets chopped badly, fast. By "chopped badly" I mean nowhere near the peak performance level.

I want to note that I'm not implying that super-high read/write bandwidth makes it superior; on the contrary, these elite graphics coders have managed to get around this limitation quite easily. Proof of it is that HBM is even wider and slower. However, the requirement here is that a larger memory footprint must be available for them to write to and read from. It may explain why graphics memory has continually grown larger and larger in footprint.

But when your system has 90% of its bandwidth stuck in 32MB, a thousandth the size of the main footprint, you've got an issue. You have no choice but to write over your data.

So whenever you hear stories of games X and Y getting 160-190 GB/s of bandwidth out of ESRAM, remember that only ~104 GB/s of that is reading; the rest is writing. In other words, it's performing reads and writes concurrently, and all the memory-management algorithms on Xbox are based around reading and writing to a very small footprint. To fully exploit the hardware, the graphics guys are probably reordering and rewriting algorithms to take as much advantage of this as possible. Minimizing downtime of the hardware is the only way to extract performance, which means Xbox-related code will likely have more scenarios where the developers schedule around concurrent reads and writes. When we read about developers having a hard time getting good performance out of ESRAM, they are likely referring to this point. Reading and writing over the same memory locations can really mess things up: one second you're reading from A; later in the pipeline you need A, but it's gone, or the values aren't right because they've been overwritten. It has likely caused a lot of headaches.

This is completely the opposite of what we see happening everywhere else, where they are likely doing as many reads as possible followed by as many writes as possible to keep the bandwidth up.
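To put rough numbers on that batched-vs-interleaved idea, here's a toy Python model. All timings are invented for illustration (real GDDR5/DDR3 turnaround costs differ); the only point is the shape of the result.

```python
# Toy model of read/write turnaround on a shared memory bus (numbers are
# made up): each burst takes 1 cycle, and every switch between reading
# and writing stalls the bus for TURNAROUND cycles.

TURNAROUND = 6  # assumed read<->write turnaround penalty, in cycles

def effective_utilization(ops):
    """ops: sequence of 'R'/'W' bursts. Returns fraction of peak bandwidth."""
    cycles = 0
    prev = None
    for op in ops:
        if prev is not None and op != prev:
            cycles += TURNAROUND  # bus idles while the direction reverses
        cycles += 1               # the burst itself
        prev = op
    return len(ops) / cycles

interleaved = ['R', 'W'] * 512          # worst case: R,W,R,W,...
batched = ['R'] * 512 + ['W'] * 512     # best case: all reads, then all writes

print(f"interleaved: {effective_utilization(interleaved):.1%} of peak")
print(f"batched:     {effective_utilization(batched):.1%} of peak")
```

With these made-up numbers, strict interleaving lands around 14% of peak while batching stays above 99%; the exact figures don't matter, only the gap.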

So, on the topic of having proper backwards compatibility, they will likely require a technology that can do exactly that, as competently as ESRAM.
 

I did read that switching from read to write is not free, so I can see that, while the operations required are easily handled by another memory technology, because of memory access patterns it's possible that as-is games use the best-case scenario for ESRAM, and that may be the worst case for, say, GDDR5?

I am not sure if the console API leaves any tells that would allow a hypervisor to negate this effect, given that it would know where the ESRAM addresses and the move engines are. It could intercept data movement, which is probably half the concurrent access patterns, and dynamically redirect it to data in a DDR3 holding space: a sort of lookup table in memory to track what is in "virtual ESRAM" and what is being written to and read from it. It could get messy quite quickly, but would it be possible?

A very interesting technical challenge, and as suspected, pesky things like details take what seems trivial and make it complex.
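For what that "lookup table in memory" idea might look like, here's a hypothetical Python sketch. The base address, the 64KB page size, and all the names are invented for illustration; a real hypervisor would do this with hardware page tables, not software.

```python
# Hypothetical sketch of the "virtual ESRAM" idea: redirect accesses
# aimed at the 32MB ESRAM address range into a holding buffer in main
# memory, via a per-page lookup table.

ESRAM_BASE = 0x8000_0000          # assumed guest-physical base of ESRAM
ESRAM_SIZE = 32 * 1024 * 1024
PAGE = 64 * 1024

class VirtualEsram:
    def __init__(self):
        self.page_table = {}                   # ESRAM page -> holding offset
        self.holding = bytearray(ESRAM_SIZE)   # backing store in "DDR"

    def _translate(self, addr):
        if not ESRAM_BASE <= addr < ESRAM_BASE + ESRAM_SIZE:
            raise ValueError("not an ESRAM address")
        page, offset = divmod(addr - ESRAM_BASE, PAGE)
        # lazily map a backing slot the first time a page is touched
        base = self.page_table.setdefault(page, page * PAGE)
        return base + offset

    def write8(self, addr, value):
        self.holding[self._translate(addr)] = value

    def read8(self, addr):
        return self.holding[self._translate(addr)]

v = VirtualEsram()
v.write8(ESRAM_BASE + 0x1234, 0xAB)
print(hex(v.read8(ESRAM_BASE + 0x1234)))  # prints 0xab
```

The messy part the post anticipates is exactly this bookkeeping: every access would have to go through something like _translate(), which is why doing it in software would likely be far too slow.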
 
This would be a good way to summarize it, I think. Once again, I could be wrong.



The CPU can access ESRAM, albeit extremely slowly; ESRAM is made specifically for the GPU. Just as there are DirectX commands to move memory from system RAM to GPU RAM, Xbox DirectX comes with additional instructions to manage ESRAM. The hypervisor should not play a role.
 
ESRAM as a larger, faster pool seems like a great evolution, but would that still not mean a third less ALU than the competition, assuming a similar die size?
Not all the die is devoted to the GPU, and some elements like the external interfaces are relatively fixed within a given memory standard.
The ratio doesn't stay the same if there is a long-term goal to rely more on process scaling, and relying on an external bus means cutting into the power budget for the ALUs.

Interposer-based tech can change the power equation for memory, and maybe in the future could affect how the whole SoC is physically arranged. Until then, the ratio of area available for scaling with process node is different, and can shift how much one architecture can change with process scaling.

Also, while read/write traffic is a known pain point for DRAM, there are various other banking restrictions, refresh periods, and activation penalties inherent to managing DRAM devices that should generally be less painful for ESRAM. DRAM utilization in general can be poor.
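As a toy illustration of those DRAM penalties (all timings invented; a plain SRAM array has no equivalent of the row-miss cost modelled here, and refresh is ignored):

```python
# Toy model of DRAM row/bank behaviour: an access that hits a bank's
# currently open row costs 1 cycle; touching a different row in that
# bank pays an assumed precharge+activate penalty first.

BANKS = 8
ROW_MISS_PENALTY = 10  # assumed precharge + activate cost, in cycles

def service_cycles(addresses, row_bits=10):
    open_rows = {}  # bank -> currently open row
    cycles = 0
    for addr in addresses:
        bank = addr % BANKS
        row = addr >> row_bits
        if open_rows.get(bank) != row:
            cycles += ROW_MISS_PENALTY  # close the old row, open the new one
            open_rows[bank] = row
        cycles += 1
    return cycles

streaming = list(range(1024))                              # walks rows in order
scattered = [(i * 7919) % (1 << 20) for i in range(1024)]  # pseudo-random

print("streaming:", service_cycles(streaming))
print("scattered:", service_cycles(scattered))
```

With these assumed numbers, the scattered pattern takes several times as many cycles to service as the streaming one, which is the "DRAM utilization in general can be poor" point in miniature.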

Also, if speculation is that compression tech helps a GDDR5-based console, it can also help with an ESRAM-based one.
Bandwidth-wise, it helps ESRAM, but it can help in other ways if the compressed targets can lead to more effective ESRAM capacity. The compression pipeline for GCN is itself dependent on a cache of metadata for compression, which can experience cache misses.
That's more possible sources of latency, during which the apparent bandwidth of memory also drops.
ESRAM's lower latency can mean servicing the compression pipeline better.

There may not be a revamp of Durango with newer tech or more/faster ESRAM, but the numbers could look interesting if there were.
 
In regards to emulating the ESRAM in the Xbox One: would it be possible to do with a smaller (8MB?) SRAM cache, backed by "smart" cache engines that fed the cache with the correct ESRAM data 99.9% of the time when latency could matter (not sure what penalties a cache miss would incur here), with the whole system then using GDDR or HBM as main memory? Basically, the hardware would need to coherently map the small SRAM cache and a small bit of DRAM main memory into one memory space that the emulation layer sees as one continuous 32MB of ESRAM. This would mean much smaller SRAM die area, and the SRAM could potentially serve as an L3 cache when not doing emulation.
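As a rough sketch of that idea, here's a minimal Python model of a small direct-mapped SRAM cache fronting a 32MB ESRAM image held in main memory. The sizes and the direct-mapped policy are assumptions, and only the hit/miss accounting is modelled, not the "smart" prefetch engines.

```python
# Direct-mapped cache sketch: an 8MB SRAM cache with 64-byte lines in
# front of an emulated 32MB ESRAM address space backed by main memory.

LINE = 64
CACHE_LINES = (8 * 1024 * 1024) // LINE  # 8MB of SRAM
tags = [None] * CACHE_LINES              # which ESRAM line each slot holds
hits = misses = 0

def access(addr):
    """Record whether an emulated-ESRAM access hits the SRAM cache."""
    global hits, misses
    line = addr // LINE
    slot = line % CACHE_LINES
    if tags[slot] == line:
        hits += 1        # fast path: served at SRAM latency
    else:
        misses += 1      # slow path: line fetched from main memory
        tags[slot] = line

# A render-target-like pass that touches the same 4MB region twice:
for _ in range(2):
    for addr in range(0, 4 * 1024 * 1024, LINE):
        access(addr)

print(f"hit rate: {hits / (hits + misses):.0%}")  # prints "hit rate: 50%"
```

The open question in the post is the miss path: any access the engines fail to prefetch would stall at DRAM latency, which is exactly where such an emulation could diverge from real ESRAM timing.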

Otherwise, MS could just always include a minimum of 32MB of ESRAM going forward. That doesn't seem like much of a problem either, as long as they can find uses for it other than emulation.
 
The question on everyone's mind by now must be... if the PS5 is the 9th generation, is the PS4K the 8.5th generation? Eighth-and-a-half gen. No? It's just my mind?
 
A generation is not defined by being able to emulate the previous generation. To date, it's simply been a new system, so the Wii was a new generation despite being architecturally no more advanced than the preceding GC. Current market changes mean the concept of generations may well be over. In fact, I am going to change the title of this thread to reflect this - let's get with the times!

Edit: This thread consolidates 9th and 10th iterations as we'll still be talking 'next gen' (10th iteration) while questioning alternative 9th gen iterations.

Of course, we may see this structure ditched and just have individual threads for Next Xbox Iteration and Next PS Iteration and Next Nintendo Hardware, in case one wants to introduce a 6/12 monthly upgrade cycle or some other madness. That's making a lot of sense to me right now...
 
Otherwise, MS could just always include a minimum of 32MB of ESRAM going forward. That doesn't seem like much of a problem either, as long as they can find uses for it other than emulation.

I'm sure it'd get used. 3dilettante makes it clear that there's more to ESRAM than just high stated BW, low latency and low power per GB/s. The fact that it can maintain bandwidth better under more difficult access patterns presumably means that developers could use it to maintain performance in some situations.

I find myself wondering whether, rather than going the HBM2 route - where you get 8/16/32GB of super-high-BW DRAM when you only need a few tens [of MBs] of the stuff - simply stacking your main die on top of a layer of low-power ESRAM might be a cheaper solution for a console. You could stick with an older process with a lower cost per gate - such as 28nm - and slap in 128+ MB of the stuff with no real concerns, while also avoiding TSVs. It might still retain some of the favourable power characteristics too, and being less dense (less heat per mm^2) might pose fewer cooling problems.
 
I'm not sure I understand why a write cache wouldn't be enough to take care of the latency of single-ported memory. Adding more cache should be less expensive than adding a large pool of external ESRAM (even if stacked, it still needs an external interface with TSVs). Maybe that TSV interface would be put to better use widening the HBM2 pool with more channels and more banks, so more concurrency?

Let's say we're talking about a 128GB/s full-duplex ESRAM external die, plus another 256GB/s of main memory. This would require the same number of TSVs as a unified 512GB/s main memory (with twice the channels and banks).
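A quick back-of-the-envelope check of that pin-count claim, under the simplifying assumption that TSV count scales linearly with aggregate wire bandwidth (same per-pin data rate everywhere; command, clock and power pins are ignored, and the per-TSV rate below is arbitrary):

```python
# Sanity check: do the two memory topologies need the same TSV budget?

GBPS_PER_TSV = 2  # assumed per-TSV data rate in GB/s; the value is arbitrary

def tsvs(gb_per_s):
    return gb_per_s // GBPS_PER_TSV

# Option A: full-duplex ESRAM die (dedicated 128GB/s read lanes plus
# 128GB/s write lanes) alongside a 256GB/s main-memory interface.
option_a = tsvs(128) + tsvs(128) + tsvs(256)

# Option B: one unified half-duplex 512GB/s main-memory interface.
option_b = tsvs(512)

print(option_a, option_b, option_a == option_b)
```

Under that linear assumption the two options do tie on pins: full duplex means the 128GB/s ESRAM link needs 256GB/s worth of wires, so 128+128+256 equals 512.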
 

I'm just going off what 3dilettante is saying. There has to be a limit to what a cache can reasonably do. Presumably a large write cache would add complexity in addition to the cost of the large, fast memory pool it was supporting. Plus, it wouldn't help once data was out of cache and a read was waiting on a sequence of other reads and writes to complete. There seem to be some benefits to a large, directly addressable pool of embedded ... at least in theory.

Regarding TSVs - why would a single external ESRAM chip need any TSVs rather than just metal layers and microbumps? And main memory would only need TSVs through the "esram layer" if it sat entirely above it (which may be necessary, I dunno - hopefully someone can clarify).

Edit: Have added attachment, was thinking of something along these lines, heat and other factors permitting.
 

Attachments: TSV.png
I'm not sure what all the implications are... but either way, since you still have microbumps to connect the dies together (TSV or not, interposer or not), there's still a (custom) interface with its own capacitance, alignment issues, the inability to test the die before assembly, etc. The limitations look close to the same reasons HBM2 is difficult to implement, so if it's not going to be on-die, it looks like almost a zero sum, unless latency is a huge advantage versus adding much more BW (I don't know!).

This wasn't the case with the XB360, since there were pixel processors on-die along with the eDRAM.
It also wasn't the case with the XB1, since the external interface was a big, low-cost pool of relatively slow DDR3.
The availability of HBM2 changes the game completely...

Real 3D ICs have been promised for years, and they don't seem anywhere near ready...
 
5-6x more powerful than PS4 Neo and only 10TF? Not possible. Dual top-end Vegas MIGHT be 6x more powerful than PS4 Neo, but they'd also be packing something in the region of 20TF and be totally impossible to include in a console.

I know... that's why I said to take it with a massive grain of salt. Unless he meant 5-6 times more powerful than the PS4 - that would put it in the ballpark of 10 TFLOPS...
 
A single Vega GPU (4096 stream processors) will probably be around 10 TFLOPS though, no?
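The usual peak-FLOPS back-of-the-envelope, for reference. The 1.5GHz clock below is an assumed round number, not a leaked spec.

```python
# Peak FLOPS = stream processors x 2 ops per clock (one fused
# multiply-add) x clock speed in GHz, expressed in TFLOPS.

def peak_tflops(stream_processors, clock_ghz):
    return stream_processors * 2 * clock_ghz / 1000

print(f"{peak_tflops(4096, 1.5):.1f} TFLOPS")  # 4096-SP part at an assumed 1.5GHz
print(f"{peak_tflops(1152, 0.8):.1f} TFLOPS")  # base PS4 (1152 SPs @ 800MHz) for scale
```

That's roughly 12.3 TFLOPS for a full 4096-SP part at 1.5GHz versus roughly 1.8 for the base PS4, so "5-6x the PS4" and "around 10 TFLOPS" are at least mutually consistent claims.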

Other interesting tidbits:

"Panos Panay is overseeing the project and aesthetics of the console, I think it will be amazing."

If true I agree.

"it will be sold at a loss"

I'm skeptical Microsoft would do this, but I'd be on board... :)
 
Sold at a loss, just to win the numbers game? Not going to happen.

They can just match Neo, which is more than enough, and play at the same level this time around.
 
Unless they are truly modeling the business around subscription fees, i.e. Xbox Live Gold subscriptions.

They were making a shit-ton of money from Xbox Live subscriptions last gen, when the Xbox 360 was at its peak. Now Sony is making more from PSN than Nintendo makes in total. The subscription/digital-content side is lucrative if you get a lot of hardware out there.
 