Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

Non-Manhattan routing is a long-standing topic of research, since there are advantages in wire length and in the number of metal layers needed for a diagonal connection versus right-angle turns.
Routing methods that add diagonal options, such as the Y and X architectures, exist.
For people interested in mass production it's research; for others on the cutting edge it's actual development. :yep2:
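To put a rough number on the wire-length advantage (a toy example with made-up distances, not a real net): a pure Manhattan route between two points costs |dx| + |dy|, while an X-architecture route that can also use 45° segments runs the shared extent diagonally and only pays the remainder straight, saving up to about 29% when dx ≈ dy.

```python
import math

def manhattan_length(dx, dy):
    """Wire length using only horizontal/vertical segments."""
    return abs(dx) + abs(dy)

def octilinear_length(dx, dy):
    """Wire length when 45-degree segments are also allowed (X architecture):
    run the shared extent diagonally, then finish with a straight segment."""
    dx, dy = abs(dx), abs(dy)
    return math.sqrt(2) * min(dx, dy) + (max(dx, dy) - min(dx, dy))

# Hypothetical net: endpoints 300 um apart in both x and y.
dx, dy = 300.0, 300.0
m, o = manhattan_length(dx, dy), octilinear_length(dx, dy)
print(f"Manhattan: {m:.0f} um, octilinear: {o:.0f} um, saving: {100 * (1 - o / m):.0f}%")
# -> Manhattan: 600 um, octilinear: 424 um, saving: 29%
```

The actual saving depends on the net geometry; the catch, as the posts below note, is what modern lithography lets you actually print.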
 

That's all in the past though. Double patterning puts significant restrictions on what can be done with lithography. Intel moved to 1D layouts at the 45nm node, where all wires in a layer run in one direction, either horizontal or vertical, i.e. not even Manhattan routing. This increases the layout work significantly, which is why the foundries have supported 2D layouts up to now. AFAIK TSMC will move to a 1D layout scheme at the 10nm node; Samsung apparently will continue with 2D layouts at 10nm.

1D makes design harder but eases manufacturability; 2D makes design easier but makes manufacturing much harder.
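As a toy illustration of why strict 1D layers increase the layout work (my own sketch, not how any real router or design rule deck works): on a 2D (Manhattan) layer a wire can turn within the layer, but with 1D layers every change of direction forces a via to an adjacent layer with the orthogonal orientation, so the same staircase path costs extra vias and layer hops.

```python
def count_vias(path_directions, one_d=True):
    """Count the vias needed to realize a rectilinear path.

    path_directions: sequence of 'H'/'V' segment orientations, in order.
    one_d=True  -> each metal layer carries only one orientation, so every
                   H<->V change of direction requires a via to another layer.
    one_d=False -> a 2D (Manhattan) layer allows turns within the layer,
                   so no extra vias are needed for direction changes.
    """
    if not one_d:
        return 0
    return sum(1 for a, b in zip(path_directions, path_directions[1:]) if a != b)

# A hypothetical staircase route with 5 turns:
route = ['H', 'V', 'H', 'V', 'H', 'V']
print("1D layout vias:", count_vias(route, one_d=True))   # -> 5
print("2D layout vias:", count_vias(route, one_d=False))  # -> 0
```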

Cheers
 

The more recent papers (most are still rather old) on complex routing schemes have retreated to the higher layers and global interconnects, as the local blocks necessitate higher regularity. Synopsys mooted non-Manhattan routing for passive interposers, which is one of the most recent mentions I've seen of it.
http://www.chipex.co.il/_Uploads/dbsAttachedFiles/MarcoCasaleRossiSynopsys.pdf
That's more of a gray area, where the silicon acts more like a PCB or package than like the main die.
 
Btw, I read that in the current-gen consoles it was hard to "exploit" the bandwidth because the CPU and GPU access it at the same time, and it was not very efficient performance-wise. How do you think they can improve on that? More cache on the CPU/GPU? Separate/dedicated RAM pools (say 8GB "main" and 8GB "VRAM")? Or nothing, and we have to live with it because a solution would be too costly?
 
I don't think that statement is accurate. Much of the discussion is about bandwidth loss due to simultaneous memory access by the CPU and GPU.

It's a behaviour we have discussed in detail; mainly, we bring it up when we compare real-world performance numbers for bandwidth against scenarios where we are strictly discussing peak bandwidth numbers.

The overall advantages outweigh the disadvantages; otherwise a dual pool of fast/slow memory would have been implemented.
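As a rough way to picture that loss (the numbers and the penalty model below are invented for illustration, not measurements): when the CPU and GPU hit the same pool, each GB/s of CPU traffic can cost the GPU more than a GB/s of usable bandwidth because of scheduling and bank conflicts, so the GPU ends up with less than "peak minus CPU traffic".

```python
def effective_gpu_bandwidth(peak_gbs, cpu_gbs, contention_factor=2.5):
    """Toy model of a shared pool under combined CPU + GPU load.

    peak_gbs:          theoretical peak bandwidth of the pool
    cpu_gbs:           bandwidth actually consumed by CPU traffic
    contention_factor: GB/s of GPU-usable bandwidth lost per GB/s of CPU
                       traffic (a made-up constant; real penalties depend
                       on access patterns and the memory controller)
    """
    return max(0.0, peak_gbs - contention_factor * cpu_gbs)

# Hypothetical 176 GB/s pool with 10 GB/s of CPU traffic:
print(effective_gpu_bandwidth(176.0, 10.0))  # -> 151.0 GB/s left, not 166.0
```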
 
It was implemented in the XO.

In terms of development, the issue was more the size (32MB) than any inherent trouble designing for split memory.

The 1X going with a single pool of memory doesn't surprise me, for multiple reasons.

But next gen, if it meant 4GB of HBM plus 18GB of DDR and a cheaper device, I could see it happening.
The 4GB would be just for intermediate render targets.
Otherwise you're talking 18GB of HBM, or 18GB of GDDR6, or something like that.

HBM + DDR may not have many issues with regard to the memory controller; I also believe AMD already has that on their roadmap.

It may even have hardware page management, HBCC, etc.
So I wouldn't just rule it out, especially if, say, Sony went for a clean generational break with no BC.
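For a sense of what such a split could look like on paper, here is a back-of-the-envelope bandwidth calculation; the widths and transfer rates are assumptions picked for illustration (one HBM2 stack at 2 Gb/s per pin, 256-bit DDR4-3200), not quotes from any roadmap.

```python
def bus_bandwidth_gbs(bus_width_bits, transfer_rate_gtps):
    """Peak bandwidth in GB/s = bus width in bytes * transfer rate in GT/s."""
    return (bus_width_bits / 8) * transfer_rate_gtps

# Assumed figures, purely for illustration:
hbm2_stack = bus_bandwidth_gbs(1024, 2.0)  # one 4GB HBM2 stack @ 2 Gb/s/pin -> 256 GB/s
ddr4_pool  = bus_bandwidth_gbs(256, 3.2)   # 256-bit DDR4-3200               -> 102.4 GB/s
print(f"HBM2: {hbm2_stack:.0f} GB/s + DDR4: {ddr4_pool:.1f} GB/s "
      f"= {hbm2_stack + ddr4_pool:.0f} GB/s aggregate")
```

The point being that a small HBM pool could carry the bandwidth-hungry render targets while the big DDR pool carries capacity, provided developers can live with placing data explicitly.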
 
I'm not ruling it out, but as of this moment they've gone with a single pool for reasons beyond just performance.

The XO was not fully split: the DDR3 was a shared pool of memory, and the ESRAM was for the most part considered inaccessible to the CPU. That shared pool, while lower in bandwidth, was still somewhere developers could lean on for compute.

And in the console space, where you're leaning very heavily on the GPU, compute is a necessary part of high performance.
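For reference on "lower in bandwidth", the quick arithmetic below shows where the XO's DDR3 figure comes from (256-bit bus, DDR3-2133); the ESRAM's bandwidth was far higher, which is why render targets lived there.

```python
def ddr_bandwidth_gbs(bus_width_bits, mega_transfers_per_s):
    """Peak DDR bandwidth in GB/s: bus width in bytes * MT/s / 1000."""
    return (bus_width_bits / 8) * mega_transfers_per_s / 1000

# Xbox One main memory: 256-bit DDR3-2133
print(ddr_bandwidth_gbs(256, 2133))  # -> ~68.3 GB/s
```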
 
I was referring more to your comment about the overall advantages outweighing the disadvantages for a single pool.
I think it's possible that the balance of cost and performance could change. At the moment the cost is possibly higher for two pools, but I think that could change moving forward.

Having high performance may be important, but it's only really important to have the high performance in the places where it's required.

I'm pretty sure AMD has this setup lined up on one of their mobile-parts roadmaps? Correct me if I'm wrong, or if it's changed.
If so, it will be interesting to see how it does across all metrics (cost, performance, etc.).
 
I don't think we've seen (or at least I haven't seen) metrics that showcase the efficiency of a HUMA architecture vs. a traditional split pool, so I've been hesitant to use wording that would suggest one has unbeatable advantages over the other. I recognize that:
a) split pools still suffer from similar losses; not as a result of prioritization, but because simultaneous reads/writes will drop the effective bandwidth of any memory pool. So if you're copying textures over from system RAM to VRAM, or bringing compute results back into system RAM, there are going to be scenarios where this is heightened.
b) developers can use APIs and other tricks to hide the latency of copying back and forth.
c) it would suggest that HUMA-based systems reach their maximum potential by offloading a significant portion of what would be CPU work to the GPU; two separate pools would not benefit as much from this arrangement.

Without knowing how developers have leveraged their games around HUMA, it's also difficult to compare. But it's not hard to imagine that, as development matures over this generation, the lean towards taking advantage of these features will become deeper and more pronounced.
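A conceptual sketch of the handoff difference behind (a)-(c), with invented names, sizes and a toy copy cost; this is not any console's actual API, just the shape of the two models: with a unified/HUMA pool the GPU can consume a CPU-written buffer in place, while with split pools the same handoff needs an explicit staging copy whose cost has to be hidden behind other work.

```python
import time

def unified_pool_handoff(buffer):
    """HUMA-style: CPU and GPU share one coherent pool, so the GPU-side
    work just reads the buffer the CPU wrote (zero copy)."""
    gpu_view = buffer            # shared reference, no transfer
    return sum(gpu_view)         # stand-in for GPU work on the data

def split_pool_handoff(buffer, link_gbs=20.0):
    """Split-pool style: the buffer must be copied from system RAM to VRAM
    over a link with finite bandwidth before the GPU can touch it."""
    copy_seconds = len(buffer) / (link_gbs * 1e9)  # bytes / (bytes per second)
    time.sleep(copy_seconds)                       # stand-in for the DMA copy
    vram_copy = bytes(buffer)                      # the extra copy itself
    return sum(vram_copy)                          # same GPU work afterwards

data = bytearray(8 * 1024 * 1024)  # hypothetical 8 MB of CPU-generated data
assert unified_pool_handoff(data) == split_pool_handoff(data)  # same result, different cost
```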
 
If you're only talking about performance profiles, I can understand your points.
I'm talking about it as a whole, which includes the cost of memory, etc.
I'm not saying it will definitely be cheaper in future, but I can see a scenario where performance, cost and complexity overall end up balanced on the side of a small fast pool plus a big slower pool.
 
I'm sure price is a factor in there somewhere.
Big GPU/small CPU will probably have better cost tradeoffs overall, and HUMA helps as an enabler.
So even in the scenario where a split pool would be cheaper, there would likely be costs elsewhere to make up for the absence of HUMA; most likely a required increase in CPU performance, which also increases heat, power, etc.
 
I feel like the cost associated with using any HBM at all means that, if you are using it, it makes sense to use as much as physically possible.
 
If it's on AMD's roadmap for mobile devices then I would say there's a very good reason for it, and one could very well be cost.
I had a quick Google but couldn't find it, so as I said I could very well be wrong about the overall feasibility/cost, etc.

I'm not against a single pool; I'm just saying a split setup may be more than a reasonable solution next gen.

I understand why you would say that, though, but I have no idea how they charge for HBM; they may still charge a lot for the stacks even though the interposer is already there, etc.?
 
I thought the benefit of a unified memory system was coherency and the ability for the CPU and GPU to work efficiently from the same data in RAM.

Do you actually need a unified memory system to provide most of the desired aspects?

In other words, if you take the Xbox One and replace the ESRAM with a bunch of GDDR5/6 or HBM, how would that present a challenge?

Data utilized by the ESRAM now isn't coherent or readily accessed by the CPU.

If you need easy CPU access or coherency, write it out to the DDR4; if not, write it out to the VRAM.

Plus, 25-40% of the RAM in consoles is being reserved by the OS. How much of that reservation really needs the performance of GDDR or HBM?
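To put a rough number on that last question (using the reservation percentages above and a pool size that is purely an example):

```python
def os_reservation_gb(total_gb, reserved_fraction):
    """How much of the memory pool the OS reservation consumes."""
    return total_gb * reserved_fraction

total = 16  # hypothetical next-gen pool size in GB
for frac in (0.25, 0.40):
    print(f"{frac:.0%} reservation on {total} GB = {os_reservation_gb(total, frac):.1f} GB")
# -> 25% reservation on 16 GB = 4.0 GB
# -> 40% reservation on 16 GB = 6.4 GB
```

That 4-6.4 GB is the kind of slice that arguably never needs GDDR/HBM-class bandwidth and could live in a cheaper DDR pool.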
 
http://wccftech.com/sk-hynix-samsung-micron-hbm-hmc-ddr5-hot-chips/

Nope. Notice how Samsung basically envisions everything outside of graphics solutions relying on a combination of HBM+DDR4.
Thanks.

I think the "System & Memory Architecture" slide (the first slide on that page) gives an indication of what I was trying to get across.
The "Client-DT & NB (B/W & Cost)" option is the kind of setup I see being viable next gen.

The only entries in the slide that used HBM alone were Network and Graphics, which makes perfect sense.

I wouldn't suggest that you could have an APU that used DDR & GDDR together, though; but HBM & DDR I can see being done.
 