Predict: The Next Generation Console Tech

It could be useful someday for a console. It has benefits in simplifying board design and can provide more performance than would be possible with off-module DRAM.

The downsides are not yet quantified.
The interposer, additional chips, and special manufacturing for the components would be a cost adder. Without knowing how much money is saved with a more compact system board and how much more expensive the 2.5D integration solution is, I am not sure if it would be cost-effective.

The initial markets would be in embedded and mobile applications; one example is Intel researching it for its mobile chips, which on their own can cost more than an entire console.

The other question is one of readiness. The tech is not ready yet, and there may be some desire to see long-term reliability studies of the modules. There are a number of mechanical, thermal, and electrical factors that could compromise function over time.
This could be an RROD several times over if it's deployed prematurely.
 


Stacking, though, allows you to go with 2 or 3 smaller chips instead of one big one. That allows better yields, which saves cost and helps with power/thermals. It also allows center chip connects, which for the stack will help with heat dissipation, and facilitates a central power distribution which also helps with heat.
 
You're assuming that you can do a lot of validation prior to connecting the chips, which might not be the case.

Edit: Another issue is that if you find a yield issue after stacking, you might be throwing away a good chip because of a bad chip. I'm just trying to show that there are multiple things to consider.
 


Of course you are trading one set of risks for another as node shrinks are risky too, but we are coming to the end of shrinks that are fiscally advantageous.

The validation question is a good one in that there are two avenues available in a stacked process (stack-n-cut or cut-n-stack), and the ability to do spot validation (esp. on wafer) could decide which becomes the dominant option.

Going forward, imo stacked chips are the future, and in 5 yrs' time that'll have become obvious and only the dinosaurs will not be participating.
 
Since it has implications I will link/paste my questions here about AMD's future rumored chip on a Silicon Interposer and Stacked Memory.

http://forum.beyond3d.com/showpost.php?p=1593574&postcount=869

I know answers may be general guesstimates, but I am curious about some things about the Silicon Interposer (SI) and stacked memory -- manufacturing cost, and benefits and sacrifices.

1. Memory Performance. It appears stacked memory requires lower performance memory. Between the added benefit of stacked memory & the SI's ability to offer much wider I/O (could we be looking at 512bit? 1024bit? More?) what kind of memory performance are we looking at?

2. Memory Density. I haven't been following stacked memory, so what are the current densities offered per-module as compared to currently competing non-stacked products? What is the prognosis for 2012? If this is a far off GPU, say late 2013 or 2014 what does the memory landscape look like?

3. What kind of chip-area related constraints are there? An SI is going to need to fit the GPU + memory and anything related (bus to the CPU and such as well). Is the SI going to be sufficient for a mainstream GPU or is it going to be constrained to the laptop-style market with smaller (e.g. sub-150mm^2 chips) and few(er) memory modules?

4. The manufacturing cost of a SI (Silicon Interposer). If I am understanding this correctly, this isn't too far off, in concept, from how many laptops use a distinct PCB for the GPU/memory. I know my Dell XPS M1330 (spit!) had such. But what is the cost comparison? Is there a net neutral or positive trade off by investing in an SI (more expensive) but in turn reducing PCB complexity, expensive GDDR, etc.?

5. CPU link. What kind of connection would be expected to the CPU? Could something crazy be set up, e.g. the GPU and *System Memory* be on the SI and then the CPU be on the main-board? Would something like this, a single very fast connection, allow for a fairly simple mainboard PCB but also increased lines for the CPU (Bandwidth)?

6. Multi-GPU... Multi-Chip (GPU/CPU). Would an SI offer a realistic solution for Crossfire/SLI cross traffic or will it not hold enough large chips? And what of the prospect of the CPU moving to the SI?

Obviously this could be a big hit in the Laptop market as well as the midrange PC market where, if SI costs can be held in check, AMD could offer some pretty compelling hardware-PLATFORMS. Ok, we know Bulldozer bit it hard (although, oddly, my dad is very excited about 8 CPU cores at a modest investment... he didn't even know any of the performance issues, so I guess in some circles AMD has a lot of positive mojo still) but could the SI be a game changer for AMD? Instead of marketing CPUs they could use SIs to market the "platform" GPU/CPU/Memory and chase the competition from that end?

Ditto consoles... let's say a certain high profile game developer heard that a certain console manufacturer was considering multi-GPU console solutions. What kind of craziness can be fit onto an SI? A couple 200mm^2 GPUs and 4GB of memory too much? What of the CPU?

7. Let's get crazy: XDR2. Is there any indication of stacking with XDR modules? Let's take crazy pills: the AMD XDR rumor is one crazy pill. The fact Sony's PS3 uses XDR and may also be contracting GPU services is another crazy pill. XDR also has fewer pins per module, which may allow for even crazier potential bandwidths, and that's worth 2 more crazy pills. Did my question just go straight to the psych ward, or is XDR2, at least in theory, something that could be used downstream? Or are there reasons this is a non-starter?

Thanks for allowing me to ask all these questions. I just had my wisdom teeth surgically removed, so I felt the extra kick of crazy pills to come back and post some wild questions. No, don't all flood my PM box at once now with those non-leak leaks :cool:
I cannot read the SA site for some reason (failed to load) but the concept of using an Interposer for jacked up bandwidth and chip communication on a platform with no need for upgrade concerns by the end user seems like a possible solution for consoles.

Getting away from eDRAM but still having a lot of bandwidth--to all system memory--seems possible.
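To put some back-of-the-envelope numbers on the wide-I/O idea (every figure below is an illustrative assumption, not the spec of any actual part):

Code:
# Rough peak-bandwidth arithmetic: GB/s = (bus width in bytes) x (per-pin data rate in Gbps).
# All numbers are hypothetical, for illustration only.

def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    return (bus_width_bits / 8) * gbps_per_pin

# Conventional off-package setup: narrow bus, fast pins (GDDR5-style).
print(bandwidth_gb_s(256, 4.0))    # 128.0 GB/s

# Interposer wide-I/O setup: very wide bus, slow pins.
print(bandwidth_gb_s(1024, 1.0))   # 128.0 GB/s

Same headline number either way, but the wide-and-slow option should spend less energy per bit over short interposer traces and skips the board routing and high-speed memory PHYs, and going to 2048 bits or faster pins scales it further.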

And with the DICE comments about multi-GPU, I don't think a traditional SLI/Crossfire with microstuttering and lost performance + PCB complexity/cooling makes a ton of sense. BUT an SI where you could go with, say, 2x175mm^2 chips with extremely FAST chip-to-chip communication AND figure out a way to CONSOLIDATE memory, and you may have a win in offsetting the cost of the SI by not using eDRAM, simplifying the PCB, and getting better power/heat metrics and better yields by going multichip. (EDIT: Now that I think about it, RSX in the PS3 was originally on its own PCB with memory. So going with somewhat more advanced packaging for the GPU isn't unprecedented, although this would be a further step in that direction.)

Wild comments... but I am sure in 2013/2014 a lot is on the table. Some things, like FinFETs and optical interconnects, are not an option, but an SI appears to be.

Lately I have been ho-hum about the prospects that our next consoles may last 8-10 years but also be a smaller leap than previous generations--this gives me hope that maybe someone will do something *smartly* radical.
 
Stacking, though, allows you to go with 2 or 3 smaller chips instead of one big one. That allows better yields, which saves cost and helps with power/thermals. It also allows center chip connects, which for the stack will help with heat dissipation, and facilitates a central power distribution which also helps with heat.

The module pictured had stacked DRAM, and I'm not sure they were significantly smaller than regular DRAM dies. They probably can't be much smaller, because a GPU module is going to have a minimum amount of memory on the interposer, and smaller chips have less capacity.
DRAM already has very high yields, which the die thinning and stacking process would probably reduce.

The rest of the components on the module are not stacks, and their individual yield rates can only go down when they are integrated into a module.
The GPU looks to be a single layer, and the interposer is a single die. The interposer's yields will probably need to be very high, and this can be accomplished since they are using very mature processes for it.

There are still challenges in combining multiple layers of silicon and thousands of thin connections between layers, and uncertainty in how well they will function over time.
 
BUT an SI where you could go with, say, 2x175mm^2 chips with extremely FAST chip-to-chip communication AND figure out a way to CONSOLIDATE memory, and you may have a win in offsetting the cost of the SI by not using eDRAM, simplifying the PCB, and getting better power/heat metrics and better yields by going multichip.

2x175 mm2 chips in an interposer package seems awfully large. You'd probably be better off with one 300mm2 SoC and RAM.
 
The module pictured had stacked DRAM, and I'm not sure they were significantly smaller than regular DRAM dies. They probably can't be much smaller, because a GPU module is going to have a minimum amount of memory on the interposer, and smaller chips have less capacity.
Why would you assume they'd be smaller? They are just taking current DRAM chips and stacking them. In the end, you still need 4 chips.

DRAM already has very high yields, which the die thinning and stacking process would probably reduce.

Which makes DRAM the best guinea pig for a stacking process. Why do you assume yields would drop from stacking?

The rest of the components on the module are not stacks, and their individual yield rates can only go down when they are integrated into a module.

Why would their individual yields be affected in any way? When I add a chip to a PCB, does that affect the yield rate for that chip?

The GPU looks to be a single layer, and the interposer is a single die. The interposer's yields will probably need to be very high, and this can be accomplished since they are using very mature processes for it.

The interposer layer is interesting; it may be a bridge technology, or it may become the base in any full-stack or multi-stack design. DRAM manufacturers will need to provide some type of logic layer in their stacks for integration. That could be an interposer as shown, but I don't know that other chip fabricators would be willing/able to have that outside their control. An almost certainty is that there will be standards established around die sizes and stack interconnect patterns. As you say, yields will need to be high for the interposer layer (as it's effectively replacing part of a PCB), so it'll be interesting to see just how complicated this layer becomes. In my crystal ball I see it becoming more of an optical interconnect layer.

There are still challenges in combining multiple layers of silicon and thousands of thin connections between layers, and uncertainty in how well they will function over time.

This is always a given for tech; it doesn't mean we don't move forward.
 
Why would you assume they'd be smaller? They are just taking current DRAM chips and stacking them. In the end, you still need 4 chips.
Your point was that small dies get better yields; my statement was that none of the chips that were stacked appear to be any smaller, and probably wouldn't be smaller. At best, the yields are the same, and I will go further and indicate why they are probably worse.

Which makes DRAM the best guinea pig for a stacking process. Why do you assume yields would drop from stacking?
Any additional step in the process presents a small but non-zero fault rate. The stacked DRAM is probably using TSVs and thinned dies, both of which may have manufacturing errors and can contribute to longevity concerns because they involve mechanically weak components.

Why would their individual yields be affected in any way? When I add a chip to a PCB, does that affect the yield rate for that chip?
The individual yield rates of each component and manufacturing step multiply to give the final yield of a complete module. I was saying that whatever good yields each component has, the final result would be lower than the individual yields.
(edit: had an incomplete sentence above)
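To put toy numbers on the "yields multiply" point (every figure below is invented, purely to show the arithmetic):

Code:
# Toy example: module yield is the product of every component yield and
# every assembly-step yield. All percentages are made up for illustration.

component_yields = {
    "GPU die":          0.80,
    "DRAM dies (x4)":   0.98 ** 4,  # four dies at 98% each
    "interposer":       0.99,
    "stacking/bonding": 0.97,
    "final assembly":   0.98,
}

module_yield = 1.0
for step, y in component_yields.items():
    module_yield *= y

print(round(module_yield, 2))  # ~0.69, lower than any individual yield

So even if every individual number looks healthy, the assembled module yields worse than any of its parts.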

This is always a given for tech; it doesn't mean we don't move forward.
It's a valid concern for a next gen console that would be released in a few years. Hundreds of millions to billions of dollars would be riding on the reliability of an immature 2.5D implementation.
 
But it's über cool :) It's like bandwidth was sitting there, waiting to be discovered.
If we compare it to AMD's anemic sideport on IGPs, that's like the difference between an SSD and a painful thumb drive.
 
Your point was that small dies get better yields; my statement was that none of the chips that were stacked appear to be any smaller, and probably wouldn't be smaller. At best, the yields are the same, and I will go further and indicate why they are probably worse.

Actually, the point of that response was to ask why you thought the DRAM chips would be smaller; I see no reason they'd need to be unless you were looking to trade memory density for a smaller footprint on the interposer. That's a design choice, though, not a yield concession. As to other chips, hell yes I think taking a 450mm^2 GPU and slicing it into three 150mm^2 sections will produce better yields. If nothing else, you'll get better layout utilization on the wafer.
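For what it's worth, here is the usual first-order defect model applied to those die sizes (the defect density is a made-up figure, and real fabs use clustered-defect models, so treat this as a sketch only):

Code:
import math

# Simple Poisson yield model: yield = exp(-D0 * area).
# D0 (defects per mm^2) is an invented number for illustration.
D0 = 0.002  # equivalent to 0.2 defects/cm^2

def die_yield(area_mm2):
    return math.exp(-D0 * area_mm2)

print(round(die_yield(450), 2))       # ~0.41 for one 450 mm^2 die
print(round(die_yield(150), 2))       # ~0.74 for one 150 mm^2 die
print(round(die_yield(150) ** 3, 2))  # ~0.41 for three blindly stacked 150 mm^2 dies

Under this naive model, three small dies stacked without testing come out the same as one big die; the advantage only shows up if you can bin out bad dies before stacking (which loops back to the validation question above) and from the better wafer-edge utilization of small dies.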


Any additional step in the process presents a small but non-zero fault rate. The stacked DRAM is probably using TSVs and thinned dies, both of which may have manufacturing errors and can contribute to longevity concerns because they involve mechanically weak components.

Those may be possible fabrication choices, yes. I was truly interested in your opinion so I won't debate the point and just say I appreciate the response.


The individual yield rates of each component and manufacturing step multiply to give the final yield of a complete module. I was saying that whatever good yields each component has, the final result would be lower than the individual yields.
(edit: had an incomplete sentence above)

That wasn't what you said (or at least not what I was responding to); you said: "their individual yield rates can only go down when they are integrated into a module." I see nothing that makes this any truer.


It's a valid concern for a next gen console that would be released in a few years. Hundreds of millions to billions of dollars would be riding on the reliability of an immature 2.5D implementation.

I completely agree, but every console launch has had its risks. That's no reason to not move forward if the rewards can be there.
 
That wasn't what you said (or at least not what I was responding to); you said: "their individual yield rates can only go down when they are integrated into a module." I see nothing that makes this any truer.
The stacking process is hardly perfect. It's going to introduce defects of its own.
 
But that doesn't affect the yield of individual components, since they've already "yielded" by the time they reach the stacking process.

When a stack fails final test, all the good dies in it are lost, and thus the yield of those individual dies goes down.
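A quick illustration with made-up numbers:

Code:
# Hypothetical numbers: even known-good dies are lost when the stack
# they were bonded into fails final test.

dies_per_stack = 4
stack_final_test_yield = 0.95   # assume 5% of assembled stacks fail

# Chance that a die which tested good before stacking ships in a good stack:
print(stack_final_test_yield)   # 0.95

# Good dies scrapped per 1000 stacks built:
print(round((1 - stack_final_test_yield) * 1000 * dies_per_stack))   # 200

So the dies "yielded" once at wafer test, but their effective yield still drops by whatever the stacking and final-test fallout turns out to be.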

Cheers
 
Actually, the point of that response was to ask why you thought the DRAM chips would be smaller; I see no reason they'd need to be unless you were looking to trade memory density for a smaller footprint on the interposer.
I didn't say the chips would be smaller; I said they probably wouldn't be, because the drop in capacity would make it less likely they could fit all they needed on the module.

That's a design choice, though, not a yield concession. As to other chips, hell yes I think taking a 450mm^2 GPU and slicing it into three 150mm^2 sections will produce better yields. If nothing else, you'll get better layout utilization on the wafer.
The module pictured did not slice the GPU or memory. A 3-way slice of a GPU would not play nice with the memory bus, unless it is a tri-channel setup with one bus per slice, and it would be difficult for the middle slice with its neighbors hogging the majority of the pad space.
 
http://msnerd.tumblr.com/post/12233928364/clarity

The Xbox is another story altogether. With a heady mix of rumors, tips and speculation, I am now stating that Xbox codename “loop” (the erstwhile XboxTV) will indeed debut a modified Win9 core. It will use a Zune HD-like hardware platform—a “main” processor with multiple dedicated assistive cores for graphics, AI, physics, sound, networking, encryption and sensors. It will be custom designed by Microsoft and two partners based on the ARM architecture. It will be cheaper than the 360, further enabling Kinect adoption. And it will be far smaller than the 360. It will also demonstrate how Windows Phone could possible implement Win9’s dev platform on the lower end.
 
So GAF then extrapolated that Xbox Loop is more like "Xbox TV" and the real next Xbox is Xbox PU ("product update"), which is listed as 2014. So another year farther away, if you believe that.

This does make some sense to me (an "Xbox TV" set-top box without the hardcore gaming hardware makes a lot of sense imo), but on the whole those msnerd rumors now seem less credible (especially since some MS guy laughed at them on Twitter).
 

The idea of AI core, physics core, graphics core sort of gives away the fact that MSnerd doesn't know wtf he is talking about.
 