What will NEXT generation consoles do about bandwidth?

Rangers

Legend
The most crucial/interesting design issue with the PS3/360, imo, is that they are limited to 128-bit buses.

Apparently this is because they are pin limited: the die MUST be a certain size or larger to support a 256-bit pinout. It's the same reason you never see 256-bit buses on truly budget PC cards. These consoles are headed to $99 one day, so they cannot afford a die that cannot shrink much over time.
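To put rough numbers on that pad limit, here's a quick Python sketch; the pad pitch, power/ground overhead, and miscellaneous signal count are illustrative assumptions, not vendor data:

```python
# Rough pad-limit estimate: I/O pads sit on the die perimeter at a fixed
# pitch, so a wide external bus forces a minimum die size.
# All numbers here are illustrative assumptions, not vendor data.

PAD_PITCH_MM = 0.05        # assumed 50 um pad pitch
PWR_GND_OVERHEAD = 1.5     # assumed extra power/ground pads per signal pad

def min_die_edge_mm(bus_bits, other_signals=200):
    """Edge length of the smallest square die whose perimeter fits the pads."""
    total_pads = (bus_bits + other_signals) * (1 + PWR_GND_OVERHEAD)
    return total_pads * PAD_PITCH_MM / 4   # pads spread over four edges

for bits in (128, 256):
    edge = min_die_edge_mm(bits)
    print(f"{bits}-bit bus: ~{edge:.1f} mm edge, ~{edge * edge:.0f} mm^2 minimum die")
```

The absolute figures are invented, but the shape of the problem is real: the 256-bit die can never shrink below roughly twice the area of the 128-bit one, no matter the process node.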

But next gen (PS4/Xbox3), the problem will only grow. The 128-bit limit will apply no matter the process size.

How do you get around it? My ideas:

1. eDRAM. I think this might become more attractive because, as process shrinks occur, you can dedicate more and more transistors to eDRAM without it becoming unduly costly. In the future, if you have, say, a one-billion-transistor chip, dedicating 200 million of those transistors to eDRAM will not be as big a deal. I think a sweet spot needs to be found, perhaps providing enough space for 1080p with a compromise of 2xAA without tiling (see the sketch at the end of this post for the framebuffer math). It seems to me the 360 has shown us that tiling should be avoided if possible.

2. Keep the 128-bit bus and scale memory throughput with memory speeds alone. GDDR4 is already at 1 GHz, whereas the 360/PS3 are only around 700 MHz. This will limit you somewhat, but perhaps it will be "enough" memory bandwidth? (The same sketch below shows what those clocks deliver.)

3. Some new serial memory technology that gains high speeds without a 256-bit bus. No idea what this might be, but something like Rambus?

To me, by far the most elegant solution, at least on the surface, is the eDRAM. Any thoughts?
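To put some rough numbers on options 1 and 2, a quick sketch; the framebuffer layout (32-bit colour plus 32-bit Z/stencil per sample) and the DDR signalling model are simplifying assumptions:

```python
# Two quick checks on options 1 and 2 above.

# 1. eDRAM needed for a 1080p framebuffer with 2xAA, no tiling:
width, height, aa_samples = 1920, 1080, 2
bytes_per_sample = 4 + 4                       # colour + Z/stencil
fb_mb = width * height * aa_samples * bytes_per_sample / 2**20
print(f"1080p 2xAA framebuffer: ~{fb_mb:.1f} MB of eDRAM")

# 2. Bandwidth of a fixed 128-bit bus as memory clocks scale:
def bandwidth_gb_s(bus_bits, mem_clock_mhz, transfers_per_clock=2):
    # transfers_per_clock = 2 models DDR-style signalling
    return bus_bits / 8 * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9

print(f"128-bit @ 700 MHz (360-class GDDR3): {bandwidth_gb_s(128, 700):.1f} GB/s")
print(f"128-bit @ 1 GHz GDDR4:               {bandwidth_gb_s(128, 1000):.1f} GB/s")
```

So a no-tiling 1080p target lands around 32 MB of eDRAM, and a 1 GHz GDDR4 bus buys roughly 40% more bandwidth than today's consoles.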
 
Here you'll find some methods that Intel may use to tackle the bandwidth problem:

By 2015 Rattner predicted that Intel CPUs would have 10s or 100s of cores on each die, which in turn would require a lot of memory bandwidth. The problem with memory bandwidth at that level is that you effectively become pin limited, you can't physically have enough pins leaving your microprocessor to allow for a wide enough memory bus delivering the sort of bandwidth necessary to feed those 10s or 100s of cores.

[Slide: the memory bandwidth challenge]


One solution that Rattner presented was 3D die and wafer stacking. Normally microprocessor circuits are laid out on a flat 2D surface; as the name implies, 3D die and wafer stacking builds on top of that, literally.

First let's talk about wafer stacking; wafer stacking involves stacking two identically sized/shaped wafers on top of each other, and using through-silicon vias (interconnects) to connect the top wafer layer to the bottom layer. The best example of an application of this would be a DRAM wafer sitting on top of a CPU wafer, meaning that you would have memory (not cache, that would still be inside your CPU) sitting directly on top of your CPU.

[Slide: 3D wafer stacking]


With wafer stacking, instead of having hundreds or thousands of pins between your CPU and main memory, you have 1 to 10 million connections between your CPU and memory, directly increasing memory bandwidth. What's interesting is that this method of stacking could also mean the end of external memory.
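For a rough sense of the scale difference: the via count below comes from the article's "1 to 10 million connections", but the 10% data-line fraction and both clock rates are invented for illustration:

```python
# Package pins vs through-silicon vias, very roughly.

def tb_s(data_lines, megatransfers):
    # assumes 1 bit per line per transfer
    return data_lines * megatransfers * 1e6 / 1e12

print(f"256-bit bus @ 2000 MT/s: {tb_s(256, 2000):.2f} Tb/s")
print(f"100k vias   @  200 MT/s: {tb_s(100_000, 200):.1f} Tb/s")
```

Even with each via toggling ten times slower than a package pin, sheer connection count wins by about 40x.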

Die stacking is another possibility, where you stack multiple different-sized dies on top of the CPU core logic; those dies could be DRAM as well as flash memory, or anything else really. Intel showed off an 8-layer configuration using die stacking, which according to Intel is a very realistic option.

[Slide: 3D die stacking]


Rattner was fairly confident in the potential of die and wafer stacking, so it's a technology that we'll definitely have to keep an eye on as time goes on. There are definitely limitations to consider, such as power and thermal dissipation, but there are solutions in the works for that as well (e.g. nanoscale thermal pumps).
http://www.anandtech.com/tradeshows/showdoc.aspx?i=2367&p=3

...
Our own Johan De Gelas caught up with Justin Rattner after his keynote to get some more information about stacked die and wafer technology:

1) Current Intel research estimates that about 256MB of memory can be stacked on top of a CPU (die stacking). A huge latency reduction is the result, but if you need more memory you have to go off die of course.

2) Different thermal expansion between the layers might of course ruin the chip. Intel is looking into this but Justin believes that it is not going to stop the stacked die show.

3) Right now stacked die is obviously in its infancy; they still have to move to the next step: one memory chip on top of the other.

We're quite excited about the possibility of stacked DRAM although it will definitely be a long time before we see it productized.
http://www.anandtech.com/tradeshows/showdoc.aspx?i=2368
 
Firstly, eDRAM is not going to help at all. eDRAM is used as a solution in the 360 only for framebuffer bandwidth, not for general memory bandwidth. It ameliorates a bandwidth limitation between the Xenos AA logic and the framebuffer, but that only works because the framebuffer is so small (relative to the rest of memory). Putting large amounts of eDRAM onto a die is simply not practical. The size of main memory relative to the size of the usable die space is just way too big for it to be effective, at least for any time in the foreseeable future.
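A quick illustration of that size mismatch, using the 360's own numbers; the daughter-die area is an assumed ballpark figure, not an official spec:

```python
# Scale the 360's eDRAM daughter die up to main-memory size.

EDRAM_MB = 10           # Xenos daughter-die eDRAM capacity
EDRAM_DIE_MM2 = 80      # assumed area of that die (eDRAM plus AA logic)
MAIN_MEMORY_MB = 512    # the 360's unified main memory

scale = MAIN_MEMORY_MB / EDRAM_MB
print(f"{MAIN_MEMORY_MB} MB on-die = {scale:.0f}x the capacity, "
      f"on the order of {scale * EDRAM_DIE_MM2:,.0f} mm^2 at the same density")
```

Even granting generous density scaling, that's several times larger than anything a fab can actually produce as a single die.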

Next, I don't think 128-bit is a hard and fast limit; it's just that anything wider is not currently cost-effective, though it may become so in the future. Certainly, 256-bit buses will be cheap to manufacture far sooner than large amounts of eDRAM.

Lastly, you should really read up on RAMBUS; it was designed with exactly this problem in mind. It uses a narrow serial interface to pump out small amounts of data at very high clock rates. It was invented as a reaction to conventional DRAM design, where every "generation" of memory just incrementally added more banks and made the bus wider and wider. Ironically (or not), RDRAM has its own issues. Although it can deliver massive amounts of bandwidth, it also has serious issues with latency (it is a serial bus, after all) and power consumption. So, as with everything, there's a tradeoff involved.

Serial memory interfaces are definitely a possibility for the future. They can be used to great effect in a closed system like a console. Look at how well the PS2 did with RDRAM being soldered directly to the board, thus shortening the bus length and minimizing some of the latency problems while still gaining massive bandwidth. Of course, this also added a lot of complexity to the entire bus, as all data had to be multiplexed and demultiplexed onto this narrow bus throughout the system.
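For a feel of the trade, compare the PS2's two narrow RDRAM channels with a wide bus of the same era; the 128-bit configuration below is a hypothetical invented for the comparison:

```python
# Narrow-and-fast vs wide-and-slow: same-era bandwidth comparison.

def gb_s(width_bits, megatransfers):
    return width_bits / 8 * megatransfers * 1e6 / 1e9

ps2_rdram = 2 * gb_s(16, 800)   # two 16-bit RDRAM channels @ 800 MT/s each
wide_sdr = gb_s(128, 200)       # hypothetical 128-bit SDR bus @ 200 MT/s

print(f"PS2 RDRAM (2 x 16-bit @ 800 MT/s): {ps2_rdram:.1f} GB/s")
print(f"128-bit bus @ 200 MT/s:            {wide_sdr:.1f} GB/s")
```

Both land at 3.2 GB/s; the narrow interface buys its bandwidth with clock rate instead of pins, which is exactly where the multiplexing complexity mentioned above comes from.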

Bottom line: manufacturers are always going to go with whichever option delivers the performance they need at the cheapest price. I think right at this moment a wide bus with lots of cheap DDR memory is the winner, but, as you said, widening the bus increases the pin count and the cost, so the jury is still out on which way the next gen will go. It's too tough to tell whether widening the bus further will become so much more expensive that we will need to go back to serial interfaces.

FYI, I highly recommend you read the ArsTechnica article on memory technologies, particularly RAMBUS, located here.
 
I don't have the patience to reiterate a by-now very old discussion yet again, but current Rambus memory tech does not have any particular issues with latency. If anything, it's said to be less of an issue than with current DDR DRAM, to some degree due to the separate command and data buses Rambus uses and its general rethink of the whole memory interface, while with DDR, command and data are multiplexed on the same pins, and it is a rather hodge-podge solution full of legacy crap that limits performance.
 
Hey... with taller CPU packages, they can make heat sinks that fit around them exactly. :D
 
I don't have the patience to reiterate a by-now very old discussion yet again, but current Rambus memory tech does not have any particular issues with latency. If anything, it's said to be less of an issue than with current DDR DRAM, to some degree due to the separate command and data buses Rambus uses and its general rethink of the whole memory interface, while with DDR, command and data are multiplexed on the same pins, and it is a rather hodge-podge solution full of legacy crap that limits performance.
Fair enough, I haven't really kept up with the state of the art with regards to memory technologies lately. I'll do a search and see if I can dig up threads about it. But I do keep abreast of it enough to know that RAMBUS adoption does not seem to be too high. If the latency issues have been resolved, what's stopping widespread use of it? Is it simply the costs involved (licensing and manufacturing, in particular)?
 
What about heat issues with wafer stacking? Wouldn't that generate tremendous heat?

They mention something about it:

There are definitely limitations to consider, such as power and thermal dissipation, but there are solutions in the works for that as well (e.g. nanoscale thermal pumps).
 
Fair enough, I haven't really kept up with the state of the art with regards to memory technologies lately. I'll do a search and see if I can dig up threads about it. But I do keep abreast of it enough to know that RAMBUS adoption does not seem to be too high. If the latency issues have been resolved, what's stopping widespread use of it? Is it simply the costs involved (licensing and manufacturing, in particular)?

Well, there is a 64-bit XDR interface in the PS3...
 
The most crucial/interesting design issue with the PS3/360, imo, is that they are limited to 128-bit buses.

Seriously, you are not ready yet to jump to the next generation ... imho, you should first learn to appreciate the technologies in the PS3 better. ;)
 