What solutions do you propose?
Nothing completely new, just bandwidth compression, and maybe something a bit more exotic..
On the GPU side we continue to see bandwidth needs increase, with IHVs moving up to 384-bit and 512-bit memory buses. From a console perspective, it seems we would be lucky to see a 256-bit bus in the new consoles.
Don't forget that we will also have faster memories.
On the other side of the MB you have the CPU, which is also increasing in cores. I know Intel is expecting to have a wide number of memory controllers embedded.
That's another reason why I don't see EDRAM happening next gen; CPU and GPU will be even closer than they are now, possibly sharing more resources.
So we may continue to see an increase in GPU bandwidth needs (maybe not as much... but where is the cap where it begins to level off? Any suggestions nAo?) and on the CPU end we will see bandwidth needs increase both with core increases as well as clock increases.
I don't believe there's a cap; bandwidth requirements will keep going up and up indefinitely..
So what should be done about it?
Split pools? That hasn't been popular and could be costly. A unified pool is nice, but you could have some serious client contention. A unified pool with an eDRAM scratchpad for certain tasks?
Given that a unified pool is more desirable but not always attainable, I expect next gen consoles to mostly use unified pools, simply because CPU and GPU will end up either together or very close architecture-wise. They will need to cooperate more and more..
Though I don't see them on the same chip, unless we are talking about some kind of next gen underpowered console (Wii2).
joker454 said:
Why? Having dirt cheap alpha and free pixel side msaa is quite helpful.
The problem is that it's not dirt cheap from a manufacturing standpoint, and unless tomorrow we have some kind of edram that is easy to embed with logic on the same die I think it's not worth the hassle (Arun, don't cite zram!)
Alpha to coverage, or rendering large alpha surfaces at low res like they were suggesting at PS3Devcon doesn't cut it in many cases, especially when you need to preserve texture detail. Being able to support lots of overdraw (for certain effects) is also very helpful.
As I already proposed many times on these forums, for all the stuff you're worried about there's imho a relatively simple solution: small on-chip buffer + wide internal bus + (CPU) tiling.
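To make the (CPU) tiling part concrete, here is a minimal sketch of conservative bounding-box binning: each triangle is assigned to every screen tile its bounding box overlaps, so each tile can later be rendered entirely out of a small on-chip buffer. The function name `bin_triangles`, the 32-pixel tile edge and the sample triangles are assumptions made for illustration only, not something taken from a real console SDK.

```python
import math

TILE = 32  # tile edge in pixels (assumed for this sketch)

def bin_triangles(triangles, screen_w, screen_h, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    tiles_x = (screen_w + tile - 1) // tile
    tiles_y = (screen_h + tile - 1) // tile
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Skip triangles whose bounding box is entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= screen_w or min(ys) >= screen_h:
            continue
        # Clamp the bounding box to the tile grid.
        x0 = max(math.floor(min(xs)) // tile, 0)
        x1 = min(math.floor(max(xs)) // tile, tiles_x - 1)
        y0 = max(math.floor(min(ys)) // tile, 0)
        y1 = min(math.floor(max(ys)) // tile, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

# Two hypothetical screen-space triangles on a 128x64 target.
triangles = [
    [(5, 5), (20, 5), (5, 20)],
    [(30, 30), (70, 30), (30, 40)],
]
bins = bin_triangles(triangles, 128, 64)
```

Bounding-box binning is conservative: a tile can receive a triangle that doesn't actually touch it, which costs some wasted work but never drops coverage.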
Supporting 32x32xmaxAA tiles on chip would be pretty amazing, I'd use something like that for all the bandwidth heavy compositing (particles or other fancy effects).. to be honest one could implement the last part of the REYES rendering pipeline with that..
For example, a 32x32 tile with a 128 bits per pixel render target + zbuffer + 8xAA would take just 160 KB on chip (assuming a 32-bit zbuffer: 1024 pixels x 8 samples x 20 bytes), probably an awfully small area on 32 nm chips (the difference here is that this memory wouldn't be EDRAM).
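The 160k figure can be checked with a few lines of arithmetic. This sketch assumes a 32-bit zbuffer and that every AA sample stores full color and depth with no compression; the helper name `tile_bytes` is made up for the example.

```python
def tile_bytes(width, height, color_bits, depth_bits, samples):
    """Bytes needed to hold one multisampled tile (color + depth) on chip."""
    bits_per_sample = color_bits + depth_bits
    return width * height * samples * bits_per_sample // 8

# 32x32 tile, 128-bit color, 32-bit depth (assumed), 8 samples per pixel.
size = tile_bytes(32, 32, color_bits=128, depth_bits=32, samples=8)
print(size, size // 1024)  # 163840 bytes, i.e. 160 KiB
```

Dropping to 4xAA or to a 64-bit render target halves or nearly halves the footprint, so the same buffer could hold larger tiles in those modes.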
Moreover we (as game developers) relied on bandwidth to simulate a lot of stuff that can be done in other ways thanks to programmable hw.
Now that even edram doesn't cut it anymore (see Lost Planet, they needed to render particles to lower res buffers anyway) we just have to re-think that part of the pipeline.
I know a lot of games that use these tricks and 99% of the time ppl just don't notice the difference.
BTW.. there's a very good article about these techniques in the latest GPU Gems (GPU Gems 3).
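A rough back-of-the-envelope sketch of why lower res particle buffers help: blending reads and writes the destination for every layer of overdraw, so framebuffer traffic scales with pixel count. The resolution, 4 bytes per pixel and 20 layers of overdraw below are purely illustrative assumptions (not numbers from Lost Planet or any other game), and the estimate ignores texture fetches and the cost of compositing the low res buffer back onto the full res one.

```python
def blend_traffic_bytes(width, height, bytes_per_pixel, overdraw, scale=1):
    """Framebuffer bytes moved by alpha blending: each layer reads and writes every pixel."""
    pixels = (width // scale) * (height // scale)
    return pixels * bytes_per_pixel * 2 * overdraw

# Hypothetical case: 1280x720 target, 4 bytes per pixel, 20 layers of full-screen particles.
full = blend_traffic_bytes(1280, 720, 4, overdraw=20)
half = blend_traffic_bytes(1280, 720, 4, overdraw=20, scale=2)
print(full, half, full // half)  # half res in each dimension moves 4x fewer bytes
```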
Plus it's not like we're gonna see another jump in resolution next generation, so it should be possible to get enough edram in there to avoid tiling in a few years.
Maybe, but at what cost? Moreover I don't want some EDRAM that doesn't allow me to read back data without resolving to an external buffer, which most of the time just defeats the original purpose of having EDRAM.
I empirically found that for anything that isn't using tons of alpha blending it's already very difficult to be bandwidth limited on next gen consoles.
The exceptions are trivial downsampling passes or stuff like that; otherwise a lot of bandwidth goes to waste.
Maybe we shouldn't ask for more bandwidth; we should just learn how to spread its usage across the whole frame (but I don't know how to do that..)