Memory bandwidth vs memory amount *spin off*

If a console were to use an interposer for memory attachment/stacking, would it be worthwhile to include embedded RAM in the interposer itself?
 
That's true, but none of the current PC or console GPUs work like that.

GPU-accessible read/write EDRAM would practically nullify the bandwidth cost of deferred g-buffer generation/sampling and of post-process rendering (and other full-screen effects that are consumed later in the pipeline). It would also be great for GPU compute. However, it wouldn't nullify the bandwidth cost of shadow maps unless you had a huge amount of EDRAM. A single 4096x4096 shadow map atlas takes 64 MB (at 32 bits per texel), and even that isn't enough if you want above-720p rendering with good shadow map quality.
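For anyone who wants to check the shadow map number: the 64 MB figure works out if each texel is 4 bytes (a 32-bit depth format). Here is a rough Python sketch of that arithmetic. The helper name `texture_size_mib` and the 16-bit comparison are my own additions for illustration, and the calculation ignores mipmaps, padding and any compression.

```python
def texture_size_mib(width, height, bytes_per_texel):
    """Uncompressed size of a 2D texture in MiB (no mipmaps, no padding)."""
    return width * height * bytes_per_texel / (1024 * 1024)

# Assumption: 4 bytes per texel (32-bit depth), which is what the 64 MB figure implies.
atlas_32bit = texture_size_mib(4096, 4096, 4)

# Hypothetical comparison: the same atlas with a 16-bit depth format.
atlas_16bit = texture_size_mib(4096, 4096, 2)

print(f"4096x4096 atlas, 32-bit texels: {atlas_32bit:.0f} MiB")
print(f"4096x4096 atlas, 16-bit texels: {atlas_16bit:.0f} MiB")
```

Even the halved 16-bit version is still a big slice of any realistic EDRAM budget, which is the point being made about shadow maps.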

---

Last weekend I bumped into an article about Sequoia, the new #1 supercomputer on the TOP500 list. It doubles the performance of the previous champ and consumes almost 40% less power. The most interesting thing is that it uses EDRAM to reach high memory bandwidth, and its PowerPC A2 CPU is basically a spiritual successor to Xenon. It has in-order execution, powerful vector units, lots of cores (Xenon had the highest core and thread count when it was released) and SMT/hyperthreading (four-way this time).

16 cores x 4 threads per core = 64 threads per CPU. Each CPU has a dual-channel DDR3-1333 memory bus and 32 MB of EDRAM. This is an interesting design if we analyze its memory performance. The large chunk of EDRAM gives it very fast local work memory. Compared to the Cell SPU local stores (256 KB each), the EDRAM is 128x larger. That's a huge deal, and it lets you run a much wider selection of algorithms inside the fast local work memory. The main memory bus isn't wide, but the four-way SMT gives the chip good memory latency hiding capacity. The low 1.6 GHz CPU clock also means that memory latency (in cycles) remains low. Put 1.6 million of these processing cores in the same room, and you get a nice chunk of processing power (and a nice amount of combined EDRAM bandwidth) :)
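The numbers in this post are easy to sanity-check. Below is a small Python sketch of the arithmetic. The thread count and the EDRAM vs. local store ratio come straight from the figures above. The main memory bandwidth estimate is only a theoretical peak and assumes a commodity 64-bit (8 bytes per transfer) DDR3 channel, which the article did not confirm; the real chip's memory interface may well be wider, so treat that number as a lower-bound guess.

```python
# Figures taken from the post.
CORES_PER_CPU = 16
THREADS_PER_CORE = 4
EDRAM_BYTES = 32 * 1024 * 1024          # 32 MB of EDRAM per CPU
SPU_LOCAL_STORE_BYTES = 256 * 1024      # 256 KB Cell SPU local store

threads_per_cpu = CORES_PER_CPU * THREADS_PER_CORE
edram_vs_local_store = EDRAM_BYTES // SPU_LOCAL_STORE_BYTES

# Assumptions (not from the article): nominal 1333 MT/s per channel and a
# standard 64-bit channel, i.e. 8 bytes moved per transfer.
CHANNELS = 2
TRANSFERS_PER_SECOND = 1333e6
ASSUMED_BYTES_PER_TRANSFER = 8

peak_gb_per_s = CHANNELS * TRANSFERS_PER_SECOND * ASSUMED_BYTES_PER_TRANSFER / 1e9

print(f"Threads per CPU: {threads_per_cpu}")
print(f"EDRAM is {edram_vs_local_store}x one SPU local store")
print(f"Theoretical main memory peak (assumed 64-bit channels): {peak_gb_per_s:.1f} GB/s")
```

With those assumptions the main memory bus comes out at roughly 21 GB/s shared by 64 hardware threads, which is why the EDRAM and the SMT latency hiding matter so much for this design.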


Do you have a link to that article?

Those 16 cores, 4 threads per core and vector units sound like VTE's?! Very interested to read that article :)
 