Nick Baker: Over that interface, each lane - to ESRAM is 256-bit making up a total of 1024 bits and that's in each direction. 1024 bits for write will give you a max of 109GB/s and then there's separate read paths again running at peak would give you 109GB/s. What is the equivalent bandwidth of the ESRAM if you were doing the same kind of accounting that you do for external memory... With DDR3 you pretty much take the number of bits on the interface, multiply by the speed and that's how you get 68GB/s. That equivalent on ESRAM would be 218GB/s. However, just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency.
The same discussion with ESRAM as well - the 204GB/s number that was presented at Hot Chips is taking known limitations of the logic around the ESRAM into account. You can't sustain writes for absolutely every single cycle. The writes is known to insert a bubble [a dead cycle] occasionally... One out of every eight cycles is a bubble, so that's how you get the combined 204GB/s as the raw peak that we can really achieve over the ESRAM. And then if you say what can you achieve out of an application - we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally.
One thing I should point out is that there are four 8MB lanes. But it's not a contiguous 8MB chunk of memory within each of those lanes. Each lane, that 8MB is broken down into eight modules. This should address whether you can really have read and write bandwidth in memory simultaneously. Yes you can there are actually a lot more individual blocks that comprise the whole ESRAM so you can talk to those in parallel and of course if you're hitting the same area over and over and over again, you don't get to spread out your bandwidth and so that's why one of the reasons why in real testing you get 140-150GB/s rather than the peak 204GB/s is that it's not just four chunks of 8MB memory. It's a lot more complicated than that and depending on how the pattern you get to use those simultaneously. That's what lets you do read and writes simultaneously. You do get to add the read and write bandwidth as well adding the read and write bandwidth on to the main memory. That's just one of the misconceptions we wanted to clean up.
Andrew Goossen: If you're only doing a read you're capped at 109GB/s, if you're only doing a write you're capped at 109GB/s. To get over that you need to have a mix of the reads and the writes but when you are going to look at the things that are typically in the ESRAM, such as your render targets and your depth buffers, intrinsically they have a lot of read-modified writes going on in the blends and the depth buffer updates. Those are the natural things to stick in the ESRAM and the natural things to take advantage of the concurrent read/writes.
Digital Foundry: So 140-150GB/s is a realistic target and you can integrate DDR3 bandwidth simultaneously?
Nick Baker: Yes. That's been measured.