In a similarly simple scenario we know GDDR5 can reach 90-91% of the peak BW, so roughly 160GB/s out of 176GB/s in a realistic best case.
Does anyone have more context on this (particularly the bar on the far right)?
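Rough numbers behind that claim (a quick sketch; the 176GB/s peak and the 90-91% efficiency range are just the figures quoted above):
[code]
# Effective GDDR5 bandwidth at the quoted efficiency range.
PEAK_GBS = 176.0  # PS4 GDDR5 theoretical peak (GB/s)

for eff in (0.90, 0.91):
    print(f"{eff:.0%} of {PEAK_GBS:.0f} GB/s = {PEAK_GBS * eff:.1f} GB/s")
# 90% -> 158.4 GB/s, 91% -> 160.2 GB/s, i.e. "roughly 160GB/s"
[/code]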
We've been over this. The architects arrived at the ~145GB/s figure for the ESRAM by measuring real games. This is not some benchmark test that maxes out the bandwidth with bogus code, so if you are comparing game code on the X1 against synthetic code on the PS4 I don't see how that's a good comparison.
The GPU numbers seem lower than 90% utilization, but since the GPU is more than ROPs that isn't unexpected.
Is that 90% figure for a ROP test, or more generally? Mmhm... so where does this 135GB/s really fit when there's a 90% figure for PC cards?
"You're completely forgetting that you do more cycles in the same time frame."
You need more cycles for one read on DRAM, and GDDR5 is even worse than DDR3 in that respect: there are cycles that are not effective at all. The higher clock rate only helps to reduce the latency, not the number of cycles needed.
"That doesn't make any sense. Higher clock rates mean you get to plow through your data before you queue up, and you also have much better throughput."
For GDDR5/DDR3, a higher clock rate means that more cycles get lost.
Only if you read or write large chunks can you use the cycles a bit more efficiently on DRAM. And yes, the latencies are much lower on the ESRAM, which means its bandwidth can be used more effectively (small reads/writes).
The higher clock rate makes the latency of GDDR5 and DDR3 almost equal (latency, not bandwidth), but the ESRAM still has much better latency than DDR3. Latency only matters where many small operations are done; GDDR5 is good for big things (like textures) that are not accessed frequently. And if you often switch between read and write you lose even more cycles, which again means a loss of bandwidth.
And because MS uses DDR3 with lower bandwidth than the GDDR5 in the PS4, they can't afford to lose any bandwidth to those ineffective cycles, so they use the ESRAM to compensate: many of the small operations can be done in the small ESRAM where they don't harm the bandwidth.
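To make the "lost cycles" point concrete, here's a toy model (the overhead and burst numbers are made up for illustration, not taken from any datasheet):
[code]
# Toy model: DRAM bandwidth efficiency vs. access size.
# All timing numbers are illustrative assumptions, not real GDDR5/DDR3 figures.
PEAK_GBS = 176.0          # theoretical peak bandwidth
BYTES_PER_CYCLE = 32.0    # assumed bytes transferred per busy cycle
OVERHEAD_CYCLES = 20.0    # assumed dead cycles per access (row activate, r/w turnaround)

def effective_bw(access_bytes: float) -> float:
    busy = access_bytes / BYTES_PER_CYCLE   # cycles actually moving data
    return PEAK_GBS * busy / (busy + OVERHEAD_CYCLES)

for size in (64, 256, 4096, 65536):
    print(f"{size:6d}-byte accesses -> ~{effective_bw(size):5.1f} GB/s effective")
# Small scattered accesses waste most cycles on overhead; large streaming
# accesses approach the peak -- which is the argument being made above.
[/code]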
Please think about it for a moment. You have small buffers on the GPU that must be filled with data; that means small reads/writes. The 32MB was only the render target. On PS4 it might be a little bigger, because you "only" have one memory pool, but the situation is still the same: if the render target is not spread all over the memory you can never reach the ESRAM's bandwidth. The worst case would be your render target sitting in just one physical memory module (512MB of memory per module), so you would be limited to roughly 11GB/s max. The only way to get it faster is to spread it across all memory modules so you can theoretically reach the maximum bandwidth, but now you have even smaller chunks, which reduces the effectiveness of the DRAM access tricks, and you actually lose more bandwidth.
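And the "11 GB/s" worst case spelled out (assuming the 8GB is made up of 16 x 512MB chips that share the 176GB/s peak evenly, which is a simplification):
[code]
# Worst-case sketch: a render target confined to a single physical GDDR5 chip.
TOTAL_PEAK_GBS = 176.0   # PS4 peak across the whole memory interface
NUM_CHIPS = 16           # assumed: 16 x 512MB (4Gb) chips for the 8GB pool

per_chip = TOTAL_PEAK_GBS / NUM_CHIPS
print(f"One chip's share of the peak: {per_chip:.0f} GB/s")  # ~11 GB/s
# Spreading the render target across all chips is what recovers the full
# interface width, at the cost of smaller per-chip accesses.
[/code]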
I'm not saying it is the holy grail, but the ESRAM is really, really fast for its size.
And all I said in my last post was that bandwidth has not been the limiting factor in Xbone development so far.
Strange said: To put it in context,
GDDR5 timings as provided by the Hynix datasheet: CAS = 10.6ns, tRCD = 12ns, tRP = 12ns, tRAS = 28ns, tRC = 40ns
5500Mhz cycle time is 181 nanoseconds. Even if you add all the latencies above together, you won't go over the cycle time.
Not a memory expert, but your numbers seem wrong.
1 sec = 1,000,000,000 ns
5500MHz = 5,500,000,000 Hz
1 sec / 5500MHz gives 0.18ns per cycle. I think you are off by about 1000x, but it's been a long day of coding and my brain is pretty mushy.
Also note that GDDR5 is quad pumped, so in reality the clock is running at 1375MHz; not sure what the implication is, just throwing it out there.
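For reference, the arithmetic (a quick check; the 1375MHz figure is the quad-pumped command clock mentioned above):
[code]
# Cycle-time check for "5500MHz" (5500 MT/s effective) GDDR5.
effective_rate_hz = 5.5e9                   # effective data rate
command_clock_hz = effective_rate_hz / 4    # quad pumped -> 1375 MHz

print(f"Effective cycle time: {1e9 / effective_rate_hz:.3f} ns")  # ~0.182 ns
print(f"Command clock period: {1e9 / command_clock_hz:.3f} ns")   # ~0.727 ns
# So "181 nanoseconds" is off by roughly 1000x, and the datasheet latencies
# (e.g. tRC = 40ns) amount to a couple hundred effective cycles, not a fraction of one.
[/code]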
In general I've always found overly simplified models not very helpful for understanding how things actually work, so I'll leave it at that.
Why do you insist that the system cannot spread the load across all the memory modules/pins???
That's what the RAM and bandwidth are there for. Honestly, what else will you do with it? Stream 60 GB/s of audio data? Process 60 GB/s of AI entities and physics objects?
I am not sure why one would want to store any graphics buffer in main memory and read it back; it would eat a large portion of the system-wide bandwidth.
Had eDRAM been ready at the 28nm node at GloFo or TSMC during the XO development timeline, and given that you can fit approximately 3x as much eDRAM in the same space as 8T SRAM, do you think MS would have gone with a full 90MB+ (same real-estate allocation) and maintained the same chip size, or allocated more towards CUs? Or shrunk the chip and saved on costs? Or something in the middle?
Maybe 2-4 more CUs and a 50-60MB eDRAM cache, giving developers a much heftier scratchpad for buffers and other high-bandwidth assets.
What ifs are so much fun.