Seriously they only now figured it can do read+write at the same time?
That would effectively double the bandwidth. That is not what is being described here, especially the "133GB/s during FP16x4 blend".
What do you mean in the bolded part? Sorry, my vision is blurred, but I don't understand whether you mean the ESRAM is only connected to the GPU or that it is physically in another part of the die compared to where it would be expected to be.

If the physical interface is 128 bytes wide at 800 MHz, that would be the bandwidth the ports can provide.
The blending scenario could be possible if the queuing logic has some kind of forwarding or coalescing capability. Within the length of the read/write queue, detecting a read to a location with a pending write could allow the read to source from the queue and allow the next operation to issue. A small amount of read buffering could also provide a secondary read location within a small window.
Maybe it can also combine contiguous writes, but that might be an unnecessary complication, since no client can send enough write data to tell the difference.
The GPU's read capability is wider than its write port, so reads forwarded or coalesced in tandem with another access would actually have additional data paths that can carry the data.
This brings back my earlier questions as to where in the memory subsystem the eSRAM hooks in.
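For what it's worth, a quick back-of-the-envelope check of that interface figure (the 128-byte, 800 MHz numbers come from the post above; the one-transfer-per-cycle assumption is mine):

```python
# Rough sanity check of the eSRAM interface numbers quoted above.
# Assumes a 128-byte-wide port at 800 MHz with one transfer per cycle.
bus_width_bytes = 128
clock_hz = 800e6

one_way = bus_width_bytes * clock_hz   # bytes per second, one direction
print(one_way / 1e9)                   # 102.4 GB/s, the familiar unidirectional figure
print(2 * one_way / 1e9)               # 204.8 GB/s if a read and a write overlapped every cycle
```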
I knew the ESRAM is flexible/accessible enough with regard to reads/writes by the GPU.

They haven't increased performance. They have changed how they calculate things.
Even within the cache/memory subsystem of the GPU, there would be different parts of the pipeline it could fit into, like whether it's sitting behind the spot where there would otherwise be a standard graphics memory controller, or whether it's sitting in front of the control logic.

What do you mean in the bolded part? Sorry, my vision is blurred, but I don't understand whether you mean the ESRAM is only connected to the GPU or that it is physically in another part of the die compared to where it would be expected to be.
At the end of the day, if Mark Cerny did the math and worked out that 1TB/s of bandwidth from eDRAM wouldn't offset the advantages of a pure GDDR5 solution, then what is the extra 88% really worth? Other than trying desperately to conjure some magic goodwill juju.
DF has more about XBO here.
Now that close-to-final silicon is available, Microsoft has revised its own figures upwards significantly, telling developers that 192GB/s is now theoretically possible.
Apparently there is no suggestion of any downclock. It's still 800MHz for the people making X1 games.
Also, this is not really the eDRAM solution Cerny talked about.

You completely dismiss the possibility that Cerny in fact isn't an omnipotent, godlike person and might actually make a wrong choice in terms of pure performance and other factors like cost, etc.?
It sounds like the news about the ESRAM is being misinterpreted in the Eurogamer article. If MS is claiming simultaneous read/write operations in certain instances yielding a theoretical total bandwidth of 192 GB/s, that means unidirectional bandwidth would be 96 GB/s (i.e., 192 / 2 = 96). This would be a downgrade from 102.4 GB/s unidirectional (specifically, a 50 MHz downclock of the GPU/ESRAM from the original 800 MHz down to 750 MHz).
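Quick arithmetic on that reading (purely illustrative; it just assumes bandwidth scales linearly with clock on the same interface):

```python
# If 192 GB/s were really just 2 x unidirectional, what clock would that imply?
# Assumes bandwidth scales linearly with clock on the same 128-byte port.
original_bw_gbs = 102.4    # GB/s at 800 MHz
claimed_total_gbs = 192.0  # GB/s figure from the article

unidirectional = claimed_total_gbs / 2                     # 96.0 GB/s
implied_clock_mhz = 800 * unidirectional / original_bw_gbs
print(unidirectional, implied_clock_mhz)                   # 96.0 GB/s -> 750 MHz, a 50 MHz drop
```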
I suspect, considering they say a theoretical 192GB/s is possible but have given an alpha blend example that reaches 133GB/s, that it is some kind of function, or set of functions, or a way of ordering them, that they hadn't investigated properly, which allows a read and a write to happen at the same time in the ESRAM exclusively. A blend operation seems close to an ideal-case scenario, so it remains to be seen how much more than the 33GB/s improvement is possible, but it may well be that they are aware of this, and therefore indicated that 68GB/s + 133GB/s is 201GB/s, which fits the over-200GB/s bandwidth figure that had been given earlier.
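Plugging the quoted numbers together (the efficiency ratios below are just my own way of framing it, not anything DF states):

```python
# Figures quoted in the article / post above.
ddr3_bw = 68.0             # GB/s, main memory
esram_blend = 133.0        # GB/s, the FP16x4 alpha-blend example
esram_theoretical = 192.0  # GB/s, the revised eSRAM figure
esram_one_way = 102.4      # GB/s, read-only or write-only on the interface

print(ddr3_bw + esram_blend)              # 201.0 -> the ~200 GB/s system figure mentioned earlier
print(esram_blend / (2 * esram_one_way))  # ~0.65 of a perfect read+write overlap
print(esram_blend / esram_theoretical)    # ~0.69 of the revised theoretical maximum
```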
I am buying it.
What can I say... that's fascinating stuff.

Some random math I thought up for the scenario put forward in the article.
Let's assume there's some kind of forwarding going on from the queues in the eSRAM's arbitration and scheduling logic.
How do you get there from a supposed interface max of 102.4 GB/s, while also taking into consideration that a pure doubling would be 204.8 GB/s?
If it's forwarding:
Assume there's some portion of the time that has to be a pure write to eSRAM, limited by the 102.4 GB/s write path maximum.
Assume that the memory subsystem can handle additional requests because it can split its attention between eSRAM and DRAM, and assume the queues can actually use that spare capacity for the eSRAM.
The interface=102.4 GB/s
Double that, if the so-called simultaneous read+write were true = 204.8 GB/s.
The theoretical maximum in the article = 192 GB/s.
How to get this?
Assuming forwarding.
Some fraction of the cycles are write cycles to the queue, limited by the GPU's write bandwidth.
The other fraction is full traffic with additional read requests serviced by forwarding, leading to an internal doubling.
(204.8*x+102.4)/(x+1)=192
204.8*x+102.4=192*x+192
204.8*x=192*x+89.6
204.8*x-192*x=89.6
(204.8-192)*x=89.6
12.8*x=89.6
x=7
It's early for me to think about this, but it sounds like 1/8 of the time you can set up the data you forward, and then you can send reads to the eSRAM proper while another portion is automatically forwarded. This leads to double the bandwidth in those cycles, although it would require that more than one client is reading at a time, since no single requestor can take all that bandwidth. (edit: Or half read, half write if they can combine.)
At some point, the write has to file itself to the eSRAM, and during that phase the GPU can queue up the next round of forwarded data (every 8th cycle?). Other scenarios are possible, but that would be the big-number shot.
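A toy cycle-accounting version of that guess, using the numbers from the algebra above (the 7-to-1 split is just the x=7 solution; the scheduling itself is pure speculation):

```python
# Out of every 8 cycles: 7 cycles where forwarding lets a read be serviced
# alongside a write (204.8 GB/s worth of traffic), and 1 cycle where the
# queued data is filed to the eSRAM proper (102.4 GB/s, write only).
dual_issue = 204.8   # GB/s when a read rides along with a pending write
write_only = 102.4   # GB/s when the queue drains to the arrays

window = 7 * [dual_issue] + 1 * [write_only]
print(sum(window) / len(window))   # 192.0 GB/s, matching the revised theoretical figure
```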