Love the enthusiasm here. 14 pages and going!
> I asked because it works differently on X360, where the ROP blocks are inside the eDRAM chip.
That's unique to the eDRAM block in 360. In PS2 there's no processing logic in the eDRAM, only storage.
> That's unique to the eDRAM block in 360. In PS2 there's no processing logic in the eDRAM, only storage.
You've got it completely backwards! PS2 has more processing logic with the embedded memory. Rasterization, texture sampling, render output, and more, all on the eDRAM die! Wow!
> Aren't they on the other side of the bus? XB360 had 256 GB/s internal BW to ROPs so they didn't impact the eDRAM BW. AFAIK 360 was unique in that respect. It certainly wasn't raised regarding PS2 in the Great Bandwidth PR Wars of the early 21st Millennium.
What do you mean by "other side of the bus"? What bus are we referring to?
> What do you mean by "other side of the bus"? What bus are we referring to?
The eDRAM <> 'pixel pipes' bus. I envisioned PS2 being eDRAM 'die' and logic 'die' (could be on the same silicon) like a typical GPU with eDRAM taking the place of the VRAM and all processing happening on the logic side. Envisioned is a strong word - I've not really given it much thought.
> On the PS2, that eDRAM is on the same die as multiple major graphical functions, and both the texture-mapping and render-output hardware have "perfect" access to it. On the 360, it was spun off as a separate die with just the ROPs, and only the render-output hardware has "perfect" access to it.
I'm sure Liandry will be asking for those perfect BW amounts when they're next online.
> The 360's ROPs are on the same chip as the eDRAM, but the eDRAM itself doesn't do processing.
Great picture and explanation. But shouldn't there be an arrow to the eDRAM and one from it?
> Love the enthusiasm here. 14 pages and going!
I have a lot of questions. But of course, thanks to everyone who answers.
> I'm sure Liandry will be asking for those perfect BW amounts when they're next online.
That's true. What does "perfect" mean exactly? Also, why does the X360 eDRAM need so much internal bandwidth? It's 8 times more than the bandwidth between the main die and the eDRAM die. 4xAA needs only 4 times more bandwidth.
> Great picture and explanation. But shouldn't there be an arrow to the eDRAM and one from it?
The logic is inside the eDRAM in that diagram. You'd need the diagram to have further details showing ROPs and Texture Samplers and the BW they have to the eDRAM.
> That's true. Why does the X360 eDRAM need so much internal bandwidth?
That's the exact amount needed for the ROPs to do their work reading directly from the eDRAM. For comparison, GPUs have TB/s of BW to their internal caches.
> Great picture and explanation. But shouldn't there be an arrow to the eDRAM and one from it?
The image is simply showing what's happening while pixels are being drawn.
> That's true. What does "perfect" mean exactly?
It means that the bus supplies exactly the amount of access that an I/O component can make use of at peak throughput.
> Also, why does the X360 eDRAM need so much internal bandwidth? It's 8 times more than the bandwidth between the main die and the eDRAM die. 4xAA needs only 4 times more bandwidth.
4xAA quadruples the rate, but depending on the operation the ROPs are also reading the existing color and/or depth or whatever, which further doubles the required size of the bus.
Also, great discussion.
> (could be on the same silicon)
Must be on the same silicon, lest our eDRAM become mere DRAM.
> I'm sure Liandry will be asking for those perfect BW amounts when they're next online.
(2*(32+32))*4*8*500*10^6 b/s
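For anyone who wants to check that expression, here is a small Python sketch that multiplies it out. The labels on each factor are my own reading of the formula (read plus write, 32-bit color plus 32-bit depth, 4 samples, 8 pixels per clock, 500 MHz); the post itself only gives the bare numbers.

```python
# Factors of (2*(32+32))*4*8*500*10^6 b/s.
# The names below are an interpretation, not something stated in the post.
READ_AND_WRITE = 2           # assumed: each sample is read, then written back
BITS_PER_SAMPLE = 32 + 32    # assumed: 32-bit color + 32-bit depth
SAMPLES_PER_PIXEL = 4        # assumed: 4xAA
PIXELS_PER_CLOCK = 8         # assumed: pixels processed per clock
CLOCK_HZ = 500 * 10**6       # 500 MHz


def edram_internal_bits_per_second():
    """Multiply the factors out exactly as written in the formula."""
    bits_per_clock = (READ_AND_WRITE * BITS_PER_SAMPLE
                      * SAMPLES_PER_PIXEL * PIXELS_PER_CLOCK)
    return bits_per_clock * CLOCK_HZ


bits = edram_internal_bits_per_second()
gigabytes_per_s = bits / 8 / 10**9   # bits -> bytes -> GB (decimal)
print(gigabytes_per_s)               # 256.0
```

That lands on the 256 GB/s internal figure mentioned earlier in the thread.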
> The logic is inside the eDRAM in that diagram. You'd need the diagram to have further details showing ROPs and Texture Samplers and the BW they have to the eDRAM.
No.
> Aren't they on the other side of the bus? XB360 had 256 GB/s internal BW to ROPs so they didn't impact the eDRAM BW. AFAIK 360 was unique in that respect. It certainly wasn't raised regarding PS2 in the Great Bandwidth PR Wars of the early 21st Millennium.
Xbox 360 GPU only had to send one result per pixel shader invocation to the eDRAM die. The bus between the GPU and eDRAM didn't have to be as wide as the eDRAM bandwidth. The eDRAM die replicates the pixel to all affected MSAA samples (multiple writes per output pixel) and does the blending (read existing eDRAM contents + blend it + write back to eDRAM). IIRC the total eDRAM bandwidth is equal to 4xMSAA + alpha blend (4x 32 bit reads + 4x 32 bit writes per received pixel). Thus eDRAM bandwidth is never a bottleneck.
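A toy model of that flow may help: one pixel crosses the GPU-to-eDRAM bus, and the logic next to the eDRAM fans it out to the covered samples, doing a read, a blend, and a write for each. This is only a sketch for counting accesses. The function names, the "source over" blend, and the stats dictionary are all made up for illustration and say nothing about how the real hardware is organized.

```python
def blend_over(src, dst, alpha):
    """Plain 'source over' alpha blend of two RGB tuples (floats in 0..1)."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


def rop_receive_pixel(samples, covered, src_color, alpha, stats):
    """Hypothetical eDRAM-side handling of ONE pixel received over the bus.

    samples: per-sample colors already stored for this pixel
    covered: indices of the MSAA samples the incoming pixel touches
    """
    for i in covered:
        dst = samples[i]                                # read existing contents
        stats["reads"] += 1
        samples[i] = blend_over(src_color, dst, alpha)  # blend + write back
        stats["writes"] += 1


stats = {"bus_transfers": 0, "reads": 0, "writes": 0}
samples = [(0.0, 0.0, 0.0)] * 4      # 4xMSAA, cleared to black

stats["bus_transfers"] += 1          # the GPU sends a single result
rop_receive_pixel(samples, covered=[0, 1, 2, 3],
                  src_color=(1.0, 0.5, 0.0), alpha=0.5, stats=stats)

print(stats)    # {'bus_transfers': 1, 'reads': 4, 'writes': 4}
print(samples)  # four copies of (0.5, 0.25, 0.0)
```

One transfer on the external bus turns into four reads and four writes on the internal side, which is the 8:1 ratio discussed above.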
> 8192-bits per cycle peak, something I mentioned earlier. Something you definitely wouldn't see if communication were to a separate die. My guess is that 8192-bits is an entire DRAM row.
You have mentioned this number before, but where did it come from? I've tried to find it but can't find anything. Can you explain it?
> The image is simply showing what's happening while pixels are being drawn.
Oh, ok.
> So if you have a ROP that can make use of 2GB/s read and 2GB/s write, that ROP will have 2GB/s read and 2GB/s write dedicated to it. If you have ten of those ROPs, the ROPs will have a bus to eDRAM consisting of 20GB/s read and 20GB/s write.
Then does it mean that almost all of the PS2 eDRAM bandwidth is used?
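The "perfect" sizing rule from this exchange is just peak demand per unit times the number of units. A minimal Python sketch, using the illustrative 2 GB/s figures from the quoted post (they are not real hardware numbers, and `perfect_bus` is a made-up helper name):

```python
def perfect_bus(unit_count, read_gb_s_each, write_gb_s_each):
    """Size a bus so every unit can hit its peak read and write rate at once."""
    return {
        "read_gb_s": unit_count * read_gb_s_each,
        "write_gb_s": unit_count * write_gb_s_each,
    }


# Ten ROPs, each able to use 2 GB/s read and 2 GB/s write at peak.
bus = perfect_bus(unit_count=10, read_gb_s_each=2, write_gb_s_each=2)
print(bus)  # {'read_gb_s': 20, 'write_gb_s': 20}
```

A bus sized this way never limits the ROPs, but it is only fully used when every unit is running at peak at the same time.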
> 4xAA quadruples the rate, but depending on the operation the ROPs are also reading the existing color and/or depth or whatever, which further doubles the required size of the bus.
OK, I understand. But on PS2, is it different or mostly the same?
> Depth read and write too, right?
Yes, of course.
> PS2 EDRAM is something I know a little bit about. There was some info on the PS2 dev forums about how to use tall sprites of a certain width and screen alignment that would perfectly match the internal read/write caches in the EDRAM.
Yes, you had to optimize the eDRAM bank accesses to maximize the BW. I vaguely remember writing this code for our engine (also for post processing), but unfortunately I can't remember any more details.
Old ATI hardware also had a small on-chip memory to store HiZ data. This is needed because HiZ is checked before the GPU invokes any pixel shader threads (to early-out large batches of pixels).
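For readers who haven't met HiZ before, here is a generic hierarchical-Z sketch in Python. It is not the ATI or Xenos implementation. It assumes a "less than" depth test (smaller is closer), stores the farthest depth per tile, and rejects a whole tile when even the nearest incoming depth would fail. The tile size and function names are invented for the example.

```python
def build_hiz(depth, tile):
    """Coarse buffer holding the farthest (max) depth of each tile x tile block."""
    h, w = len(depth), len(depth[0])
    return [
        [max(depth[y][x]
             for y in range(ty, min(ty + tile, h))
             for x in range(tx, min(tx + tile, w)))
         for tx in range(0, w, tile)]
        for ty in range(0, h, tile)
    ]


def tile_may_be_visible(hiz, tile_x, tile_y, nearest_incoming_depth):
    """Cheap test run before any per-pixel shading work is started.

    If even the nearest incoming depth is not in front of the farthest
    stored depth, no pixel in the tile can pass, so the tile is skipped.
    """
    return nearest_incoming_depth < hiz[tile_y][tile_x]


# 4x4 depth buffer: a near object (0.2) covers the top-left 2x2 tile,
# the rest is still at the far plane (1.0).
depth = [
    [0.2, 0.2, 1.0, 1.0],
    [0.2, 0.2, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
hiz = build_hiz(depth, tile=2)

print(hiz)                                   # [[0.2, 1.0], [1.0, 1.0]]
print(tile_may_be_visible(hiz, 0, 0, 0.5))   # False: whole tile rejected early
print(tile_may_be_visible(hiz, 1, 0, 0.5))   # True: needs the full depth test
```

The coarse buffer here is a quarter the size of the depth buffer in each dimension's tile count, which is why a small on-chip memory is enough to hold it.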
I found an old B3D article describing Xenos and eDRAM. Nothing further to say:
eDRAM: https://www.beyond3d.com/content/articles/4/4
HiZ: https://www.beyond3d.com/content/articles/4/5