Questions about PS2

That's unique to the eDRAM block in 360. In PS2 there's no processing logic in the eDRAM, only storage.
You've got it completely backwards! PS2 has more processing logic with the embedded memory. Rasterization, texture sampling, render output, and more, all on the eDRAM die! Wow! :runaway:
 
Aren't they on the other side of the bus? XB360 had 256 GB/s internal BW to ROPs so they didn't impact the eDRAM BW. AFAIK 360 was unique in that respect. It certainly wasn't raised regarding PS2 in the Great Bandwidth PR Wars of the early 21st Millennium.
 
Aren't they on the other side of the bus? XB360 had 256 GB/s internal BW to ROPs so they didn't impact the eDRAM BW. AFAIK 360 was unique in that respect. It certainly wasn't raised regarding PS2 in the Great Bandwidth PR Wars of the early 21st Millennium.
What do you mean by "other side of the bus"? What bus are we referring to?

Both systems use eDRAM to enable some aspects of GPU I/O to be unconstrained by access to their memory pool. On the PS2, that eDRAM is on the same die as multiple major graphical functions, and both the texture-mapping and render-output hardware have "perfect" access to it. On the 360, it was spun off as a separate die with just the ROPs, and only the render-output hardware has "perfect" access to it.
 
What do you mean by "other side of the bus"? What bus are we referring to?
The eDRAM <> 'pixel pipes' bus. I envisioned PS2 being eDRAM 'die' and logic 'die' (could be on the same silicon) like a typical GPU with eDRAM taking the place of the VRAM and all processing happening on the logic side. Envisioned is a strong word - I've not really given it much thought. ;)

On the PS2, that eDRAM is on the same die as multiple major graphical functions, and both the texture-mapping and render-output hardware have "perfect" access to it. On the 360, it was spun off as a separate die with just the ROPs, and only the render-output hardware has "perfect" access to it.
I'm sure Liandry will be asking for those perfect BW amounts when they're next online. ;)
 
The 360's ROPs are on the same chip as the eDRAM, but the eDRAM itself doesn't do processing.
Great picture and explanation. Only, shouldn't there be an arrow to the EDRAM and one from it?

Love the enthusiasm here. 14 pages and going!
I have a lot of questions. But of course, thanks to all who answer. ;)

I'm sure Liandry will be asking for those perfect BW amounts when they're next online.
That's true. What does it mean exactly, "Perfect"? Also, why does the X360 EDRAM need so much internal bandwidth? It's 8 times more than the main die to EDRAM die bandwidth. 4xAA needs only 4 times more bandwidth.
Also, great discussion. :D
 
Great picture and explanation. Only, shouldn't there be an arrow to the EDRAM and one from it?
The logic is inside the eDRAM in that diagram. You'd need the diagram to have further details showing ROPs and Texture Samplers and the BW they have to the eDRAM.

That's true. Why does the X360 EDRAM need so much internal bandwidth?
That's the exact amount needed for the ROPs to do their work reading directly from the eDRAM. For comparison, GPUs have TB/s BW to their internal caches.
 
The 360 had a limited amount of logic on the edram daughter die to handle z/colour buffer write/ blending. IIRC reading from the colour and depth buffers required them being written out to main memory, which was very fast but was normally done as a single complete action. The main GPU die couldn't read from the z-buffer on a per pixel basis unless it was copied out to main memory, again iirc.
 
Great picture and explanation. Only, shouldn't there be an arrow to the EDRAM and one from it?
The image is simply showing what's happening while pixels are being drawn.

A more complete picture of the 360's GPU would show an arrow leaving the daughter die... so that after finishing drawing to a render target, you can copy it out to main RAM. Otherwise obviously you'd never be able to actually use or display anything that you rendered.

That's true. What does it mean exactly, "Perfect"?
It means that the bus supplies exactly the amount of access that an I/O component can make use of at peak throughput.

So if you have a ROP that can make use of 2GB/s read and 2GB/s write, that ROP will have 2GB/s read and 2GB/s write dedicated to it. If you have ten of those ROPs, the ROPs will have a bus to eDRAM consisting of 20GB/s read and 20GB/s write.
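As a minimal sketch of that definition, the arithmetic can be written out in Python. The function name `perfect_bus_gb_per_s` is made up for illustration, and the 2 GB/s figures are the hypothetical numbers from the example above, not real hardware specs.

```python
def perfect_bus_gb_per_s(rop_count, read_gb_per_s, write_gb_per_s):
    """Return the (read, write) bus bandwidth that lets every ROP run at peak.

    A "perfect" bus is sized to exactly what the attached units can consume,
    so it is just the per-ROP peak multiplied by the number of ROPs.
    """
    return rop_count * read_gb_per_s, rop_count * write_gb_per_s


# Hypothetical numbers from the post: ten ROPs, 2 GB/s read and 2 GB/s write each.
read_bw, write_bw = perfect_bus_gb_per_s(10, 2, 2)
print(read_bw, write_bw)
```

Anything wider than this would go unused, and anything narrower would make the bus the bottleneck instead of the ROPs.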

Also, why does the X360 EDRAM need so much internal bandwidth? It's 8 times more than the main die to EDRAM die bandwidth. 4xAA needs only 4 times more bandwidth.
Also, great discussion. :D
4xAA quadruples the rate, but depending on the operation the ROPs are also reading the existing color and/or depth or whatever, which further doubles the required size of the bus.

(could be on the same silicon)
Must be on the same silicon, lest our eDRAM become mere DRAM. :no:

(I really should have drawn a box around all of the PS2 stuff to make it less inconsistent with the 360 diagram):

bsxsLw9.png


I'm sure Liandry will be asking for those perfect BW amounts when they're next online. ;)
(2*(32+32))*4*8*500*10^6 b/s

=

238 gigabuckets
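For anyone who wants to check that figure, here is the same product spelled out in Python. The meaning attached to each factor (read plus write, 32-bit color plus 32-bit depth/stencil, 4 samples, 8 pixels per clock, 500 MHz) is my reading of the formula rather than something stated in the post, and treating "gigabuckets" as GiB/s is likewise an assumption.

```python
# Assumed meaning of each factor in (2*(32+32))*4*8*500*10^6 b/s
READ_AND_WRITE = 2           # every sample is both read and written
BITS_PER_SAMPLE = 32 + 32    # color + depth/stencil
SAMPLES_PER_PIXEL = 4        # 4x multisampling
PIXELS_PER_CLOCK = 8
CLOCK_HZ = 500 * 10**6

bits_per_second = (READ_AND_WRITE * BITS_PER_SAMPLE * SAMPLES_PER_PIXEL
                   * PIXELS_PER_CLOCK * CLOCK_HZ)
bytes_per_second = bits_per_second // 8

gb_per_s = bytes_per_second / 10**9    # decimal gigabytes per second
gib_per_s = bytes_per_second / 2**30   # binary gibibytes per second

print(f"{gb_per_s:.0f} GB/s, {gib_per_s:.1f} GiB/s")
```

The decimal result is the familiar 256 GB/s quoted earlier in the thread; the 238 figure falls out when the same byte count is divided by 2^30 instead of 10^9.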

The logic is inside the eDRAM in that diagram. You'd need the diagram to have further details showing ROPs and Texture Samplers and the BW they have to the eDRAM.
No.

In the PS2 case, the TMU and ROP logic is inside the block marked "GS."

In the 360 case, the TMU logic (and most of the GPU) is inside the block marked "Xenos", while the ROPs are inside the block marked "ROPs."

The blocks marked eDRAM are DRAM.
 
Aren't they on the other side of the bus? XB360 had 256 GB/s internal BW to ROPs so they didn't impact the eDRAM BW. AFAIK 360 was unique in that respect. It certainly wasn't raised regarding PS2 in the Great Bandwidth PR Wars of the early 21st Millennium.
Xbox 360 GPU only had to send one result per pixel shader invocation to the eDRAM die. The bus between the GPU and eDRAM didn't have to be as wide as the eDRAM bandwidth. The eDRAM die replicates the pixel to all affected MSAA samples (multiple writes per output pixel) and does the blending (read existing eDRAM contents + blend it + write back to eDRAM). IIRC the total eDRAM bandwidth is equal to 4xMSAA + alpha blend (4x 32 bit reads + 4x 32 bit writes per received pixel). Thus eDRAM bandwidth is never a bottleneck.
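A rough sketch of that amplification, counting color traffic only as in the paragraph above. The helper name `internal_bits_per_pixel` is made up for illustration, and the model ignores depth/stencil and any compression on the link, so treat it as a way to see the ratio rather than a description of the hardware.

```python
def internal_bits_per_pixel(bits_per_sample=32, msaa_samples=4, blending=True):
    """Bits moved inside the eDRAM die for one pixel received from the GPU."""
    writes = msaa_samples * bits_per_sample   # pixel replicated to every sample
    reads = writes if blending else 0         # blending reads old contents first
    return reads + writes


LINK_BITS_PER_PIXEL = 32  # assumption: one 32-bit color result crosses the link

with_blend = internal_bits_per_pixel(blending=True)
without_blend = internal_bits_per_pixel(blending=False)

print(with_blend / LINK_BITS_PER_PIXEL)     # 4xMSAA plus alpha blend
print(without_blend / LINK_BITS_PER_PIXEL)  # 4xMSAA, plain writes
```

Under these assumptions 4xMSAA alone accounts for a factor of 4, and the read-modify-write for blending doubles it to 8.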

Textures (and other data such as vertices) are read from the main memory (unified GDDR3). eDRAM can resolve (copy) data to the main memory. This is needed for multipass rendering.
 
8192-bits per cycle peak, something I mentioned earlier. Something you definitely wouldn't see if communication were to a separate die. My guess is that 8192-bits is an entire DRAM row.
You have mentioned this number before, but where did it come from? I've tried to find it but can't find anything. Can you explain it?

The image is simply showing what's happening while pixels are being drawn.
Oh, ok. :D

So if you have a ROP that can make use of 2GB/s read and 2GB/s write, that ROP will have 2GB/s read and 2GB/s write dedicated to it. If you have ten of those ROPs, the ROPs will have a bus to eDRAM consisting of 20GB/s read and 20GB/s write.
Then, does that mean almost all of the PS2 EDRAM bandwidth is used?

4xAA quadruples the rate, but depending on the operation the ROPs are also reading the existing color and/or depth or whatever, which further doubles the required size of the bus.
OK, I understand. But on PS2, is it different or mostly the same?
 
You have mentioned this number before, but where did it come from? I've tried to find it but can't find anything. Can you explain it?

Like I said before. It's in a Sony document.

The eDRAM doesn't have three buses. It has one bus that's 8192-bits wide. That bus is connected to two different 8KB buffers. One for pixel accesses and one for texture accesses. The 1024-bit and 512-bit buses connect to those buffers. Not the eDRAM.

I have no idea if host accesses go through the pixel buffer or not. I would expect not.
 
PS2 EDRAM is something I know a little bit about. There was some info on the PS2 dev forums about how to use tall sprites of a certain width and screen alignment that would perfectly match the internal read/write caches in the EDRAM. I used this as the basis of a full-screen post processing system. What I recall was that the pre-launch marketing numbers regarding EDRAM bandwidth suggested that it should be possible to do 60 fullscreen passes at 60Hz. I assumed the numbers were BS. But, I was able to get 50 passes @ 60 Hz out of the aligned sprite system. I don't recall the Gb/sec. But, I think "fullscreen" here was 640x448.
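For a rough sense of scale, the numbers above can be turned into a fill rate. This is only a back-of-the-envelope sketch: the 4 bytes per pixel (32-bit color) is an assumption, and the byte figure counts destination writes only, ignoring texture reads and any blend reads, so it is a lower bound on the eDRAM traffic rather than a measured bandwidth.

```python
WIDTH, HEIGHT = 640, 448     # "fullscreen" as recalled in the post
PASSES_PER_FRAME = 50        # passes achieved with the aligned sprites
FRAMES_PER_SECOND = 60
BYTES_PER_PIXEL = 4          # assumption: 32-bit color target

pixels_per_second = WIDTH * HEIGHT * PASSES_PER_FRAME * FRAMES_PER_SECOND
write_bytes_per_second = pixels_per_second * BYTES_PER_PIXEL

print(f"{pixels_per_second / 10**6:.1f} Mpixels/s written")
print(f"{write_bytes_per_second / 10**9:.2f} GB/s of color writes alone")
```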

You can see the post-processing system in action in the worst game I ever worked on: "Charlie and the Chocolate Factory for the PS2". Notice that everything has a vaseline-on-the-lens blur to make it feel "magical". It's doing a 5-tap Gaussian downsample-and-blur recursively all the way down to 1 pixel followed by a recursive upsample-and-blend with variable weights at each upsample stage. Effectively it does work equivalent to 6 fullscreen passes. But, it uses about 5% of the GS @ 30Hz. So, we just left it on all the time and only adjusted the weights for different effects. It could do a blur kernel that effectively stretched across the whole screen or did nothing, or any frequency distribution in between just by adjusting the upsample weights. I'm proud to say that was my only contribution to that game ;) Getting an equivalent effect out of the Xbox version was much harder than the PS2 version. I believe corners were cut.

There was also a trick where you could have the sprites copy certain bits from the depth buffer into the alpha channel of your color buffer. Effectively you could copy specifically from the "green" channel (bits 8-16) of depth. By using the color buffer destination alpha instead of a constant as the final upsample blend factor of the blur, you could achieve a decent depth-of-field on the PS2 super cheap (again 10% of a 60Hz frame) or you could just blend in a constant color for colored depth fog. We used both of those techniques in the much better game "Hunter the Reckoning: Wayward". But, I can't find a good video to show it off...
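To illustrate the idea in plain arithmetic (this is not GS code, and the real trick was done with sprites writing into the framebuffer, the details of which aren't given here), a sketch in Python might look like this. The function names are made up, and reading the "green" byte as bits 8 through 15 of a 32-bit word is an assumption.

```python
def depth_to_alpha(depth):
    """Take the second-lowest byte of a depth word (the 'green' position
    in a 32-bit RGBA pixel) and use it as an 8-bit alpha value."""
    return (depth >> 8) & 0xFF


def blend_channel(sharp, blurred, alpha):
    """Blend one 8-bit channel: alpha 0 keeps the sharp image,
    alpha 255 gives the fully blurred one."""
    return (sharp * (255 - alpha) + blurred * alpha) // 255


alpha = depth_to_alpha(0x00ABCDEF)
print(hex(alpha))
print(blend_channel(100, 200, 0), blend_channel(100, 200, 255))
```

Using per-pixel alpha derived from depth as the blend factor gives the depth-of-field look; swapping the blurred input for a constant color gives the depth fog variant mentioned above.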
 
Depth read and write too, right?
Yes, of course :)

Depth+stencil internal eDRAM bandwidth = (24 (depth) + 8 (stencil)) * 4 (MSAA) = 32 bits * 4 = 128 bits = 16 bytes. Both read and write.

Old ATI hardware also had a small on-chip memory to store HiZ data. This is needed as HiZ is checked before the GPU invokes any pixel shader threads (early out large batches of pixels).

I found an old B3D article describing Xenos and eDRAM. Nothing further to say:
eDRAM: https://www.beyond3d.com/content/articles/4/4
HiZ: https://www.beyond3d.com/content/articles/4/5
PS2 EDRAM is something I know a little bit about. There was some info on the PS2 dev forums about how to use tall sprites of a certain width and screen alignment that would perfectly match the internal read/write caches in the EDRAM.
Yes, you had to optimize the eDRAM bank accesses to maximize the BW. I vaguely remember writing this code for our engine (also for post processing), but unfortunately I can't remember any more details :)
 
Old ATI hardware also had a small on-chip memory to store HiZ data. This is needed as HiZ is checked before the GPU invokes any pixel shader threads (early out large batches of pixels).

I found an old B3D article describing Xenos and eDRAM. Nothing further to say:
eDRAM: https://www.beyond3d.com/content/articles/4/4
HiZ: https://www.beyond3d.com/content/articles/4/5

Just to make sure I'm getting this right: the HiZ data was generated by a Z prepass into the daughter die, that was then copied out to main memory, and then read back into the mother die and stored in a dedicated HiZ buffer on the mother die?

Is hierarchical z still used?
 