PS2 vs PS3 vs PS4 fillrate

Sony's GSCUBE had immense fillrate and aggregate eDRAM bandwidth; each GS I-32 carried 32 MB of eDRAM.

The "I-32" Graphics Synthesizer was a custom variant that contained 32 MB of eDRAM instead of the typical 4 MB)

Specs for the first one, the "GSCUBE 16" or 16-blade version shown at SIGGRAPH 2000:

http://en.wikipedia.org/wiki/GScube
http://www.assemblergames.com/forums/showthread.php?18036-GSCUBE-information

- CPU 128Bit Emotion Engine x 16
- System Clock Frequency 294.912MHz
- Main Memory Direct RDRAM
- Memory Size 2GB (128MB x 16)
- Memory Bus Bandwidth 50.3GB/s (3.1GB/s x 16)
- Floating Point Performance 97.5GFLOPS (6.1GFLOPS x 16)
- 3D CG Geometric Transformation 1.04Gpolygons/s (65Mpolygons/s x 16)
- Graphics Graphics Synthesizer I-32 x 16
- Clock Frequency 147.456MHz
- VRAM Size 512MB (embedded 32MB x 16)
- VRAM Bandwidth 755GB/s (47.2GB/s x 16)
- Pixel Fill Rate 37.7Gpixels/s (2.36Gpixels/s x 16; see the quick arithmetic check after the spec list)
- Maximum Polygon Drawing Rate 1.2 Gpolygons/s (73.7Mpolygons/s x 16)

- Display Color Depth 32bit (RGBA: 8 bits each)
- Z depth 32bit
- Maximum Resolutions 1080/60p (1920x1080, 60fps, Progressive)
- Merging Functions Scissoring, Alpha Test, Z Sorting, Alpha Blending
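
A quick arithmetic check of the fill-rate line above; a minimal sketch, assuming the standard GS layout of 16 pixel engines per chip (the per-GS number is just pixel engines times clock):

```python
# Back-of-the-envelope check of the GSCube 16 pixel fill rate spec.
# Assumes the standard Graphics Synthesizer layout of 16 pixel engines per chip.
gs_clock_hz   = 147.456e6   # GS clock from the spec list above
pixel_engines = 16          # pixel pipelines per Graphics Synthesizer
blades        = 16          # GS I-32 chips in the GSCube 16

per_gs_fill = pixel_engines * gs_clock_hz   # pixels per second, per GS
total_fill  = per_gs_fill * blades          # pixels per second, whole box

print(f"per GS: {per_gs_fill / 1e9:.2f} Gpixels/s")   # ~2.36 Gpixels/s
print(f"total:  {total_fill / 1e9:.1f} Gpixels/s")    # ~37.7 Gpixels/s
```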

[Images: img14.jpg, img15.jpg, img16.jpg]

The realtime Final Fantasy: The Spirits Within scenes done on GSCUBE ran at 60fps and were much more impressive than the FF TSW scenes Nvidia did the following year on its NV20/GF3-based Quadro DCC using its shaders. Hardly fair though, of course: 16 EEs + 16 GS I-32s vs a single NV20-based chip with zero eDRAM.




Would've been interesting to see what the canceled 64-blade version of GSCube could've done, as well as the never-released (or at least never shown) Graphics Synthesizer 2, especially in those workstations Sony was planning to build with lots of EE2 and GS2 chips.

I suppose EE3 in a sense became CELL (?) and the GS3 (or Visualizer?) was scrapped in favor of Nvidia and the RSX.
 
Isn't that where high-bandwidth caches come in though? The 290 doubles the 7970 GE's fill rate while only increasing bandwidth by 11%, so there must be some benefit.
290 doubles the maximum theoretical fill rate, but cannot reach the doubled fill rate (due to BW limitations) if you are either using blending or using a wider-than-32-bit render target. In a modern HDR rendering pipeline, the extra fill rate (or 64 ROPs) mainly speeds up shadow map rendering (and UI rendering) (*). A glance at fill rate benchmarks might tell you otherwise, but the real cause of the improved fill rate in most tests is the 512-bit wide memory bus, not the 64 ROPs. The 7970 GE was already BW bound in most fill rate test scenarios.

(*) You can actually also be fill bound (on 32 ROPs) if you have naively programmed g-buffer rendering using multiple 32 bit buffers. You should pack your stuff inside 64 bit buffers and double your fill rate (this way 32 ROPs are enough to saturate the BW).
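A rough sketch of the arithmetic behind this, using my own assumed ballpark clock and bus numbers rather than exact card specs: blending is a read-modify-write, so every blended pixel costs a read plus a write of the pixel.

```python
# Sketch: why ~64 ROPs can't be fed by a ~320 GB/s bus once blending or wide
# render targets are involved. Clock and bus figures are assumed ballpark values.
def rop_bw_needed(rops, clock_hz, bytes_per_pixel, blending):
    accesses = 2 if blending else 1   # blend = read + write; plain write = 1 access
    return rops * clock_hz * bytes_per_pixel * accesses

GB = 1e9
bus = 320 * GB   # roughly what a 512-bit GDDR5 bus delivers (assumed round number)

print(rop_bw_needed(64, 1e9, 4, blending=False) / GB)  # 256 GB/s: fits -> shadow maps / UI can benefit
print(rop_bw_needed(64, 1e9, 4, blending=True)  / GB)  # 512 GB/s: already past the bus
print(rop_bw_needed(64, 1e9, 8, blending=True)  / GB)  # 1024 GB/s: 64-bit HDR blending, hopeless
```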
According to the documentation you can do (A - B) * C + D, where A, B, and D are source/dest color or 0, and C is source/dest alpha or a constant.
I'm looking at the GS manual now and there at least isn't anything beyond what's called alpha blending. It's more flexible than the traditional LERP between source and dest using source alpha or a constant (which is all you get on say, Nintendo DS), and it at least has what's necessary for PS1 compatibility, but I wouldn't consider it particularly more advanced than just saying it has alpha blending.
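To make the operand-selection point concrete, here's a minimal sketch (normalized floats, purely illustrative; the actual GS works in fixed point and clamps) of how the (A - B) * C + D form covers the usual source-alpha lerp as well as PS1-style additive and subtractive modes:

```python
# Conceptual sketch of GS-style blending: out = (A - B) * C + D, where A, B, D
# come from {source color, dest color, 0} and C from {source alpha, dest alpha,
# constant}. Normalized floats here; the real hardware uses fixed point and clamps.
def gs_blend(a, b, c, d):
    return (a - b) * c + d

src, dst, src_alpha = 0.8, 0.3, 0.5

# Classic alpha blend: A=src, B=dst, C=src alpha, D=dst -> src*a + dst*(1-a)
lerp = gs_blend(src, dst, src_alpha, dst)      # 0.55

# Additive (PS1-style): A=src, B=0, C=1, D=dst -> dst + src
additive = gs_blend(src, 0.0, 1.0, dst)        # 1.1 (clamped in hardware)

# Subtractive (PS1-style): A=0, B=src, C=1, D=dst -> dst - src
subtractive = gs_blend(0.0, src, 1.0, dst)     # -0.5 (clamped to 0 in hardware)

print(lerp, additive, subtractive)
```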
The new dual source color blending modes in DirectX 10 are called... blending, and offer more possibilities than PS2 did. DirectX 11.1 added logic ops on top of that, and it's still just called blending. We are fighting about semantics here.

PS2 obviously had better blending/RMW hardware than the other consoles of that time period. However, PS2's blending hardware couldn't do the DOT3 operation, a new feature in the (original) Xbox. This DX7-era feature allowed developers to implement fast per pixel lighting for the first time. It had its own limitations, but you couldn't efficiently emulate that feature on PS2.
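For reference, DOT3 bump mapping is essentially a per-pixel N·L where the normal comes from a texture; a minimal math-only sketch (no real API, the texel and light values are made up for illustration):

```python
# Illustration of the DOT3 operation used for per-pixel lighting: decode a
# tangent-space normal from an 8-bit-per-channel normal map texel and dot it
# with the light direction. Pure math sketch; inputs are made-up example values.
def dot3_lighting(normal_texel_rgb, light_dir):
    # Remap each channel from [0, 255] to [-1, 1] (texel assumed ~unit length)
    n = [(c / 255.0) * 2.0 - 1.0 for c in normal_texel_rgb]
    n_dot_l = sum(nc * lc for nc, lc in zip(n, light_dir))
    return max(n_dot_l, 0.0)   # clamp; this factor then scales the diffuse color

# Example: normal tilted slightly, light direction along +z
print(dot3_lighting((150, 128, 240), (0.0, 0.0, 1.0)))  # ~0.88
```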
A current GPU architecture evolves for about 5 years before it gets released; its design is frozen long before the software tech that's going to be used on it is developed. It's a bet that it will be useful, and that's why software solutions should be seen as 'made for the hardware'. If the hardware were PS2-like, we'd write software as we did for PS2 (and maybe Reyes): we'd solve other low-hanging fruit, as we did on PS2, and avoid the problematic cases, just like we do now in so many (other) cases.
In the PS2 era it made sense to calculate just a single ALU operation per memory load and store. In the current era that would be considered a huge waste of memory bandwidth. Back then you didn't need to hide memory latency behind a long chain of ALU operations. ALU has become very cheap compared to bandwidth, and the trend doesn't seem to be slowing down. I don't think we will ever see GPU architectures like PS2 again. It just wouldn't make sense with the current processing technology bottlenecks. It's more efficient to crunch the pixel value inside registers (interleaving the reads you need) and finally write the value to the (slow) external memory. Deferred rendering does this twice per pixel, and there's already an ongoing discussion about its future. The extra bandwidth cost of two passes might just not cut it in the future.
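A crude illustration of that bandwidth argument, with simplified byte counts of my own (not measurements of any real GPU): contrast N framebuffer RMW passes with reading the inputs once, doing all the ALU work in registers, and writing once.

```python
# Crude per-pixel bandwidth comparison: PS2-style "one ALU op per pass over the
# framebuffer" vs. modern "do everything in registers, write once".
# Byte counts are simplified illustrations, not real measurements.
def multipass_bytes(num_ops, pixel_bytes=4):
    # each pass is a framebuffer read-modify-write: read + write per pixel
    return num_ops * pixel_bytes * 2

def single_pass_bytes(texture_reads, texel_bytes=4, pixel_bytes=4):
    # read the needed textures once, combine in registers, one final write
    return texture_reads * texel_bytes + pixel_bytes

print(multipass_bytes(num_ops=8))           # 64 bytes of framebuffer traffic per pixel
print(single_pass_bytes(texture_reads=3))   # 16 bytes per pixel for the same result
```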
 

I strongly suspect those images are from the movie, rendered by PRMan, rather than from a GSCube demo.
The third one could perhaps be realtime, and maybe even the first, but the second has proper hair on both characters...

Even today, it would be impossible to render that amount of geometry and textures with that image quality on any current GPU.
 
^Okay, I am not saying they were indeed from GSCUBE; I don't know. However, I do remember reading that the realtime scenes that were done on GSCUBE (whatever they might've been) were from some of the less complex scenes of the film, and even then they were still of lower quality than the offline-rendered movie.

The scenes done by Nvidia on the NV20 / Quadro DCC were of even lower quality and at a much lower framerate, which improved only a little as they moved to the NV25-based Quadro, and still weren't as impressive as what GSCUBE did, nor at 60fps.

I certainly cannot say those pics were of the GSCUBE version, I just assumed they might be.

BTW I should have posted the source of those supposed GSCUBE FFTSW images but I did not. They were from this Ars Technica article
 

I was thinking a lot about what to reply... See, my background is in CG and not in programming, so I can't really dive into that aspect of renderers and hardware.

What I can tell you is that REYES and PRMan are outdated on their own, which is why Pixar has spent the last 5+ years working very, very hard on implementing raytracing without throwing everything out. They made a first attempt on Cars and then came back to it for various reasons; Monsters University relies heavily on physically correct shading and raytraced global illumination, but the more important reason was that most of their clients, the big VFX houses, badly needed it.

But as I've mentioned, VFX and feature animation abandoned the Reyes approach because CPUs became powerful enough to just brute force stuff and raytrace everything. All the various workaround solutions like shadow maps, dozens of hand-placed light sources, elaborate reflection maps, ambient occlusion and such required a lot of artist time, which eventually became far more expensive than a big render farm.

So this led to ILM replacing PRMan with Arnold as their primary renderer. The best VFX vehicle this year - and the number one contender for the VFX Oscar - is Gravity from Framestore, also rendered in Arnold. Other studios rely on VRay, and some places use Maxwell, which is an all-out physically correct renderer simulating everything about real light and surfaces.


Now sebbi came up with a pretty convincing argument about why deferred rendering is becoming more widespread in realtime engines - ALU capacity is easier to increase compared to memory bandwidth.
I would also point out that deferred rendering became the industry standard - it's used in UE, CryEngine, Frostbite, Killzone etc. - on the previous generation of hardware, which wasn't developed with the tech in mind at all. In fact, the X360's small eDRAM is quite a big problem, but a lot of developers still decided to go for it. So it was already becoming the future direction for realtime rendering even by the time the X1/PS4 entered the hw design stage. MS and Sony have only adapted to the industry's requirements.

It might not be too wise to lock all future development into this direction, but then again that is not happening either. As far as I know, one of the most promising looking games among the first-gen titles, The Order, is using forward rendering - so it's not like deferred is the only option.
On the other hand the content seems to be built to use only a small number of characters and restricted environments, with very few light sources. We'll see about the new Uncharted game.

I also don't see how the Reyes architecture could be implemented in realtime engines using hardware acceleration. Sure, raytracing isn't a good fit either, which is why a lot of the actual tech is more like the old PRMan stuff - shadow maps, ambient occlusion, HDRI environment map probes and so on. But the hardware isn't really able to run the whole dicing and binning stuff that really gave PRMan its power.

All in all it'll be interesting to see where the PS5/Xwhatever is going to be moving. I don't really think that realtime raytracing will become feasible, possibly not until we have an unprecedented explosion in computing power and bandwidth - so it'll be on the developers and what they do with the current hardware in the next 5-6 years, instead.
 
I saved a lot of the images from the FF movie many years ago... The movie was a very interesting beast: some of the assets had incredible quality for their time, but others had very visible UV seams, texture stretching and such. A lot of the software tools we take for granted today were not available, and it shows. Even a small studio like ours can create more detailed content in less time and render it at superior quality.

This explains some of the deficiencies in the images you've linked; the more important criteria would be stuff like the hair, or the thin stripe of shiny fluid at the bottom of the eyes which would be pretty damn aliased in any realtime rendering. Also, as I recall the scenes rendered on the GSCube were from the first few shots of the movie, where Aki was floating in her ship, on her own. And Square has never really released any images from that demo, so all the illustrations used by websites were from the general PR material of the movie which used actual shots rendered in PRMan.
 
The new dual source color blending modes in DirectX 10 are called... blending, and offer more possibilities than PS2 did. DirectX 11.1 added logic ops on top of that, and it's still just called blending. We are fighting about semantics here.

PS2 obviously had better blending/RMW hardware than the other consoles of that time period. However, PS2's blending hardware couldn't do the DOT3 operation, a new feature in the (original) Xbox. This DX7-era feature allowed developers to implement fast per pixel lighting for the first time. It had its own limitations, but you couldn't efficiently emulate that feature on PS2.

I don't want to dive further into semantics but I was pretty deliberate in my choice of calling it alpha blending and not just blending. That's what Sony calls it as well, and it's fitting given that the operations literally follow the alpha blending equation; any flexibility is in the operand selection. It's enough to allow emulation of PS1 blend modes and make multi-pass not completely useless, meaning it's a reasonable choice for Sony's specific goals but not much more.

As far as just blending goes, I can see that term being valid for any kind of operation involving the new and old pixels.

I'm surprised to hear PS2 had more advanced blending than its contemporaries given how much weaker the GPU was in most other ways. I guess subtractive blending wasn't a thing for nVidia at the time of XBox, although I think Gamecube had some form of it?
 
I saved a lot of the images from the FF movie many years ago... The movie was a very interesting beast: some of the assets had incredible quality for their time, but others had very visible UV seams, texture stretching and such. A lot of the software tools we take for granted today were not available, and it shows. Even a small studio like ours can create more detailed content in less time and render it at superior quality.

This explains some of the deficiencies in the images you've linked; the more important criteria would be stuff like the hair, or the thin stripe of shiny fluid at the bottom of the eyes which would be pretty damn aliased in any realtime rendering. Also, as I recall the scenes rendered on the GSCube were from the first few shots of the movie, where Aki was floating in her ship, on her own. And Square has never really released any images from that demo, so all the illustrations used by websites were from the general PR material of the movie which used actual shots rendered in PRMan.

Well good to know then. At least that clears up what I was questioning.

Would be nice to one day see some of the images from the GSCUBE demos.
 
I don't want to dive further into semantics but I was pretty deliberate in my choice of calling it alpha blending and not just blending.
Yeah :). I use the following semantics myself: RMW (read-modify-write) to a render target = alpha blending. That seems to be the most common way to describe alpha blending nowadays (and it also includes those funky use cases possible with "programmable alpha blending", such as per pixel OIT visibility function modification/sorting).
I'm surprised to hear PS2 had more advanced blending than its contemporaries given how much weaker the GPU was in most other ways. I guess subtractive blending wasn't a thing for nVidia at the time of XBox, although I think Gamecube had some form of it?
PS2 needed sophisticated blending, because it didn't support multitexturing. Multitexturing allowed quite sophisticated combinations.
 
That is very interesting. I would assume that they really mean this in a proportional sense and not in a like-for-like scenario, but it's still interesting nonetheless. Goes to show just how beneficial eDRAM can be for fillrate.
 
That is very interesting. I would assume that they really mean this in a proportional sense and not in a like-for-like scenario, but it's still interesting nonetheless. Goes to show just how beneficial eDRAM can be for fillrate.

It's not JUST having eDRAM, but also tightly coupling RMW operations around wide eDRAM buses. PS2 and XBox 360 did this, but Wii U does not do it and neither does XBox One (which uses eSRAM but that's not really an important distinction for this purpose).
 
It's not JUST having eDRAM, but also tightly coupling RMW operations around wide eDRAM buses. PS2 and XBox 360 did this, but Wii U does not do it and neither does XBox One (which uses eSRAM but that's not really an important distinction for this purpose).


I see. Can you explain in a little more detail the difference between the PS2's setup and the Wii U/X1 setup?
 
I see. Can you explain in a little more detail the difference between the PS2's setup and the Wii U/X1 setup?

Basically, it has what has been later referred to as "magic ROPs."

In DRAM read operations a sense-amplifier measures the charge on the capacitor by discharging it. This results in the contents of the DRAM cell being lost. For this reason a read normally has to be followed by a write to restore the lost value. PS2 (and XBox 360's) eDRAM takes this further - instead of just doing read + write it does read + modify + write, where the modify performs special graphics operations like alpha blending and depth update. This is more efficient than interfacing with conventional DRAM where you'd need two separate read + write cycles to perform the operation.
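A toy model of the difference, just counting DRAM array cycles for one blended pixel; it's a sketch of the idea, not a description of the real circuit.

```python
# Toy model: DRAM array cycles needed to alpha-blend one pixel.
# Illustrative only; real memory controllers batch, page-hit, prefetch, etc.
def conventional_dram_blend():
    cycles = 1   # read cycle: sense the destination pixel (row restored with the old value)
    # ...pixel travels out to the ROPs, gets blended there...
    cycles += 1  # separate write cycle to store the blended result back
    return cycles

def fused_rmw_blend():
    # "Magic ROP" style: the blend sits between sensing and restore, so the
    # restore phase writes the *new* value back instead of the old one.
    return 1

print(conventional_dram_blend(), fused_rmw_blend())   # 2 vs 1
```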
 
Basically, it has what has been later referred to as "magic ROPs."

In DRAM read operations a sense-amplifier measures the charge on the capacitor by discharging it. This results in the contents of the DRAM cell being lost. For this reason a read normally has to be followed by a write to restore the lost value. PS2 (and XBox 360's) eDRAM takes this further - instead of just doing read + write it does read + modify + write, where the modify performs special graphics operations like alpha blending and depth update. This is more efficient than interfacing with conventional DRAM where you'd need two separate read + write cycles to perform the operation.

I would assume this is a feature unique to the Graphics Synthesizer powering the PS2 that the X1 and Wii U cannot duplicate? Thanks for the info. I'm just a Nintendo enthusiast myself, but I enjoy learning from those in the know who are willing to really explain things.
 
I would assume this is a feature unique to the Graphics Synthesizer powering the PS2 that the X1 and Wii U cannot duplicate? Thanks for the info. I'm just a Nintendo enthusiast myself, but I enjoy learning from those in the know who are willing to really explain things.

Like I said, XBox 360 used eDRAM with special operations too. It went even further in that it expanded this to 4 samples for MSAA. AMD could have done a design like this for Wii U and/or XBox One if they were so tasked. Others can (and hopefully will) address this better than I can, but I think they aren't designed this way anymore because it isn't so much of a win with more modern graphics workloads: deferred rendering and more modern depth buffering and framebuffer compression have changed the need for brute-force RMWs all the time. That, and the RMWs themselves weren't very flexible, limiting what you can do in multipass rendering. On XBox 360 you couldn't even texture from the eDRAM at all; you had to push it out to main RAM, then texture from there. On PS2 you could texture from the eDRAM, but you couldn't utilize nearly as much of the bandwidth that way, and the texturing capabilities were very limited. You can also kind of see this mirrored at the hardware level in the increasing ratio of TMUs to ROPs.
 
First of all, this mysterious "read-modify-write" = alpha blending :)

Seems that in general, the fill rate (ROP rate) is the most misunderstood part of graphics rendering performance. Increased maximum fill rate only helps when you are not memory BW, ALU, TMU or geometry bound. In modern games, most draw calls you submit are bound by these four things instead of fill rate. This is because we have moved from simple (vertex based) gouraud shading to sophisticated per pixel lighting and material definition. Pixel shaders that do more than 100 operations per pixel are quite common nowadays.

In our latest Xbox 360 game we were only fill bound in three cases: shadow map rendering, particle rendering and foliage rendering. Infinite fill rate would make these steps around 10%-25% faster with our shaders, with a total impact of less than 5% on the frame rate.

Pure fill rate is no longer the bottleneck for modern (next gen) particle rendering, as particle rendering has gotten much more sophisticated. Modern games do complex per pixel lighting on particles and output particles to a HDR render target. This means that the particle pixel shader samples multiple textures (color and normal map at least) per pixel, increasing the TMU usage and the BW usage. Lighting uses lots of ALU instructions; the more lights you have, the more expensive the shader becomes. Soft particles also fetch the depth data (an uncompressed read of 32 bits per pixel) = quite a bit of extra BW cost (+ TMU cost). Blending to the 64 bit HDR back buffer eats a lot of bandwidth.

In comparison, PS2 was designed for high fill rate. This was possible because each pixel did only a very simple ALU operation and only accessed one texture (no programmable pixel shaders were supported). Thus by design your "shader" was never ALU bound or TMU bound. The most common texture format was a 256 color paletted texture; everything else was slow. For each outputted pixel the GPU sampled exactly one of these (low bit depth) textures. And it didn't support any fancy filtering (anisotropic) that requires multiple TMU cycles to complete. So it was never TMU or BW bound, as long as all your textures (and your render target) fit in the 4 MB eDRAM. The most common render target format was the low precision 16 bit (565) format. In comparison, modern 64 bit HDR particle rendering requires 4x more bandwidth per pixel, and if you also take the resolution increase into account, the back buffer BW requirement for particle rendering is over 20x in modern games. Not even modern BW monsters such as the Radeon 7970 GE can reach their full fill rate on particle rendering, because the BW becomes a limit halfway there. 64 bit HDR blending with 32 ROPs at 1000 MHz requires 512 GB/s of BW (and the card "only" has 288 GB/s). So there's no performance benefit from increased GPU fill rate until the BW problem is solved.
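The "over 20x" figure is easy to sanity check; note that the PS2 render resolution used below (640x448) is my assumption of a typical case, not a number from the post.

```python
# Rough sanity check of the "over 20x back buffer BW" claim.
# 640x448 @ 16-bit is an assumed typical PS2 target; 1920x1080 @ 64-bit HDR
# is the modern case described above.
ps2_pixels,    ps2_bytes    = 640 * 448,   2   # 16-bit (565) back buffer
modern_pixels, modern_bytes = 1920 * 1080, 8   # 64-bit HDR back buffer

per_pixel_factor  = modern_bytes / ps2_bytes        # 4x per pixel
resolution_factor = modern_pixels / ps2_pixels      # ~7.2x more pixels
print(per_pixel_factor * resolution_factor)         # ~28.9x total, i.e. "over 20x"
```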

PS3 and PS4 can definitely exceed the fill rate of PS2. You can get quite a high fill rate if you are willing to go back to gouraud shaded rendering with a single 256 color texture (or DXT1 compressed texture) on each particle/object, and you perform no per pixel calculations. However, I am personally much happier with the particles I see in next gen games. The particle counts (and overdraw) don't need to be that high when you have sophisticated particle lighting and soft particle rendering (making the particles look volumetric instead of like textured billboards floating in the air). For example, smoke in recent games (BF4) looks incredible when missiles pass through it (lighting the smoke in a realistic way).

Very good explanation. Thanks.
 