PS2 vs PS3 vs PS4 fillrate

Discussion in 'Architecture and Products' started by alexsok, Dec 1, 2013.

  1. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    Sony's GSCUBE had immense fillrate and aggregate bandwidth for all the eDRAM, each GS I-32 had 32 MB eDRAM.

    Specs for the first one, the "GSCUBE 16" or 16 Blade version shown at SIGGRAPH 2000.

    http://en.wikipedia.org/wiki/GScube
    http://www.assemblergames.com/forums/showthread.php?18036-GSCUBE-information

    [Images: stills from Final Fantasy: The Spirits Within, presented as GSCube output]

    The realtime Final Fantasy: The Spirits Within scenes done on GSCUBE ran at 60fps and were much more impressive than the FF:TSW scenes Nvidia did the following year on its NV20/GF3-based Quadro DCC using its shaders. Hardly a fair comparison though, of course: 16 EEs + 16 GS I-32s vs a single NV20-based chip with zero eDRAM.




    Would've been interesting to see what the canceled 64 blade version of GSCube could've done. As well as the never-developed (or never seen) Graphics Synthesizer 2, especially in those workstations Sony was planning to build with lots of EE2 and GS2 chips.

    I suppose EE3 in a sense became CELL (?) and the GS3 (or Visualizer?) was scrapped in favor of Nvidia and the RSX.
     
    #21 Megadrive1988, Dec 8, 2013
    Last edited by a moderator: Dec 8, 2013
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    290 doubles the maximum theoretical fill rate, but cannot reach the doubled fill rate (due to BW limitations) if you are either using blending or using a wider than 32 bit render target. In a modern HDR rendering pipeline, the extra fill rate (or 64 ROPs) mainly speeds up shadow map rendering (and UI rendering) (*). A glance at fill rate benchmarks might tell you otherwise, but the real cause of the improved fill rate in most tests is the 512-bit wide memory bus, not the 64 ROPs. The 7970 GE was already BW bound in most fill rate test scenarios.

    (*) You can actually also be fill bound (on 32 ROPs) if you have naively programmed g-buffer rendering using multiple 32 bit buffers. You should pack your stuff inside 64 bit buffers and double your fill rate (this way 32 ROPs are enough to saturate the BW).
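    A quick back-of-envelope sketch of why blending or wide render targets make the card BW bound (the numbers are illustrative round figures I'm assuming - 64 ROPs at 1 GHz and a 320 GB/s bus - not exact specs):

```python
# Toy fill-rate model with assumed round numbers (64 ROPs, 1 GHz, 320 GB/s).
rops = 64
clock_hz = 1e9
bus_bytes_per_s = 320e9

peak_fill = rops * clock_hz  # 64 Gpixels/s theoretical maximum

def bw_bound_fill(bytes_per_pixel):
    """Pixels/s the memory bus can sustain at this much traffic per pixel."""
    return bus_bytes_per_s / bytes_per_pixel

# Opaque 32-bit write: 4 bytes/pixel -> bus allows 80 Gpix/s, ROPs limit at 64.
opaque_32 = min(peak_fill, bw_bound_fill(4))
# Blended 32-bit: read + write = 8 bytes/pixel -> bus limits at 40 Gpix/s.
blended_32 = min(peak_fill, bw_bound_fill(8))
# Opaque 64-bit HDR target: also 8 bytes/pixel -> same 40 Gpix/s bus limit.
hdr_64 = min(peak_fill, bw_bound_fill(8))
```

    With these assumed figures the doubled ROP count only pays off for narrow opaque writes, which is exactly the shadow-map/UI case above.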
    The new dual source color blending modes in DirectX 10 are called... blending, and offer more possibilities than PS2 did. DirectX 11.1 added logic ops on top of that, and it's still just called blending. We are fighting about semantics here.

    PS2 obviously had better blending/RMW hardware than the other consoles of that time period. However, PS2's blending hardware couldn't do the DOT3 operation, a new feature in the Xbox (the original one). This DX7-era feature allowed developers to implement fast per-pixel lighting for the first time. It had its own limitations, but you couldn't efficiently emulate that feature on PS2.
    In the PS2 era it made sense to calculate just a single ALU operation per memory load and store. In the current era that would be considered a huge waste of memory bandwidth. Back then you didn't need to hide memory latency behind a long chain of ALU operations. ALU has become very cheap compared to bandwidth, and the trend doesn't seem to be slowing down. I don't think we will ever see GPU architectures like PS2 again. It just wouldn't make sense with the current processing technology bottlenecks. It's more efficient to crunch the pixel value inside registers (interleaving the reads you need) and finally write the value to the (slow) external memory. Deferred rendering does this twice per pixel, and the discussion about its future is already going on. The extra bandwidth cost of two passes might just not cut it in the future.
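    The register-vs-RMW point can be sketched with a toy traffic count (a hypothetical model, not any real GPU):

```python
# Toy per-pixel traffic model: combining k texture/light terms into one pixel.
BYTES_PER_PIXEL = 4

def rmw_traffic(k):
    """PS2-style: one ALU op per pass; each pass reads and rewrites the target."""
    return k * 2 * BYTES_PER_PIXEL  # framebuffer read + write every pass

def register_traffic(k):
    """Modern style: accumulate all k terms in registers, write the result once."""
    return BYTES_PER_PIXEL  # single final framebuffer write

# Four combined terms: 32 bytes of framebuffer traffic vs 4, an 8x difference.
ratio = rmw_traffic(4) / register_traffic(4)
```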
     
  3. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,455
    Location:
    Budapest, Hungary
    I seriously doubt those images are from a GSCube demo; they look like frames from the movie itself, rendered by PRMan.
    The third one perhaps, and maybe even the first, but the second has proper hair on both characters...

    Even today, it would be impossible to render that amount of geometry and textures with that image quality on any current GPU.
     
  4. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    ^Okay I am not saying they were indeed from GSCUBE, I don't know. However I do remember reading that the real time scenes that were done on GSCUBE (whatever they might've been) were from some of the less complex scenes of the film, and even then, they were still of a lower quality than that of the offline rendered movie.

    The scenes done by Nvidia on the NV20-based Quadro DCC were of even lower quality and ran at a much lower framerate. The framerate improved a little when they moved to the NV25-based Quadro, but it was still not as impressive as what GSCUBE did, nor at 60fps.

    I certainly cannot say those pics were of the GSCUBE version, I just assumed they might be.

    BTW I should have posted the source of those supposed GSCUBE FFTSW images but I did not. They were from this Ars Technica article
     
    #24 Megadrive1988, Dec 8, 2013
    Last edited by a moderator: Dec 9, 2013
  5. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,455
    Location:
    Budapest, Hungary
    I was thinking a lot about what to reply... See, my background is in CG and not in programming, so I can't really dive into that aspect of renderers and hardware.

    What I can tell you is that REYES and PRMan are outdated on their own, which is why Pixar has spent the last 5+ years working very, very hard on implementing raytracing without throwing everything out. They did a first try on Cars and got back to it later for various reasons; Monsters University relies heavily on physically correct shading and raytraced global illumination, but the more important reason was that most of their clients, the big VFX houses, badly needed it.

    But as I've mentioned, VFX and feature animation abandoned the Reyes approach because CPUs became powerful enough to just brute-force stuff and raytrace everything. All the various workaround solutions like shadow maps, dozens of hand-placed lightsources, elaborate reflection maps, ambient occlusion and such required a lot of artist time, which eventually became far more expensive than a big render farm.

    So this led to ILM replacing PRMan with Arnold as their primary renderer. The best VFX vehicle this year - and the number one contender for the VFX Oscar - is Gravity from Framestore, also rendered in Arnold. Other studios rely on VRay, and some places use Maxwell, which is an all-out physically correct renderer simulating everything about real light and surfaces.


    Now sebbbi came up with a pretty convincing argument about why deferred rendering is becoming more widespread in realtime engines - ALU capacity is easier to increase than memory bandwidth.
    I would also point out that deferred rendering became the industry standard - it's used in UE, Cryengine, Frostbite, Killzone etc. - on the previous generation of hardware, which wasn't developed with the tech in mind at all. In fact, the X360's small EDRAM is a quite big problem, but a lot of developers still decided to go for it. So it was already becoming the future direction for realtime rendering by the time the X1/PS4 entered the hw design stage. MS and Sony have only adapted to the industry's requirements.

    It might not be too wise to lock all future development in this direction, but then again that is not happening either. As far as I know, one of the most promising-looking first-gen titles, The Order, is using forward rendering - so it's not like deferred is the only option.
    On the other hand the content seems to be built to use only a small number of characters and restricted environments, with very few light sources. We'll see about the new Uncharted game.

    I also don't see how the Reyes architecture could be implemented in realtime engines using hardware acceleration. Sure, raytracing isn't a good fit either, which is why a lot of the actual tech is more like the old PRMan stuff - shadow maps, ambient occlusion, HDRI environment map probes and so on. But the hardware isn't really able to run the whole dicing and binning stuff that really gave PRMan its power.

    All in all it'll be interesting to see where the PS5/Xwhatever is going to be moving. I don't really think that realtime raytracing will become feasible, possibly not until we have an unprecedented explosion in computing power and bandwidth - so it'll be on the developers and what they do with the current hardware in the next 5-6 years, instead.
     
    Shoujoboy likes this.
  6. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,455
    Location:
    Budapest, Hungary
    I saved a lot of images from the FF movie many years ago... The movie was a very interesting beast: some of the assets had incredible quality for their time, but others had very visible UV seams, texture stretching and such. A lot of the software tools we take for granted today were not available, and it shows. Even a small studio like ours can create more detailed content in less time and render it at superior quality.

    This explains some of the deficiencies in the images you've linked; the more important criteria would be stuff like the hair, or the thin stripe of shiny fluid at the bottom of the eyes which would be pretty damn aliased in any realtime rendering. Also, as I recall the scenes rendered on the GSCube were from the first few shots of the movie, where Aki was floating in her ship, on her own. And Square has never really released any images from that demo, so all the illustrations used by websites were from the general PR material of the movie which used actual shots rendered in PRMan.
     
  7. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I don't want to dive further into semantics but I was pretty deliberate in my choice of calling it alpha blending and not just blending. That's what Sony calls it as well, and it's fitting given that the operations literally follow the alpha blending equation; any flexibility is in the operand selection. It's enough to allow emulation of PS1 blend modes and make multi-pass not completely useless, meaning it's a reasonable choice for Sony's specific goals but not much more.
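    For reference, the GS blend stage as I recall it computes out = ((A - B) * C >> 7) + D, with A/B/D selectable from {source color, dest color, 0} and C from {source alpha, dest alpha, fixed value} - treat the operand lists as my assumption from memory, not gospel. A minimal sketch of how operand selection covers the usual modes:

```python
# Sketch of the (assumed) GS blend equation: out = ((A - B) * C >> 7) + D.
# Alpha of 128 represents 1.0; results clamp to the 0-255 channel range.
def clamp(v):
    return max(0, min(255, v))

def gs_blend(a, b, c, d):
    return clamp(((a - b) * c >> 7) + d)

cs, cd = 200, 100  # source / destination color channel values
a_s = 128          # source alpha == 1.0

standard = gs_blend(cs, cd, a_s, cd)    # (Cs-Cd)*As + Cd: classic alpha blend
additive = gs_blend(cs, 0, a_s, cd)     # Cs*As + Cd: additive (clamps at 255)
subtractive = gs_blend(0, cs, a_s, cd)  # Cd - Cs*As: PS1-style subtractive
```

    Note that no choice of operands yields source color times destination color - the Src*Dest gap Colourless brings up below.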

    As far as just blending goes, I can see that term being valid for any kind of operation involving the new and old pixels.

    I'm surprised to hear PS2 had more advanced blending than its contemporaries given how much weaker the GPU was in most other ways. I guess subtractive blending wasn't a thing for nVidia at the time of XBox, although I think Gamecube had some form of it?
     
  8. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    Well good to know then. At least that clears up what I was questioning.

    Would be nice to one day see some of the images from some of the GSCUBE demos one day.
     
  9. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Yeah :). I use the following semantics myself: RMW (read-modify-write) to a render target = alpha blending. That seems to be the most common way to describe alpha blending nowadays (and includes also those funky use cases possible with "programmable alpha blending", such as per pixel OIT visibility function modification/sorting).
    PS2 needed sophisticated blending, because it didn't support multitexturing. Multitexturing allowed quite sophisticated combinations.
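    A toy sketch of what that means in practice (hypothetical framebuffer model, not GS code): without a multitexture combiner, a second texture layer becomes an extra pass blended against what the first pass already wrote.

```python
# Toy model: emulating a two-texture combine via two blended passes.
def clamp(v):
    return max(0, min(255, v))

def draw_pass(framebuffer, texels, blend):
    """Rasterize one full-screen pass with the given blend function."""
    for i, t in enumerate(texels):
        framebuffer[i] = clamp(blend(t, framebuffer[i]))

fb = [0, 0, 0]
base = [100, 150, 200]  # pass 1: base texture, written directly
glow = [80, 80, 80]     # pass 2: emissive layer, folded in via additive blend

draw_pass(fb, base, lambda src, dst: src)        # replace
draw_pass(fb, glow, lambda src, dst: src + dst)  # additive RMW blend
# fb now matches what a single multitextured pass with an "add"
# combiner would have produced (the brightest channel clamps at 255).
```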
     
  10. Colourless

    Colourless Monochrome wench
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,274
    Likes Received:
    30
    Location:
    Somewhere in outback South Australia
    PS2 didn't support Src*Dest, which is extremely useful.
     
  11. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Likes Received:
    2
    Location:
    Toronto, Canada
    So the PS2 had greater fillrate than both PS3 and PS4 given its fixed pipeline and constraints of the time?
     
  12. Goodtwin

    Veteran Subscriber

    Joined:
    Dec 23, 2013
    Messages:
    1,235
    Likes Received:
    714
    That is very interesting. I would assume they really mean this in a proportional sense, and not in a like-for-like scenario, but it's still interesting nonetheless. Goes to show just how beneficial eDRAM can be for fillrate.
     
  13. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    It's not JUST having eDRAM, but also tightly coupling RMW operations around wide eDRAM buses. PS2 and XBox 360 did this, but Wii U does not do it and neither does XBox One (which uses eSRAM but that's not really an important distinction for this purpose).
     
  14. Goodtwin

    Veteran Subscriber

    Joined:
    Dec 23, 2013
    Messages:
    1,235
    Likes Received:
    714

    I see. Can you explain in a little more detail the difference between the PS2's setup and the Wii U/X1 setup?
     
  15. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Basically, it has what was later referred to as "magic ROPs."

    In DRAM read operations a sense amplifier measures the charge on the capacitor by discharging it. This results in the contents of the DRAM cell being lost, so a read normally has to be followed by a write to restore the lost value. PS2's (and XBox 360's) eDRAM takes this further - instead of just doing read + write, it does read + modify + write, where the modify performs special graphics operations like alpha blending and depth update. This is more efficient than interfacing with conventional DRAM, where you'd need separate read and write cycles to perform the operation.
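    In transaction-count terms the win looks like this (a toy model, not real timings):

```python
# Toy bus-transaction model: alpha blending n pixels into a framebuffer.
def conventional_dram(n):
    """Separate read transaction, then a write transaction, per blended pixel."""
    return n * 2

def magic_rop_edram(n):
    """One RMW cycle: the blend rides the restore a destructive read does anyway."""
    return n

# Blending a million pixels saves a million bus transactions.
saved = conventional_dram(1_000_000) - magic_rop_edram(1_000_000)
```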
     
    Shoujoboy likes this.
  16. Goodtwin

    Veteran Subscriber

    Joined:
    Dec 23, 2013
    Messages:
    1,235
    Likes Received:
    714
    I would assume this is a unique feature of the Graphics Synthesizer powering the PS2 that the X1 and Wii U cannot duplicate? Thanks for the info. I'm just a Nintendo enthusiast myself, but I enjoy learning from those in the know who are willing to really explain things.
     
  17. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    Like I said, XBox 360 used eDRAM with special operations too. It went even further in that it expanded this to 4 samples for MSAA. AMD could have done a design like this for Wii U and/or XBox One if they were so tasked. Others can (and hopefully will) address this better than I can, but I think they aren't designed this way anymore because it isn't so much of a win with more modern graphics workloads, with deferred rendering and more modern depth buffering and framebuffer compression reducing the need for brute-force RMWs all the time. Also, the RMWs themselves weren't very flexible, limiting what you can do in multipass rendering. On XBox 360 you couldn't texture from the eDRAM at all; you had to push it out to main RAM, then texture from there. On PS2 you could texture from the eDRAM, but you couldn't utilize nearly as much bandwidth this way, and the texturing capabilities were very limited. You can also see this mirrored at the hardware level in the increasing ratio of TMUs to ROPs.
     
    Shoujoboy likes this.
  18. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,176
    Location:
    La-la land
    Awesome posts, Exo. Thank you so much.
     
  19. Carnage Rules!

    Newcomer

    Joined:
    Feb 6, 2015
    Messages:
    19
    Likes Received:
    0
    Very good explanation. Thanks.
     