RSX: memory bottlenecks?

scooby_dooby said:
I was under the impression it was a workaround to a problem, but by no means a perfect solution.

It's already been said, but no HDR solution is perfect. In fact, nothing in computer graphics is perfect. Everything is an approximation, an attempt to boil reality down to maths you can do quickly enough in realtime ;) NAO32 is just a different approach from the obvious one "provided" by hardware. That does not make it any more or less valid than any other. Smarter? Perhaps.

scooby_dooby said:
For example, the KZ dev's have stated that method simply wouldn't work for their type of lighting requirements.

That's why it's for HS. Every game has different requirements; techniques that work well for one may not work so well for others.
 
scooby_dooby said:
Since then they've also implemented NAO32, which is not HDR,
NAO32 *is* HDR. This point has been explained several times, yet people persist in thinking otherwise, so I'm tempted to start neg-repping people who say it isn't for refusing to learn, after several discussions on the matter, what HDR is and how different data formats can represent it with varying efficiencies.

I refer you to the previous discussion on the matter where Deano found himself having to explain why not being FP didn't mean not being HDR -
http://www.beyond3d.com/forum/showpost.php?p=648657&postcount=194

There's this important clarification from Marco as to the colourspace model -
http://www.beyond3d.com/forum/showpost.php?p=649948&postcount=254

Saying NAO32 is a workaround for real HDR is like saying a digital photograph saved in an HSL format is just a workaround for saving it in RGB and isn't really a digital photo.

NAO32 represents colours differently, more akin to the way humans see than to how TVs make up an image. The Luv space has pluses, such as the bandwidth saving in Marco's implementation and the true representation of 'overbright' areas of the image. It also has downsides, such as not supporting standard blending modes, which are essential for alpha blending, for example (Marco visited this topic here). RGB doesn't have the blending problem that colour+intensity colour spaces have, but it has other problems, such as not being able to represent overbright areas correctly.
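
To make the 'different representation' point concrete, here's a rough CPU-side sketch of a LogLuv-style 32-bit packing, the same family of encoding NAO32 belongs to: two 8-bit chrominance channels plus a 16-bit log luminance. The matrices and the luminance range are illustrative assumptions based on the general LogLuv idea, not nAo's actual shader code.

Code:
// Rough CPU-side sketch of a LogLuv-style 32-bit HDR packing, the same
// family of encoding as NAO32: 8-bit u/v chrominance plus 16 bits of
// log2(luminance).  Matrices and luminance range are illustrative
// assumptions, not nAo's actual shader code.
#include <algorithm>
#include <cmath>
#include <cstdint>

struct RGB { float r, g, b; };                     // linear-light RGB

// RGB -> modified (X', Y, Z'); chosen so X'/Z' and Y/Z' land in [0,1].
static void rgbToXYZ(const RGB& c, float& X, float& Y, float& Z)
{
    X = 0.2209f*c.r + 0.1138f*c.g + 0.0102f*c.b;
    Y = 0.3390f*c.r + 0.6780f*c.g + 0.1130f*c.b;   // Y is luminance
    Z = 0.4184f*c.r + 0.7319f*c.g + 0.2969f*c.b;
}

static RGB xyzToRGB(float X, float Y, float Z)     // inverse of the above
{
    return {  6.0013f*X - 1.3320f*Y + 0.3007f*Z,
             -2.7000f*X + 3.1029f*Y - 1.0880f*Z,
             -1.7995f*X - 5.7720f*Y + 5.6268f*Z };
}

static uint8_t toByte(float v)                     // [0,1] -> [0,255]
{
    return (uint8_t)std::lround(std::clamp(v, 0.0f, 1.0f) * 255.0f);
}

// Pack an HDR colour into 4 bytes: u, v chrominance + 16-bit log luminance
// covering roughly [2^-16, 2^16] (an arbitrary illustrative range).
uint32_t encodeLogLuv32(const RGB& c)
{
    float X, Y, Z;
    rgbToXYZ(c, X, Y, Z);
    Z = std::max(Z, 1e-6f);
    Y = std::max(Y, 1e-6f);
    float u  = X / Z;                              // chrominance
    float v  = Y / Z;
    float Le = (std::log2(Y) + 16.0f) / 32.0f;     // normalised log luminance
    uint32_t L16 = (uint32_t)std::lround(std::clamp(Le, 0.0f, 1.0f) * 65535.0f);
    return  (uint32_t)toByte(u)
         | ((uint32_t)toByte(v) << 8)
         | ((L16 & 0xFFu)       << 16)
         | ((L16 >> 8)          << 24);
}

RGB decodeLogLuv32(uint32_t p)
{
    float u = ( p        & 0xFF) / 255.0f;
    float v = ((p >> 8)  & 0xFF) / 255.0f;
    uint32_t L16 = ((p >> 16) & 0xFF) | (((p >> 24) & 0xFF) << 8);
    float Y = std::exp2((L16 / 65535.0f) * 32.0f - 16.0f);
    float Z = Y / std::max(v, 1e-6f);
    float X = u * Z;
    return xyzToRGB(X, Y, Z);
}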

There is no perfect way to represent colour!
There is no one solution to HDR!
There is no one true way to do HDR with everything else being a fake, especially when your display is limited to trying to portray images, as your eye sees them in the real world, using a load of red, green and blue dots.

This is true of all areas of computer graphics. CSGs are no more a fake of real objects than triangle meshes are. There's no one correct way next to which everything else is a fake. Everyone on this forum should know that by now. The fact that you may regard one particular system as the 'proper' way and anything different as a 'fake' means you don't grasp the nature of the task of creating artificial images on finite hardware to be transmitted to the human eye through a technologically limited display system.
 
Just kidding guys, but ERP, can you talk a bit more about this hunch of yours?

nAo was quite confident that this would allow them to use MSAA as WELL as hardware blending and TSAA since the buffers were stored as 32-bit RGBA basically (all calculations were done in the shaders at full FP precision).
Actually, he made it quite clear that blending as-is doesn't work with the alternate colorspace, since chrominance and luminance are separated. So when doing simple alpha blending in NAO32, you have to convert to RGB, do the blend, and then convert back to Luv.

So if you have loads and loads of accumulated blending, those polygons, which would probably otherwise go through ordinary fixed-function processing (well, not really, but in the sense that you don't do anything that requires programmable shaders), now actually carry some per-pixel shader load. And since alpha blending is something that actually *requires* whatever overdraw it has, you have to pay the piper on it.

While the load is most likely small enough in the case of HS, that doesn't mean it would be true in all cases.
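
To illustrate that extra per-pixel work (reusing the hypothetical encode/decode helpers sketched earlier), a simple src-over alpha blend against a LogLuv-style buffer conceptually looks like this instead of being a free hardware blend:

Code:
// Illustrative only: what a src-over alpha blend costs against a
// LogLuv-style buffer, compared to a free hardware blend on an RGBA target.
// encodeLogLuv32/decodeLogLuv32 are the hypothetical helpers sketched above.
#include <cstdint>

struct RGB { float r, g, b; };
uint32_t encodeLogLuv32(const RGB& c);     // from the earlier sketch
RGB      decodeLogLuv32(uint32_t p);

uint32_t blendOverLogLuv(const RGB& src, float alpha, uint32_t dstPixel)
{
    RGB dst = decodeLogLuv32(dstPixel);                  // extra shader work
    RGB out = { src.r * alpha + dst.r * (1.0f - alpha),  // the actual blend,
                src.g * alpha + dst.g * (1.0f - alpha),  // done in RGB
                src.b * alpha + dst.b * (1.0f - alpha) };
    return encodeLogLuv32(out);                          // extra shader work
}

(Conceptual only: pixel shaders of that era can't read the destination pixel anyway, which is part of why switching to an FP16 target for the translucent pass, as described below, is the straightforward way out.)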
 
As ERP says, the reason not to use NAO32 is blending. The colour space is okay for small lerps (like in an AA downsample) but gives visual errors for more extreme lerps (like blends).


Of course, what you can do is render the opaque renderables in NAO32, then switch to FP16 for the alpha blending etc.

Certainly for our title, we have much more opaque geometry than translucent... so it's a win for us. We trade a little pixel shader power to get all the benefits of INT8 rendering for most of our geometry.

A bit of an aside is that most of our effects have always been LDR, not HDR, for art reasons... generally special FX (particles etc.) look better in LDR (we support both)... So the actual amount of HDR alpha we have is very, very small.
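
A rough sketch of the frame structure being described, purely to show the shape of it; every type and function name here is made up, it's not NT's actual code:

Code:
// Hypothetical frame structure for the approach described above: opaque
// geometry into an 8-bit-per-channel LogLuv target, then translucent
// geometry into an FP16 target.  All names are invented for illustration;
// only the ordering reflects the post.
struct RenderTarget;
struct Scene;

void drawOpaque(Scene&, RenderTarget&);
void convertLogLuvToFP16(RenderTarget&, RenderTarget&);
void drawTranslucent(Scene&, RenderTarget&);
void toneMapToBackBuffer(RenderTarget&);

void renderFrame(Scene& scene, RenderTarget& rgba8Target, RenderTarget& fp16Target)
{
    // 1. Opaque pass: shaders light in full precision, then encode to a
    //    LogLuv-style format before writing to the 32-bit RGBA8 target.
    //    This gets the footprint/bandwidth of INT8 plus MSAA support.
    drawOpaque(scene, rgba8Target);

    // 2. Convert the opaque result into the FP16 target (decode LogLuv ->
    //    linear RGB) so it can be blended against with hardware blending.
    convertLogLuvToFP16(rgba8Target, fp16Target);

    // 3. Translucent pass: ordinary hardware alpha blending into FP16.
    //    Cheap enough when translucent geometry is a small fraction of the frame.
    drawTranslucent(scene, fp16Target);

    // 4. Tone map from the FP16 buffer to the displayable back buffer.
    toneMapToBackBuffer(fp16Target);
}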
 
blakjedi said:
So like games that take place in water or foggy environments?

Or smoky battlefields with lots of explosions, rockets and flamethrowers. And even blood.
 
Another bus-related problem is that if you spend all the GDDR bandwidth on the framebuffer, then what will you do with the remaining part of that 256 MB of RAM? No framebuffer should be that big; even two 1080p double buffers should fit in there, with FSAA. But if you use the remaining memory for anything else, then it'll eat into the framebuffer bandwidth as well.
Splitting the framebuffer between the two memory pools isn't really likely either.
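
Some back-of-the-envelope numbers behind the "should fit" claim, assuming 32-bit colour plus 32-bit Z/stencil per pixel (my assumptions, not official figures):

Code:
// Back-of-the-envelope framebuffer sizes (my numbers, not official ones):
// 1080p = 1920 x 1080 pixels, 4 bytes colour + 4 bytes Z/stencil per pixel.
#include <cstdio>

int main()
{
    const double px       = 1920.0 * 1080.0;            // pixels per buffer
    const double megabyte = 1024.0 * 1024.0;

    double single  = px * (4.0 + 4.0)       / megabyte; // colour + Z:            ~15.8 MB
    double doubled = px * (4.0 * 2 + 4.0)   / megabyte; // double-buffered colour: ~23.7 MB
    double msaa4x  = px * (4.0 + 4.0) * 4.0 / megabyte; // 4x multisampled:        ~63.3 MB

    std::printf("1080p single buffer : %.1f MB\n", single);
    std::printf("1080p double buffer : %.1f MB\n", doubled);
    std::printf("1080p 4xAA buffers  : %.1f MB\n", msaa4x);
    return 0;   // even the 4xAA case leaves well over half of the 256 MB free
}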

Thinking about it, using RSX for audio could make use of the extra GDDR RAM... One could store voices and music there, as those won't need a lot of bandwidth but a lot of space instead...
 
Laa-Yosh said:
Another bus-related problem is that if you spend all the GDDR bandwidth on the framebuffer, then what will you do with the remaining part of that 256 MB of RAM? No framebuffer should be that big; even two 1080p double buffers should fit in there, with FSAA.

Exactly why a smart developer won't let the framebuffer consume all of VRAM's bandwidth. You'd probably want to reserve at least a few GB/s for texture reads etc. It means sacrificing some framebuffer activity, but it would increase PS3's bandwidth advantage for everything outside the framebuffer.

Laa-Yosh said:
Thinking about it, using RSX for audio could make use of the extra GDDR RAM... One could store voices and music there, as those won't need a lot of bandwidth but a lot of space instead...

I think the reference to RSX in relation to audio on PS3, on the Devstation website, isn't referring to RSX the GPU, but to a piece of audio middleware called RSX.
 
Laa-Yosh said:
Another bus-related problem is that if you spend all the GDDR bandwidth on the framebuffer, then what will you do with the remaining part of that 256 MB of RAM?
One might want to use that 'remaining part' to store all those kinds of data that require a moderate amount of bandwidth.
 
nAo said:
One might want to use that 'remaining part' to store all those kinds of data that require a moderate amount of bandwidth.

Could you not see "regular" FP16 being used in cutscenes and the like? Not for NT, but for other devs. Or have you got very positive feedback from many in the "game", so to say :), that they are going to use your method as well?
 
Hmmm, what's the deal with the FlexIO, and all this talk of mass latency from that pool? Wasn't this supposed to be used to allow multiple Cells to work together (probably even the original hybrid GPU too)? Those should be more sensitive to mass latency, no? So what's the deal?
 
overclocked said:
Could you not see "regular" FP16 being used in cutscenes and the like? Not for NT, but for other devs. Or have you got very positive feedback from many in the "game", so to say :), that they are going to use your method as well?
If a game already uses an alternative color space that gives extremely good quality and improves performance as well, I can't see any specific reason to switch to another color space just for cutscenes (at least not in the general case).
I really don't know what other devs are doing..
 
zidane1strife said:
Hmmm, what's the deal with the FlexIO, and all this talk of mass latency from that pool? Wasn't this supposed to be used to allow multiple Cells to work together (probably even the original hybrid GPU too)? Those should be more sensitive to mass latency, no? So what's the deal?
Well, a request from the GPU has to pass through FlexIO, Cell and XDR. Each of those steps introduces latency and is certainly much more expensive than the local pool - I remember reading that FlexIO has a pretty bad effective/theoretical bandwidth ratio under real-world use (I'm not motivated to find it, but it came from the mouth of an engineer related to Cell).

Clearly you want to send big chunks of data over the FlexIO bus, and not use it for texture lookups, Z or framebuffer traffic. The multiple-Cell approach is to spread out SPE programs, keep as much of each program's data as possible in the local chunk of memory, and transfer the rest in big chunks.
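
For reference, "transfer the rest in big chunks" in SPE terms means explicit DMA into the 256 KB local store, roughly like this (standard Cell SDK intrinsics from spu_mfcio.h; the buffer name, size and tag are arbitrary):

Code:
// Minimal SPE-side sketch of pulling a big chunk of data from main memory
// into local store via DMA, then waiting on it.  Uses the standard Cell SDK
// intrinsics from spu_mfcio.h; buffer name, size and tag are arbitrary.
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK_SIZE 16384                      /* 16 KB = max size of one DMA */

static char chunk[CHUNK_SIZE] __attribute__((aligned(128)));

void fetch_chunk(uint64_t effective_addr)     /* main-memory address to read */
{
    const unsigned int tag = 0;

    /* Kick off the DMA: local-store destination, effective address, size. */
    mfc_get(chunk, effective_addr, CHUNK_SIZE, tag, 0, 0);

    /* Block until every DMA with this tag has completed. */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    /* chunk[] is now valid; process it here, ideally while the next      */
    /* transfer is already in flight (double buffering).                  */
}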
 
Npl said:
Well, a request from the GPU has to pass through FlexIO, Cell and XDR. Each of those steps introduces latency and is certainly much more expensive than the local pool - I remember reading that FlexIO has a pretty bad effective/theoretical bandwidth ratio under real-world use (I'm not motivated to find it, but it came from the mouth of an engineer related to Cell).

Clearly you want to send big chunks of data over the FlexIO bus, and not use it for texture lookups, Z or framebuffer traffic. The multiple-Cell approach is to spread out SPE programs, keep as much of each program's data as possible in the local chunk of memory, and transfer the rest in big chunks.
Wow. Hopefully, if he's right, it was fixed in a later revision.

Anyway, I just can't believe that after delivering and over-delivering all these years, they'd fail to impress this time. Look at the PSP and the PS2; for their time they were definitely not comparable to year-old GPUs in their respective markets.
 
IMO, if Sony were really serious about crushing the Xbox 360 and future-proofing the PS3 so that it's 'good' for 6 years or so, they would use a 256-bit bus to GDDR4 memory instead of a 128-bit bus to GDDR3 memory, as well as a ~51 GB/sec bandwidth implementation of XDR.

The 35 GB/sec of bandwidth between RSX and Cell will be good for some things, but not for others.
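
Quick back-of-the-envelope on what those bus widths mean, using the commonly quoted pre-launch clocks (bandwidth = bus width in bytes x effective transfer rate); the numbers are mine, purely illustrative:

Code:
// Bandwidth back-of-the-envelope: bytes/s = (bus width / 8) * effective
// transfer rate.  Clock figures are the commonly quoted pre-launch numbers,
// used only to show what a wider bus buys.
#include <cstdio>

static double gbps(double bus_bits, double transfers_per_sec)
{
    return (bus_bits / 8.0) * transfers_per_sec / 1e9;   // GB/s
}

int main()
{
    std::printf("GDDR3 128-bit @ 1.4 GT/s : %.1f GB/s\n", gbps(128, 1.4e9)); // ~22.4
    std::printf("GDDR3 256-bit @ 1.4 GT/s : %.1f GB/s\n", gbps(256, 1.4e9)); // ~44.8
    std::printf("XDR    64-bit @ 3.2 GT/s : %.1f GB/s\n", gbps( 64, 3.2e9)); // ~25.6
    std::printf("XDR   128-bit @ 3.2 GT/s : %.1f GB/s\n", gbps(128, 3.2e9)); // ~51.2
    return 0;
}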
 
http://www.techreport.com/reviews/2006q1/radeon-x1900/index.x?pg=1

You may recall how we noted a while ago that design limitations prevent some GPUs from achieving optimal performance at really high resolutions. This limitation particularly affects GeForce 6 series GPUs; their performance drops off markedly at resolutions above two megapixels, like 1920x1440 or 2048x1536. NVIDIA isn't saying exactly what all is involved, but certain internal buffers or caches on the chip aren't sized to handle more than that. Recent ATI graphics chips, including the Radeon X1800 series, have a similar limitation: their Hierarchical Z buffer can only handle up to two megapixels of resolution. The performance impact isn't as stark as on the GeForce 6, but these Radeons can only use Hierarchical Z on a portion of the screen at very high resolutions; the rest of the screen must be rendered less efficiently. The R580 has a 50% larger on-chip buffer that raises that limit to three megapixels, so super-high-definition graphics should run more efficiently on Radeon X1900 cards.

RSX 4MB on-chip cache. :)
 