*Game Development Issues*

AFAIK nvidia doesn't give their IP to console manufacturers for the products they make; they just sell them licences to produce/sell them.
I was more or less referring to the stance I would expect nVidia to take, not so much any specifics of the deals themselves. By "giving up", I meant no longer having the GPU and variants thereof to themselves, and losing their hold on enthusiasts. If, for instance, a GPU between the 8600 and 8800 level appeared in a console that sells for less than a top-end video card, there might be fear of cannibalizing the PC market of enthusiasts who actually buy these beasts, and of opening the gates for the rest of the products in the line (and those that follow).

I am curious about one thing though. I know you're an ace at PS3 hardware, but did you guys have a 360 counterpart to you, an ace Xenos coder to optimize its rendering as well?
I'd also be curious how many there are of the latter in the first place. People who know the ins and outs of the PS3 are in demand, but with the 360, many studios have found that people can grow into the role and learn how to handle its issues (the obvious exceptions aside).
 
I'd also be curious how many there are of the latter in the first place. People who know the ins and outs of the PS3 are in demand, but with the 360, many studios have found that people can grow into the role and learn how to handle its issues (the obvious exceptions aside).

Couldn't one (or someone else) argue that since the 360 has a rather longer list of games out there, there's bound to also be more 360 experience out there?
 
Couldn't one (or someone else) argue that since the 360 has a rather longer list of games out there, there's bound to also be more 360 experience out there?
Experience, yes, but experts are something else. I tend to see a lot more people who have high-level experience and have learned "this is a good idea, this is a bad idea," but few who have gotten to the point of being a real "ace" with Xenon and all of its idiosyncrasies and ways to get the most out of it. And that's mainly because there hasn't been strong motivation for it. PS3, otoh, has strong motivation to groom people into the path of trying to polish every dirty corner (that motivation unfortunately being "do it or your game slows to a crawl").
 
Couldn't one (or someone else) argue that since the 360 has a rather longer list of games out there, there's bound to also be more 360 experience out there?

Not to mention the additional year on the market ;)

Experience, yes, but experts are something else. I tend to see a lot more people who have high-level experience and have learned "this is a good idea, this is a bad idea," but few who have gotten to the point of being a real "ace" with Xenon and all of its idiosyncrasies and ways to get the most out of it. And that's mainly because there hasn't been strong motivation for it. PS3, otoh, has strong motivation to groom people into the path of trying to polish every dirty corner (that motivation unfortunately being "do it or your game slows to a crawl").

I guess the API used is the key factor behind this motivation.
 
I didn't include EDRAM resolve time in my tests, and as we all know theory and practice are different things, especially with such complicated architectures.
Theory and practice differ when there's a factor that you didn't consider in the theory, like bandwidth or, on the PC, vertex buffer location. When you're talking about rendering shadow maps and z-only passes into EDRAM using 12 bytes per vertex, there isn't really any room for theory to deviate.

There's a decent chance of early-Z on RSX being more effective (due to finer granularity) than Hi-Z on Xenos.

Jittered samples are not a problem, most ppl don't notice the difference anyway; the only issue I have with using MSAA is that alpha-tested meshes' silhouettes suddenly look lower res.
Alpha tested meshes are low poly (since that's their whole raison d'etre). Why not just set a multisample mask and draw those primitives multiple times with centroid sampling?

Alpha tested meshes are then rendered about the same speed as w/o AA, but everything else gets rendered faster (4x on Xenos, 2x on RSX).
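For what it's worth, a minimal sketch of the mask trick being described, assuming a D3D9-style API (the jitter and draw callbacks stand in for engine-specific code and are purely hypothetical; the point is just that each pass writes a single sample, so the alpha test effectively runs at sample rather than pixel frequency):

```cpp
#include <d3d9.h>
#include <functional>

// Sketch only: render alpha-tested geometry once per MSAA sample, letting the
// multisample mask restrict writes to one sample per pass. Everything else in
// the scene is drawn normally and still benefits from cheap MSAA.
void DrawAlphaTestedSupersampled(IDirect3DDevice9* dev,
                                 int sampleCount,                        // e.g. 4 for 4xAA
                                 const std::function<void(int)>& setSampleJitter, // hypothetical
                                 const std::function<void()>& drawFoliage)        // hypothetical
{
    for (int s = 0; s < sampleCount; ++s)
    {
        // Only sample 's' of each pixel is written in this pass.
        dev->SetRenderState(D3DRS_MULTISAMPLEMASK, 1u << s);

        // Hypothetical helper: nudge texture coordinates toward this sample's
        // position so each sample's alpha test sees slightly different texels.
        // With centroid interpolation in the shader, the interpolants stay
        // inside the covered area of the primitive.
        setSampleJitter(s);

        drawFoliage();
    }
    // Restore the default mask so subsequent draws write all samples again.
    dev->SetRenderState(D3DRS_MULTISAMPLEMASK, 0xFFFFFFFF);
}
```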
 
While I have yet to see this for other passes, as I haven't followed closely enough, I've also seen things like shadow maps, and in particular pre-filtered shadow maps (e.g. VSMs), run faster on PS3.
Yeah, I can definitely see some places where RSX's texel rate is an advantage. Gaussian and Poisson blur filters, spherical harmonics, etc.

However, I would expect that with VSM, doing the actual rendering is faster on Xenos since it has high precision filtering (32-bit integer) whereas RSX doesn't, and I know from personal experience that VSM with FP16 is ridiculously limited. You'd have to do it all in the shader.
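For context, the core of a variance shadow map lookup is just Chebyshev's inequality applied to the two stored moments (depth and depth squared); a rough sketch, with the moment fetch left as an assumption since filtering those moments is exactly the part that would have to move into the shader on hardware without high-precision texture filtering:

```cpp
#include <algorithm>

struct Moments { float m1, m2; };   // E[z] and E[z^2], filtered from the shadow map

// Chebyshev upper bound on the probability that the receiver is lit.
float VsmVisibility(const Moments& m, float receiverDepth, float minVariance)
{
    if (receiverDepth <= m.m1)
        return 1.0f;                                   // receiver is in front of the mean occluder

    // Clamp the variance to fight precision loss and light bleeding.
    float variance = std::max(m.m2 - m.m1 * m.m1, minVariance);
    float d        = receiverDepth - m.m1;
    return variance / (variance + d * d);              // upper bound on P(z >= receiverDepth)
}
```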

Well, the idea of a downscaled down-spec'ed G80 derivative as RSX was brought up before, and we've all heard stories about the idea from various people. Still, even with the assumption that it could have worked on a technical level, there's no way it could have happened on a business level.
Oh sure, but that wasn't my point.

fireshot was questioning how RSX could possibly be a problem when G70 was so competitive a few years ago and R600 is eating G80's dust today. That's screwed up logic, and I was just pointing that out.
 
The best example that I can give you is from what I've just finished working on, which is procedural grass generation and rendering. I did develop the first version on X360 and it was heavily optimized. We push a hundred thousand blades, no sweat.
When I got to add the grass technology to the PS3 build, the results were horrendous: the same VS/PS ran at 1/3 the speed. After spending a month moving the procedural generation part to SPUs (btw, SPUs are damn fast at that) and reducing RSX's vertex load to the bare minimum, the grass runs at 1/2 speed, presumably limited by PS/fillrate/blending. So I'm still way off from my X360 target and I've consumed way more memory on the PS3 to hold temporary vertex buffers, and on top of that some features are missing, since they would require sampling textures from the SPUs, which I'd rather not do.
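As a very rough illustration of the kind of split being described here (all names and the data layout are hypothetical; a real SPU job would be DMA-driven and far more heavily optimized), the SPU side essentially expands a compact per-blade description into a ready-to-render temporary vertex buffer so the GPU only has to fetch and transform:

```cpp
#include <cstdint>

struct GrassVertex { float x, y, z; uint32_t color; };   // 16 bytes, hypothetical layout

// Expand one patch of procedurally placed blades into a temporary vertex buffer.
// 'seed' drives a cheap hash so placement is deterministic per patch.
int GenerateGrassPatch(GrassVertex* out, int maxVerts,
                       float patchX, float patchZ, float patchSize,
                       uint32_t seed, int bladeCount)
{
    int written = 0;
    for (int i = 0; i < bladeCount && written + 3 <= maxVerts; ++i)
    {
        // Cheap integer hash -> pseudo-random blade position within the patch.
        uint32_t h = seed ^ (static_cast<uint32_t>(i) * 2654435761u);
        float u = ((h >> 8)  & 0xFFFFu) / 65535.0f;
        float v = ((h >> 16) & 0xFFFFu) / 65535.0f;
        float bx = patchX + u * patchSize;
        float bz = patchZ + v * patchSize;
        float h0 = 0.0f;                                  // ground height lookup omitted
        float bladeHeight = 0.15f + 0.1f * u;

        // One triangle per blade keeps the GPU vertex load minimal.
        out[written++] = { bx - 0.01f, h0,               bz, 0xFF3A7F2Eu };
        out[written++] = { bx + 0.01f, h0,               bz, 0xFF3A7F2Eu };
        out[written++] = { bx,         h0 + bladeHeight, bz, 0xFF5BAA3Cu };
    }
    return written;   // vertices actually produced; drawn later as a plain triangle list
}
```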
Cool. This is exactly the sort of thing that I'd expect Xenos to excel at by large factors. Lots of alpha blending to make grass/hair/bushes look more real. I think grass is the last piece of the puzzle for a photorealistic driving game. Too bad MS doesn't have a PD equivalent.

The vertex texturing sounds like fun too. Are you leaving footprints in the grass? :cool:

Regarding your optimization, do you think going from 1/3 to 1/2 was worth the effort? I hope you tried isolating the pixel and vertex aspects of performance (by changing vertex/pix ratio and analyzing) before you went down the route of SPU offloading.
 
See, that's the key: "RSX gets vertex shader limited". The assumption is always that vertices are suddenly just not needed anymore. That a high vertex count means that an artist screwed up someplace, or that they are wasted someplace. Coincidentally, I only hear this from PS3 coders :)
Rendering something that fills a couple of thousand pixels but requires processing 10k or 20k or more vertices is a screw-up in my department. Just because a given architecture is very good at fixing those problems doesn't mean we shouldn't have spent those cycles on something that actually makes a real contribution to the final image, rather than a minor contribution to some random developer's ego.


I don't know when it happened that all of a sudden it was determined that 2 million or so vertices is all you ever need and everything else is a waste. I just don't agree with that.
I don't agree with that either, I think you got a bit confused here.
No one said (certainly not me) that 2 (3, or 4..) million vertices are all you ever need and everything else is a waste. What I said many times is that simply pushing up the number of triangles on screen is not the answer, because we are already able to process all the triangles we need to theoretically produce very high quality images with super smooth geometry.
Mindlessly increasing the number of triangles we push is counterproductive on current architectures in at least two different ways:
1) the vast majority of the extra geometry pushed doesn't contribute in any relevant way to the final image, but it burns ALUs and memory cycles
2) extra geometry generates small triangles, which in turn drastically reduce pixel shader efficiency (and this is something not even a unified architecture can fix); see the rough numbers below
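(To put rough, purely illustrative numbers on point 2: with 2x2 quad-based shading, a triangle that ends up covering a single pixel still pays for four pixel shader invocations, so shading efficiency on that triangle is around 25%, and a mesh tessellated down to pixel-sized triangles spends a large share of its shading work on helper pixels that never reach the framebuffer.)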

I'd bet though that vertex count will jump in the next generation.
What an amazing prediction :)
Not because artists are dumb, or because the vertices are a waste. The count will jump because they are needed. The only alternative is if someone offers a realtime vertex distribution system suitable for games.
Some studios were doing stuff like that in the previous gen, and it's probably going to happen this gen as well. It's just a matter of time.

Or were you comparing your optimized PS3 pipeline to whatever the 360 had at the time and/or whatever it inherited from your PS3 related changes?
Do you think I spend my time optimizing PS3 and crippling 360? I already wrote I increased performance on both platforms.
 
Theory and practice differ when there's a factor that you didn't consider in the theory, like bandwidth or, on the PC, vertex buffer location. When you're talking about rendering shadow maps and z-only passes into EDRAM using 12 bytes per vertex, there isn't really any room for theory to deviate.
This is what you're doing here, missing some factors; just to cite some: data alignment, pre- and post-transform cache line sizes and replacement policies, plus other factors that can't be mentioned here. Just because architecture A is on paper better than architecture B at something doesn't mean it works out that way in practice.
There's a decent chance of early-Z on RSX being more effective (due to finer granularity) than Hi-Z on Xenos.
Who has told you that? :)
Alpha tested meshes are low poly (since that's their whole raison d'etre). Why not just set a multisample mask and draw those primitives multiple times with centroid sampling?
Cause you don't wanna render your vegetation 4 times per N shadow maps, that's why.
BTW, even if ROPs can fill N zixels per clock cycle, it doesn't automatically mean that the rest of the system will keep up with that.
 
we are already able to process all the triangles we need to theoretically produce very high quality images with super smooth geometry.

But nAo, nothing is super smooth, that's why we use normal mapping. All the detail has to come from something and a lot of things take a LOT of geometry to get right. Hair has already been mentioned in another topic, realistic cloth needs high tessellation levels too, and any monster or sci-fi armor or even a 21st century soldier's heap of equipment needs geometry badly. Vegetation. Rocks. Realistic tires and detailed rims. All the many many dozen little items to bring a world to life. Normal and parallax maps are a good cheat but not the real thing...
 
But nAo, nothing is super smooth, that's why we use normal mapping. All the detail has to come from something and a lot of things take a LOT of geometry to get right. Hair has already been mentioned in another topic, realistic cloth needs high tessellation levels too, and any monster or sci-fi armor or even a 21st century soldier's heap of equipment needs geometry badly. Vegetation. Rocks. Realistic tires and detailed rims. All the many many dozen little items to bring a world to life. Normal and parallax maps are a good cheat but not the real thing...
Don't worry, in my perfect universe normal maps wouldn't exist :)
I'm finding it very hard to get this message across: we can already process in real time all the geometry needed to have smooth meshes. We can easily render 3-4 million triangles per frame, and game titles would look amazing if we were able to distribute that stuff in a decent way.
That's all I'm saying, nothing more, nothing less.
At these resolutions more geometry will simply kill pixel shader efficiency.
I'd prefer to shift to something like REYES: tessellate at the rate that you need and stochastically compose your shaded stuff for happy AA/motion blur/DOF (and that would be a nice use for some EDRAM).
 
The vertex texturing sounds like fun too. Are you leaving footprints in the grass? :cool:

Regarding your optimization, do you think going from 1/3 to 1/2 was worth the effort? I hope you tried isolating the pixel and vertex aspects of performance (by changing vertex/pix ratio and analyzing) before you went down the route of SPU offloading.

Heh, I was trying to be vague, but I can't hide anything from you guys :)
Anyway, was it worth the effort? Well, reading my previous posts, what do you think? In the end I had no choice though; even going from 1/3 to 1/2 was still better than nothing. The alternative was to remove a lot of the grass, and the game just doesn't look as sexy without it.
 
This is what you're doing here, missing some factors; just to cite some: data alignment, pre- and post-transform cache line sizes and replacement policies, plus other factors that can't be mentioned here.
So you're saying RSX is beating Xenos in vertex throughput? I still find it shocking that these factors you mention could, for example, make RSX perform at its peak rate but keep Xenos at half speed when both are optimized. I mentioned 12b/vert because it's so far below BW limitations (even at 500M verts/s) that it wouldn't be a problem.
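(Rough arithmetic, with the numbers as stated assumptions: 500M verts/s × 12 bytes is only 6 GB/s of vertex fetch traffic, a fraction of the roughly 22 GB/s of GDDR3 bandwidth either GPU has to its memory pool, so a z-only pass at that vertex size shouldn't be bandwidth bound.)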

Who has told you that? :)
Hi-Z is based on 8x8 tiles and performed at reduced precision. I don't know if RSX also has a similar tile based rejection, but a full precision Z test is done at the top of the pipe (assuming no shader depth output) at increased speed, whereas for Xenos the full test for fragments surviving Hi-Z is done on the EDRAM.
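To make the granularity point concrete, a toy sketch of how a coarse hierarchical-Z test works (the tile size and precision handling here are illustrative, not a description of either GPU's actual implementation):

```cpp
#include <vector>
#include <cstdint>
#include <algorithm>

// Toy hierarchical-Z buffer: one conservative max-Z value per 8x8 pixel tile,
// stored at reduced precision. Fragments can only be *rejected* here; anything
// that survives still needs the full-precision per-sample test later.
struct HiZBuffer
{
    int tilesX = 0, tilesY = 0;
    std::vector<uint16_t> tileMaxZ;   // reduced precision, conservatively rounded up

    // Assuming a standard "less" depth compare: if the fragment is farther than
    // everything already written in its 8x8 tile, it can never pass the real test.
    bool MayPass(int px, int py, float fragDepth) const
    {
        const uint16_t quantized = static_cast<uint16_t>(
            std::min(fragDepth, 1.0f) * 65535.0f);        // coarse, like Hi-Z
        const int tile = (py / 8) * tilesX + (px / 8);
        return quantized <= tileMaxZ[tile];               // conservative: may still fail later
    }
};
```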

Cause you don't wanna render your vegetation 4 times per N shadow maps, that's why.
If you're rendering 1/4 the pixels and the polygons are big enough not to be vertex limited, is that so bad?

Alpha-tested polys: 1/4 the pixels with 4xAA, rendered 4 times.
All other polys: 1/4 the pixels with 4xAA, rendered once.

You seriously don't see an opportunity to gain speed here? Yes, drawing alpha tested polys 4 times at a quarter res could be slightly slower than once at full res, but you save so much on all other pixels. If you were completely setup limited, at the very least you could render a substantially larger shadow map without a perf hit.
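(Illustrative arithmetic only, with the geometry mix assumed: if alpha-tested geometry were 20% of the shadow-map fill, the masked scheme would cost roughly 0.2 × 4 × 1/4 + 0.8 × 1/4 = 0.4 of the original fill time, versus 1.0 for rendering everything once at full resolution without AA.)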
BTW, even if ROPs can fill N zixels per clock cycle, it doesn't automatically mean that the rest of the system will keep up with that.
But that doesn't mean you won't gain any perf benefit going from N/4 to N zixels per clock, or an IQ benefit going from M pixels to 4M pixels in the shadow map.
 
Don't worry, in my perfect universe normal maps wouldn't exist :)
I'm finding it very hard to get this message across: we can already process in real time all the geometry needed to have smooth meshes. We can easily render 3-4 million triangles per frame, and game titles would look amazing if we were able to distribute that stuff in a decent way.
That's all I'm saying, nothing more, nothing less.
At these resolutions more geometry will simply kill pixel shader efficiency.
I'd prefer to shift to something like REYES: tessellate at the rate that you need and stochastically compose your shaded stuff for happy AA/motion blur/DOF (and that would be a nice use for some EDRAM).

OT

If you could choose would you go with a scanline renderer or global illumination?
 
I don't agree with that either, I think you got a bit confused here.
No one said (certainly not me) that 2 (3, or 4..) million vertices are all you ever need and everything else is a waste.
That's pretty much exactly what you're saying, and you reiterated it in the next paragraph. Not only are extra vertices not needed, but they're bad, according to you. You're basically saying that if RSX is ever vertex shader limited, then there are too many vertices.

I agree that there's a limit to how many polys are needed, but we're not there yet. Sub-pixel polys in areas of high detail also act like an automatic selective supersample where multisampling is inadequate. Yes, this decreases shader efficiency, but there's no other way as the details you want to render simply don't have quad-level parallelism.

What an amazing prediction :)
Well, according to you 2-4M is enough, so there's no need to increase poly count. The whole point of his statement is that poly count will indeed increase next gen, thus proving that more polys are needed.

Do you think I spend my time optimizing PS3 and crippling 360? I already wrote I increased performance on both platforms.
That's not what he's trying to say. He's asking whether you're optimizing Xenos with as much effort. From what you're describing, you're spending time to increase RSX's speed and if Xenos benefits, then great, but you're never making optimizations specifically for Xenos.
 
Not only are extra vertices not needed, but they're bad, according to you.
I think you missed the fact that Joker's references were related to previous threads, where I wrote much more on the subject than I did in this thread. Since I don't like selective memories, I tried to remind him what I was talking about.
What I wrote many times on this forum is that simply throwing more triangles at the given ratio between primitives and pixels we have now doesn't really look like the most sensible thing to do, especially given the quad-based architectures we have now.

You're basically saying that if RSX is ever vertex shader limited, then there are too many vertices.
No, I said that IN THE TITLE I'M WORKING ON, when RSX is vertex shader limited it's because we are throwing an insane amount of geometry that covers only a few hundred pixels.
I usually don't comment on the work other ppl are doing on other titles that have different art, different requirements, etc.
If there is someone overgeneralizing here, it's not me (and not you either) :)

I agree that there's a limit to how many polys are needed, but we're not there yet. Sub-pixel polys in areas of high detail also act like an automatic selective supersample where multisampling is inadequate. Yes, this decreases shader efficiency, but there's no other way as the details you want to render simply don't have quad-level parallelism.
I'd be happy to throw all those sub-pixel polys where they're needed, not just anywhere, thank you :)
Sorry if I remind you about this again, but you didn't believe me even when I was telling you that decoupling shadowing computations from other shading operations was a big win, due to the current quad-based architectures, in very low pixel/primitive scenarios. Current architectures are already very inefficient; too bad I can't quote numbers.
Almost 2 years later the Crytek guys published a paper where they show exactly the same technique, and I guess you believe me now :)
I've seen a few next gen engines/games on different hw and profiled them, and I can tell you that the VAST majority of geometry we throw at the problem goes unnoticed.
So, one last time, my statement is: We really don't need more geometry than this IF WE COULD distribute it in a clever way.
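For readers unfamiliar with the decoupling referred to above, the usual shape of the technique (a sketch under assumptions, not a description of this particular engine) is to evaluate the shadow term once per screen pixel into its own buffer, so the expensive geometry-pass shaders no longer carry the shadow sampling, and the quad-efficiency losses on tiny triangles are paid only by cheap passes:

```cpp
// Pass ordering sketch for a decoupled ("deferred") shadow mask.
// All of these functions are hypothetical placeholders for engine-side passes.
void RenderDepthPrepass();
void RenderScreenSpaceShadowMask();
void RenderOpaqueGeometryUsingShadowMask();
void RenderTransparentsAndPostFX();

void RenderFrame()
{
    RenderDepthPrepass();                    // lay down scene depth only

    // Full-screen pass: reconstruct each pixel's position from the depth buffer,
    // test it against the shadow map(s), and write a single shadow factor per
    // pixel into a screen-sized mask. Tiny triangles don't hurt here because
    // this pass rasterizes one big quad.
    RenderScreenSpaceShadowMask();

    // Main shading pass: material shaders read the mask with one texture fetch
    // instead of doing N shadow-map taps per pixel.
    RenderOpaqueGeometryUsingShadowMask();

    RenderTransparentsAndPostFX();
}
```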


Well, according to you 2-4M is enough, so there's no need to increase poly count. The whole point of his statement is that poly count will indeed increase next gen, thus proving that more polys are needed.
Of course it will increase, more is always better. Though I'd prefer to see next gen GPUs going in a different direction.

That's not what he's trying to say. He's asking whether you're optimizing Xenos with as much effort. From what you're describing, you're spending time to increase RSX's speed and if Xenos benefits, then great, but you're never making optimizations specifically for Xenos.
1) I thought some ppl here already wrote that Xenos is so efficient that you really don't need to do much to improve its performance. Anyway, of course I did what I could; it's not my fault if on RSX ppl have more cards to play. All the optimizations that could possibly have been done on Xenos were already in place (except one, which would affect both platforms and which is not in place right now..)
2) Again, it's not my fault (as I did the best I could do.. I guess someone better than me can get better performance out of it) if some piece of hw doesn't exactly live up to expectations.
We heard AA was free.. blending was free, 95% efficiency, etc. (just as we heard about RSX's amazing 128-bit HDR and crap like that..)

One last thing: do you think I really care about RSX or Xenos or whatever? I care about getting out the most from anything I'm working on at any given moment.
If I 'defend' RSX it's because my personal opinion and experience don't match what I sometimes read here. How many times have we heard that RSX is crap if CELL is not pulling some trick to help it? We heard about RSX having half or a quarter of the vertex shading perf of Xenos; well, as I already said, I think there are already 2 or 3 games on the shelves that kind of disprove these statements, but what do I know? ;)
So how do those statements fit reality? Cause it seems to me someone needs a reality check here (again, not you).

Marco
 
If I 'defend' RSX it's because my personal opinion and experience don't match what I sometimes read here.

I get the exact same impression; no matter what you say it's still supposed to "suck", even though there are games out now that back you up :)
 
I get the exact same impression; no matter what you say it's still supposed to "suck", even though there are games out now that back you up :)

I agree the first and second party games for the PS3 have outstanding graphics. The problem is that for every great first/second party title there are 10 really bad 3rd party ports. The people who are having the problems are the multiplatform people. People can put their heads in the sand or ignore it, but as of right now there is a problem with multiplatform developers and the PS3. Even games with the PS3 as the lead platform, like DiRT, were delayed on the PS3. It could be tools, it could be hardware, but does it really matter? Look at last gen: on the Xbox, even the worst-effort multiplatform games looked better than on the PS2.

I am no developer, but I would say the evidence at this point is on the side of the 3rd party guys who are having a harder time with the PS3. That could change a year from now, but as of right now anyone with 2 eyes can see the 360 has better tools or is easier to develop for. It is a shame that people are so quick to blame "lazy developers" for the last year instead of Sony, IBM and Nvidia.
 