*Game Development Issues*

It's much more than that... characters have to be rendered into the shadow maps as well, and maybe even in a z-prepass (dunno if they use one or not).

I assume backface culling etc isn't accounted for either.

Your notation is definitely wrong (multiplication has precedence, you know), but more importantly, the next MLB 2K is typically pushing around a million polygons per frame @ 60 fps.
That's 60 million polygons/sec. You'd need environment numbers to compare.
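The arithmetic being argued about is just budget-per-frame times frame rate; a trivial helper makes the units explicit (illustrative only, the function name is mine):

```cpp
// Polygons-per-second throughput from a per-frame budget and target
// frame rate: 1M polys/frame at 60 fps is 60M polys/sec.
constexpr long polysPerSecond(long polysPerFrame, long fps) {
    return polysPerFrame * fps;
}
// polysPerSecond(1000000, 60) -> 60000000
```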

My bad, obviously that assumes a single pass. Anyone wanna clue me in on how much they think it is?
 
Don't really know. I was getting 95.6% predicated tiling success on that pix grab, so only 4.4% waste. Pretty sweet :) I don't know why others don't tile. It's so easy. Split the screen in two, figure out what's above and what's below the tile line. Presto, tiling complete and you get very fast 4xmsaa as a bonus. I mean sure, I would rather not tile if I didn't have to of course! But it's pretty easy to do.
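A minimal sketch of that two-tile split, assuming a 1280x720 target divided at a horizontal line (`TileMask` and `classifyDraw` are my own names, not from any real SDK): each draw call's screen-space Y bounds are tested against the tile line to decide which tile(s) it must be submitted to.

```cpp
// Classify a draw call against a horizontal tile line. A draw whose
// bounding box straddles the line must be submitted to both tiles;
// otherwise it is resubmitted only into the tile it touches.
enum TileMask { TILE_NONE = 0, TILE_TOP = 1, TILE_BOTTOM = 2, TILE_BOTH = 3 };

TileMask classifyDraw(int minY, int maxY, int tileLineY = 360) {
    int mask = TILE_NONE;
    if (minY < tileLineY)  mask |= TILE_TOP;     // touches the upper tile
    if (maxY >= tileLineY) mask |= TILE_BOTTOM;  // touches the lower tile
    return static_cast<TileMask>(mask);
}
```

The "waste" figure quoted above corresponds to draws that land in TILE_BOTH and get processed twice.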

First: Nice results :)

Second: Shouldn't there be at least three tiles @720p?
 
That works out to 2.07 million polys/second for the pirates + Drake only; unfortunately we don't have any numbers on environments.

My maths could be wrong though, (highly likely :???:)
Not your maths, but maybe your point! ;) We're talking about 2 million vertices per frame being rather ordinary this gen. 2 million polys per second is utter small fry. No need to use SPUs or ICE or any other TLA than RSX for 2 million tiddly triangles a second!
 
(nAo, I was more interested in continuing our discussion of using AA to increase shadow map speed/resolution, so if you reply to my last post on that subject I'd really appreciate it.)
What I've written many times on this forum is that simply throwing more triangles at the ratio between primitives and pixels we have now doesn't really look like the most sensible thing to do, especially given the quad-based architectures we have now.
...
I'd be happy to throw all those subpixel polys where they're needed, not just anywhere, thank you :)
...
So, one last time, my statement is: We really don't need more geometry than this IF WE COULD distribute it in a clever way.
I understand that, but there's no escaping the nature of 3D workloads. Say visible edges are only 10 pixels long, which means you still have room for improvement in high-frequency detail. The triangles near these edges will have very few pixels since they're so angled, but the ones viewed head on are pretty big from a quad point of view. This is why higher polycounts near silhouettes (i.e. your intelligent distribution idea) aren't going to achieve much better quad efficiency than higher polycounts everywhere, as the triangles at the edge are the ones that really hurt efficiency in the first place. I'll admit that culling/clipping gets more efficient, but it seems you're focusing on quad efficiency in this post.
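The quad-efficiency argument can be put in rough numbers (a back-of-the-envelope helper, illustrative only, assuming the usual 2x2 shading quads):

```cpp
// A GPU shades in 2x2 quads, so a triangle covering `pixels` visible
// pixels spread over `quads` quads costs 4*quads shader invocations.
// Efficiency is useful work divided by invocations.
double quadEfficiency(int pixels, int quads) {
    return static_cast<double>(pixels) / (4.0 * quads);
}
// A big head-on triangle: 64 pixels packed into 16 quads -> 1.0 (100%).
// A silhouette sliver: 10 pixels smeared across 10 quads -> 0.25 (25%).
```

This is why the edge triangles dominate the cost regardless of where the extra tessellation goes.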

I'm not saying increase poly count for the heck of it. There's a point where smooth surfaces are smooth enough, and for R&C type art we don't need high tessellation. However, any details that affect visibility have no other solution (except alpha testing in limited circumstances), particularly if you want to avoid aliasing. It's inefficient for quad-based rendering, but there's really no other choice.

Sorry if I remind you again about this, but you didn't believe me even when I was telling you that decoupling shadowing computations from other shading operations was a big win due to the current quad-based architectures in very low pixel/primitive scenarios... current architectures are already very inefficient, too bad I can't quote numbers.
Yeah, it took a while for that to sink in. :smile: After all, doing more shader ops per pixel to save on pixel load is a little hard to swallow at first.

In light of antialiasing, though, it's still questionable whether that's a good way to do things in general (by that I mean deferring computations to preserve quad-level efficiency). Sure, for N sample PCF you can distribute the shader load across the samples like in KZ2. But most shaders (including VSM) can't do that, and if you start looking at which samples are equal for selective supersampling, you're back to square one wrt efficiency.

Being able to perform a shader op once and copying the result to all samples in the quad affected by the current polygon is a good route to efficiency. Trying to be clever and increasing parallelism through more complicated shaders is not the way to go IMHO.

We heard AA was free..blending was free, 95% efficiency, etc.. (as we heard on RSX about amazing 128 bit HDR and crap like that..)
So blending isn't free? Aside from imperfect separation during tiling and the additional quads, AA isn't free? That's news to me.

EDIT: Oh, you're talking about blending FP10/I16, aren't you. Yeah, that's a shame...
We heard about RSX having half or a quarter of the vertex shading perf of Xenos, well... as I already said I think there are already 2 or 3 games on the shelves that kind of disprove these statements, but what do I know? ;)
Well that's absolutely true, but vertex shading is rarely the bottleneck now, is it...

Maybe you're talking about triangle setup. But Joker didn't measure that.
Isn't that the real bottleneck most of the time? If he has 10M verts per frame counted the way that you're describing, that likely means ~10M tris/frame, right?

BTW, I'm curious about how fast RSX can cull/clip vertices. If we know peak setup is 250Mtri/s in the simplest case, why can't someone tell what the culling/clipping rate is? I originally assumed it was the same because many people (including myself) consider culling/clipping to be part of setup.

I'd like to see some games doing SAT-VSM on PS3 via SPUs :)
Wouldn't 16 texture fetches per pixel be rather ugly for an SPU?
 
32-bit integer?
And I don't know why people think X360 has 32 bit integer filtering.
More specifically, I meant 32-bit fixed point. AFAIK, Xenos does filter that, and even does FP16 filtering this way (i.e. as 16.16).

You can't write at that precision, but you can split a FP32 value and render into a 2 channel 16-bit fixed point texture, right? Then you just have to reinterpret the result as single channel 32-bit fixed point. There's no correct blending, but you don't need that for VSM.
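The split/reinterpret trick can be sketched with CPU-side arithmetic (illustrative only; `TwoChannel16`, `split`, and `reinterpretAs32` are made-up names, not Xenos API):

```cpp
#include <cstdint>

// Write a 32-bit fixed-point value as two 16-bit fixed-point channels,
// then treat the channel pair as one 32-bit value when reading it back.
struct TwoChannel16 { uint16_t lo, hi; };

TwoChannel16 split(uint32_t fixed32) {
    return { static_cast<uint16_t>(fixed32 & 0xFFFF),   // low 16 bits
             static_cast<uint16_t>(fixed32 >> 16) };    // high 16 bits
}

uint32_t reinterpretAs32(TwoChannel16 t) {
    return (static_cast<uint32_t>(t.hi) << 16) | t.lo;  // lossless round trip
}
```

As noted, blending across the channel boundary wouldn't carry correctly, but VSM doesn't need blending.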
 
More specifically, I meant 32-bit fixed point. AFAIK, Xenos does filter that, and even does FP16 filtering this way (i.e. as 16.16).

You can't write at that precision, but you can split a FP32 value and render into a 2 channel 16-bit fixed point texture, right? Then you just have to reinterpret the result as single channel 32-bit fixed point. There's no correct blending, but you don't need that for VSM.

hm... Jawed's document (from April 2006) says there is no filtering for the FP32 texture format.

:?:

http://forum.beyond3d.com/showpost.php?p=746115&postcount=5


edit:

ok whoops FP...floating point...fixed point.... My bad. :p
 
How do you do 4xAA with only 20 bytes per pixel?

1280x384 * (4B colour + 4B Z) * 4 samples = 15 MiB.
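Following that arithmetic, here's a small helper (my own sketch, not from the thread) that computes how many height-wise tiles a render target needs for a given EDRAM budget:

```cpp
// Number of horizontal tiles needed to fit a render target into EDRAM.
// At 4xMSAA with 4B colour + 4B Z per sample, each 1280-wide row costs
// 1280 * 32 = 40 KiB, so 10 MiB holds 256 rows -> ceil(720/256) = 3 tiles.
int tilesNeeded(int width, int height, int bytesPerSample, int samples,
                int edramBytes) {
    int bytesPerPixel = bytesPerSample * samples;        // 32 B at 4xMSAA
    int maxRows = edramBytes / (width * bytesPerPixel);  // rows that fit
    return (height + maxRows - 1) / maxRows;             // ceiling division
}
// tilesNeeded(1280, 720, 8, 4, 10*1024*1024) -> 3
// tilesNeeded(1280, 720, 8, 2, 10*1024*1024) -> 2 (the 2xMSAA case)
```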

That one is very easy to explain! It's because I'm currently recovering from shingles and very heavily medicated at the moment :oops: You're right, it's 3 tiles for 4xmsaa: 0 to 255, 256 to 511, and the rest (height-wise). The tile calculation is still pretty simple though, just a few lines different from the 2-tile calc.
 
Either way, we typically only use VSMs for indoor lighting, so the depth range isn't really big enough for FP16 (using the -1..1 range) to be a problem, and there are ways of fighting it even when it does become an issue.

Also, my own little experiments with gradients and LoG filtering actually produce more admissible results without hardware filtering anyway.
The thing about VSM is that you need twice the precision in the depth squared term to match the interpolation granularity of the depth term. The peak value of the variance is 0.25*(caster-receiver distance)^2 for the majority of shadow edges, so you need a precision floor well below that. If you want nice shadow edges for objects separated by 1/20th of the total shadow range, you'll need 10e-4 precision. The -1...1 range of FP16 (12-13 bits) wouldn't even give you that much. So unless your scene looks good using 6-bit shadowmaps with PCF, it's unlikely that FP16 will be adequate. 16-bit fixed point is better, but still pretty limited. You really need 32-bit filtering.

Care to explain a bit more about your software filtering? Sounds neat.
 
The thing about VSM is that you need twice the precision in the depth squared term to match the interpolation granularity of the depth term. The peak value of the variance is 0.25*(caster-receiver distance)^2 for the majority of shadow edges, so you need a precision floor well below that. If you want nice shadow edges for objects separated by 1/20th of the total shadow range, you'll need 10e-4 precision. The -1...1 range of FP16 (12-13 bits) wouldn't even give you that much. So unless your scene looks good using 6-bit shadowmaps with PCF, it's unlikely that FP16 will be adequate. 16-bit fixed point is better, but still pretty limited. You really need 32-bit filtering.
Oh absolutely. And it's quite easy to come across a situation where even 64-bit fixed point isn't approaching good enough. I'm just saying we haven't hit those, FWIW. Part of that is a series of hacks we throw in to hide all the artifacts at the cost of softness. But since those are artist-controlled hacks, many of our lighting artists like it that way. Also, since I haven't touched it in a long time, and there have been changes, I wouldn't be surprised if we were doing something more on the PS3 to take advantage of the formats it does filter well enough (e.g. using 3 channels of FP16 to extend the precision).

I will say, though, that the relative performance difference and the fact that we get texel fillrate limited on Xenos and not on RSX is for the complete render. Even if the filtering passes on PS3 are slower, the difference is small enough that it doesn't counter the advantages over the rest of the scene render. Bearing in mind that we do have scenery where the average pixel has 9 texture layers (and in some cases, as high as 14). It's a case of otherwise being not limited on Xenos (e.g. when using PCF), but then finding the straw that breaks the camel's back when you throw in a pair of fullscreen filtering passes for each shadowmap.

Care to explain a bit more about your software filtering? Sounds neat.
Well, the general idea is just to model higher-order representations of the distance function. Namely looking at the first and second derivative, filtering, and using differentials as information about softening. The main thing I have to figure out (and I would if I had time to work on it) is how it changes the boundary conditions of the shadow. I had a dream a few days ago about several different ways to combine the terms, trying to think about where the falloff curves would hit zero... The concept itself works and has fewer precision issues, and there are also ways of producing *some* progressive softening as a freebie (albeit with an upper bound on how soft you can get), but on top of the as-yet-unsolved boundary problem, there are more filtering passes than "vanilla" VSMs, so performance still takes quite a hit.
 
Actually, if you feel the need to explain to me why you think my posts are off-topic, you can feel free to PM me.

I feel it's entirely on point, since the entire discussion is about PS3 development issues, in the context of timelines, budgets and realworld issues.

If one of the premiere PS3 titles was delayed and launched with significant technical issues, then it is certainly relevant to the discussion at hand.

If you want to pretend these issues don't exist, be my guest. I think that attitude only gets in the way of a meaningful discussion however...instead of discussing WHY the title had these issues, you want to debate whether or not they actually exist. I don't feel that is really up for debate, since the VAST majority of user reviews I'm seeing cite these issues. But you've made your opinion known so you can leave it at that.
Well, any frame rate issues in HS are irrelevant to what nAo and Joker are discussing; HS is not bottlenecked by RSX or the SPUs most of the time.

So any framerate problems you see aren't caused by shaders, fill rate or triangle counts, but by the gameplay systems running on the PPU.
 
So any framerate problems you see aren't caused by shaders, fill rate or triangle counts, but by the gameplay systems running on the PPU.

?!?!

What is so calculation-intensive in this game, given that you've talked about individual military unit AI on the SPUs?
 
It's not calculation-intensive, it's just processing-intensive. Gameplay tends to be branchy and have subpar cache hit rates, for example. The PPE isn't exactly the best processor out there for that kind of task, sadly. Its core clock is impressive, but IPC on gameplay code must be massively lackluster compared to a PC CPU.

Another thing to consider is that gameplay programmers tend not to be the best optimizers in the world. Otherwise, y'know, they'd tend to work on the engine instead... Sad but (often) true.
 
In future titles (some already do this) I'm sure you'll see more work being transferred to the SPUs, for this very reason. I'm guessing that for Heavenly Sword, that option came too late ... Or am I completely off the ball here?
 
Is this insider info, or are you making an assumption based on the ICE team's involvement?
In any case, what's the geometry Uncharted is pushing and how are they using spu's?

Well, one could make a very good argument that the ICE team's involvement with Edge IS proof of SPU usage in Uncharted. After all, they formed the ICE team foremost for their own needs and THEN shared a pared-down version of their tools with the dev community.
 
I recall your difficulties with fitting ( debug ) code on SPUs and proposal to rectify it for the next iteration of Cell. Is there nothing that can be done for you now? ...

My idea was to have zero LS wasted for code. Pure hardware streaming, and the hardware takes care of suspending/resuming your "thread" etc, and latency can be managed via several hyperthreads, very similar to how GPUs handle Vertex/Pixel shader streaming.

One might be able to stream code "just in time" manually, but that might be very difficult to pull off. It is easier if you partition the code like Insomniac suggested and bring snippets with your data.

What do you need out of the C++ compiler before the generated code is optimal enough for you? (or at least acceptable enough for you to use it freely in abstraction)

Well, some issues are inherent to C/C++, such as aliasing problems (exemplified by the "this" pointer), but I think compilers can definitely improve more. Some of the issues are caused by the ABI as well, for example the inability to return a Matrix class in a set of 4 registers. Touching memory on those PPC cores is a nightmare because of the rampant LHS/cache issues. On SPU it's much, much simpler really; there it's just about getting GCC to output decent code. For now even Sony admits that C with intrinsics is the way to go - the ICE team was quoted as saying they get 20x improvements compared to any vector abstraction.
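To illustrate the aliasing point: without a no-alias guarantee the compiler must assume a store through one pointer can invalidate values loaded through another (member functions have the same problem with `this`). A hypothetical sketch using the non-standard but widely supported `__restrict` qualifier:

```cpp
// Without __restrict the compiler must assume `out` may overlap `a` or
// `b`, forcing it to reload operands after every store. With the
// qualifier it can keep values in registers, software-pipeline, and
// vectorise the loop. Purely illustrative, not ICE-team code.
void addArrays(float* __restrict out, const float* __restrict a,
               const float* __restrict b, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];  // no aliasing hazard between iterations
}
```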

Insomniac Games has at least proposed SPU "shaders"; have you or any of your team members contacted them on their progress?

You feel Sony needs to provide more high-level libraries (not to suggest there aren't any); where do you think their focus should be?

I actually really like the openness that Insomniac show with their technology blogs. I work for a 3rd party developer and we very rarely get a glimpse of what the 1st party studios are doing, let alone get access to their tech.

I do believe A LOT of developers can benefit from Sony delivering a healthy mix of high-level libraries coupled with low-level access when needed. Sony DID try that with the initial OpenGL implementation, but nobody was happy because it ran poorly. Then they switched to GCM, which is a very low-level library, but unfortunately it was (last I checked) incompatible with the OpenGL layer, hence fracturing development efforts and forcing studios to choose which way to go very early on, possibly before they even knew what they needed.
 