(nAo, I was more interested in continuing our discussion of using AA to increase shadow map speed/resolution, so if you reply to my last post on that subject I'd really appreciate it.)
What I wrote many times on this forum is that simply throwing more triangles at the problem, with the ratio between primitives and pixels we have now, doesn't really look like the most sensible thing to do, especially given the quad-based architectures we have now.
...
I'd be happy to throw all those subpixel polys where they're needed, not just anywhere, thank you
...
So, one last time, my statement is: We really don't need more geometry than this IF WE COULD distribute it in a clever way.
I understand that, but there's no escaping the nature of 3D workloads. Consider a scene where visible edges are only 10 pixels long, which means you still have room for improvement in high-frequency detail. The triangles near these edges will have very few pixels since they're so angled, but the ones viewed head on are pretty big from a quad point of view. This is why higher polycounts near silhouettes (i.e. your intelligent distribution idea) aren't going to achieve much better quad efficiency than higher polycounts everywhere, as the triangles at the edge are the ones that really hurt efficiency in the first place. I'll admit that culling/clipping gets more efficient, but it seems you're focusing on quad efficiency in this post.
I'm not saying increase poly count for the heck of it. There's a point where smooth surfaces are smooth enough, and for R&C type art we don't need high tessellation. However, any details that affect visibility have no other solution (except alpha testing in limited circumstances), particularly if you want to avoid aliasing. It's inefficient for quad-based rendering, but there's really no other choice.
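To put rough numbers on the quad argument, here is a minimal sketch of a toy rasterizer. It assumes shading is issued in aligned 2x2 pixel quads and that a pixel counts as covered when its center is inside the triangle. The function names, the 64x64 grid and the two example triangles are made up for illustration and are not taken from any real hardware.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when (px, py) lies to the left of edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def quad_efficiency(tri, width=64, height=64):
    """Return (covered_pixels, touched_quads, efficiency) for one triangle.

    tri is three (x, y) vertices in pixel units, either winding.
    efficiency = covered pixels / (4 * touched 2x2 quads), i.e. the fraction
    of shaded lanes doing useful work under the assumptions above.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return 0, 0, 0.0  # degenerate triangle covers nothing
    sign = 1 if area > 0 else -1  # normalise winding

    pixels = 0
    quads = set()
    for y in range(height):
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5  # sample at the pixel center
            inside = (
                sign * edge(x0, y0, x1, y1, cx, cy) >= 0
                and sign * edge(x1, y1, x2, y2, cx, cy) >= 0
                and sign * edge(x2, y2, x0, y0, cx, cy) >= 0
            )
            if inside:
                pixels += 1
                quads.add((x // 2, y // 2))  # aligned 2x2 quad that owns this pixel

    if not quads:
        return 0, 0, 0.0
    return pixels, len(quads), pixels / (4 * len(quads))


if __name__ == "__main__":
    head_on = [(0, 0), (32, 0), (0, 32)]    # large triangle facing the viewer
    sliver = [(0, 0), (32, 0), (32, 1.5)]   # same edge length, nearly edge-on
    for name, tri in (("head-on", head_on), ("sliver", sliver)):
        print(name, quad_efficiency(tri))
```

The head-on triangle keeps most lanes in each quad busy, while the sliver wastes roughly half of them, and it is the sliver case that sits at silhouettes no matter how the polygons are distributed.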
Sorry if I remind you again about this, but you were not believing me even when I was telling you that decoupling shadowing computations from other shading operations was a big win due to the current quad-based architectures in very low pixel/primitive scenarios. Current architectures are already very inefficient, too bad I can't quote numbers.
Yeah, it took a while for that to sink in. :smile: After all, doing more shader ops per pixel to save on pixel load is a little hard to swallow at first.
In light of antialiasing, though, it's still questionable whether that's a good way to do things in general (by that I mean deferring computations to preserve quad-level efficiency). Sure, for N sample PCF you can distribute the shader load across the samples like in KZ2. But most shaders (including VSM) can't do that, and if you start looking at which samples are equal for selective supersampling, you're back to square one wrt efficiency.
Being able to perform a shader op once and copy the result to all samples in the quad affected by the current polygon is a good route to efficiency. Trying to be clever and increasing parallelism through more complicated shaders is not the way to go IMHO.
We heard AA was free..blending was free, 95% efficiency, etc.. (as we heard on RSX about amazing 128 bit HDR and crap like that..)
So blending isn't free? Aside from imperfect separation during tiling and the additional quads, AA isn't free? That's news to me.
EDIT: Oh, you're talking about blending FP10/I16, aren't you. Yeah, that's a shame...
We heard about RSX having half or a quarter vertex shading perf of Xenos, well...as I already said I think there are already 2 or 3 games on the shelves that kind of disprove these statements, but what do I know?
Well that's absolutely true, but vertex shading is rarely the bottleneck now, is it...
Maybe you're talking about triangle setup. But Joker didn't measure that.
Isn't that the real bottleneck most of the time? If he has 10M verts per frame counted the way that you're describing, that likely means ~10M tris/frame, right?
BTW, I'm curious about how fast RSX can cull/clip vertices. If we know peak setup is 250Mtri/s in the simplest case, why can't someone tell what the culling/clipping rate is? I originally assumed it was the same because many people (including myself) consider culling/clipping to be part of setup.
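For what it's worth, here is a back-of-the-envelope check of those two figures. It assumes every counted vertex turns into a triangle that reaches setup, and that setup really runs at the 250Mtri/s peak. The 30 and 60 fps targets and the function names are my own assumptions for illustration, and culling/clipping cost is not modelled at all since that rate is exactly what's unknown.

```python
PEAK_SETUP_TRIS_PER_SEC = 250e6  # peak setup rate quoted above (simplest case)


def tris_per_frame_budget(fps, peak=PEAK_SETUP_TRIS_PER_SEC):
    """Most triangles per frame that setup could process at peak rate."""
    return peak / fps


def setup_time_ms(tris_per_frame, peak=PEAK_SETUP_TRIS_PER_SEC):
    """Milliseconds spent in setup alone for the given triangle count."""
    return tris_per_frame / peak * 1000.0


if __name__ == "__main__":
    for fps in (30, 60):  # assumed frame rate targets
        print(fps, "fps budget:", round(tris_per_frame_budget(fps)), "tris/frame")
    print("10M tris/frame:", setup_time_ms(10e6), "ms of setup at peak")
```

Under those assumptions, ~10M tris/frame would not fit in a 30 fps frame on setup alone, which is why the way the verts are counted matters so much here.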
I'd like to see some games doing SAT-VSM on PS3 via SPUs
Wouldn't 16 texture fetches per pixel be rather ugly for an SPU?