SM 3.0, yet again.

Xmas said:
DiGuru said:
Well, the last time we discussed this, we came to the conclusion that the dynamic branching of the NV6x00 is essentially your first method, with batches of about 1,000 pixels each. Only if all pixels in that area take the same path, and this can be determined by the driver, are the instructions skipped. Which doesn't sound very dynamic to me.
The driver couldn't do such a fine-grained decision in the shader pipeline. The hardware decides whether to run one or both branches.

Ok. But Tridam's results showed that the batches are expanded to cover the whole area that uses the shader if some pixels take the other branch, while there is still a penalty of 9 clocks for each branch instruction.

So, wouldn't it be better to use a single (linear) shader or multipassing if branches are actually taken? And if they aren't, why not use a different shader for that frame? That would save you at least 9 clocks per pixel.

I'm still trying to understand in what cases that would be useful.
 
Tridam said:
The size of the batch is huge, at least with current drivers (under 1024 quads, both branches are always computed). It seems to be at least 1024 quads, but appears to expand to the number of quads that take the same branch.
Batch size is only increased if more quads take the same branch.
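
To make that behaviour concrete, here is a toy C model of what Tridam's numbers describe. The 1024-quad batch size and the 9-clock branch overhead come from his tests; the cost function itself is just an illustration, not a claim about how the hardware is actually wired:

Code:
#include <stdio.h>

#define BATCH_QUADS 1024   /* minimum batch size per Tridam's tests  */
#define BRANCH_COST 9      /* clocks charged per branch instruction  */

/* Clocks a batch spends on one if/else in this toy model: a side is
   only skipped when every quad in the batch agrees on the condition;
   a divergent batch pays for both sides. */
int batch_branch_cost(const int takes_if[BATCH_QUADS],
                      int cost_if, int cost_else)
{
    int any_if = 0, any_else = 0;
    for (int i = 0; i < BATCH_QUADS; i++) {
        if (takes_if[i]) any_if = 1;
        else             any_else = 1;
    }
    if (any_if && any_else)           /* divergent: run both sides   */
        return BRANCH_COST + cost_if + cost_else;
    return BRANCH_COST + (any_if ? cost_if : cost_else);
}

int main(void)
{
    int coherent[BATCH_QUADS]  = {0};  /* every quad takes 'else'    */
    int divergent[BATCH_QUADS] = {0};
    divergent[0] = 1;                  /* a single quad disagrees    */
    printf("coherent batch:  %d clocks\n",
           batch_branch_cost(coherent, 20, 10));   /* 9 + 10 = 19   */
    printf("divergent batch: %d clocks\n",
           batch_branch_cost(divergent, 20, 10));  /* 9+20+10 = 39  */
    return 0;
}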
 
And then there's the demo that 99160 just posted that shows a dramatic performance improvement by implementing dynamic branching.

So it's clear that it is possible to extract a performance improvement through dynamic branching vs. the "compute all paths and choose the correct result" approach.

And it is further clear that any multipass technique one chooses to use will require passing the geometry to the video card multiple times, and thus will not run nearly as fast as either of the above solutions in a geometry-limited case.
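
For reference, here is roughly what the two shading approaches being compared look like in scalar C. shade_near and shade_far are hypothetical stand-ins for two expensive shading paths:

Code:
/* Hypothetical stand-ins for two expensive shading paths. */
static float shade_near(float x) { return x * 0.9f; }
static float shade_far(float x)  { return x * 0.1f; }

/* "Compute all paths and choose": both paths are always paid for,
   which is effectively what a divergent batch does. */
float flattened(float x, int is_near)
{
    float a = shade_near(x);   /* always evaluated */
    float b = shade_far(x);    /* always evaluated */
    return is_near ? a : b;
}

/* True dynamic branch: only the taken path executes, so coherent
   regions of the screen skip the other path's work entirely. */
float branched(float x, int is_near)
{
    return is_near ? shade_near(x) : shade_far(x);
}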
 
Given how often things are geometry limited in the first place (not particularly frequently!), it doesn't seem like there are that many occasions that are going to be geometry limited if the rendering requirements are asking for an extra pass.
 
DaveBaumann said:
Given how often things are geometry limited in the first place (not particularly frequently!), it doesn't seem like there are that many occasions that are going to be geometry limited if the rendering requirements are asking for an extra pass.

And Dave, the perennial tease, has changed his sig again. :p
 
DaveBaumann said:
Given how often things are geometry limited in the first place (not particularly frequently!), it doesn't seem like there are that many occasions that are going to be geometry limited if the rendering requirements are asking for an extra pass.
Except that extra passes increase the geometry limitations, since each pass is inherently shorter than one single pass.
 
Yes, but you must have reached some other limitations to require that extra pass, and in many circumstances those are likely to be more limiting than a geometry pass.
 
Like what? And why?

After all, here's a specific instance where you could get geometry-limited through multipassing very quickly:

Imagine Humus' stencil-based dynamic branching multipass demo that he did some time ago. The basic idea is that you don't render a pixel if it is more than some distance away from the light source. This rendering is done in two passes per light source (1. check for visibility; 2. render visible pixels).

Now, in this situation, if you are even getting close to geometry-limited in, say, a 4-light scene, this technique multiplies the required geometry rendered by a factor of eight, for approximately the same total pixel processing (as true dynamic branching), and therefore is rather likely to reduce performance.
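
(For concreteness, here is a rough OpenGL sketch of those two passes; bind_range_shader, bind_lighting_shader, and draw_scene are hypothetical stand-ins for the application's own code.)

Code:
#include <GL/gl.h>

/* Hypothetical application helpers. */
void bind_range_shader(int light);     /* discards out-of-range pixels */
void bind_lighting_shader(int light);  /* the expensive lighting path  */
void draw_scene(void);

/* Two passes per light, so a 4-light scene submits the geometry
   eight times instead of once. */
void light_pass(int light)
{
    /* Pass 1: visibility. A cheap shader kills fragments beyond the
       light's range; survivors write 1 into the stencil buffer. */
    glClear(GL_STENCIL_BUFFER_BIT);
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    bind_range_shader(light);
    draw_scene();

    /* Pass 2: lighting. The expensive shader runs only where the
       stencil was set; early stencil rejection skips the rest. */
    glStencilFunc(GL_EQUAL, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);       /* accumulate lights additively */
    bind_lighting_shader(light);
    draw_scene();
    glDisable(GL_BLEND);
    glDisable(GL_STENCIL_TEST);
}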
 
It's not just extra geometry; it's bandwidth as well in many cases, since the later passes may have to use stencil/Z-test or blending to combine results.
 
Has anyone done any more testing in the last few months? I would really like to see the results of those tests. That might be better than speculating.
 
Yep, full reverse spin, I bet. Once ATI has SM3.0, I think all the SM3.0 naysaying will disappear, and all of a sudden a whole crop of "only possible with SM3.0" scenarios will appear. People who in the past were exclaiming that there was no big deal between SM2.0b and SM3.0 will suddenly be at the head of the bandwagon, especially if ATI's implementation performs better. For example, if ATI's dynamic branching performs better, then dynamic branching support will suddenly be an Achilles' heel, despite the fact that previously it wasn't, and the real-life scenarios where it was used were few and far between. Now, such support will be seen as *crucial*.

:popcorn mode engaged:

(who remembers how horrible it was to waste *two* MB slots and how terrible it was not to support small form factor pcs, until.....)
 
I know I won't, at least not unless ATI's solution manages to show something actually worthwhile pertaining to SM3's use, which nvidia certainly hasn't done yet.
 
ANova said:
I know I won't, at least not unless ATI's solution manages to show something actually worthwhile pertaining to SM3's use, which nvidia certainly hasn't done yet.

That is exactly what DemoCoder was talking about.

:)
 
Chalnoth said:
Like what? And why?

After all, here's a specific instance where you could get geometry-limited through multipassing very quickly:

Imagine Humus' stencil-based dynamic branching multipass demo that he did some time ago. The basic idea is that you don't render a pixel if it is more than some distance away from the light source. This rendering is done in two passes per light source (1. check for visibility; 2. render visible pixels).

Now, in this situation, if you are even getting close to geometry-limited in, say, a 4-light scene, this technique multiplies the required geometry rendered by a factor of eight, for approximately the same total pixel processing (as true dynamic branching), and therefore is rather likely to reduce performance.

A factor of 8 is quite a stretch. If you're limited by vertex fetch, then you can in some cases get near that. In normal situations, where the shader is of decent length and you're more limited by the vertex shader, you won't get anywhere near a factor of 8.

First of all, the visibility pass is very cheap; it's more or less just a transform. Compared to the lighting shader it's short. In my demo it's maybe half the instructions of the lighting shader, and for more advanced lighting the relative cost of the visibility shader goes down even further.

Also, it's not like you can pop my lighting vertex shader right into a ps3.0 dynamic branching case. The vertex shader needed for that case of course gets larger, since you still have to do the computations for all lights and pass them to the fragment shader. So it's not cutting the workload by 75%, but rather closer to 30%-50% or so. Realistically, we're not talking about a factor of 8 but something in the range of 2-3.
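
Putting rough numbers on that argument: the half-price visibility pass and the four lights are from the demo above; the 2x-3x size of the single-pass vertex shader is an assumption based on it having to compute interpolants for all lights.

Code:
#include <stdio.h>

int main(void)
{
    /* Costs in units of one single-light lighting-pass vertex shader. */
    float lighting_vs   = 1.0f;
    float visibility_vs = 0.5f;  /* cheap: more or less just transform */
    int   lights        = 4;

    /* Stencil multipass: 2 passes per light = 8 geometry submissions,
       but the visibility passes are half-price.  Total = 6.0. */
    float multipass = lights * (visibility_vs + lighting_vs);

    /* Assumed: a single ps3.0 pass needs a vertex shader 2x-3x the
       size of a one-light vertex shader (all lights in one pass). */
    printf("vertex work ratio: %.1fx to %.1fx (not 8x)\n",
           multipass / 3.0f, multipass / 2.0f);
    return 0;
}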
 
Dave Orton in May of 2004 said:
I think the main feature that people are looking at is the 3.0 shader model, and I think that's a valid question. What we felt was that, in order to really appeal to the developers who are shipping volume games in '04, Shader 2.0 would be the volume shader model of use. We do think it will be important down the road.

I think ATI has been pretty consistent in their message... it's the hallelujah chorus that at times has been off message.
 