Official clarification about the PS & VS in NV30!

alexsok

Regular
http://www.cgshaders.org/forums/viewtopic.php?t=236

One of the NVIDIA guys replied to me:

To reiterate:

vertex processing: 256 instr with data-dependent branching and subroutines

fragment processing: 1024 instr without data-dependent branching and subroutines


And let's all try to be civil, please.

Thanks -
Cass

So there u go! :D

P.S.
I can't believe that I went through the other Siggraph 2002 paper released by Nvidia about the architecture and never noticed the following:

Vertex processor has high resource limits
256 instructions per program (effectively much higher w/branching)
 
Here is something interesting as well:

http://www.nvmax.com/cgi-bin/community.pl?num=1027942595

I have received many emails and comments about the NV30 article we posted recently and I would like to assure all readers that the information posted is not speculation; we followed NVIDIA's guidelines 'to a T' and did not post any information that would violate the NDA we signed to get it.

I tried registering to post at the Beyond 3D forums to validate our claim that NVIDIA's new 'whoop ass' shading engine can indeed handle 1024 static instructions per operation, although we can not confirm the amount of operations per second.

The following came directly from NVIDIA and should therefore be absolutely accurate. As it happens I would bet on it, as a 'Cinematic' GPU will require a powerful shading system to create some of the images that were posted at nvnews. Without being biased and without releasing performance-related data, I can only say that this is going to be one hell of a GPU, given that we have some performance comparison graphs here.

NVIDIA says... "Vertex Shaders beyond DirectX 9 - up to 1024 static instructions, with up to 65536 instructions executed in loops, branches and subroutines."

"Pixel Shaders beyond DirectX 9 - up to 1024 instructions. Long programs for both pixel shading and vertex shading."

UPDATE: Cass @ NV says it was an error in the document we received and in fact it is 256 static instructions, where 1 static instruction contains 64 sub-instructions. I shall correct the article to reflect this.
 
So basically you're saying that there are up to 256 instructions that can be branches or other conditional/flow-control ops, each of which can choose to execute an additional 64 non-conditional vertex operations?

Wouldn't that give a total of 64*256 (16384) instructions in the vertex shader?

Am I missing something here?

Just seems to me that NVidia would be advertising the larger number if this were true.
 
ERP said:
So basically you're saying that there are up to 256 instructions that can be branches or other conditional/flow-control ops, each of which can choose to execute an additional 64 non-conditional vertex operations?

Wouldn't that give a total of 64*256 (16384) instructions in the vertex shader?

Am I missing something here?

Just seems to me that NVidia would be advertising the larger number if this were true.

I'm not qualified enough to give you a decent answer on this one, but I think you are correct; that number is probably with branching, as mentioned in that other paper Nvidia released.

Please correct me if I'm wrong (which I'm sure I am! :) )
 
Again, from the same thread, a great reply explaining many things:

1) i think yes, the main reason is what i told you. there isn't much use for vertex processing in shading anymore. transforming takes 4 instructions. most of the shading capability now lies in the pixel shader, which makes much more sense. the whole lighting equation now works in the pixel shader, no vertex shader needed anymore.
the only use for the vertex function now is transforming (its main function). and by transforming i mean skinning, blending, etc. for animating the mesh. that needs real branching, but with branching and looping it doesn't need many instructions (again, 4 instructions for the actual transformation :))

2) what would you need the 1024 instructions in vertex shading for? for the rest, see 1)

3) again, i don't really understand your wording (just english problems ;)). i'll try to answer anyway what i think i understand: programs like 3dsmax and languages like renderman are all great at one thing: realistic materials. and how are they done? procedurally, 99% of the time. they use noise, complex math functions, etc. to shade their stuff so it looks like brick walls, wood, water surfaces, metal, glass, etc. these are REALLY complex but are needed for realistic, stunning visuals, and implementing them takes many instructions. they provide that many instructions to make even offline rendering attractive on the gpu. i don't think looking up 1024 textures with anisotropic trilinear filtering on a 512x512x512 3d texture works at 100fps yet, but simply having that power available gives renderers, for example the one in 3dsmax, the possibility to expose ANY shader they want.

and if that's not enough, most real materials are actually two materials, or more.
take wood. most wood has a thin lacquer over it, which happens to act like glass. take a pc screen. you actually have 2 or 3 layers of glass in front of the actual image. cars have two-layered materials, even skin does. exposing them takes many instructions, as you need to do complex lighting equations, complex material equations and combinations of them..

oh, and then you want those materials even animated?.. even more instructions;)

other possibilities: you can supersample, which gives you the possibility to scatter through clouds and do the whole lighting equation of full 3d clouds on a simple 2d texture; you update that texture and then bind it for the actual image (you don't need to update it that often..). you can render real 3d textures volumetrically with the supersampling..
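
The "4 instructions" figure in point 1 presumably refers to multiplying a 4-component vertex position by a 4x4 matrix, one four-component dot product per output component. A minimal Python sketch of that idea follows; the names `dp4` and `transform` are made up for illustration and are not taken from any NVIDIA document.

```python
def dp4(a, b):
    """Four-component dot product (one 'instruction' in this illustration)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def transform(matrix_rows, vertex):
    """Transform a 4-component vertex with four dot products, one per matrix row."""
    return [dp4(row, vertex) for row in matrix_rows]


# Hypothetical example data: a matrix that translates by (2, 3, 4).
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
moved = transform(translate, [1, 1, 1, 1])
```

Skinning or blending would repeat this per bone and weight the results, which is where loops and branches start to pay off.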
 
wouldn't that give a total of 64*256 (16384) instructions in the vertex shader?

The 64 is probably referring to the length of subroutines and loops.

Thus, a while (true) { do 64 things } is completely valid; however, the program will stop execution after 65k instructions so as to not cause your computer to hang.
 
gking said:
wouldn't that give a total of 64*256 (16384) instructions in the vertex shader?

The 64 is probably referring to the length of subroutines and loops.

Thus, a while (true) { do 64 things } is completely valid; however, the program will stop execution after 65k instructions so as to not cause your computer to hang.

Taken from Nvidia's paper:

Provides greater flow control: dynamic loops and branches provide for forward and
backward changes in flow; call and return functions have been introduced, and
vertex processing can also invoke an early exit on program termination.

That's basically what u meant, right?
 
I think I get it now....

Your vertex shader can be made of up to 256 instructions.

Taking looping and routine calls into account, the total number of instructions that the vertex shader hardware can execute is 16384.

Sort of strange if you ask me.

-Colourless
 
alexsok said:
Taken from Nvidia's paper:

Provides greater flow control: dynamic loops and branches provide for forward and
backward changes in flow; call and return functions have been introduced, and
vertex processing can also invoke an early exit on program termination.

That's basically what u meant, right?

To me that means that you can terminate a shader early if a condition arises. For example: you're going round a loop which is slowly converging on an answer. Part of your loop checks to see if the answer changed by more than 0.01% this time round the loop. If it didn't, you assume that you've finished converging and don't bother with the rest of the iterations. You "exit early".
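
A rough sketch of that early-exit pattern, in Python rather than shader code. It uses Newton's method for a square root only as a convenient converging loop; the function name `converge_sqrt`, the iteration cap of 64 and the choice of algorithm are assumptions for illustration, and only the 0.01% threshold comes from the post. It assumes x > 0.

```python
def converge_sqrt(x, max_iters=64, rel_tol=1e-4):
    """Approximate sqrt(x) for x > 0, leaving the loop early once converged.

    rel_tol=1e-4 corresponds to the "changed by more than 0.01%" check.
    Returns (estimate, iterations_used).
    """
    guess = x if x > 1.0 else 1.0
    for i in range(1, max_iters + 1):
        new_guess = 0.5 * (guess + x / guess)
        # Early exit: the answer barely moved, so skip the remaining iterations.
        if abs(new_guess - guess) <= rel_tol * abs(new_guess):
            return new_guess, i
        guess = new_guess
    # Fell through: used the whole iteration allowance without converging.
    return guess, max_iters


estimate, used = converge_sqrt(2.0)
```

The point is only that the number of iterations actually executed depends on the data, which is what data-dependent branching buys you.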

The 65K limit is more likely there to make sure that a program can't be loaded that will hang the processor.

e.g.:

while (true) {
flibble;
}

will never exit. Oh dear, your graphics chip won't respond to anything else. Sounds like NV-30 will abort such a program after 65K cycles.
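
A toy model of that kind of instruction budget, in Python. The 65,536 cap is the figure quoted earlier in the thread; the function name `run_with_budget` and the rule of stopping before an iteration that would overrun the cap are assumptions made for the sketch, not documented NV30 behaviour.

```python
MAX_EXECUTED = 65536  # figure quoted in the thread; treated here as a hard cap


def run_with_budget(loop_body_len, budget=MAX_EXECUTED):
    """Model `while (true) { <loop_body_len instructions> }` under a budget.

    Returns (instructions_executed, completed_iterations).
    """
    if loop_body_len <= 0:
        raise ValueError("loop body must contain at least one instruction")
    executed = 0
    iterations = 0
    while True:
        # Assumed rule: abort before starting an iteration that would overrun.
        if executed + loop_body_len > budget:
            break
        executed += loop_body_len
        iterations += 1
    return executed, iterations


result_64 = run_with_budget(64)
```

Under these assumptions a 64-instruction loop body gets 1024 full passes before the program is cut off, so the endless loop still terminates.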
 
fragment processing: 1024 instr without data-dependent branching and subroutines

So does this mean that NV30's pixel shader won't have data-dependent branching? Or does it just mean data-dependent branching will consume instruction slots unlike the vertex shader? This statement isn't very clear at all...
 
There is no dynamic branching, per se, in the pixel shader. Apparently what you can do is instead execute all branches (which makes loops much harder to do), and at the end of processing select which branch gets to output to the framebuffer.

So, there is effective dynamic branching, but it's not as flexible, and you don't have as many instructions to play around with.
 
So does this mean that NV30's pixel shader won't have data-dependent branching? Or does it just mean data-dependent branching will consume instruction slots unlike the vertex shader? This statement isn't very clear at all...

In many cases, if you write your shader using Cg, you won't really notice this limitation.

If, for example, you have a statement like this:

if ( myVector > myOtherVector) { doSomething; }
else { doSomethingElse; }

The compiler will use per-component write masking to execute this, and the results will be what you expect. However, both branches will always be executed (even if all 4 conditions are true, or all 4 conditions are false). If doSomething and doSomethingElse are long statements, then the amount of time it takes to execute a shader will noticeably increase.
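
To illustrate the idea (not the actual Cg compiler output), here is a small Python sketch of per-component selection: both branches are always computed and a mask picks the result for each component. The helper name `select_per_component` and the example branch bodies are hypothetical.

```python
def select_per_component(a, b, then_fn, else_fn):
    """Emulate `if (a > b) {...} else {...}` on 4-component vectors without branching."""
    mask = [x > y for x, y in zip(a, b)]   # one condition per component
    then_result = then_fn(a, b)            # always evaluated
    else_result = else_fn(a, b)            # always evaluated
    # The mask decides, per component, which result is kept.
    return [t if m else e for m, t, e in zip(mask, then_result, else_result)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


out = select_per_component([1, 5, 3, 0], [2, 4, 3, 1], add, sub)
```

Since `add` and `sub` both run on every call, the cost is the sum of the two branches, which matches the warning above about long branch bodies.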

Recursion and data-dependent loops are not supported.
 