Is there any evidence to show NV40 can handle PS3.0 Well?

From what I read, they [3dlabs] stated they [P20] had PS 3.0 support but only VS 2.0 support.

EDITED: to clarify that I was talking about the 3DLabs card, which someone had asked about earlier...
 
With the exception of dynamic branching, the answer is yes, since PS3.0 and PS2.0 are very similar. With respect to dynamic branching, the answer is: it depends on what you are doing.
 
The Baron said:
Do we have any evidence that it handles it worse than any other PS3.0 cards? (Where are those Series 5 Shadermark benchmarks, damnit!)

I suspect the original question is more along the lines of "Is there any evidence PS3.0 is USABLE on this card, or is it the nv3x PS 2.0 equivalent?" Having another card to compare to is irrelevant: 10 FPS is crap regardless of whether everyone else is doing 5 or 50.

What it boils down to... is this what FP32 was to the nv3x, a feature beyond ATI, or a checkbox feature in there so they look good? It's a good question, considering how many people on boards are already saying that PS3.0 support could determine their product purchase.
 
ps3.0 doesn't add much on top of ps2.0.

1) much longer shaders. of course they run slower. a shader twice as long will take about twice as long to execute
2) some new functions like the derivative ops. they are for image quality only
3) some other new functions like the normalisation ops done directly in one instruction. they can give some performance gain
4) the dynamic branching stuff. that is the big feature, it makes the whole thing programmable the way normal programs work. this can give a gain, but it mostly gives FREEDOM.

look at nalu. that's one mesh. it can be done in one draw call (except the hair, i guess). this means we are finally back where we were years ago, when one mesh had one texture. now one mesh can have one shader, and one render call again. this gives quite some features (combined with the batch-drawing shown in the space/asteroid demo, this means nalu-style characters in the thousands, at the same shading quality, that is).

but otherwise, ps3.0 is nothing new the way ps2.0 was. it adds on top of ps2.0 and uses the same hardware. this means that if ps2.0 works well, ps3.0 will, too. the only thing that can differ in performance is the branching. but branching will only be useful for big jumps anyway, and that means quite long shaders. there, it should not be slow at all.

anything i've forgotten?


oh, and.. wait for nv50, of course.. or better yet nv60 directly, as they are starting work on it now, and will of course remove all the bottlenecks and bugs of nv40 AND nv50!!!! BOAH, that has to be the perfect card.. *cough*
 
DarN said:
Yes, but does it handle it Well? ;)

The answer is "No", at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly, so costly that it looks like the drivers are buggy: 9 pipeline passes for the simplest branch I could think of:

Code:
ps_3_0

dcl vFace

def c0, 0, 0, 0, 0
def c1, 1, 0, 0, 0
def c2, 0, 1, 0, 0


if_ge vFace, c0.x
  mov oC0, c1
else
  mov oC0, c2
endif


Caps with DX9.0c :
[Image: IMG0007683.gif]
 
Tridam said:
DarN said:
Yes, but does it handle it Well? ;)

The answer is "No" at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly. So costly that it looks like the drivers are buggy. 9 pipeline passes for the simplest branching I was able to think about :

Try branching on something besides the face register, like a Vn input register. 9 is not the minimum penalty.
 
Tridam said:
The answer is "No" at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly. So costly that it looks like the drivers are buggy. 9 pipeline passes for the simplest branching I was able to think about :
Have you tried the 65.04s?
 
DarN said:
Tridam said:
The answer is "No" at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly. So costly that it looks like the drivers are buggy. 9 pipeline passes for the simplest branching I was able to think about :
Have You tried the 65.04's?

Not yet.
 
The answer is "No" at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly. So costly that it looks like the drivers are buggy. 9 pipeline passes for the simplest branching I was able to think about :

Code:
ps_3_0

dcl vFace

def c0, 0, 0, 0, 0
def c1, 1, 0, 0, 0
def c2, 0, 1, 0, 0


if_ge vFace, c0.x
mov oC0, c1
else
mov oC0, c2
endif

from 3dc here: http://www.forum-3dcenter.org/vbull...readid=137570&perpage=20&pagenumber=7

sowas soll man ja laut nVidia eigentlich auch mit Predication machen. Zudem steht im SDK das man vFace nur mit >0 bzw <0 vergleichen soll. Also if_gt oder if_lt. Ein if_ge ist Blödsinn an dieser Stelle. Ich vermute aber mal stark das hat der HLSL Compiler da hingesetzt. Bei mir macht er öfter auch solche Scherze.

rough translation:
According to nVidia, something like this should really be done with predication anyway. Furthermore, the SDK states that vFace should only be compared with >0 or <0, i.e. with if_gt or if_lt. An if_ge makes no sense here. I strongly suspect the HLSL compiler put it there; it often plays tricks like that on me.
 
christoph said:
The answer is "No" at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly. So costly that it looks like the drivers are buggy. 9 pipeline passes for the simplest branching I was able to think about :

Code:
ps_3_0

dcl vFace

def c0, 0, 0, 0, 0
def c1, 1, 0, 0, 0
def c2, 0, 1, 0, 0


if_ge vFace, c0.x
mov oC0, c1
else
mov oC0, c2
endif

from 3dc here: http://www.forum-3dcenter.org/vbull...readid=137570&perpage=20&pagenumber=7

sowas soll man ja laut nVidia eigentlich auch mit Predication machen. Zudem steht im SDK das man vFace nur mit >0 bzw <0 vergleichen soll. Also if_gt oder if_lt. Ein if_ge ist Blödsinn an dieser Stelle. Ich vermute aber mal stark das hat der HLSL Compiler da hingesetzt. Bei mir macht er öfter auch solche Scherze.

rough translation:
according to nvidia you should use predication here. further the sdk states
you should only compare vface with >0 or <0, ie if_gt or if_lt. an if_ge doesnt make sense here. i stongly assume thats a failure of the hlsl compiler.

That's my own mistake from writing this up yesterday ;) I tested with _gt for the review. Because it was slower than expected, I also tested with _ge to see if it made a difference, but the result was the same with gt and with ge.

The test was made to look at the cost of branching. It wasn't made to show a real use of branching.
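
To put a rough number on "it depends", here is a back-of-the-envelope sketch in Python (not shader code). It assumes a fixed branch overhead, using the 9 passes measured above purely as an example figure, and compares a real branch against the predication-style alternative of executing both sides and keeping one result. The function names, the one-pass-per-instruction simplification, and the idea that each pixel branches independently are all assumptions made for illustration; real hardware works on groups of pixels and will behave differently.

```python
def predicated_cost(then_len, else_len):
    """Passes when both sides always execute and one result is kept."""
    return then_len + else_len


def branched_cost(then_len, else_len, p_then, overhead=9):
    """Expected passes when only the taken side runs, plus a fixed overhead.

    overhead=9 is the figure measured in this thread, used only as an example.
    """
    return overhead + p_then * then_len + (1.0 - p_then) * else_len


def branch_pays_off(then_len, else_len, p_then, overhead=9):
    """True when the real branch is expected to be cheaper than predication."""
    return branched_cost(then_len, else_len, p_then, overhead) < predicated_cost(
        then_len, else_len
    )


if __name__ == "__main__":
    for length in (1, 5, 20, 100):
        b = branched_cost(length, length, 0.5)
        p = predicated_cost(length, length)
        print(f"{length:>3} instr per side: branched ~{b:.1f}, predicated {p}")
```

Under these assumptions, a one-instruction if/else like the test shader is far cheaper when predicated, while sides longer than about nine instructions start to favour the branch. That is in line with the earlier remark that branching is mostly useful for big jumps.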
 
Curious. If those caps for NV40 are real, then Microsoft has relaxed one of the requirements for PS3.0, in particular the MaxPShaderInstructionsExecuted value. This is what's in the 9.0b SDK:

The MaxPShaderInstructionsExecuted cap in D3DCAPS9 should be at least 2^16.

Now, with 9.0c, it would appear that this has been reduced to 2^10 (1,024 instead of 65,536).
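
As a rough illustration of why that cap matters, the Python sketch below estimates how many loop iterations fit inside an executed-instruction budget. It assumes every instruction in the loop body counts once per iteration against the cap; the function name and the 32-instruction loop body are hypothetical, chosen only to make the arithmetic concrete.

```python
def max_loop_iterations(executed_cap, loop_body_len, fixed_overhead=0):
    """Whole iterations of a loop body that fit in an executed-instruction cap.

    fixed_overhead is the number of instructions executed outside the loop.
    """
    if loop_body_len <= 0:
        raise ValueError("loop_body_len must be positive")
    return max(0, (executed_cap - fixed_overhead) // loop_body_len)


if __name__ == "__main__":
    body = 32  # hypothetical loop body length
    for cap in (2**16, 2**10):
        print(f"cap {cap}: up to {max_loop_iterations(cap, body)} iterations")
```

With a 32-instruction body, the 2^16 budget allows 2,048 iterations and the 2^10 budget only 32, so the lower value would bite long looping shaders first.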
 
Yes, that's strange. I will check this again with newer drivers. That's probably a limitation of the 60.72 drivers.

BTW, MS has introduced some limitations with DX9.0c. For example: no dependent texture reads (or ddx/ddy) inside a branch.
 
DarN said:
Yes, but does it handle it Well? ;)


This thread pierces to the heart of the matter, I think. Those who say, "Yes, it does," are certainly being less than honest, because if they were honest they'd say, "I do not know if it does, since no one's done any investigative testing with ps3.0 code on an nV40 to date, but I assume it will handle it well, since it seems to handle ps2.0 fairly well from the available test data so far, and that suggests a probability of being able to do ps3.0 well."

Of course, using the ps2.0 API path in a 3rd-party mod for Far Cry, which will also run under ps3.0, does not in any way answer this question, especially considering that nVidia used purported visual differences between screenshots to illustrate a difference in render quality between 2.0 and 3.0, when in fact they demonstrated a render-quality difference between ps1.x and ps2.0 instead (according to the Far Cry developers). That deliberate prevarication, in my opinion, adds extra emphasis to the question of just what nV40 really does with ps3.0.

I've noted several times that I think it unusual that nVidia would take this particular route (3rd-party Far Cry PS2.0 mod) to promote ps3.0 support in nV40, especially if nVidia considered that support to be an important distinction between ps2.x and ps3.0 support generally. Rather, I would have expected nVidia to have released several in-house demos along with the nV40 launch to illustrate the differences (as opposed to a 3rd-party mod for a game that actually demonstrates rendering differences between ps2.0 and ps1.x.) Since nVidia had nothing of this nature to offer upon the launch of the nV40 product, my own personal conclusion is that nVidia sees no tangible differences between ps3.0 and ps2.0 rendering quality, at least differences made possible with nV40, and is attempting to push nV40 ps3.0 support exclusively as a marketing tactic.
 
DemoCoder said:
Tridam said:
DarN said:
Yes, but does it handle it Well? ;)

The answer is "No" at least with the current drivers (60.72). PS3.0 support isn't exposed until DX9.0c is installed. I've done some quick tests for our review. Branching is very costly. So costly that it looks like the drivers are buggy. 9 pipeline passes for the simplest branching I was able to think about :

Try branching on something besides the face register, like an Vn input register. 9 is not the minimum penalty.
Nuts. I guess that kills the possibility of NV40 implementing your idea of optimised branching with the vFace register.

BTW, does anyone know how you can use this for two sided lighting? Isn't the vFace register wrt the camera, i.e. only useful if backface culling is disabled? I guess transparent objects could do different inside/outside lighting, but that's all I can think of.

Tridam, I take it from your example that NV40's drivers don't choose between predication and branching, right?
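
On the two-sided lighting question, one common approach is to flip the surface normal when the face sign is negative, so that back faces are lit from their own side. The sketch below shows the idea in Python rather than shader code. It assumes backface culling is disabled so that back faces actually reach the pixel shader, as the post suspects; `face_sign` stands in for the sign of vFace, and the function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_sided_diffuse(normal, light_dir, face_sign):
    """Clamped N.L diffuse term, with the normal flipped for back faces.

    face_sign < 0 means the pixel belongs to a back-facing primitive.
    Both vectors are assumed to be normalized already.
    """
    if face_sign < 0:
        normal = tuple(-c for c in normal)
    return max(0.0, dot(normal, light_dir))


if __name__ == "__main__":
    n = (0.0, 0.0, 1.0)
    light_behind = (0.0, 0.0, -1.0)
    print(two_sided_diffuse(n, light_behind, 1.0))   # front face, light on the far side
    print(two_sided_diffuse(n, light_behind, -1.0))  # back face, lit from its own side
```

Because the whole branch body here is a single negation, this is the kind of case where a select or predicated instruction is likely a better fit than a real branch.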
 
It doesn't kill anything, Mint. I asked him to try different registers because I'm interested to see if there is any performance difference between branching on a vertex output register, a "special" register (pos/face), and a temporary.

9 cycles is not the correct minimum, and it doesn't kill the usefulness either.
 
I think there are two questions to be asked, maybe more:
1. How much of a performance increase will SM3.0 give in games like Far Cry and others that have just had their SM2.0 shaders rewritten as SM3.0?
2. Is this performance increase big enough to overcome any speed advantage ATI may or may not have with SM2.0? (We have already seen that the 9800XT can outperform the 6800 when 8xAA is used, and I am sure the X800s will do much better. So by how much will they outperform the 6800 with AA/AF enabled, and will this offset the speed increase the 6800 gets from SM3.0?)
3. How much of SM3.0 is ATI actually going to support? Rumor has it all of VS3.0 and part of PS3.0. And are the parts ATI does support equal to the parts of SM3.0 the 6800 can actually perform? In other words, perhaps the parts ATI does not support are parts the 6800 supports but cannot really run, and ATI saved precious transistors to beef up its SM2.0 capabilities. If this is the case, the whole argument is rendered obsolete, because both cards are able to do exactly the same thing, and the only question is which one is faster.
4. When will SM3.0 be fully implemented? Even games like Far Cry are not going all out with SM2.0 yet. I would say it will take at least a year since, correct me if I am wrong, designing a game around SM3.0 is more difficult than just rewriting shaders. And does NV40 have the power to run these shaders, or R420 for that matter?

My personal opinion is that any patches or games released in the next year will just be rewritten SM2.0 shaders, and if ATI can get an SM2.0 performance advantage, SM3.0 support will be a moot point. True SM3.0 games, where possible IQ advances and other things will matter, will not arrive for a year to a year and a half, and will require an NV50/R500 to run them properly. Correct me if I am wrong. And I guess the answers to the above questions will not come out until R420/NV40 and DX9.0c hit the streets and we can actually put them to the test.
 