R420 Hints from people under NDA!

DemoCoder said:
I take issue with this, because dynamic branching is a performance hit, but 95% of the time you don't need it (most branches in shaders are small, and predicates work fine). But when you absolutely do need it, it's a big performance win, and it sucks not to have it. Utilizing dynamic branches is about knowing when to use them.
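
Something like this, in made-up DX9 HLSL (a sketch only; every sampler and constant name here is invented for illustration):

    // Hypothetical ps_3_0 fragment. The first decision is tiny, so a
    // select/predicate is the right tool; the second guards a big chunk
    // of work, so a true dynamic branch can actually win.
    sampler2D shadowMap;   // invented names
    float3    lightDir;

    float4 main(float3 n : TEXCOORD0, float2 uv : TEXCOORD1) : COLOR
    {
        float  vis = tex2D(shadowMap, uv).r;
        float3 c   = 0.1;                      // ambient term

        // Small branch: write it as a select; it compiles to a compare
        // (cmp) or predicated instructions, with no real branching.
        c *= (vis > 0.5) ? 1.0 : 0.5;

        // Large branch: many instructions are skipped for shadowed pixels.
        [branch] if (vis > 0.5)
        {
            float3 nn = normalize(n);
            float3 h  = normalize(-lightDir + float3(0, 0, 1)); // invented half-vector
            c += saturate(dot(nn, -lightDir));
            c += pow(saturate(dot(nn, h)), 32);                 // specular, etc.
        }
        return float4(c, 1.0);
    }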

I would fully expect you to be able to use dynamic branches effectively, but I have reservations that the majority of programmers would be so careful. Is it possible that dynamic branches would, for the most part, just make for lazier programmers who write fewer but larger shader programs to the detriment of performance? Or am I off base here?
 
nelg said:
I would fully expect you to be able to use dynamic branches effectively, but I have reservations that the majority of programmers would be so careful. Is it possible that dynamic branches would, for the most part, just make for lazier programmers who write fewer but larger shader programs to the detriment of performance? Or am I off base here?

Oh come on... The coding on the PC port of Halo was so superb. What makes you think they'd screw up a shader?
 
hmmm said:
nelg said:
I would fully expect you to be able to use dynamic branches effectively, but I have reservations that the majority of programmers would be so careful. Is it possible that dynamic branches would, for the most part, just make for lazier programmers who write fewer but larger shader programs to the detriment of performance? Or am I off base here?

Oh come on... The coding on the PC port of Halo was so superb. What makes you think they'd screw up a shader?

sarcasm?

I believe coders are becoming lazier, but that does not mean good coders are going bad because their tools are better and easier to use now.
 
compres said:
sarcasm?

I believe coders are becoming lazier, but that does not mean good coders are going bad because their tools are better and easier to use now.

Yeah, I was being a little facetious. Something that runs well on an Xbox shouldn't, at least in my mind, bring a 3GHz PC with an R9800P to its knees.

The point is that branching, for the moment, is like fire: it is cool and it can really help, but it can also really burn. Eh, bad coding can kill anything, so this isn't specific to branching shaders.
 
nelg said:
I would fully expect you to be able to use dynamic branches effectively, but I have reservations that the majority of programmers would be so careful. Is it possible that dynamic branches would, for the most part, just make for lazier programmers who write fewer but larger shader programs to the detriment of performance? Or am I off base here?
Well, ideally, static branching, which is what one would use to pack many shaders into one, would not cause any performance hit. It may, in fact, increase performance, as it may reduce state changes (it depends on whether the pipeline can start selecting another branch without a state change). This is provided that static branching incurs no per-pixel performance hit. Hopefully this is the case; it will be a very important thing to benchmark when the 6800 hits the shelves.
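
For example, packing two variants into one shader with static flow control might look like this (a made-up HLSL sketch; all names are invented):

    // Hypothetical uber-shader: one shader, two paths, selected by a
    // boolean constant. The bool is set once per batch, so every pixel
    // in a draw call takes the same path (static flow control, compiled
    // to an 'if b#' around the block rather than a per-pixel decision).
    sampler2D baseMap;
    sampler2D detailMap;
    bool      useDetail;   // boolean constant register

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        float4 c = tex2D(baseMap, uv);
        if (useDetail)
            c.rgb *= tex2D(detailMap, uv * 8.0).rgb * 2.0;
        return c;
    }

Flipping the bool swaps the look without a shader change, which is where the saved state changes would come from.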

Dynamic branching is the potentially dangerous form, even if implemented well in hardware. But, given time and improved compilers, one can hope that the drivers will select the best way to execute a dynamic branch: either execute all possibilities and select the final result, or actually perform the branch at some performance penalty.

Executing all possibilities will be a performance win for small branches (where the developer really should have done this anyway in the shader, and simply used a compare or predicate register instead of a true dynamic branch), whereas performing a dynamic branch will be a performance win if enough instructions are skipped (though texture fetches may throw a wrench into performance, depending on the hardware).
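
The two strategies, spelled out with the HLSL hints that let you force one or the other (again a made-up sketch; the names are invented):

    // Strategy 1: [flatten] - evaluate both sides, pick the result with a
    // compare. Constant cost, ideal when each side is a few instructions.
    // Strategy 2: [branch] - a real per-pixel branch, worthwhile only when
    // the guarded body is long. Texture fetches inside it are the tricky
    // part, since screen-space gradients get murky under divergent flow
    // (hence tex2Dlod below).
    sampler2D maskMap;
    sampler2D heavyMap;

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        float  m = tex2D(maskMap, uv).r;
        float3 c;

        [flatten] if (m > 0.5) c = float3(1, 0, 0);
        else                   c = float3(0, 0, 1);

        [branch] if (m > 0.5)
            c *= tex2Dlod(heavyMap, float4(uv, 0, 0)).rgb; // imagine much more work here

        return float4(c, 1.0);
    }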

Anyway, a very important thing to note, of course, is that the performance of any branching will be heavily hardware-dependent. Branching is not an easy problem to solve, but it is one that will need solving. Compilers will be especially important in choosing the right branching strategy to get the job done in the most efficient manner.
 
WaltC said:
anaqer said:
Richard Huddy said:
Steer people away from flow control in ps3.0 because we expect it to hurt badly. [Also it’s the main extra feature on NV40 vs R420 so let’s discourage people from using it until R5xx shows up with decent performance...]

I'd say that's enough confirmation of no PS3.0 support in R420...

Was this quote lifted from some "leaked" document that nobody can verify actually originated with ATi?

This is from a PowerPoint presentation with personal notes.

The PowerPoint presentation was pulled directly off ATI's site - IMHO (!) it's genuine. I don't buy into the 'planted' theory. The stuff was taken down from ATI's site a bit later, when they found out someone had leaked it :)

I ask because the statement itself seems contradictory and illogical: if "flow control" is something ATi thinks "hurts badly" in regard to performance...

There have been sooo many comments regarding this... but OK, here's one more:

If you say "flow control hurts [performance?]", then (agree with me) flow control [in R420] must EXIST!
If it didn't exist, its performance could not be hurt.

BUT... I have to agree... these notes are VERY hard to read and can be read MANY ways... that's the problem.

He could also be talking from a general perspective (note: he's talking to coders), knowing that 3.0 flow control hurts *in general*.

In that case we can't use this sentence as proof that R420 has SOME form of flow control. He might just be advising coders not to use it, since it drags performance down.

Also... there's the "...discourage people from using it until R500 shows up with decent performance" part.

COULD mean R420 has SOME form of it (but SLOW), and 'performance' could be a comparison to R420 - but this ALSO could just be a general comment that R500 will be so advanced that it's faster than what's there right now.

The only HALFWAY decipherable statement is that "NV40 will have the extra feature of flow control over R420" - which kind of contradicts the above thought that R420 has SOME form of flow control.

My personal GUESS (and nothing more):

R420 has some form of PS3.0 flow control via drivers but NOT in hardware. There might be ways to 'convert' the 3.0 shader code when it's loaded. Some driver feature.
THUS - this would make the card ITSELF 100% PS3.0 compliant, since it would be able to run PS3.0 shader code.
As I said: my own opinion :)

It would fit the PowerPoint notes, too: you could read them as referencing the "bad performance of flow control on R420" (but it IS there), and as saying that R500 will be faster in that respect.

Again... this is all very tricky and just another example of "null information" - meaning the more you read, the more contradictory information you get and the less you know :)
 
flexy said:
R420 has some form of PS3.0 flow control via drivers but NOT in hardware. There might be ways to 'convert' the 3.0 shader code when it's loaded. Some driver feature.
THUS - this would make the card ITSELF 100% PS3.0 compliant, since it would be able to run PS3.0 shader code.
As I said: my own opinion :)

The biggest reason why I doubt that it has PS3.0 is that the minimum FP precision has been changed to FP32 (I'm basing the "changed" part on discussions here where a lot of people said that FP24 would be the standard for SM3.0). And my "based on no facts" theory is that they changed it because the R420 doesn't support SM3.0, so they (as in MS) might have thought it was just as well to raise it up to FP32. All this of course assumes that the R420 is still FP24.
 
The biggest reason why I doubt that it has PS3.0 is that the minimum FP precision has been changed to FP32 (I'm basing the "changed" part on discussions here where a lot of people said that FP24 would be the standard for SM3.0). And my "based on no facts" theory is that they changed it because the R420 doesn't support SM3.0, so they (as in MS) might have thought it was just as well to raise it up to FP32. All this of course assumes that the R420 is still FP24.

Note that the ATI PowerPoint presentation mentions that the 3.0 spec requires FP32, and in the *same* presentation he says that flow control is the ONLY main extra feature NV40 has over R420.

Can we [do they] consider FP32 over FP24 *another* main extra feature? If yes, then it [FP32] must be in R420, because he says flow control is the only extra NV40 has :)
 
flexy said:
Can we [do they] consider FP32 over FP24 *another* main extra feature? If yes, then it [FP32] must be in R420, because he says flow control is the only extra NV40 has :)

My guess would be that ATI doesn't count it as a feature, but NVIDIA surely does (in this case at least) :)
 
Bjorn said:
The biggest reason why I doubt that it has PS3.0 is that the minimum FP precision has been changed to FP32 (I'm basing the "changed" part on discussions here where a lot of people said that FP24 would be the standard for SM3.0). And my "based on no facts" theory is that they changed it because the R420 doesn't support SM3.0, so they (as in MS) might have thought it was just as well to raise it up to FP32. All this of course assumes that the R420 is still FP24.

Yes, I agree.

If ATI had made it clear to MS that they would not have PS 3.0 capability until they moved to FP32, then it would make sense for MS to require FP32 for PS 3.0, to raise the bar.

This would make even more sense if there's another vendor (like Img Tech) that is planning an FP32 SM 3.0 part.
 
flexy said:
Can we [do they] consider FP32 over FP24 *another* main extra feature? If yes, then it [FP32] must be in R420, because he says flow control is the only extra NV40 has :)
There are many other things that are probably lacking in the R420:
1. FP16 blending/texture filtering (this is a big one, as it allows the first true HDR rendering).
2. Facing register (a generalization of the two-sided stencil feature: it basically lets one shader run for front-facing polygons and another for back-facing polygons; this will reduce the number of passes for certain algorithms).
3. Predicate register (lets a programmer switch certain instructions on and off via a boolean value; operates as another form of flow control).
4. Arbitrary swizzling (can reduce the number of instructions needed for some shaders).
5. Gradient instructions (allow custom texture filtering on textures where normal filtering won't work: procedural textures, for example).
6. No dependent read limit (may allow fewer passes for some extreme cases: probably won't be used commonly in games, if ever, but would be nice for non-realtime stuff).

...this is assuming that PS 2.0b is the target the R420 will use (which I currently find about 99.5% certain).
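
To make a couple of these concrete, here are items 2 and 5 in made-up ps_3_0 HLSL (every name is invented; just a sketch):

    // Hypothetical sketch of the facing register and gradient instructions.
    sampler2D frontTex;
    sampler2D backTex;

    float4 main(float2 uv : TEXCOORD0, float face : VFACE) : COLOR
    {
        // Facing register: one shader doing different work per facing
        // (in D3D9 the sign of VFACE is negative for back-facing tris).
        float4 c = (face >= 0) ? tex2D(frontTex, uv)
                               : tex2D(backTex, uv);

        // Gradient instructions: read the screen-space derivatives of uv,
        // e.g. to drive custom filtering of a procedural pattern.
        float2 dx   = ddx(uv);
        float2 dy   = ddy(uv);
        float  blur = saturate(length(dx) + length(dy));  // crude LOD metric
        return lerp(c, float4(0.5, 0.5, 0.5, 1), blur);
    }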
 
Chalnoth said:
flexy said:
Can we [do they] consider FP32 over FP24 *another* main extra feature? If yes, then it [FP32] must be in R420, because he says flow control is the only extra NV40 has :)
There are many other things that are probably lacking in the R420:
1. FP16 blending/texture filtering (this is a big one, as it allows the first true HDR rendering).
2. Facing register (a generalization of the two-sided stencil feature: it basically lets one shader run for front-facing polygons and another for back-facing polygons; this will reduce the number of passes for certain algorithms).
3. Predicate register (lets a programmer switch certain instructions on and off via a boolean value; operates as another form of flow control).
4. Arbitrary swizzling (can reduce the number of instructions needed for some shaders).
5. Gradient instructions (allow custom texture filtering on textures where normal filtering won't work: procedural textures, for example).
6. No dependent read limit (may allow fewer passes for some extreme cases: probably won't be used commonly in games, if ever, but would be nice for non-realtime stuff).

...this is assuming that PS 2.0b is the target the R420 will use (which I currently find about 99.5% certain).

How can you say what is probably lacking????

P.S. Isn't R3xx already capable of HDR while NV3x was not, or did I miss something?
 
Just refresh my memory here, please.

I was under the impression that the R3xx was 32-bit for a large part of its internal pipeline, and only the final stages of the PS were 24-bit. If this is indeed the case, would the transistor cost really be so great that they couldn't have moved to fully 32-bit?
 
Stryyder said:
How can you say what is probably lacking????
Read the last line of my post.

P.S. Isn't R3xx already capable of HDR while NV3x was not, or did I miss something?
HDR in current cards is done through tricks. You can render an HDR image on the NV40 essentially just like you would a normal image, by using an FP16 framebuffer and FP16 textures.
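
A minimal sketch of the idea, assuming an FP16 (D3DFMT_A16B16G16R16F) render target and made-up names:

    // Pass 1 (not shown) renders the scene into an FP16 target, so values
    // above 1.0 survive instead of clamping. Pass 2 tone-maps the result
    // down to the displayable range:
    sampler2D hdrScene;   // the FP16 render target, bound as a texture
    float     exposure;   // invented, artist-tuned constant

    float4 toneMap(float2 uv : TEXCOORD0) : COLOR
    {
        float3 hdr = tex2D(hdrScene, uv).rgb;      // may be far above 1.0
        float3 ldr = 1.0 - exp(-hdr * exposure);   // simple exposure curve
        return float4(ldr, 1.0);
    }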
 
Headstone said:
I was under the impression that the R3xx was 32-bit for a large part of its internal pipeline, and only the final stages of the PS were 24-bit. If this is indeed the case, would the transistor cost really be so great that they couldn't have moved to fully 32-bit?
All pixel shader processing is done at FP24.

All vertex shader processing is done at FP32.

The majority of the FP units on the R3xx are FP24.

Just because the pixel shader comes after the vertex shader doesn't mean it takes up less of the core.

As for transistor count, I really don't know. That depends entirely upon how much of the core is taken up by the math units themselves. Given what we know of the NV40, it's probably close to 20% or so.
 