David Kirk interview

991060

http://firingsquad.gamers.com/hardware/kirk_interview/page1.asp

FiringSquad: Could you give us specific examples of where maybe you feel you guys, you mentioned you guys can do some things better than they can, can you give us some specific examples of that?

Kirk: Well, one example is if you're doing geometric calculations with reflections or transparencies and you need to do trigonometric functions. Our sine and cosine take two cycles; theirs takes eight cycles, or seven cycles I guess. Another example is if you're doing dependent texture reads, where you use the result of one texture lookup to look up another one. There's a much longer latency on their pipeline than there is in ours. So it just depends on the specific shader, and I feel that the calculations I mentioned are pretty important for effects and advanced material shaders and the types of materials that people use to make realistic movie effects. So they will get used as developers get more used to programmable GPUs, and we'll have less of a performance issue with those kinds of effects.
 
991060 said:
Kirk: Well, one example is if you're doing geometric calculations with reflections or transparencies and you need to do trigonometric functions. Our sine and cosine take two cycles; theirs takes eight cycles, or seven cycles I guess.


Well, they at least should make some facts clear before making any claims based on guesses. :LOL:
 
991060 said:
Can anyone explain to me why sin/cos calculations and dependent reads are much faster on the FX?
The FX has native support for sin/cos; the R3x0 does not. The ATI driver will convert sin/cos into a polynomial (a Taylor expansion, I believe), so the chip has to execute 7-8 instructions.
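To make that concrete, here's a minimal sketch in C (not ATI's actual driver code; the coefficients and op counts are purely illustrative) of the kind of polynomial expansion a driver could emit in place of a native SIN instruction:

```c
#include <math.h>
#include <stdio.h>

/* A minimal sketch, not ATI's actual driver code: sin(x) as a
 * truncated Taylor series in Horner form, the kind of polynomial a
 * driver could emit in place of a native SIN instruction. Accurate
 * near zero; a real driver would range-reduce x into [-pi, pi] first. */
static float sin_poly(float x)
{
    float x2 = x * x;              /* 1 mul                      */
    float p = -1.0f / 5040.0f;     /* -1/7!  (load constant)     */
    p = p * x2 + 1.0f / 120.0f;    /* mad: +1/5!                 */
    p = p * x2 - 1.0f / 6.0f;      /* mad: -1/3!                 */
    p = p * x2 + 1.0f;             /* mad: +1                    */
    return x * p;                  /* 1 mul: ~6 ALU ops in total */
}

int main(void)
{
    for (int i = -3; i <= 3; i++) {
        float x = 0.5f * i;
        printf("x=%+.2f  sinf=%+.6f  poly=%+.6f\n", x, sinf(x), sin_poly(x));
    }
    return 0;
}
```

Count the muls and mads and you land right around the 7-8 instructions once a range-reduction step or two is added.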
There was some recent discussion about dependent texture reads here, and Kirk is probably wrong on this one.
 
Thanks, mczak.
Could you give a short explanation of the dependent-read behaviour on the FX and the Radeon? It's really not that easy to locate those threads.
 
Well, the paragraph:
I think that people are finding that although there are some differences, there really isn't a black and white, you know, this-is-faster-that-is-slower between the two pieces of hardware; for an equal amount of time invested in the tuning, I think you'll see higher performance on our hardware.

doesn't harmonize with Valve's "five times the effort" claim.

As for trig functions: there are a lot of advantages to doing them in software:

1.) Saves silicon compared to a dedicated unit; all you need are a few constants.
2.) You can trade precision for speed: do fewer iterations to get more speed at lower precision.
3.) Calculating sin x and cos x (note: same argument) only costs about 25% more than calculating just one (9.3 vs. 7.3 cycles in this paper, on the Itanium); see the sketch below.
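Here's that sketch, a toy C version of the shared-argument trick (purely illustrative, not tied to any particular implementation):

```c
#include <stdio.h>

/* Sketch of point 3 above: sin x and cos x share the same argument, so
 * the two truncated series can share the x*x term (and, in a real
 * implementation, the range reduction), making the pair only slightly
 * more expensive than either one alone. */
static void sincos_poly(float x, float *s, float *c)
{
    float x2 = x * x;  /* computed once, used by both series */
    *s = x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f)));
    *c = 1.0f + x2 * (-0.5f + x2 * (1.0f / 24.0f));
}

int main(void)
{
    float s, c;
    sincos_poly(0.5f, &s, &c);
    printf("sin(0.5) ~ %f, cos(0.5) ~ %f\n", s, c);  /* ~0.4794, ~0.8776 */
    return 0;
}
```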

Cheers
Gubbi
 
I don't know where the misconception came from that NV3x is better at dependent texture reads than R3xx. Here's something I posted in another thread (slightly modified), but I think the thread was already dead, and not many people bothered to read it:

... the problem with your statement is that a lot of data shows NV30 as having very poor dependent texture performance.

Remember Ilfirin's benchmark? NV3x was about 1/8 of R300's performance. The register limitation will probably come into play here, but looking at the original version of MDolenc's fillrate tester, before he made the shader more complex, NV3x did quite well with ordinary shaders.

The most convincing evidence of NV3x's poor dependent texturing is PS 1.4 benchmarks, like ShaderMark, HL2, 3DM03, and 3DM2001. It seems like NV3x still has NV2x's register combiners and does PS 1.1 effects with them to keep performance high (although who knows what's happening with ATI's insane performance in ChameleonMark). However, PS 1.4 effects, which generally involve arbitrary dependent texture reads (or else can be made into PS 1.1), must be run through the regular PS pipeline, and they slow down a lot on NV30, even though they use fixed point.

Sure, NV30 has no limit on dependent texture reads, but how often will you need more than 4 levels of dependency? I find it's quite rare to even need 2 levels, which runs well on R300 according to ATI's optimization guide. 0 levels is most common by far, and 1 level seems to be popping up in many new games for water surfaces. Besides, there is also multipass.
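For anyone unsure what a "level of dependency" means, here's a toy CPU model in C of a 1-level dependent read (all names and sizes invented for illustration). The second fetch's address can't be computed until the first texel arrives, which is exactly why dependent reads expose the texture pipeline's latency:

```c
#include <stdio.h>

/* Toy CPU model of a 1-level dependent texture read. The texel fetched
 * from the first texture perturbs the coordinates for the second fetch,
 * EMBM-style (e.g. water surfaces); the second address cannot be
 * computed until the first result is back. */
#define W 4
#define H 4
typedef struct { float r, g; } Texel;

static Texel fetch(Texel tex[H][W], float u, float v)
{
    int x = (int)(u * (W - 1) + 0.5f);   /* nearest-texel sampling */
    int y = (int)(v * (H - 1) + 0.5f);
    x = x < 0 ? 0 : (x >= W ? W - 1 : x);
    y = y < 0 ? 0 : (y >= H ? H - 1 : y);
    return tex[y][x];
}

static Texel shade(Texel bump[H][W], Texel env[H][W], float u, float v)
{
    Texel d = fetch(bump, u, v);              /* first read            */
    return fetch(env, u + 0.1f * d.r,         /* second read's address */
                      v + 0.1f * d.g);        /* depends on the first  */
}

int main(void)
{
    Texel bump[H][W] = {{{0.5f, 0.5f}}}, env[H][W] = {{{1.0f, 0.0f}}};
    Texel out = shade(bump, env, 0.0f, 0.0f);
    printf("out = (%.2f, %.2f)\n", out.r, out.g);
    return 0;
}
```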

sireric's post in that thread points out that even 3 or 4 levels of dependency are handled well.

Anyone think we should resurrect that thread with Ilfirin's test using the Det50s? Reverend suggested it at one point in there.
 
Mintmaster said:
[...] PS 1.4 effects, which generally involve arbitrary dependent texture reads (or else can be made into PS 1.1), must be run through the regular PS pipeline, and they slow down a lot on NV30, even though they use fixed point.

False AFAIK.
NV3x is capable of FX12, while PS 1.4 needs FX16.
So in theory, they'd need to use FP32 there if they want to adhere to DX regulations. I'm not sure whether they did so at that time, however.


Uttar
 
Uttar said:
False AFAIK.
NV3x is capable of FX12, while PS 1.4 needs FX16.

No, for temp registers FX12 is sufficient.
The higher requirement [-8, 8] is for texture registers, which are floating-point on NV3x anyway.
 
Gubbi said:
3.) Calculating sin x and sin x (note: same argument)
Yeah... even I can think of an optimisation for that one. :)

Perhaps you meant cos x ;)
 
Uttar said:
False AFAIK.
NV3x is capable of FX12, while PS 1.4 needs FX16.
So in theory, they'd need to use FP32 there if they want to adhere to DX regulations. I'm not sure whether they did so at that time, however.


Uttar

If NVidia is demoting registers and constants in PS 2.0 shaders down to FP16 and FX12 in their "optimizations", why on earth wouldn't they do fixed point for PS 1.4? If this weren't the case, why would NVidia mention converting PS 2.0 shaders into PS 1.4 in their HL2 rebuttal? I seriously doubt they were so short-sighted as not to include simple bitwise shift modifiers (like _x2, _x4, _d2, _d4) to scale the -8...+8 range to their -2...+2 range and back; most of those modifiers are part of PS 1.1 anyway. (A toy numeric illustration follows.)
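Here's that toy illustration of the _d4/_x4 trick in C (just the arithmetic; this is not a claim about what NVidia's driver actually emits):

```c
#include <stdio.h>

/* Toy illustration of how _d4/_x4-style shift modifiers could map
 * PS 1.4's [-8, 8] range onto a [-2, 2] fixed-point pipeline: divide
 * the inputs by 4 on read, multiply the result by 4 on write-back. */
static float clamp2(float v)   /* models a register clamped to [-2, 2] */
{
    return v < -2.0f ? -2.0f : (v > 2.0f ? 2.0f : v);
}

int main(void)
{
    float a = 6.0f, b = -3.5f;   /* PS 1.4 values in [-8, 8] */

    float naive  = clamp2(a + b);                       /* 2.5 clamps to 2.0 */
    float scaled = 4.0f * clamp2(a / 4.0f + b / 4.0f);  /* stays in range    */

    printf("exact=%.2f  naive=%.2f  with _d4/_x4=%.2f\n", a + b, naive, scaled);
    return 0;
}
```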

I think R2xx has 16-bit precision much of the time, but I, like HypX, don't think it's mandatory for PS 1.4. Not that NVidia cares that much about adhering to spec anyway.
 
overclocked said:
Good read and no pr-shit. Would be nice to see the new Dets debut soon.
I think you meant "nothing but pr-shit".

Here's an example:
David Kirk said:
If you look at processors, FP24 doesn't exist anywhere else in the world except on ATI processors, and I think it's a temporary thing. Bytes happen in twos and fours and eights -- they happen in powers of two. They don't happen in threes, and it's just kind of a funny place to be.
I guess no one told ATI that bytes come in twos and fours and eights!
David Kirk said:
Certainly one of the choices that Microsoft could have made is that it has to be 32 or nothing. They could have also made the choice that it has to be 16 or nothing. So it's just kind of unfortunate for the whole industry; people really want to have predictable precision and predictable results.

What's "unpredictable" about FP24? Is it inherently chaotic?

David Kirk said:
I think that people are finding that although there are some differences, there really isn't a black and white, you know, this-is-faster-that-is-slower between the two pieces of hardware; for an equal amount of time invested in the tuning, I think you'll see higher performance on our hardware.

He's assuming that people are actually taking time to optimize for the R300 platform. Even John Carmack has stated that he is investing a lot more time optimizing for the NV3x (custom code path) than for the R300 (generic code path).

Need I go on?

-FUDie
 
Everything I read from NVidia these days just makes me more and more determined not to buy their hardware. Every response still has that PR-generated feel to me.
 
Nappe1 said:
Well, they at least should make some facts clear before making any claims based on guesses.

This was a phone interview, and FiringSquad did ask a very specific question... You can't blame Kirk for not knowing a specific answer on the spot to a direct question like that. What I find interesting about this kind of interview is that it probably has the least amount of PRization :) of all the Kirk interviews. At least he tried to answer it.
 
FUDie said:
What's "unpredictable" about FP24? Is it inherently chaotic?

If I could hazard a guess, I'd say exactly what he said. FP24 is found nowhere else in the professional graphics/visualization industry, nor in the hardware computing realm.

It defies basic convention and (AFAIK) isn't an IEEE standard like FP16 or FP32. Why couldn't they just support industry standards like FP32 instead?

In fact, I question why FP24 (instead of an interdisciplinary standard like IEEE 754, baselined at FP32 for DX implementations) would even be supported by Microsoft in an ideal world free of politics. Unless I was the only one thinking, "WTF... FP24?!" when it was announced. But that's a story for another day.
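For scale, here's what the mantissa widths imply about the relative precision of the three formats. Note the FP24 row assumes R300's commonly reported s16e7 layout; that layout is an assumption here, not something from an IEEE spec:

```c
#include <math.h>
#include <stdio.h>

/* Relative precision implied by each format's mantissa width. FP16
 * (s10e5) and FP32 (s23e8) follow the familiar IEEE-754-style layouts;
 * the FP24 row assumes R300's commonly reported s16e7 layout, which is
 * an assumption, not a spec citation. */
int main(void)
{
    const struct { const char *name; int mantissa_bits; } fmt[] = {
        { "FP16", 10 }, { "FP24", 16 }, { "FP32", 23 },
    };
    for (int i = 0; i < 3; i++)
        printf("%s: ~%d significant bits, epsilon = 2^-%d = %.3g\n",
               fmt[i].name, fmt[i].mantissa_bits + 1, fmt[i].mantissa_bits,
               ldexp(1.0, -fmt[i].mantissa_bits));
    return 0;
}
```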
 