Anyone got a NV40...?

Tridam · May 31, 2004

Chalnoth said:
By the way, as far as the compiler is concerned, I really wouldn't expect much benefit for short shaders like this one. The benefit will be in longer, more complex shaders where latency hiding can be done.

Of course that's what I was talking about. I've tried to hide the latency of the texldd but it isn't possible with current drivers.

Xmas · Jun 1, 2004

Uh oh, I've got a bad suspicion about why it might be so slow, and why it's supposed to take two cycles on NV3x... actually, it makes sense. But it also makes texldd almost completely useless...

Damien, could you please do a comparison between bilinear and trilinear filtering (no AF, sampler 0 should be point or bilinear in both cases)?

edit: Oh, and make sure the s0 texture contains all zeroes.

Evildeus · Jun 1, 2004

Xmas said:
Uh oh, I've got a bad suspicion about why it might be so slow, and why it's supposed to take two cycles on NV3x... actually, it makes sense. But it also makes texldd almost completely useless...

Can you give us a few more clues?

Xmas · Jun 1, 2004

texldd breaks quad coherence. Every pixel needs its individual sampling.

Damnit, why isn't there an instruction that takes "virtual" and "real" texture coordinates, and calculates LOD from the virtual coordinates? That would mean one LOD value per quad instead of per pixel.
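To illustrate the point about one LOD per quad versus one per pixel, here is a rough sketch in Python (not shader code). It assumes the usual textbook LOD approximation, log2 of the longer gradient axis in texel units; the function names and the gradient values are made up for illustration and say nothing about how NV3x/NV40 actually implement this.

```python
import math

def mip_lod(ddx, ddy, tex_size):
    """Approximate mip LOD from (u, v) derivatives in normalized coordinates.

    Assumption: a square texture of tex_size texels and the common
    log2(max gradient length) approximation, clamped at level 0.
    """
    len_x = math.hypot(ddx[0] * tex_size, ddx[1] * tex_size)
    len_y = math.hypot(ddy[0] * tex_size, ddy[1] * tex_size)
    rho = max(len_x, len_y)
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0

def quad_lods_shared(ddx, ddy, tex_size):
    # texld-style: one gradient pair for the 2x2 quad, hence one LOD for all 4 pixels.
    return [mip_lod(ddx, ddy, tex_size)] * 4

def quad_lods_per_pixel(grads, tex_size):
    # texldd-style: each pixel supplies its own gradients, hence its own LOD.
    return [mip_lod(dx, dy, tex_size) for dx, dy in grads]

# Hypothetical gradients for a 256x256 texture.
shared = quad_lods_shared((1 / 64, 0.0), (0.0, 1 / 64), 256)
per_pixel = quad_lods_per_pixel(
    [((1 / 256, 0.0), (0.0, 1 / 256)),
     ((1 / 128, 0.0), (0.0, 1 / 128)),
     ((1 / 64, 0.0), (0.0, 1 / 64)),
     ((1 / 32, 0.0), (0.0, 1 / 32))],
    256,
)
print(shared)     # same LOD four times
print(per_pixel)  # potentially four different LODs
```

With a shared gradient the whole quad can be sampled at one mip level; with per-pixel gradients each pixel may land on a different level, which is the coherence loss described above.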

991060 · Jun 1, 2004

Xmas, I think you should post on the directxdev mailing list or email NVIDIA's developer relations directly for a faster, more accurate response.

Tridam · Jun 1, 2004

Xmas said:
Uh oh, I've got a bad suspicion about why it might be so slow, and why it's supposed to take two cycles on NV3x... actually, it makes sense. But it also makes texldd almost completely useless...

Damien, could you please do a comparison between bilinear and trilinear filtering (no AF, sampler 0 should be point or bilinear in both cases)?

I run the shader on a surface parallel to the screen, so trilinear is never used.

I've tried with point sampling and with bilinear. The fillrate is the same. I've also tried with FSAA. It's strange... 3% faster with 4x FSAA on...

Tridam · Jun 1, 2004

Xmas said:
texldd breaks quad coherence. Every pixel needs its individual sampling.

That could explain why it is slow. But can that explain why it is SO slow?

Hyp-X · Jun 1, 2004

I have vague memories that someone measured that texldd takes 4 cycles on NV30, not 2.
And it would make much more sense - because breaking the quad results in 4x the work being done (even if it's not really needed).

But 9 cycles? That's just too long.

Xmas · Jun 2, 2004

NV30 has two TMUs per pipe, so it should be able to take four independent bilinear samples in two cycles.
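The arithmetic behind that estimate, as a small Python sketch. It assumes each TMU delivers one bilinear sample per cycle with no other stalls; the function name and parameters are hypothetical, and real hardware may behave differently.

```python
import math

def cycles_for_samples(samples, tmus_per_pipe, samples_per_tmu_per_cycle=1):
    """Lower bound on cycles for a pipe to fetch `samples` independent samples.

    Assumption: every TMU is fully busy each cycle and nothing else stalls.
    """
    per_cycle = tmus_per_pipe * samples_per_tmu_per_cycle
    return math.ceil(samples / per_cycle)

# Four independent bilinear samples on a pipe with two TMUs.
print(cycles_for_samples(4, 2))
# The same four samples with a single TMU.
print(cycles_for_samples(4, 1))
```

Under these assumptions two TMUs give the two-cycle figure, and a single TMU would give four cycles; neither gets anywhere near the nine cycles measured.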
