Mintmaster said: There's vertex texturing, but it doesn't look like anyone is too keen on using it in the near term. Not sure why, 'cuz there are plenty of neat things you can do with it.

Because NVIDIA's NV4x implementation of it is slower than a dead tortoise? As there's no other card that supports it currently available, there is little incentive to use vertex texturing.
Pete said: Are we discussing the MS SM3 feature set, or nV's superset with FP blending and PCF?
DiGuru said: Time for a new thread about this, with the ongoing confusion about it. So, is there a use for it yet that can't be done in SM 2.0b and that is actually used?

I don't think it's a matter of the effects that can be created, but of how one goes about rendering those effects. SM3.0 allows certain things to be done much more easily and in some cases potentially faster. HDR and other effects requiring the interaction of the results of multiple floating-point render passes are good examples. Then there's the most basic example: handling a variable number of lights in a single render pass. They can all be done in SM2.0(b), but the equivalent implementation in SM3.0 is much more elegant.
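The variable-light-count case can be sketched in HLSL. This is an illustrative fragment, not from any particular engine; names like gNumLights and gLightPos are invented for the example:

```hlsl
// Hypothetical ps_3_0 shader: accumulate diffuse lighting over a light
// count set at run time. Under ps_2_0 a loop like this would have to be
// unrolled to a fixed count, typically one compiled shader per light count.
int    gNumLights;
float3 gLightPos[8];
float3 gLightColor[8];

float4 main(float3 worldPos : TEXCOORD0,
            float3 normal   : TEXCOORD1) : COLOR
{
    float3 acc = 0;
    for (int i = 0; i < gNumLights; i++)   // dynamic loop: an SM3.0 feature
    {
        float3 L = normalize(gLightPos[i] - worldPos);
        acc += saturate(dot(normal, L)) * gLightColor[i];
    }
    return float4(acc, 1.0);
}
```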
DiGuru said: Time for a new thread about this, with the ongoing confusion about it. So, is there a use for it yet that can't be done in SM 2.0b and that is actually used?

So far, FarCry makes use of the extra interpolated registers to support one more light source per pass in SM3 than it does with SM 2.0b. The performance difference will clearly vary depending on where the limits are in the scene.
DeanoC said: Because NVIDIA's NV4x implementation of it is slower than a dead tortoise? As there's no other card that supports it currently available, there is little incentive to use vertex texturing.
Of course with new hardware coming along, expect to see it used more and more.
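For context, a minimal sketch of what vertex texturing buys you, displacing a mesh along its normals with a height map. All names are illustrative; the one real constraint shown is that vs_3_0 requires tex2Dlod, since the vertex stage has no automatic mip selection:

```hlsl
sampler2D gHeightMap;      // on NV4x this must be an fp32 format, unfiltered
float4x4  gWorldViewProj;
float     gScale;          // displacement amount (illustrative)

float4 main(float4 pos : POSITION,
            float3 nrm : NORMAL,
            float2 uv  : TEXCOORD0) : POSITION
{
    // Vertex texture fetch: an explicit LOD is mandatory in the vertex stage.
    float height = tex2Dlod(gHeightMap, float4(uv, 0, 0)).r;
    pos.xyz += nrm * height * gScale;
    return mul(pos, gWorldViewProj);
}
```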
Ailuros said: Doesn't vertex texturing come with a sizeable amount of latency anyway?

Probably they 'slapped' vertex texturing support into their vertex shader engines without significantly addressing the latency problem. Vertex shaders without vertex textures don't have to hide huge latencies.
If you think you're going to do a bunch of sampling, you need to overlap those fetches with lots of raw shader ops.
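At the instruction level, that overlap means issuing the fetch early and keeping independent ALU work between the fetch and its first use. A hypothetical sketch (all names invented):

```hlsl
sampler2D gHeightMap;
float4x4  gWorldView;
float4x4  gProj;

float4 main(float4 pos : POSITION, float3 nrm : NORMAL,
            float2 uv  : TEXCOORD0) : POSITION
{
    // 1. Long-latency vertex texture fetch issued first...
    float height = tex2Dlod(gHeightMap, float4(uv, 0, 0)).r;

    // 2. ...independent math that doesn't touch 'height' can overlap it...
    float4 viewPos = mul(pos, gWorldView);
    float3 n       = normalize(mul(nrm, (float3x3)gWorldView));

    // 3. ...and the first use of the result comes as late as possible.
    viewPos.xyz += n * height;
    return mul(viewPos, gProj);
}
```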
Ailuros said: I was under the impression that for vertex texturing, in order not to stall the entire rendering process, one could take advantage of said latency and get a very high number of instructions (not related to the texture fetch) nearly for free; and that entirely irrespective of architecture or approach.

Well... this can be seen as a positive side effect, but if you have a relatively (compared to texture sampling latency) short shader with vertex texturing, you're still going to run very slowly.
nAo said: Well... this can be seen as a positive side effect, but if you have a relatively (compared to texture sampling latency) short shader with vertex texturing, you're still going to run very slowly.
That's why NVIDIA is not advocating vertex texturing for things like skinning (storing bone matrices in a texture).
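The skinning case looks roughly like this; the texture layout and names are invented for illustration. Each vertex needs several fetches with almost no independent ALU work to cover them, which is exactly the short-shader-plus-fetch pattern that runs slowly:

```hlsl
sampler2D gBoneTex;  // 3 matrix rows per bone packed into an fp32 texture (assumed layout)
float     gTexelU;   // 1 / texture width, to address one bone's column

float3x4 fetchBone(float boneIndex)
{
    float u = boneIndex * gTexelU;
    // Three long-latency fetches per bone, nearly back to back,
    // with no unrelated math in between to hide them:
    float4 r0 = tex2Dlod(gBoneTex, float4(u, 0.0 / 3.0, 0, 0));
    float4 r1 = tex2Dlod(gBoneTex, float4(u, 1.0 / 3.0, 0, 0));
    float4 r2 = tex2Dlod(gBoneTex, float4(u, 2.0 / 3.0, 0, 0));
    return float3x4(r0, r1, r2);
}
```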
Our vertex shaders are quite simple nowadays, and just perform skeletal blending and linear interpolant setup on behalf of the pixel shaders. All of the heavy lifting is now on the pixel shader side -- all lighting is per-pixel, all shadowing is per-pixel, and all material effects are per-pixel.
Once you have the hardware power to do everything per-pixel, it becomes undesirable to implement rendering or lighting effects at the vertex level; such effects are tessellation-dependent and difficult to integrate seamlessly with pixel effects.
991060 said: complex shaders which will run insanely slowly on non-branching HW, such as relief mapping and heavily blurred shadow mapping.
Ailuros said: Assuming a simplistic VS (even w/o vertex texturing) and a very complex PS, wouldn't a stall already be possible while waiting for the PS to complete?

A stall means the VS is doing nothing, and that is bad. Moreover, the VS and PS are decoupled, so the VS doesn't stall waiting for the PS to complete, unless the buffers that sit between the VS and PS are full of primitives waiting to be rasterized.
Would additional logic actually help in the end, or will we see better results with future hardware and future APIs?

The problem with the current VT implementation can be solved (as it has already been 'solved', or at least alleviated, in the PS!) by spending more transistors.
991060 said: complex shaders which will run insanely slowly on non-branching HW, such as relief mapping and heavily blurred shadow mapping.

DiGuru said: Would you qualify the 6x00 as branching HW, even if it would be slower overall using those branches than using the SM 2.0 solutions, like unrolling?
DiGuru said: Would you qualify the 6x00 as branching HW, even if it would be slower overall using those branches than using the SM 2.0 solutions, like unrolling?

Unrolling can only be used for static branching. And it's certainly not universally slower than the SM 2.0 solutions, either.
Chalnoth said: Unrolling can only be used for static branching. And it's certainly not universally slower than the SM 2.0 solutions, either.
With SM 2.0 you have two options:
1. Execute all branches and use compare.
2. Multipass.
In the first situation, the NV4x will be faster for things like conditional loops, or anywhere you end up skipping a large number of instructions. In the second situation, the NV4x will be faster whenever you are geometry-limited (and you are more likely to become geometry-limited when doing multipass rendering, as the pixel shaders become shorter).
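The contrast between option 1 and a real SM3.0 branch looks like this in HLSL; the lighting functions and the inShadow input are placeholders invented for the example:

```hlsl
float3 CheapLighting(float3 n)     { return 0.1; }               // placeholder path
float3 ExpensiveLighting(float3 n) { return saturate(n.y) * 0.9; } // placeholder path

float4 main(float3 n : TEXCOORD0, float inShadow : TEXCOORD1) : COLOR
{
    // SM2.0 option 1: run BOTH paths every pixel, then select the result.
    // The cost is always cheap + expensive, regardless of the condition.
    float3 selected = lerp(ExpensiveLighting(n), CheapLighting(n), inShadow);

    // SM3.0: a dynamic branch can skip the untaken path entirely
    // (on NV4x this pays off only when a whole pixel batch agrees).
    float3 branched;
    if (inShadow > 0.5) branched = CheapLighting(n);
    else                branched = ExpensiveLighting(n);

    return float4(branched, 1);
}
```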
DiGuru said: Well, the last time we discussed this, we came to the conclusion that the dynamic branching of the 6x00 is essentially your first method, with batches of about 1000 pixels each. Only if all pixels in that area take the same path, and this can be determined by the driver, are the instructions skipped. Which doesn't sound very dynamic to me.

The driver couldn't make such a fine-grained decision in the shader pipeline. The hardware decides whether to run one or both branches.