T&L Via Vertex Shader Programs

Dave Baumann · Apr 7, 2003

I just read the following in The Tech Report's GeForce FX 5800 Ultra review:

Tech-Report (http://tech-report.com/reviews/2003q2/geforcefx-5800ultra/index.x?pg=6) said:
The FX excels here, probably because old-school T&L runs on a single vertex shader unit, and the FX's higher clock speed becomes more of an asset.

Does anyone know the basis for this quote, or whether it is true?

Reverend · Apr 7, 2003

Your thread header is a little confusing based on Tech Report's statement. You can have DX7 TnL or you can have vertex shaders.

If I am not wrong, I believe an app (with or without vertex shaders, the latter of which I gather to be what TR refers to as "old school TnL") can be configured to run on as many, or as few, vertex shader units as are available in the detected hardware.

LeStoffer · Apr 7, 2003

Reverend said:
Your thread header is a little confusing based on Tech Report's statement. You can have DX7 TnL or you can have vertex shaders.

Or you can have both.

But I still haven't seen any evidence that the GeForce FX has a hardware DX7 T&L unit besides the vertex processor array.

On the other hand, the very high legacy T&L performance is no coincidence in my book, since it's a key point in the Quadro FX family. So I would assume that nVidia put a lot of effort into making sure that those vertex processors could handle legacy T&L at top speed for the professional OpenGL-based programs. (More so than ATI did with the R300, anyway.)

demalion · Apr 7, 2003

I'm not sure technical meaning can be taken from it. It looks like he was trying to get across that if the GF FX had less capable vertex units, but the same "amount" as the R300, the clock speed advantage would allow it to lead in simple fixed function vertex processing.

Seems like an incomplete thought rather than technical commentary that might indicate something.

I do think that the issue with the vertex array parallels the fp32 pixel processing, and that intermediate register usage (i.e., as used for general vertex processing but not for fixed function T&L) limits the amount of processing that can be done, but I don't see how that comment relates significantly (did I miss something else earlier or later in the article?).
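
To put the clock-speed argument in numbers, here is a minimal back-of-the-envelope sketch. It assumes peak fixed-function vertex rate scales as units × clock / cycles per vertex. The unit counts, clocks and cycle costs below are illustrative placeholders, not measured figures for any real chip, and `peak_vertex_rate` is a made-up helper.

```python
# Toy model: peak vertex rate = units * clock / cycles per vertex.
# Every number here is an assumption chosen for illustration only.

def peak_vertex_rate(units: int, clock_mhz: float, cycles_per_vertex: float) -> float:
    """Return the peak rate in millions of vertices per second."""
    return units * clock_mhz / cycles_per_vertex

# Hypothetical chip A: same number of units, less capable per clock, higher clock.
chip_a = peak_vertex_rate(units=4, clock_mhz=500, cycles_per_vertex=5)
# Hypothetical chip B: same number of units, more capable per clock, lower clock.
chip_b = peak_vertex_rate(units=4, clock_mhz=325, cycles_per_vertex=4)

print(f"chip A: {chip_a:.0f} Mverts/s, chip B: {chip_b:.0f} Mverts/s")
```

Under these assumed numbers chip A still comes out ahead, which is all the quoted sentence seems to be claiming. It says nothing about how many units fixed-function T&L actually runs on.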

K.I.L.E.R · Apr 7, 2003

You sure old school T&L ONLY uses 1 VS under ALL circumstances?

If so, then the NV30's VS efficiency will come into play. The clock speeds will mainly contribute to efficiency.

KimB · Apr 7, 2003

Reverend said:
Your thread header is a little confusing based on Tech Report's statement. You can have DX7 TnL or you can have vertex shaders.

Well, to be perfectly accurate, you can't use both at the same time, but can use them together to create a frame, by using different methods with different geometry and/or different passes.

fresh · Apr 7, 2003

The FX has both a fixed function pipeline and a programmable one (vertex shaders). The FFP is obviously going to be much faster.

The R250 (and up) "emulates" the FFP with vertex shaders. I think that's a better/smarter design. You're not going to see too many FFP games anymore, and the older games which do use the FFP will run plenty fast with emulated FFP.

LeStoffer · Apr 7, 2003

fresh said:
The FX has both a fixed function pipeline and a programmable one (vertex shaders).

Fresh, where did you get this confirmation from? I'm asking because we had this speculation about the FFP before, but besides the benchmark numbers we never got any evidence on that being the case.

fresh said:
The R250 (and up) "emulates" the FFP with vertex shaders. I think that's a better/smarter design. You're not going to see too many FFP games anymore, and the older games which do use the FFP will run plenty fast with emulated FFP.

True, but the NV30 is also about the Quadro FX, which BTW seems to be the most successful part of the NV3X lineup for its intended users.

Pete · Apr 7, 2003

I thought it was common knowledge NV had both fixed and programmable shader hardware in the NV30? It seems to me Dave's Q is whether fixed-function TnL is limited to a single shader, which the NV25 seems to indicate is not the case (why include dual vertex shaders if one will sit unused?). Damage may well have been misinformed, or, as is more likely, I'm misunderestimating my ability to gather info from B3D's forums.

KPixel · Jul 30, 2003

LeStoffer said:
fresh said:

The FX has both a fixed function pipeline and a programmable one (vertex shaders).

Fresh, where did you get this confirmation from? I'm asking because we had this speculation about the FFP before, but besides the benchmark numbers we never got any evidence on that being the case.

I'm reviving this topic with one question:

Is it true or not that all NV3x chips lack an FFP and emulate it via the VS?

Demirug · Jul 30, 2003

I hear the same question every time, some while after a new chip is released:

"The T&L output is too good to be calculated with a vertex shader. There must be a hardware T&L unit (FFP) in the chip."

The answer is yes and no at the same time.

If you take a look at the patent that describes how the FFP in NV10 works, you can see that it is already a kind of vertex shader. OK, it has many restrictions on what you can do, but at least it is programmable.

The "real" vertex shader is an extended version of the NV10 vertex processing unit. All the old capabilities are still there. That is the reason why you get the same work done each clock if you only use the FFP. If you use a real vertex program, the driver cannot use the vertex processing unit as efficiently.

I see no good reason why nVidia should have changed this strategy with the NV3X product line.

KPixel · Jul 31, 2003

Demirug said:
If you take a look at the patent that describes how the FFP in NV10 works, you can see that it is already a kind of vertex shader. OK, it has many restrictions on what you can do, but at least it is programmable.

The "real" vertex shader is an extended version of the NV10 vertex processing unit. All the old capabilities are still there. That is the reason why you get the same work done each clock if you only use the FFP. If you use a real vertex program, the driver cannot use the vertex processing unit as efficiently.

I'm not sure we are talking about the same thing:
By VS, I mean the unit that runs shaders from DX8 (VS1.1), DX9 (VS2_0), etc. The NV10 is a DX7 card, so we can't speak of a VS there...

When I say emulation of the FFP, I mean that when we call pd3dDev->SetTextureStageStuff() ... and then Render(), the driver transforms all the states into a vertex program (VS1.1 or similar) and runs it.
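
As a rough illustration of that kind of emulation, here is a minimal sketch of turning fixed-function state into a vertex program. It is a toy, not taken from any real driver. `FixedFunctionState`, `build_vertex_program`, the register assignments and the constant slots are all hypothetical, and the output is only VS1.1-flavoured pseudo-assembly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixedFunctionState:
    """Hypothetical subset of DX7-style T&L state; field names are illustrative."""
    lighting_enabled: bool = False
    num_directional_lights: int = 0


def build_vertex_program(state: FixedFunctionState) -> List[str]:
    """Emit a VS1.1-flavoured pseudo-assembly listing for the given state."""
    # Position transform: four dot products against the rows of a combined
    # world-view-projection matrix, assumed to live in c0..c3.
    program = [f"dp4 oPos.{axis}, v0, c{row}" for row, axis in enumerate("xyzw")]

    if not state.lighting_enabled:
        # No lighting: pass the per-vertex colour straight through.
        program.append("mov oD0, v5")
        return program

    # Start from an ambient term (assumed in c4), then add one diffuse
    # contribution per directional light.
    program.append("mov r1, c4")
    for i in range(state.num_directional_lights):
        direction, colour = 5 + 2 * i, 6 + 2 * i  # made-up constant slots
        program.append(f"dp3 r0.x, v3, c{direction}")   # N . L
        program.append("max r0.x, r0.x, c90.x")         # clamp; c90.x assumed 0.0
        program.append(f"mad r1, r0.x, c{colour}, r1")  # accumulate diffuse
    program.append("mov oD0, r1")
    return program


unlit = build_vertex_program(FixedFunctionState())
lit = build_vertex_program(
    FixedFunctionState(lighting_enabled=True, num_directional_lights=2)
)
print(len(unlit), len(lit))
```

The generated program grows with every enabled feature, so a driver taking this route would have to rebuild or cache a program whenever the relevant state changes.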

Demirug · Jul 31, 2003

KPixel said:
Demirug said:

If you take a look at the patent that describes how the FFP in NV10 works, you can see that it is already a kind of vertex shader. OK, it has many restrictions on what you can do, but at least it is programmable.

The "real" vertex shader is an extended version of the NV10 vertex processing unit. All the old capabilities are still there. That is the reason why you get the same work done each clock if you only use the FFP. If you use a real vertex program, the driver cannot use the vertex processing unit as efficiently.

I'm not sure we are talking about the same thing:
By VS, I mean the unit that runs shaders from DX8 (VS1.1), DX9 (VS2_0), etc. The NV10 is a DX7 card, so we can't speak of a VS there...

When I say emulation of the FFP, I mean that when we call pd3dDev->SetTextureStageStuff() ... and then Render(), the driver transforms all the states into a vertex program (VS1.1 or similar) and runs it.

I was talking about what happens inside the chip. For vertex processing, the NV10 uses a unit that is similar to a vertex shader unit. But you cannot call it a DX8 VS 1.1, because it is not as programmable as required. The vertex processing unit used in NV2X is an improved version of the NV1X vertex processing unit that is programmable enough. Because all the necessary features are there, NV can claim that they have a VS 1.1. There is always a difference between the API model and the real hardware. The driver is responsible for translating between these two worlds.

"SetTextureStageStuff" is for pixel processing, so I am not sure why you use it here as an example.

You can translate the configuration of the DX7 hardware T&L unit into a DX8 vertex shader program. Normally this is not a good solution in a driver. If you first build a vertex shader program, you always need a second step to translate this program into the microcode the hardware can execute. If you translate directly from the DX7 configuration to microcode, you will have a faster driver, and you have more information during this translation that will help you to build better microcode.

The point here is that the vertex processing unit is not built like the picture you can see in the DX documentation. In the real chip there are many different function units. If you use the unit as a vertex shader, the driver is most of the time not able to use all function units at the same time. If you use the old DX7 T&L interface for this unit, the driver can program the whole vertex processing unit in a much more effective way that allows the different function units to be used at the same time.
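
The function-unit argument can be sketched with a toy scheduler. This is only a model under invented assumptions: the unit names, the one-cycle latency per operation and the workload in `OPS` are hypothetical and do not describe real NV1X/NV2X/NV3X hardware. It compares issuing one operation per cycle with packing independent operations onto different units in the same cycle.

```python
from typing import List, Tuple

# Each op: (name, function unit it needs, names of ops it depends on).
Op = Tuple[str, str, Tuple[str, ...]]

# Hypothetical fixed-function workload with two lights, texgen and fog.
OPS: List[Op] = [
    ("xform_pos", "matrix", ()),
    ("xform_nrm", "matrix", ()),
    ("texgen",    "scalar", ()),
    ("n_dot_l0",  "dot",    ("xform_nrm",)),
    ("n_dot_l1",  "dot",    ("xform_nrm",)),
    ("diffuse0",  "mad",    ("n_dot_l0",)),
    ("diffuse1",  "mad",    ("n_dot_l1", "diffuse0")),
    ("fog",       "scalar", ("xform_pos",)),
    ("emit",      "output", ("xform_pos", "diffuse1", "fog", "texgen")),
]


def serial_cycles(ops: List[Op]) -> int:
    """One op per cycle, regardless of which unit it uses."""
    return len(ops)


def packed_cycles(ops: List[Op]) -> int:
    """Greedy packing: each cycle, issue every ready op whose unit is free."""
    done = set()          # names of ops finished in earlier cycles
    pending = list(ops)
    cycle = 0
    while pending:
        cycle += 1
        busy = set()      # units already claimed in this cycle
        issued = []
        for op in pending:
            name, unit, deps = op
            if unit not in busy and all(d in done for d in deps):
                busy.add(unit)
                issued.append(op)
        # Results become visible only from the next cycle onwards.
        for op in issued:
            done.add(op[0])
            pending.remove(op)
    return cycle


print(serial_cycles(OPS), packed_cycles(OPS))
```

With this made-up workload the packed schedule needs fewer cycles than the serial one. The gap depends entirely on how much independent work the known fixed-function configuration exposes, which is the extra information a direct state-to-microcode translation would have.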

KPixel · Jul 31, 2003

Well, sorry that I wrote "SetTextureStageStuff"; I was thinking of the transformation and lighting info...

http://www.beyond3d.com/articles/nv30r300/index.php?p=5#vpu said:
It's sure that VS unit in NV30 must have a complex and optimized architecture too, because NVIDIA claims that older VS1.0 and VS1.1 programs go faster too. More detail would require additional information from NVIDIA, though it is almost impossible that NV30 holds on the antique hardware T&L unit.

So, I'm looking for a confirmation.

Arun · Aug 1, 2003

KPixel said:
Is it true or not that all NV3x chips lack an FFP and emulate it via the VS?

You actually make an interesting point there: "All NV3x".
Well, there MUST be a fundamental difference between NV30/NV35 and NV31/NV34 and, maybe, NV36.

Because the NV30 & NV35 get a HUGE boost when using traditional T&L, and the NV31 & NV34 simply don't get any (a French hardware site showed that very well a few months ago).

So it seems obvious that what nVidia did there with the NV30/NV35 cost them transistors; otherwise it'd still be there in the NV31/NV34.

What IS possible is that the Vertex Shader is "stealing" FP32 Pixel Shading power; maybe there's some silicon in the Pixel Shader units to be able to do T&L.

Uttar
