FP16 and market support

You know, I got a pair of legs. I can run a marathon with them. Doesn't mean I will enjoy running the marathon.

Man, this thread is getting more amusing by the page.
 
radar1200gs said:
The only fud around here is coming from the ATi fanboys - "FP16 is not supported by DX9", "Dawn is a DX9 showcase"...

The 5200 runs Half-Life 2 - it may run it slowly, that's beside the point - if you want performance, buy a more expensive card, not a value/entry-level card.

Once again I repeat... You have failed to show us the numbers... Show us the numbers of DX9 video cards out there. Prove to us that Nvidia is the top dog as you claim they are. If you don't then shut up with your FUD spreading from your anus already.
 
jvd said:
radar1200gs said:
The 5200 is capable of running HL2 at DX9 levels if the developer allows it to.

I'm sure that if Valve hardcodes 5200 & 5600 to DX8 with no option for the user to alter it, some enterprising person will devise a patch that tells the game the card is really a 5900 or whatever.

[attached image: hl21.gif]


Yeah, it runs it all right. 8 fps. That's with a 2.8 GHz CPU, which is not entry-level at all. Imagine what it will run like with a 2 GHz CPU. That's pathetic on nVidia's part and yours for defending such a crap part. The 5600 should be in the price zone of the 5200, and even then I'd have trouble recommending it for any DX9 code.

Yes, the 5200 is functional for DX9. Being functional does not necessarily imply good performance.

BTW: I never recommended the 5200 in any post, simply stated that the non-64-bit variants are okay for the entry-level market.

Personally, the lowest-end discrete graphics part I sell to clients is the 5600 Ultra or GF4 4200. For entry-level PCs I use IGP nForce1 (business) or IGP nForce2 (home/'net PC).
 
jvd said:
[attached image: hl21.gif]


Yeah, it runs it all right. 8 fps.
You wouldn't run the 5200 at the same resolution as the other cards.

And you also wouldn't use "full precision."

Side note:
It cannot be that difficult to figure out which instructions can use FP16 and which require FP24-FP32. A naive implementation would simply make all color data use _PP and all other data use full precision. A bit more analysis may show other situations where you can use _PP. For example, _PP may work just fine for normals read from a bump map.
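To make that concrete, here's a minimal HLSL sketch of the naive rule (not from any shipping game; the sampler names and the shader itself are made up for illustration): colour math goes through half, which the compiler emits as _PP instructions, while the texture coordinates and the light vector stay at full precision.

Code:
sampler2D diffuseMap : register(s0);
sampler2D bumpMap    : register(s1);

float4 main(float2 uv       : TEXCOORD0,
            float3 lightVec : TEXCOORD1) : COLOR
{
    // Colour data: half (FP16, emitted as _PP instructions) is plenty
    // for an 8-bit source texture.
    half4 albedo = tex2D(diffuseMap, uv);

    // Normal read from a bump map: _PP may well be fine here too,
    // but the interpolated light vector stays at full precision.
    half3 n = normalize(tex2D(bumpMap, uv).xyz * 2.0 - 1.0);
    float3 l = normalize(lightVec);

    half ndotl = saturate(dot(n, l));   // mixed half/float math promotes to float
    return float4(albedo.rgb * ndotl, albedo.a);
}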
 
I think those charts are w/o the mixed mode, aren't they?

In any case, for the time being I'd still recommend a Ti4200 anytime over a 5200 or 5600. Not only is there no real gain from the added features of the latter two, you also get clean anisotropic quality on the former.
 
Ostsol said:
Well, really any RGBA8 texture could be sampled into an FP16 register (then manipulated in FP32, I suppose).
Well, if the source data is only RGBA8, there would be no reason to use FP32 at all...
 
Chalnoth said:
Ostsol said:
Well, really any RGBA8 texture could be sampled into an FP16 register (then manipulated in FP32, I suppose).
Well, if the source data is only RGBA8, there would be no reason to use FP32 at all...
It depends entirely on the situation... In your own example of normal maps, one would be using the sampled pixel as a parameter in a lighting equation along with data not derived from an FX8 source, but from FP32/FP24 interpolated and normalized data. This is, of course, unless one is using an RGB8 cubemap for normalization (in which case basically all parameters of the equation are FX8) instead of a higher-precision one, or simply performing arithmetic normalization. Also, if the texture is simply to be combined with other textures without any significant other operations, or if that texture is simply to be modulated with the result of the lighting equation at the end of the shader, then just about any low precision -- even FX12 -- is fine. However, when it is used as a parameter amongst other data that's at FP32/FP24, running everything at a lower precision could result in a loss of potential quality.
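A rough HLSL sketch of that distinction (a hypothetical shader with invented names, not anything from HL2): the lighting equation mixes the FX8 normal-map sample with FP32/FP24 interpolated data, so that part stays at full precision, while the final modulate with a plain RGBA8 decal can safely drop to low precision.

Code:
sampler2D normalMap : register(s0);
sampler2D decalMap  : register(s1);

float4 main(float2 uv      : TEXCOORD0,
            float3 lightTS : TEXCOORD1) : COLOR   // tangent-space light vector
{
    // The lighting equation mixes the FX8 normal-map sample with
    // FP32/FP24 interpolated data, so keep this math at full precision.
    float3 n = normalize(tex2D(normalMap, uv).xyz * 2.0 - 1.0);
    float3 l = normalize(lightTS);
    float  diffuse = saturate(dot(n, l));

    // The decal is plain RGBA8 and is only modulated at the very end,
    // so low precision is fine from here on.
    half4 decal = tex2D(decalMap, uv);
    return float4(decal.rgb * (half)diffuse, decal.a);
}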

EDIT: The way it's been sounding seems to indicate that you think FP32 is necessary only for operations with an expectation of high-precision input data, such as dependent texture reads or render-to-vertex-buffer. Am I close?
 
Ostsol said:
EDIT: The way it's been sounding seems to indicate that you think FP32 is necessary only for operations with an expectation of high-precision input data, such as dependent texture reads or render-to-vertex-buffer. Am I close?
I would say this is typically the case. Additionally, FP16 will be enough for almost any calculation on color data. For example, if a specific normal map happens to have problems when you use FP16, so that FP32 is used for most calculations involving that normal map, you could still go back to FP16 once color information is obtained.

The above would also be true if FX12 were exposed in DX9, and is true for OpenGL games using nVidia's proprietary extensions. You would just not be able to use FX12 for color data where you need high dynamic range (i.e. it'd be okay for diffuse lighting, but not specular).

In other words, what I'm trying to say is that with a few simple rules, one should be able to easily determine which instructions can use what precision, with no perceivable loss in image quality.
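As a hedged illustration of those rules of thumb (again a made-up shader; and since FX12 isn't exposed in DX9 HLSL, half has to stand in for "low precision"): the diffuse term tolerates low precision, while the specular term, with its large pow exponent and high dynamic range, is kept at full precision.

Code:
float4 main(float3 n : TEXCOORD0,
            float3 l : TEXCOORD1,
            float3 h : TEXCOORD2) : COLOR
{
    // Diffuse term: low precision is fine (FX12 would do in GL; half here).
    half diffuse = saturate(dot(normalize(n), normalize(l)));

    // Specular term: a large pow exponent and a narrow, bright highlight
    // want the extra precision/dynamic range, so keep it at full precision.
    float specular = pow(saturate(dot(normalize(n), normalize(h))), 64.0);

    half3 albedo = half3(0.8, 0.8, 0.8);   // stand-in material colour
    return float4(albedo * diffuse + specular, 1.0);
}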
 
Chalnoth said:
In other words, what I'm trying to say is that with a few simple rules, one should be able to easily determine which instructions can use what precision, with no perceivable loss in image quality.
Would that not imply there is no need for precision modifiers, as it can be done automatically and easily by analysis?
 
Dio said:
Chalnoth said:
In other words, what I'm trying to say is that with a few simple rules, one should be able to easily determine which instructions can use what precision, with no perceivable loss in image quality.
Would that not imply there is no need for precision modifiers, as it can be done automatically and easily by analysis?

This would work in a high level language with type inferencing and numerical analysis in the compiler using a heuristic like "minimize error". The problem is, there is no standard for telling the HLSL shader what the precision of the input textures are and the precision of the output framebuffer.

The DCL shader instruction in DX9 only allows you to specify the input mask and whether something is 2D, a cube, or a volume. Similarly, there is no way to specify, in the shader itself, what the desired output precision is.

This is not insoluble, but it increases the workload for the driver/compiler and renders FXC even more impotent. The driver would have to do on-the-fly compilation of shaders based on the pipeline state (detect the texture format being used and infer precision) and the output render target, and use that to reorder expressions and select instructions to minimize error.

Of course, this is just part of the ongoing debate over explicitly typed languages (C, C++, Eiffel, Java, etc.) versus type-inferred or dynamically typed ones (ML, Haskell, scripting languages, etc.). The latter typically have more expressive power, leading to more concise, less buggy code (e.g. OCaml, Ruby, Haskell), but at the cost of compiler complexity and speed.

So yeah, they could get rid of _PP; hell, they could get rid of a lot of stuff with type inferencing. You don't even really need destination write masks in an HLSL.
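For what it's worth, here is a tiny made-up HLSL fragment showing the gap being described: nothing in the source (or in the dcl instructions the compiler emits from it) says what format the sampled texture or the render target is, so only the driver, which sees the full pipeline state, could safely decide that _PP is enough for this math.

Code:
sampler2D tex : register(s0);

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float4 c = tex2D(tex, uv);   // precision of the source texture? unknown here
    return c * c + 0.5;          // precision of the render target? also unknown
}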
 
OpenGL guy said:
AlphaWolf said:
The 5200 runs Half-Life 2 - it may run it slowly, that's beside the point - if you want performance, buy a more expensive card, not a value/entry-level card.
I believe it was mentioned that the 5200s will be running HL2 on a DX8 path. As will the 5600. The 5900 will be running it with _pp.
I believe Valve said the 5900 will be running with a mixed DX8/DX9 mode.

There are two kinds of "Mixed Mode" in HL2.

1. using PS 2.x with a mix of FP32/FP16.
2. using PS 1.1 for most effects, with 2.x shaders only for the better effects like water.

The 5200 and 5600 use the second, the 5900 the first "Mixed Mode". I am not sure about the 5700 and 5800.
 
DemoCoder said:
This would work in a high level language with type inferencing and numerical analysis in the compiler using a heuristic like "minimize error". The problem is, there is no standard for telling the HLSL shader what the precision of the input textures are and the precision of the output framebuffer.

The DCL shader instruction in DX9 only allows you to specify the input mask and whether something is 2D, a cube, or a volume. Similarly, there is no way to specify, in the shader itself, what the desired output precision is.

This is not insoluble, but it increases the workload for the driver/compiler and renders FXC even more impotent. The driver would have to do on-the-fly compilation of shaders based on the pipeline state (detect the texture format being used and infer precision) and the output render target, and use that to reorder expressions and select instructions to minimize error.
Well, it sounds very complex, but might be doable. When a texture is uploaded to DirectX the driver could detect the format and the minimum and maximum values and such to feed the pixel shader compiler. The shader compiler can analyse the code to see if a lower precision type can be used with almost no precision loss for the result. With enough correct data, automatic type selection might be possible without any perceivable loss of precision.

If nVidia manages to do something like that in their drivers, I will bow to their developers. ;)
 
sonix666 said:
DemoCoder said:
This would work in a high level language with type inferencing and numerical analysis in the compiler using a heuristic like "minimize error". The problem is, there is no standard for telling the HLSL shader what the precision of the input textures are and the precision of the output framebuffer.

The DCL shader instruction in DX9 only allows you to specify the input mask and whether something is 2D, a cube, or a volume. Similarly, there is no way to specify, in the shader itself, what the desired output precision is.

This is not insoluble, but it increases the workload for the driver/compiler and renders FXC even more impotent. The driver would have to do on-the-fly compilation of shaders based on the pipeline state (detect the texture format being used and infer precision) and the output render target, and use that to reorder expressions and select instructions to minimize error.
Well, it sounds very complex, but might be doable. When a texture is uploaded to DirectX the driver could detect the format and the minimum and maximum values and such to feed the pixel shader compiler. The shader compiler can analyse the code to see if a lower precision type can be used with almost no precision loss for the result. With enough correct data, automatic type selection might be possible without any perceivable loss of precision.

If nVidia manages to do something like that in their drivers, I will bow to their developers. ;)

John Carmack seemed to be saying something very similar to what nVidia were saying about their ARB_fragment_program performance. They might possibly be doing this already, or might in the future?
 
sonix666 said:
Well, it sounds very complex, but might be doable. When a texture is uploaded to DirectX the driver could detect the format and the minimum and maximum values and such to feed the pixel shader compiler. The shader compiler can analyse the code to see if a lower precision type can be used with almost no precision loss for the result. With enough correct data, automatic type selection might be possible without any perceivable loss of precision.
This definitely looks like a fairly complex problem.

You really have very little knowledge at run-time of how a texture will be used - basically, as far as I can see, if you have two differing input precisions and want to guarantee the correct result, you really need to default to the higher of the input precisions.

Imagine an extreme case where I modulate a 32-bit floating point value with a 1-bit value. What should my precision be to make sure I don't affect the visual quality of the output?

It seems that it should be 32 bits - the 1-bit texture exactly represents the values 0 and 1, all of the detailed information is coming from the high-precision texture, and this needs to be preserved - if I use any precision lower than that of the high-precision texture, then I am losing information that may eventually prove to be visible.

If I now increase the bit-depth of the lower-precision texture in the example above then I can't at the same time be decreasing the precision requirements, so in the most general case I may need to retain the highest precision of any set of inputs.

In the example above I actually might be able to sacrifice some precision unnoticeably, but I can't do it in the general case without real knowledge of how the data is used - that knowledge just isn't usually available to a generalised compiler or to the driver, which see only a small part of the rendering puzzle at any given time; it's available only to the creator of the application software, who has access to all of it.
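Written out as a hypothetical shader (invented sampler names), the extreme case above looks something like this: the mask carries essentially one bit of information, but the multiply still has to be carried out at full precision or detail from the FP32 source gets thrown away.

Code:
sampler2D hdrMap  : register(s0);   // e.g. a 32-bit float texture
sampler2D maskMap : register(s1);   // effectively holds only 0.0 or 1.0

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float4 value = tex2D(hdrMap, uv);     // full-precision source
    half   mask  = tex2D(maskMap, uv).r;  // one bit of real information
    return value * mask;                  // the result must stay at full precision
}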
 
Reverend said:
You know, I got a pair of legs. I can run a marathon with them. Doesn't mean I will enjoy running the marathon.
ROFL~~ :LOL:

And if you're in the kind of shape I'm in it doesn't mean you'll survive the marathon either! :LOL:
Man, this thread is getting more amusing by the page.
Totally agreed, it's fun when people choose impossible positions to argue and then insist on arguing them passionately...'specially at this place where everyone knows what the what is and doesn't put up with any FUD. :)
 
Dio said:
Chalnoth said:
In other words, what I'm trying to say is that with a few simple rules, one should be able to easily determine which instructions can use what precision, with no perceivable loss in image quality.
Would that not imply there is no need for precision modifiers, as it can be done automatically and easily by analysis?
Which you could do, but it is dangerous. There may still be problems with shaders in which errors accumulate, so that the problem isn't related to the input or output formats or the dynamic range, but is rather due to recursive errors. A perfect example is a Mandelbrot set.

Since recursive errors don't occur because any one instruction can't be completed "properly," but are rather due to small errors creeping in over time, these cannot be detected on a per-instruction level.

If such a shader pops up in a game, you would have two courses of action:

1. Spend a lot of time trying to figure out ways to keep the recursive error from building up without increasing the precision, or by increasing the precision in only a few specific places.
2. Just make all instructions use max precision available.

Fortunately, I don't think this will be a problem for 99.99% of shaders used in games for a few years yet.
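For illustration only, a toy Mandelbrot-style pixel shader of the kind being described, written with ps_3_0-style looping (so not something you'd run as-is on 2003 hardware; the constants are invented). No single instruction needs much precision, but each iteration feeds its rounding error into the next, which is exactly what a per-instruction rule can't see.

Code:
float2 centre;   // invented constants: view centre and zoom
float  scale;

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float2 c = centre + (uv - 0.5) * scale;
    float2 z = 0.0;
    float  i = 0.0;

    for (int n = 0; n < 64; n++)
    {
        // z = z^2 + c: every step feeds its rounding error into the next,
        // so dropping z to half compounds into visible garbage as you zoom in.
        z = float2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;
        if (dot(z, z) < 4.0)
            i += 1.0 / 64.0;
    }
    return float4(i, i, i, 1.0);
}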
 
radar1200gs said:
Yes, the 5200 is functional for DX9. Being functional does not necessarily imply good performance.

BTW: I never recommended the 5200 in any post, simply stated that the non-64-bit variants are okay for the entry-level market.

Personally, the lowest-end discrete graphics part I sell to clients is the 5600 Ultra or GF4 4200. For entry-level PCs I use IGP nForce1 (business) or IGP nForce2 (home/'net PC).
If you buy a card that says it's DX9, it had better be able to play DX9-class games. Otherwise it's false advertising. Just as if you got a car and it wouldn't work on a road, you'd be pretty damn pissed, wouldn't you?
 
Chalnoth said:
Dio said:
Chalnoth said:
In other words, what I'm trying to say is that with a few simple rules, one should be able to easily determine which instructions can use what precision, with no perceivable loss in image quality.
Would that not imply there is no need for precision modifiers, as it can be done automatically and easily by analysis?
Which you could do, but it is dangerous. There may still be problems with shaders in which errors accumulate, so that the problem isn't related to the input or output formats or the dynamic range, but is rather due to recursive errors. A perfect example is a Mandelbrot set.
So the point is, it's not just 'a few simple rules'.

By the way, I think you're misusing the word 'recursive'. One algorithm to generate a Mandelbrot set is recursive; the algorithm used on current VPUs is not recursive.
 
jvd said:
If you buy a card that says it's DX9, it had better be able to play DX9-class games. Otherwise it's false advertising. Just as if you got a car and it wouldn't work on a road, you'd be pretty damn pissed, wouldn't you?
People buy Festivas (or Echos, or Geos) all the time. They're not (in my opinion) fit to drive on Texas freeways as their acceleration stinks and they're fragile. It doesn't make them any less of a car.

If you buy a cheap car, you get a cheap car.

Why you insist on calling it (the 5200) incapable is beyond me. It's quite plainly capable--just not at the speeds you (or most people) would desire it to be.
 