FP16 and market support

YeuEmMaiMai said:
do I really have to make you look like an arse?

...blah blah blah...
YeuEmMaiMai, as John said, take your fanboyism somewhere else. You have no clue what you are talking about. Your whole 'technical' stance on FP16/FP32 is solely based on bias.

With your last post you have made yourself look like an ass:

1) Because you resort to attacking a well respected member of this forum.

2) Because your whole counterattack about image quality has NOTHING to do with FP16 or FP24 or FP32. Unreal Tournament 2003 does NOT use pixel shaders with version 2.0.
 
YeuEmMaiMai said:
do I really have to make you look like an arse?

OK well here goes

GFFX max IQ settings
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfNV8xM19sLmpwZw==

notice that the FPS counter reads 27 current and average fps

Radeon 9700Pro max settings
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfNV8xNF9sLmpwZw==

notice that the FPS counter reads 62 current and average fps
You're also comparing a supersampling FSAA mode to a multisampling FSAA mode. That's not exactly a fair performance comparison.
 
Chalnoth said:
You're also comparing a supersampling FSAA mode to a multisampling FSAA mode. That's not exactly a fair performance comparison.
As often as I think Chalnoth is a fanb0y, I have to agree with him in this regard, sorta--he's wrong that it's invalid to compare the two as a measure of image quality (because it's each card's max IQ), but he's right (indirectly) that FSAA has absolutely nothing to do with this. Those [H] screenshots aren't even in the same fucking place.

It's alarming how often the fanb0ys scream about Kyle and Tom and how biased they are and then turn around and use their articles as proof of the argument. Then again, the existence of fanb0ys is alarming, too.

Dave, John, Anthony--you guys ever considered a private forum similar in scope to the 3D Industry forum where you could remove the access of the fanb0ys? It sure would be nice sometimes.
 
Chalnoth said:
YeuEmMaiMai said:
do I really have to make you look like an arse?

OK well here goes

GFFX max IQ settings
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfNV8xM19sLmpwZw==

notice that the FPS counter reads 27 current and average fps

Radeon 9700Pro max settings
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfNV8xNF9sLmpwZw==

notice that the FPS counter reads 62 current and average fps
You're also comparing a supersampling FSAA mode to a multisampling FSAA mode. That's not exactly a fair performance comparison.
His idea probably is that one should compare framerates where the quality, if not the settings, is equivalent. Of course, the problem is that one can't really say that any particular settings on either card (save for no FSAA and no aniso) are equivalent since they do things in quite different ways.
 
The Baron said:
Dave, John, Anthony--you guys ever considered a private forum similar in scope to the 3D Industry forum where you could remove the access of the fanb0ys? It sure would be nice sometimes.
Private forum? But then like many, you'll feel left out!

;) :)
 
Reverend said:
The Baron said:
Dave, John, Anthony--you guys ever considered a private forum similar in scope to the 3D Industry forum where you could remove the access of the fanb0ys? It sure would be nice sometimes.
Private forum? But then like many, you'll feel left out!

;) :)
Aw, Rev, don't feel bad! You can come too if you're on your bestest behavior! :LOL:

(then again, if S3 actually has good MSAA (read: ATI level), I might become an S3 fanboy...)
 
Reverend said:
Reading back a few pages, I read about determination about X or Y
IMO, it would be INSANE for any compiler to try to do automated error analysis and reduce precision where it thinks the results aren't significant.

I already explained why it is a "hard" problem, but it is not a theoretically uncrackable problem with the right semantics added to the API/language and an interval arithmetic compiler.

Precision isn't something a compiler should mess with at all. As a programmer, I want everything precisely specified by me and me alone, as either 32-bit or 64-bit floating point (IEEE).

Well, that's you. Different programming languages exist because different people have different preferences about how much housekeeping they'd like to do. Some people don't even like the compiler choosing which instructions "to mess with" and write everything directly in assembly. Most people like SQL and regular expressions because they don't have to tell the DB imperatively how to satisfy queries. It depends on what you like to do.

Personally, I like to use scene graph libraries because I don't really care about managing those data structures.


Those dynamically typed languages are completely deterministic and predictable when it comes to evaluating basic arithmetic operations and user-defined functions constructed from them. The compiler doesn't go in and arbitrarily decide to evaluate some expression with a different precision or data type; though the precision isn't statically known, it is deterministically derived from the dynamic types attached to your data.

No, a type inferencing functional compiler has the same problems with the "range and statistical nature" (as you put it) of I/O-related input data. The only way to avoid this is to have the error ranges specified with the input data. Otherwise, a compiler must make the conservative choice, which is to choose the highest precision of a given type (integer or floating point). Some compilers insert code to do type inferencing at runtime, so that, for example, a factorial() function will use a native INT for any result < 2^32, but then switch over to an arbitrary-precision Integer object afterwards.
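
As a rough illustration of that runtime widening (a minimal Python sketch of the general idea, not anything a real shader compiler does; the cutoff and structure are my own assumptions):

```python
import numpy as np

# Sketch of runtime type widening: stay in a native 32-bit integer while the
# result fits, and promote to Python's arbitrary-precision int once it won't.
def factorial(n):
    acc = np.uint32(1)
    for i in range(2, n + 1):
        if int(acc) * i >= 2**32:      # the next multiply would overflow 32 bits
            acc = int(acc)             # promote to arbitrary precision
            for j in range(i, n + 1):
                acc *= j
            return acc
        acc = np.uint32(int(acc) * i)  # still fits in the native type
    return acc

print(factorial(12))  # 479001600 -- still a native 32-bit value
print(factorial(20))  # 2432902008176640000 -- promoted automatically
```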

This is not different from a hypothetical shading language compiler (we're not talking DX9 here, but a future language) that has range information specified by the developer.

A DX9 driver could, at best, do conservative type promotions, e.g. 8-bit * 8-bit promoted to 16-bit, etc. For the vast majority of DX8 shaders this would be a win. For many DX9 shaders it would also be a win, since most people are still working with 8-bit artwork.
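
(A quick numpy illustration of why 8-bit times 8-bit needs a 16-bit result; just a sketch of the promotion rule, not of any driver's behaviour:)

```python
import numpy as np

a, b = np.uint8(255), np.uint8(255)

# Kept at 8 bits, the worst-case product wraps around:
# 255 * 255 = 65025, which truncates to 65025 mod 256 = 1.
print(np.uint8(int(a) * int(b) % 256))   # 1 -- wrong in 8 bits

# Conservative promotion: widen both operands to 16 bits before multiplying.
print(np.uint16(a) * np.uint16(b))       # 65025 -- exact
```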


DemoCoder said:
The core issue is multipass. The compiler can estimate error in a single pass, but how can it estimate ACCUMULATED ERROR without saving it in an additional buffer (E-Buffer? Error Buffer?) for every pixel? Perhaps it would use MRT or some kind of packing to store the error to pick it up in later passes.
Omigod! Argh!

If you're going to do deferred rendering by shoving the pipeline state into a fat buffer, why not shove a few range variables in too?

Look, no one is suggesting this actually be done, since it super-burdens the compiler/driver developers and the application developer too. All this just points out why types (e.g. _PP) are necessary and why Dio's comment was not fully thought through. Shaders should not be typeless. The developer should maximize the amount of semantic information given to the driver, so that it has the most information to work with.

p.s. you might want to look at this paper for the technique being applied to SIMD computations on Intel (MMX)
http://www.acm.org/sigs/sigmm/MM98/electronic_proceedings/pollard/
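
To make the accumulated-error point concrete, here is a toy numpy comparison (purely illustrative; the expression and iteration count are made up and say nothing about any real driver): feeding a value through several dependent FP16 operations lets rounding error build up in a way no single instruction reveals.

```python
import numpy as np

# Toy example: run the same dependent chain of operations in fp32 and fp16
# and watch the rounding error accumulate across "passes".
x32 = np.float32(0.123456)
x16 = np.float16(x32)

for i in range(8):
    x32 = x32 * np.float32(1.01) + np.float32(0.001)
    x16 = x16 * np.float16(1.01) + np.float16(0.001)
    print(f"pass {i}: fp32={float(x32):.6f}  fp16={float(x16):.6f}  "
          f"error={abs(float(x16) - float(x32)):.6f}")
```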
 
OK here goes.


1. "NV3X is not comparable to R300" is total BS. Nvidia claimed that they were going to blow everyone away with cinematic computing.....

The GFFX was released and, compared to the R300, when running DX9 at full precision it is 1/2 as fast. John Carmack states that when you force NV30 to run the STANDARD ARB2 path, the R300 appears to be twice as fast. So please tell me where I am wrong in stating that when performance is similar, NV30 IQ is inferior....

2. NV35 comes out, and while there is some improvement in FP32 speed, Nvidia still forces FP16, and you guys want to compare that to FP24, saying that FP16 is sufficient while FP24 is not.... when NV35 is running full precision, you are lucky if you can get 70% of the R350's performance.

Now we have developers going back and saying "holy crap it takes us forever to program for NV3X"

R300 runs DX9 code right out of the box without having to resort to PP hints

and yes, those screenshots are taken in nearly the same location (they are off by a few pixels).

I put my money where my mouth is by backing up what I said with various benchmarks.

Futuremark's 3DMark scores are better on my Radeon 9500 Pro than they are on NV3x when you force the drivers NOT TO CHEAT... now that is sad... my score is 4125, NV30 was 37XX.

So tell me something: why is it that you guys keep insisting that FP16 is acceptable? Oh that's right, IQ is not as important as speed is...

Sorry guys, FP16 is lower IQ, since the minimum spec called for is FP24 without partial precision. Maybe next time Nvidia will not storm out like a child when the next version of DX is developed.......


sonix666 said:
YeuEmMaiMai said:
do I really have to make you look like an arse?

...blah blah blah...
YeuEmMaiMai, as John said, take your fanboyism somewhere else. You have no clue what you are talking about. Your whole 'technical' stance on FP16/FP32 is solely based on bias.

With your last post you have made yourself look like an ass:

1) Because you resort to attacking a well respected member of this forum.

2) Because your whole counterattack about image quality has NOTHING to do with FP16 or FP24 or FP32. Unreal Tournament 2003 does NOT use pixel shaders with version 2.0.
 
I thought UT03 used PS 1.4 at most on cards which support PS 1.4?

FP16 is lower precision than FP32; it doesn't necessarily mean "low IQ". FP16 can be acceptable under most circumstances anyway.
 
It's alarming how often the fanb0ys scream about Kyle and Tom and how biased they are and then turn around and use their articles as proof of the argument. Then again, the existence of fanb0ys is alarming, too.
Baron....

When people do good work, it's good work. When they post a bunch of biased bull****, it's biased bull****. It's about method and process, not just the results.

I am commenting on this because of a recent, similar comment you made at Nvnews when I posted a link to Tom's and Nordic Hardware's video card roundups.

The bottom line is that the results shown in those roundups are the HONEST and ACCURATE results, and they should have been the ones driving people's purchasing decisions from the beginning. Not the usual apps, benchmarks, and levels where everyone KNOWS Nvidia is pulling every trick they can get away with, which do not reflect actual in-game performance or experience.

Oh well... I guess I have never been much for "private" clubs anyway...
 
K.I.L.E.R said:
I thought UT03 used PS 1.4 at most on cards which support PS 1.4?

FP16 is lower precision than FP32; it doesn't necessarily mean "low IQ". FP16 can be acceptable under most circumstances anyway.

Yes on both counts (for the time being on the latter; in the future, in cases where FP16 will not be adequate, FP24 most likely won't be enough either).

----------------------------------

YeuEmMaiMai,

A simple example where you obviously want to see only half of the picture would be here:

John Carmack states that when you force NV30 to run the STANDARD ARB2 path, the R300 appears to be twice as fast.

Carmack's specific statement wasn't concentrated on performance alone; he also commented on image quality differences between different modes. I'd urge you to re-read the full statement.

If the differences in a game like Doom3 are minuscule between different accuracy depths, then it's senseless to torture a specific accelerator's performance for nothing. Harping on it makes about as much sense to me as the R3xx's 5-bit LOD precision.

The FXs are, by the way, yielding better performance with the special NV30 path due to stencil op performance.

----------------------------------------------------------------

I think it's time that someone sits down and writes an educated article about different floating point formats, implementations and what not. Ideally even with an attempt to analyze where and why each format is required and what the differences would look like.

I don't think that there's anyone with half a brain who cannot acknowledge that the R3xx family has an advantage in terms of arithmetic efficiency; yet that doesn't mean that the FXs are completely worthless either. In fact, if anti-aliasing quality didn't weigh so heavily in my own preferences, I'm not so certain I'd own an R3xx today.
 
I think the best way to look at multi-precision rendering is that the lower formats such as FP16 allow a developer to focus the higher formats such as FP32 where they are most needed in a frame, at a lower total computing/resource cost compared to rendering the whole lot at one high precision.

This is exactly how .jpg and .mp3 work their magic too. By cutting the quality imperceptibly in areas where it isn't needed, they can maintain high image/sound quality where it is required and still have a tiny footprint (computing/resource cost) compared to a .bmp or .wav file containing the same thing.
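
For what it's worth, here is a small numpy sketch of the "spend precision where it matters" idea (my own toy numbers, not from any game): fp16 is plenty for a normalized colour blend, but falls apart when you use it for something like addressing a large texture.

```python
import numpy as np

# A normalized [0,1] colour blend: fp16 error is far below one 8-bit output step.
c32 = np.float32(0.3) * np.float32(0.8) + np.float32(0.7) * np.float32(0.25)
c16 = np.float16(0.3) * np.float16(0.8) + np.float16(0.7) * np.float16(0.25)
print(abs(float(c16) - float(c32)))  # tiny (well below 1/255), invisible in 8-bit output

# A texture coordinate on a 2048-texel texture: fp16 values near 2048 are a
# whole texel apart, so the coordinate visibly snaps.
print(float(np.float16(2047.3)))     # 2047.0 -- the fractional texel is gone
```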
 
No, that analogy is completely false. MP3 and JPEG are lossy formats; they store data that sort of looks similar. It's like replacing a = exp(x); with a = 1 + x in a shader, because where x is small they look pretty similar. This is what Nvidia used to do with the pixel shaders in 3DMark and other apps; thankfully they don't do this anymore.

I think the best way to look at multi-precision rendering is that the lower formats such as FP16 allow a developer to focus the higher formats such as FP32 where they are most needed in a frame, at a lower total computing/resource cost compared to rendering the whole lot at one high precision.

This is what VBR MP3 is analogous to (contrasting VBR with CBR/ABR, not with PCM-encoded WAV), because you can get similar quality to a CBR/ABR MP3 of a slightly higher bitrate. The parts that handle compression best get compressed the most, and the parts that come out badly under compression get compressed the least. You spend the most bits on the hard-to-compress data and make up the slack by saving bits where you can get away with it.


So in my example

128kbit CBR mp3 = full fp32
96kbit VBR mp3 = 1/2 fp32 + 1/2 fp16
96kbit CBR mp3 = full fp24
64kbit CBR mp3 = full fp16

OK, take a CD, encode it with all those settings, and rank them.
You will find that the order I listed them in is the order of quality.

Now, the VBR one won't always be exactly 96 kbit per second; it will change slightly depending on the MP3, just as some shaders will have more full precision than others.

The problem is that developers like CBR, but they don't like the crap quality of 64 kbit CBR MP3s, so they go for full precision, because working out which bits of code need full precision and which bits can get away with only fp16 takes a lot of time and effort and is trial and error to some degree. In the right circumstances Nvidia's cards can do the same amount of work as the ATI cards yet come out with better quality (though they might do less work per second than ATI), but the developers have to put in a lot of time and effort.
 
I never meant to imply shader replacement. Your example was probably better than mine.

As long as the final pixel color is the same, if you output an fp16-calculated pixel instead of an fp32-calculated pixel, you have saved a lot of processing power and bandwidth. For every 2 fp16 pixels output, you get 1 fp32 pixel "for free", or 2 more fp16 pixels to add detail to the scene with.

This gives you the ability to add a lot more detail to the scene than you could have afforded using fp32 alone, at no (or very little) IQ cost to the scene.
 
It's funny. I love hearing that 16-bit with 32-bit once in a while is good enough, yet 24-bit isn't good enough.

Get over it, people: ATI made the better card. Either buy a Radeon 3x0, or buy a GeForce FX and upgrade before a DX9 card comes out.

I mean, that is ideally what Nvidia wants you to do: buy two "DX9" cards in the amount of time an ATI user needs to buy one card. That way they can claim a larger installed base.
 
radar1200gs said:
Your example was probably better than mine.
Both analogies are rather flawed. As you say, jpeg/mpeg/mp3/etc. work on lossy encoding of perceptibly lossless differences. However, the work of determining what is lossless is done automatically by the compressor rather than by hand by a programmer.

As we seem to have finally agreed here that this cannot be done automatically, but instead must be done by a programmer, this is a nightmare for programmers who will soon be shipping thousand-shader applications.

So the better analogy is with the early days of fractal compression (I remember one quote that was something like 'it takes maths PhDs 50-100 hours to encode each image').

Even worse, in the RenderMan industry the majority of people writing shaders are not programmers, but artists. I expect the games industry to move the same way.
 
jvd said:
It's funny. I love hearing that 16-bit with 32-bit once in a while is good enough, yet 24-bit isn't good enough.
And yet even ATI agrees with this. Their fixed function texture addressing units are FP32.
 
Dio said:
So the better analogy is with the early days of fractal compression (I remember one quote that was something like 'it takes maths PhDs 50-100 hours to encode each image').

Though I did imply it's a manual, time-consuming process:
The problem is that developers like CBR, but they don't like the crap quality of 64 kbit CBR MP3s, so they go for full precision, because working out which bits of code need full precision and which bits can get away with only fp16 takes a lot of time and effort and is trial and error to some degree.

One could develop tools that ran through a variety of in-game situations and then computed standard deviations from a full FP32 shader to work it out. Yeah, fractal compression did work automatically, but the results were rather crap; I presume it isn't much better now, because we aren't using it.
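
Just to sketch what such a tool might look like (a toy Python/numpy example under my own assumptions; the shader function, sample count, and acceptance threshold are all made up): evaluate the same shader-like expression over a pile of representative inputs at FP32 and FP16, then look at the deviation statistics to decide whether the demoted version is acceptable.

```python
import numpy as np

# Hypothetical shader-like function; stands in for a compiled shader.
def shader(n_dot_l, albedo, spec):
    return albedo * n_dot_l + spec * n_dot_l ** 8

rng = np.random.default_rng(0)
samples = rng.random((3, 100_000))          # representative input samples in [0, 1)

ref  = shader(*samples.astype(np.float32))  # full-precision reference
half = shader(*samples.astype(np.float16))  # demoted FP16 version
err  = np.abs(half.astype(np.float32) - ref)

print(f"mean={err.mean():.6f}  std={err.std():.6f}  max={err.max():.6f}")
# Made-up acceptance rule: the worst pixel must stay under one 8-bit step.
print("FP16 acceptable" if err.max() < 1.0 / 255 else "keep FP32")
```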

I just didn't really like radar1200gs's 10-100x better compression being considered the same as an fp32 -> fp16/fp32 mix.

On the artist issue, yeah, that is a real problem. Thankfully I don't think it's going to be a major issue yet, because I don't think there are going to be that many PS 2.0 shaders. I believe Valve only has about 20-ish BASE PS 2.0 shaders, which are statically branched and which they then compile into the separate shaders using M$ tools. Static branches are just used to add things like detail textures or bump mapping. When developers start doing things like using procedural textures, then we are really going to have to keep our eyes on those artists.
 