FP16 and market support

Why, oh why, would ATI replace the tried and true R3xx design with something that looks a lot like the NV3x, with 8 real and 8 pseudo pipelines? That just doesn't make sense.
 
radar1200gs said:
ATi implements r4xx as FP32 with FP16 partial precision. The partial precision is achieved by splitting the FP32 registers. This allows the claim of extra pipelines, but only when partial precision is used. It does not affect the number of pixels output per cycle (still only 8 of those).
How the heck does that make any sense whatsoever? Splitting registers allows for more registers... and that's it.
 
radar1200gs said:
There has been ongoing speculation that ATi will increase the number of pipelines in r4xx. I haven't closely followed the speculation, but what follows is my opinion/speculation on one easy way to do it. (I'm not suggesting that what follows is fact.)

ATi implements r4xx as FP32 with FP16 partial precision. The partial precision is achieved by splitting the FP32 registers. This allows the claim of extra pipelines, but only when partial precision is used. It does not affect the number of pixels output per cycle (still only 8 of those).

Call the prospectors, this man has struck comedy GOLD!
 
radar1200gs said:
There has been ongoing speculation that ATi will increase the number of pipelines in r4xx. I haven't closely followed the speculation, but what follows is my opinion/speculation on one easy way to do it. (I'm not suggesting that what follows is fact.)

ATi implements r4xx as FP32 with FP16 partial precision. The partial precision is achieved by splitting the FP32 registers. This allows the claim of extra pipelines, but only when partial precision is used. It does not affect the number of pixels output per cycle (still only 8 of those).

Same WTF reaction to the speculative part.

For the record, the speculation about increased pipelines in a future product was about a third block of pipelines (vs. 2 blocks in the current R3xx). The theory went (more or less) as follows:

high end = 3 blocks
mainstream = 2 blocks
value = 1 block
 
Rugor said:
Why, oh why, would ATI replace the tried and true R3xx design with something that looks a lot like the NV3x, with 8 real and 8 pseudo pipelines? That just doesn't make sense.

Well, a certain application will make an optimised stencil path very important next year.
 
Radar1200gs comedy act quotes

do you practice at being this funny?

Well, a certain application will make an optimised stencil path very important next year.

Well that's just accelerated stencil ops. Can't see ATi dumping all of the other advantages of the R3xx architecture just for one game.
 
radar1200gs said:
There has been ongoing speculation that ATi will increase the number of pipelines in r4xx. I haven't closely followed the speculation, but what follows is my opinion/speculation on one easy way to do it. (I'm not suggesting that what follows is fact.)

That's just unfounded speculation, usually by those trying to legitimise the use of FP16. FP16 is only being raised as an issue because NV3x is not competitive without dropping to this lower precision. Remember, FP24 is on the way out - what makes you think it's going to be replaced by FP16?

Why do you think it wasn't mentioned at any of the big NV3x launches? It doesn't fit in with Nvidia's Cinematic Computing, and Nvidia's primary focus is only being shifted to FP16 as a stopgap measure to try and make up the performance deficit against ATI. As always with Nvidia, they trade off IQ for speed.
 
Hellbinder said:
I guarantee you (and I think everyone knows this) would be making the exact same arguments I am if the shoe was on the other foot... except you would likely also be pushing how FP24 is the "wave of the Future" or something.
Have I ever defended nVidia's AA? Have I ever defended nVidia's use of partial trilinear? No? You have no basis for that statement.
 
Genghis Presley said:
Dio said:
SUV's? There tends to be more interest in sports cars. Except for hiring them to go up to Tahoe.
Are you aiming this comment at anyone specific?
I think it hit right between the eyes. But you're not the only one.
 
Heathen said:
Well that's just accelerated stencil ops. Can't see ATi dumping all of the other advantages of the R3xx architecture just for one game.

That's assuming that the biggest reason for ATi's advantages is the 8*1 (vs 4*2) configuration, which I doubt.
 
Aaaargh! My eyes, my poor eyes!
Splitting registers?! Pipelines without color output?! It's FP16 registers uniting into FP32 ones, you know...
WTF am I gonna hear next? That NVIDIA's FP16 is exactly twice as fast as their FP32? ... Don't make me sick.

Maybe I could get a laugh at how dumb he was initially, but this is just plain sad. Wait, this is Beyond3D? Really? You're not kidding me? Damn...


Uttar
 
YeuEmMaiMai said:
ATi is not going to use FP16... so why even hope they will follow nVidia?

I wouldn't necessarily state that. I'd fully expect them to go to FP32 the way they're at FP24 now, but _pp hinting is out there, and if there's a way for them to utilize FP16 to increase performance without having to make a notable sacrifice elsewhere, I see no reason for them NOT to. Developers will be programming with it in mind regardless, so if they can take good advantage of it they should. They'd just be coming at it from the other way.
 
Reading back a few pages, I saw discussion about determining whether X or Y precision suffices for a host of instructions without any perceived loss in quality. This is an important thing.

If one were really hardcore, one could use error analysis to compute the effect of reduced precision on the final brightness of screen pixels, to determine whether the errors are significant. But it's probably better to just try lower precision, and see if the result looks noticeably worse.
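To make that concrete, here is a minimal C sketch of comparing the same toy computation at full and at reduced precision and looking at the resulting error. The quantisation helper and the lighting values are mine, purely for illustration; the helper is a crude stand-in for FP16 that only truncates the mantissa and ignores FP16's narrower exponent range.

```c
#include <stdio.h>
#include <string.h>
#include <math.h>

/* Crude stand-in for FP16: keep only the top 10 mantissa bits of a 32-bit
 * float.  This models mantissa precision loss only, not FP16's reduced
 * exponent range, so treat it as a rough sketch. */
static float quantize_fp16ish(float x)
{
    unsigned int bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFE000u;            /* zero the low 13 mantissa bits */
    memcpy(&x, &bits, sizeof bits);
    return x;
}

int main(void)
{
    /* Toy "shader": a diffuse term N.L times attenuation times albedo.
     * Input values are made up for illustration. */
    float n_dot_l = 0.73f, atten = 0.0042f, albedo = 0.9f;

    float full = n_dot_l * atten * albedo;

    float reduced = quantize_fp16ish(n_dot_l) * quantize_fp16ish(atten);
    reduced = quantize_fp16ish(reduced) * quantize_fp16ish(albedo);
    reduced = quantize_fp16ish(reduced);

    printf("full precision : %.9g\n", full);
    printf("reduced        : %.9g\n", reduced);
    printf("relative error : %.3g%%\n", 100.0 * fabs(reduced - full) / full);
    return 0;
}
```

On values like these the relative error is tiny compared to an 8-bit framebuffer step; it is long dependent instruction chains and things like texture-coordinate math where reduced precision tends to become visible.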

DemoCoder said:
Dio said:
Chalnoth said:
In other words, what I'm trying to say is that with a few simple rules, one should be able to easily determine which instructions can use what precision, with no perceivable loss in image quality.
Would that not imply there is no need for precision modifiers, as it can be done automatically and easily by analysis?
This would work in a high level language with type inferencing and numerical analysis in the compiler using a heuristic like "minimize error". The problem is, there is no standard for telling the HLSL shader what the precision of the input textures are and the precision of the output framebuffer.
IMO, it would be INSANE for any compiler to try to do automated error analysis and reduce precision where it thinks the results aren't significant. A compiler can only see the local information on the computation performed by a shader, it has no idea about the range and statistical nature of the input data or of what you're going to do with the output, for example drawing it to the screen or recycling it as a render target into a more precision-sensitive shader program.
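To illustrate why purely local analysis is so dangerous, here is a small C sketch (with made-up inputs) where one and the same subtraction is harmless for one input distribution and loses most of its significant digits for another; nothing in the expression itself tells the compiler which case it will actually see.

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* The same expression, two different input distributions.  A compiler
     * that only sees the shader text cannot know which one the textures
     * will actually contain. */
    double a_near = 1.0000001, b_near = 1.0;   /* nearly equal inputs   */
    double a_far  = 2.0,       b_far  = 1.0;   /* well-separated inputs */

    /* Reduced precision (float) vs. reference (double). */
    float  f_near = (float)a_near - (float)b_near;
    double d_near = a_near - b_near;

    float  f_far = (float)a_far - (float)b_far;
    double d_far = a_far - b_far;

    printf("near-equal inputs : relative error %.3g\n",
           fabs((double)f_near - d_near) / d_near);
    printf("separated inputs  : relative error %.3g\n",
           fabs((double)f_far - d_far) / d_far);
    return 0;
}
```

With the near-equal inputs the float result is off by roughly 20%; with the separated inputs it is exact. The expression is identical in both cases, which is exactly the information gap being pointed at above.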

DemoCoder said:
The DCL shader instruction in DX9 only allows you to specify the input mask, and whether something is 2D, a cube, or a volume. Similarly, there is no way to specify, in the shader itself, what the desired output precision is.
Precision isn't something a compiler should mess with at all. As a programmer, I want everything precisely specified by me and me alone, as either 32-bit or 64-bit floating point (IEEE). All of these 12-bit, 16-bit, and 24-bit stuff by NVIDIA and ATI are (should be :!:) temporary and won't (shouldn't :!:) have any long term impact on graphics programming. I mean, if you're Valve and you're going to ship HL2 in the next few months, this stuff matters to you, but if you're developing new tools or renderers today, you can just ignore it knowing it will (should :!:) go away.

DemoCoder said:
Of course, this is just part of the ongoing debate over statically typed languages (C, C++, Eiffel, Java, etc.) vs dynamically typed (ML, Haskell, scripting languages, etc.). The typeless languages typically have more expressional power, leading to more concise, less buggy code (e.g. OCaml, Ruby, Haskell), but at the cost of compiler complexity and speed.
Those dynamically typed languages are completely deterministic and predictable when it comes to evaluating basic arithmetic operations and user-defined functions constructed from them. The compiler doesn't go in and arbitrarily decide to evaluate some expression with a different precision or data type; though the precision isn't statically known, it is deterministically derived from the dynamic types attached to your data.

sonix666 said:
When a texture is uploaded to DirectX the driver could detect the format and the minimum and maximum values and such to feed the pixel shader compiler.
I would want to shoot a driver writer that does this!

DemoCoder said:
The core issue is multipass. The compiler can estimate error in a single pass, but how can it estimate ACCUMULATED ERROR without saving it in an additional buffer (E-Buffer? Error Buffer?) for every pixel? Perhaps it would use MRT or some kind of packing to store the error to pick it up in later passes.
Omigod! Argh!
 
That's assuming that the biggest reason for ATi's advantages is the 8*1 (vs 4*2) configuration, which I doubt.

Well, you're right, the 8*1 is just one advantage of the R3xx architecture. From the R420 pipeline thread it's pretty obvious it's undergoing a major overhaul, but it doesn't seem to me that ATi are changing their fundamental approach.

utilize FP16 to increase performance

What sort of cost in transistors would this take? ATi's approach may be to deal with the _pp hint the same way they do today: ignore it and run everything at the one precision at full speed. From what I understand, giving the chip the ability to handle multiple precisions would complicate things design-wise.
 
YeuEmMaiMai said:
so let nVidia continue with FP16 as ATi's performance with FP24 is superior... lay down some tasty eye candy and there really is NO COMPARISON between the excellent image you get from R3x0 and the mediocre image you get with NV3x when performance is similar...

You know, this is the kind of fanb0y-based drivel I can just do without reading on this board. Please show me a fair, unbiased review from a reputable source that states such a difference in image quality exists. No, I don't want to hear your opinion, I want to see you cite a credible source that supports your claims.
 
You know what John? Sometimes I think it's best to just ignore certain folks for the better of the site and/or for those that matter. Responding to certain things they say in the way you did merely serves to potentially deteriorate a thread.

And yes, that's my only new year resolution!
 
Reverend said:
You know what John? Sometimes I think it's best to just ignore certain folks for the better of the site and/or for those that matter. Responding to certain things they say in the way you did merely serves to potentially deteriorate a thread.

And yes, that's my only new year resolution!

Yeah, I know, I know. And 99.5% of the time I do ignore such comments, and lord knows I wouldn't want this quality thread to deteriorate any. ;)

My resolution is to get you off my back by finding a really good toupee. 8)
 
John Reynolds said:
My resolution is to get you off my back by finding a really good toupee. 8)
I don't care if it's OT or not, do NOT get a toupee! :oops:

It'll be a constant worry, they look like shit, and there is absolutely no need. Shave your hair short and go for the Picard look. 8)
 
Precision isn't something a compiler should mess with at all. As a programmer, I want everything precisely specified by me and me alone, as either 32-bit or 64-bit floating point (IEEE)

But C compilers have been doing this for years now (albeit promoting precision, not demoting it). Quite often you have to pass a switch to the compiler to force it not to promote floats to doubles...

Give it time and I imagine you'll eventually see some compilers get crafty and start selecting precision based on assumptions about known constants passed to the compiler (probably only under aggressive optimization settings where larger degrees of error are permitted)...
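For reference, here is a tiny C example of the promotion behaviour being described: in the mixed comparison the float operand gets promoted to double by the usual arithmetic conversions, and FLT_EVAL_METHOD (from <float.h>, C99) reports whether the compiler evaluates float expressions at a wider precision behind your back. The specific values are only for illustration.

```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    float third = 1.0f / 3.0f;

    /* In the mixed comparison below, the float operand is promoted to
     * double by the usual arithmetic conversions, so the test is carried
     * out at double precision and fails. */
    if (third == 1.0 / 3.0)
        printf("equal\n");
    else
        printf("not equal: the float was promoted and compared as a double\n");

    /* FLT_EVAL_METHOD reports whether float expressions are evaluated at a
     * wider precision: 0 = nominal types, 1 = float/double as double,
     * 2 = everything as long double. */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}
```

On 32-bit x86 with x87 math, FLT_EVAL_METHOD is typically 1 or 2 unless you pass something like GCC's -mfpmath=sse, which is the sort of compiler switch being referred to above.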
 
Do I really have to make you look like an arse?

OK, well, here goes.

GFFX max IQ settings
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfNV8xM19sLmpwZw==

notice that the FPS counter reads 27 current and average fps

Radeon 9700Pro max settings
http://www.hardocp.com/image.html?image=MTA0MzYyMDg1OTVjVVNkMzFISXhfNV8xNF9sLmpwZw==

notice that the FPS counter reads 62 current and average fps

Now let's compare more recent cards.

Cards set with no IQ enhancements, UT2K3:

http://www.tomshardware.com/graphic/20031229/vga-charts-03.html#unreal_tournament_2003

tell me who is at the top of the list?

http://www.tomshardware.com/graphic/20031229/vga-charts-13.html#unreal_tournament_2003

tell me who is at the top of the list?

4x FSAA

ATi Radeon 9800 XT 98.4
ATi Radeon 9800 Pro 256MB 92.3
ATi Radeon 9800 Pro 89.7
GFFX 5950U 85.1


8x AF

ATi Radeon 9800 XT 93.8
ATi Radeon 9800 Pro 256MB 87.5
ATi Radeon 9800 Pro 87.5
ATi Radeon 9700 Pro 78.8
ATi Radeon 9800 78.5
GFFX 5950U 69.1


Obviously you don't read very well, do you? I said "when performance is similar." Oh, that's right, Nvidia has to substitute FP16 even when the developer didn't ask for it (Futuremark, anyone?), and Futuremark has to constantly patch their code to disable all of the driver cheats...

Now let's move to UT2K3: as we all know, Nvidia does NOT do trilinear filtering while ATi does. Once again, another IQ hack for performance.

As for TRAOD, overall it is accepted that the game is bad, but still, why did nVidia force Eidos to remove the benchmark mode from the program? Could it be because it was showing off their weak performance compared to ATi?

Quote from FiringSquad:

As you saw in the in-game shots, this doesn’t have a negative impact on visual quality per se. On the other hand, when we took this issue to Tony Tomasi back in August, he assured us that a fix was in the works to correct what could have been interpreted as a glitch. Now that the anticipated Detonator is upon us, it seems no fix was in the works, nor will be.

What's wrong, having a hard time seeing who offers the better product? I'm tired of reading all of this nVidia drivel about how FP16 is better and does not have the limitations of FP24...

Like I said, it comes down to gameplay, and when you enable all of the IQ settings to the max, ATI has undisputed performance leadership. With Nvidia you have to lower the IQ to get to the same performance level. What is so hard to understand about that?

John Reynolds said:
YeuEmMaiMai said:
so let nVidia continue with FP16 as ATi's performance with FP24 is superior... lay down some tasty eye candy and there really is NO COMPARISON between the excellent image you get from R3x0 and the mediocre image you get with NV3x when performance is similar...

You know, this is the kind of fanb0y-based drivel I can just do without reading on this board. Please show me a fair, unbiased review from a reputable source that states such a difference in image quality exists. No, I don't want to hear your opinion, I want to see you cite a credible source that supports your claims.
 