Dawn FP16/FX12 VS FP32 performance - numbers inside

♪Anode said:
Doing fp16 at full speed and fp32 at half is as good a choice as ATI doing 24 bits, if not better. It may not seem so right now, but it will pay off as time goes on. ;)

I think you forgot the "because [...]" at the end??
 
Uttar: Actually your shaders might NOT test exactly what you want them to! I just ran over them quickly, but this is what I noticed:
1. Some of the fragment shaders (I didn't look at all of them) are written in Cg, and they still use fp32 (the float data type) all the way through in your fp16 shader pack!
2. Your high-quality shader pack is also doing something that I think it definitely shouldn't! You do all your math in fp32 and then output the color to the fp16 register (COLH) instead of the fp32 one (COLR)!? That might be a very big NO-NO on NV3x...
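To illustrate what I mean (a made-up Cg fragment, NOT one of the actual Dawn shaders, and I'm assuming the fp30 profile maps a half4 COLOR output to COLH and a float4 one to COLR):

    // hypothetical sketch: all the math is fp32, but the declared half4
    // output means the result lands in the fp16 COLH register
    half4 main(float3 L : TEXCOORD0,
               float3 N : TEXCOORD1) : COLOR
    {
        float ndotl = max(dot(normalize(N), normalize(L)), 0.0);  // fp32 math
        float4 col  = float4(ndotl, ndotl, ndotl, 1.0);
        return (half4)col;  // fp32 work, fp16 output - the mismatch I'm talking about
    }

Declaring the output as float4 instead keeps everything in fp32 and uses COLR.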
 
I don't think I'll be playing games that look like 'Final Fantasy: The Movie' next year... and there's a reason Microsoft chose FP24 as the DX9 minimum spec: that decision was 99.9% made by people far more qualified than some armchair experts on forums.

FP32 may be good for the movie industry, but if Carmack can use lower-precision modes and get decent graphics, there's no need for FP32 in gaming cards today... it's just marketing PR: "We can render at FP32, but we never use it" ;)
 
The fp32 performance on the FX 5900 seems to be a bit more than half that of fp16. With code optimization it should perform significantly better, perhaps even close to the same (if no more than two registers are used).

It ends up being like AMD vs. Intel. The Athlon doesn't require much optimization and performs fairly predictably however you structure things. The Pentium 4, on the other hand, must be specifically targeted in order to shine against the Athlon.

Uttar, take a look at MDolenc's post and see if there is anything you can update in the patch.

If you do, it would be nice if the results were arranged in a little table comparing the 5800 Ultra to the 5900 Ultra, with the percentage difference from default under mixed mode, fp16, fp32, etc. (and fx12, if possible).
 
Well, Anode, perhaps not, when you can't do fp16 as fast as you can do fp24, and fp32 worsens that comparison even more. I think your comments are what nVidia PR would like to be the case... but only once the very significant performance issues are addressed (NV40?) do I think nVidia PR would begin to have justification for making such statements. It would depend on what the "fp24 competition" performance figures look like at the time, and whether unique possibilities are brought to the table.

Actually, I think PS/VS 3.0 will allow that idea to be put to the test most successfully, and I personally see no reason to doubt that NV40 will offer that around the end of the year. But when talking about the NV3x, I don't think the fp16/fp32 decision is a successful one at all. We'll see whether the NV40 sufficiently addresses the performance problems, and whether we can actually get a demonstration of an fp32-over-fp24 advantage.
 
I had just finished the FP16 with FP32 registers patch, but in light of MDolenc's comments, I'll have to delay it by a few minutes, a few hours, or a few days :)

MDolenc: Okay, so first: I'm perfectly aware that the Cg parts are still done in FP32 in the default FP16 patch. As you might not have seen, I later released a "FullFP16" patch (yes, yes, I know, confusing) which uses half instead of float for the Cg programs (I'm not TOO familiar with Cg, though, so I hope I didn't make a mistake). According to AnteP's tests, the difference was minuscule, on the order of 5% - the reason being that the bulk of the pixels were optimized in assembly language, and only small details such as the eyes are still Cg.
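For those who haven't touched Cg, the FullFP16 change is literally just the storage type - roughly like this (a made-up example, not the actual eye shader):

    // FP16 patch: the Cg parts still run at full precision
    float4 eyePass(float2 uv : TEXCOORD0,
                   uniform sampler2D eyeMap) : COLOR
    {
        float4 c = tex2D(eyeMap, uv);
        return c;
    }

    // FullFP16 patch: the same code with half instead of float
    half4 eyePass(half2 uv : TEXCOORD0,
                  uniform sampler2D eyeMap) : COLOR
    {
        half4 c = tex2D(eyeMap, uv);
        return c;
    }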

As for your COLH/COLR comment... Icky, didn't know about that. I'll fix this and upload the updated versions at the same time as the FP16 in FP32 registers patch.

Please note that the FP16 in FP32 registers patch is based on the FP16 patch, and not the FullFP16 one - considering AnteP's results, however, this shouldn't matter much.


Uttar
 
♪Anode said:
IMO supporting full FP32 would be the way to go in the future. FP24 is mostly a stopgap thing that ATI did to reclaim their performance crown, which worked for them at this point in time. But I don't doubt for a second that they'll be going FP32 for future cards.
I'd agree with that, but...
The problem is that having FP16 and slow FP32 today won't help you, as games in the foreseeable future won't benefit at all from the increased precision you get with FP32 over FP24 - but OTOH, FP16 *might* not always be enough. And by the time FP32 becomes useful (when shaders are much longer) it won't benefit the FX 5900 either, as it will be way too slow by then (and it won't support PS 3.0 etc.). So, for now, fp24 just seems to be the better choice, plain and simple. If you need more precision in the future, you'll just add it in your future product; it doesn't make sense to add it now.
(That said, FP32 might be useful for those much-touted "Pixar-in-a-box" type of applications; haven't heard much about that lately, however.)
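(For reference, roughly: FP16 is s10e5, i.e. a 10-bit mantissa or about 3 decimal digits; the DX9 FP24 minimum is s16e7, about 5 decimal digits; FP32 is s23e8, about 7. So FP24 buys you roughly two extra digits over FP16, and FP32 another two over FP24 - which matters mostly for long dependent calculations and large texture coordinates, not for the final colour value itself.)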
 
There, four patches in ONE zip file! :) It requires manual installation, though.

www.notforidiots.com/DawnPatches.zip

Dawn32: Full FP32 Dawn - uses COLR as suggested by MDolenc
Dawn16: FP16 Dawn, with the eyes/teeth/... in FP32 - uses COLH
DawnFull16: Full FP16 Dawn, including eyes/teeth/... - uses COLH
Dawn16IN32: Eyes/teeth/... are the same as in Dawn16, thus full FP32. All other parts use FP32 registers with FP16 instructions - uses COLH

There you go. Does Dave actually have an NV35? If so, I might want to ask him to run the tests... :)


Uttar
 
Ante P said:
♪Anode said:
Doing fp16 at full speed and fp32 at half is as good a choice as ATI doing 24 bits, if not better. It may not seem so right now, but it will pay off as time goes on. ;)

I think you forgot the "because [...]" at the end??

For one thing, 32-bit floating point is the IEEE single-precision standard, and most of the code for older apps that was meant to run on the CPU would have been written using it.

I'd agree with that, but...
The problem is that having FP16 and slow FP32 today won't help you, as games in the foreseeable future won't benefit at all from the increased precision you get with FP32 over FP24 - but OTOH, FP16 *might* not always be enough. And by the time FP32 becomes useful (when shaders are much longer) it won't benefit the FX 5900 either, as it will be way too slow by then (and it won't support PS 3.0 etc.). So, for now, fp24 just seems to be the better choice, plain and simple. If you need more precision in the future, you'll just add it in your future product; it doesn't make sense to add it now.
(That said, FP32 might be useful for those much-touted "Pixar-in-a-box" type of applications; haven't heard much about that lately, however.)

Workstation apps come to mind where fp32 helps you. The thing is, there won't be many DirectX 9 games until 1 to 1.5 years from now, and by that time people will have the next generation of cards from ATI and NVIDIA. Like it or not, the first generation of DX9 cards will mostly be used to play DX8 games, and what DX9 features there are will be seen by people in the form of demos or 3DMark03 (one game test, at that). So it makes sense to expose 32 bits in the first generation of cards so that devs have something to work with. By the time the game developers are done and games hit the market there will be faster cards, and what they developed on will come to fruition. It's like the GeForce 3 thing: the NV20 brought DX8 graphics, and at the time there were no games to take advantage of it. Real DirectX 8 games only started coming out when GeForce 4s were on the market. Even UT2003, which is not fully DirectX 8 (it uses pixel shaders minimally), is used to bench DX9 cards.
 
♪Anode said:
Ante P said:
♪Anode said:
Doing fp16 at full speed and fp32 at half is as good a choice as ATI doing 24 bits, if not better. It may not seem so right now, but it will pay off as time goes on. ;)

I think you forgot the "because [...]" at the end??

For one thing, 32-bit floating point is the IEEE single-precision standard, and most of the code for older apps that was meant to run on the CPU would have been written using it.

Excuse my ignorance, but how would that ever apply to a "gaming video card"?
 
♪Anode said:
For one thing, 32-bit floating point is the IEEE single-precision standard, and most of the code for older apps that was meant to run on the CPU would have been written using it.
I have yet to see or hear about any FP32 shader program that couldn't run perfectly fine with FP24 precision. I don't doubt that someone could come up with a contrived case, but I'm talking about something interesting and/or useful.

There's also the issue of performance... if you can't run a given shader program with FP32 at a reasonable speed, but you could do so with FP24, then FP24 would be more useful.

Workstation apps come to mind where fp32 helps you.
How, exactly?

So it makes sense to expose 32 bits in the first generation of cards so that devs have something to work with. By the time the game developers are done and games hit the market there will be faster cards, and what they developed on will come to fruition.
But wouldn't it make more sense to release a card that can run FP24 precision shaders at usable speeds today, than to release one that runs FP32 shaders at unusable speeds today in the hope that tomorrow's hardware will make them practical?
 
GraphixViolence said:
I have yet to see or hear about any FP32 shader program that couldn't run perfectly fine with FP24 precision. I don't doubt that someone could come up with a contrived case, but I'm talking about something interesting and/or useful.
Who needs 32-bit color? 16-bit is fine. Beyond that, who needs more than 640K? Nobody will ever need more than that.

(in other words: in computers, more is never enough)
 
GraphixViolence said:
I have yet to see or hear about any FP32 shader program that couldn't run perfectly fine with FP24 precision. I don't doubt that someone could come up with a contrived case, but I'm talking about something interesting and/or useful.
What about this demo from Humus? Zoom in on the left part of the fractal on a Radeon and you'll see what fp24 does. If you're not convinced how much better fp32 can look there, I'll provide some screenshots from the reference rasterizer.
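(The core of such a fractal shader is just iterating z = z^2 + c per pixel - a rough Cg-style sketch, not Humus's actual code:

    // rough sketch only: fixed iteration count so the loop can be unrolled
    float4 mandel(float2 c : TEXCOORD0) : COLOR
    {
        float2 z = float2(0.0, 0.0);
        float  n = 0.0;
        for (int i = 0; i < 32; i++) {
            z = float2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;  // z = z^2 + c
            if (dot(z, z) < 4.0)                             // still inside |z| < 2
                n += 1.0 / 32.0;
        }
        return float4(n, n, n, 1.0);
    }

Once you zoom in far enough, the c values of neighbouring pixels differ by less than the mantissa can resolve, so whole blocks of pixels iterate exactly the same orbit - that's the blockiness you see at fp24.)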
 
5800 Ultra:
Normal: 30 fps

Old results
FP32: 17 fps
FP16: 20 fps
FullFP16: 21 fps
FX12: Haven't tried it yet, work faster Uttar ;)

New results
FP32: crashes
16IN32: 17 fps
FP16: 20 fps
FullFP16: 21 fps
FX12: Haven't tried it yet, work faster Uttar ;)
 
RussSchultz said:
GraphixViolence said:
I have yet to see or hear about any FP32 shader program that couldn't run perfectly fine with FP24 precision. I don't doubt that someone could come up with a contrived case, but I'm talking about something interesting and/or useful.
Who needs 32-bit color? 16-bit is fine. Beyond that, who needs more than 640K? Nobody will ever need more than that.

(in other words: in computers, more is never enough)
Um, that's a bit of an oversimplification, don't you think? If I show you two images, one rendered at 32 bpp and one rendered at 16 bpp, the difference will be obvious. If I did the same with FP32 vs. FP24, it would not. Of course, you could come up with contrived cases where the reverse was true (i.e. 16 bpp looks the same as 32 bpp, or FP32 looks much better than FP24), but what's the point? We're talking about useful products here, after all.

I'm sure in a few years when fast FP32-capable graphics boards cost $10 this point will be moot, but we're quite a ways from that yet.
 
Ante P said:
Excuse my ignorance, but how would that ever apply to a "gaming video card"?

OK, I'm not very knowledgeable about these matters, but I was thinking more along the lines of porting the higher-precision CPU shaders (similar to what render farms do) over to the GPU if it were fp32. As I said, I'm not very familiar with this; maybe someone with more knowledge can comment and let me know whether I'm right or wrong :).

And the above posts do show cases where fp24 is not enough.

But wouldn't it make more sense to release a card that can run FP24 precision shaders at usable speeds today, than to release one that runs FP32 shaders at unusable speeds today in the hope that tomorrow's hardware will make them practical?

What are you running right now that requires DX9-class shaders? 3DMark03? :LOL: Most of the best-looking games (Unreal engine titles, etc.) are DX8-class games, which will look just as good rendered with FX12 as they will with FP32/FP24. The quality difference visible in screenshots is mostly due to AA/AF, not shader precision.

The only people who will be using these are the developers, and at this point it makes more sense to give them something which represents the tech that will exist when their particular app is released. Then they can do all their optimizations for an fp32-based architecture, and the results will be realized when their game ships two years from now and people can play it on the latest hardware, which will run it in its full glory.
 
GraphixViolence said:
Um, that's a bit of an oversimplification, don't you think?
...
If I show you two images, one rendered at 32 bpp and one rendered at 16 bpp, the difference will be obvious. If I did the same with FP32 vs. FP24, it would not.
Yes, I think what you've said is a bit of an oversimplification. ;)

Why do we need FP24 at all, when 32-bit color already borders on the realm of our resolvable limit (and exceeds what most LCDs can display)?

Because it wasn't enough for some effects: cumulative errors begin to creep in, and we're not using simple additive math in our shaders anymore.
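A simple (hypothetical) case: a specular term like pow(dot(N, H), 64). The large exponent amplifies any rounding in the dot product, so at low precision you get visible banding in the highlight long before you'd notice anything wrong in a plain texture blend:

    // hypothetical specular-only fragment, everything at half precision
    half4 specOnly(half3 N : TEXCOORD0,
                   half3 H : TEXCOORD1) : COLOR
    {
        half ndoth = saturate(dot(normalize(N), normalize(H)));
        half spec  = pow(ndoth, 64.0);  // rounding steps in ndoth become visible bands
        return half4(spec, spec, spec, 1.0);
    }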
 
MDolenc said:
What about this demo from Humus? Zoom in on the left part of the fractal on a Radeon and you'll see what fp24 does. If you're not convinced how much better fp32 can look there, I'll provide some screenshots from the reference rasterizer.
Neat... but that's about the best example of a contrived case I could think of. No matter how much precision you have, you're eventually going to run out once you zoom far enough into a Mandelbrot set. FP32 would just let you get a little deeper.
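(Back-of-the-envelope: FP32 only has 7 more mantissa bits than FP24, so it buys you roughly 2^7 = 128x more zoom before the same blockiness shows up.)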

And no offense to Humus, but this demo is quite a bit less sophisticated than the CPU-based fractal viewers that have been available for years now. They use scaling tricks to render details that would be impossible to see if you were limited to straight FP32 calculations. You just have to be prepared to wait a few hours for the rendering to complete.

It would be interesting to see if someone could implement something like this in a DX9 shader (perhaps with multiple passes). But the point is, we're talking more about a programming problem here than a precision problem per se. It's also tough to envision how this could ever be applicable in a gaming context, although I may just not be imaginative enough :)
 
Icky, it *crashes* ?! ...
Won't be fixed today, have to go to bed real soon now...
But anyway, the FP16 in FP32 registers results are quite normal: exactly the same performance as what we had with FP32! So now we need the numbers for an NV35... I repeat my question: does Dave have one?

EDIT: Had a ridiculously wrong conclusion ( I'm tired... )


Uttar
 
RussSchultz said:
Why do we need FP24 at all, when 32-bit color already borders on the realm of our resolvable limit (and exceeds what most LCDs can display)?

Because it wasn't enough for some effects: cumulative errors begin to creep in, and we're not using simple additive math in our shaders anymore.
It's a matter of diminishing returns. It would be nice if we could all have 1000-CPU render-farms on our desktops, but given the relative cost and relative quality difference vs. a high-end gaming GPU, it doesn't make sense. So the question is, what's good enough given the cost/performance/quality trade-offs of existing technology? Or more correctly, what is the ideal balance?

If you could get FP32 at similar performance to FP24, and there was a noticeable image quality difference, then it would make sense. But with today's technology, you get FP32 at ~25% the speed of FP24, and it's difficult to find any situations where the IQ difference is noticeable.

I can certainly see Nvidia's reasoning for building an FP32/FP16 chip, but I don't think they counted on their competitor being able to build a faster FP24 chip at the same or lower cost. If they had, I think they probably would have come to a different conclusion.
 