Why doesn't Nvidia simply lock NV3x cores at fp16?

Sxotty

Everyone knows, and complains, about how crappy NV3x stuff is at shaders when running them at higher precision than ATI's products, so why doesn't NV just lock it at fp16 and say, sorry folks, that's the way the cookie crumbles? Like back in the day of 16-bit and 32-bit color (and what was 3dfx, 24-bit?).

I suppose everyone would get really angry and cranky, but hey, they already are, right?
 
After all their hype about "the dawn of cinematic computing" and the advantages of FP32, don't you think they'd be shredded for it by those of us who are less than pleased with the way they've been acting lately?

It ain't a bad idea, but they really are damned if they do and damned if they don't at this stage of the game.
 
And going to fp16 doesn't really help anything but the NV35 anyway; the other cards still choke. Which reminds me, I really want to see how well that not-so-long-ago "top of the line" $500 5800U does in HL2. I'm guessing anyone who has one is going to be extremely disappointed.
 
kyleb said:
And going to fp16 doesn't really help anything but the NV35 anyway; the other cards still choke. Which reminds me, I really want to see how well that not-so-long-ago "top of the line" $500 5800U does in HL2. I'm guessing anyone who has one is going to be extremely disappointed.
You don't think we're extremely disappointed with the card already?
 
Sxotty said:
Everyone knows, and complains, about how crappy NV3x stuff is at shaders when running them at higher precision than ATI's products, so why doesn't NV just lock it at fp16 and say, sorry folks, that's the way the cookie crumbles? Like back in the day of 16-bit and 32-bit color (and what was 3dfx, 24-bit?).

I suppose everyone would get really angry and cranky, but hey, they already are, right?

In order to claim DX9 compliance they need to have fp24 or better available. Also, even locking at fp16 wouldn't solve all of their problems; they are currently using fx12 in a lot of cases to get performance up to the level of the R350.

I really hope nVidia does a better job with NV40; they are currently living on name recognition, and their name isn't exactly something to brag about at the moment.
 
Umh... isn't it that fp16 isn't an official mode for PS 2.0? It certainly isn't supported by PS 1.4 either... so is it that they just can't do that, because it wouldn't be DX9-class hardware anymore? (fp24 is the minimum for DX9-class hardware, right?)

EDIT: Ahh... Alpha Wolf already answered my questions. :)
 
Sxotty said:
Everyone knows, and complains, about how crappy NV3x stuff is at shaders when running them at higher precision than ATI's products, so why doesn't NV just lock it at fp16 and say, sorry folks, that's the way the cookie crumbles? Like back in the day of 16-bit and 32-bit color (and what was 3dfx, 24-bit?).

I suppose everyone would get really angry and cranky, but hey, they already are, right?

Psst. They already did that for NV30, NV31, and NV34. New drivers enable FP32 again.
 
Sxotty said:
Everyone knows, and complains, about how crappy NV3x stuff is at shaders when running them at higher precision than ATI's products, so why doesn't NV just lock it at fp16 and say, sorry folks, that's the way the cookie crumbles? Like back in the day of 16-bit and 32-bit color (and what was 3dfx, 24-bit?).

I suppose everyone would get really angry and cranky, but hey, they already are, right?

People are upset to discover that the hardware nVidia has been touting, pushing, and selling into the market all year long as "DX9" hardware is not actually DX9 compliant. That is the root and branch of the complaint. Just talking about the fp pipeline and nothing else--fp16 is not DX9 compliant. Period. nVidia knew this long before nV3x shipped, just as ATi did. The difference was that one company chose to make an API-compliant product and the other did not.

We live and breathe under the API for Windows. The purpose of the API is to benefit the *developer*, who should be able to program for the API without having to worry much about differences in IHV hardware, because those are taken care of in the drivers, and the drivers are done by the IHVs. However, in order for the API system to work, the IHVs must voluntarily comply with it--which is utterly fair, since they are all granted latitude in helping to form it--the IHVs and the software developers working together, that is. The API is formed from a consensus of all interested parties.

It's a great system when it works--great for developers and IHVs alike, because everybody benefits. But if an IHV decides not to comply, problems arise for everyone--not just the non-compliant IHV, but also for developers, whose work is compounded. In this case it's especially egregious for DX9 developers like Valve, because even using partial precision (fp16) their software won't run on one IHV's products nearly as well as it runs on the other's with full precision, and in the interests of self-defense they are forced to explain that situation to their customers, who have been led to believe the hardware they bought from an IHV is API-compliant when in fact it is not. Even having explained themselves so well, Valve is still facing unsupportable charges of partisanship and financial corruption. But the current charges are nothing compared to those which would have erupted had Valve just released the game and said nothing.

The issue is not about fp16 vs. fp24 vs. fp32 at all. That argument is actually irrelevant. The issue is API compliance, and that is the only issue. You are either compliant or you are not. And the situation with nVidia that has occurred all year is a perfect example of what happens when an IHV takes it upon itself to define the API the way it sees fit, without regard for anyone else involved in the process. It makes things very hard on developers--because who has the time and money to fart around doing workarounds so that nVidia's DX8.1+ hardware can run their DX9 software at playable framerates, when all the developer should really have to do is program the DX9 path for everybody selling "DX9" hardware?

Would you prefer to bring Glide back, so that unless you bought a 3dfx card your 3d game wouldn't run well, if at all? Well, that's what happens when IHVs start mucking around with the *agreed upon* API and decide to go it alone. Nobody wants that--even 3dfx abandoned Glide before the end. But that looks like where nVidia tried to go, all right--for its own advantage--but as we can see, the developers aren't having any of it. And nVidia--rightfully--has been left holding the bag. nVidia should have to explain to its customers why its DX9 products are not DX9 compliant as advertised. But nVidia isn't going to do that--so DX9 software developers are going to have to do it for them, it seems. I actually think this is great--these people tell it like it is, and it's refreshing not to have to hear vacuous lectures from nVidia on why it thinks everybody else is wrong about the "future of 3d gaming." nVidia had better light a fire under its tail and get compliant, is all I can say.
 
Oh come now, Walt... They're DX9 compliant all right--it's just that being compliant makes them run at, like, half-speed. ;)
 
cthellis42 said:
Oh come now, Walt... They're DX9 compliant all right--it's just that being compliant makes them run at, like, half-speed. ;)

Now, why did I know this question would come up?...:D

OK, here's how I see it...

The API calls for fp24--not fp16, and not fp32. Even if you argue that fp32 exceeds the API specification, you still must concede that fp32 is not fp24, and is therefore not compliant with the API.

OK, so why fp24 in the API and not fp16 or fp32? Well, the people formulating the API decided fp16 wasn't enough precision for DX9 (fp16 is s10e5--a 10-bit mantissa gives only about three decimal digits of precision--whereas fp24 is s16e7 and fp32 is s23e8), and that fp32, while certainly enough precision, was unlikely to perform well given current technical know-how and manufacturing processes--and so fp24 was decided upon for DX9 compliance.

ATi goes out and builds an fp24 part, and nVidia builds a hybrid fp16/32 part, with zero fp24 hardware support. So why did nVidia take this route? Here's what I think...

nVidia felt that fp16 would be all the precision it needed for what it projected would be essentially DX8.1 games over the life of the product, but it also decided to sort of double up in the architecture and throw fp32 in there as well, albeit without enough register/transistor support in the chip to really make fp32 competitive in the 3d gaming market (it wanted to keep transistor count down, as yields were yet to be determined, and we all know what those have been this year). nVidia wanted to be able to talk about fp32 support in the chip as exceeding the API specification, but planned all along to run at fp16--if not fx12. The intention all along was to use fx12/fp16 while its competitors were using fp24, and nVidia assumed this would provide a performance advantage in and of itself. I think this is further buttressed by the fact that originally the nV3x drivers didn't do fp32 (since rectified to get WHQL compliance).

However, ATi comes out with an 8x1 architecture that does only fp24 and runs it faster than nVidia's 4x2 fp16--so nVidia lost that gambit straight up. But the important thing to me is that the consensus that fp24 should be the API standard turned out to be correct from the standpoint of performance, and so nVidia screwed itself, more or less, in trying to one-up everybody else with fx12/fp16, because it made architectural assumptions about its competitors that turned out to be faulty.

So, yes--fp32, as slow as Christmas as it is, exceeds the API specification, but nV3x is not technically in compliance with the API because the chip can't do fp24. So when running full-precision DX9, nV3x has to do fp32 and runs at 1/3 to 1/2 the speed of R3x0, which is running at fp24 all the time. So... the consensus behind fp24 for the API was correct--even for nVidia--in terms of performance-to-rendering-precision benefits, and nVidia would have been wise to heed it. That said, even with partial precision nV3x isn't competitive with R3x0, so there are things other than fp precision in the architecture that have caused this outcome.

The problem for developers such as Valve is that everybody expects a certain degree of DX9 performance from nVidia's hardware, because that's what nVidia has been telling everybody to expect all year. Well, the chip is not technically API compliant for DX9 (there may be things other than fp24 that it lacks--I'm not clear on that) because it doesn't do fp24 but *must do* fp32, which exceeds the API spec and is overkill for what the specification demands--which, not surprisingly, kills performance. This puts the developer in the unfortunate situation of having to explain this complex reality to its customers, who have been led by nVidia to believe something about their hardware which is not technically--or accurately--true. So in order for Valve to explain why its DX9 software runs better on ATi, it was necessary to do what Gabe did.

I agree with Gabe--I would set the game up to natively select DX8.1 for the 5900U, and allow the user to select full DX9 support electively. Even having spent a lot of time on the mixed codepath, I'd drop it and go with DX8.1 default support in this manner, as there doesn't seem to be much, if any, performance difference between the nV3x path and the DX8.1 path.

It's complex, but I really think the fact that partial precision is the only workable performance option means that one of the reasons the chip does not perform well under DX9 is that it does fp32 instead of fp24. That's not the only reason, certainly, but it does contribute.
 
Goodness, you're blessed with the gift of written gab.

Even though half of this past post was nonsense (FP32 doesn't meet compliance requirements because it exceeds the precision of FP24!?), you've still got a gift.

And for that, I salute you.

P.S. As a public service announcement: I clicked on the smiley banner looking for a 'saluting smiley' and was treated to 5 full-page pop-ups on window close. They get the 'annoying banner ad of the week' award.
 
WaltC said:
Now, why did I know this question would come up?...:D

OK, here's how I see it...

[snip]
The Dig stands up slack-jawed and starts clapping slowly, with a glazed look of awe on his face.

"Good post, Walt, really good," the Dig says softly with admiration, still clapping.
 
RussSchultz said:
Even though half of this past post was nonsense (FP32 doesn't meet compliance requirements because it exceeds the precision of FP24!?), you've still got a gift.
I have to agree with Russ here. There's nothing wrong with exceeding requirements. nVidia's big fault here is that they can't run PS 2.0 at the minimum requirements (24-bit FP, unless the application specifies _pp) at good speeds.
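To make the _pp mechanics concrete, here's a minimal HLSL sketch (the sampler name and constants are mine, purely for illustration). Declaring a result as half is what lets the compiler emit ps_2_0 instructions carrying the _pp modifier; anything computed as float has to be evaluated at fp24 or better:

// Hypothetical ps_2_0 shader: 'half' results compile to instructions
// carrying the _pp (partial precision) modifier, so the driver may
// run them at fp16. Plain 'float' math must run at fp24 or better.
sampler baseMap : register(s0);    // illustrative sampler binding

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    half4 base = tex2D(baseMap, uv);    // texld_pp: fp16 is legal here
    float4 lit = base * 1.5f - 0.25f;   // full precision required here
    return lit;
}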
 
What took thousands upon thousands of words in the previous posts, OpenGL Guy has summed up in two lines (across my screen, anyway).

Thank you OpenGL Guy.

OpenGL guy said:
RussSchultz said:
Even though half of this past post was nonsense (FP32 doesn't meet compliance requirements because it exceeds the precision of FP24!?), you've still got a gift.
I have to agree with Russ here. There's nothing wrong with exceeding requirements. nVidia's big fault here is that they can't run PS 2.0 at the minimum requirements (24-bit FP, unless the application specifies _pp) at good speeds.
 
WaltC said:
Sxotty said:
Everyone knows, and complains, about how crappy NV3x stuff is at shaders when running them at higher precision than ATI's products, so why doesn't NV just lock it at fp16 and say, sorry folks, that's the way the cookie crumbles? Like back in the day of 16-bit and 32-bit color (and what was 3dfx, 24-bit?).

I suppose everyone would get really angry and cranky, but hey, they already are, right?

People are upset to discover that the hardware nVidia has been touting, pushing, and selling into the market all year long as "DX9" hardware is not actually DX9 compliant. That is the root and branch of the complaint. Just talking about the fp pipeline and nothing else--fp16 is not DX9 compliant. Period.

Your commentary seems to treat accuracy as an acceptable casualty in a battle to advance your viewpoint. :-?

The partial precision hint, and using fp16 under it, is DX9 compliant. Period.
Forcing fp16 in place of fp32, without the _pp hint, is not. Period.
The NV3x is DX9 compliant. Period.

Mixing and matching select bits of phrasing related to these, without regard for veracity, to suit your opinion, is not a suitable basis for a "platform of attack", and this manufactured premise seems to be the only "factual basis" for everything said throughout. The rest of your discussion, built on this premise, therefore seems purely rhetorical.

My observation: even when you do a lot of work with reinforcing wording and posturing to make it sound valid to you, it doesn't actually become any more valid when the premise is flawed. :-?
 
I wasn't aware that using partial precision was part of the DX9 spec. I assumed that the primary reason HL2 has a separate path for the NV35 was to enable partial precision, FP16, but as this too wasn't enough for reasonable fps, hand-written code had to be used as well.
I would say that nVidia's biggest failing isn't so much its support of the DX9 API as that they haven't been able to offer an alternative high-profile feature; rather, they took the route of continuously supplying misinformation and "fixed" drivers... (and getting caught doing it most of the time).

Right now we're hearing that the Det 5x.xx drivers are the answer to their problems, yet all we really know is that they've done exhaustive work blocking anti-cheat software.
 
THe_KELRaTH said:
I wasn't aware that using partial precision was part of the DX9 spec. I assumed that the primary reason HL2 has a separate path for the NV35 was to enable partial precision, FP16, but as this too wasn't enough for reasonable fps, hand-written code had to be used as well.
As long as you stick with PS 2.0, hand-written code is still going to require FP24 (or FP16 if _pp is used). There are no other options.
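For anyone curious what that looks like in hand-written shader assembly, here's a sketch (register choices are mine, for illustration); the _pp suffix on individual instructions is the only precision control PS 2.0 gives you:

ps_2_0
dcl t0.xy              // incoming texture coordinate
dcl_2d s0              // base texture sampler
texld_pp r0, t0, s0    // _pp: the driver may execute this at fp16
mul r0, r0, c0         // no _pp: must be evaluated at fp24 or better
mov oC0, r0            // write the final color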
 
NVIDIA did try to force DX9 PS_2_0 to FP16, but when they announced it to us games devs (before the GFFX was out; I've still got the presentation, which basically says "FP16 is here, learn to live with it") we complained LOUDLY (OK, I may have been shouting louder than most).

MS agreed with us and confirmed that we should consider forced FP16 in PS_2_0 a driver bug.
 
DeanoC said:
NVIDIA did try to force DX9 PS_2_0 to FP16, but when they announced it to us games devs (before the GFFX was out; I've still got the presentation, which basically says "FP16 is here, learn to live with it") we complained LOUDLY (OK, I may have been shouting louder than most).

MS agreed with us and confirmed that we should consider forced FP16 in PS_2_0 a driver bug.

Even after several months, I don't think nVidia is anywhere near finished cursing Microsoft for that decision. ;)
 