No, the difference between the NV35 and the NV30 was that the NV35 gutted the integer register combiner units of the NV30 and instead added four shiny new FP32 processing units. This brought theoretical shader performance up to where one would expect for an 8-pipeline architecture (as many believed the NV30 to be before launch... and shortly after launch), but it rarely reached that performance due to register pressure and the units being in serial instead of parallel.
Ostsol said: I thought that the NV35 doubled the register space over the NV30. There were some tests a while back that seemed to indicate this. It might be difficult to test now, due to the shader compiler's current, much improved state.
DegustatoR said: FP32 runs quickly enough on NV4x hardware. It runs at exactly the same speed as ATI's FP24 if we use the same core clocks. And FP16 gives NV4x a nice performance lead over Radeons in this case.
DegustatoR said: There are no 'FP16 transistors'. FP16 is just data which still goes through FP32 ALUs. NVIDIA isn't spending transistors on it. It's just a flexibility feature allowing more performance for everyone who wants to fine-tune their shaders. There is virtually no reason to drop it, either now or in the near future. Many effects are quite happy with FP16, so why use FP32 for everything? Maybe we should use FP128 for everything just b/c 128 is bigger than 32?
DegustatoR said: As everyone in the industry is moving towards FP32
ATI isn't 'everyone'. You're far too biased you know...
Yeah, I don't remember where I read it. I couldn't find it via the forum search.
Chalnoth said: No, the difference between the NV35 and the NV30 was that the NV35 gutted the integer register combiner units of the NV30 and instead added four shiny new FP32 processing units. This brought theoretical shader performance up to where one would expect for an 8-pipeline architecture (as many believed the NV30 to be before launch... and shortly after launch), but it rarely reached that performance due to register pressure and the units being in serial instead of parallel.
Ostsol said: I thought that the NV35 doubled the register space over the NV30. There were some tests a while back that seemed to indicate this. It might be difficult to test now, due to the shader compiler's current, much improved state.
ChrisRay said: Listening to BZB, you'd think he believes that FP32 is too slow to run on the NV4x.
This is assuming that the transistors in the FP processing units are a large portion of the total number of transistors. I claim that they are not. What's more, how do you explain that the GeForce 6600 GT is, to date, the highest-clocked retail GPU available, with full support for FP32?
Bouncing Zabaglione Bros. said: But you can't get equivalent clocks if you have to have significantly more transistors to support FP32 than FP24. It's probably the reason why ATI chose to design around FP24.
Just like they didn't do with the NV40? Full precision is fast on the NV40. But there's just no such thing as "fast enough."
Transistor budget. Nvidia will drop PP at some point in the future when full precision is fast enough.
FP16 will always be faster, and there will always be a need for more performance. Your statement is idiotic and juvenile. No, FP16 will only be dropped if and when it can no longer be used commonly with little to no quality loss.
I can't think of a single major IHV that either doesn't offer FP32 now, or will be offering FP32 in the next 12-18 months. Can you? In fact Nvidia has already moved to FP32 - they just can't get the performance to use it exclusively yet - but they will.
Complete and utter bullshit. FP16 will always be faster. There's no such thing as fast enough.
Bouncing Zabaglione Bros. said: Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
FP16 will always be faster.
Because it saves the extra transistors that it would have taken to implement the same thing in FP32?
Chalnoth said: Complete and utter bullshit. FP16 will always be faster. There's no such thing as fast enough.
Bouncing Zabaglione Bros. said: Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
Why in the hell do you think nVidia actually added FP16-only processing units to the NV40?
Chalnoth said: FP16 will always be faster, and there will always be a need for more performance. Your statement is idiotic and juvenile. No, FP16 will only be dropped if and when it can no longer be used commonly with little to no quality loss.
Chalnoth said: Complete and utter bullshit. FP16 will always be faster. There's no such thing as fast enough.
Bouncing Zabaglione Bros. said: Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
Why in the hell do you think nVidia actually added FP16-only processing units to the NV40?
Chalnoth said: FP32 is not inherently any slower than FP24 (at least, not in a deeply-pipelined architecture where latency of the processing isn't important). Just consider that without adding any additional per-pipeline processing power, the transistor count more than doubled from the NV25 to the NV30. Since when should an improvement in featureset cause that kind of change? The transistor count changes from the NV30 to the NV35, and later to the NV40, are further testament that FP32 was not the problem.
We see exactly the same thing every time some older technology is replaced by a newer, faster technology. You don't see developers making 16 bit artwork and designing their games for 640x480 even though it's *always* faster, and developers *always* need more performance, do you?
There is a monstrous difference here. FP16 doesn't necessarily reduce the quality of the final image, because the final image is never as high-quality as FP16.
Bouncing Zabaglione Bros. said: We see exactly the same thing every time some older technology is replaced by a newer, faster technology. You don't see developers making 16 bit artwork and designing their games for 640x480 even though it's *always* faster, and developers *always* need more performance, do you?
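For anyone who wants to check the "final image is never as high-quality as FP16" point numerically, here is a minimal Python sketch. It assumes an 8-bit-per-channel framebuffer and uses IEEE 754 half precision (via the standard `struct` module's `e` format) as a stand-in for the GPU's FP16; the helper names are made up for illustration. It only covers a single value passed straight through, not error accumulated over a long chain of shader math, which is where FP16 can start to show.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_8bit(x: float) -> int:
    """Quantize a colour value in [0, 1] to an 8-bit framebuffer level."""
    return max(0, min(255, int(x * 255.0 + 0.5)))


def mismatched_levels() -> list:
    """8-bit levels that come out different after one trip through FP16."""
    return [k for k in range(256) if to_8bit(to_fp16(k / 255.0)) != k]


def worst_error_in_levels() -> float:
    """Largest FP16 rounding error, measured in units of one 8-bit level."""
    return max(abs(to_fp16(k / 255.0) - k / 255.0) * 255.0 for k in range(256))


if __name__ == "__main__":
    print("levels changed by FP16:", mismatched_levels())
    print("worst error (in 8-bit levels):", worst_error_in_levels())
```

The worst-case rounding error stays well under half of one 8-bit step, so a value stored once in FP16 lands on the same framebuffer level it would have reached in FP32.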
That doesn't make any sense, ChrisRay. There's no such thing as a long instruction.
ChrisRay said: FP32 is roughly 10% slower than FP16 when long instructions are used. If the instructions are short enough not to affect the NV4x's registers, then FP32 can be roughly the same speed, BZB.
It would only be bad if other IHVs were precluded from producing an architecture that was required. The point of this thread was to ask the following....
hstewarth said: I would expect it would be bad for the industry if one approach to hardware design is decided on - it may make it simpler for Microsoft to design its support, but it lessens competition in the GPU industry.
Yes, of course, I had a brain fart. I was mixing up the FP24 vs. FP32 debate.
Chalnoth said: Erm, partial precision is the only way FP16 could be supported.
Chalnoth said: That doesn't make any sense, ChrisRay. There's no such thing as a long instruction.
ChrisRay said: FP32 is roughly 10% slower than FP16 when long instructions are used. If the instructions are short enough not to affect the NV4x's registers, then FP32 can be roughly the same speed, BZB.
From what I can tell, the difference is that when you use FP16, the NV40 is more likely to be able to execute more than one instruction per clock per pipeline.
Chalnoth said: Where do you get that the NV40 has double the amount of register space? I remember there was speculation to this effect, but I don't remember any hard data or interviews stating it.
DemoCoder said: For the NV40 they doubled the amount of register space, and reduced the penalty for exceeding it. The result is that FP32 runs at full speed.
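To make the register-space argument concrete, here is a rough Python sketch of why lower precision eases register pressure: a fixed-size register file simply holds more 4-component temporaries when each component is narrower. The 64-byte file size and the `live_registers` helper are hypothetical, picked only to make the arithmetic visible; they are not real NV3x or NV4x figures.

```python
# Hypothetical per-pipeline register file size in bytes.
# NOT a real hardware number; chosen only for illustration.
HYPOTHETICAL_FILE_BYTES = 64


def live_registers(file_bytes: int, bits_per_component: int,
                   components: int = 4) -> int:
    """Number of whole vector registers that fit in the register file."""
    bytes_per_register = components * bits_per_component // 8
    return file_bytes // bytes_per_register


if __name__ == "__main__":
    for name, bits in (("FP16", 16), ("FP24", 24), ("FP32", 32)):
        count = live_registers(HYPOTHETICAL_FILE_BYTES, bits)
        print(f"{name}: {count} vec4 temporaries fit")
```

Under these assumptions FP16 fits twice as many temporaries as FP32, and doubling the file size gives FP32 the headroom FP16 had before, which is the effect DemoCoder describes for the NV40.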