Nvidia losing influence due to PS3 involvement?

I thought that the NV35 doubled the register space over the NV30. There were some tests a while back that seemed to indicate this. It might be difficult to test now, due to the shader compiler's current, much improved state.
 
I highly doubt that NV35 changed the NV30 shader core at all. I'd say that the only difference in NV35 was the 256-bit memory controller. But that's just my impression, no solid info (and no, I don't believe the marketing info about NV35).
 
Ostsol said:
I thought that the NV35 doubled the register space over the NV30. There were some tests a while back that seemed to indicate this. It might be difficult to test now, due to the shader compiler's current, much improved state.
No, the difference between the NV35 and the NV30 was that the NV35 gutted the integer register combiner units of the NV30 and instead added four shiny new FP32 processing units. This brought theoretical shader performance up to where one would expect for a 8-pipeline architecture (as many believed the NV30 to be before launch....and shortly after launch), but it rarely reached that performance due to register pressure and the units being in serial instead of parallel.
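
To put rough numbers on the theoretical side, here's a back-of-envelope Python sketch. The unit counts come straight from the description above; the clocks are just assumed retail figures, so treat the output as illustration only.

Code:
def peak_fp_ops_per_sec(fp32_units, clock_hz):
    # Peak rate = units x clock; this says nothing about sustained speed.
    return fp32_units * clock_hz

nv30 = peak_fp_ops_per_sec(4, 500e6)  # assumed: 4 FP32 units @ 500 MHz
nv35 = peak_fp_ops_per_sec(8, 450e6)  # assumed: 4 + 4 new units @ 450 MHz
print(f"NV30 ~{nv30 / 1e9:.1f}G FP ops/s, NV35 ~{nv35 / 1e9:.1f}G FP ops/s")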
 
DegustatoR said:
FP32 is running quickly enough on NV4x hardware. It runs at exactly the same speed as ATI's FP24 if we use the same core clocks. And FP16 gives NV4x a nice performance lead over the Radeons in this case.

But you can't get equivalent clocks if you have to have significantly more transistors to support FP32 than FP24. It's probably the reason why ATI chose to design around FP24.

DegustatoR said:
There are no 'FP16 transistors'. FP16 is just data which still goes through the FP32 ALUs. NVIDIA isn't spending transistors on it. It's just a flexibility feature allowing more performance for anyone who wants to fine-tune their shaders. There is virtually no reason to drop it, not now and not in the near future. Many effects are quite happy with FP16, so why use FP32 for everything? Maybe we should use FP128 for everything just because 128 is bigger than 32?

Transistor budget. Nvidia will drop PP at some point in the future when full precision is fast enough. One day the same will be true of FP128, though the visual benefit of jumping from FP32 or FP64 to FP128 will be much smaller than the benefit we get from going from FP16 to FP24 or FP32.
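
For what it's worth, the "FP16 is just data going through the FP32 ALUs" part of the quote above is easy to picture in code. A minimal numpy sketch with made-up values (the variable names are illustrative, not from any real shader): the math runs at FP32 width, and half precision only reappears at the write-back.

Code:
import numpy as np

rng = np.random.default_rng(0)
albedo = rng.random(1024).astype(np.float16)  # FP16 "register" contents
light = rng.random(1024).astype(np.float16)

# The ALU widens both operands to FP32 and does the math at full width;
# half precision only reappears if the result is written back as FP16.
full = albedo.astype(np.float32) * light.astype(np.float32)
half = full.astype(np.float16)  # the _PP / half write-back

print("worst-case write-back error:", np.abs(full - half.astype(np.float32)).max())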

DegustatoR said:
As everyone in the industry is moving towards FP32

ATI isn't 'everyone'. You're far too biased, you know...

I can't think of a single major IHV that doesn't either offer FP32 now or plan to offer it within the next 12-18 months. Can you? In fact Nvidia has already moved to FP32 - they just can't get the performance to use it exclusively yet - but they will.
 
Chalnoth said:
Ostsol said:
I thought that the NV35 doubled the register space over the NV30. There were some tests a while back that seemed to indicate this. It might be difficult to test now, due to the shader compiler's current, much improved state.
No, the difference between the NV35 and the NV30 was that the NV35 gutted the integer register combiner units of the NV30 and instead added four shiny new FP32 processing units. This brought theoretical shader performance up to where one would expect for a 8-pipeline architecture (as many believed the NV30 to be before launch....and shortly after launch), but it rarely reached that performance due to register pressure and the units being in serial instead of parallel.
Yeah, I don't remember where I read it. I couldn't find it via the forum search.
 
Listening to BZB, you'd think FP32 is too slow to run on the NV4x.
 
ChrisRay said:
Listening to BZB, you'd think FP32 is too slow to run on the NV4x.

It's certainly too slow for NV3x (which is what we have gone off on a tangent about), though it is getting there for NV4x. However, you just have to look at heavy DX9 games like HL2 to see that it still needs improvement to get to the point where you can just use full precision and still get "enough" frames with all the eye candy turned on, on the high-end cards.

Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
 
Bouncing Zabaglione Bros. said:
But you can't get equivalent clocks if you have to have significantly more transistors to support FP32 than FP24. It's probably the reason why ATI chose to design around FP24.
This is assuming that the transistors in the FP processing units are a large portion of the total number of transistors. I claim that they are not. What's more, how do you explain that the GeForce 6600 GT is, to date, the highest-clocked retail GPU available, with full support for FP32?

Transistor budget. Nvidia will drop PP at some point in the future when full precision is fast enough.
Just like they didn't do with the NV40? Full precision is fast on the NV40. But there's just no such thing as "fast enough."

I can't think of a single major IHV that doesn't either offer FP32 now or plan to offer it within the next 12-18 months. Can you? In fact Nvidia has already moved to FP32 - they just can't get the performance to use it exclusively yet - but they will.
FP16 will always be faster, and there will always be a need for more performance. Your statement is idiotic and juvenile. No, FP16 will only be dropped if and when it can no longer be used commonly with little to no quality loss.
 
Bouncing Zabaglione Bros. said:
Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
Complete and utter bullshit. FP16 will always be faster. There's no such thing as fast enough.

Why in the hell do you think nVidia actually added FP16-only processing units to the NV40?
 
Chalnoth said:
Bouncing Zabaglione Bros. said:
Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
Complete and utter bullshit. FP16 will always be faster. There's no such thing as fast enough.

Why in the hell do you think nVidia actually added FP16-only processing units to the NV40?
Because it saves the extra transistors that it would have taken to implement the same thing in FP32?
 
Chalnoth said:
FP16 will always be faster, and there will always be a need for more performance. Your statement is idiotic and juvenile. No, FP16 will only be dropped if and when it can no longer be used commonly with little to no quality loss.

Sure, and 8-bit is faster again, but no one is suggesting we revert to that. There comes a point where you can hit 200 fps in FP32, and then getting an extra 10 or 20 percent won't be worth the transistor budget, the loss in IQ, or the extra development work for PP. No one is worrying about getting that extra jump from 250 frames to 300 in Q3 anymore - those extra frames are being traded off for IQ - which is as it should be.

We see exactly the same thing every time some older technology is replaced by a newer, faster technology. You don't see developers making 16-bit artwork and designing their games for 640x480 even though it's *always* faster, and developers *always* need more performance, do you?
 
Chalnoth said:
Bouncing Zabaglione Bros. said:
Once again, Nvidia recommends PP wherever possible because they can't do FP32 fast enough for it not to matter.
Complete and utter bullshit. FP16 will always be faster. There's no such thing as fast enough.

Why in the hell do you think nVidia actually added FP16-only processing units to the NV40?

So why don't they just do away with 32-bit and use FP16, or go to FP8? Obviously faster and faster is all you care about, whereas Nvidia, ATI, MS and everyone else actually care about trading off that speed for better IQ. There is always going to be a "fast enough" sooner or later. Are you going to worry about squeezing out an extra 10 percent when you are already getting hundreds of frames per second? History tells us that developers don't care once they get what they consider a playable framerate.

Here you're saying that FP16 is always faster, but earlier you took great pains to tell us that FP32 was no slower, and that it was only the register pressure that made the real-world implementation slower, and that it's been fixed with NV40.

Chalnoth said:
FP32 is not inherently any slower than FP24 (at least, not in a deeply-pipelined architecture where the latency of the processing isn't important). Just consider that without adding any additional per-pipeline processing power, the transistor count more than doubled from the NV25 to the NV30. Since when should an improvement in featureset cause that kind of change? The transistor count changes from the NV30 to the NV35, and later to the NV40, are a further testament that FP32 was not the problem.

So what's the deal here? Is FP16 faster, or is FP32 just as fast? Make your mind up! You can't have it both ways. Or is it "juvenile and idiotic" to claim one thing, and then claim the exact opposite with equal conviction a few minutes later?
 
FP32 is roughly 10% slower than FP16 when long instructions are used. If the instructions are short enough not to put pressure on the NV4x's register file, then FP32 can be roughly the same speed, BZB.

Ideally the NV4x can run FP32 at the same speed as FP16 when the instructions aren't putting pressure on its register file. But it certainly isn't "slow".


We see exactly the same thing every time some older technology is replaced by a newer, faster technology. You don't see developers making 16-bit artwork and designing their games for 640x480 even though it's *always* faster, and developers *always* need more performance, do you?

Can't believe you'd even compare FP16/FP32 to 16-bit and 32-bit color.
 
Bouncing Zabaglione Bros. said:
We see exactly the same thing every time some older technology is replaced by a newer, faster technology. You don't see developers making 16-bit artwork and designing their games for 640x480 even though it's *always* faster, and developers *always* need more performance, do you?
There is a monstrous difference here. FP16 doesn't necessarily reduce the quality of the final image, because the final image is never as high-quality as FP16.

You don't see developers making 16-bit artwork and designing their games for 640x480 because every monitor out there on the market today can display much higher.
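
To make that concrete, here's a quick numpy sketch using a deliberately short, made-up computation: once both paths are quantized to an 8-bit framebuffer, they usually agree. It's only longer shaders, where error accumulates, that can start to show visible differences.

Code:
import numpy as np

x32 = np.linspace(0.0, 1.0, 10000, dtype=np.float32)
x16 = x32.astype(np.float16)

out32 = x32 * x32 * np.float32(0.5)  # short computation at full precision
out16 = x16 * x16 * np.float16(0.5)  # the same thing at half precision

def to_8bit(v):
    # Quantize to an 8-bit-per-channel framebuffer value.
    return np.round(np.clip(v.astype(np.float32), 0.0, 1.0) * 255).astype(np.uint8)

print("differing 8-bit pixels:", np.count_nonzero(to_8bit(out32) != to_8bit(out16)))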
 
ChrisRay said:
FP32 is roughly 10% slower than FP16 when long instructions are used. If the instructions are short enough not to put pressure on the NV4x's register file, then FP32 can be roughly the same speed, BZB.
That doesn't make any sense, ChrisRay. There's no such thing as a long instruction.

From what I can tell, the difference is that when you use FP16, the NV40 is more likely to be able to execute more than one instruction per clock per pipeline.
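
A toy model of why that matters for throughput - the co-issue rates below are invented purely to show the arithmetic, not measured from any card:

Code:
def avg_instructions_per_clock(co_issue_fraction):
    # Two instructions retire on a co-issue clock, one otherwise.
    return 2.0 * co_issue_fraction + 1.0 * (1.0 - co_issue_fraction)

print("FP32 shader, co-issue on 20% of clocks:", avg_instructions_per_clock(0.2))
print("FP16 shader, co-issue on 50% of clocks:", avg_instructions_per_clock(0.5))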
 
hstewarth said:
I would expect it would be bad for the industry if one hardware design approach were settled on - it may make it simpler for Microsoft to design its support, but it lessens competition in the GPU industry.
It would only be bad if other IHVs were precluded from producing an architecture that was required. The point of this thread was to ask the following....

Microsoft wants to move to a unified shader model for future Windows versions (see Dave's post). Do you (or anyone) think that because of nVidia's involvement with Sony, MS may hasten this requirement, not for technical reasons but for competitive ones? If so, how would this affect nV? Does the R&D done for Sony carry over to PC GPUs (assuming the PS3's shaders are separate and the PC ones are unified)?
 
Chalnoth said:
ChrisRay said:
FP32 is roughly 10% slower than FP16 when long instructions are used. If the instructions are short enough not to put pressure on the NV4x's register file, then FP32 can be roughly the same speed, BZB.
That doesn't make any sense, ChrisRay. There's no such thing as a long instruction.

From what I can tell, the difference is that when you use FP16, the NV40 is more likely to be able to execute more than one instruction per clock per pipeline.

Hmm, I didn't really mean a long instruction. I actually meant a long shader with numerous instructions. This is from Neeyik's explanation to me, though. With short shaders you are less likely to put pressure on the NV4x's register file versus long shaders (with more instructions). If that's incorrect, perhaps you can clarify.
 
Chalnoth said:
DemoCoder said:
For the NV40 they doubled the amount of register space, and reduced the penalty for exceeding it. The result is that FP32 runs at full speed.
Where do you get that the NV40 has double the amount of register space? I remember there was speculation to the tune of this, but I don't remember any hard data or interviews stating it.

It never hurts to attend NVidia events. I was told by an Nvidia engineer that 4 FP temps can now be live without penalty, and that the penalty for exceeding 4 registers is much smaller now (not a 50% speed hit).
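
If that's accurate, a crude model of the penalty might look like the sketch below. Every constant in it is a guess for illustration - the "4 free temps" comes from the point above, the rest is assumption, not anything NVidia has published.

Code:
def relative_throughput(fp32_temps, fp16_temps=0, free_temps=4, penalty=0.15):
    # FP16 temps count as half a register, which is one reason _PP helps.
    pressure = fp32_temps + 0.5 * fp16_temps
    extra = max(0.0, pressure - free_temps)
    return 1.0 / (1.0 + penalty * extra)

for temps in (2, 4, 6, 8):
    print(f"{temps} live FP32 temps -> {relative_throughput(temps):.2f}x throughput")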
 