NV35 to have all that was wrong in NV30 fixed?

Chalnoth said:
Because the texel rate of the 9500 non-Pro is half that of the Pro.

That's the theory that fits your bias, true.
And I am totally willing to admit that it has some bearing on performance.
But I think you are kidding yourself if you honestly think that 4x2 is just as good as 8x1, even given the bandwidth constraints. I seem to remember you on the other side of this issue when the Radeon had 3 TMUs per pipe...and the GF only 2. Strange.
 
Shaderman-

Welcome to the board! Most people don't seem to make such informed statements in their first 4 posts. :)

But...

NV is dead dead dead (at least for this year).

As far as the high-end of the consumer discrete space goes, you're probably right. (Assuming your well-justified guess that NV35 is still 128-bit is indeed correct.) In terms of corporate strategy, I don't think so. There was an interesting comment (from Jen-Hsun I think) to the effect that Nvidia's 2003 revenues are expected to be roughly distributed as follows:
  • 15% notebook GPUs
  • 15% integrated (nForce)
  • 15% workstation
  • 20% XBox
  • 35% all discrete consumer GPUs

NV30 may have few to no advantages over R300 for the consumer space (and a lot of disadvantages), but NV30GL has a lot of significant advantages over everything else for the DCC space, where FP32 support actually means something (and may in fact be close to a necessity, although I can't judge this); where it matters not one whit if ARB_fragment_program performance sucks compared to NV_fragment_program performance (and where DX9 doesn't even exist); where the extra costs of a 12-layer board, expensive (though in this case not jet-engine) fan, big expensive chip, etc. are negligible.

Even if it was a brilliantly successful design, clearly superior to whatever ATI had to offer, NV30 (by which I mean the high-end segment only--the equivalents of GFfx 5800 and 5800 Ultra) would be extremely hard pressed to account for 15% of Nvidia's total revenues. Plus, as high-margin as the high-end consumer cards may be, that's nothing compared to the margins on the workstation cards (essentially the same part for 3x the price). If Nvidia can really wrap up 15% of their revenues at those sorts of margins, they'd be idiots to let concerns about the high-end consumer space get in their way.

In case it's not clear, my new theory: instead of designing a high-end consumer chip with an afterthought workstation variant, with NV30/NV30GL Nvidia designed a workstation chip with an afterthought high-end consumer variant. And, if those revenue projections are at all accurate, it was a brilliant decision.

So if NV30 was really targeted at the DCC space and not the consumer space, what does that say about NV31/NV34 performance and design? I dunno, maybe nothing. Certainly the 3DMark03 scores posted by the Inq are both plausible and none too impressive. Still, the fact remains that for quite some time Nvidia will be the only one with a DX9-compliant part in range of the mainstream OEM market (i.e. sub-$100). And assuming the 250/200 clocks we've seen for Chaintech's card are correct, perhaps actually quite nicely into the mainstream OEM market (sub-$80?). Will a DX9-compliant card with hardly enough performance to run DX8-era games fly as an OEM part? I dunno. My guess, though, is a disillusioned yes.

"Dead, dead, dead" looks dubious, dubious, dubious.
 
But I think you are kidding yourself if you honestly think that 4x2 is just as good as 8x1, even given the bandwidth constraints.

Do you really think there is any reasonable in-game situation where GFfx 5800 (much less 5800 Ultra) will be
  1. Using the fixed-function pixel pipeline only (i.e. no pixel shaders)
  2. Fillrate-limited
  3. Applying an odd number of bilinear-filtered textures to much of anything
  4. Averaging <70 fps?

Me neither. Realize, of course, that the 3rd requirement effectively rules out any anisotropic filtering, which IIRC tends to be applied by taking an even number of bilinear/trilinear samples.
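
For the curious, here's a rough back-of-the-envelope sketch (in Python) of why those four conditions rarely line up. The clock speed, resolution, overdraw, and layer count are my own illustrative assumptions, not measured figures.

clock_hz = 500e6                 # assumed GFfx 5800 Ultra core clock
pipes, tmus = 4, 2               # the 4x2 configuration
texels_per_s = clock_hz * pipes * tmus          # 4.0 Gtexels/s peak

width, height = 1600, 1200       # assumed resolution
overdraw = 3.0                   # assumed average overdraw
layers = 3                       # an odd number of bilinear layers (condition 3)
fps_target = 70

texels_needed = width * height * overdraw * layers * fps_target
fps_ceiling = texels_per_s / (width * height * overdraw * layers)
print(f"texels needed at {fps_target} fps: {texels_needed/1e9:.2f} G vs {texels_per_s/1e9:.1f} G peak")
print(f"texel rate alone caps the frame rate at roughly {fps_ceiling:.0f} fps")

Under those assumptions the texel rate only becomes the wall somewhere north of 200 fps, which is the point.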

So can we all just give up on this 4x2 whining and focus on the problems with pixel shader throughput?

No, because we have no idea what they are or if they even exist. NV_fragment_program performance seems to be just fine. And there is no reason I can think of why adequate drivers won't raise PS 1.4+/ARB_fragment_program performance near those levels.

But isn't it a little late for "adequate drivers"? After all GFfx is not really "pre-release" anymore.

No, because it's still pre-the-release of any significant game to make significant use of PS 1.4+ or PS 1.4+ equivalent ARB_fragment_programs. For right now the only real uses of PS 1.4+ equivalent functionality on NV30 are:
  1. DCC/workstation
  2. 3DMark03
  3. Game development
#1 will be using NV_fragment_program or something like Cg compiled to NV_fragment_program. #2...well, #2 has its own set of drivers. ;)

#3 is the only legitimate problem I can see with the current situation, but presumably most developers already bought 9700's!

If pixel shader performance still sucks by the time PS 1.4+ functionality really starts becoming a significant issue, then we can start concluding something about the underlying hardware. At the moment I think it is justly low on Nvidia's priority list.
 
But it didn't have unequivocally superior image quality. It had better FSAA, but, most especially at default settings, it didn't have the texture quality of the GeForce2 GTS (the GTS actually supported 2-degree anisotropic). The V5 was often hailed as the board for people who played flight sims, racing games, and the like. But it really did fall behind in every area but FSAA quality, particularly when it came to advanced 3D features.

Regardless, I still very clearly remember being referred to a Voodoo3 by a person in a Software ETC. store when I was purchasing a GeForce DDR.

You are right, there are some people who made informed purchase decisions for the Voodoo5, but I'm willing to bet that the vast majority who purchased one did so for the wrong reasons.
 
Althornin said:
That's the theory that fits your bias, true.
And I am totally willing to admit that it has some bearing on performance.
But I think you are kidding yourself if you honestly think that 4x2 is just as good as 8x1, even given the bandwidth constraints. I seem to remember you on the other side of this issue when the Radeon had 3 TMUs per pipe...and the GF only 2. Strange.
At the time I was wondering what you'd want to use that third TMU for. Now I know, of course. But that quote you just replied to had nothing to do with 8x1 vs. 4x2. It had to do with the inaccuracy of claiming that the FX would have a significant benefit moving to 8x1 because there is a difference between the 9500 Pro and 9500 non-Pro.

Now, back to the GeForce2. This chip was effectively a 4x1 architecture. With no bandwidth-savings techniques to speak of, the GeForce2 managed to have the same performance with bilinear filtering as with trilinear filtering, despite the fact that it had half the theoretical fillrate with trilinear filtering enabled.
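
As a quick sanity check on that (in Python; the per-pixel byte cost is my own assumption of uncompressed colour plus Z traffic, not a measured figure):

core_hz = 200e6                        # GeForce2 GTS core clock
pixels_per_s = core_hz * 4             # 800 Mpixels/s peak from 4 pipes
mem_bw = 166e6 * 2 * (128 / 8)         # ~5.3 GB/s from 166 MHz DDR on a 128-bit bus

bytes_per_pixel = 4 + 4 + 4            # assumed: 32-bit colour write + Z read + Z write
bw_needed = pixels_per_s * bytes_per_pixel

print(f"needs ~{bw_needed/1e9:.1f} GB/s to sustain peak, has ~{mem_bw/1e9:.1f} GB/s")

With memory already the wall by nearly a factor of two, halving the theoretical texel rate by enabling trilinear barely shows up in real-world numbers.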

But, regardless, that's past history. Can you dispute my claims with a real argument rooted in the present? Put up or shut up.
 
Chalnoth said:
I don't think there's a religion out there that believes that happens (except in very specific, rare cases...).
It's a rare Beyond3D post that makes me think of Glenn Hoddle...
 
Dave H,
Yes, ever since the vertex performance figures were reported, the NV30 has struck me as an excellent part for the "non-Cinematic shading" professional market, and it is only in the apparent lack of raw shader processing power that they seem to have fallen down for the latter (which I expect to be adequately addressed in the NV35). The problem as I see it is that, whatever their approach, it may have resulted in a significant failure in the consumer space for the near future (unless they both maintain their vertex processing lead over the R350 and make sizeable profits from the workstation parts), which I certainly don't think was their intent, so discussing the failure in that regard is still quite relevant (EDIT: though "dead, dead, dead" is a bit of an exaggeration :p ).
But discussing the strengths for professional applications is just as relevant.

BTW, I really don't have any significant doubt that the NV35 is 256-bit, and I didn't even before MuFu stated it so strongly.

I also have some serious doubts about the NV34's DX9 functionality exceeding the Radeon 9000's "DX9-alike" functionality, but we've had that discussion before, and I've also guessed that another alternative is PS 2.0-functional shaders without floating-point support.
 
Dave H said:
Do you really think there is any reasonable in-game situation where GFfx 5800 (much less 5800 Ultra) will be
  1. Using the fixed-function pixel pipeline only (i.e. no pixel shaders)
  2. Fillrate-limited
  3. Applying an odd number of bilinear-filtered textures to much of anything
  4. Averaging <70 fps?

I'm willing to bet that running UT2K3 with anisotropic filtering would greatly benefit from running as 8x1 vs. 4x2...
 
Chalnoth said:
But it didn't have unequivocally superior image quality. It had better FSAA, but, most especially at default settings, it didn't have the texture quality of the GeForce2 GTS (the GTS actually supported 2-degree anisotropic). The V5 was often hailed as the board for people who played flight sims, racing games, and the like. But it really did fall behind in every area but FSAA quality, particularly when it came to advanced 3D features.

Fair enough, but it's also worth noting that the V5, with its AA, could give an overall image quality that a GF2 couldn't even begin to touch. I should know since I spent hours comparing the two. Enable AA and push the LOD bias into the negative and the V5's texture quality was considerably better than that of a GF2 (which, no matter what settings were used, exhibited horrific amounts of texture shimmering). And if you cared at all about 2D quality, there was no comparison. Lastly, I play a lot of games and there were no titles out that took advantage of the GF's more advanced 3D features (though it could be argued that 3dfx could be thanked for holding developers back with their very aged, over-used core), yet there were titles such as NOLF and Deus Ex that ran considerably better on the V5 than even a GF2 Ultra. Unfortunately for 3dfx's sales, these titles weren't used as benchmarks. But they deserved to get their asses handed to them in Q3 scores since they could never be bothered to write a decent OpenGL driver.
 
John Reynolds said:
But they deserved to get their asses handed to them in Q3 scores since they could never be bothered to write a decent OpenGL driver.
I don't think it was that at all. It simply didn't have the fillrate that the GF2 had at the time. Not to mention the Glide/OpenGL situation.
 
Dave H said:
But I think you are kidding yourself if you honestly think that 4x2 is just as good as 8x1, even given the bandwidth constraints.

Do you really think there is any reasonable in-game situation where GFfx 5800 (much less 5800 Ultra) will be
  1. Using the fixed-function pixel pipeline only (i.e. no pixel shaders)
  2. Fillrate-limited
  3. Applying an odd number of bilinear-filtered textures to much of anything
  4. Averaging <70 fps?

Me neither. Realize, of course, that the 3rd requirement effectively rules out any anisotropic filtering, which IIRC tends to be applied by taking an even number of bilinear/trilinear samples.

While I agree with the gist of your point about the 3rd item, I don't get some of the others.

Fixed-function pixel pipeline only: That's a good description of UT2003 and games inspired by the philosophy behind it, as the degree to which the programmable pipeline is used is bound to the (non-Nvidia) pixel pipeline specification.

Fillrate limited: See above, with high resolutions in mind, though your comment makes more sense in conjunction with your last point about "< 70 fps".

Taken together, the 3rd point is the only really relevant one AFAICS, and I tend to agree to an extent because of that, but as I've said elsewhere, the central issue is that "NV_fragment_program performance seems to be just fine"...for a 4 pipeline part.
 
Joe DeFuria said:
I'm willing to bet that running UT2K3 with anisotropic filtering would greatly benefit from running as 8x1 vs. 4x2...
Why? The benefit gets much smaller the more complex the shader/texture sampling gets. When you're doing AF, you're usually texture sampling limited, so 8x1 has no advantage over 4x2 in this case.
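
To put a toy model behind that (the "eight bilinear samples per 8x AF layer" figure is an assumption for the sake of the arithmetic, not vendor data):

def pixels_per_clock(pipes, tmus_per_pipe, layers, bilerps_per_layer):
    # Assumes each TMU delivers one bilinear sample per clock and nothing else limits throughput.
    clocks_per_pixel = (layers * bilerps_per_layer) / tmus_per_pipe
    return pipes / clocks_per_pixel

# One texture layer under 8x AF, taken as roughly 8 bilinear samples:
print(pixels_per_clock(8, 1, layers=1, bilerps_per_layer=8))   # 8x1 -> 1.0 pixel/clock
print(pixels_per_clock(4, 2, layers=1, bilerps_per_layer=8))   # 4x2 -> 1.0 pixel/clock

Once the TMUs are saturated with AF samples, what matters is total TMUs times clock speed, not how they are grouped into pipes.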
 
They can still have a 4x2 config but double PS 2.0 and 1.4 (i.e. floating-point) speed simply by changing the scheme from

[tex][tex]/[fp alu]
[int stage alu]
[int stage alu]

to

[tex][tex]
[float/int stage alu]
[float/int stage alu]

Still 4 pixels per clock, but 8 pixels in flight across all the pipes, and always 2 FP ops per clock in each of the 4 pipes.
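
In rough Python terms (just counting how many stages per pipe can issue FP work each clock; this is an interpretation of the scheme above, not a confirmed design):

pipes = 4

fp_issue_old = 1   # only the shared [tex][tex]/[fp alu] stage takes FP ops,
                   # and only on clocks when it isn't busy sampling textures
fp_issue_new = 2   # both per-pipe ALU stages accept float or int instructions

print("old layout, best case:", pipes * fp_issue_old, "FP ops/clock")   # 4
print("new layout:           ", pipes * fp_issue_new, "FP ops/clock")   # 8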
 
Dave H said:
NV30 may have few to no advantages over R300 for the consumer space (and a lot of disadvantages), but NV30GL has a lot of significant advantages over everything else for the DCC space, where FP32 support actually means something (and may in fact be close to a necessity, although I can't judge this)

Actually, I believe the NV30GL succeeds because it seems to have a very fast fixed function pipeline, while the R300 doesn't. DCC space is behind game space in use of shaders, and I'd imagine that FP32 support means absolutely nothing in it. DCC people don't change programs as quickly as consumers change games. Traditional lighting of a lot of polygons is still a major factor.
 
Dave H said:
In case it's not clear, my new theory: instead of designing a high-end consumer chip with an afterthought workstation variant, with NV30/NV30GL Nvidia designed a workstation chip with an afterthought high-end consumer variant. And, if those revenue projections are at all accurate, it was a brilliant decision.

If I remember well, SA confirmed this exact design philosophy shift at Nvidia some time ago...


Rodrigo
 
horvendile said:
Chalnoth said:
Now, back to the GeForce2. This chip was effectively a 4x1 architecture.
I thought it was 4x2 (thus justifying the GTS addendum)? Am I missing anything in the "effectively"?
It was only 4x2 with bilinear filtering (4x1 with trilinear), but it didn't have the memory bandwidth to actually perform like a 4x2 chip. So, I suppose it may be more accurate to say that it was closer to a 2x2 chip effectively (The GeForce2 GTS was quite possibly the most memory bandwidth-limited card of all time).

Remember how the original Radeon, with its 2x3 architecture, actually didn't do too badly against the GeForce2 GTS?
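
For what it's worth, here is a rough Python sketch of the "effectively 2x2" claim, capping the GTS's pixel rate by its memory bandwidth (the 12 bytes of colour-plus-Z traffic per pixel is my own assumption):

theoretical_px = 200e6 * 4                 # 800 Mpixels/s: 4 pipes at 200 MHz
mem_bw = 166e6 * 2 * (128 / 8)             # ~5.3 GB/s of DDR bandwidth
bytes_per_pixel = 12                       # assumed 32-bit colour write plus Z read/write

effective_px = min(theoretical_px, mem_bw / bytes_per_pixel)
print(f"bandwidth-capped: ~{effective_px/1e6:.0f} Mpixels/s "
      f"(a 2x2 at 200 MHz would be {2*200e6/1e6:.0f} Mpixels/s)")

Roughly 440 Mpixels/s in practice against 800 on paper, which is indeed much closer to a 2x2 part.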
 