R350/NV35 Z Fillrate with FSAA

Ailuros said:
The only thing we get for free, even on today's accelerators, is bilinear filtering, contrary to what nifty PR yadda yadda will state.

Hehe, never thought of it that way, but it's true :)

Although if you wanted to be even more picky, you could say that with very complex skinning, with loads of bones, even huge Pixel Shader programs would be "free" due to stalls, hehe. But that isn't true in all conditions - so you're right, obviously.

Oh, and another thing which is free: outputting multiple Z & Color pixels per clock ;) Joking, hehe.


Uttar
 
Well you get essentially AA with minimal performance penalty in completely CPU bound applications too.

OT: I think it's high time for free trilinear don't you think?
 
Tridam said:
The first GeForce had free trilinear.
Even trilinear filtering takes up some memory bandwidth. Even though the original GeForce took no fillrate hit from enabling trilinear, the memory bandwidth requirements made for some performance hit.

And as for the Savage4, I think it only offered free trilinear because it created the lower-detail MIP map on the fly (I think it was the Savage4...either that or ATI's Rage 128), which was far from "true" trilinear, so the lack of a performance hit really wasn't that exciting.
 
Chalnoth said:
And as for the Savage4, I think it only offered free trilinear because it created the lower-detail MIP map on the fly (I think it was the Savage4...either that or ATI's Rage 128), which was far from "true" trilinear, so the lack of a performance hit really wasn't that exciting.
Savage3D did this (I have the brochure!) but I've no idea if
(a)that is the same chip as Savage4 or
(b) whether they changed it later.

Simon
 
Chalnoth said:
And as for the Savage4, I think it only offered free trilinear because it created the lower-detail MIP map on the fly (I think it was the Savage4...either that or ATI's Rage 128), which was far from "true" trilinear, so the lack of a performance hit really wasn't that exciting.
It's called "box filtering" and it was first introduced on the Savage 3D. And why wasn't it exciting? It looked just like "true" trilinear as long as your mipmaps were box filtered mipmaps. In other words, it looked great 99% of the time (the 1% being when colored mipmaps were enabled) and was very fast.

In fact, I even made a case for making that the default filter mode for Savage MX because it looked better than bilinear and it came at no performance cost.
 
Going by the specs, Rampage was a 4-pipeline chip with one decoupled z/stencil ROP per pipeline, which is enough for n-sample AA if a pixel takes n or more cycles to calculate, but becomes a bottleneck if it takes fewer. But if you write 4 pixels with 32-bit color and z/stencil, that's 256 bits anyway.
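To put rough numbers on that, here is a minimal sketch of the cycle argument. It is a toy model, not a description of any real chip: the function names are made up for illustration, and it assumes each z/stencil unit handles one AA sample per clock while running in parallel with shading.

```python
import math

def cycles_per_pixel(shade_cycles, aa_samples, z_units_per_pipe=1):
    """Clocks one pipeline spends on a pixel in this toy model.

    Assumption: z/stencil work overlaps with shading, so the slower
    of the two sets the pace.
    """
    z_cycles = math.ceil(aa_samples / z_units_per_pipe)
    return max(shade_cycles, z_cycles)

def rop_is_bottleneck(shade_cycles, aa_samples, z_units_per_pipe=1):
    """True when the z/stencil units need more clocks than shading does."""
    return math.ceil(aa_samples / z_units_per_pipe) > shade_cycles

def bits_written_per_clock(pipes=4, color_bits=32, z_stencil_bits=32):
    """Output width when every pipe writes color plus z/stencil each clock."""
    return pipes * (color_bits + z_stencil_bits)

print(bits_written_per_clock())                      # 4 * (32 + 32) = 256
print(cycles_per_pixel(4, 4))                        # 4: AA hidden behind shading
print(cycles_per_pixel(1, 4))                        # 4: z/stencil sets the pace
print(cycles_per_pixel(1, 4, z_units_per_pipe=2))    # 2
```

With one unit per pipeline, 4-sample AA only disappears behind pixels that take 4 or more clocks; a single-cycle pixel is held up by the z/stencil unit.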

NVidia went the other way with GF3, implementing Z-compression and putting 16 (iirc, maybe 8) ROPs in the chip. But they also found another task for them: early Z. A GF3 can discard 16 pixels per clock when running without AA.



Ailuros,
there's a lot of other stuff for free :D
Alpha testing, perspective correction, gouraud shading, fog, ...
 
Ailuros,
there's a lot of other stuff for free
Alpha testing, perspective correction, gouraud shading, fog, ...

Ack I know. Bilinear was just the first that popped into my mind.

But if you write 4 pixels with 32-bit color and z/stencil, that's 256 bits anyway.

I wish we could really put a heavy layer of cement over that darned vaporware thing, but for what it's worth that's why I always said that the specific claim was more for the dual chip high end model.

It's called "box filtering" and it was first introduced on the Savage 3D. And why wasn't it exciting? It looked just like "true" trilinear as long as your mipmaps were box filtered mipmaps. In other words, it looked great 99% of the time (the 1% being when colored mipmaps were enabled) and was very fast.

In fact, I even made a case for making that the default filter mode for Savage MX because it looked better than bilinear and it came at no performance cost.

With the exception of the fact that it was operative only with TC enabled, was that method any different to what KYRO did?

And as for the Savage4, I think it only offered free trilinear because it created the lower-detail MIP map on the fly (I think it was the Savage4...either that or ATI's Rage 128), which was far from "true" trilinear, so the lack of a performance hit really wasn't that exciting.

I don't see why it isn't true trilinear by definition. And since we're here (and it affects both IHVs) I'd rather have a well tuned 16x sample on the fly fetch from one mipmap, than the "optimisations" we get fed today. Do I really need to elaborate any further on that one? I want trilinear in one clock and I don't really care how they'll do it.
 
Ailuros said:
I wish we could really put a heavy layer of cement over that darned vaporware thing, but for what it's worth that's why I always said that the specific claim was more for the dual chip high end model.

The very fact that its features are still relevant says worlds about 3dfx's top-tier engineers. We're talking IN STORES April 2001 tech here. 8)

And the specific claim can easily apply to single chip. Why would it only be available in dual chip? The comparative performance would stay the same between modes, long as memory didn't overflow (and it could - frame buffer would take more space per chip)
 
Tagrineth said:
Ailuros said:
I wish we could really put a heavy layer of cement over that darned vaporware thing, but for what it's worth that's why I always said that the specific claim was more for the dual chip high end model.

The very fact that its features are still relevant says worlds about 3dfx's top-tier engineers. We're talking IN STORES April 2001 tech here. 8)

And the specific claim can easily apply to single chip. Why would it only be available in dual chip? The comparative performance would stay the same between modes, long as memory didn't overflow (and it could - frame buffer would take more space per chip)

Oh pleaaaaase. The majority of those engineers have been working at NV for ages now and that's why I said that the ROP trick has been adopted and gone through evolutionary changes.

I don't care anymore about stuff that never was and never will be, period. Alternatively I could also start a senseless, hypothetical and constant babble about the long gone KYRO3. And it all is and will be in the theoretical realm, since none of us ever really saw any of the aforementioned vaporware running in real time or put it through an extensive test or analysis.

As for dual chip, look at the total amount of bits Xmas mentioned, it might then make more sense. There's always a bottleneck somewhere.
 
Tag: Without wanting to be picky or anything...

There's one very fundamental difference between the R3xx and Rampage designs for this thing. The Rampage only had one unit per pipeline, while the R3xx got 2.

Doesn't sound significant? Sure, it isn't very significant. But in my eyes, 1 per pipeline is just unbalanced. Remember there was probably even less usage of pixel shaders & multitexturing at the time. One unit per pipeline would probably, and this is just a guesstimate, give 25% of the fillrate hit of SSAA. Which is still considerable.

Although you mentioned some tricks they did for AF. They'd do some type of "performance AF" free in those conditions or something? If so, then obviously, it makes a lot more sense to me. Of course, in our time frame, that type of AF would be unacceptable, but it'd have been an excellent overall design with that.

Also, saying "in stores April 2001" seems optimistic to me. I don't think they ever had Sage working. What proves to us they wouldn't have had more problems with it than with Rampage?
Yes, yes, I'm a pessimist, hehe :p Still, even if they had a one or two month delay (I know I'm being picky here), it'd still have been impressive that they had that in such a time frame.

Although as Xmas says, nVidia uses those 4 units per pipeline for Z rejects too, which is a nifty idea that Rampage doesn't have. But again, ATI found a very ingenious fix to that: Hierarchical Z. They don't need as many Z units with that! :)

Which really makes me think that this idea, in Rampage's case, seems too reliant on an unrealistic wish for developers to quickly move to many-layer multitexturing and complex pixel shading.

Because that trick, even though it's very nifty, with only one unit, simply cannot stand few layers of texturing or no shading. And I'm thinking here that 3DFX thought developers would move quickly to shaders for a lot of things, making one unit loopback perfectly acceptable.
Something that didn't quite happen, as we all know.

Could be wrong, of course. This certainly shows that there were some amazing engineers at 3DFX - but does it mean Rampage would have benefited from that trick as much as the R300? Not at all, IMO. Kinda makes Rampage look like some type of heritage, hehe :)


Uttar
 
I think you're both wrong when it comes to pipelines and Z or ROPs in that case (read your PM Uttar).

Back on topic:

Dave (or anyone else),

Is the R3xx dynamically reconfiguring its Z/stencil units during the loops or not? If I were hard-pressed to guesstimate I'd say no, but can't be certain either.
 
Ailuros said:
It's called "box filtering" and it was first introduced on the Savage 3D. And why wasn't it exciting? It looked just like "true" trilinear as long as your mipmaps were box filtered mipmaps. In other words, it looked great 99% of the time (the 1% being when colored mipmaps were enabled) and was very fast.

In fact, I even made a case for making that the default filter mode for Savage MX because it looked better than bilinear and it came at no performance cost.

With the exception of the fact that it was operative only with TC enabled, was that method any different to what KYRO did?
I can't say, as I have no idea how the KYRO implemented the feature. However, if the KYRO did it the same way as Savage 3D/4, then they'd better be careful of patent infringement lawsuits :)
 
Here's an older statement from Kristof for an old tech-report review:

Rather than reading from 2 pre-generated MipLevels in memory the lower less detailed MipLevel is generated on-the-fly by the chip from the higher more detailed level. So rather than accessing 4 texels in 2 maps one accesses 16 texels in the upper map which allows the generation of the 4 high level and 4 low level values. Now because Quake colors the different MipLevels differently the color band effect fails since KYRO only access 1 miplevel to create the fast trilin effect.

So what you see is trilinear but a fast implementation which accesses only one texture (the high detailed miplevel). Since the lower level is auto generated by assuming that the lower level is generated from the higher level using a 2 by 2 filter. Now in the case of Quake Color Bands the lower levels are different from the higher levels, they do not have the 2 by 2 filter relation and thus the Fast Implementation seems to fail. Real games with real texture contents do use this 2 by 2 pixel filter relation between the MipLevels and thus the fast implementation is technically correct.

As I said only trilinear + TC.
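For what it's worth, the description quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not how KYRO or Savage actually did it: the function names are hypothetical, textures are single-channel lists of lists, and each level is point-sampled instead of bilinearly filtered to keep it short.

```python
def box_downsample(tex):
    """Build the next-lower MIP level by averaging each 2x2 block."""
    h, w = len(tex) // 2, len(tex[0]) // 2
    return [[(tex[2 * y][2 * x] + tex[2 * y][2 * x + 1] +
              tex[2 * y + 1][2 * x] + tex[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def lerp(a, b, t):
    return a + (b - a) * t

def true_trilinear(upper, lower, x, y, frac):
    """Read both stored levels and blend them by the LOD fraction."""
    return lerp(upper[y][x], lower[y // 2][x // 2], frac)

def fast_trilinear(upper, x, y, frac):
    """Read only the upper level; derive the lower one on the fly."""
    lower = box_downsample(upper)
    return lerp(upper[y][x], lower[y // 2][x // 2], frac)

upper = [[0, 4, 8, 12],
         [4, 8, 12, 16],
         [8, 12, 16, 20],
         [12, 16, 20, 24]]

# Case 1: the stored lower level really is the 2x2 box filter of the upper.
stored_lower = box_downsample(upper)
print(true_trilinear(upper, stored_lower, 3, 3, 0.5),
      fast_trilinear(upper, 3, 3, 0.5))      # identical

# Case 2: "colored mipmaps", where the lower level is unrelated to the upper.
tinted_lower = [[255, 255], [255, 255]]
print(true_trilinear(upper, tinted_lower, 3, 3, 0.5),
      fast_trilinear(upper, 3, 3, 0.5))      # they disagree
```

When the stored lower level is the 2x2 box filter of the upper one, both paths give the same result; with tinted levels the on-the-fly version never sees the tint, which is the colored-mipmap failure described above.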
 
Ailuros said:
Is the R3xx dynamically reconfiguring its Z/stencil units during the loops or not? If I were hard-pressed to guesstimate I'd say no, but can't be certain either.
Good question.

First, dynamic allocation need not be occurring: if z/stencil is done first, then pixel data sits in a buffer, and then pixel rendering is performed, then there is no reason for dynamic allocation. Both pipelines will be as full as they are allowed to be. I'd say that's a much simpler way to get things done than any attempt to dynamically manage, say, the connections between multiple pipelines "on the fly." Plus it makes early z-reject almost trivial.

I don't know how the R3xx would do it, though. It might just perform two z checks at the beginning of each rasterization stage, with loopback forced if the number of FSAA samples exceeds the number of clocks it takes to render the pixel.
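A rough sketch of the z-first ordering described above, purely as an illustration: the names are hypothetical, smaller z is assumed to be nearer, and nothing here is claimed to match what the R3xx actually does.

```python
from collections import deque

def render_z_first(fragments, depth_buffer, shade):
    """Two-stage toy pipeline: z test first, then shade only the survivors.

    fragments: iterable of (x, y, z); smaller z is nearer (assumption).
    """
    pending = deque()
    for x, y, z in fragments:              # stage 1: z/stencil
        if z < depth_buffer[y][x]:
            depth_buffer[y][x] = z
            pending.append((x, y, z))      # survivors wait in a buffer
    # stage 2: pixel rendering, which never sees rejected fragments
    return [(x, y, shade(x, y, z)) for x, y, z in pending]

shade_calls = []

def toy_shade(x, y, z):
    """Stand-in for an expensive pixel shader; records each call."""
    shade_calls.append((x, y))
    return 1.0 - z

depth = [[1.0, 1.0]]                       # 2x1 depth buffer cleared to far
frags = [(0, 0, 0.5), (0, 0, 0.8), (1, 0, 0.3)]
result = render_z_first(frags, depth, toy_shade)
print(len(frags), "fragments in,", len(shade_calls), "shaded")
```

The second fragment sits behind the first, so it is dropped before any shading work is spent on it, which is the sense in which the reject comes almost for free with this ordering.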
 