Xmas said:
I did not say black & white pixels, but pixels without color data. And no, I don't think "ops" is more accurate, because it's actually multiple ops. And "pixels" at least implies that multiple samples are grouped together with MSAA enabled, "ops" doesn't say anything about that.
The most correct term would perhaps be fragment. But if you insist that only pixels shown on screen are real pixels, then what word would you use for the content of rendertargets that never get shown on screen?
I sure wouldn't call them pixels. I'd probably call them SPEs, for Sub-Pixel-Elements, or something similar...
Here's the way I see it:
"ops" is the plural for "op" which is an abbreviation of the word "operation," and none of these is congruent in meaning with the word "pixel." Thus, to say "multiple ops" is redundant, right?--since that's what you mean when you say "ops." A "pixel" is the smallest element rendered to screen. There are no smaller subdivisions of elements rendered to screen below the pixel.
The purpose of a pixel pipeline is to render the pixels we see on screen. Everything else that goes on in the pipeline during the creation of that pixel--all of the "operations" done on it pre-render--is part of the pixel-creation process, a process with only one goal: rendering a final color pixel to screen. A pixel pipeline can render a maximum of one completed pixel per clock, which is why we need 16 physical pixel pipelines working in parallel to render 16 pixels per clock. If you want 32 pixels per clock, you need 32 physical pipelines, and so on. All of this is "3d 101," right?
The number of "ops" that can be applied to each pixel in the pre-render stage is not dependent on the number of pixel pipelines present in a gpu, but is dependent on other architectural hardware within the gpu relative to each of the pixel pipelines. Thus some gpus may be able to perform more "ops" on each pixel prior to rendering them than another, but the maximum amount of pixels rendered per clock is still capped in both architectures by the number of physical pixel pipes in the gpu. For instance, the specification ATi publishes as to the 16-pixel-pipe x800 xt PE is 8.3Gigapixels, and since ATi publishes the MHz clock at 520:
(Go to http://apps.ati.com/ATIcompare/, select the XT PE, and hit the "next" button to see the specs.)
...so we can conclude that at full speed the XT PE is capable of an absolute maximum of 16 pixels per clock (520MHz x 16 pipes = 8.32 Gpixels/sec).
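To make that arithmetic explicit, here's a minimal back-of-the-envelope sketch in Python, assuming only the published figures quoted above (520MHz core clock, 16 pipes, at most one finished pixel per pipe per clock):

Code:
# Back-of-the-envelope check of the peak fill rate quoted above.
# Assumes the published figures: 16 pixel pipes at a 520MHz core clock.

CORE_CLOCK_HZ = 520_000_000    # 520MHz, per ATi's published spec
PIXEL_PIPES = 16               # physical pixel pipelines in the X800 XT PE
PIXELS_PER_PIPE_PER_CLOCK = 1  # each pipe finishes at most one pixel per clock

peak_fill_rate = CORE_CLOCK_HZ * PIXEL_PIPES * PIXELS_PER_PIPE_PER_CLOCK
print(f"Peak pixel fill rate: {peak_fill_rate / 1e9:.2f} Gpixels/sec")
# -> Peak pixel fill rate: 8.32 Gpixels/sec, matching the published 8.3 Gigapixel figure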
Here is what ATi said in a recent TR interview when asked about rendering two pixels per clock per pipe:
http://techreport.com/etc/2004q2/nalasco/index.x?pg=1
TechReport interview with Nalasco said:
TR: One of the presentations about the X800 said each pixel pipe can produce two pixels per clock, much like the NV30 and NV40 chips can, provided that multisampling antialiasing is enabled. How does this work?
Nalasco: One of the capabilities of our chip is that we have, at the end of our pipeline, the capability to do Z testing and stencil testing, and each pipeline has its own dedicated hardware to do this. When you're doing multisample AA, you're actually testing the Z values and stencil values at multiple locations within a pixel. We built the capability into our hardware that if you have more than one sample per pixel, we can actually do more Z or stencil operations per clock than we can when you're just doing a single test per pixel. The maximum number we can do is two per pipeline in a single clock. Even with the lowest AA setting, which is 2X, we're able to achieve this maximum of 32 Z or stencil operations per clock.
One advantage, I think, that we have in this capability over the GeForce 6800 series is that we can expose this capability even when color writes are enabled, so it's not limited to situations where you're doing Z-only or stencil-only reads and writes. The relative limitation is that we can't make use of this feature when AA is not enabled. However, the belief was that for this class of card, there should be very few cases where there would be any reason not to use antialiasing, and therefore, this wasn't really that big of a limitation.
Note that although the TR question inaccurately asserts "two pixels per clock per pipe" for both the ATi and nVidia products mentioned (TR should have stated, "My impression of an x800 presentation I saw was that..."), Nalasco politely overlooks that erroneous assertion and provides an answer that nowhere states the x800 does "two pixels per clock per pipe." Instead, the meat of Nalasco's answer is here:
We built the capability into our hardware that if you have more than one sample per pixel, we can actually do more Z or stencil operations per clock than we can when you're just doing a single test per pixel. The maximum number we can do is two per pipeline in a single clock. Even with the lowest AA setting, which is 2X, we're able to achieve this maximum of 32 Z or stencil operations per clock.
So what is it Nalasco actually states can be done in pairs per clock in each pipeline? Is it the production of two pixels? Nope--it's "Z or stencil operations," not pixels, when you're doing more than one sample per pixel--and that brings us squarely back to "ops," doesn't it?
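To put that distinction in concrete numbers, here's a small illustrative sketch in Python. The pipe count and op rates come straight from the quote above; the function name is just mine for illustration:

Code:
# Illustrative sketch of the distinction being drawn here, using the figures
# from the Nalasco quote: 16 pipes, at most 1 finished pixel per pipe per clock,
# but up to 2 Z/stencil operations per pipe per clock when MSAA is enabled.

PIXEL_PIPES = 16
MAX_PIXELS_PER_PIPE_PER_CLOCK = 1          # the hard cap on finished pixels
MAX_Z_STENCIL_OPS_PER_PIPE_PER_CLOCK = 2   # only reachable with more than one sample per pixel

def per_clock_rates(msaa_enabled: bool) -> dict:
    """Per-clock pixel output vs. Z/stencil op throughput (illustrative only)."""
    z_ops_per_pipe = MAX_Z_STENCIL_OPS_PER_PIPE_PER_CLOCK if msaa_enabled else 1
    return {
        "pixels_per_clock": PIXEL_PIPES * MAX_PIXELS_PER_PIPE_PER_CLOCK,  # always 16
        "z_stencil_ops_per_clock": PIXEL_PIPES * z_ops_per_pipe,          # 32 with AA, 16 without
    }

print(per_clock_rates(msaa_enabled=True))   # {'pixels_per_clock': 16, 'z_stencil_ops_per_clock': 32}
print(per_clock_rates(msaa_enabled=False))  # {'pixels_per_clock': 16, 'z_stencil_ops_per_clock': 16}

Either way, the pixel output per clock never budges--only the Z/stencil op throughput does.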
Congruent with our conversation, the following things are not pixels:
Z operations
Stencil operations
fragments
sub-pixel elements, and so on.
In short, lots of things are done inside a pixel pipeline in the process of creating the final pixel--which is the sole purpose of a pixel pipeline. Among those things--operations--are Z and stencil operations, which contribute to the final output of each pixel pipe: a maximum of one pixel per clock.
Sometimes, depending on the ops that need to be done on a pixel as it is being created in the pipeline, a pixel pipeline may deliver less than one pixel per clock (a pixel every two clocks, say), but my point here is that at no time will a pixel pipeline ever render more than one final pixel per clock.
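As a rough illustration of that last point, here's a sketch in Python using the same assumed 520MHz clock and 16 pipes as before (the clocks-per-pixel figures are arbitrary examples, not measurements)--the effective rate only ever moves down from the one-pixel-per-clock peak:

Code:
# Rough sketch of the point above: a pipe can take multiple clocks per pixel,
# which lowers effective output, but it can never exceed one pixel per clock.
# The clocks-per-pixel values below are made-up examples, not measured numbers.

CORE_CLOCK_HZ = 520_000_000
PIXEL_PIPES = 16

def effective_fill_rate(clocks_per_pixel: float) -> float:
    """Whole-chip pixels/sec when each pixel takes `clocks_per_pixel` clocks to finish."""
    clocks = max(clocks_per_pixel, 1.0)  # a pipe never finishes more than one pixel per clock
    return CORE_CLOCK_HZ * PIXEL_PIPES / clocks

print(f"{effective_fill_rate(1) / 1e9:.2f} Gpixels/sec")  # 8.32 -- the theoretical peak
print(f"{effective_fill_rate(2) / 1e9:.2f} Gpixels/sec")  # 4.16 -- one pixel every two clocks
print(f"{effective_fill_rate(8) / 1e9:.2f} Gpixels/sec")  # 1.04 -- a longer per-pixel workload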
Maybe we're just debating semantics here, I don't know. But I have to state, yet again, that in my opinion it is exceedingly unwise and misleading to equate things like the Z and stencil operations performed during pixel creation with the rendered pixels themselves. Since color and/or textures are also applied to pixels during their pre-render creation, calling a Z op or a stencil op a "pixel" strikes me as no more accurate than calling a texture or a color a "pixel." Seems pretty straightforward to me, and always has...