Opinions needed on this Interview

No, the setup engine sets up triangles (that's its job), it doesn't determine any visibility. The Hier-Z is part of each of the quads (there is one per quad) and it's effectively the first part of the render process.

However, once the triangle has been set up, the final pixels it will be rendered to are known, so at this point it is known whether the triangle has to be issued to one or more quads (each of which will determine the visibility of the elements of the triangle it's rendering).
 
jb said:
Well take a look at this one:
http://3dcenter.org/artikel/2005/03-31_a_english.php

I know that all of NV's interviews are scrubbed by their PR department. And you can see that in some of what you had. Now please, I am not saying it's bad, wrong, evil, etc. Just it is what it is. If you look at that above ATI interview there seems to be a lot less PR influence. Some have noticed this trend for a while now....

Don't think that's a valid comparison. The types of interview questions are completely different. Josh's questions dealt with the implementations and advantages of specific features and comparisons to the competition. That 3DCenter interview resembled a water-cooler chat more than anything else, with no specifics about architecture or competing implementations.
 
Yes, thanks for that Dave; the Hierarchical-Z and the interpolation are part of the quad pipeline. Page 8 of the whitepaper, right under my nose.

Why are those black squares, generated by a card with an apparently faulty quad, 16x16 pixels in size and not 2x2? It sure looks like an X800Pro with one bad quad "unlocked":

http://home.insightbb.com/~t.mccoy/wsb/html/view.cgi-image.html--graphic.html

:D

I'll decide, tomorrow, whether to bother describing my ideas on tile-size trade-offs, specifically per-quad texture cache wastage, setup engine triangle-issue thrashing and geometry buffer overheads.

The short version is that the geometry buffer overhead is the least pressing component :D

Jawed
 
Xmas said:
Without reuse, how much would a cache buy you? Bilinear filtering means most texels are used four times. There is massive reuse, especially when magnifying textures. The cache doesn't need to be big, because the rendering process is optimized to take advantage of locality.
Even without reuse a small cache is still useful for hiding latency. I guess I wasn't considering bilinear filtering at a nearly 1:1 texel-to-pixel ratio to be reuse, even though in hindsight it obviously is. If magnification is happening then there is even more reuse, as you point out. I was thinking more along the lines of discounting the original bilinear filtering and asking whether that part of the texture gets reused by future triangles. I would think the cache doesn't need to be very big to handle most bilinear filtering cases, but there is obviously some benefit in a larger cache if NV40 uses an L2. It seems like I'm rambling so I'm not sure if that made any sense.
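
For what it's worth, here's the kind of back-of-the-envelope number Xmas is getting at. A tiny Python sketch (the 8x8 block and the 1:1 mapping are just illustrative assumptions, nothing measured from any real chip):

block = 8                      # pixels on a side, mapped 1:1 onto the texture
samples = block * block * 4    # four texel reads per bilinear-filtered pixel
unique = (block + 1) ** 2      # distinct texels actually touched (a 9x9 grid)

print(samples, unique, round(samples / unique, 1))   # 256 81 3.2

256 fetches against only 81 unique texels, so even with no magnification each texel gets read a bit over three times on average - which is exactly the traffic a very small cache can absorb.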
 
Jawed said:
Why are those black squares, generated by a card with an apparently faulty quad, 16x16 pixels in size and not 2x2? It sure looks like an X800Pro with one bad quad "unlocked":

http://home.insightbb.com/~t.mccoy/wsb/html/view.cgi-image.html--graphic.html

:D

In fact that pic would support the

Q1Q2Q1Q2
Q3Q4Q3Q4
Q1Q2Q1Q2
Q3Q4Q3Q4

idea rather than the

Q1Q2Q3Q4
Q1Q2Q3Q4
Q1Q2Q3Q4
Q1Q2Q3Q4

solution you were suggesting.

Notice how between those black squares there is an identically sized, correctly rendered square, and that on every other row there are no black squares at all. Probably because the defective quad isn't rendering in that row.
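
Purely as a toy illustration (my guess at the mapping, not anything ATI has confirmed), here's a little Python snippet that marks which 16x16 tiles would go black if one quad (say Q4) were dead, under the two arrangements being argued about:

def owner_checkerboard(tx, ty):
    # Q1Q2 / Q3Q4 repeating in both directions
    return 1 + (tx % 2) + 2 * (ty % 2)

def owner_columns(tx, ty):
    # Q1Q2Q3Q4 repeating across the row, identical on every row
    return 1 + (tx % 4)

def show_dead(owner, dead_quad=4, tiles_x=8, tiles_y=4):
    for ty in range(tiles_y):
        print("".join("X" if owner(tx, ty) == dead_quad else "." for tx in range(tiles_x)))

show_dead(owner_checkerboard)   # black tiles only on alternating rows, good tiles in between
print()
show_dead(owner_columns)        # a black tile column repeating on every row

The checkerboard version gives exactly the pattern in that screenshot: black squares separated by good squares, and every other tile row completely clean.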
 
I might have missed it in the article, but what was NVIDIA's response to 4 chips being supported in drivers/hardware, i.e. 2 cards each with 2 GPUs on them?
 
Mendel - how big are those black squares?

What I'm suggesting is that a tile, which is 16x16 pixels in size, is assigned to a single quad. Throughout the lifetime of rendering the frame, only that quad will write pixels there.

The squares can only be black because they've been written by a single quad (though I'm at a loss to explain why only the sky shows the fault! - could it be a fog unit fault in the quad?).

The actual quad-tile assignment ordering isn't material to this discussion. What is material is that regardless of the number of graphics cards rendering a frame, it's possible to identify precisely which quad (and therefore which graphics card) will draw every tile. This doesn't change from frame to frame. It's what enables super-tiling (which I propose uses the same tile organisation scheme as single-card tiling) to operate independently on both graphics cards.

When both graphics cards come to the same triangle (let's say it covers 7 tiles for the sake of argument) the setup engine in each graphics card knows which of its tiles the triangle falls upon, so it will only issue the triangle to those tiles' owning quads. Similarly the setup engine knows that certain tiles are off-limits, so it will never try to issue a triangle to those tiles. The other graphics card will handle those tiles.
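
A hedged Python sketch of what I mean (toy numbers and a toy 50/50 checkerboard split between the boards, not ATI's actual logic):

TILE = 16  # assumed tile size in pixels

def tiles_covered(bbox):
    # tiles overlapped by a triangle's screen-space bounding box (x0, y0, x1, y1)
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // TILE, y1 // TILE + 1):
        for tx in range(x0 // TILE, x1 // TILE + 1):
            yield tx, ty

def owning_board(tx, ty):
    # toy split of tiles between the two boards
    return (tx + ty) % 2

def issue(triangle_bbox, this_board):
    # only tiles owned by this board are ever issued; the other board's setup
    # engine makes the mirror-image decision without any communication
    return [t for t in tiles_covered(triangle_bbox) if owning_board(*t) == this_board]

print(issue((10, 10, 100, 40), 0))
print(issue((10, 10, 100, 40), 1))

Both boards see the same triangle, both run the same test, and between them every covered tile gets issued exactly once - no inter-board chatter needed.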

If it makes you happy, here's a revised 7-quad tiling, with that 7-tile triangle - note the X800 Pro owns 4 of the tiles :)

b3d07.gif


So the setup engine on the X800 Pro issues the triangle to four tiles:

- Red 1
- Orange 1
- Yellow 1
- Red 2

And the X800XT's setup engine issues the triangle to these three tiles:

- Green 1
- Blue 1
- Violet 1

This is clearly a pathological case where the X800 Pro is doing much more work for this triangle than the X800XT. That's how the cookie crumbled...

An odd number of quads makes for an awkward-looking tiling however you order it. You've got a similarly awkward tiling in a single X800 Pro :) Obviously multi-card tiling is neatest when you've got two identical cards...

Jawed
 
I think there has been some confusion here - your initial diagrams didn't make it clear that each individual square was a pixel; when I've been denoting "Q1" I'm talking about a tile that is assigned to a quad, not a single quad.

I would also suggest that the granularity of the "SuperTiles" in a multiboard rendering scheme is nowhere near as fine as your suggestion. You are looking at a tile-level distribution, but it'll be dealing with meta-tiles (groups).
 
DaveBaumann said:
I would also suggest that the granularity of the "SuperTiles" in a multiboard rendering scheme is nowhere near as fine as your suggestion. You are looking at a tile-level distribution, but it'll be dealing with meta-tiles (groups).

And in such a situation, how would you arrange the tiles and the meta-tiles in the case of a 3-quad card working together with a 4-quad card?

edit: Is there any other way than to disable the fourth quad in the other card?
 
3dcgi said:
I would think the cache doesn't need to be very big to handle most bilinear filtering cases, but there is obviously some benefit in a larger cache if NV40 uses an L2.
The L2 cache isn't large, the L1 cache is just very small. ;)
The L2 cache isn't large enough to handle texture repeat cases, or cases where the same texture is used again on another triangle later on rather than immediately. Well, at least for common texture sizes. However, since NVIDIA isn't assigning larger tiles (e.g. 16x16) to quad pipelines, neighboring quads are rendered by different pipelines. That means that at the quad edge, both pipelines need to fetch some of the same texels. And as a quad is so small, this happens very often. That's what the L2 cache is for.
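
A rough illustration of how much overlap that is (toy numbers assuming 1:1 bilinear, not measured NV40 behaviour): a 2x2 pixel quad touches a 3x3 texel footprint, and the horizontally adjacent quad shares a whole 3-texel column of it.

quad_footprint = 3 * 3   # texels touched by one 2x2 quad at 1:1 bilinear
shared_edge = 3          # texel column shared with the neighboring quad

print(round(shared_edge / quad_footprint, 2))   # 0.33

So roughly a third of each quad's texel footprint gets fetched again by its neighbor; if that neighbor runs on a different pipeline with its own tiny L1, a shared L2 is what stops those texels going all the way back out to memory.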
 
In regards to jb, I think this is sort of a case of apples and oranges between those two interviews. Chris didn't do any trash talking, and I really didn't get the impression of a lot of FUD in there. I am pretty sure that NVIDIA did look into a supertiling type of rendering scheme, but the reasons Chris gave for them not pursuing it were reasonable. I think that since ATI has a different architecture, supertiling will probably be one of the more effective schemes for them (other than their own version of AFR, which you know they will support).

Now, in the past I have done little fireside chats with Mike Hara of NVIDIA, and he has always been amazingly honest. Definitely a "what you see is what you get" kind of guy. His interviews are done outside of PR, and since he is the head of investor relations, he can pretty much call those shots and not be overruled by PR. So, I can see where you are coming from, but I think that taking a general stance of "NVIDIA PR-washes everything coming out of there" is not entirely correct.

Now, saying that, I wish that NVIDIA allowed their engineers to engage in discussions like SirEric, Terry, and the rest of them do. Though oddly enough, I haven't seen them around for a while? Is this because of issues with this particular board, or has ATI also cut down their ability to respond and interact with the community?
 
JoshMST said:
In regards to jb, I think this is sort of a case of apples and oranges between those two interviews.

(and trinibwoy)

Guys, I just grabbed the last ATI interview I had a link to. I am sure you guys are just as good with Google as I am, so you can find as many more as you feel are needed to either see what I was talking about or prove me wrong. Again, I am not saying that you won't find a PR interview by ATI and a more "water cooler chat" with NV, as I am sure there are some. Just it's a very, very well-known fact that NV PR has much tighter control over what is said in interviews than ATI PR does.
 
I see. Not trying to ruffle any feathers, jb.

I must admit though, I would very much like to conduct an ATI interview and see if there are any differences in perception between the two. Each company does have its own culture, and NVIDIA does seem very much more "in control". As Rev said, when guys like David Kirk can't really talk to the press before going out through PR, that does sort of show how each company responds to these requests.

Hell, if I had to worry about things like intellectual property and the SEC, I would clamp down on this stuff as much as possible!
 
Mendel said:
DaveBaumann said:
I would also suggest that the granularity of the "SuperTiles" in a multiboard rendering scheme is nowhere near as fine as your suggestion. You are looking at a tile-level distribution, but it'll be dealing with meta-tiles (groups).

And in such a situation, how would you arrange the tiles and the meta-tiles in the case of a 3-quad card working together with a 4-quad card?

edit: Is there any other way than to disable the fourth quad in the other card?
Wondered that too, and assuming Jawed is right about the 3-quad card's allocation of tiles, meta-tiles could be, for example, 4x3 or 6x6 tiles.
I guess in that case no disabling would be necessary.
 
It's just occurred to me that ATI's super-tiling method should work quite well with render to texture.

I'm thinking that each card can render its tiles on the render target independently, since the tiling pattern means the cards don't have to communicate to share the load.

At the end of the render to texture the two cards swap tiles so that they each have a complete copy of the texture. Obviously the texture tile-swap is an overhead.
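
To put a rough (and entirely made-up) number on that overhead: for a 512x512 RGBA8 render target split 50/50, each card only has to ship its half across.

width, height, bpp = 512, 512, 4      # assumed render-to-texture target, RGBA8
total_bytes = width * height * bpp    # 1 MiB for the whole texture
swap_bytes = total_bytes // 2         # each card sends the half it rendered

print(swap_bytes // 1024, "KiB per card per pass")   # 512 KiB

Half a megabyte each way per pass doesn't sound crippling, but it does have to finish before either card can sample the completed texture.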

Does this sound practical? Would render to texture benefit in MVP (or at least be compatible with it), whereas it seems to be a bone of contention in SLI?:

http://www.beyond3d.com/reviews/nvidia/sli/index.php?p=06

Jawed
 
UPO said:
Mendel said:
DaveBaumann said:
I would also suggest that the granularity of the "SuperTiles" in a multiboard rendering scheme is nowhere near as fine as your suggestion. You are looking at a tile-level distribution, but it'll be dealing with meta-tiles (groups).

And in such a situation, how would you arrange the tiles and the meta-tiles in the case of a 3-quad card working together with a 4-quad card?

edit: Is there any other way than to disable the fourth quad in the other card?
Wondered that too, and assuming Jawed is right about the 3-quad card's allocation of tiles, meta-tiles could be, for example, 4x3 or 6x6 tiles.
I guess in that case no disabling would be necessary.

Eh... try to make a picture where there are 4 quads allocated to a grid.

like
Q1Q2Q1Q2Q1Q2Q1Q2
Q3Q4Q3Q4Q3Q4Q3Q4
Q1Q2Q1Q2Q1Q2Q1Q2
Q3Q4Q3Q4Q3Q4Q3Q4
Q1Q2Q1Q2Q1Q2Q1Q2
Q3Q4Q3Q4Q3Q4Q3Q4
Q1Q2Q1Q2Q1Q2Q1Q2
Q3Q4Q3Q4Q3Q4Q3Q4

then try to draw similarly sized rectangles where 4x3 or 6x6 groups of tiles get selected inside that bigger rectangle... and then try to make it so that every other big rectangle (or meta-tile) has "Q4 tiles" and every other one doesn't.

I just can't see how that can be done!
for example 4x3...
X800 Pro + X800 XT

board 1 | board 2 (boundaries drawn between each meta-tile)
___________________
|Q1Q2Q1Q2|Q1Q2Q1Q2|
|Q3Q4Q3Q4|Q3Q4Q3Q4|
|Q1Q2Q1Q2|Q1Q2Q1Q2|
-------------------
|Q3Q4Q3Q4|Q3Q4Q3Q4|
|Q1Q2Q1Q2|Q1Q2Q1Q2|
|Q3Q4Q3Q4|Q3Q4Q3Q4|
-------------------
|Q1Q2Q1Q2|Q1Q2Q1Q2|
|Q3Q4Q3Q4|Q3Q4Q3Q4|
|Q1Q2Q1Q2|Q1Q2Q1Q2|
-------------------

Nope. Both boards get "Q4 tiles"; the X800 Pro would render black squares or errors there.
 
If a meta-tile's x and y dimensions, measured in board sub-tiles, are divisible by 6 (6, 18, 24, etc.) then you can fit 3-quad boards in alongside others using the distribution:

Q1Q2Q2
Q3Q1Q3

or similar.

So in a 3-quad / 4-quad arrangement a meta-tile could be along the lines of:

Q1Q2Q2Q1Q1Q2
Q3Q1Q3Q3Q3Q1
Q1Q2Q2Q1Q1Q2
Q3Q1Q3Q3Q3Q1
Q1Q2Q2Q1Q1Q2
Q3Q1Q3Q3Q3Q1

for the 3-quad board, and

Q1Q2Q2Q1Q1Q2
Q3Q4Q3Q4Q3Q4
Q1Q2Q2Q1Q1Q2
Q3Q4Q3Q4Q3Q4
Q1Q2Q2Q1Q1Q2
Q3Q4Q3Q4Q3Q4

for the 4-quad board. The 4-quad board would need to render more meta-tiles, though.

I don't know if this is how they would want to operate, even if they did decide to make two boards of different performance work in conjunction.
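
One easy way to sanity-check hand-drawn layouts like these is just to count how many tiles each quad ends up owning. A quick Python snippet (the layouts are simply pasted in from above as strings):

from collections import Counter
import re

three_quad = """
Q1Q2Q2Q1Q1Q2
Q3Q1Q3Q3Q3Q1
Q1Q2Q2Q1Q1Q2
Q3Q1Q3Q3Q3Q1
Q1Q2Q2Q1Q1Q2
Q3Q1Q3Q3Q3Q1
"""

four_quad = """
Q1Q2Q2Q1Q1Q2
Q3Q4Q3Q4Q3Q4
Q1Q2Q2Q1Q1Q2
Q3Q4Q3Q4Q3Q4
Q1Q2Q2Q1Q1Q2
Q3Q4Q3Q4Q3Q4
"""

for name, layout in (("3-quad", three_quad), ("4-quad", four_quad)):
    counts = Counter(re.findall(r"Q\d", layout))
    print(name, dict(sorted(counts.items())))

Run against the patterns above it prints 9 tiles for every quad on the 4-quad layout, but 15/9/12 for Q1/Q2/Q3 on the 3-quad one, so a counter like this makes it easy to tweak a pattern until the load comes out even.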
 