NV40 and programmable tile sizes

Wavey said:
R420 adopts the same type of quad dispatch system, which is how it was easily extended to 4 quads; however, it has been slightly altered to allow for programmable tile sizes, so the load balancing between the pipes can be controlled in a much finer way and potentially altered according to resolution.
Anyone know if NV40 supports a comparable feature?
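For the sake of discussion, here is a minimal C++ sketch of what a programmable screen-space tiling scheme could look like: pixels fall into tiles, each tile is owned by one quad pipe, and the tile size is the load-balancing knob. The tile size, pipe count and interleave pattern here are assumptions for illustration, not documented R420 values.

```cpp
// Minimal sketch: mapping screen pixels to quad pipes via screen-space tiles.
// Tile size, pipe count and interleave are illustrative assumptions only.
#include <cstdio>

// With a programmable tile size, the screen is carved into tiles and each
// tile is owned by one quad pipe in a repeating pattern.
int quad_pipe_for_pixel(int x, int y, int tile_size, int num_quad_pipes)
{
    int tile_x = x / tile_size;
    int tile_y = y / tile_size;
    // Interleave tiles across pipes; a smaller tile_size gives finer-grained
    // load balancing, a larger one gives each pipe bigger contiguous regions.
    return (tile_x + tile_y) % num_quad_pipes;
}

int main()
{
    // Example: 4 quad pipes, 16x16 pixel tiles.
    printf("pixel (100, 37) -> quad pipe %d\n",
           quad_pipe_for_pixel(100, 37, 16, 4));
    return 0;
}
```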
 
From the way I read David Kirk's comments when I asked about this, it seems that NV40 doesn't tile; it just dispatches each quad from a triangle as a quad pipeline becomes available.

That may not actually be correct, but that's certainly how it appeared to be explained to me.
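If that reading is right, the contrast is a dynamic scheme rather than a positional one. A rough sketch of the idea, with the dispatch policy and pipe count as assumptions rather than anything confirmed about NV40:

```cpp
// Minimal sketch of the alternative described above: no fixed screen tiling,
// quads from a triangle go to whichever quad pipeline is free. The queue
// discipline and pipe count are assumptions for illustration only.
#include <cstdio>
#include <queue>
#include <vector>

struct Quad { int x, y; };  // 2x2 pixel block produced by rasterisation

int main()
{
    const int kNumQuadPipes = 4;
    std::vector<std::queue<Quad>> pipe_queue(kNumQuadPipes);

    // Pretend the rasteriser emitted these quads for one triangle.
    std::vector<Quad> quads = {{0,0},{2,0},{4,0},{0,2},{2,2},{6,0},{4,2}};

    // Round-robin stands in for "dispatch to the next available pipe":
    // a quad's screen position does not determine which pipe processes it.
    int next = 0;
    for (const Quad& q : quads) {
        pipe_queue[next].push(q);
        next = (next + 1) % kNumQuadPipes;
    }

    for (int p = 0; p < kNumQuadPipes; ++p)
        printf("pipe %d got %zu quads\n", p, pipe_queue[p].size());
    return 0;
}
```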
 
Interesting...I recall reading something to that effect in your NV40 preview.

Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?
 
FWIW, I don't know about NV40, but you could change the tile width on NV2A, and primitives were rendered tile by tile.
 
Luminescent said:
Interesting...I recall reading something to that effect in your NV40 preview. Any possible advantages/disadvantages to NV40's or R420's approach?

Well, it's clear that ATI's method is how they are able to distribute processing across multiple chips.
 
Luminescent said:
Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?
Given that the R420 seems to have significantly higher pure fillrate than the NV40, and yet the NV40 tends to do very well by comparison in actual games, I would tend to think that nVidia's way is more efficient.
 
It seems to me that this could affect how the memory controller(s) are organized, i.e. does each quad pipe need access to all portions of memory, or just to its own tiles?
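To make that question concrete, here is a hypothetical sketch: with a fixed screen tiling, the pipe that owns a pixel is a pure function of its coordinates, so that pipe's colour/Z traffic always lands in a predictable subset of the framebuffer; with free dispatch, any pipe may touch any pixel, so every pipe needs a path to all of memory. All constants and the layout below are invented for illustration, not real NV40/R420 parameters.

```cpp
// Hypothetical illustration of the memory-controller question above.
// All constants and the framebuffer layout are invented for illustration.
#include <cstdint>
#include <cstdio>

constexpr int kTileSize      = 16;   // assumed screen tile size in pixels
constexpr int kNumQuadPipes  = 4;    // assumed number of quad pipes
constexpr int kScreenWidth   = 1024; // assumed render-target width
constexpr int kBytesPerPixel = 4;

// Which quad pipe owns a pixel under static tiling.
int owning_pipe(int x, int y)
{
    return ((x / kTileSize) + (y / kTileSize)) % kNumQuadPipes;
}

// Linear framebuffer address of a pixel (simple row-major layout assumed).
uint64_t fb_address(int x, int y)
{
    return (uint64_t(y) * kScreenWidth + x) * kBytesPerPixel;
}

int main()
{
    // Under free dispatch no such relation exists, so every pipe must be
    // able to reach every framebuffer address (and every memory channel).
    int x = 40, y = 70;
    printf("pixel (%d,%d): pipe %d, fb address 0x%llx\n",
           x, y, owning_pipe(x, y),
           (unsigned long long)fb_address(x, y));
    return 0;
}
```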
 
Chalnoth said:
Luminescent said:
Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?
Given that the R420 seems to have significantly higher pure fillrate than the NV40, and yet the NV40 tends to do very well by comparison in actual games, I would tend to think that nVidia's way is more efficient.
Why am I not surprised that you think that NVIDIA is better? We don't even have confirmation, yet you're (again) ready to claim NVIDIA is superior. Did it ever occur to you that there are other limiting factors? Did it occur to you that the bandwidth is very similar on the boards?

Thought not.

-FUDie
 
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?
 
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?

I'm not clear about what you are referring to when you mention "certain optimisations".

What kind of optimisations are we talking about?
Bandwidth saving? Fillrate saving? Static clip planes?

Your generalisation is correct according to common sense, but I'm sure people would love it if you expanded your explanation a little.
 
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?
Of course, but I feel it's a bit premature to claim that "nvidia's method is superior" given that we haven't even seen independent support for the claims made.

-FUDie
 
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?

Combine that with Wavey's bandwidth comment, and for me personally this case is closed 8)
 
Luminescent said:
Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?

I'm not sure how the R420 works exactly, but on the R300 it is pre-determined (although it might be configurable) which quad pipeline can process which screen pixels.
In other words, the screen (RT) is divided amongst the pipelines.

The advantage is that if you have large enough contiguous areas (tiles) assigned to a single quad pipe, then you can get away with not sharing the texture cache between the pipes, which results in a much simpler architecture, fewer transistors, etc.

With the NV40 method you have to have a shared texture cache, hence the two-level texture caching approach.
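To illustrate the distinction (with invented structures and sizes, not the actual R300/NV40 cache designs): a pipe that only ever shades its own contiguous tiles can keep a fully private texture cache, while free dispatch pushes you towards small per-pipe caches backed by a shared level.

```cpp
// Rough sketch of the two cache arrangements described above. The structures
// and sizes are invented for illustration only.
#include <cstdint>
#include <unordered_map>

using TexelBlock = uint64_t;  // stand-in for a cached block of texels

// Per-pipe private cache: workable when a pipe only ever shades pixels from
// its own contiguous screen tiles, so nearby texture fetches stay local to it.
struct PrivateTextureCache {
    std::unordered_map<uint64_t, TexelBlock> lines;
    bool lookup(uint64_t addr, TexelBlock& out) const {
        auto it = lines.find(addr);
        if (it == lines.end()) return false;  // miss -> go to memory
        out = it->second;
        return true;
    }
};

// Shared arrangement: small per-quad L1s backed by one L2 visible to all
// pipes, needed when any pipe can be handed any quad on screen.
struct TwoLevelTextureCache {
    std::unordered_map<uint64_t, TexelBlock> l1[4];  // one per quad pipe
    std::unordered_map<uint64_t, TexelBlock> l2;     // shared by all pipes
    bool lookup(int pipe, uint64_t addr, TexelBlock& out) {
        auto hit1 = l1[pipe].find(addr);
        if (hit1 != l1[pipe].end()) { out = hit1->second; return true; }
        auto hit2 = l2.find(addr);
        if (hit2 != l2.end()) {               // fill the L1 from the L2
            l1[pipe][addr] = hit2->second;
            out = hit2->second;
            return true;
        }
        return false;                         // miss -> go to memory
    }
};

int main() { return 0; }  // structures only; no simulation here
```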
 
Ailuros said:
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?

Combine that with Wavey's bandwidth comment and for me personally this case is closed 8)


I don't get it.
 
I don't get it.
Me neither. The X800 XT has more bandwidth and greater fillrate, makes better use of compressed textures, has more levels of occlusion detection, and is much more straightforward to tune (using more specialized units and based on a well-established architecture), and yet it still loses in more benches than you would think. AF filtering seems to be its real strength. Any idea why the NV40 is making such a good account of itself?
 
It seems NV40 has a smart hardware shader scheduler and a very balanced set of ALUs. The lack of a dedicated texture addressing unit in its pixel pipelines holds it back; however, its ability to keep up with the X800 speaks to how well it load balances, particularly in ALU-intensive tasks.
 