NV40 and programmable tile sizes

Wavey said:
R420 adopts the same type of quad dispatch system, which is how it was easily extended to 4 quads; however, it has been slightly altered to allow for programmable tile sizes, so the load balancing between the pipes can be controlled in a much finer way and potentially altered according to resolution.
Anyone know if NV40 supports a comparable feature?
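For the sake of discussion, here is a minimal C++ sketch of what a programmable screen-space tiling scheme could look like: pixels fall into tiles, each tile is owned by one quad pipe, and the tile size is the load-balancing knob. The tile size, pipe count and interleave pattern here are assumptions for illustration, not documented R420 values.

```cpp
// Minimal sketch: mapping screen pixels to quad pipes via screen-space tiles.
// Tile size, pipe count and interleave are illustrative assumptions only.
#include <cstdio>

// With a programmable tile size, the screen is carved into tiles and each
// tile is owned by one quad pipe in a repeating pattern.
int quad_pipe_for_pixel(int x, int y, int tile_size, int num_quad_pipes)
{
    int tile_x = x / tile_size;
    int tile_y = y / tile_size;
    // Interleave tiles across pipes; a smaller tile_size gives finer-grained
    // load balancing, a larger one gives each pipe bigger contiguous regions.
    return (tile_x + tile_y) % num_quad_pipes;
}

int main()
{
    // Example: 4 quad pipes, 16x16 pixel tiles.
    printf("pixel (100, 37) -> quad pipe %d\n",
           quad_pipe_for_pixel(100, 37, 16, 4));
    return 0;
}
```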
 
From the way I read David Kirk's comments when I asked about this, it seems that NV40 doesn't tile; it just dispatches each quad from a triangle as a quad pipeline becomes available.

That may not actually be correct, but that's certainly how it appeared to be explained to me.
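If that reading is right, the contrast is a dynamic scheme rather than a positional one. A rough sketch of the idea, with the dispatch policy and pipe count as assumptions rather than anything confirmed about NV40:

```cpp
// Minimal sketch of the alternative described above: no fixed screen tiling,
// quads from a triangle go to whichever quad pipeline is free. The queue
// discipline and pipe count are assumptions for illustration only.
#include <cstdio>
#include <queue>
#include <vector>

struct Quad { int x, y; };  // 2x2 pixel block produced by rasterisation

int main()
{
    const int kNumQuadPipes = 4;
    std::vector<std::queue<Quad>> pipe_queue(kNumQuadPipes);

    // Pretend the rasteriser emitted these quads for one triangle.
    std::vector<Quad> quads = {{0,0},{2,0},{4,0},{0,2},{2,2},{6,0},{4,2}};

    // Round-robin stands in for "dispatch to the next available pipe":
    // a quad's screen position does not determine which pipe processes it.
    int next = 0;
    for (const Quad& q : quads) {
        pipe_queue[next].push(q);
        next = (next + 1) % kNumQuadPipes;
    }

    for (int p = 0; p < kNumQuadPipes; ++p)
        printf("pipe %d got %zu quads\n", p, pipe_queue[p].size());
    return 0;
}
```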
 
Interesting...I recall reading something to that effect in your NV40 preview.

Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?
 
FWIW, I don't know about NV40, but you could change the tile width on NV2A, and primitives were rendered tile by tile.
 
Luminescent said:
Interesting...I recall reading something to that effect in your NV40 preview. Any possible advantages/disadvantages to NV40's or R420's approach?

Well, it's clear that ATI's method is how they are able to distribute processing across multiple chips.
 
Luminescent said:
Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?
Given that the R420 seems to have significantly higher pure fillrate than the NV40, and yet the NV40 tends to do very well by comparison in actual games, I would tend to think that nVidia's way is more efficient.
 
It seems to me that this could affect how the memory controller(s) are organized, i.e. does each quad pipe need access to all portions of memory, or just to its own tiles?
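To make that question concrete, here is a hypothetical sketch: with a fixed screen tiling, the pipe that owns a pixel is a pure function of its coordinates, so that pipe's colour/Z traffic always lands in a predictable subset of the framebuffer; with free dispatch, any pipe may touch any pixel, so every pipe needs a path to all of memory. All constants and the layout below are invented for illustration, not real NV40/R420 parameters.

```cpp
// Hypothetical illustration of the memory-controller question above.
// All constants and the framebuffer layout are invented for illustration.
#include <cstdint>
#include <cstdio>

constexpr int kTileSize      = 16;   // assumed screen tile size in pixels
constexpr int kNumQuadPipes  = 4;    // assumed number of quad pipes
constexpr int kScreenWidth   = 1024; // assumed render-target width
constexpr int kBytesPerPixel = 4;

// Which quad pipe owns a pixel under static tiling.
int owning_pipe(int x, int y)
{
    return ((x / kTileSize) + (y / kTileSize)) % kNumQuadPipes;
}

// Linear framebuffer address of a pixel (simple row-major layout assumed).
uint64_t fb_address(int x, int y)
{
    return (uint64_t(y) * kScreenWidth + x) * kBytesPerPixel;
}

int main()
{
    // Under free dispatch no such relation exists, so every pipe must be
    // able to reach every framebuffer address (and every memory channel).
    int x = 40, y = 70;
    printf("pixel (%d,%d): pipe %d, fb address 0x%llx\n",
           x, y, owning_pipe(x, y),
           (unsigned long long)fb_address(x, y));
    return 0;
}
```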
 
Chalnoth said:
Luminescent said:
Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?
Given that the R420 seems to have significantly higher pure fillrate than the NV40, and yet the NV40 tends to do very well by comparison in actual games, I would tend to think that nVidia's way is more efficient.
Why am I not surprised that you think that NVIDIA is better? We don't even have confirmation, yet you're (again) ready to claim NVIDIA is superior. Did it ever occur to you that there are other limiting factors? Did it occur to you that the bandwidth is very similar on the boards?

Thought not.

-FUDie
 
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?
 
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?

I'm not clear about what you are referring to when you mention "certain optimisations".

What kind of optimisations are we talking about?
Bandwidth saving? Fillrate saving? Static clip planes?

Your generalisation is correct according to common sense, but I'm sure people would love it if you expanded your explanation a little.
 
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?
Of course, but I feel it's a bit premature to claim that "nvidia's method is superior" given that we haven't even seen independent support for the claims made.

-FUDie
 
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?

Combine that with Wavey's bandwidth comment, and for me personally this case is closed 8)
 
Luminescent said:
Assuming Dave is correct, what would be the potential advantages/disadvantages of either (NV40/R420) approach?

I'm not sure how the R420 works exactly, but on the R300 it is pre-determined (although it might be configurable) which quad pipeline can process which screen pixels.
In other words, the screen (RT) is divided amongst the pipelines.

The advantage is that if you have large enough contiguous areas (tiles) assigned to a single quad pipe, then you can get away with not sharing the texture cache between the pipes, which results in a much simpler architecture, fewer transistors, etc.

With the NV40 method you have to have a shared texture cache, hence the two-level texture caching approach.
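To illustrate the distinction (with invented structures and sizes, not the actual R300/NV40 cache designs): a pipe that only ever shades its own contiguous tiles can keep a fully private texture cache, while free dispatch pushes you towards small per-pipe caches backed by a shared level.

```cpp
// Rough sketch of the two cache arrangements described above. The structures
// and sizes are invented for illustration only.
#include <cstdint>
#include <unordered_map>

using TexelBlock = uint64_t;  // stand-in for a cached block of texels

// Per-pipe private cache: workable when a pipe only ever shades pixels from
// its own contiguous screen tiles, so nearby texture fetches stay local to it.
struct PrivateTextureCache {
    std::unordered_map<uint64_t, TexelBlock> lines;
    bool lookup(uint64_t addr, TexelBlock& out) const {
        auto it = lines.find(addr);
        if (it == lines.end()) return false;  // miss -> go to memory
        out = it->second;
        return true;
    }
};

// Shared arrangement: small per-quad L1s backed by one L2 visible to all
// pipes, needed when any pipe can be handed any quad on screen.
struct TwoLevelTextureCache {
    std::unordered_map<uint64_t, TexelBlock> l1[4];  // one per quad pipe
    std::unordered_map<uint64_t, TexelBlock> l2;     // shared by all pipes
    bool lookup(int pipe, uint64_t addr, TexelBlock& out) {
        auto hit1 = l1[pipe].find(addr);
        if (hit1 != l1[pipe].end()) { out = hit1->second; return true; }
        auto hit2 = l2.find(addr);
        if (hit2 != l2.end()) {               // fill the L1 from the L2
            l1[pipe][addr] = hit2->second;
            out = hit2->second;
            return true;
        }
        return false;                         // miss -> go to memory
    }
};

int main() { return 0; }  // structures only; no simulation here
```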
 
Ailuros said:
DemoCoder said:
Did it ever occur to either of you that certain optimizations perform differently in different scenarios, and what may be a win on one is a loss on others?

Combine that with Wavey's bandwidth comment and for me personally this case is closed 8)


I don't get it.
 
I don't get it.
Me neither. The X800 XT has more bandwidth and greater fillrate, makes better use of compressed textures, has more levels of occlusion detection, and is much more straightforward to tune (using more specialized units and based on a well-established architecture), and yet it still loses in more benches than you would think. AF filtering seems to be its real strength. Any idea why the NV40 is making such a good account of itself?
 
It seems NV40 has a smart hardware shader scheduler and a very balanced set of ALUs. The lack of a dedicated texture addressing unit in its pixel pipelines holds it back; however, its ability to keep up with the X800 speaks to how well it load balances, particularly in ALU-intensive tasks.
 