David Kirk on HDR+AA

I won't bother responding to this thread any more.

The original argument was that a 128-bit memory bus is still perfectly adequate today. The proof is in the 6600 GT. An FX 5900U with the NV4x architecture and the original memory bus would fare significantly better than the actual FX 5900U (obviously). So comments of the type "Nvidia *needed* a 256-bit bus to even compete!" are quite silly. There is a 128-bit part out that can more than compete.

Since the core/memory clocks on the FX 5900U and the 6600 GT are the same, you can't even say that the 6600 GT has a clock advantage over the FX 5900 U. It doesn't.

Obviously, one can construct a benchmark that shows that the 9800 XT is 60% faster than the 6600 GT. That's not the point. The point isn't that in some artificial benchmark one chip is better than another. I can trivially come up with a benchmark where the 6600 GT is over 4x faster than the 9800 XT.

The point is that if games (you know, the thing we usually use GPUs for) aren't bandwidth starved with 8 fragment pipes, there is no need for a 256-bit memory bus.

So far, no one seems to have managed to disprove my point, which is also DK's point: No need for a 256-bit memory bus when you're only processing <= 8 fragments. Can you show me a game that's memory bandwidth limited on the 6600 GT?


So what's faster, the X800 XT PE or the 6600 GT?
Obviously, the X800 XT PE. It can process about 2x the fragments and 2x the geometry per second. Please have a look at my posts in context: We're looking at the specific quote from David Kirk about the need for 256-bit memory busses. Please don't confuse the issue.


Edit: Corrected some sentence structures so they make a little more sense.
 
Bob said:
We're looking at the specific quote from David Kirk about the need for 256-bit memory busses. Please don't confuse the issue.
Ok, let's spin it around a bit then....do you see 256-bit memory busses as a bad thing/overkill?
 
do you see 256-bit memory busses as a bad thing/overkill?
I don't think it's overkill when you have 16 fragment pipes to service. I do think it's overkill when there are only 8. And certainly so when there are only 4!

12 fragment pipes is a little trickier. A 192-bit bus might be a good balance there, but that's not likely to happen due to the additional complications of managing the memory.
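
(A minimal sketch of the proportionality implied here; the linear scaling is my extrapolation from the 8-pipe/128-bit and 16-pipe/256-bit pairings above, not something stated in the thread:)

# Assuming physical (DDR) bus width scales linearly with fragment pipe count,
# i.e. roughly 16 bits of bus per pipe -- an extrapolation, not a hard rule.
BITS_PER_PIPE = 128 // 8  # 16

def suggested_bus_width(pipes: int) -> int:
    return pipes * BITS_PER_PIPE

for pipes in (4, 8, 12, 16):
    print(pipes, "pipes ->", suggested_bus_width(pipes), "bit bus")
# 4 -> 64, 8 -> 128, 12 -> 192, 16 -> 256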
 
Bob said:
I won't bother responding to this thread any more.
That's a relief.
The original argument was that a 128-bit memory bus is still perfectly adequate today. The proof is in the 6600 GT. An FX 5900U with the NV4x architecture and the original memory bus would fare significantly better than the actual FX 5900U (obviously). So comments of the type "Nvidia *needed* a 256-bit bus to even compete!" are quite silly. There is a 128-bit part out that can more than compete.

Since the core/memory clocks on the FX 5900U and the 6600 GT are the same, you can't even say that the 6600 GT has a clock advantage over the FX 5900 U. It doesn't.
What does this prove? Little. You're comparing the memory controllers of two completely different generations.
Obviously, one can construct a benchmark that shows that the 9800 XT is 60% faster than the 6600 GT. That's not the point. The point isn't that in some artificial benchmark one chip is better than another. I can trivially come up with a benchmark where the 6600 GT is over 4x faster than the 9800 XT.
Actually, I think you have no point at all. You pointed to results where the 9800 XT is nearly twice as fast as the 6600 GT, and now you say they are not important. What a surprise. The case you refer to (4x AA w/ 8x AF) is very bandwidth intensive. If the game is not shader bound, this bandwidth hit will be felt.
The point is that if games (you know, the thing we usually use GPUs for) aren't bandwidth starved with 8 fragment pipes, then a 256-bit memory bus is not really needed.
But they are starved, especially when AA and AF are used.
So far, no one seems to have managed to disprove my point, which is also DK's point: No need for a 256-bit memory bus when you're only processing <= 8 fragments. Can you show me a game that's memory bandwidth limited on the 6600 GT?
You've done this already, but it's not important. :rolleyes:
 
nAo said:
Ailuros said:
If there are going to be PS3 games that support 1080p then they'd have to be pretty simplistic in a relative sense and I wouldn't even think of float HDR in such a high resolution.
Think about running at half the frame rate (30 fps) but with per-pixel motion blur techniques to fake a higher frame rate.
1080p -> 2 MPixels at 30 fps -> 60 MPixels to shade per second
So we have 550*48/60 = 440 DP4 per pixel per frame (overdraw 1)
So we can do a lot of work per pixel too.
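
(A quick check of those numbers, using the figures the arithmetic implies: a 550 MHz part with 48 ALUs each issuing one DP4 per clock, and 1080p rounded to 2 MPixels. These are the post's assumptions, not confirmed specs:)

clock_hz = 550e6   # assumed GPU clock
alus     = 48      # assumed DP4-capable ALUs, one DP4 each per clock
pixels   = 2e6     # 1080p rounded to 2 MPixels, as above
fps      = 30

dp4_per_pixel = clock_hz * alus / (pixels * fps)
print(dp4_per_pixel)   # 440.0 DP4 per pixel at overdraw 1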

Motion blur or motion trail? If it's the latter I'm not in the least interested.

As for Supersampling in 720p, how many samples exactly? Sounds equally optimistic to me.
1080p -> 720p = 1.5*1.5 = 2.25x AA :LOL:

In my book, 1080p still belongs to the usual marketing fluff until I see it in a really demanding game on PS3.

Shall I assume 2.25x OGSS... dear God, if yes :roll:


Are you talking about XBOX360, PS3 or both?

Both.
 
nAo said:
Too bad I can't see why XBOX360 titles can't afford some SSAA too if PS3 titles will render to 1080p resolution.

(bold is mine ;) )

Does the answer lie with the number of ROPs? C1 only has 8 ROPs right? But it kinda sounds so far like the RSX has as many ROPs as the 7800GTX, which is....16?

Just a thought...not sure if it's right.
 
Ok, just for me and all the rest of the kind of thicky people out there, this whole argument pretty much boils down to:

OpenGL guy said:
Bob said:
The point is that if games (you know, the thing we usually use GPUs for) aren't bandwidth starved with 8 fragment pipes, then a 256-bit memory bus is not really needed.
But they are starved, especially when AA and AF are used.
Right?

If so, I'm going to go with OpenGL guy. My experiences with cards/games and just the fact that I'm pretty sure he knows what he's talking about makes me say that. ;)
 
Next we finally settle who was a better team, the '27 Yankees or the '76 Big Red Machine. We might have a little pitching mound issue, and a few others, but we'll get there, not to worry.
 
Maybe. If that were always the case, the 9800 XT should be trouncing the 6600 GT, as it has 60% more memory bandwidth. Turns out that apart from that one data point in JK2 (which is dubious - just look at the 9700 Pro score), that extra memory bandwidth is pretty much irrelevant.

If you're only doing single-texturing (or single-texturing with so few math ops that a loop-back isn't necessary), with no aniso, then 8 color + Z fragments / clock requires 512 bits (256 with DDR) of output, ignoring AA, and ~1024 bits (512 with DDR) for both input and output.
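
(A small sketch of that arithmetic, assuming 32-bit color and 32-bit Z per fragment and counting reads as roughly equal to writes, as above:)

fragments_per_clock = 8
color_bits, z_bits  = 32, 32   # assumed per-fragment formats

write_bits = fragments_per_clock * (color_bits + z_bits)   # 512 bits of output per clock
rw_bits    = 2 * write_bits                                # ~1024 bits with reads included

# DDR transfers twice per clock, so the physical bus only needs half that width:
print(write_bits // 2, "bit bus for output,", rw_bits // 2, "bit bus for input + output")
# 256 bit bus for output, 512 bit bus for input + output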

Texture caches, output merging and color compression can bring this number down significantly. That's where most of the IHV mojo goes: how to manage the memory bandwidth you have (or, more correctly, how to manage the bandwidth you don't have). Caches, write combiners, compression, and taking advantage of locality can get you down to just a few bytes of accesses / fragment / clock.

As soon as aniso / trilinear kicks in, or you do more interesting things like using multiple textures or a non-trivial shader, you're way down from 8 fragments / clock. You could easily be running at just 0.5 fragments per clock.
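
(Illustrative only: a sketch of how multi-cycle fragments pull the sustained bandwidth demand down. The 8-bytes-per-fragment figure is a made-up but plausible post-compression number, in the spirit of the "few bytes of accesses per fragment" comment above:)

def demand_gb_s(pipes, core_mhz, cycles_per_fragment, bytes_per_fragment):
    # Sustained fragments per second falls as each fragment takes more cycles.
    frags_per_sec = pipes * core_mhz * 1e6 / cycles_per_fragment
    return frags_per_sec * bytes_per_fragment / 1e9

# 6600 GT-like part: 8 pipes at 500 MHz, ~8 bytes touched per fragment (assumed).
print(demand_gb_s(8, 500, 1, 8))    # 32.0 GB/s - single-cycle fragments would swamp a 16 GB/s bus
print(demand_gb_s(8, 500, 16, 8))   # 2.0 GB/s  - 0.5 fragments/clock leaves bandwidth to spare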

On interesting shaders (Doom3, Far Cry, HL2), the 6600 GT is more than adequate. Considering it's on par with the 9800 Pro/XT (sometimes faster, sometimes slower), I think it's safe to say that the extra GPU work keeps the memory bandwidth requirements down.


OpenGL guy is free to disagree, and we'll leave it at that. ATI can overbuild all the parts it wants. I would even encourage it.
 
digitalwanderer said:
If so, I'm going to go with OpenGL guy. My experiences with cards/games and just the fact that I'm pretty sure he knows what he's talking about makes me say that. ;)

Well it was settled a long time ago on the last page when Bob claimed that for an 8-pipe card, a 256-bit bus is not necessary, which he backed up with some 6600GT/9800Pro comparisons. But of course it was misconstrued to mean that 256-bit buses are evil, and then came the onslaught (with no proof, mind you). The man made a valid point backed up with real game benchmarks - normally that's enough, no?
 
Overbuilt 9700/9800?
Take a look at the 9800 Pro SE's benchmarks if you think the 256-bit bus on an 8-pipe card is overkill.
 
Take a look at the 9800 Pro SE's benchmarks if you think the 256-bit bus on an 8-pipe card is overkill.
Now that's an interesting data point! Do you have a link? What does the 9800 SE look like?

Edit: I'm assuming that 9800 SE is the same as the 9800 Pro SE. Feel free to correct me if I'm wrong.

It seems that the 9800 SE only has 4 fragment pipelines (Link) and comes with either a 128-bit or a 256-bit memory bus. So it's not really a good comparison to the 6600 GT or the 9800 Pro / XT.
 
Bob said:
Take a look at the 9800 Pro SE's benchmarks if you think the 256-bit bus on an 8-pipe card is overkill.
Now that's an interesting data point! Do you have a link? What does the 9800 SE look like?
Er... actually, the 9800 Pro with a 128-bit bus :oops:
The 9800 Pro SE has one disabled quad; a 9800 Pro with a 128-bit bus is what all the cheaper 9800s you see today are.
 
Bob said:
Er... actually, the 9800 Pro with a 128-bit bus
I can't seem to find references to it with Google. Do you have a link?
Use these keywords:
"9800 pro 128-bit"
You should find plenty.
Sapphire started making 9800 Pros with only a 128-bit bus to cut costs, then PowerColor did the same thing.
The problem is that most people assumed they were 256-bit models, since they didn't have much warning.
You can likely find some forum posts with benchmarks, but not real reviews.
Anyway, they're a lot slower going by forum post benchmarks, as one would guess.
Of course, the 9800 Pro's memory speed is a lot slower.
Also look at the 6600GT vs the 6800 at high res with FSAA: they have similar fillrates, yet the 6800 has a lot more bandwidth.
Look at how the 9800 Pro compares against the 6600GT in older-style games, such as UT2003/4; in those games the cards are a good match.
http://www.anandtech.com/video/showdoc.aspx?i=2277&p=7
 
I am very unimpressed with what Kirk had to say. nVidia will change its tune before too long. Longhorn's purported standardization of feature set will likely see to that.

I could see that one coming, though.
 
trinibwoy said:
digitalwanderer said:
If so, I'm going to go with OpenGL guy. My experiences with cards/games and just the fact that I'm pretty sure he knows what he's talking about makes me say that. ;)

Well it was settled a long time ago on the last page when Bob claimed that for an 8-pipe card, a 256-bit bus is not necessary, which he backed up with some 6600GT/9800Pro comparisons. But of course it was misconstrued to mean that 256-bit buses are evil, and then came the onslaught (with no proof, mind you). The man made a valid point backed up with real game benchmarks - normally that's enough, no?

The problem is that the bit width of a memory interface is only one factor. It would be one thing if both ran at the same frequency, but they do not. The 9800Pro is rated at 680MHz (I believe 340MHz actual) and the 6600GT is rated at 1000MHz.

And then you have all the issues of generational parts to consider.

Looking back, the lack of a 256-bit bus was not NV30's only problem, and even a jump in memory bandwidth did not cure all its ills.

But looking at the ATI side (e.g. the 256-bit 9700 vs. the 128-bit 9500 Pro), memory bandwidth was an issue and a limiting factor in games that tended to be bandwidth limited before hitting other bottlenecks. ATI's high-end parts would have been neutered without 256-bit memory interfaces. Ditto the R420 and NV40 series cards. And obviously the 6600GT has the advantage of time, in that memory speeds were significantly faster in 2004 compared to 2002 when the 9700 shipped.

If NV30/R300 had had access to the memory bandwidth available to the 6600GT with a 128-bit interface, they would obviously have jumped at it.
 
Bob said:
Maybe. If that were always the case, the 9800 XT should be trouncing the 6600 GT, as it has 60% more memory bandwidth. Turns out that apart from that one data point in JK2 (which is dubious - just look at the 9700 Pro score), that extra memory bandwidth is pretty much irrelevant.
I see, the 9700 score being out of line means the rest are as well. Good way to discard evidence that goes against your argument, Bob. Marketing background perhaps?
If you're only doing single-texturing (or single-texturing with so few math ops that a loop-back isn't necessary), with no aniso, then 8 color + Z fragments / clock requires 512 bits (256 with DDR) of output, ignoring AA, and ~1024 bits (512 with DDR) for both input and output.

Texture caches, output merging and color compression can bring this number down significantly. That's where most of the IHV mojo goes: how to manage the memory bandwidth you have (or, more correctly, how to manage the bandwidth you don't have). Caches, write combiners, compression, and taking advantage of locality can get you down to just a few bytes of accesses / fragment / clock.

As soon as aniso / trilinear kicks in, or you do more interesting things like using multiple textures or a non-trivial shader, you're way down from 8 fragments / clock. You could easily be running at just 0.5 fragments per clock.
You talk a good game, but I don't think you understand what you've said.

Let's take 4x AA for example. Let's say that the HW can do 2 AA samples per clock per pixel (i.e. 2x AA is "free"). This means that the pixel rate is halved when 4x AA is enabled; however, the bandwidth requirement is quadrupled. Even with compression and caches, you can still be using double the bandwidth of non-AA rendering.
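
(A sketch of that arithmetic, assuming 32-bit color + 32-bit Z per sample; the 2:1 average compression ratio is an illustrative figure of mine, not a measured one:)

bytes_per_sample = (32 + 32) // 8   # 8 bytes written per sample (assumed formats)

no_aa_bytes_per_pixel = 1 * bytes_per_sample   # 8 B per pixel
aa4_bytes_per_pixel   = 4 * bytes_per_sample   # 32 B per pixel: 4x the raw traffic

print(aa4_bytes_per_pixel / no_aa_bytes_per_pixel)        # 4.0 raw
print((aa4_bytes_per_pixel / 2) / no_aa_bytes_per_pixel)  # 2.0 even after 2:1 compression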

Granted, AF slows things down immensely, assuming your driver is actually doing the requested AF. ;) Maybe you would know something about this, Bob?
On interesting shaders (Doom3, Far Cry, HL2), the 6600 GT is more than adequate. Considering it's on par with the 9800 Pro/XT (sometimes faster, sometimes slower), I think it's safe to say that the extra GPU work keeps the memory bandwidth requirements down.
This has already been said. New games are more shader-bound, older games less so. Maybe the reason the 6600 GT fares so poorly in JK2 is because it's based on older technology?
OpenGL guy is free to disagree, and we'll leave it at that. ATI can overbuild all the parts it wants. I would even encourage it.
Wow, that's a low blow. The 9800 didn't seem overbuilt when it came out, did it? Also, I don't recall the 9800 competing against the 6600. I guess NVIDIA feels the need to compete against two-year-old parts in order to feel special.
 
And obviously the 6600GT has the advantage of time, in that memory speeds were significantly faster in 2004 compared to 2002 when the 9700 shipped.
Both the GeForce FX 5800 Ultra and the GeForce 6600 GT run their memory at 500 MHz. That said, the FX 5800 was indeed released in early 2003. The real memory difference is GDDR-2/3 (NV43) vs DDR-2 (NV30). I don't know how that affects the results.

Building an NV43 minus SM3.0 back in 2002 wouldn't be too much of a stretch. It's basically on a similar process to NV30 (TSMC 0.11 is a cost reduction over TSMC 0.13, not an actual process node), with roughly the same transistor count (120 million vs 143 million (probably ~110 million without SM3.0 - assuming SM3.0 took the supposed 60 million transistors on NV40, and scaling down for 8 fragment pipes)). 500 MHz clocks were achievable for both core and memory at the time.
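
(Back-of-envelope for the "~110 million" figure, using only the assumptions stated above and reading the parenthetical as ~120M for NV30 and ~143M for NV43:)

nv43_transistors = 143e6
sm3_cost_nv40    = 60e6                    # the supposed SM3.0 cost on the 16-pipe NV40
sm3_cost_nv43    = sm3_cost_nv40 * 8 / 16  # scaled down to 8 fragment pipes
print((nv43_transistors - sm3_cost_nv43) / 1e6)  # ~113 -> "probably ~110 million"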

Granted, that mythical NV43 would likely also require a leaf blower to cool it, but it would be no slouch, even with a 128-bit memory bus.


Edit:

I guess NVIDIA feels the need to compete against two-year-old parts in order to feel special.
Nah, beating the X700 XT will suffice ;)

This means that the pixel rate is halved when 4x AA is enabled; however, the bandwidth requirement is quadrupled.
ATI doesn't have color compression? I guess that's why you'd want a wider bus...

Good way to discard evidence that goes against your argument, Bob. Marketing background perhaps?
I didn't discard it (hell, I'm the one who brought it up!). But I would like to see that particular benchmark redone, just to rule out any funniness. The 9700 Pro should be only slightly slower than the 9800 Pro.

New games are more shader-bound, older games less so. Maybe the reason the 6600 GT fares so poorly in JK2 is because it's based on older technology?
Perhaps. Do you have Quake 3 benchmarks comparing the FX 5800 / 6600 GT and either the 9700 Pro or the 9800 Pro/XT? What about 3DMark 01? Any other benchmark you'd like to bring up?


PS: I'm not in marketing.
 
Acert93 said:
The problem is that the bit width of a memory interface is only one factor. It would be one thing if both ran at the same frequency, but they do not. The 9800Pro is rated at 680MHz (I believe 340MHz actual) and the 6600GT is rated at 1000MHz.

That's true but when you factor in everything, the 9800P still has a bandwidth advantage and I think Bob's point is that in spite of the bandwidth deficit the 6600GT is able to go head-to-head with the 9800.
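
(A rough sketch of that comparison, using the bus widths and effective data rates quoted in the thread:)

def peak_bandwidth_gb_s(bus_bits, effective_mt_per_s):
    # bytes per transfer * effective transfers per second
    return bus_bits / 8 * effective_mt_per_s * 1e6 / 1e9

print(peak_bandwidth_gb_s(256, 680))    # 9800 Pro: ~21.8 GB/s
print(peak_bandwidth_gb_s(128, 1000))   # 6600 GT:  16.0 GB/s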

I do agree with your point that at the memory speeds available to them at the time, a 256-bit bus was the only way to maximize the potential of R300. But if I understand correctly, the point here is that the 6600GT would not benefit from a 256-bit bus or else we would have seen it being outgunned by the 9800......
 