David Kirk on HDR+AA

For the lazy out there, here is a review of a 9800 PE with a 128-bit memory bus, and yes, it's slower.
http://www.tweaktown.com/document.php?dType=review&dId=704

What really hurts a 128-bit memory bus is using AA and AF. With 16x AF and 4x AA, the 9800 Pro will give higher numbers than an X700 Pro or a 6600GT. The 6600GT and X700 Pro will outdo the 9800 Pro as long as AA and AF are not used, and I have seen reviews that show this.
EDIT
There are not many reviews out there that test AA and AF on the 6600GT AGP. Most of the tests are done without AA and AF, and when AA and AF are used, it's only for Doom 3. The few tests that use AA and AF in something other than Doom 3 show the 9800 Pro winning at 1280x1024 and up.
 
Ailuros said:
Motion blur or motion trail? If it's the latter I'm not in the least interested.
Per-pixel motion blur.
You render a couple of buffers: one is the 'official' color buffer, the second is a velocity buffer.
In a second pass you use the velocity buffer to blur (per pixel) the color buffer..
Obviously it's a fake.. it doesn't work properly when objects intersect, and in many other cases..
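A minimal CPU-side sketch of that second pass, just to make the idea concrete (illustrative only: the function and array names are made up, a real implementation runs as a full-screen shader pass, and the sample count is arbitrary):

import numpy as np

def motion_blur(color, velocity, samples=8):
    # Blur 'color' (H x W x 3) along the per-pixel 'velocity' (H x W x 2, in pixels).
    h, w = color.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros_like(color, dtype=np.float32)
    for i in range(samples):
        t = i / (samples - 1) - 0.5                  # step from -0.5 to +0.5 along the velocity vector
        sx = np.clip((xs + velocity[..., 0] * t).astype(int), 0, w - 1)
        sy = np.clip((ys + velocity[..., 1] * t).astype(int), 0, h - 1)
        out += color[sy, sx]                         # gather along the motion direction
    return out / samples

The intersection artifacts come from exactly this gather: pixels happily blur across object boundaries because the velocity buffer has no idea what is in front of what.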

Shall I assume 2.25x OGSS...dear God if yes :roll
I know.. :devilish: but it's better than nothing at all..
 
trinibwoy said:
But if I understand correctly, the point here is that the 6600GT would not benefit from a 256-bit bus or else we would have seen it being outgunned by the 9800......

Is it? The original response was to me, and my first two on this were firmly in the context of 5800 vs R300 and Kirk's bus-width statement re what was required for top-end cards of the day. Then I moved to 5900 as further evidence that even NV belatedly conceded that 256-bit was necessary at the time for the top-end. And things got really wiggy from there.

But maybe Bob was trying to make a much narrower point. <shrugs>
 
"I've even heard that some developers dislike the unified pipe, and will be handling vertex pipeline calculations on the Xbox 360's triple-core CPU."

A little off topic, but this comment seemed a little odd. I would suspect that if you are developing for Cell and want to port to the Xbox 360, you would do this.

I thought the R500 looked like it could do vertex magic.
 
Bob said:
This means that the pixel rate is halved when 4x AA is enabled; however, the bandwidth requirement is quadrupled.
ATI doesn't have color compression? I guess that's why you'd want a wider bus...
Learn how to quote text and you'd see the next sentence:
OpenGL guy said:
Even with compression and caches, you can still be using double the bandwidth of non-AA rendering.
Bold added so you don't miss it.

Compression doesn't always work, which is why I said "can still be". That means it's possible to be using twice the bandwidth even with compression. That also means twice the cache is used, which means fewer pixels are stored in the cache, which means less cache efficiency. Am I going too fast?
PS: I'm not in marketing.
Wow, coulda fooled me.
 
geo said:
But maybe Bob was trying to make a much narrower point. <shrugs>

Yeah I think it was a lot narrower than a lot of people took it to be. Judging from his first post it looks like he was saying that a 128-bit card can beat a 256-bit card and he provided benchmarks as proof of exactly what he said. Maybe he should've qualified it a bit by stating that 128-bit is enough if you've got sufficiently high memory clocks to feed a certain number of pipes. *shrug*

Bob said:
That's right, because as we all know, it's impossible for a GPU with a 128-bit memory bus to beat the crap out of one with a 256-bit memory bus. Notice that this is Half-Life 2 we're looking at, and not Doom 3.
 
OpenGL guy wrote:

Even with compression and caches, you can still be using double the bandwidth of non-AA rendering.

Bold added so you don't miss it.
My bad.

Assuming 4xAA doubles your data, and that you only have your data in two buckets: fully compressed (i.e. one color for all 4 samples) and non-compressed, and that compression tags are free, this means that:

2 * total_pixels = non_compressed_sample_size + compressed_sample_size
<=> 2 * total_samples / 4 = non_compressed_samples + non_compressed_samples / 4
<=> 2 * total_samples = non_compressed_samples * 4 + non_compressed_samples
<=> 2 * total_samples = non_compressed_samples * 5
<=> non_compressed_samples = 2/5 total_samples.

In other words, (assuming I didn't screw up the math) you could have ~40% of pixels that aren't compressible (4xAA). That seems like a lot to me. If you stored compression tags in memory, that would decrease that ratio slightly. If you had more compression "buckets", the math becomes muddier, and you couldn't infer much from just this one data point.

I assume lots of small triangles would bump up that ratio, but that still takes a lot of small triangles to make up 40% of all pixels. Do you see this ratio in many actual games? Even at high resolutions (say 1600x1200)?

Of course, I don't know anything about ATI's sample compression scheme.


That also means twice the cache is used, which means fewer pixels are stored in the cache, which means less cache efficiency. Am I going too fast?
Gotcha.

Wow, coulda fooled me.
Maybe I should have provided you with a Power Point presentation? ;)

Btw, what's with personal attacks?



Edit: Oops, I did screw up the math. It should be:

2 * total_pixels = non_compressed_pixels * 4 + compressed_pixels * 1
<=> 2 = (1 - compressed_ratio) * 4 + compressed_ratio
<=> 2 = 4 - 4 * compressed_ratio + compressed_ratio
<=> -2 = -3 * compressed_ratio
<=> compressed_ratio = 2/3.

That does look somewhat better, at 33% uncompressed.
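For anyone who wants to poke at the numbers, here is the same two-bucket model as a couple of lines of Python (the function name is made up; it just solves multiplier = (1 - c) * samples + c for the compressed fraction c):

def uncompressed_fraction(bandwidth_multiplier=2.0, samples=4):
    # Two buckets: a compressed pixel costs ~1 sample write, an uncompressed one costs 'samples'.
    compressed = (samples - bandwidth_multiplier) / (samples - 1)
    return 1.0 - compressed

print(uncompressed_fraction(2.0, 4))   # 0.333... -> a third of pixels uncompressed already doubles the traffic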
 
This thread makes my head hurt. It's very simple to see if 256-bit is overkill: go to the Digit-Life 3Digest archives and compare the scores of a 128-bit 9800 to a 256-bit one. Yes, the 128-bit card has 20MHz slower DDR, but that doesn't account for its halved performance in Unreal 2 at 1024x768 4x8. Nor does it account for its 1/3 slower performance in the same game without AF or AA. Performance is also halved in TR:AoD with and without AA+AF, and takes a similar dive in their HL2 "beta" benchmarks.

Or look at the performance drop the 6600GT experiences relative to a 9800P with AA here and with AA+AF here. We go from a CPU limit to a bandwidth limit in both cases, though far more rapidly with Far Cry.

So, it would appear 256-bit benefited ATI's cards. Was it necessary? For ATI, yes. While the 6600GT might also benefit from a doubled bus width, maybe advances in memory access (crossbar, etc.) have made 128-bit good enough from a price/performance standpoint. Looking at the TR review in its entirety and this HW.fr bench, the 6600GT doesn't seem to be hurting that badly.

That aberrant score in JK2 with 4xAA may well be due to memory usage spilling over 128MB. Plus, Bioware doesn't appear to have done ATI any favors with its engines, so it's an iffy benchmark to begin with.
 
geo said:
trinibwoy said:
But if I understand correctly, the point here is that the 6600GT would not benefit from a 256-bit bus or else we would have seen it being outgunned by the 9800......

Is it? The original response was to me, and my first two on this were firmly in the context of 5800 vs R300 and Kirk's bus-width statement re what was required for top-end cards of the day. Then I moved to 5900 as further evidence that even NV belatedly conceded that 256-bit was necessary at the time for the top-end. And things got really wiggy from there.

But maybe Bob was trying to make a much narrower point. <shrugs>

Honestly, I don't think that's what he was saying at all. I think he was saying that with a good core/architecture you can do more with less.. Bandwidth will only take you so far. I mean, there are several scenarios that can prove this.

5900 NU versus 5900XT (850 MHz versus 700 MHz memory): that amounted to a marginal performance increase. Or 5900 Ultra versus 9800 Pro, in which case the 9800 Pro did a lot better with much less bandwidth. I don't feel his 5800 Ultra versus 6600GT comparison is such a bad one. Technically they have the same multitexture fillrate and the same bandwidth (500/1000). Under the big assumption that the 5800 Ultra had a 6600GT core's performance, would a 128-bit versus 256-bit bus have made a huge difference back then? Not really.

Obviously more of anything is a good idea: bandwidth, fillrate. But there are points where you can have stupid amounts of bandwidth and no fillrate to feed it. Like I said, look at the 5900 versus the 5900XT and you'll see that it's really not unrealistic to say that too much bandwidth can be overkill. Obviously that's a bit simplistic. Nvidia used high clocks to compensate for the lack of bandwidth compared to a 9700 Pro (IIRC it was ~18 GB/s versus ~16 GB/s), and IMO a core difference could have easily accounted for such discrepancies, assuming all things were equal, which they obviously weren't at the time.
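One rough way to put numbers on the fillrate/bandwidth balance is bytes of memory traffic available per pixel written. A back-of-the-envelope sketch (the function is just illustrative; the 6600GT figures below are its commonly quoted specs, and any of the other cards in this thread can be swapped in):

def bytes_per_pixel(mem_clock_mhz, ddr_rate, bus_width_bits, core_clock_mhz, rops):
    bandwidth = mem_clock_mhz * 1e6 * ddr_rate * (bus_width_bits / 8)   # bytes per second
    fillrate = core_clock_mhz * 1e6 * rops                              # pixels per second
    return bandwidth / fillrate

# 6600GT: 500 MHz memory (1000 MHz effective) on a 128-bit bus, 500 MHz core, 4 ROPs
print(bytes_per_pixel(500, 2, 128, 500, 4))   # 8.0 bytes of traffic per pixel written

Past the point where the core can't produce pixels fast enough to use that budget, extra bandwidth just sits idle, which is the 5900 vs. 5900XT observation above.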
 
Bob said:
In other words, (assuming I didn't screw up the math) you could have ~40% of pixels that aren't compressible (4xAA). That seems like a lot to me. If you stored compression tags in memory, that would decrease that ratio slightly. If you had more compression "buckets", the math becomes muddier, and you couldn't infer much from just this one data point.
I'd put it this way (assuming your two buckets as above). Say 10% of pixels are not compressible and that the pixel output for a single frame uses bandwidth x. Now, this means that 90% of the pixels are fully compressible and so use bandwidth 0.9 * x, but 10% are not compressible and use 0.1 * 4 * x (assuming 4x AA). Total bandwidth would be 130% of normal, non-AA rendering. That's already 30% more bandwidth. Of course, I am not counting other bandwidth usage like texture reads and so on.
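The same model, run the other way (purely illustrative, nothing ATI-specific):

def aa_bandwidth_multiplier(uncompressed, samples=4):
    # Compressed pixels cost ~1x, uncompressed pixels cost 'samples' x.
    return (1.0 - uncompressed) + uncompressed * samples

print(aa_bandwidth_multiplier(0.10))   # 1.3 -> 30% more color traffic than no-AA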
Of course, I don't know anything about ATI's sample compression scheme.
Of course, I didn't say anything about ATI's compression scheme, I only spoke in generalities. ;)
Wow, coulda fooled me.
Maybe I should have provided you with a Power Point presentation? ;)

Btw, what's with personal attacks?
Sorry, I get rather annoyed at people who don't fully read what's written and take things that are, in my opinion, out of context.
 
Honestly, I don't think that's what he was saying at all. I think he was saying that with a good core/architecture you can do more with less.. Bandwidth will only take you so far.
Thank you.

Or look at the performance drop the 6600GT experiences relative to a 9800P with AA here and with AA+AF here. We go from a CPU limit to a bandwidth limit in both cases, though far more rapidly with Far Cry.
A 7:1 drop in performance when tripling the number of pixels (1024x768 to 1600x1200) looks like it's hitting the limits of 128 MB, and not local memory bandwidth. The 256 MB cards don't seem to have too much trouble. It would be nice to see a 6600 GT with 256 MB for comparison (or conversely, the same cards in this benchmark all with 128 MB of memory).
 
nAo said:
Ailuros said:
Motion blur or motion trail? If it's the latter I'm not in the least interested.
Per-pixel motion blur.
You render a couple of buffers: one is the 'official' color buffer, the second is a velocity buffer.
In a second pass you use the velocity buffer to blur (per pixel) the color buffer..
Obviously it's a fake.. it doesn't work properly when objects intersect, and in many other cases..

That's what I've understood so far as "motion trail"; in that case I want real antialiasing, not blurring in the temporal dimension. The first isn't really feasible yet on today's hardware; the second I hardly consider an IQ improvement, au contraire. IMHO, of course.

Shall I assume 2.25x OGSS...dear God if yes :roll
I know.. :devilish: but it's better than nothing at all..

At 720p I'd still prefer 4xRGMS + 16xAF.
 
As for the 128-bit vs. 256-bit bus-width debate (since there's no way in hell to get the conversation back to HDR, as it seems), it depends on the situation and on how and for what exactly an architecture has been laid out.

Obviously for mainstream GPUs 128-bit buses were good enough, as can be seen with the 6600/X700. You get what you pay for, and no one should really expect high-sample-density AA at high resolutions for ~200 bucks either.

I wouldn't really compare former-generation high-end GPUs with current-generation mainstream GPUs, not only because any conclusion would be misleading, but also because I wouldn't consider an R350-to-NV43 jump a real upgrade either.
 
I've always wondered if lower memory latency -- i.e. higher clocks (500 MHz on the 6600GT vs. 340 MHz on the 9800P) -- can make up for lower bandwidth.

It would be interesting to clock the GPU on a 9800 Pro and a 9500 Pro the same, say 300 MHz, and then see how low the memory clock has to go on the 9800 Pro to equalize performance with the 9500 Pro.
 
Bob said:
A 7:1 drop in performance when tripling the number of pixels (1024x768 to 1600x1200) looks like it's hitting the limits of 128 MB, and not local memory bandwidth. The 256 MB cards don't seem to have too much trouble. It would be nice to see a 6600 GT with 256 MB for comparison (or conversely, the same cards in this benchmark all with 128 MB of memory).
I must be missing something. I don't see a 7:1 drop in performance in either FC or CSS. The 128MB 9800P maintains performance better than the 128MB 6600GT. And the 256MB X700P doesn't benefit at all in FC, although its extra memory seems to make a 20% difference in CSS at 16x12 4x8.
 
Ailuros said:
As for the 128-bit vs. 256-bit bus-width debate (since there's no way in hell to get the conversation back to HDR, as it seems), it depends on the situation and on how and for what exactly an architecture has been laid out.

Heh. My bad. Had no idea I was being multi-page controversial at the time. :oops:
 
For what it's worth, Tom just published a large AGP shoot-out in which the 6600 GT compares quite favorably with the 9800 XT under most circumstances.
 
It has since its release, at least in all the reviews I can remember (TR, HW.fr, Hexus, AT).
 
I've seen a lot of people argue that fixed-function MSAA is useless because HDR does tone mapping. Does that mean NV30's AA was useless because it wasn't gamma corrected? Linear MSAA would be far better than no MSAA, in my opinion.
 
As others have reiterated: comparing architectures that are ~2 generations apart is ridiculous. I would say a good comparison of the benefits of a 256-bit bus is the TweakTown review posted earlier. Clearly there are benefits to be had. As far as backing the NV30 as anything other than a disaster, I'm going to have to abstain. The card was expensive trash. David Kirk can smoke all the hallucinogenic substances he wants, but that mistake is not going away.

Bob, I understand what you're saying: more bandwidth is not always necessary. Here is what everyone else is saying: in most cases, it is. The 6600GT is a better card than the 9800XT [in many cases] not because nVidia decided to skimp on the wider bus; it's a better card because the architecture it's based on is far more refined. Is there really any need to argue the point? More bandwidth is better. End of discussion. There is no case where you say "gosh, I really wish I had less bandwidth". It never hurts.

BTW, the TweakTown review also contains one huge error: the 256-bit bus on the 9800 Pro offers ~9.6 GB/s more bandwidth, not 2 GB/s.
http://www.tweaktown.com/document.php?dType=review&dId=704&dPage=8

Doom 3 benchmark (for Bob): 50 fps vs. 30 fps at 1024x768 in Doom 3 (9800XT vs. 9800PE). Some difference in clock speed exists between the two, but only ~8%. The rest is the bandwidth difference. Enjoy.
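For reference, peak bandwidth is just the effective memory clock times the bus width in bytes. A quick sketch with the 9800 Pro's stock 340 MHz DDR plugged in (128-bit boards shipped with varying memory clocks, so the exact gap depends on the card, but it is on the order of 10 GB/s, nowhere near 2 GB/s):

def peak_bandwidth_gbs(mem_clock_mhz, ddr_rate, bus_width_bits):
    return mem_clock_mhz * ddr_rate * (bus_width_bits / 8) / 1000.0   # GB/s

print(peak_bandwidth_gbs(340, 2, 256))   # ~21.8 GB/s: stock 9800 Pro
print(peak_bandwidth_gbs(340, 2, 128))   # ~10.9 GB/s: the same clocks on a 128-bit bus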
 