Games and bandwidth

You're talking about 3DMark Vantage's rather strange fillrate tests? Several hundred GTex/sec – I wonder how they arrive at that number. I haven't found anything in the docs as of yet.

Yes, I was talking about the 3DMark Vantage fillrate test. The numbers might be off but the graph isn't.
 
You are telling me you don't need more bandwidth to run higher resolutions and AA because this thread is showing benchmarks in high resolutions with AA? :rolleyes:
Relatively speaking, yes - I don't think anybody is claiming that bandwidth is never important, nor that it's the sole factor in the overall performance of a graphics card, but as the resolution increases the relative performance hit from reduced bandwidth decreases. I've run a couple more 3DMark06 tests to show this:
Code:
3DMark06 - Graphics Test 1: Return to Proxycon
640 x 480 - No AA / Trilinear			640 x 480 - 8x AA / Trilinear
576 / 1350 / 900 - 49.50 fps			576 / 1350 / 900 - 47.37 fps
576 / 1350 / 450 - 46.49 fps (-6.1%)		576 / 1350 / 450 - 37.16 fps (-21.6%)

1680 x 1050 - No AA / Trilinear			1680 x 1050 - 8x AA / Trilinear
576 / 1350 / 900 - 38.97 fps			576 / 1350 / 900 - 26.16 fps
576 / 1350 / 450 - 31.90 fps (-18.1%)		576 / 1350 / 450 - 18.05 fps (-31.0%)
Take the non-AA results first: increasing the resolution by a factor of 5.74 only increases the performance drop from halving the bandwidth by a factor of 2.97 (from 6.1% to 18.1%); the reason, as already pointed out earlier in this thread, being that the bandwidth requirement per pixel doesn't increase. The same is shown by the AA figures: scale the resolution by a factor of 5.74 again, and the performance drop only scales by a factor of 1.43 (from 21.6% to 31.0%). Obviously more bandwidth is being used in total but not per pixel, thus there comes a point when the resolution is so high that differences in bandwidth between graphics cards matter far less than their other merits, such as the number and speed of the SPs, ROPs, etc. I wish I had a bigger monitor to look at this in more detail.
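To make the arithmetic explicit, here's a small sketch (using nothing beyond the fps figures in the table above) that reproduces the percentage drops and the ~2.97 and ~1.43 scaling factors:
Code:
# Reproduce the scaling factors above from the 3DMark06 fps figures alone.

def drop(full_bw_fps, half_bw_fps):
    """Relative performance drop when memory bandwidth is halved."""
    return (full_bw_fps - half_bw_fps) / full_bw_fps

print("Resolution scale factor:", (1680 * 1050) / (640 * 480))   # ~5.74

no_aa_low, no_aa_high = drop(49.50, 46.49), drop(38.97, 31.90)    # 6.1% and 18.1%
aa_low, aa_high = drop(47.37, 37.16), drop(26.16, 18.05)          # 21.6% and 31.0%

print(f"No AA: {no_aa_low:.1%} -> {no_aa_high:.1%}, ratio {no_aa_high / no_aa_low:.2f}")
print(f"8x AA: {aa_low:.1%} -> {aa_high:.1%}, ratio {aa_high / aa_low:.2f}")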

Anyway, what Mintmaster was showing was that in the games examined, the graphics card's performance was only significantly affected by its bandwidth for a percentage of the time that's probably a lot lower than people were expecting. Just like the amount of RAM or the number of TMUs, it's just not the be-all and end-all that it's often claimed to be. Which, funnily enough, was what I believe you were claiming and Mintmaster was showing!
 
Since you're asking questions now instead of being belligerent, I'll give it one more shot.
There are many different kinds of G92 chips that are clocked differently. Wasn't mint talking about shader clocks, and why are you comparing core clocks?
Read what Arun and I wrote. Core clock is what affects polygon speed. Only the longest vertex shaders are limited by SP speed.

You are telling me you don't need more bandwidth to run higher resolutions and AA because this thread is showing benchmarks in high resolutions with AA? :rolleyes:
Comparing AA to no-AA, no, you do indeed need more BW. If you read our posts then you'd know I said "I agree with you about AA" and that Arun never mentioned AA.

If the game is not limited at all by your CPU, PCIe bus, or polygon throughput (which is quite rare, especially if a game has shadow mapping), the answer is yes. BW does not help you churn out pixels any more at the high resolution than at the low resolution. The only time BW matters more at high resolution is when the above factors are relevant, because it means that at the lower resolution the GPU is working on pixels for a lower percentage of the rendering time.
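To illustrate that with a toy model (the 5 ms fixed cost and 60 bytes per pixel below are made-up numbers, purely for illustration): when part of each frame is spent on work that extra bandwidth can't speed up, halving the bandwidth hurts far less at 640x480 than at 1680x1050, which is exactly the pattern in the 3DMark06 figures above.
Code:
# Toy frame-time model (illustrative numbers only):
#   frame time = fixed per-frame work (CPU / setup / shadow passes) + per-pixel work
# Only the per-pixel part scales with memory bandwidth.

def frame_time_ms(pixels, bw_gb_s, fixed_ms=5.0, bytes_per_pixel=60):
    # fixed_ms and bytes_per_pixel are assumptions, not measurements
    pixel_ms = pixels * bytes_per_pixel / (bw_gb_s * 1e9) * 1e3
    return fixed_ms + pixel_ms

for res, pixels in (("640x480", 640 * 480), ("1680x1050", 1680 * 1050)):
    full, half = frame_time_ms(pixels, 64.0), frame_time_ms(pixels, 32.0)
    print(f"{res}: halving bandwidth costs {1 - full / half:.1%} of the frame rate")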

I see you are buddy buddy with mintmaster. :rolleyes: Perhaps it's time you took your own advice and were open to others' points.
Except you aren't making any.

In my first post, I said among other things that HL2:E2 is BW limited only 29% of the time on 4850 and CoH is BW limited 11% of the time. Does that sound like I'm saying bandwidth is key? Read the damn thread before making stupid remarks.

What this means is if you double BW and leave the clock the same, you only get improvements of 17% and 6% respectively in the framerate. If you can't see how I got these numbers, I'll explain it to you.
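For anyone following along, the arithmetic is simply this (assuming the bandwidth-limited portion of the frame time halves when bandwidth doubles):
Code:
# If a fraction f of the frame time is bandwidth limited, doubling the bandwidth
# halves that portion and leaves the rest untouched:
#   new_time = (1 - f) + f/2, speedup = 1/new_time - 1

for game, f in (("HL2:EP2", 0.29), ("CoH", 0.11)):
    speedup = 1.0 / ((1.0 - f) + f / 2.0) - 1.0
    print(f"{game}: +{speedup:.0%} from doubling bandwidth")   # prints +17% and +6%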

Go educate yourself on what standard error is. You can check my results for yourself if you know how to use a calculator:
EDIT: Looks like this thread has a little more appeal than I thought it would, so I'll elaborate on one example. With regression, I came up with the following for RV770 running HL2:EP2 at 2560x1600 w/ 4xAA/16xAF:

Predicted HL2 fps = 1 / ( 9.12M clocks / RV770 freq + 375.6MB / bandwidth )

4870 stock (750/1800): 64.9 predicted fps, 64.7 actual fps
4870 OC'd (790/2200): 70.4 predicted fps, 70.6 actual fps
4850 stock (625/993): 48.8 predicted fps, 49.1 actual fps
4850 OC'd (690/1140): 54.5 predicted fps, 54 actual fps
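If you want to check those predictions yourself, here's a sketch of the model being evaluated. The bandwidth conversion is an assumption on my part (listed memory clock x 2 transfers x 32 bytes for the 256-bit bus, so 4870 stock ~115.2 GB/s and 4850 stock ~63.6 GB/s); if the regression used a different convention, the 375.6 MB constant would simply scale with it.
Code:
# Evaluate: frame time = 9.12M core clocks / core_freq + 375.6 MB / bandwidth
# Bandwidth assumed to be listed memory clock (MHz) * 2 * 32 bytes (256-bit bus),
# e.g. 1800 MHz -> 115.2 GB/s, 993 MHz -> ~63.6 GB/s; 1 MB taken as 1e6 bytes.

def predicted_fps(core_mhz, mem_mhz):
    bandwidth_mb_s = mem_mhz * 2 * 32                            # MB/s
    frame_time_s = 9.12e6 / (core_mhz * 1e6) + 375.6 / bandwidth_mb_s
    return 1.0 / frame_time_s

for name, core, mem in (("4870 stock", 750, 1800), ("4870 OC'd", 790, 2200),
                        ("4850 stock", 625, 993), ("4850 OC'd", 690, 1140)):
    # prints ~64.9, 70.4, 48.8 and 54.5 fps, matching the predictions above
    print(f"{name} ({core}/{mem}): {predicted_fps(core, mem):.1f} fps predicted")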
 
... but as the resolution increases the relative performance hit from reduced bandwidth decreases. I've run a couple more 3DMark06 tests to show this:
Actually, your results show the opposite. ;) At 640x480, there is a very substantial load that's limited by polygon-throughput, so you're not isolating pixel/shading speed.
 
Relatively speaking, yes - I don't think anybody is claiming that bandwidth is never important, nor that it's the sole factor in the overall performance of a graphics card, but as the resolution increases the relative performance hit from reduced bandwidth decreases. I've run a couple more 3DMark06 tests to show this:
Code:
3DMark06 - Graphics Test 1: Return to Proxycon
640 x 480 - No AA / Trilinear			640 x 480 - 8x AA / Trilinear
576 / 1350 / 900 - 49.50 fps			576 / 1350 / 900 - 47.37 fps
576 / 1350 / 450 - 46.49 fps (-6.1%)		576 / 1350 / 450 - 37.16 fps (-21.6%)

1680 x 1050 - No AA / Trilinear			1680 x 1050 - 8x AA / Trilinear
576 / 1350 / 900 - 38.97 fps			576 / 1350 / 900 - 26.16 fps
576 / 1350 / 450 - 31.90 fps (-18.1%)		576 / 1350 / 450 - 18.05 fps (-31.0%)
Take the non-AA results first: increasing the resolution by a factor of 5.74 only increases the performance drop from halving the bandwidth by a factor of 2.97 (from 6.1% to 18.1%); the reason, as already pointed out earlier in this thread, being that the bandwidth requirement per pixel doesn't increase. The same is shown by the AA figures: scale the resolution by a factor of 5.74 again, and the performance drop only scales by a factor of 1.43 (from 21.6% to 31.0%). Obviously more bandwidth is being used in total but not per pixel, thus there comes a point when the resolution is so high that differences in bandwidth between graphics cards matter far less than their other merits, such as the number and speed of the SPs, ROPs, etc. I wish I had a bigger monitor to look at this in more detail.

Anyway, what Mintmaster was showing was that in the games examined, the graphics card's performance was only significantly affected by its bandwidth for a percentage of the time that's probably a lot lower than people were expecting. Just like the amount of RAM or the number of TMUs, it's just not the be-all and end-all that it's often claimed to be. Which, funnily enough, was what I believe you were claiming and Mintmaster was showing!

How is it not showing more bandwidth requirement? Your benches are showing exactly that. Going from 640x480 to 1680x1050 made your card drop further.
 
Since you're asking questions now instead of being belligerent, I'll give it one more shot.
Read what Arun and I wrote. Core clock is what affects polygon speed. Only the longest vertex shaders are limited by SP speed.

Of course it affects polygon speed but bandwidth does too. I asked Arun why he put up core clock speed to talk about vertex limitations. Shouldn't he be using SP clocks instead?

Comparing AA to no-AA, no, you do indeed need more BW. If you read our posts then you'd know I said "I agree with you about AA" and that Arun never mentioned AA.

If the game is not limited at all by your CPU, PCIe bus, or polygon throughput (which is quite rare, especially if a game has shadow mapping), the answer is yes. BW does not help you churn out pixels any more at the high resolution than at the low resolution. The only time BW matters more at high resolution is when the above factors are relevant, because it means that at the lower resolution the GPU is working on pixels for a lower percentage of the rendering time.

You are quite wrong. Considering pixel fillrate is limited by bandwidth. You specifically said you don't need more bandwidth to run higher resolutions, but even Neeyik's tests are showing exactly that.

Except you aren't making any.

In my first post, I said among other things that HL2:E2 is BW limited only 29% of the time on 4850 and CoH is BW limited 11% of the time. Does that sound like I'm saying bandwidth is key? Read the damn thread before making stupid remarks.

What this means is if you double BW and leave the clock the same, you only get improvements of 17% and 6% respectively in the framerate. If you can't see how I got these numbers, I'll explain it to you.

Go educate yourself on what standard error is. You can check my results for yourself if you know how to use a calculator:

Making what? Having people agree with you just because he's your buddy? :LOL:

In my first post I replied that it's not just bandwidth. It's about fillrate and the amount of bandwidth needed to use that fillrate properly.

You don't need to tell me how you got the numbers. Clearly more bandwidth is required at higher resolutions and/or with AA. :!:
 
Clearly more bandwidth is required at higher resolutions and/or with AA. :!:
We all agree.

The question is, what is the scaling like? i.e. for 2x the pixels with 4xMSAA, how much extra bandwidth is required? 25%? 50%? etc.

With MSAA turned on the bandwidth per pixel decreases as resolution increases - the number of edge pixels with MSAA goes up more slowly than the total number of pixels in the frame.

Jawed
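A toy illustration of Jawed's edge-pixel point (all of the byte costs and the silhouette estimate below are made-up numbers; the only thing that matters is that edge pixels grow with the frame's linear size while total pixels grow with its area):
Code:
# Toy model: with MSAA, only "edge" pixels pay the full multisample bandwidth cost.
# Assume the number of edge pixels scales with the frame width (silhouette length),
# while the total pixel count scales with width * height.
# All byte costs and the silhouette factor are made up for illustration.

def msaa_bytes_per_pixel(width, height, samples=4,
                         interior_cost=16, per_sample_cost=16, silhouette_factor=8.0):
    pixels = width * height
    edge_fraction = min(1.0, silhouette_factor * width / pixels)
    return (1 - edge_fraction) * interior_cost + edge_fraction * samples * per_sample_cost

for w, h in ((640, 480), (1280, 1024), (1680, 1050), (2560, 1600)):
    print(f"{w}x{h}: ~{msaa_bytes_per_pixel(w, h):.2f} bytes per pixel (toy numbers)")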
 
marvelous - I'm just curious, what are your qualifications in the graphics industry?

arun and mintmaster both work in the industry and are highly-respected round these parts. Their contributions are greatly appreciated by those with enough sense to know when to listen and when to speak.

You're really starting off on a bad foot here... This isn't a fanboi hangout, this is one of the most respected and collectively knowledgeable forums on the intarwebs.
 
3DMk06 is very CPU-limited though.
It's not for most GPU configurations - the CPU plays a role in the overall score but it's not as significant as the graphics card:

2.67GHz Q6600 + stock G80 - 12309
3GHz Q6600 + stock G80 - 12616
2.67GHz Q6600 + 9.9% higher core/shader speed G80 - 12707

Note that a 12% increase in CPU speed results in a 2.5% increase in the final score (and only tiny increases in the frame rates of the graphics tests), whereas a 10% increase in GPU speed results in a 3.2% increase. (edit: oops - fixed various typos)
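Spelled out, using only the three scores above:
Code:
# Percentage changes derived from the three scores above.
baseline = 12309   # 2.67GHz Q6600 + stock G80
cpu_oc   = 12616   # 3GHz CPU, same GPU
gpu_oc   = 12707   # same CPU, G80 core/shader +9.9%

print(f"CPU +{3.0 / 2.67 - 1:.1%} -> overall score +{cpu_oc / baseline - 1:.1%}")
print(f"GPU +9.9% -> overall score +{gpu_oc / baseline - 1:.1%}")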

Edit 2: Some better results to show this (sorry to all for going off-topic):

3GHz Q6600 + stock G80 - 12635
2GHz Q6600 (33% slower) + stock G80 - 10254 (-17% in overall score)
3GHz Q6600 + 33% slower core/shader/RAM than stock G80 - 8905 (-27% in overall score)

Mintmaster said:
Actually, your results show the opposite.
Not relative to the increase in pixels - I thought that I'd clarified that by showing that despite an increase in pixels by a factor of nearly 6, the drop in performance from halving the bandwidth only scaled by a factor of 3? I totally agree that 3DMark06 isn't really the best test to show this given its very high vertex load (even by today's standards) but I couldn't be bothered to try it with other stuff ;)

marvelous said:
Considering pixel fillrate is limited by bandwidth. You specifically said you don't need more bandwidth to run higher resolutions, but even Neeyik's tests are showing exactly that.
I showed that the relative drop in performance from decreasing bandwidth does not scale linearly with resolution. I was also under the impression that neither Mintmaster nor Arun said that you need more bandwidth at a high resolution compared to a low one, but rather that they're talking about being limited by bandwidth - a different thing entirely - and they repeatedly stated bandwidth per pixel. It would have been better if I'd used a different test to show this but, as Mintmaster reminded me, the vertex load is rather high; just look at these results to see what I mean:
Code:
3DMark06 - Graphics Test 1: Return to Proxycon
640 x 480 - No AA / Trilinear
576 / 1350 / 900 - 49.50 fps
288 / 1350 / 900 - 32.93 fps (-33.4%)
288 /  675 / 900 - 31.36 fps (-36.6%)

1680 x 1050 - No AA / Trilinear
576 / 1350 / 900 - 38.97 fps
288 / 1350 / 900 - 23.50 fps (-39.7%) 
288 /  675 / 900 - 20.53 fps (-47.3%)
Compare the differences in the performance drops with the "full vs. half core speed" settings - despite the fact that the ROPs are having to deal with nearly 6 times more pixels, the frame rate only decreases by a further 6%; it's clear that the bulk of that performance drop is down to the triangle setup rate being halved.
 
It's not for most GPU configurations.
But your baseline is 640x480 - if that isn't CPU-limited then I don't know what is :oops:

"CPU-limited" is really code for "CPU, PCI Express, Direct3D small batching etc. problems", i.e. anything really that isn't GPU-specific.

Before examining pure GPU scaling you really need a genuinely GPU-limited baseline.

Jawed
 
1. They recently added a data point of the 4870 with GDDR5 @ 64 GB/s. The performance is well below GDDR3 @ 64 GB/s, thus throwing a bit of a wrench into the calculations. The question is whether the full speed GDDR5 results are still valid for comparison with the GDDR3 results in the regression, because it could simply be a matter of not being tuned for the low speed. Latency can be hidden by GPUs, but it may not be for GDDR5 at 993MHz.

In any case, there are two options for the regression. I can ignore the 4870 @ 725/993 and do it with the other 4 data points, or I can do it with just the three 4870 data points. I think the first method is more sound due to the latency argument. As expected, the second method shows games to be far more dependent on memory bandwidth since the 64 GB/s data point is much slower in all games.
I seem to remember that GDDR5 uses a different signalling scheme where the actual signalling is negotiated (don't ask me how ;)), but could underclocking the memory actually result in lower performance than expected? Do we have any fairly accurate tests to show the actually available bandwidth for a given clock rate?
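As a side note on the regression mechanics: here's a minimal sketch of what such a fit looks like, using only the four data points Mintmaster listed earlier (measured fps, with bandwidth again assumed to be the listed memory clock x 2 x 32 bytes). With so few, strongly correlated points the fitted coefficients are quite sensitive to rounding in the reported fps, so treat this as an illustration of the method rather than a reproduction of his exact numbers:
Code:
# Least-squares fit of: frame_time = a / core_freq + b / bandwidth
# over the four RV770 data points quoted earlier (measured fps).
# Bandwidth assumption: listed memory clock (MHz) * 2 * 32 bytes (256-bit bus).
import numpy as np

data = [  # (core MHz, mem MHz, measured fps)
    (750, 1800, 64.7),   # 4870 stock
    (790, 2200, 70.6),   # 4870 OC'd
    (625,  993, 49.1),   # 4850 stock
    (690, 1140, 54.0),   # 4850 OC'd
]

A = np.array([[1.0 / (core * 1e6), 1.0 / (mem * 2 * 32 * 1e6)] for core, mem, _ in data])
y = np.array([1.0 / fps for _, _, fps in data])       # frame times in seconds

(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"~{a / 1e6:.2f}M core clocks and ~{b / 1e6:.1f} MB of traffic per frame")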
 
But your baseline is 640x480 - if that isn't CPU-limited then I don't know what is :oops:

"CPU-limited" is really code for "CPU, PCI Express, Direct3D small batching etc. problems", i.e. anything really that isn't GPU-specific.

Before examining pure GPU scaling you really need a genuinely GPU-limited baseline.

Jawed
Ah, I see what you're getting at now (it would have been much clearer if you had specifically referred to the resolution I was using) but look at the third batch of 3DMark06 results I posted - halving the core speed still results in a 33% drop in frame rate (due to the polygon loads). I've no doubt that varying the CPU speed at that resolution would also have some impact on the frame rates, but the test clearly isn't "very CPU limited" when changes in any aspect of the GPU's capabilities result in a change in the benchmark's outcome. Had I a bigger monitor I would have preferred to scale from the default resolution of 1280 x 1024 to something much higher, such as 2560 x 1600, but it wouldn't have given as large an increase in resolution as the limited method I used, so going well down the resolution scale is really the only way of achieving that - ultimately it still also tests the bandwidth usage of the graphics card, however minor it may be.

Edit: Your comment piqued my curiosity and so off I went to do a couple more tests. Naturally for me, I did indeed pick the most CPU-dependent graphics test in 3DMark06 at that resolution:
http://service.futuremark.com/compare?3dm06=7495301 - 3GHz CPU (9 x 333MHz) @ 640 x 480
http://service.futuremark.com/compare?3dm06=7495474 - 2GHz CPU (6 x 333MHz) @ 640 x 480
Code:
	3GHz	2GHz	Change (CPU -33.33%)
GT1	49.69	33.41	-32.76%
GT2	49.69	35.02	-29.52%
CPU1	1.4	0.98	-30.00%
CPU2	2.07	1.47	-28.99%
GT3	74.67	70.07	-6.16%
GT4	65.82	45.14	-31.42%
I need to go back and rerun my previous tests with GT3 instead of GT1 but I'd probably still cock things up...
 
marvelous - I'm just curious, what are your qualifications in the graphics industry?

arun and mintmaster both work in the industry and are highly-respected round these parts. Their contributions are greatly appreciated by those with enough sense to know when to listen and when to speak.

You're really starting off on a bad foot here... This isn't a fanboi hangout, this is one of the most respected and collectively knowledgeable forums on the intarwebs.

I've never worked in the industry but I've been around since the rise of 3D architectures, and CPUs before that, and I'm not a fanboi either. Just because those guys worked in the industry doesn't mean what they say is god's truth. They are only human after all. :p
 
We all agree.

The question is, what is the scaling like? i.e. for 2x the pixels with 4xMSAA, how much extra bandwidth is required? 25%? 50%? etc.

With MSAA turned on the bandwidth per pixel decreases as resolution increases - the number of edge pixels with MSAA goes up more slowly than the total number of pixels in the frame.

Jawed

Not according to mintmaster it doesn't.
 
I showed that the relative drop in performance from decreasing bandwidth does not scale linearly with resolution. I was also under the impression that neither Mintmaster nor Arun said that you need more bandwidth at a high resolution compared to a low one, but rather that they're talking about being limited by bandwidth - a different thing entirely - and they repeatedly stated bandwidth per pixel. It would have been better if I'd used a different test to show this but, as Mintmaster reminded me, the vertex load is rather high; just look at these results to see what I mean:
Code:
3DMark06 - Graphics Test 1: Return to Proxycon
640 x 480 - No AA / Trilinear
576 / 1350 / 900 - 49.50 fps
288 / 1350 / 900 - 32.93 fps (-33.4%)
288 /  675 / 900 - 31.36 fps (-36.6%)

1680 x 1050 - No AA / Trilinear
576 / 1350 / 900 - 38.97 fps
288 / 1350 / 900 - 23.50 fps (-39.7%) 
288 /  675 / 900 - 20.53 fps (-47.3%)
Compare the differences in the performance drops with the "full vs. half core speed" settings - despite the fact that the ROPs are having to deal with nearly 6 times more pixels, the frame rate only decreases by a further 6%; it's clear that the bulk of that performance drop is down to the triangle setup rate being halved.

You got the wrong impression. Mintmaster specifically said you don't need more bandwidth to run higher resolutions.

Here's his post.

- Increasing resolution doesn't increase the bandwidth usage per pixel. Contrary to myth, GPUs do not become more bandwidth limited at higher resolution. The exception is when polygon speed is a factor, because then higher resolution increases the percentage of time spent limited on pixel processing. AA, however, does increase BW usage per pixel, so I agree with you there.
 
You got the wrong impression. Mintmaster specifically said you don't need more bandwidth to run higher resolutions.
Bandwidth per pixel vs. Bandwidth per frame. Clearly, higher resolution rendering requires more bandwidth per frame, but bandwidth per pixel may not increase. It's bandwidth per pixel that's important in this discussion because that can be a bottleneck when trying to determine how fast you can render.
 
Making what?
Making points. We can't be open to your points if you aren't making any.

In my first post I replied that it's not just bandwidth. It's about fillrate and the amount of bandwidth needed to use that fillrate properly.
Again, stop saying fillrate if you mean texturing rate. Texture samples don't fill anything. Pixels fill polygons. I don't care what reviewers say, because many of them are wrong.

Anyway, it's about a lot more than texturing speed. Look at how the 9600GT barely trails the 8800 GT in game after game. 32 TMUs vs. 56. Bandwidth, pixel fillrate, and setup speed are all important.

You don't need to tell me how you got the numbers.
Obviously I do, given your remarks.

Here's his post.
Maybe you should actually read what I wrote instead of just quoting me. Everyone else here gets it. OpenGL Guy spelled it out for you if you're still confused.
 
I've never worked in the industry but I've been around since the rise of 3D architectures, and CPUs before that, and I'm not a fanboi either. Just because those guys worked in the industry doesn't mean what they say is god's truth. They are only human after all. :p

I'm not saying anyone's perfect here, but you would be wise to ask questions and listen, rather than argue ;)

Not according to mintmaster it doesn't.

That's not what he said at all. Bandwidth per pixel doesn't increase, but pixel count increases with resolution so the overall bandwidth requirement does indeed increase. Bandwidth can be calculated in the following fashion: bw per pixel * pixel count per frame = bandwidth per frame. Your equation was incomplete as you were only allowing for bandwidth per pixel.
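To put the per-pixel vs. per-frame distinction in concrete terms, a quick sketch (the 60 bytes per pixel is a made-up illustrative cost; real per-pixel traffic depends on the game, overdraw, AA mode and so on):
Code:
# bandwidth per pixel * pixels per frame = bandwidth per frame;
# multiply by the target frame rate for a rough bandwidth requirement.
# The 60 bytes/pixel figure is purely illustrative.

def required_bandwidth_gb_s(width, height, fps, bytes_per_pixel=60):
    return width * height * bytes_per_pixel * fps / 1e9

for w, h in ((1280, 1024), (1680, 1050), (2560, 1600)):
    print(f"{w}x{h} @ 60 fps: ~{required_bandwidth_gb_s(w, h, 60):.1f} GB/s "
          "(same cost per pixel, more pixels per frame)")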
 