Memory bandwidth and GPU performance

dkanter

Regular
Just wrapped up another article that looks at the performance impact of memory bandwidth on modern GPUs. For those of you curious about architecture, we actually discuss the quantitative relationship between performance and memory bandwidth (for the 3DMark Vantage GPU score). In some ways it is a complement to my earlier piece on the subject.

http://www.realworldtech.com/page.cfm?ArticleID=RWT042611035931

The most interesting part is that if the memory bandwidth to a GPU is insufficient for a given workload (and architecture), the performance implications are quite large, easily in the realm of 30-40% for a 2X change in bandwidth. This strongly suggests that Llano and Ivy Bridge will offer models with more memory bandwidth, in order to avoid starving the GPUs.
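As a rough illustration of what that number implies (assuming a simple power-law fit, perf ~ BW^alpha, which is just a convenient reading and not the model from the article):

import math

# A 2x bandwidth change worth 30-40% of performance implies a scaling
# exponent between log2(1.3) and log2(1.4) under this power-law reading.
for gain in (1.30, 1.40):
    print("%.2fx perf for 2x bandwidth -> alpha = %.2f" % (gain, math.log(gain, 2)))

# With alpha around 0.4, halving the bandwidth of a bandwidth-starved part
# costs roughly 1 - 0.5**0.4 of its performance, i.e. about a quarter.
print("perf retained at half the bandwidth (alpha = 0.4): %.0f%%" % (100 * 0.5 ** 0.4))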


David
 
I'm not sure I'm completely sold on your conclusion... For integrated graphics, clearly bandwidth is an issue. But for midrange and high-end boards, while more is always better, it isn't clear to me that resources wouldn't be better spent elsewhere. For example, on the GTX 580, would you rather have more bandwidth (a 512-bit bus) or more SMs for a given additional cost in area?

I think the more interesting question is how much memory bandwidth is "enough?"
 
I don't agree; I find this to be a good write-up, going from where a previous simple model fails to account for differences and using that as the basis for an extended analysis. And for the industry in a wider sense, the conclusion is certainly relevant: with integrated GPUs there are implications for overall system design if you want to avoid choking off performance.

Also, as David points out, the workload determines the bandwidth need. At present, most games are written assuming a dedicated graphics card and the corresponding bandwidth.
It's a bit of a chicken-and-egg problem: if bandwidth is pitiful, rendering techniques have to be employed that work as well as possible given that constraint, but everyone would be better off if those constraints were lifted, cheaply enough that it can be part of every system. That's not what we see happening yet, though (witness Brazos's 64-bit interface), so maybe on-chip/on-package solutions are what we are looking at for the future.
 
I think it's not really just bus width and frequency. Look at CPUs, for example: the new Sandy Bridge reaches 17-18 GB/s for reads/writes with dual-channel DDR3, while Phenom II with the same memory gets 7-8 GB/s.
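Those read/write numbers are easy to reproduce with a simple streaming copy; a minimal sketch (Python/NumPy, buffer chosen well past any cache size):

import numpy as np, time

N = 256 * 1024 * 1024 // 8            # 256 MB of float64, far larger than any cache
src = np.ones(N)
dst = np.empty_like(src)

reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    np.copyto(dst, src)               # each copy reads N*8 bytes and writes N*8 bytes
dt = time.perf_counter() - t0

bytes_moved = 2 * src.nbytes * reps   # count both the read and the write traffic
print("sustained copy bandwidth: %.1f GB/s" % (bytes_moved / dt / 1e9))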

With all the MRT deferred rendering engines, in my opinion there could be quite some difference in real-world bandwidth across GPU generations and models (cache hierarchy/bandwidth, memory controller design). The theoretical bandwidth numbers are maybe close for pure buffered reads, but the GPU can't read all the time.

Is there any comprehensive GPU bandwidth testing program?
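As a crude starting point, a device-to-device copy at least gives you a sustained device-memory number. A sketch assuming PyCUDA is available (a plain copy measures only one access pattern, not the mixed read/write traffic a deferred renderer generates):

import numpy as np
import pycuda.autoinit                 # create a context on the default GPU
import pycuda.driver as cuda

nbytes = 256 * 1024 * 1024             # 256 MB per buffer
src = cuda.mem_alloc(nbytes)
dst = cuda.mem_alloc(nbytes)
cuda.memcpy_htod(src, np.ones(nbytes, dtype=np.uint8))

start, end = cuda.Event(), cuda.Event()
reps = 20
start.record()
for _ in range(reps):
    cuda.memcpy_dtod(dst, src, nbytes) # on-device copy: one read plus one write per byte
end.record()
end.synchronize()
ms = start.time_till(end)              # elapsed time in milliseconds

bytes_moved = 2 * nbytes * reps
print("device-to-device copy bandwidth: %.1f GB/s" % (bytes_moved / (ms / 1e3) / 1e9))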
 
You may be interested in a thread I started a couple years ago:

http://forum.beyond3d.com/showthread.php?t=48761

Yes, I remember that thread now!
And it would really be a boon to hardware forums if your final paragraph ("Q: Is card XX bandwidth limited" ...) reached greater awareness. It's not just about graphics, obviously, but equally true of CPUs, et cetera.

Of course, both your study and David's suffer from a very narrow set of cards, applications, and settings. The total problem is so much more complex. But that doesn't detract from the validity of the general conclusions you can draw.
 
Assuming drivers account for the +/- 5% differences between tests, actual GFLOPS output and bandwidth numbers give a very accurate picture of GPU performance.

But it doesn't take into account the amount of memory on the graphics card. With 22nm and 3D packaging, Intel could literally stack 64MB of L3 cache offering ~300GB/s of bandwidth to its iGPU. Of course, since the iGPU has a low number of GFLOPS, it doesn't require that much bandwidth.

But would that excess bandwidth help given the smaller amount of memory?
 
FLOPS are cheap, bandwidth is expensive.

Also, it is not at all obvious that Intel is anywhere close to solving the heat dissipation problem of stacking 100W dies.
 
A few comments:

1. AMD's cache hierarchy blows; I'm not surprised that with mixed reads/writes they get half the bandwidth of Intel.

2. "Enough memory bandwidth" is determined by the workload. Every workload has a particular Byte/FLOP ratio, and if you deviate from that, performance drops.

David
 
So what is your final equation for the combined model of shaders and memory bandwidth?

Did you do or are you planning to do any multiple regression analysis?
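In case it's useful, a minimal version of such a regression is only a few lines. The data below is entirely made up; a log-log fit gives a combined model of the form score ~ FLOPS^a * BW^b:

import numpy as np

# Entirely made-up example data, just to show the shape of the fit:
# peak GFLOP/s, memory bandwidth (GB/s) and a 3DMark-style score per card.
flops = np.array([ 500.0,  700.0, 1000.0, 1350.0, 1580.0])
bw    = np.array([  60.0,   90.0,  130.0,  160.0,  190.0])
score = np.array([4000.0, 5500.0, 7400.0, 9000.0, 10200.0])

# Fit log(score) = c + a*log(flops) + b*log(bw), i.e. score ~ FLOPS^a * BW^b.
X = np.column_stack([np.ones_like(flops), np.log(flops), np.log(bw)])
coef, *_ = np.linalg.lstsq(X, np.log(score), rcond=None)
c, a, b = coef
print("score ~ %.1f * FLOPS^%.2f * BW^%.2f" % (np.exp(c), a, b))

The practical snag is that FLOPS and bandwidth tend to move together across a product line, so the two exponents come out poorly conditioned unless the set of cards is chosen carefully.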
 
I haven't done a multi-variate regression yet. I'm actually working on a bunch of articles that are substantially more complicated...so I may not get back to GPU performance models for a little while.

I will however return to GPU performance.

DK
 
I did this in 2001, as I was looking around for fun things to do with PCA (Principal Component Analysis). I used the different tests in the then-current 3DMark (1999? 2001?) and a number of game frame rates at different settings, taken from reviews. However, I never got very far with it, not because of time constraints actually, but because it quickly became apparent that the overwhelmingly strongest correlation was with fill rate. The CPU and main memory subsystem had some impact, since at the time reviewers were still looking at game performance at different resolutions rather than graphics card performance at unrealistic application settings.

So the exercise showed early signs of being rather pointless: why go through the hassle of entering thousands of data points when the first few hundred probably sufficed, and furthermore pointed towards a real-life situation where you could measure a single value and be done with it?
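For anyone who wants to repeat the exercise, the whole analysis fits in a few lines these days. Synthetic stand-in data below; real input would be a cards-by-benchmarks table taken from reviews:

import numpy as np

rng = np.random.default_rng(0)
cards, tests = 20, 6
fill_rate = rng.uniform(1.0, 8.0, size=cards)        # the dominant hidden factor
weights = rng.uniform(0.5, 2.0, size=tests)          # how strongly each test tracks it
scores = np.outer(fill_rate, weights) + rng.normal(0.0, 0.3, size=(cards, tests))

# PCA via SVD of the standardized cards-by-tests matrix.
Z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / np.sum(S**2)
print("variance explained by PC1: %.0f%%" % (100 * explained[0]))

With data shaped like this, the first component soaks up essentially all the variance, which is exactly the fill-rate situation described above.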

The situation today is probably more complex and thus more interesting. However, depending on what you actually want to assess, the data produced by reviewers may be less useful. Today's reviews generally try to isolate the graphics card from the rest of the system by using settings where the rest of the system has minimal impact on the scores, regardless of how useful or realistic those settings are for actual application use. To me, it seems reviewers are using the applications as benchmark tools rather than as games to be played, so they disconnect from the practical use of the cards without gaining the clear insight that synthetic tests can yield.
The results of a multivariate analysis of that data will show what matters most for getting good scores in website review benchmarking, rather than yielding useful information for gaming or, for that matter, providing much insight into graphics card architecture versus efficiency.
 
Reading through your posts I can't tell the difference: what's the difference between multivariate regression and multiple regression? I thought they were the same...
 