Memory bandwidth and GPU performance

Discussion in 'Architecture and Products' started by dkanter, Apr 27, 2011.

  1. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    Just wrapped up another article that looks at the performance impact of memory bandwidth on modern GPUs. For those of you curious about architecture, we actually discuss the quantitative relationship between performance and memory bandwidth (using the 3DMark Vantage GPU score). In some ways it is a complement to my earlier piece on the subject.

    http://www.realworldtech.com/page.cfm?ArticleID=RWT042611035931

    The most interesting part is that if the memory bandwidth to a GPU is insufficient for a given workload (and architecture), the performance impact is quite large: easily in the realm of 30-40% for a 2X change in bandwidth. This strongly suggests that Llano and Ivy Bridge will offer models with more memory bandwidth, in order to avoid starving the GPUs.
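    A drop of that size can be sketched with a toy power-law model. The exponent below is a hypothetical value chosen to land inside the 30-40%-per-2X range quoted above; it is not fitted to the article's data:

```python
def scaled_fps(base_fps, bw_ratio, alpha=0.62):
    """Toy model: when a GPU is bandwidth-bound, assume performance
    scales as bandwidth**alpha. alpha = 0.62 is illustrative only:
    it makes halving bandwidth (bw_ratio = 0.5) cost roughly 35%,
    inside the 30-40% range discussed in the article."""
    return base_fps * bw_ratio ** alpha

# Fraction of performance lost when memory bandwidth is halved.
drop = 1.0 - scaled_fps(100.0, 0.5) / 100.0
```

    The point of the power-law form is that the penalty compounds: each further halving of bandwidth costs another similar fraction, which is why starved integrated GPUs fall off so quickly.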


    David
     
  2. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,722
    Likes Received:
    141
    I'm not sure I'm completely sold on your conclusion... For integrated graphics, clearly bandwidth is an issue. But for midrange and top-level boards, while more is always better, it isn't clear to me that resources wouldn't be better spent elsewhere. For example, on the GTX 580, would you rather have more bandwidth (a 512-bit bus) or more SMs for a given additional cost in area?

    I think the more interesting question is how much memory bandwidth is "enough"?
     
  3. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,253
    Likes Received:
    1,261
    I don't agree; I find this to be a good write-up, starting from where a previous simple model fails to account for differences and using that as the basis for an extended analysis. And for the industry in a wider sense, the conclusion is certainly relevant: integrated GPUs have implications for the overall system design if you want to avoid choked-off performance.

    Also, as David points out, the workload determines the bandwidth need. At present, most games are written assuming a dedicated gfx-card and corresponding bandwidth.
    It's a bit of a chicken-and-egg problem: if bandwidth is pitiful, rendering techniques have to be employed that work as well as possible under that constraint, but everyone would be better off if those constraints were lifted, cheaply enough that it could be part of all systems. That's not what we see happening yet, though (witness Brazos's 64-bit interface), so maybe on-chip/on-package solutions are what we are looking at for the future.
     
  4. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
  5. GZ007

    Regular

    Joined:
    Jan 22, 2010
    Messages:
    416
    Likes Received:
    0
    I think it's not really just bus width and frequency. Look at CPUs, for example: the new Sandy Bridge reaches 17-18 GB/s in read/write tests with dual-channel DDR3, while a Phenom II with the same memory gets 7-8 GB/s.

    With all the MRT deferred rendering engines, in my opinion there could be quite some differences in real-world bandwidth between GPU generations and models (cache hierarchy/bandwidth, memory controller design). The theoretical bandwidth numbers are maybe close for pure buffered reads, but the GPU can't read all the time.

    Is there any complex GPU bandwidth testing program?
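    No single standard tool is named in this thread, but the core idea of any streaming bandwidth test (STREAM-style) can be sketched on the CPU side; GPU versions, e.g. shader copies between large buffers, follow the same pattern:

```python
import time
import numpy as np

def copy_bandwidth_gbs(n_mib=64, reps=5):
    """STREAM-style copy test: repeatedly copy one large array into
    another, keep the best (fastest) time, and report GB/s.
    Read + write traffic is both counted (2x the array size per pass)."""
    a = np.ones(n_mib * 1024 * 1024 // 8, dtype=np.float64)
    b = np.empty_like(a)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        b[:] = a  # streaming read of a, streaming write of b
        best = min(best, time.perf_counter() - t0)
    return 2 * a.nbytes / best / 1e9

print(f"copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

    Note that a pure copy only measures the best case; the mixed read/write and scattered-access patterns of a real renderer will land well below this number, which is exactly the gap being discussed above.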
     
  6. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,253
    Likes Received:
    1,261
    Yes, I remember that thread now!
    And it would really be a boon to hardware forums if your final paragraph ("Q: Is card XX bandwidth limited?" ...) reached greater awareness. It's not just about graphics, obviously, but equally true of CPUs, et cetera.

    Of course, both your study and David's suffer from a very narrow set of cards, applications, and settings. The total problem is much more complex, but that doesn't detract from the validity of the general conclusions you can draw.
     
  7. iwod

    Newcomer

    Joined:
    Jun 3, 2004
    Messages:
    179
    Likes Received:
    1
    Assuming drivers account for the ±5% differences between tests, actual GFLOPS output combined with bandwidth gives a very accurate prediction of GPU performance.

    But it doesn't take into account the amount of memory on the graphics card. With 22nm and 3D packaging, Intel could literally stack 64MB of L3 cache, offering ~300GB/s of bandwidth to its iGPU. Of course, since the iGPU has a low number of GFLOPS, it doesn't require that much bandwidth.

    But would this excess bandwidth help compensate for the smaller amount of memory?
     
  8. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    FLOPS are cheap; bandwidth is expensive.

    Also, it is not at all obvious that Intel is anywhere close to solving the heat dissipation problem of stacking 100W dies.
     
  9. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    A few comments:

    1. AMD's cache hierarchy blows, I'm not surprised that with mixed R/W they get half the bandwidth of Intel.

    2. "Enough memory bandwidth" is determined by the workload. Every workload has a particular Byte/FLOP ratio, and if you deviate from that, performance drops.
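    That balance argument can be made concrete with a roofline-style check. All the numbers below are invented purely for illustration:

```python
def limiting_resource(flops, bytes_moved, peak_flops, peak_bw):
    """Compare compute time vs memory time for a workload;
    whichever is larger bounds performance."""
    t_compute = flops / peak_flops
    t_memory = bytes_moved / peak_bw
    return "bandwidth" if t_memory > t_compute else "compute"

# Hypothetical GPU: 1.5 TFLOP/s and 192 GB/s gives a machine balance
# of 192e9 / 1.5e12 = 0.128 Bytes/FLOP. A frame demanding 0.25 B/FLOP
# (here 1e12 FLOPs against 0.25e12 bytes) is therefore bandwidth-bound.
which = limiting_resource(1e12, 0.25e12, 1.5e12, 192e9)
```

    When the workload's Byte/FLOP ratio exceeds the machine's, adding FLOPS buys nothing, which is the deviation-costs-performance point above.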

    David
     
  10. CRoland

    Newcomer

    Joined:
    Jan 19, 2010
    Messages:
    114
    Likes Received:
    0
    So what is your final equation for the combined model of shaders and memory bandwidth?

    Did you do or are you planning to do any multiple regression analysis?
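    For what it's worth, the mechanics of such a multiple regression are simple. The (GFLOPS, GB/s, score) triples below are invented stand-ins for real card data, just to show the fitting step:

```python
import numpy as np

# Hypothetical per-card data: columns are GFLOPS and GB/s; y is a
# 3DMark-style score. Values invented to illustrate the mechanics.
X = np.array([[ 550.,  88.], [1063., 134.], [1581., 192.],
              [2703., 177.], [1345., 160.], [ 776., 128.]])
y = np.array([3200., 5400., 7600., 8900., 6500., 4300.])

# Fit score ~ a*GFLOPS + b*(GB/s) + c by ordinary least squares.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
r2 = 1 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

    With real review data, the relative size of the two coefficients (after standardizing the inputs) would indicate how much of the score each resource buys.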
     
  11. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    I haven't done a multi-variate regression yet. I'm actually working on a bunch of articles that are substantially more complicated...so I may not get back to GPU performance models for a little while.

    I will however return to GPU performance.

    DK
     
  12. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,253
    Likes Received:
    1,261
    I did this in 2001, when I was looking around for fun things to do with PCA (Principal Component Analysis). I used the different tests in the then-current 3DMark (1999? 2001?) and a number of game frame rates at different settings taken from reviews. However, I never got very far with it, not because of time constraints, but because it quickly became apparent that the overwhelmingly strongest correlation was with fill rate. The CPU and main memory subsystem had some impact, since at the time reviewers were still looking at game performance at different resolutions rather than graphics card performance at unrealistic application settings.

    So the exercise showed early signs of being rather pointless: why go through the hassle of entering thousands of data points when the first few hundred probably sufficed, and furthermore pointed towards a real-life situation where you could measure a single value and be done with it?

    The situation today is probably more complex and thus more interesting. However, depending on what you actually want to assess, the data produced by reviewers may be less useful. Today's reviews generally try to isolate the graphics card from the rest of the system by using settings where the rest of the system has minimal impact on the scores, regardless of how useful or realistic those settings are for actual application use. To me, it seems reviewers are using the applications as benchmark tools rather than as games to be played, so they disconnect from the practical use of the cards without gaining the clear insight that synthetic tests can yield.
    A multivariate analysis of that data will show what matters most for getting good scores in website review benchmarking, rather than yielding useful information about gaming or, for that matter, much insight into graphics card architecture and efficiency.
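    For reference, the kind of analysis described above takes only a few lines today. The data here is synthetic, built so that several variables co-vary with a "fill rate" column the way the old review numbers apparently did:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
fill = rng.normal(size=n)                        # stand-in for fill rate
data = np.column_stack([
    fill,                                        # fill rate itself
    0.9 * fill + 0.1 * rng.normal(size=n),       # core clock, strongly tied
    0.5 * fill + 0.5 * rng.normal(size=n),       # bandwidth, loosely tied
    rng.normal(size=n),                          # CPU score, independent
])

# PCA via SVD of the standardized data matrix.
Z = (data - data.mean(axis=0)) / data.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / (S**2).sum()
# The first component carries the fill-rate-driven cluster of variables,
# mirroring the "one dominant factor" outcome described above.
```

    The single dominant principal component is exactly the "measure one value and be done with it" situation from the 2001 exercise.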
     
  13. fredmad

    Newcomer

    Joined:
    Sep 9, 2011
    Messages:
    1
    Likes Received:
    0
    Reading through your posts I can't tell them apart: what's the difference between multivariate regression and multiple regression? I thought they were the same thing..
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.