Someone needs to write a *Bandwidth* Benchmark.

OpenGL guy said:
arjan de lumens said:
You can hide latency as long as the buffers you use to keep track of outstanding pixels/memory accesses aren't full. These buffers get rather expensive after a while, but, say, 100-200 ns of latency isn't that hard to mask this way.
Ok. How big is your cache line? How many cache lines do you have? These factors will determine how much latency you can hide. Caches on GPUs are generally smaller than CPU caches. Also, caches on GPUs tend to be divided among different units (Z, texture, color).
If you, for every memory access, must allocate a cache line prior to performing the access, then the amount of latency that can be masked will be on the order of cache size divided by memory bandwidth, which should amount to several hundred to a few thousand ns for typical GPU cache sizes and memory bandwidths. There is nothing preventing us from e.g. filling multiple cache lines at the same time.
Plus, I think you are missing my whole point: If you aren't getting good cache line utilization, then you are wasting a lot of bandwidth, thus latency becomes important.
In that situation, effective latency (as seen from the unit that accesses the memory controller) will go up sharply due to bandwidth saturation effects, so I would say that even then memory bandwidth is still more important to performance than the raw memory latency (the time from when the memory module receives the request until it returns the data).
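
As a rough sanity check of that cache-size-over-bandwidth figure, here is a quick back-of-the-envelope calculation in C; the 16 KB cache and 16 GB/s of bandwidth are just assumed round numbers for illustration, not any particular chip:

    /* Rough sanity check of the "cache size / memory bandwidth" figure.
       Both numbers below are assumptions for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        double cache_bytes = 16.0 * 1024.0;  /* assume a 16 KB on-chip cache */
        double bandwidth   = 16.0e9;         /* assume ~16 GB/s of memory bandwidth */

        /* Time to refill the whole cache = rough upper bound on the latency
           that can be hidden if every access must first claim a cache line. */
        double mask_ns = cache_bytes / bandwidth * 1e9;

        printf("latency coverable: ~%.0f ns\n", mask_ns);  /* ~1000 ns here */
        return 0;
    }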
 
To write a bandwidth benchmark you need to make sure you are not hitting fillrate limits, which may be a tad difficult. Of course with DX9 cards it's not going to be 'so' difficult if you start using 128-bit buffers and textures.
 
Colourless said:
To write a bandwidth benchmark you need to make sure you are not hitting fillrate limits, which may be a tad difficult. Of course with DX9 cards it's not going to be 'so' difficult if you start using 128-bit buffers and textures.

Yep, just use those in combination with one bilinear-filtered texture and lots of framebuffer blending.
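
To put rough numbers on why 128-bit formats push you into bandwidth limits long before fillrate limits, here is a small per-pixel byte count in C. The figures are assumptions for illustration: 19.8 GB/s is roughly an R9700-class card, and one 128-bit texel of texture traffic per pixel assumes good bilinear cache reuse.

    /* Back-of-the-envelope bandwidth cost of a 128-bit blended, textured pixel.
       All numbers are assumptions for illustration. */
    #include <stdio.h>

    int main(void)
    {
        double fb_read  = 16.0;  /* 128-bit destination read for blending          */
        double fb_write = 16.0;  /* 128-bit destination write                      */
        double tex_read = 16.0;  /* ~one 128-bit texel per pixel after cache reuse */
        double bytes_per_pixel = fb_read + fb_write + tex_read;

        double bandwidth = 19.8e9;  /* assumed bytes per second */
        double mpix = bandwidth / bytes_per_pixel / 1e6;

        printf("%.0f bytes/pixel -> ~%.0f Mpixels/s at 19.8 GB/s\n",
               bytes_per_pixel, mpix);  /* ~412 Mpix/s, far below raw fillrate */
        return 0;
    }

In other words, memory bandwidth runs out at a few hundred Mpixels/s, well under the raw fillrate of a DX9 part, so the test stays bandwidth-limited.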
 
OpenGL guy said:
Entropy said:
OpenGL guy, if you stroll by a Hardware guy, it would be most appreciated if you threw him a banana,
LOL! I did that a long time ago :)
:) I don't know.... Perpetuating this myth that ATI is staffed by monkeys. :rolleyes:
 
OpenGL guy said:
Hyp-X said:
One of the most bandwidth-intensive benchmarks:

Start 3DMark2001, set the texture format to 32-bit, and run the single-texturing fillrate test.
Why is that a real bandwidth test? Don't forget there are caches and things involved.
It's not realistic, though, as it uses alpha blending, which occurs in relatively few places in normal rendering (except when multi-passing).
OTOH it doesn't do Z-buffering (which is almost always enabled).
Yes, but there are a lot of optimizations for Z: Z compression, early Z rejection, etc. that don't apply well to alpha blending.
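
To make that concrete, here is a rough per-pixel byte count for the two cases being discussed; the numbers are assumed (4 bytes of texture traffic per pixel, cache misses not counted) and deliberately ignore the Z optimizations in question:

    /* Per-pixel traffic: 32-bit alpha-blended pass vs. naive Z-buffered pass.
       All numbers are assumptions for illustration. */
    #include <stdio.h>

    int main(void)
    {
        /* alpha-blended, no Z: texture read + dest read + dest write */
        double blended = 4.0 + 4.0 + 4.0;        /* ~12 bytes/pixel, fairly fixed */

        /* opaque with Z: texture read + Z read + Z write + color write,
           before Z compression or early rejection removes any of it */
        double z_naive = 4.0 + 4.0 + 4.0 + 4.0;  /* ~16 bytes/pixel at most */

        printf("blended: %.0f B/pixel, naive Z: %.0f B/pixel\n", blended, z_naive);
        return 0;
    }

The blended case is a fairly predictable load on memory, while the Z-buffered case can shrink a lot once compression and early rejection kick in, which is why the two don't compare cleanly.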

One big issue with "bandwidth" tests is that they aren't always measuring what they think they are measuring: Memory latency can be an issue as well.

Well here's a test to stress memory latency.

The program uses "random" dependent texture reads to kill the texture cache. You should see green-blue noise on the screen during the test.

It comes in at around 47 MPix/s on a GF3 Ti200 and 89 MPix/s on an R9700. :)

It doesn't work correctly on an R8500 that I tried, which I don't get, since I only used PS 1.0 and the R8500 is supposedly compatible with it. :rolleyes:
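
For anyone curious what the dependent-read trick looks like outside a pixel shader, here is a CPU-side sketch of the same principle in C: each load's result is the index of the next load, so accesses can't overlap, and a random permutation defeats the cache. This is only an analogue of the idea, not the PS 1.0 program described above, and the array size and step count are arbitrary.

    /* Pointer-chase latency sketch: random dependent reads, cache-hostile. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (8 * 1024 * 1024)       /* 8M entries, ~64 MB on a 64-bit machine */
    #define STEPS (10 * 1000 * 1000)  /* dependent loads to time */

    int main(void)
    {
        size_t *next = malloc((size_t)N * sizeof *next);
        if (!next) return 1;

        /* Build one big random cycle (Sattolo's shuffle) so the chase visits
           every slot once per lap in a cache-hostile order. */
        for (size_t i = 0; i < N; i++) next[i] = i;
        unsigned long long rng = 88172645463325252ULL;  /* xorshift64 state */
        for (size_t i = N - 1; i > 0; i--) {
            rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
            size_t j = (size_t)(rng % i);               /* j < i keeps it one cycle */
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        clock_t t0 = clock();
        size_t p = 0;
        for (long s = 0; s < STEPS; s++)
            p = next[p];                                /* each load waits on the last */
        clock_t t1 = clock();

        double ns = (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / STEPS;
        printf("~%.1f ns per dependent access (p=%zu)\n", ns, p);
        free(next);
        return 0;
    }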
 
Nagorak said:
saf1 said:
Let's face it, people buy what they can understand. And that means kids understanding how many MadOnion marks they score or what max fps they get. Nothing else. They do not care whether a card can push 48 GB/s of bandwidth or a zillion. But people who read forums like this do. They do want to know. That is probably why someone mentioned a "real" bandwidth benchmark. Nothing is wrong with that.

I'm at a loss, honestly: how are completely synthetic benchmarks real? I'll be the first to go on record as saying I couldn't care less whether my graphics card has 1 GB/s of bandwidth or 2 MB/s; if the 2 MB/s card is faster, then so be it. If you want a more in-depth benchmark than 3DMark, that's fine, but saying that a synthetic benchmark is "more real" than a composite FPS benchmark is just sort of ridiculous.

Exactly. In the end, all that really matters as a performance measure (let's leave image quality out of the discussion, as it can't be measured) is fps. Sure, you can argue whether the most important measure is max, average, or min fps, but it's still fps. A bandwidth benchmark would be an indication of *theoretical* performance. FPS is a measure of *actual* performance. How could a theoretical measure possibly be more useful than an actual measure?

And for those who argue that MadOnion is irrelevant because "you don't play 3DMark": while real-game benchmarks are preferable, I have found that 3DMark is actually an excellent indicator of real-game performance. Read any card comparison review that includes MadOnion scores as well as UT2K3, Quake3, Comanche, etc., and you will find that a card that scores higher on 3DMark typically scores higher on the real games as well.
 
How could a theoretical measure possibly be more useful than an actual measure?

Because you can't "actually" measure performance for something that doesn't exist yet.

In other words, theoretical/synthetic (or subsystem-specific) benchmarks can be a useful tool for trying to speculate how a piece of hardware will perform on actual, future apps. If we have reason to believe that actual, future apps will shift the stress on the graphics card from one set of circumstances to another (say, bandwidth vs. pixel shading rate), then synthetic benchmarks can give us an indication of how current hardware might perform on future apps. (The implicit assumption being that the "synthetic" app is designed in a way that the author believes is representative of future applications.)

Both "actual" and "synthetic" benchmarks have their places.
 