Someone needs to write a *Bandwidth* Benchmark.

Here is my thought. We all know the basic techniques that Nvidia, ATI, etc. use to increase bandwidth: Fast Z, Hierarchical Z, framebuffer compression, Z compression and so on.

It seems that it would be possible to write a benchmark that gives some sort of real-world gauge of effective bandwidth. Follow me? I am not a programmer, so I can't really think of the internals that would be required. The program would have to simulate various loads and types of loads, and be written in such a way that the test would throw both the best case and the worst case at each card, plus a general test that all cards can handle. Like an AA test that only uses one color, so you would get the maximum framebuffer compression possible.

This is just an idea. Please fill in the blanks here with other ideas, or ways that a bandwidth benchmark could be written. The program would then have to take the information and translate it into some sort of measurable effective number.
 
Good idea - however, the sad truth of the matter is that people only care about the overall fps scores or how many MadOnion marks a card can get.

Part of the blame lies with the knowledge of the people who consume the benchmarks. FPS is all that people want, so that is all they get. No one really dives deeper.

It really comes down to people and why they should care, no? In the high-end server market we focus on the STREAM benchmark - sustainable memory bandwidth. Not everything is rosy when you have 32-64 or more processors (which is why IBM's servers just took over the TPC-D lead with only 32 CPUs vs. HP's 64 - hmm...).
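
For anyone who hasn't seen STREAM, the heart of it is just timing simple loops over arrays too big for any cache - something along these lines (a rough sketch, not the actual STREAM source; the array size is an arbitrary pick):

Code:
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // Arrays must be much larger than any cache so we measure memory, not cache.
    const size_t N = 10 * 1000 * 1000;          // ~10M doubles per array (arbitrary)
    std::vector<double> a(N, 1.0), b(N, 2.0), c(N, 0.0);
    const double scalar = 3.0;

    auto t0 = std::chrono::high_resolution_clock::now();
    // "Triad": c[i] = a[i] + scalar * b[i] -- two reads and one write per element.
    for (size_t i = 0; i < N; ++i)
        c[i] = a[i] + scalar * b[i];
    auto t1 = std::chrono::high_resolution_clock::now();

    double secs  = std::chrono::duration<double>(t1 - t0).count();
    double bytes = 3.0 * sizeof(double) * N;    // 2 loads + 1 store per iteration
    std::printf("Triad: %.1f MB/s (check: %f)\n", bytes / secs / 1e6, c[N - 1]);
    return 0;
}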
 
Here is my thought. We all know the basic techniques that Nvidia, ATI, etc. use to increase bandwidth: Fast Z, Hierarchical Z, framebuffer compression, Z compression and so on.

These things don't increase bandwidth; they reduce bandwidth use in certain situations.

I think VillageMark and any fillrate tester can be combined to figure out what you want to know. Perhaps some of the D3D demos too, since you can replace textures in some of them (to get the same color everywhere, for instance). Run them at 32-bit and 16-bit color to vary the bandwidth used.

Of course, no benchmark will measure bandwidth if the system isn't bandwidth limited. So on cards that are core limited, you can't measure bandwidth at all.

There just will never be any benchmark that can find "real world bandwidth use"... only ones that can measure some aspect of it on each card (occlusion detection, compression, etc.), and only in a limited way.

Underclocking a card's memory and overclocking the core can help some, I suppose.
 
One of the most bandwidth-intensive benchmarks:

Start 3DMark2001, set the texture format to 32-bit, and run the single-texturing fillrate test.

It's not realistic, though, as it uses alpha blending, which occurs in relatively few places in normal rendering (except when multi-passing).
OTOH it doesn't do Z-buffering (which is almost always enabled).
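
Back-of-envelope, the per-pixel traffic in that test works out roughly like this (my assumptions about texel reuse and the lack of compression, plus a made-up fillrate number - nothing measured):

Code:
#include <cstdio>

int main() {
    // Per-pixel traffic for an alpha-blended, single-textured, 32-bit pixel,
    // assuming no framebuffer compression and roughly one new texel fetched
    // per pixel (bilinear reuse via the texture cache) -- all assumptions.
    const double dst_read  = 4.0;   // blending reads the destination color
    const double dst_write = 4.0;   // and writes it back
    const double tex_read  = 4.0;   // ~1 uncached 32-bit texel per pixel
    const double bytes_per_pixel = dst_read + dst_write + tex_read;

    // Plug in a measured fill rate (hypothetical number here) to get bandwidth.
    const double fillrate_mpix = 1000.0;  // e.g. 1000 Mpixels/s from the test
    std::printf("~%.0f bytes/pixel -> ~%.1f GB/s of memory traffic\n",
                bytes_per_pixel, fillrate_mpix * 1e6 * bytes_per_pixel / 1e9);
    return 0;
}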
 
Hellbinder CE said:
Someone needs to write a *Bandwidth* Benchmark.

You mean another synthetic benchmark? I thought you hated those. In any case, I would still prefer games and their various settings to tell the story of bandwidth performance over any synthetic benchmark out there.

saf1 said:
Good idea - however, the sad truth of the matter is that people only care about the overall fps scores

What's so sad about caring for the overall fps score and why would caring for the bandwidth benchmarks be any different/better?
 
Hyp-X said:
One of the most bandwidth-intensive benchmarks:

Start 3DMark2001, set the texture format to 32-bit, and run the single-texturing fillrate test.
Why is that a real bandwidth test? Don't forget there are caches and other things involved.
It's not realistic, though, as it uses alpha blending, which occurs in relatively few places in normal rendering (except when multi-passing).
OTOH it doesn't do Z-buffering (which is almost always enabled).
Yes, but there are a lot of optimizations for Z: Z compression, early Z rejection, etc. that don't apply well to alpha blending.

One big issue with "bandwidth" tests is that they aren't always measuring what they think they are measuring: Memory latency can be an issue as well.
 
dksuiko said:
Hellbinder CE said:
Someone needs to write a *Bandwidth* Benchmark.
What's so sad about caring for the overall fps score and why would caring for the bandwidth benchmarks be any different/better?

Why? Because there is more to games and video cards than just max FPS. Do you play MadOnion? Probably not. It is an indicator, is all - no more, no less.

Can you tell me why a 256-bit bus is better than a 128-bit bus without just saying it is wider? DDR2 vs. DDR1, 8 pixels per clock vs. 4, color compression, etc. Heck, image quality - side by side with/without AA/FSAA. How much data can each move - peak, sustained, etc.? Release drivers?
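
The peak number itself is just arithmetic, of course (the clocks below are made up for the example):

Code:
#include <cstdio>

// Peak theoretical bandwidth = (bus width in bytes) * effective transfers per second.
double peak_gbps(int bus_bits, double mem_clock_mhz, int transfers_per_clock) {
    return (bus_bits / 8.0) * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9;
}

int main() {
    // Hypothetical clocks, just to show why "wider" matters at the same clock.
    std::printf("128-bit @ 310 MHz DDR: %.1f GB/s\n", peak_gbps(128, 310.0, 2));
    std::printf("256-bit @ 310 MHz DDR: %.1f GB/s\n", peak_gbps(256, 310.0, 2));
    return 0;
}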

Let's face it, people buy what they can understand. And that is kids understanding how many MadOnion marks they score or max fps results. Nothing else. They do not care if a card can push 48 GB/s of bandwidth or a zillion. But people who read forums like this do. They do want to know. That is probably why someone mentioned a "real" bandwidth benchmark. Nothing is wrong with that.

Remember when everyone thought the MIPS rating was king? Then SPEC benchmarks, TPC, TPC-D, etc. Change is good - maybe it is time to evolve the video card benchmarks...
 
OpenGL guy said:
Hyp-X said:
One of the most bandwidth-intensive benchmarks:

Start 3DMark2001, set the texture format to 32-bit, and run the single-texturing fillrate test.
Why is that a real bandwidth test? Don't forget there are caches and other things involved.
It's not realistic, though, as it uses alpha blending, which occurs in relatively few places in normal rendering (except when multi-passing).
OTOH it doesn't do Z-buffering (which is almost always enabled).
Yes, but there are a lot of optimizations for Z: Z compression, early Z rejection, etc. that don't apply well to alpha blending.

One big issue with "bandwidth" tests is that they aren't always measuring what they think they are measuring: Memory latency can be an issue as well.

Wouldn't that be cool to know? What good is 256 MB of memory when using it is slower than snot? Or why using a 256-bit vs. a 128-bit bus is better...
 
OpenGL guy said:
One big issue with "bandwidth" tests is that they aren't always measuring what they think they are measuring: Memory latency can be an issue as well.


I'll just interject that bandwidth and latency are interwoven enough that a "purified" measure of either serves little purpose. At least that is the case for general/scientific processing, and I don't think gfx processing is too different. And of course there are the particulars of the cache hierarchy, bus turnaround, prefetching algorithms, buffers... if you start to dig into memory subsystems, they get messy as hell. Bandwidth as an important parameter has reached the awareness of some small percentage of the computing populace - enough that it can be marketed. Awareness of latency has a much smaller penetration - sufficient that armchair experts can feel smug about their superior knowledge. And then we get into the hair-tearing gritty reality of the actual apps people are running...
Even those who design these things make assumptions, simulate, adjust, and then gather final data with actual silicon and typical apps. Even so, specific application use can still invalidate your design choices.

Isolating a specific property can be tricky enough. And then you have to make sense of that lonely data point bereft of context, which can be nigh on impossible.


BTW, there was recently an interesting thread in comp.arch where John McCalpin (the STREAM guy, hired by IBM to work on the design of their Power processors) claimed that they had seen terrible bus turnaround properties with DDRII - much worse than with DDR, which in turn is uglier than SDRAM in this respect. It wasn't entirely clear to me whether this was entirely the fault of the DDRII, or if their controller played some part in it as well, though he didn't give the impression that it did.

OpenGL guy, if you stroll by a hardware guy, it would be most appreciated if you threw him a banana and asked him what he has seen of real-life DDRII properties in this respect. Inquiring minds want to know. :)

Entropy

PS. In the same thread, McCalpin admitted that he had bought a P4 system for himself a while back - with SDRAM! "I was cheap." I laughed out loud, mostly at my own suddenly obvious prejudice that only the ignorant would make a choice like that. Hell, has the man never seen a STREAM benchmark of those things? :)
 
saf1 said:
Let's face it, people buy what they can understand. And that is kids understanding how many MadOnion marks they score or max fps results. Nothing else. They do not care if a card can push 48 GB/s of bandwidth or a zillion. But people who read forums like this do. They do want to know. That is probably why someone mentioned a "real" bandwidth benchmark. Nothing is wrong with that.

I'm at a loss, honestly - how are completely synthetic benchmarks "real"? I'll be the first to go on record as saying I couldn't care less whether my graphics card has 1 GB/s of bandwidth or 2 MB/s; if the 2 MB/s card is faster, then so be it. If you want a more in-depth benchmark than 3DMark, that's fine, but saying that a synthetic benchmark is "more real" than a composite FPS benchmark is just sort of ridiculous.
 
saf1 said:
dksuiko said:
Hellbinder CE said:
Someone needs to write a *Bandwidth* Benchmark.
What's so sad about caring for the overall fps score and why would caring for the bandwidth benchmarks be any different/better?

(I'll assume that quote of mine is an error on your part, since I didn't respond to Hellbinder's comments with that sentence, but yours. :))

Do you play MadOnion? Probably not. It is an indicator, is all - no more, no less.

No, I don't play 'MadOnion'. But that wasn't the point - and I never did question your comments about MadOnion/3DMarks, since I agree with you on that. I was talking about your comments regarding the framerate. Your comments about it being sad that there is a focus on FPS scores made it seem as if there was no good reason for it being a priority.

When benchmarking how fast a video card performs in games, I don't think there is any good alternative to the good ol' frames per second. You can benchmark fill rate, bandwidth, etc., but the end result of all that is an increase in framerate. You don't ever SEE fill rate or bandwidth, but you see what they contribute to - and that's the framerate.

-dksuiko
 
Entropy said:
OpenGL guy, if you stroll by a hardware guy, it would be most appreciated if you threw him a banana and asked him what he has seen of real-life DDRII properties in this respect. Inquiring minds want to know. :)
LOL! I did that a long time ago :)

DDRII has more penalties than DDRI for things like page opens, etc., but that doesn't mean that you can't compensate for those penalties in your design. However, it does mean that random access requests (the worst kind) will be slower.
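
You can see the sequential vs. random difference even with a toy CPU test like this (a rough sketch with arbitrary sizes; it says nothing about any particular memory controller):

Code:
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Compare streaming through memory in order vs. chasing a random permutation.
// The random walk defeats prefetching and open DRAM pages, so it is
// dominated by latency rather than by peak bandwidth.
int main() {
    const size_t N = 1 << 24;                 // 16M ints (~64 MB), bigger than any cache
    std::vector<int> data(N, 1);

    std::vector<size_t> seq(N), shuffled(N);
    std::iota(seq.begin(), seq.end(), 0);
    shuffled = seq;
    std::mt19937 rng(42);
    std::shuffle(shuffled.begin(), shuffled.end(), rng);

    long long sum = 0;
    auto time_pass = [&](const std::vector<size_t>& idx) {
        auto t0 = std::chrono::high_resolution_clock::now();
        for (size_t i = 0; i < N; ++i) sum += data[idx[i]];
        auto t1 = std::chrono::high_resolution_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    };

    double t_seq = time_pass(seq);
    double t_rnd = time_pass(shuffled);
    std::printf("sequential: %.3f s, random: %.3f s (check %lld)\n", t_seq, t_rnd, sum);
    return 0;
}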
 
OpenGL guy said:
Hyp-X said:
One of the most bandwidth-intensive benchmarks:

Start 3DMark2001, set the texture format to 32-bit, and run the single-texturing fillrate test.
Why is that a real bandwidth test? Don't forget there are caches and other things involved.
How would the presence of a cache affect this particular test? IIRC, this test basically draws ~100 full-screen polygons on top of each other, which would imply that unless your cache is large enough to hold the entire framebuffer you will get capacity misses 100% of the time, so performance will closely follow available memory bandwidth. So I would expect this particular test to be about as good a test of memory bandwidth as it is possible to get.
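For concreteness, the kind of loop I have in mind is roughly this - an old-style OpenGL sketch with made-up resolution, pass count, texture size and bytes-per-pixel assumptions, not 3DMark's actual code:

Code:
// Rough sketch of a "draw N full-screen blended quads" fill test, in the
// spirit of the 3DMark single-texturing fillrate test.
#include <GL/glut.h>
#include <chrono>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    const int W = 1024, H = 768, PASSES = 100;   // arbitrary choices

    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
    glutInitWindowSize(W, H);
    glutCreateWindow("fill test sketch");

    // A small uncompressed 256x256 RGBA texture filled with a constant color.
    std::vector<unsigned char> texels(256 * 256 * 4, 128);
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, texels.data());
    glEnable(GL_TEXTURE_2D);

    glDisable(GL_DEPTH_TEST);               // no Z traffic, like the 3DMark test
    glEnable(GL_BLEND);                     // alpha blend => read + write framebuffer
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, 1, 0, 1, -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    glClear(GL_COLOR_BUFFER_BIT);
    glFinish();
    auto t0 = std::chrono::high_resolution_clock::now();
    for (int p = 0; p < PASSES; ++p) {       // overdraw the whole screen PASSES times
        glBegin(GL_QUADS);
        glTexCoord2f(0, 0); glVertex2f(0, 0);
        glTexCoord2f(1, 0); glVertex2f(1, 0);
        glTexCoord2f(1, 1); glVertex2f(1, 1);
        glTexCoord2f(0, 1); glVertex2f(0, 1);
        glEnd();
    }
    glFinish();                              // wait for the GPU before stopping the clock
    auto t1 = std::chrono::high_resolution_clock::now();

    double secs   = std::chrono::duration<double>(t1 - t0).count();
    double pixels = double(W) * H * PASSES;
    // Assume ~12 bytes of traffic per blended 32-bit pixel (dst read + write + texel).
    std::printf("%.0f Mpixels/s, ~%.1f GB/s assumed traffic\n",
                pixels / secs / 1e6, pixels * 12.0 / secs / 1e9);
    return 0;
}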
It's not realistic, though, as it uses alpha blending, which occurs in relatively few places in normal rendering (except when multi-passing).
OTOH it doesn't do Z-buffering (which is almost always enabled).
Yes, but there are a lot of optimizations for Z: Z compression, early Z rejection, etc. that don't apply well to alpha blending.

One big issue with "bandwidth" tests is that they aren't always measuring what they think they are measuring: Memory latency can be an issue as well.
I suspect that memory latency plays a much smaller role in a GPU than in a CPU. In a CPU you normally run a sequential stream of instructions which must stall completely in case of a cache miss - in this case, memory latency is clearly critical for performance. In a GPU, most memory accesses can be predicted far in advance - such accesses (framebuffer, ramdac, vertex arrays, anything except texture accesses) are easy to make extremely latency-tolerant. Also, if you get e.g. a texture cache miss for a pixel, you can still start processing of other pixels while waiting for the cache miss to be resolved - modern GPUs can easily sustain >95% of their theoretical texel fill rate even if the texture cache hit rate is only about 80%.
 
arjan de lumens said:
How would the presence of a cache affect this particular test? IIRC, this test basically draws ~100 full-screen polygons on top of each other, which would imply that unless your cache is large enough to hold the entire framebuffer you will get capacity misses 100% of the time, so performance will closely follow available memory bandwidth. So I would expect this particular test to be about as good a test of memory bandwidth as it is possible to get.
What about texture cache? Maybe you aren't measuring bandwidth at all, but how well you handle uncompressed textures.
I suspect that memory latency plays a much smaller role in a GPU than in a CPU. In a CPU you normally run a sequential stream of instructions which must stall completely in case of a cache miss - in this case, memory latency is clearly critical for performance. In a GPU, most memory accesses can be predicted far in advance - such accesses (framebuffer, ramdac, vertex arrays, anything except texture accesses) are easy to make extremely latency-tolerant. Also, if you get e.g. a texture cache miss for a pixel, you can still start processing of other pixels while waiting for the cache miss to be resolved - modern GPUs can easily sustain >95% of their theoretical texel fill rate even if the texture cache hit rate is only about 80%.
What about if you are rendering a bunch of skinny triangles? What if you are rendering a large, unmipmapped texture? These sorts of things can defeat caches. Once you blow your cache, it's all about memory latency.
 
I don't think a bandwidth benchmark is needed at all.

Alternatively, I think a very good, comprehensive suite of very primitive synthetic tests would be far superior.

Generic fillrate tests, overlapped region fills, single-color texture fill exercises, generic shaded poly tests, etc., etc.

With a set of generic tests with no specialization, it would be the perfect tool for extrapolating things such as compression effectiveness, "effective" bandwidth, polygon throughput, etc., etc.
 
OpenGL guy said:
What about texture cache? Maybe you aren't measuring bandwidth at all, but how well you handle uncompressed textures.
To measure how well you handle uncompressed textures, do a multi-texturing fillrate benchmark. If it returns >95% of theoretical texel fillrate (which Nvidia and ATI chips seem to do), you can pretty much tell that there are no inefficiencies in the texture handling. If the texture is then known to be either small enough to fit into the texture cache (8Kbyte or less) or large enough not to fit (1 MByte?), it can be counted into the bandwidth measurement.
What about if you are rendering a bunch of skinny triangles? What if you are rendering a large, unmipmapped texture? These sorts of things can defeat caches. Once you blow your cache, it's all about memory latency.
Not necessarily - there is no need to process pixels sequentially, so you still have the opportunity to overlap/pipeline large numbers of memory accesses, which is all that is needed to keep memory latency from dominating performance.
 
arjan de lumens said:
Not necessarily - there is no need to process pixels sequentially, so you still have the opportunity to overlap/pipeline large numbers of memory accesses, which is all that is needed to keep memory latency from dominating performance.
You can't hide latency forever.
 
OpenGL guy said:
arjan de lumens said:
Not necessarily - there is no need to process pixels sequentially, so you still have the opportunity to overlap/pipeline large numbers of memory accesses, which is all that is needed to keep memory latency from dominating performance.
You can't hide latency forever.
You can hide latency as long as the buffers you use to keep track of outstanding pixels/memory accesses aren't full. These buffers get rather expensive after a while, but, say, 100-200 ns of latency isn't that hard to mask this way.
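Back-of-envelope (the clock rate, pipe count and latency below are made-up example numbers):

Code:
#include <cstdio>

int main() {
    // How many pixels must be in flight to cover a given memory latency?
    // All numbers here are hypothetical, just to show the scale of the buffers.
    const double core_clock_hz   = 300e6;   // 300 MHz core
    const double pixels_per_clk  = 4.0;     // 4-pipe part
    const double latency_seconds = 150e-9;  // 150 ns miss latency

    // Pixels issued during one miss latency = pixels that must be buffered.
    double in_flight = core_clock_hz * pixels_per_clk * latency_seconds;
    std::printf("~%.0f pixels in flight to hide %.0f ns\n",
                in_flight, latency_seconds * 1e9);
    return 0;
}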
 
arjan de lumens said:
OpenGL guy said:
You can't hide latency forever.
You can hide latency as long as the buffers you use to keep track of outstanding pixels/memory accesses aren't full. These buffers get rather expensive after a while, but, say, 100-200 ns of latency isn't that hard to mask this way.
Ok. How big is your cache line? How many cache lines do you have? These factors will determine how much latency you can hide. Caches on GPUs are generally smaller than CPU caches. Also, caches on GPUs tend to be divided among different units (Z, texture, color).

Plus, I think you are missing my whole point: If you aren't getting good cache line utilization, then you are wasting a lot of bandwidth, thus latency becomes important.
 