Memory latency and GPUs

Kaotik

I know GPUs are built to hide a LOT of latency, do something else if memory access for something you just did isn't complete yet etc, and that GTX480 has memory access latency of 400-800 cycles.

I also know that compared to the 7970 BIOS, the 7970 GHz BIOS has slightly(?) higher memory latency settings. (Which also explains why 7970s with the GHz Edition BIOS achieve higher memory clocks than with the normal BIOS; the VMEM voltage should be the same AFAIK.)

The big question is - how much does, say, 2-4 added "clocks"* of memory latency affect GPU performance?

*"clocks" is the best word I know to describe it, as CPU/RAM memory latency settings are "measured" in clocks
 
The answer you're looking for tends toward the wavefront numbers (the number of wavefronts in flight on the 7970 is extremely high).
 
So what GPU-Z reports as "clocks" is the same as cycles?

Both "clocks" and "cycles" are contractions for "clock cycles".

As for which cycles, that's another issue. The latency clocks reported by GPU-Z are memory bus cycles (divide the GDDR5 data-rate number by 4), while the 400-800 cycle memory latency typically refers to the time you have to wait as measured by the GPU clock (because that's what's interesting to a programmer). Of course, right now the memory clocks and the GPU clocks are close enough to each other that the distinction is irrelevant.

It's also important to note that as the latency is measured in memory bus cycles, a latency of 10 on a 500 MHz bus is exactly the same wall-clock time as a latency of 20 on a 1 GHz bus.
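To make that equivalence concrete, here's a quick sketch (the 10-cycle/500 MHz and 20-cycle/1 GHz figures are the illustrative numbers from the paragraph above, not real hardware timings):

```python
def latency_ns(cycles: int, bus_mhz: float) -> float:
    """Convert a latency in bus cycles to nanoseconds at a given bus clock."""
    # One cycle at F MHz lasts 1/F microseconds, i.e. 1000/F nanoseconds.
    return cycles * 1000.0 / bus_mhz

print(latency_ns(10, 500))   # 20.0 ns
print(latency_ns(20, 1000))  # 20.0 ns -- same wall-clock delay
```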

As for the original question, the whole point of latency hiding in GPUs is that when your problem is embarrassingly parallel, whenever a memory access stalls you, you can always go find something else to do while you wait (whereas on typical CPU loads, you are pretty much hosed for the duration, so you want really good caches and low-latency memory). So the maximum latency you can absorb without badly hurting performance depends on how many instruction streams (or wavefronts) you can juggle, and on how long you can work on a single wavefront without stalling (on average). As that depends greatly on the workload you are executing, there is no simple answer.
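As a back-of-envelope sketch of that trade-off: if each wavefront can do some amount of ALU work before hitting its next memory stall, you need roughly latency/work wavefronts in flight to keep the hardware busy. All numbers below are made up for illustration, not measured GPU figures:

```python
import math

def wavefronts_to_hide(latency_cycles: int, work_cycles_per_wavefront: int) -> int:
    """Rough count of wavefronts needed so that, while one waits on memory,
    the others have enough work to cover the whole latency."""
    return math.ceil(latency_cycles / work_cycles_per_wavefront)

# Hypothetical: 600-cycle latency, 40 cycles of ALU work between stalls.
print(wavefronts_to_hide(600, 40))  # 15
# A few extra latency cycles can tip you into needing one more wavefront:
print(wavefronts_to_hide(604, 40))  # 16
```

Whether those extra wavefronts are actually available depends on register and LDS usage, which is why there's no workload-independent answer.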

Quite probably, the added latency doesn't hurt you on most game loads. However, some GPGPU loads are much more latency-sensitive.
 
2-4 clocks of memory latency is noise for a GPU. It would take a freak situation for the difference to be noticeable.
 
Just one more thing - how long would, say, 4 clocks be in nanoseconds (or how many cycles would 1 ns be)?
 
Hertz measures the number of cycles per second (aka frequency).
The second hand of a classic watch makes a full rotation once per minute, so its frequency is 1/60 hertz.
A nanosecond is 1/10^9 of a second, so 10^9 ns make 1 second.
A gigahertz is 1'000'000'000 hertz, or 10^9 cycles per second.


so... at 1 GHz, one cycle is exactly 1 nanosecond :)
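Putting numbers on the question above (the 1.5 GHz bus clock is an assumed, roughly 7970-class GDDR5 figure, not a spec quote):

```python
def cycle_time_ns(freq_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds: 1 GHz -> exactly 1 ns."""
    return 1.0 / freq_ghz

print(4 * cycle_time_ns(1.0))  # 4.0 ns at 1 GHz
print(4 * cycle_time_ns(1.5))  # ~2.67 ns at an assumed 1.5 GHz bus clock
```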
 
800+2 = 802 = 0.25% increase.
I don't think it's as easy as that (and I don't think your intention was to depict it that way).

The total latency until a memory access is completed (the write done, or the results available in the registers) doesn't seem too closely related to what one traditionally understands by memory timings, i.e. the latencies of, say, normal DDR3 DRAM. At 800 MHz (DDR3-1600), a CAS latency of 7 to 9 cycles is normal; adding two to four cycles on top of that... you do the math.

In the end it's a matter of alignment and how well your memory controllers are able to coalesce accesses. If there's wiggle room left, you can tolerate data sitting a little longer in the buffers; if not, well, then you'll take a performance penalty. But I'm sure product managers and their teams made sure beforehand that perf does not fall off a cliff here.
 