Mintmaster
Veteran
So in another thread, Jawed pointed out some interesting OC data at Firingsquad:
http://www.firingsquad.com/hardware/ati_radeon_4850_4870_performance/page16.asp
What's great is that we have identical hardware with 4 fairly different core:mem ratios, with the highest BW config having 75% more BW/clk than the lowest.
My thinking is as follows:
Most tasks for the GPU are either clearly BW limited or GPU limited. Some workloads would run just as fast with 1/10 the bandwidth (i.e. they scale perfectly with core clock), some would speed up 3x with 3x more BW (i.e. core clock does nothing), and others fall elsewhere in this range. On the whole, it won't often happen that a task lies in the narrow range where a 75% BW/clk boost changes it from BW limited to GPU limited. What we can do, then, is split the workload into parts A and B. A requires a certain number of cycles to get done, and B requires a certain amount of data transfer. Higher GPU clock reduces the time to finish A, and higher memory clock reduces the time to finish B.
What I did was invert the framerates from the above link and use multiple regression to fit the rendering times (inverse of FPS) to the inverses of clock speed and BW, with no constant term (including one gave messed-up results from overfitting). The model fit the data exceptionally well, with a standard error of 0.6 fps. Then I took B and divided it by the bandwidth to get the time that the card was BW limited. Expressing this as a percentage of total render time:
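As a sketch of that fit (using the four HL2:EP2 data points quoted further down, and assuming the listed memory clocks translate to bandwidth over the cards' 256-bit bus), an ordinary least-squares version looks like this:

```python
import numpy as np

# Four RV770 configurations (4870 stock/OC, 4850 stock/OC) with measured fps
# from the HL2:EP2 2560x1600 4xAA/16xAF numbers quoted later in the post.
# Bandwidth assumes a 256-bit bus: GB/s = mem MHz * 2 (DDR) * 32 bytes / 1000.
core_ghz = np.array([0.750, 0.790, 0.625, 0.690])   # core clock, GHz
bw_gbps  = np.array([115.2, 140.8, 63.552, 72.96])  # bandwidth, GB/s
fps      = np.array([64.7, 70.6, 49.1, 54.0])       # measured fps

# Model: frame time (ms) = A / core_clock + B / bandwidth, no constant term.
# With these units, A comes out in millions of clocks per frame and
# B in MB transferred per frame.
t_ms = 1000.0 / fps
X = np.column_stack([1.0 / core_ghz, 1.0 / bw_gbps])
(A, B), *_ = np.linalg.lstsq(X, t_ms, rcond=None)
print(f"A = {A:.2f}M clocks/frame, B = {B:.1f} MB/frame")

# Fraction of frame time spent BW limited on the stock 4850:
share_4850 = (B / 63.552) / (A / 0.625 + B / 63.552)
print(f"4850 BW-limited share: {share_4850:.1%}")
```

The recovered coefficients land right around the 9.12M clocks and 375.6 MB quoted below, and the 4850 share comes out near the 29% in the table.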
ET:QW 2560x1600 4xAA/16xAF
4850: 30%
4870: 22%
HL2:E2 2560x1600 4xAA/16xAF
4850: 29%
4870: 21%
FEAR 2560x1600 4xAA/16xAF
4850: 36%
4870: 27%
CoH 1920x1200 4xAA/16xAF
4850: 11%
4870: 7%
These seem like pretty good estimates of how often you get BW limited in these games, though the numbers would be larger if the CPU is a limit for parts of the tests. Crysis definitely has this happen, as evidenced by the 4850 -> 4870 gain being less than 20%, so the above model is flawed there (you get a negative BW dependence). Maybe some other data from these games (e.g. resolution scaling on different GPUs) would let me extract this factor accurately.
Interesting stuff, though. We can see that the 4850 isn't overwhelmingly BW limited, but GDDR5 definitely makes an impact. The games' data yields coefficients suggesting 280-500 MB per frame of BW-limited operations.
*****************************
EDIT: Looks like this thread has a little more appeal than I thought it would, so I'll elaborate on one example. With regression, I came up with the following for RV770 running HL2:EP2 at 2560x1600 w/ 4xAA/16xAF:
Predicted HL2 fps = 1 / ( 9.12M clocks / RV770 freq + 375.6MB / bandwidth )
4870 stock (750/1800): 64.9 predicted fps, 64.7 actual fps
4870 OC'd (790/2200): 70.4 predicted fps, 70.6 actual fps
4850 stock (625/993): 48.8 predicted fps, 49.1 actual fps
4850 OC'd (690/1140): 54.5 predicted fps, 54.0 actual fps
Not bad at all! With other GPUs, the 9.12M figure will change, but 375.6MB should be similar unless compression efficiencies are different. It might be interesting to test that out if we had the data...
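For anyone who wants to check the numbers, the formula can be evaluated directly (bandwidths again assume the 256-bit bus, so e.g. 1800 MHz memory gives 115.2 GB/s):

```python
# Direct evaluation of the fitted HL2:EP2 model:
#   frame time = 9.12M clocks / core_freq + 375.6 MB / bandwidth
def predicted_fps(core_mhz, bw_gbps):
    t = 9.12e6 / (core_mhz * 1e6) + 375.6e6 / (bw_gbps * 1e9)  # seconds
    return 1.0 / t

print(predicted_fps(750, 115.2))   # 4870 stock: post predicts 64.9 fps
print(predicted_fps(790, 140.8))   # 4870 OC:    post predicts 70.4 fps
print(predicted_fps(625, 63.552))  # 4850 stock: post predicts 48.8 fps
print(predicted_fps(690, 72.96))   # 4850 OC:    post predicts 54.5 fps
```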
Common question: Is the 4850 bandwidth limited? How about the 4870?
Answer: This is the wrong way to think about it. If you chopped a typical HL2:EP2 frame into 1000 pieces that take the same amount of time on the 4850, it would be BW limited for 288 of those. If you doubled the bandwidth with a 512-bit bus, you'd crunch through those parts in half the time, thus giving you a 17% framerate boost. For some GPUs, this is worth it -- I doubt a GTX 280's total cost would go down 14% by using eight 1 GBit chips and a simpler PCB instead of the current sixteen 512 MBit chips. On the other hand, if you had a 128-bit version of the 4850, you would double the time on the BW-limited parts, knocking 23% off the fps.
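A quick sketch of that thought experiment, using the 288/1000 split: only the BW-limited slices of the frame scale when bandwidth changes, so the whole-frame speedup is much smaller than the bandwidth change itself.

```python
# 288 of 1000 equal-time slices of a 4850 frame are BW limited.
bw_share = 0.288

def fps_change(bw_scale):
    # Only the BW-limited fraction of the frame scales with bandwidth;
    # the rest of the frame time is unchanged.
    new_time = (1.0 - bw_share) + bw_share / bw_scale
    return 1.0 / new_time - 1.0  # fractional fps change

print(f"2x bandwidth (512-bit bus):  {fps_change(2.0):+.1%}")  # ~ +17%
print(f"1/2 bandwidth (128-bit bus): {fps_change(0.5):+.1%}")  # ~ -22%
```

The doubled-bandwidth case gives about +17%, and the halved-bandwidth case knocks off about 22-23%, matching the figures above to within rounding of the BW-limited share.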