^^ too many misinterpretations for me to correct, so I'll get to the point.
Mintmaster said: I fully understand your simple model
NO YOU DON'T.
Mintmaster said: This discussion is about bandwidth for 2xAF.
It turned into YOU TRYING to prove that my 8 GB/sec was a FLUKE. IT WASN'T.
What you posted earlier,
Mintmaster said: ...I did see your post, and it had flaws that showed you still didn't understand AF. The 8 GB/s number doesn't only hold for 2xAF, it holds for all AF...
Plug these numbers into my model,
AF2 = 16 samples/ TMU, cost ~ 4 cycles
AF4 = 32 samples/ TMU, cost ~ 8 cycles
AF8 = 64 samples/ TMU, cost ~ 16 cycles
8 GB/sec holds true for all AF.
Yeah, ANOTHER FLUKE! RIIIIGHT!
Taking my earlier equation for pathological b/w without texture cache,
~ 16 TMUs x (0.5/4) GHz x 16 samples per texel x 4 bytes per sample
...and looking at it generally, i.e.
~No. of TMUs x (clockrate of GPU /cost of sampling in cycles) x max no. of samples needed for filtering x sample memory size
...and rearranging,
[EQ1]
~clockrate x TMUs x sample memory size x (max no. samples/ sampling cost)
...and for the following inputs,
AF2 = 16 samples/ TMU, cost ~ 4 cycles
AF4 = 32 samples/ TMU, cost ~ 8 cycles
AF8 = 64 samples/ TMU, cost ~ 16 cycles
...we can see that the ratio (max no. samples/ sampling cost) ~ 4 for all these cases. So by substituting this ratio into EQ1, we get,
[EQ2]
~clockrate x TMUs x sample memory size x 4
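For anyone who wants to check the arithmetic rather than take my word for it, here is a rough sketch of EQ1 in Python. The function and variable names are made up for illustration; the inputs are the ones quoted above (16 TMUs, 0.5 GHz, 4 bytes per sample, and the AF2/AF4/AF8 sample counts and approximate cycle costs). It only evaluates the pathological no-texture-cache equation as written, nothing more.

```python
# Rough check of EQ1/EQ2 using the inputs quoted in the post.
# All names here are hypothetical, chosen only for this illustration.

# (max samples per TMU, approximate sampling cost in cycles)
AF_MODES = {"AF2": (16, 4), "AF4": (32, 8), "AF8": (64, 16)}


def eq1_bandwidth_gb_per_s(clock_ghz, tmus, bytes_per_sample, max_samples, cost_cycles):
    """EQ1: clockrate x TMUs x sample memory size x (max samples / sampling cost)."""
    return clock_ghz * tmus * bytes_per_sample * (max_samples / cost_cycles)


def ratios():
    """The (max samples / sampling cost) ratio for each AF level."""
    return {mode: samples / cost for mode, (samples, cost) in AF_MODES.items()}


def bandwidths(clock_ghz=0.5, tmus=16, bytes_per_sample=4):
    """EQ1 evaluated for each AF level; GHz x bytes gives GB/sec."""
    return {
        mode: eq1_bandwidth_gb_per_s(clock_ghz, tmus, bytes_per_sample, samples, cost)
        for mode, (samples, cost) in AF_MODES.items()
    }


if __name__ == "__main__":
    for mode, ratio in ratios().items():
        print(mode, "ratio =", ratio, "EQ1 =", bandwidths()[mode], "GB/sec")
```

The ratio is the same at every AF level, so EQ1 gives the same number for AF2, AF4 and AF8, which is the whole point of EQ2.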
Looks familiar, doesn't it?
Yeah,
arjan de lumens said: The problem is, of course, WHICH equations to use, and what data to plug into them. While it is quite easy to compute the bandwidth needed from the texture-cache to the TMUs for optimal performance (texel-size * 4 * number-of-TMUs), that number is not necessarily very closely connected to the bandwidth needed from external memory to the texture-cache.
Yeah, another FLUKE! RIIIGHT!
Mintmaster, you can believe what you want.
The 8 GB/sec was NOT a FLUKE.
I'm repeating myself, so we can agree to disagree.