Jaws, you just aren't paying attention at all to my posts.
First of all, you were the one who brought up 0.5 GHz, not me. I was never talking about it. My reference to 0.5/4 referred to this post, where you divided by four and halved to get from 64 GB/s to 8 GB/s.
You have a lot of difficulty following my arguments. For example:
Jaws said:
Mintmaster said:
The fluke is that (16 samples / 4 cycles * half for sharing),
Wrong. The scaling of a factor of 8 is *inherent* in my derivation but it doesn't come from the above. It comes from,
~ (0.5 GHz/4 cycles * half for sharing)
~ 0.5 GHz x (1/4 cycles * 1/2 for sharing)
~ 0.5 GHz x (1/8)
Why did you clip my post there? You completely missed the point. Forget about chip clock speed until the very end. Multiplication is commutative, and 0.5 GHz is the easy part. Bytes per pipe per clock is the sole point of debate here. This was the point: your method assumes (16 texels / 4 cycles * half for sharing) for AF2. This is incorrect, but the final number happens to match a different scenario. First of all, AF2 does not always need 4 cycles (you even found Dave's post yourself); more importantly, if it does need 4 cycles, "half for sharing" is incorrect. Now, I said you'll never need 16 texels, but then you say this:
Agreed it will NEVER require 16 *different* texels, hence the number is a pathological number *without* looking at texture cache reuse at the beginning of the derivation.
You simply don't understand. Even if no texel data is shared between pixels at all, you would never need 16 different texels' worth of data to fill a single pixel using AF2. This argument of mine is orthogonal to the sharing.
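The point about keeping the chip clock out of the calculation until the very end can be checked with exact arithmetic. This is a minimal sketch: `texels_per_pipe_per_clock` and `fetch_fraction` are hypothetical names, and the inputs are simply the (16 texels / 4 cycles * half for sharing) figures quoted above, not a claim that those figures are right for AF2.

```python
from fractions import Fraction


def texels_per_pipe_per_clock(texels, cycles, fetch_fraction):
    """(texels / cycles) * fraction of texels actually fetched after sharing."""
    return Fraction(texels) / cycles * fetch_fraction


# The disputed factor: 16 texels / 4 cycles * half for sharing.
factor = texels_per_pipe_per_clock(16, 4, Fraction(1, 2))
print(factor)  # 2

# The chip clock (0.5 GHz, kept in GHz here) can be applied first or last;
# the result is the same because multiplication is commutative.
clock_ghz = Fraction(1, 2)
early = (clock_ghz / 4) * Fraction(1, 2)
late = clock_ghz * (Fraction(1, 4) * Fraction(1, 2))
assert early == late == clock_ghz * Fraction(1, 8)
print(late)  # 1/16
```

Only the per-pipe, per-clock factor is in dispute; multiplying by bytes per texel and by the clock afterwards scales every scenario equally.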
Mintmaster said:
...It is a fluke. 8GB/s does not apply to AF2 if it is trilinear and takes 4 cycles. The knowledge required for its calculation is not simple.
Jaws said:
I want to expand on this point because, it's quite clear to me that you can't adapt to my *simple* model,
It is quite clear to me that you have very little understanding of texture filtering. Why did you highlight "trilinear"? I was clearly using "trilinear" and "takes 4 cycles" to describe the AF2; I wasn't talking about plain trilinear filtering. Anisotropic filtering can sample from only one mipmap level, in which case it's the bilinear variety, or from two mipmap levels, in which case it's the trilinear variety. 4-cycle AF2, as arjan described, could only occur for the latter.
You only get peak bandwidth usage when all the samples come from a single mipmap level near its transition edge. For your trilinear AF2 that requires 4 cycles, not only is the "without cache" data usage below 16 texels, but 2 of the 4 cycles use samples from a mipmap with 1/4 the texel:pixel ratio. Hence the inter-pixel sharing factor is even lower than half. Instead of (16 texels * half) for the worst-case scenario I described, your AF2 example gives (~12 texels * one third) in the worst case.
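The two worst-case estimates above can be compared directly. This sketch takes the figures from the text as given (the ~12 texels and the one-third factor are approximations, not measurements), and `fetched_texels_per_pixel` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction

CYCLES = 4  # the 4-cycle trilinear AF2 case under discussion


def fetched_texels_per_pixel(texels, fetch_fraction):
    """Texels actually fetched per pixel; the rest is shared with neighbours."""
    return Fraction(texels) * fetch_fraction


# Model as applied: 16 texels, half of them fetched after sharing.
assumed = fetched_texels_per_pixel(16, Fraction(1, 2))
# Worst case for 4-cycle trilinear AF2: ~12 texels, about one third fetched.
trilinear_af2 = fetched_texels_per_pixel(12, Fraction(1, 3))

# Texels per pipe per clock over the same 4 cycles.
print(assumed / CYCLES, trilinear_af2 / CYCLES)  # 2 1
print(assumed / trilinear_af2)  # 2
```

Over the same 4 cycles, the model as applied comes out at roughly twice the worst-case texel rate for trilinear AF2.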
I fully understand your simple model. This discussion is about bandwidth for 2xAF. Such a calculation hinges entirely upon the ratio of texels to pixels at the active mipmap level, and this is where the sharing factor comes from. My factor of 0.5 applies to the peak you can get with bilinear AF. You applied it to 4-cycle trilinear AF2, which requires only marginally more data per pixel than bilinear AF2, yet takes twice as many clock cycles per pixel.
Please do not cut my sentences when quoting. If you don't understand something, ask. In every reply to my posts you have completely misunderstood what I was telling you. Contrary to your claims, I have not misunderstood anything about your calculations or assumptions.