Average effectiveness of LMA on GF3

K.I.L.E.R

is it 30% efficient at reducing bandwidth load?

1024*768(resolution)*32(colour depth)*.5(50% hit in AA)/.3(30% LMA efficiency at reducing bandwidth load) = 41943040 = 4.2*10^7 Gb/s

On another note, my GF3 is running at a 250MHz core and 558.4MHz memory because somehow it can't get to 560MHz. (It doesn't freeze up; it's just that when I try to set it to 560MHz it reverts to 558.4MHz.)

So the total bandwidth would roughly be 8.8Gb/s at maximum?

If that's the case then: 4.2*10^7/8.8*10^9 = 47% of my video cards bandwidth is loaded.

Is any of this correct?
 
I did not really look at your numbers in depth, but some issues:

- Calculating bits or bytes: 32-bit colour or 4-byte colour?
- What about Z-buffer and texture bandwidth?
- Why are you dividing by 0.3? That's roughly X / (1/3) = X * 3.
- Not sure how you're converting to Gb/s either...
- And 4.2*10^7 / 8.8*10^9 = 0.47%

Anyway I would say... no, it's not correct.
 
K.I.L.E.R said:
Is any of this correct?

No.

I'm not sure how old you are, but if you had taken physics classes, one basic thing you should have learnt is that the units should match on both sides of an equation.

There's "bits" on the left side, and "bits/s" on the right side...

What you are missing is your FPS number.

(Apart from the mistakes Kristof already told you.)
 
K.I.L.E.R said:
41943040 = 4.2*10^7 Gb/s
41943040 = 4.2 * 10^7 = 40 MB. 4.2 * 10^7 Gb/s is about 4.2 * 10^16 bits/s (b = bits, B=bytes), which is far more than will be achieved in the near future.
On another note, my GF3 is running at a 250MHz core and 558.4MHz memory because somehow it can't get to 560MHz. (It doesn't freeze up; it's just that when I try to set it to 560MHz it reverts to 558.4MHz.)
The clocks don't have arbitrary precision.

P.S. 1024 bytes = 1KB, not 1 * 10^3 KB which is nearly a MB.
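To see how the units should work out: per-frame bits only become a bandwidth figure once you multiply by a frame rate. A minimal sketch (the 60 fps and the one-colour-write-per-pixel cost model are assumptions for illustration only; real traffic adds Z and texture reads, as Kristof notes):

```python
# Per-frame framebuffer traffic for 1024x768 at 32-bit colour, times FPS.
width, height = 1024, 768
bytes_per_pixel = 4              # 32-bit colour = 4 bytes, not bits
fps = 60                         # assumed frame rate - the missing factor

bytes_per_frame = width * height * bytes_per_pixel    # one colour write per pixel
bytes_per_second = bytes_per_frame * fps

print(f"{bytes_per_frame / 2**20:.1f} MB per frame")  # 3.0 MB
print(f"{bytes_per_second / 1e9:.2f} GB/s")           # 0.19 GB/s
```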
 
Let's try a more valid calculation.
It will be for an R9700Pro as I don't feel like changing it for the test. :)

test 1
3dmark 2001SE single texture fillrate test with 32bit texture
(That's one of the most bandwidth-demanding tests available.)
The score is 1571 MTexels/s (= MPixels/s)

There's 32bit fb-read, 32bit fb-write and (assuming 1:1 pixel/texel ratio and 100% texture cache efficiency) 32bit texture-read.

That's 1571*12*10^6 B/s = 18852*10^6 B/s

The peak bandwidth is 310*2*256/8 * 10^6 B/s = 19840*10^6 B/s (310MHz DDR on a 256-bit bus)

The efficiency is 95% which is pretty good.

test 2
3dmark 2001SE single texture fillrate test with compressed texture
The score is 1757 MTexels/s (= MPixels/s)

There's 32bit fb-read, 32bit fb-write and (assuming 1:1 pixel/texel ratio and 100% texture cache efficiency) 4bit average texture-read.

That's 1757*8.5*10^6 B/s = 14934.5*10^6 B/s

The efficiency is 75.3% which doesn't sound that good (especially in the light of the previous result)

test 3
3dmark 2001SE multi texture fillrate test with 32bit texture
(It uses 8x multi texturing - the max in 3dmark)
The score is 2137 MTexels/s
In theory that number should be 2600 MTexels/s, so this isn't looking like it's fillrate limited.

There's 2bit average fb-read, 2bit average fb-write and 32bit texture-read.

That's 2137*4.5*10^6 B/s = 9616.5*10^6 B/s

The efficiency is 48.5% which isn't exciting at all.

(The compressed texture/multitexture test looks fillrate limited so I'm not including it.)
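The three utilization figures work out mechanically. A sketch of the arithmetic (per-texel byte costs taken from the post above; note the 4.5 B/texel figure for test 3 gets corrected further down the thread):

```python
# R9700 Pro peak memory bandwidth: 310MHz DDR across a 256-bit bus.
PEAK = 310e6 * 2 * 256 / 8          # = 19840e6 B/s

def utilization(mtexels_per_s, bytes_per_texel):
    """Fraction of peak bandwidth consumed at the measured texel rate."""
    return mtexels_per_s * 1e6 * bytes_per_texel / PEAK

# test 1: 32-bit fb read + 32-bit fb write + 32-bit texture = 12 B/texel
print(f"test 1: {utilization(1571, 12.0):.1%}")   # ~95.0%
# test 2: 32-bit fb read/write + 4-bit compressed texture = 8.5 B/texel
print(f"test 2: {utilization(1757, 8.5):.1%}")    # ~75.3%
# test 3: as computed in this post (2+2 bit fb amortization) = 4.5 B/texel
print(f"test 3: {utilization(2137, 4.5):.1%}")    # ~48.5%
```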
 
Hyp-X said:
test 3
3dmark 2001SE multi texture fillrate test with 32bit texture
(It uses 8x multi texturing - the max in 3dmark)
The score is 2137 MTexels/s
In theory that number should be 2600 MTexels/s, so this isn't looking like it's fillrate limited.

There's 2bit average fb-read, 2bit average fb-write and 32bit texture-read.

That's 2137*4.5*10^6 B/s = 9616.5*10^6 B/s

The efficiency is 48.5% which isn't exciting at all.

(The compressed texture/multitexture test looks fillrate limited so I'm not including it.)

Um, shouldn't that be average 16-bit fb read, and same for average write?

That makes for:
2137*8*10^6 B/s = 17096*10^6 B/s, which is much closer to the theoretical number.

Of course, there is an inherent problem with these results, as they're not indicative of a real gaming situation. Unfortunately, in a real gaming situation, such things are incredibly hard to determine. That said, I'll go ahead and post numbers from a GeForce4 Ti 4200:

Clock: 250/513
Theoretical fillrate: 1000 MPix/sec, 2000 MTex/sec
Theoretical bandwidth: 8.2GB/sec

Single-textured fillrate:
863.8MPix/sec
Bandwidth used:
863.8*8.5*10^6B/s = 7342.3*10^6 B/s, or 89.5% efficiency

Multi-textured fillrate:
1911.0MTex/sec
Bandwidth used:
1911*4.5*10^6B/s = 8599.5*10^6 B/s, or 105% efficiency! This is probably actually due to the first pass not needing a frame buffer read, and/or texture cache reducing texture bandwidth needs.
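The same arithmetic applied to the Ti 4200 results (a sketch; the peak figure assumes 513MHz effective DDR on a 128-bit bus, matching the 8.2GB/s quoted above):

```python
# GeForce4 Ti 4200 peak bandwidth: 513MHz effective DDR, 128-bit bus.
PEAK = 513e6 * 128 / 8              # = 8208e6 B/s (~8.2 GB/s)

def utilization(mtexels_per_s, bytes_per_texel):
    """Fraction of peak bandwidth consumed at the measured texel rate."""
    return mtexels_per_s * 1e6 * bytes_per_texel / PEAK

print(f"single-textured: {utilization(863.8, 8.5):.1%}")   # ~89.5%
# over 100% - the giveaway that the 4.5 B/texel cost model is wrong here
print(f"multi-textured:  {utilization(1911.0, 4.5):.1%}")  # ~104.8%
```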
 
I sucked at physics in some areas. :p
Anyway, thanks guys.

"1911*4.5*10^6B/s = 8599.5*10^6 B/s, or 105% efficiency! This is probably actually due to the first pass not needing a frame buffer read, and/or texture cache reducing texture bandwidth needs."

Where did 4.5 come from?
Where did 10^6 come from?
Where did 8599.5*10^6 come from?
HTH did you get 105% efficiency?
Doesn't a Pr(something) = 1 at max?


Hyp-X said:
K.I.L.E.R said:
Is any of this correct?

No.

I'm not sure how old you are, but if you had taken physics classes, one basic thing you should have learnt is that the units should match on both sides of an equation.

There's "bits" on the left side, and "bits/s" on the right side...

What you are missing is your FPS number.

(Apart from the mistakes Kristof already told you.)
 
Hyp-X said:
(assuming 1:1 pixel/texel ratio and 100% texture cache efficiency)

Well, if you look at the 3DMark fillrate test you can see that there is far from a 1:1 pixel/texel ratio. The textures are very blurry. The texture bandwidth shouldn't be much at all (in fact I know it isn't). Once you load a block of the texture into the cache, it would be good for many pixels. This aspect of texture bandwidth makes it very difficult to calculate efficiency.

I am surprised at your 32-bit texture multitexturing scores, however. I wouldn't have expected such a huge drop. Maybe there's some sort of thrashing going on.

In any case, I wouldn't put much weight into these tests. Alpha fillrate is far from the most important aspect of 3D performance, especially when there is no Z-buffer. I would say the most important is with one or two textures, a read and write from Z, and a write to the colour buffer. I'm pretty sure NV30 won't do very well at all in this test relative to its real-world performance.
 
Hyp-X said:
test 3
3dmark 2001SE multi texture fillrate test with 32bit texture
(It uses 8x multi texturing - the max in 3dmark)
The score is 2137 MTexels/s
In theory that number should be 2600 MTexels/s, so this isn't looking like it's fillrate limited.

There's 2bit average fb-read, 2bit average fb-write and 32bit texture-read.

That's 2137*4.5*10^6 B/s = 9616.5*10^6 B/s

The efficiency is 48.5% which isn't exciting at all.

There's a problem with my wording here.
What I actually calculated here is bandwidth utilization.

The difference from tests 1 and 2 is that those are (or should be) bandwidth limited (so in those cases efficiency == utilization).

If we want to calculate efficiency, it is: 2137/2600 = 82.2%. But this is not bandwidth efficiency; it has more to do with some internal limitation (which is confirmed by overclocking).

Actually test 2 shows some internal limitations too, as the score increases with overclocking the core.
 
Chalnoth said:
Um, shouldn't that be average 16-bit fb read, and same for average write?

No.
I calculate with "texel" cost.
The pixel has 8 "texels" (it's pretty simple in a 1 TMU architecture).

So the pixel cost is 32bit read, 32bit write, 8 * x bit texture read
So the average "texel" cost is 2bit read, 2bit write, x bit texture read
(Where x is dependent on the texture format.)
 
Hyp-X said:
Chalnoth said:
Um, shouldn't that be average 16-bit fb read, and same for average write?

No.
I calculate with "texel" cost.
The pixel has 8 "texels" (it's pretty simple in a 1 TMU architecture).

So the pixel cost is 32bit read, 32bit write, 8 * x bit texture read
So the average "texel" cost is 2bit read, 2bit write, x bit texture read
(Where x is dependent on the texture format.)

I still don't get it... Why do you divide the 32bit read and write by 16, and not 8? Shouldn’t it be 4bit read and 4bit write?
 
Thowllly said:
I still don't get it... Why do you divide the 32bit read and write by 16, and not 8? Shouldn’t it be 4bit read and 4bit write?

/me bangs his head on the desk :oops:

ok, here's the (hopefully) correct calc:

test 3
3dmark 2001SE multi texture fillrate test with 32bit texture
(It uses 8x multi texturing - the max in 3dmark)
The score is 2137 MTexels/s

There's 4bit average fb-read, 4bit average fb-write and 32bit texture-read.

That's 2137*5*10^6 B/s = 10685*10^6 B/s

The bandwidth utilization is 53.9% so this should not be bandwidth limited.
Yet it's only 82.2% of the theoretical 2600 MTexels/s value.
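Spelled out as a check (a sketch following this post's numbers):

```python
# 8x multitexturing: the 32-bit fb read and 32-bit fb write are amortized
# over 8 texels -> 4 + 4 bits, plus a 32-bit texture read = 40 bits = 5 B/texel.
PEAK = 310e6 * 2 * 256 / 8          # R9700 Pro peak, B/s
used = 2137e6 * 5                   # = 10685e6 B/s

print(f"bandwidth utilization: {used / PEAK:.1%}")    # ~53.9%
print(f"of theoretical fill:   {2137 / 2600:.1%}")    # ~82.2%
```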
 
Hyp-X said:
No.
I calculate with "texel" cost.
The pixel has 8 "texels" (it's pretty simple in a 1 TMU architecture).

So the pixel cost is 32bit read, 32bit write, 8 * x bit texture read
So the average "texel" cost is 2bit read, 2bit write, x bit texture read
(Where x is dependent on the texture format.)

Oh, yeah, silly me. I was still thinking dual-texturing...
 
So, anyone knows how to calculate REAL world performance?
Thanks to you guys I was running all my games at high detail nicely, but now I have everything set to medium-low so I don't consume too much bandwidth.

Does this seem strange?
 
K.I.L.E.R said:
So, anyone knows how to calculate REAL world performance?
Thanks to you guys I was running all my games at high detail nicely, but now I have everything set to medium-low so I don't consume too much bandwidth.

Does this seem strange?

There is no magic formula to calculate actual performance. Also, if the games run nicely with high detail why lower your settings?
 
It's all to do with the numbers.
See, I believe that the smaller the number the better.
Lower frame rate is better as long as the bandwidth consumed is low as well.

So, I think that high framerate is a sacrifice I need to take to get lower bandwidth eaten up on my video card.
 
K.I.L.E.R said:
It's all to do with the numbers.
See, I believe that the smaller the number the better.
Lower frame rate is better as long as the bandwidth consumed is low as well.

So, I think that high framerate is a sacrifice I need to take to get lower bandwidth eaten up on my video card.

I won't even claim to begin to understand all the math these guys were spewing above (although in fairness I didn't even try to figure it out), but I'm totally confounded by your conclusion here. I don't need any numbers to tell me that a higher framerate is better. Anyone who's played 3D games for any length of time will tell you that higher FPS plays much more smoothly.

It sounds like you're looking at graphics cards based on efficiency alone, which is kind of a strange way to go. The truth is, in the end efficiency doesn't matter, it's performance that counts.
 
That is the entire point.
I'm not trying to make sense on purpose with the conclusion. LMFAO!!!

Anyway, I thought you guys might have a good laugh at my conclusion that is why I posted it. :)
I am trying to make the boards a happy place too sometimes. :)

Anyway, I would still love info on where Chalnoth ripped those numbers I quoted in an above post.
 
K.I.L.E.R said:
Anyway, I would still love info on where Chalnoth ripped those numbers I quoted in an above post.

Oh, I got them in the same way Hyp-X got them. I'll go ahead and explain more completely.

1911*4.5*10^6B/s = 8599.5*10^6 B/s, or 105% efficiency! This is probably actually due to the first pass not needing a frame buffer read, and/or texture cache reducing texture bandwidth needs.

Where did 4.5 come from?
Where did 10^6 come from?
Where did 8599.5*10^6 come from?
HTH did you get 105% efficiency?
Doesn't a Pr(something) = 1 at max?

1911*4.5*10^6B/s:

1911 I got from the benchmark (this is how many texels were rendered per second)
4.5 I got from two textures per pixel, but I suppose I'm going to have to amend this. The GeForce4 is capable of two textures per pixel pipeline per clock, but capable of four per pass.
10^6 came from the fact that the 1911 number isn't actually the number of texels rendered per second, but instead it's how many million texels are rendered per second.

Anyway, I'll recalculate:

Four textures per pixel means an average of an 8-bit read and an 8-bit write per texel for the framebuffer. That makes for 2 bytes of bandwidth for framebuffer access per texel (8 bits = 1 byte). Since compressed textures were used, I should have been sitting at 4 bits per texel (roughly... could be less due to texture caches), or half a byte of bandwidth per texel.

That makes the equation turn into:
1911*2.5*10^6B/s = 4.78GB/sec of bandwidth used

The efficiency is: 4.78/8.208 = 58.2% efficiency.

This number actually makes a lot more sense since the texel fillrate measured is very close to the theoretical of 2000MTex/sec. The multitexturing tests of the Radeon 9700 should similarly be fillrate-limited.
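As a quick check of the recalculation (a sketch; per-texel costs as described in the paragraph above):

```python
# GF4 Ti 4200, multitexture test with compressed textures:
# 32-bit fb read/write amortized over 4 texels per pass -> 8 + 8 bits (2 B),
# plus ~4-bit compressed texture reads (0.5 B) = 2.5 B/texel.
PEAK = 513e6 * 128 / 8              # ~8.208e9 B/s peak
used = 1911e6 * 2.5                 # = 4.7775e9 B/s (~4.78 GB/s)

print(f"{used / 1e9:.2f} GB/s used, {used / PEAK:.1%} of peak")  # 4.78 GB/s, ~58.2%
```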
 