two interesting slides about 9800XT from [H]

991060 · Oct 1, 2003

here they are:

http://www.hardocp.com/image.html?image=MTA2NDg1OTI2NEFrQm9pUVd0ZU5fMV83X2wuanBn

http://www.hardocp.com/image.html?image=MTA2NDg1OTI2NEFrQm9pUVd0ZU5fMV84X2wuanBn

I was wondering where the "5 max Ops per clock per pipe on 9800XT" comes from. Does it come from the new pipeline diagram which Dave just posted at http://www.beyond3d.com/forum/viewtopic.php?t=8005 ?

1 tex op + 1 full vec op + 1 full scalar op + 1 mini vec op + 1 mini scalar op = 5?

THe_KELRaTH · Oct 1, 2003

Those slides are certainly getting around...

http://www.lostcircuits.com/video/ati_r360/

Xmas · Oct 1, 2003

I don't see how this "maximum FP operations per clock" figure is supposed to be interpreted.
Either they count only arithmetic operations, in which case it's 32:8, or they count tex ops too, in which case it's 40:12. I don't know how they get 40:8. Of course that's all massively in favour of ATI; you'll probably never find a complex shader where the vector/scalar separation doubles performance.
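A quick way to check those two readings is to tally them. This is a rough sketch only: the per-pipe figures (R3xx with 8 pipes doing 4 arithmetic ops plus 1 tex op, NV35 with 4 pipes doing 2 arithmetic ops plus 1 tex op) are assumptions picked to reproduce the 32:8 and 40:12 numbers above, not confirmed specs.

```python
# Rough per-clock op tally. ASSUMED per-pipe figures (not official specs):
#   R3xx: 8 pipes, 4 arithmetic ops (vec + scalar, co-issued) + 1 tex op per pipe
#   NV35: 4 pipes, 2 arithmetic ops + 1 tex op per pipe

def chip_ops_per_clock(pipes: int, arith_per_pipe: int, tex_per_pipe: int, count_tex: bool) -> int:
    """Peak ops per clock for a whole chip, optionally counting texture ops."""
    per_pipe = arith_per_pipe + (tex_per_pipe if count_tex else 0)
    return pipes * per_pipe

r3xx = dict(pipes=8, arith_per_pipe=4, tex_per_pipe=1)
nv35 = dict(pipes=4, arith_per_pipe=2, tex_per_pipe=1)

for count_tex in (False, True):
    ati = chip_ops_per_clock(**r3xx, count_tex=count_tex)
    nv = chip_ops_per_clock(**nv35, count_tex=count_tex)
    label = "arith + tex" if count_tex else "arith only"
    print(f"{label}: {ati}:{nv}")
# arith only: 32:8
# arith + tex: 40:12
```

Counted consistently, neither reading gives 40:8.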

Luminescent · Oct 1, 2003

I believe they're counting the number of operations that could be executed by the pipeline's ALUs and address processors. If you look at Dave's R3xx pipeline diagram, there are exactly 4 ALUs and 1 address processor, matching ATI's claim of 5 ops/clock/pipeline.

Xmas · Oct 1, 2003

Luminescent said:
I believe they're counting the number of operations that could be issued/executed by the pipeline's ALUs and address processors. If you look at Dave's R3xx pipeline diagram, there are exactly 4 ALUs and 1 address processor, matching ATI's claim of 5 ops/clock/pipeline.

But that doesn't explain why they claim 40(24 without co-issue):8 instead of 32(16):8 or 40(24):12. Either they count tex ops, in which case it's 12 for NV35, or they don't count tex ops, in which case it's 32(16):8.
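The co-issue variants can be tallied the same way. Again a sketch under assumed figures (R3xx: 8 pipes, 4 arithmetic ops per pipe with vec/scalar co-issue or 2 without, plus 1 tex op; NV35: 4 pipes, 2 arithmetic ops plus 1 tex op). It shows one possible reading of the slide: the ATI side counted with tex ops, the NV35 side counted without them.

```python
# ASSUMED per-pipe figures, chosen to match the numbers quoted in this thread:
#   R3xx: 8 pipes, 4 arithmetic ops with vec/scalar co-issue (2 without), 1 tex op
#   NV35: 4 pipes, 2 arithmetic ops, 1 tex op
R3XX_PIPES, NV35_PIPES = 8, 4

def r3xx_ops(co_issue: bool, count_tex: bool) -> int:
    arith = 4 if co_issue else 2
    return R3XX_PIPES * (arith + (1 if count_tex else 0))

def nv35_ops(count_tex: bool) -> int:
    return NV35_PIPES * (2 + (1 if count_tex else 0))

def claim(count_tex: bool) -> str:
    """Format a ratio the way it is written above, e.g. '40(24):12'."""
    return f"{r3xx_ops(True, count_tex)}({r3xx_ops(False, count_tex)}):{nv35_ops(count_tex)}"

print("no tex ops:  ", claim(False))  # 32(16):8
print("with tex ops:", claim(True))   # 40(24):12
# Mixing the two conventions (ATI with tex ops, NV35 without) gives the slide's figure:
print("mixed:       ", f"{r3xx_ops(True, True)}({r3xx_ops(False, True)}):{nv35_ops(False)}")  # 40(24):8
```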

Luminescent · Oct 1, 2003

Xmas said:
Luminescent said:

I believe they're counting the number of operations that could be issued/executed by the pipeline's ALUs and address processors. If you look at Dave's R3xx pipeline diagram, there are exactly 4 ALUs and 1 address processor, matching ATI's claim of 5 ops/clock/pipeline.


But that doesn't explain why they claim 40(24 without co-issue):8 instead of 32(16):8 or 40(24):12. Either they count tex ops, in which case it's 12 for NV35, or they don't count tex ops, in which case it's 32(16):8.

This is how I see it (I know all of you are sharp, so pardon me if I'm oversimplifying):

ATI seems to multiply the number of ops possible per clock in one pipeline directly by the number of pipelines. Therefore 1 full vec ALU op + 1 mini vec ALU op + 1 full scalar ALU op + 1 mini scalar ALU op + 1 texture addressing op = 5 ops/clock/pipe x 8 pipelines = 40 ops/clock. At 412 MHz, this would give a total of 40 ops/clock x 4.12x10^8 Hz, or 16.48 Gops/s.
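The same arithmetic as a tiny script, assuming (as in the post) 5 ops/clock/pipe, 8 pipes and a 412 MHz core clock. Note that the clock has to be in Hz, not MHz, for the 16.48 G figure to come out.

```python
# Peak-rate arithmetic from the post. ASSUMED inputs: 5 ops/clock/pipe, 8 pipes, 412 MHz.
OPS_PER_PIPE = 5
PIPES = 8
CORE_CLOCK_HZ = 412e6  # 412 MHz = 4.12 x 10^8 Hz

ops_per_clock = OPS_PER_PIPE * PIPES
ops_per_second = ops_per_clock * CORE_CLOCK_HZ

print(ops_per_clock)                          # 40
print(f"{ops_per_second / 1e9:.2f} Gops/s")   # 16.48 Gops/s
```

This is a theoretical peak; it says nothing about how many of those slots a real shader keeps busy.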

Bambers · Oct 1, 2003

Xmas said:
But that doesn't explain why they claim 40(24 without co-issue):8 instead of 32(16):8 or 40(24):12. Either they count tex ops, in which case it's 12 for NV35, or they don't count tex ops, in which case it's 32(16):8.

Maybe they have some reason to believe that NV35 can only manage 8 FP32 ops/clock. Can the 2 mini ALUs in one of NV35's pipes each handle an FP32 instruction, or do they have to combine? In the thread Dave posted with the pipeline diagrams, it was mentioned that they have to combine for an FP32 MAD; what about other instructions?

Sxotty · Oct 1, 2003

If this stuff is at all true, then the NVIDIA drivers must be much better, since ATI seems to have more than 3x the power of the NV.

I don't believe what I just wrote, though; I think the waters are muddied somewhere.

991060 · Oct 1, 2003

Sxotty said:
If this stuff is at all true, then the NVIDIA drivers must be much better, since ATI seems to have more than 3x the power of the NV.

The FX runs at a higher clock than the Radeon, and not all games are 100% shader-limited, if any are.

GraphixViolence · Oct 1, 2003

Sxotty said:
If this stuff is at all true, then the NVIDIA drivers must be much better, since ATI seems to have more than 3x the power of the NV.

I don't believe what I just wrote, though; I think the waters are muddied somewhere.

That's true only for apps with near 100% shader-bound performance. And in the few of those that exist today, the 3x performance differential seems to hold true.

rwolf · Oct 1, 2003

The 5900 has 135 million transistors. Obviously some things are going to be implemented as well as or better than on ATI's hardware. I think ATI has a better hardware balance than Nvidia.

russo121 · Oct 1, 2003

rwolf said:
The 5900 has 135 million transistors.

...As winter is coming... just to make heat.

Doomtrooper · Oct 2, 2003

rwolf said:
The 5900 has 135 million transistors. Obviously some things are going to be implemented as well as or better than on ATI's hardware. I think ATI has a better hardware balance than Nvidia.

Since when does transistor count relate to performance? Even in the x86 world, the Athlon smacked the early P4s, which had more transistors.

A lot of those transistors might not even be used :!:

Sxotty · Oct 2, 2003

Yeah, I wasn't really serious; I was just pointing out the irony.

The more you slam the hardware of your competition, the better their driver team must be to make it even remotely competitive.

rwolf · Oct 2, 2003

I think most of the transistors are wasted on FP32.

Frank · Oct 2, 2003

The first slide is correct as far as we know, but it is the theoretical maximum. The second one is a bit biased.

The 9800 will almost certainly never execute the maximum number of instructions, but the FX could, as long as all the pixels in a quad are used and the instructions come in a texop, texop, FP-or-FX sequence (3*4 = 12).

All in all, the ATi won't perform all the ops possible, but it doesn't much mind what sequence they're in. While a shader-limited program (FX, FX, FX...) would only execute 2*4 = 8 ops per clock on the FX, it would run just as fast as any other program on an ATi. And the ATi will normally perform 2 or 3 ops per pipe (depending on the actual instructions), 4 in special cases, so that works out to about 2.5*8 = 20 operations for just about any real-life shader program.
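Frank's estimates can be put side by side in a few lines. The per-pipe numbers (3 ops per FX pipe in the best case, 2 when shader-limited, about 2.5 per Radeon pipe) and the 4 vs 8 pipe counts are taken from his post as assumptions; this is a clock-for-clock comparison and ignores the FX's higher clock.

```python
# Frank's per-clock estimates, taken as ASSUMPTIONS (clock for clock, no clock-speed scaling):
#   FX best case:      3 ops per pipe (texop, texop, FP/FX sequence) x 4 pipes
#   FX shader-limited: 2 ops per pipe (FX, FX, FX...) x 4 pipes
#   Radeon typical:    ~2.5 ops per pipe x 8 pipes, for most real shaders
FX_PIPES, RADEON_PIPES = 4, 8

fx_best = 3 * FX_PIPES                # 12
fx_shader_limited = 2 * FX_PIPES      # 8
radeon_typical = 2.5 * RADEON_PIPES   # 20.0

print(fx_best, fx_shader_limited, radeon_typical)          # 12 8 20.0
print(f"FX/Radeon = {fx_shader_limited / radeon_typical:.2f}")  # FX/Radeon = 0.40
```

Under these assumptions, a shader-limited FX gets about 0.4x the Radeon's per-clock throughput, which fits the "at most half as fast" remark that follows.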

Although the FX does some things (like SIN and COS) quite a bit faster than the Radeon, it can be at most half as fast when shader programs are used.

And with only texture ops (DX8), they're equally fast. The higher clock speed of the FX gives it an edge, but it is hampered by the fact that at the edges of triangles not all 4 pixels in a quad are used.
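To illustrate the quad point: if pixels are shaded in groups of four (a 2x2 quad), as the post describes for the FX, then on a triangle edge only some of those pixels are real and the work spent on the rest is discarded. A sketch, using Frank's peak of 12 ops/clock as an assumed starting figure; the coverage values are hypothetical.

```python
# Quad-utilization illustration. ASSUMPTIONS: shading happens in 2x2 quads, and the
# peak of 12 ops/clock is Frank's FX best-case figure. Coverage values are hypothetical.
QUAD_SIZE = 4

def useful_fraction(covered_pixels: int) -> float:
    """Fraction of a quad's work that lands on real (covered) pixels."""
    if not 1 <= covered_pixels <= QUAD_SIZE:
        raise ValueError("a shaded quad covers 1 to 4 pixels")
    return covered_pixels / QUAD_SIZE

def useful_ops(peak_ops_per_clock: float, covered_pixels: int) -> float:
    """Ops per clock that actually contribute to visible pixels."""
    return peak_ops_per_clock * useful_fraction(covered_pixels)

for covered in range(1, QUAD_SIZE + 1):
    print(covered, useful_ops(12, covered))
```

An interior quad keeps all 12 ops useful; an edge quad covering a single pixel wastes three quarters of them.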

rwolf · Oct 3, 2003

The shader compiler, however, could reorder the instructions so they use the maximum throughput of the pipeline.

T2k · Oct 3, 2003

rwolf said:
The 5900 has 135 million transistors. Obviously some things are going to be implemented as well as or better than on ATI's hardware.

Because of its higher transistor count? LOL...

KimB · Oct 3, 2003

It seems to me that "5 ops per clock" is merely referring to a pipeline that is capable of one tex, one MAD, and separate vec/scalar ops.

So, five ops per clock could be done with:
1 tex
1 scalar multiply
1 vector multiply
1 scalar add
1 vector add

I don't think that this is anything we didn't already know. Benchmarks elsewhere on this message board have shown that the R3xx can do a separate multiply and add just as fast as a MAD, and we also knew about the separation of scalar and vector ops.

And if this is what ATI is basing their performance comparison on, then it is severely flawed. Their description of nVidia's number of operations per clock is very different from the description they apply to ATI's hardware. Similarly, it doesn't take into account other functions. According to David Kirk, the NV3x can do a sin/cos in 2 cycles, while ATI takes 7-8. If this comparison is on a per-pipeline basis, then nVidia could do sin/cos functions in half the time on a per-clock basis.
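Scaling Kirk's cycle counts up to whole chips changes the picture a bit. A sketch assuming 2 cycles per sin/cos on NV3x and 7 to 8 on R3xx (as quoted above), with 4 pipes for NV35 and 8 for R3xx (the pipe counts implied elsewhere in this thread, not confirmed here):

```python
# Per-chip sin/cos throughput. ASSUMPTIONS: cycle counts as quoted from David Kirk
# (NV3x: 2 cycles, R3xx: 7-8 cycles) and pipe counts of 4 (NV35) vs 8 (R3xx).
def sincos_per_clock(pipes: int, cycles_per_op: int) -> float:
    """sin/cos results per clock for the whole chip, one per pipe every N cycles."""
    return pipes / cycles_per_op

nv3x = sincos_per_clock(pipes=4, cycles_per_op=2)  # 2.0 per clock
for ati_cycles in (7, 8):
    r3xx = sincos_per_clock(pipes=8, cycles_per_op=ati_cycles)
    print(f"R3xx at {ati_cycles} cycles: NV3x/R3xx = {nv3x / r3xx:.2f}")
# R3xx at 7 cycles: NV3x/R3xx = 1.75
# R3xx at 8 cycles: NV3x/R3xx = 2.00
```

Per pipe the gap is 3.5-4x, but with twice as many pipes on the R3xx it narrows to roughly 1.75-2x per clock for the whole chip.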

OpenGL guy · Oct 3, 2003

Chalnoth said:
And if this is what ATI is basing their performance comparison on, then it is severely flawed.

I'll just say your analysis and conclusion are severely flawed and leave it at that.
