Are the VS of the NV40 < VS of the R350 per clock ?

How come at 400 MHz, the 6800 Ultra, with 6 parallel Vertex Shaders, is rated at ~320 MVertices/s ( http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=18 [1 Vertex = 1 Triangle... optimized triangle strips] ) while the RADEON 9800XT, at 412 MHz, is rated at 412 MVertices/s ( http://www.beyond3d.com/reviews/ati/9800xt_r360/index.php?p=8 ) ?

These numbers are, I believe, for a simple Transform with Perspective projection ( no lighting ).

A very quick calculation suggests that each of the NV40's VS takes 7.5 cycles per Vertex: ( 400 MHz / 320 MVertices/s ) * 6.

The RADEON 9800XT, by contrast, can do 1 Vertex per cycle going by the theoretical numbers, which would mean each of its 4 VS takes 4 cycles for a basic Vertex Transform.

Have the VS gotten that much slower on nVIDIA cards ? The NV2A could process 1 Vertex every two cycles ( each of its twin VS can transform a Vertex [with Perspective Projection] in a minimum of 4 cycles ).

I apologize for the messiness of this post: I was talking on the phone while writing.
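
As a sanity check on the arithmetic above, here is a minimal Python sketch. The function name and its parameters are made up for illustration, and it assumes the quoted peak rates are reached with every Vertex Shader busy and with 1 Vertex = 1 Triangle:

```python
def cycles_per_vertex_per_unit(clock_mhz, rate_mverts_per_s, units):
    """Cycles a single VS unit spends on one vertex, assuming all units
    work in parallel and the chip sustains the quoted peak rate."""
    chip_cycles_per_vertex = clock_mhz / rate_mverts_per_s
    return chip_cycles_per_vertex * units

# Figures quoted in this thread (clock in MHz, rate in MVertices/s, VS units)
nv40 = cycles_per_vertex_per_unit(400, 320, 6)   # 6800 Ultra
r360 = cycles_per_vertex_per_unit(412, 412, 4)   # RADEON 9800XT

print(nv40, r360)  # 7.5 4.0
```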
 
Code:
Benchmarks
Theoretical Performance

At a quoted 16 pipeline, GeForce 6800 Ultra should have a very large theoretical fill-rate. Here are a few theoretical numbers for 6800 Ultra in relation to FX 5950 Ultra:

Board        Core Clock (MHz)  Fill-rate (Mp/s)  Texture Fill-rate (Mt/s)  Triangle (Mtris/s)  Memory Clock (MHz)  Memory Bandwidth (GB/s)
6800 Ultra   400               6400              6400                      320                 550                 35.2
5950 Ultra   475               1900              3800                      190                 475                 30.4
5900 Ultra   450               1800              3600                      180                 425                 27.2
5800 Ultra   500               2000              4000                      200                 500                 16.0

This is from Beyond3D's review.
 
Vertices vs tris. I think the tri number is limited by triangle setup and transform throughput. In theory, an uber simple transform could take 4 cycles, and at 2.4 gigaops/s (6 shaders * 400 MHz), that yields 600 Mverts/s peak. But I bet, even using geometry instancing, it's probably a lot lower in the real world.
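
The peak estimate above can be reproduced with a small Python sketch. The helper name is hypothetical, and the 4-cycle transform is the assumption stated in the post, not a measured value:

```python
def peak_mverts_per_s(units, clock_mhz, cycles_per_vertex):
    """Theoretical peak vertex rate in MVertices/s, ignoring triangle
    setup and any other limit outside the vertex shaders."""
    return units * clock_mhz / cycles_per_vertex

nv40_peak = peak_mverts_per_s(6, 400, 4)   # 6 shaders at 400 MHz
r360_peak = peak_mverts_per_s(4, 412, 4)   # 4 shaders at 412 MHz

print(nv40_peak, r360_peak)  # 600.0 412.0
```

Under the same 4-cycle assumption the RADEON 9800XT lands exactly on its quoted 412 figure, while the 6800 Ultra's quoted 320 sits well below the 600 computed here.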
 
DemoCoder said:
Vertices vs tris. I think the tri number is limited by triangle setup and transform throughput. In theory, an uber simple transform could take 4 cycles, and at 2.4 gigaops/s (6 shaders * 400 MHz), that yields 600 Mverts/s peak. But I bet, even using geometry instancing, it's probably a lot lower in the real world.

DemoCoder, so the ATI chips have THAT much better a Triangle Set-up engine ?

I figured that these numbers assumed 1 Triangle = 1 Vertex; B3D's figures were in Triangles.

412 MTriangles/s at 412 MHz for the RADEON 9800XT, that has 4 VS units, is an indication to me that they are making that assumption while giving out those figures.

There seems to be some discrepancy in the two reviews.

How do you explain the RADEON 9800XT's 412 MTriangles/s figure ( quoted as the peak figure for T&L by B3D's review ) ?
 
It could be a different issue. Nvidia used to do some extra work in the pipeline by appending instructions to vertex shaders. This was explicitly exposed in the Xbox SDK. If this is still the case, the effect is far more pronounced on trivial shaders than on complex ones, because it is a constant cost that ends up being a significant portion of a trivial transform.
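
To illustrate why a constant cost hurts trivial shaders most, here is a rough Python sketch. All cycle counts below are hypothetical placeholders, not figures from the Xbox SDK or from Nvidia:

```python
def overhead_share(shader_cycles, appended_cycles):
    """Fraction of total per-vertex time spent on the appended
    (constant-cost) instructions."""
    return appended_cycles / (shader_cycles + appended_cycles)

APPENDED = 3  # hypothetical constant cost, in cycles

trivial_share = overhead_share(4, APPENDED)    # 4-cycle basic transform
complex_share = overhead_share(40, APPENDED)   # hypothetical long shader

print(f"trivial: {trivial_share:.0%}, complex: {complex_share:.0%}")
```

With these made-up numbers the constant cost is roughly 43% of the trivial transform but only about 7% of the long shader.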
 
Well, then that should be stated in the review.

To the casual reader it looks like, in Triangle/Vertex processing, the ATI RADEON 9800 is far better than the 6800 Ultra and that the efficiency of the NV40's VS is to blame ( 6 VS against 4 VS at the same clock and they would still lose by 80 MTriangles/s ).

Beyond3D is not supposed to be misleading, as it is run by people with lots of experience in the 3D industry and with several connections to actual Game Programmers and GPU architects.

If I bought a McLaren F1 powered by the nice V12 BMW engine, I would be far more critical of any of its non fatal flaws than if I bought a Honda Civic with the same non fatal flaws.

Beyond3D has a reputation to keep and that is why I am writing here, right now.

ERP, your contribution is appreciated regarding the Xbox SDK info and the NV2A.
 
DemoCoder said:
These figures are peak figures. Hold your conclusion until you see a raw geometry benchmark.

Are you saying that those peak figures are meaningless ?

Fine, then they should not bother to post them.

If they do post them, because they see some purpose in it ( I do too ), they should try to catch these "issues".

What is the meaning of the RADEON 9800 at 400 MHz peaking at 400 MTriangles/s with 4 Vertex Shaders if the 6800 Ultra at 400 MHz peaks at 320 MTriangles/s with 6 Vertex Shaders ?

Why did the review not try to address the issue at all ?

In not much more than a few hours I got an answer, from ERP, which sounds plausible.

Would this info have made or broken the review ? Of course not, but it would still have been interesting to see.

It would have helped the comparison of the two chips from a practical point of view ( real-world benchmarks ) and from a theoretical/architectural point of view ( synthetic benchmarks and theoretical/Hardware Vendor given peak performance figures ).

P.S.: I apologize if I seem a bit too aggressive.
 
From what I remember, the ATI R3xx series totally destroyed the NV3x in terms of vertices per clock cycle. The only way Nvidia was able to match ATI's vertex performance was to clock the NV3x series very high.

R3xx had 4 Vertex Shaders.

NV3x did not have Vertex Shaders but a "sea of FP units" or something like that. It worked out to be like having the equivalent of 3 Vertex Shaders in NV3x.

Now NV4x has 6 Vertex Shaders, the equivalent of 2x what the NV3x had, but only 50% more than what R3xx had.

The R420 is meant to have 6 (or was it 8 ) Vertex Shaders. I expect ATI to again destroy Nvidia in the area of verts per second. Even assuming Nvidia has caught up to ATI in verts per clock cycle, ATI should have the raw clock advantage this time (500~600 MHz). So I am expecting ATI to beat NV40's 600 Mverts/sec with 700~900 Mverts/sec with R420. If I'm wrong, hey then I'm wrong. I'll eat my words. NV4x still seems VERY impressive. Can't wait to learn full details on R420. :D

I fully expect the Xbox 2 graphics processor, which should be a decent leap beyond R420, to push well over 1 billion verts/sec, maybe in the 1.5 billion verts/sec range.
 
R420 is supposed to have 6 VS units, like the 6800U, but of course at higher clocks. Apparently the 6800U does not have fixed function T&L components, but uses the VS for this purpose.
 
Panajev2001a said:
DemoCoder said:
These figures are peak figures. Hold your conclusion until you see a raw geometry benchmark.

Are you saying that those peak figures are meaningless ?

All peak figures are fairly meaningless, which is why in the theoretical test area we list the theoretical specs and then test those against theoretical specs.

Why did the review not try to address the issue at all ?

Read the reviews. In both cases we then run through the numbers on the theoretical 3DMark tests. You can derive for yourself where things should be.

As for the theoretical numbers, we were told that NV40 has 2x the rate of NV30/5/8 on a per clock basis (due to 2x the number of units), so the figures were derived from that data point, as we had no more specific information to go on.
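
That derivation can be checked against the table posted earlier in the thread. This is a minimal Python sketch with variable names of my own choosing. It only assumes the "2x per clock" data point described above:

```python
# (core clock in MHz, quoted triangle rate in Mtris/s) from the table in this thread
nv3x = {
    "5950 Ultra": (475, 190),
    "5900 Ultra": (450, 180),
    "5800 Ultra": (500, 200),
}

# Triangles per clock for each NV3x board
per_clock = {name: tris / clock for name, (clock, tris) in nv3x.items()}

# NV40 estimate: twice the NV3x per-clock rate at the 6800 Ultra's 400 MHz
nv40_estimate = round(2 * per_clock["5950 Ultra"] * 400, 1)

print(per_clock)
print(nv40_estimate)  # 320.0
```

Every NV3x board in the table works out to 0.4 triangles per clock, so doubling that at 400 MHz reproduces the quoted 320 figure.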
 