7800 vertex setup rate?

MBDF

I was wondering if anyone knows the peak vertex setup rate of the 7800 series... is it one vertex per clock?
 
Well, the GTX has 8 vertex units, so my first guess would be 8 per clock (7 units for the 7800GT and 6 units for the 7800GS).
That said, a synthetic vertex shading benchmark would be more useful.
 
Xenos is one.

On a sidenote, it takes 4 vertex ALU*clocks to complete a transform, so those with 8 vertex pipes have a transform rate that matches setup with two vertices per clock.
 
G70 -> 430MHz (June 21) -> 860 million verts/sec
G70 GTX512 -> 550MHz (some core modifications to lower wattage, still at 110nm) -> 1100 million verts/sec.
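
The figures above are just a per-clock rate multiplied by the core clock. A rough Python sketch of that arithmetic, assuming the two-vertices-per-clock figure from the previous post (8 vertex units, 4 ALU clocks per transform). The function names are made up for illustration, and these are theoretical ceilings, not measurements:

```python
def transform_rate_per_clock(vertex_units, alu_clocks_per_transform):
    """Vertices transformed per clock if every vertex unit stays busy."""
    return vertex_units / alu_clocks_per_transform


def peak_mverts_per_sec(verts_per_clock, core_mhz):
    """Peak rate in millions of vertices per second."""
    return verts_per_clock * core_mhz


# 8 units / 4 clocks per transform = 2 vertices per clock (assumed peak).
rate = transform_rate_per_clock(8, 4)
for name, mhz in [("G70", 430), ("G70 GTX512", 550)]:
    print(f"{name}: {peak_mverts_per_sec(rate, mhz):.0f} Mverts/sec")
```

If the real limit is one vertex every two clocks, as suggested further down the thread, the same formula with 0.5 gives 215 Mverts/sec at 430MHz.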
 
I did some tests and was never able to get more than one vertex every two vertex engine clocks (the vertex engine runs at a slightly higher clock rate than the core clock).
 
picosec said:
I did some tests and was never able to get more than one vertex every two vertex engine clocks (the vertex engine runs at a slightly higher clock rate than the core clock).

You have to disable everything in the render state to hit the theoretical max.
That includes texturing, vertex color and Z, I believe; the more iterators required, the longer setup takes.
Circa NV2A you got 2 iterators and every additional one cost extra.

Simple vertex shaders with multiple texture layers could actually be setup limited on NV2A.
 
Has anybody actually benchmarked this?
I see lots of people say 2 per clock but has anybody done a triangle setup limited test?

The reason is picosec's data matches my understanding, that it's 1/2 per clock (or 2 clocks per tri).

But I've never got round to writing a true benchmark, so I'm not sure.
 
DeanoC said:
Has anybody actually benchmarked this?
I see lots of people say 2 per clock but has anybody done a triangle setup limited test?

The reason is picosec's data matches my understanding, that it's 1/2 per clock (or 2 clocks per tri).

But I've never got round to writing a true benchmark, so I'm not sure.


Probably not the right place to ask if someone has tested it, but it's certainly very valid. As is the post above this (ERP).

I'm surprised that there are no synthetic tests out there that test this (are there?), but I suppose that is asking more from people's understanding than I really should.

:)
 
I always find digit-life to have the best synthetic tests.

Looks like G70 can do nearly one per clock when you have only ambient lighting, and drops to around half per clock with a single iterator (diffuse). Not sure what kind of mesh they're using, and how well optimized it is, but presumably setup rate is the limiting factor given that the cards have 8 vertex shaders.
 
DeanoC said:
Has anybody actually benchmarked this?
I see lots of people say 2 per clock but has anybody done a triangle setup limited test?

The reason is picosec's data matches my understanding, that it's 1/2 per clock (or 2 clocks per tri).

But I've never got round to writing a true benchmark, so I'm not sure.

This guy (or rather his 7800GTX) reached 262M triangles/s. Geometry::plain::Fan is the number you're looking for. I trust that benchmark. I've written it :D

I don't know if/how the card was overclocked though. From the fillrate numbers it looks like the card was running at 480MHz (ROP throughput and texture sampling performance indicate that).

edit: Here's another 7800GTX, this time it's 233M triangles/s. Appears to have run at ~430MHz.
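
To line these results up with the per-clock guesses earlier in the thread, divide the measured rate by the clock. A quick Python sketch; the clocks are the estimates given above (inferred from fillrate), not confirmed values, and the helper name is made up:

```python
def tris_per_clock(mtris_per_sec, core_mhz):
    """Measured triangles per core clock (both inputs are in millions)."""
    return mtris_per_sec / core_mhz


# (measured Mtris/s, estimated core clock in MHz) for the two results above.
results = [(262, 480), (233, 430)]
for mtris, mhz in results:
    per_clock = tris_per_clock(mtris, mhz)
    print(f"{mtris} Mtris/s at {mhz} MHz -> {per_clock:.2f} triangles per clock")
```

Both come out a little above half a triangle per clock. In a fan each additional triangle needs only one new vertex, so triangles per clock and vertices per clock are nearly the same number here.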
 
ERP said:
You have to disable everything in the render state to hit the theoretical max.
That includes texturing, vertex color and Z, I believe; the more iterators required, the longer setup takes.
Circa NV2A you got 2 iterators and every additional one cost extra.

Simple vertex shaders with multiple texture layers could actually be setup limited on NV2A.

If I recall correctly, I was using a single vertex color (so one iterator). The card was clocked at 450 MHz and I was seeing 230-240 Mverts/sec.
 
picosec said:
If I recall correctly, I was using a single vertex color (so one iterator). The card was clocked at 450 MHz and I was seeing 230-240 Mverts/sec.

Actually you're using at least 2, since position requires an iterator ;)
You may actually have to turn off Z in order to get maximum throughput.
 
ERP said:
Actually you're using at least 2, since position requires an iterator ;)
You may actually have to turn off Z in order to get maximum throughput.


So my question is: theoretically, with enough vertex pipes tacked on to G70, is it limited by its max vertex setup rate?

Have we come to any conclusion as to what the max setup rate is? It seems to come down to either roughly 230 Mverts/sec or roughly 450 Mverts/sec for the 430 MHz version.

Or is it that in a real-world scenario anything beyond roughly 230 Mverts/sec is unattainable, but reachable with sufficient vertex processing?
 
MBDF said:
So my question is: theoretically, with enough vertex pipes tacked on to G70, is it limited by its max vertex setup rate?

Have we come to any conclusion as to what the max setup rate is? It seems to come down to either roughly 230 Mverts/sec or roughly 450 Mverts/sec for the 430 MHz version.

Or is it that in a real-world scenario anything beyond roughly 230 Mverts/sec is unattainable, but reachable with sufficient vertex processing?

Realistically it's pretty unlikely except in extreme cases.
If you were doing a Z pass for static geometry and the triangles were all really small you might.
If you had a vertex shader that consisted of a simple transform and a bunch of copied texture coordinates, and again they were all very small tris, you might.

If it works like NV2A then it's more likely if you use tris instead of strips, since setup is more expensive.

Peak setup rate is a nice number for manufacturers to quote, but it's not a useful measure of any real performance.
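
One way to picture why setup rarely ends up as the limit is a crude "slowest stage wins" model. This is only a sketch in Python under loud assumptions: perfectly parallel units, roughly one new vertex per triangle, and illustrative costs (4 clocks per transform and 2 clocks per triangle of setup are the figures floated in this thread; the pixel-side numbers are invented for the example):

```python
def limiting_stage(vs_clocks_per_vert, vertex_units, setup_clocks_per_tri,
                   pixels_per_tri, clocks_per_pixel, pixel_pipes):
    """Return (stage name, clocks per triangle) for the slowest stage.

    Assumes about one new vertex per triangle (long strips or a
    well-indexed mesh) and ideal load balancing across units.
    """
    costs = {
        "vertex shading": vs_clocks_per_vert / vertex_units,
        "setup": setup_clocks_per_tri,
        "pixel shading": pixels_per_tri * clocks_per_pixel / pixel_pipes,
    }
    stage = max(costs, key=costs.get)
    return stage, costs[stage]


# Tiny triangles with trivial shading (e.g. a Z pass): setup is the limit.
print(limiting_stage(4, 8, 2, 4, 1, 24))
# Larger, more heavily shaded triangles: the pixel shader is the limit.
print(limiting_stage(4, 8, 2, 200, 4, 24))
```

In this toy model setup only wins when triangles cover a handful of pixels and the vertex shader is close to a bare transform, which is the extreme case described above.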
 
ERP said:
Peak setup rate is a nice number for manufacturers to quote, but it's not a useful measure of any real performance.

How close is G70 to reaching this number for clipped and culled polygons? IMO this is very important because for visible polygons you're likely to be bottlenecked by the pixel shader anyway, except for shadow maps or stencil passes.

Culled and clipped polygons tend to come in bunches, so the FIFO feeding the PS often gets emptied resulting in idling the most expensive part of the chip. Aside from vertex texturing, I'd say this is the biggest reason for having a unified shader architecture: blasting through hidden primitives.
 
Mintmaster said:
How close is G70 to reaching this number for clipped and culled polygons? IMO this is very important because for visible polygons you're likely to be bottlenecked by the pixel shader anyway, except for shadow maps or stencil passes.

Culled and clipped polygons tend to come in bunches, so the FIFO feeding the PS often gets emptied resulting in idling the most expensive part of the chip. Aside from vertex texturing, I'd say this is the biggest reason for having a unified shader architecture: blasting through hidden primitives.

Don't know on G70. Clipped, culled and degenerate polys were special-cased in earlier NVidia hardware and had a fixed cost of 2 clocks. Since this was the minimum time a useful vertex shader would take, it was never the bottleneck.

No idea on G70, but it's a trivial test if someone wants to write it: just keep submitting the same one offscreen vert as the index in a strip, with a trivial one-instruction shader.

NVidia hardware (at least older NVidia hardware) attaches extra instructions to the end of vertex shaders, so you have to be careful not to become vertex shader bound in the test.
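
The data side of that test is easy to sketch. The Python below only builds the index list for such a strip and converts a timing into clocks per primitive; it does not touch any graphics API, and the function names are hypothetical. Actually submitting the strip and timing it on the GPU is left out:

```python
def degenerate_strip_indices(num_triangles, index=0):
    """Index list for a triangle strip that repeats a single vertex index.

    A strip of n indices describes n - 2 triangles, so
    num_triangles + 2 entries are needed.
    """
    return [index] * (num_triangles + 2)


def strip_triangles(indices):
    """Expand a strip into its (i0, i1, i2) triangles (winding ignored)."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def is_degenerate(tri):
    """A triangle is degenerate if any two of its indices coincide."""
    return len(set(tri)) < 3


def clocks_per_primitive(elapsed_seconds, core_mhz, num_triangles):
    """Average core clocks spent per submitted triangle."""
    return elapsed_seconds * core_mhz * 1_000_000 / num_triangles


indices = degenerate_strip_indices(1000)
tris = strip_triangles(indices)
print(len(indices), len(tris), all(is_degenerate(t) for t in tris))
```

If the measured figure from `clocks_per_primitive` came out near 2, that would match the fixed cost quoted for earlier hardware; a much larger figure would suggest the test had become vertex shader bound instead.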
 