7800 vertex setup rate?

MBDF · Feb 5, 2006

I was wondering if anyone knows the peak vertex setup rate of the 7800 series... is it one vertex per clock?

Blazkowicz · Feb 5, 2006

Well, the GTX has 8 vertex units, so my first guess would be 8 per clock (7 units for the 7800GT and 6 for the 7800GS).
Then again, a synthetic vertex shading benchmark would be more useful.

DarkRage · Feb 5, 2006

MBDF said:
I was wondering if anyone knows the peak vertex setup rate of the 7800 series... is it one vertex per clock?

AFAIK, 2 per clock

Blazkowicz · Feb 5, 2006

My bad, you asked about vertex setup rate. Didn't we use to speak of triangle setup?

Guilty Bystander · Feb 5, 2006

DarkRage said:
AFAIK, 2 per clock

He's right, but then again I think every high-end GPU does 2 per clock, and that includes Xenos, the X1800/X1900 and the GeForce 7900 series.

TurnDragoZeroV2G · Feb 5, 2006

Xenos is one.

On a side note, it takes 4 vertex ALU clocks to complete a transform, so chips with 8 vertex pipes have a transform rate of two vertices per clock, which matches setup.
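The arithmetic behind that can be sketched in a few lines of Python. The function name and parameters are made up for illustration; the only inputs taken from the post are 8 vertex pipes and 4 ALU clocks per transform.

```python
def transforms_per_clock(vertex_pipes: int, clocks_per_transform: int) -> float:
    """Peak vertices transformed per clock, assuming every pipe stays busy."""
    return vertex_pipes / clocks_per_transform

# Figures quoted in the thread: 8 vertex pipes, 4 ALU clocks per transform.
gtx_rate = transforms_per_clock(8, 4)
print(gtx_rate)  # 2.0
```

By the same reasoning a 6-pipe part would transform 1.5 vertices per clock, assuming the 4-clock transform cost still holds.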

superguy · Feb 6, 2006

One beeeelion vertices.

Heinrich4 · Feb 6, 2006

G70 -> 430MHz (June 21) -> 860 million verts/sec
G70 GTX512 -> 550MHz (some core modifications to lower wattage, still at 110nm) -> 1100 million verts/sec
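Those figures are just core clock times an assumed 2 vertices per clock, a rate that other posts in this thread question. A minimal sketch of the calculation (the function name is hypothetical):

```python
def peak_vertex_rate_mverts(core_mhz: float, verts_per_clock: float = 2.0) -> float:
    """Theoretical peak in millions of vertices per second.

    verts_per_clock defaults to 2.0, which is the assumption behind the
    numbers above, not a measured value.
    """
    return core_mhz * verts_per_clock

print(peak_vertex_rate_mverts(430))  # 860.0
print(peak_vertex_rate_mverts(550))  # 1100.0
```

Passing `verts_per_clock=0.5` instead gives 215 for the 430MHz part, which shows how much the answer depends on that one assumption.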

picosec · Feb 6, 2006

I did some tests and was never able to get more than one vertex every two vertex engine clocks (the vertex engine runs at a slightly higher clock rate than the core clock).

ERP · Feb 7, 2006

picosec said:
I did some tests and was never able to get more than one vertex every two vertex engine clocks (the vertex engine runs at a slightly higher clock rate than the core clock).

You have to disable everything in the render state to hit the theoretical max.
That includes texturing, vertex color and Z, I believe; the more iterators required, the longer setup takes.
Circa NV2A you got 2 iterators, and every additional one cost extra.

Simple vertex shaders with multiple texture layers could actually be setup limited on NV2A.

DeanoC · Feb 7, 2006

Has anybody actually benchmarked this?
I see lots of people say 2 per clock but has anybody done a triangle setup limited test?

The reason I ask is that picosec's data matches my understanding, that it's 1/2 per clock (or 2 clocks per tri).

But I've never got round to writing a true benchmark, so I'm not sure.

AndrewM · Feb 7, 2006

DeanoC said:
Has anybody actually benchmarked this?
I see lots of people say 2 per clock but has anybody done a triangle setup limited test?

The reason I ask is that picosec's data matches my understanding, that it's 1/2 per clock (or 2 clocks per tri).

But I've never got round to writing a true benchmark, so I'm not sure.

Probably not the right place to ask if someone has tested it, but it's certainly very valid. As is the post above this (ERP).

I'm surprised that there are no synthetic tests out there that test this (are there?), but I suppose that is asking more of people's understanding than I really should.

Mintmaster · Feb 8, 2006

I always find digit-life to have the best synthetic tests.

Looks like G70 can do nearly one per clock when you have only ambient lighting, and drops to around half per clock with a single iterator (diffuse). Not sure what kind of mesh they're using, and how well optimized it is, but presumably setup rate is the limiting factor given that the cards have 8 vertex shaders.

Rolf N · Feb 12, 2006

DeanoC said:
Has anybody actually benchmarked this?
I see lots of people say 2 per clock but has anybody done a triangle setup limited test?

The reason I ask is that picosec's data matches my understanding, that it's 1/2 per clock (or 2 clocks per tri).

But I've never got round to writing a true benchmark, so I'm not sure.

This guy (or rather his 7800GTX) reached 262M triangles/s. Geometry:

lain::Fan is the number you're looking for. I trust that benchmark. I've written it.

I don't know if/how the card was overclocked though. From the fillrate numbers it looks like the card was running at 480MHz (ROP throughput and texture sampling performance indicate that).

edit: Here's another 7800GTX, this time it's 233M triangles/s. Appears to have run at ~430MHz.
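To put those two results in per-clock terms, here is a quick Python sketch. The function name is made up, and the clock speeds are the estimates given above, not confirmed values.

```python
def tris_per_clock(mtris_per_sec: float, core_mhz: float) -> float:
    """Triangles per core clock: (millions of tris/s) / (millions of clocks/s)."""
    return mtris_per_sec / core_mhz

# (measured Mtris/s, estimated core MHz) for the two 7800GTX results above.
# The clocks are estimates inferred from fillrate numbers.
results = [(262, 480), (233, 430)]
for mtris, mhz in results:
    print(f"{mtris}M tris/s at ~{mhz}MHz -> {tris_per_clock(mtris, mhz):.2f} per clock")
```

Both come out at a little over half a triangle per clock (about 0.55 and 0.54), provided the clock estimates are right.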

picosec · Feb 12, 2006

ERP said:
You have to disable everything in the render state to hit the theoretical max.
That includes texturing, vertex color and Z, I believe; the more iterators required, the longer setup takes.
Circa NV2A you got 2 iterators, and every additional one cost extra.

Simple vertex shaders with multiple texture layers could actually be setup limited on NV2A.

If I recall correctly, I was using a single vertex color (so one iterator). The card was clocked at 450 MHz and I was seeing 230-240 Mverts/s.

ERP · Feb 12, 2006

picosec said:
If I recall correctly, I was using a single vertex color (so one iterator). The card was clocked at 450 MHz and I was seeing 230-240 Mverts/s.

Actually you're using at least 2, since position requires an iterator.

You may actually have to turn off Z in order to get maximum throughput.

MBDF · Feb 12, 2006

ERP said:
Actually you're using at least 2, since position requires an iterator.
You may actually have to turn off Z in order to get maximum throughput.

So my question is: theoretically, with enough vertex pipes tacked on to G70, is it limited by its max vertex setup rate?

Have we come to any conclusion as to what the max setup rate is? It seems to come down to either roughly 230 Mverts/s or roughly 450 Mverts/s for the 430 MHz version.

Or is it that in a real-world scenario anything beyond roughly 230 Mverts/s is unattainable, but reachable with sufficient vertex processing?

ERP · Feb 12, 2006

MBDF said:
So my question is: theoretically, with enough vertex pipes tacked on to G70, is it limited by its max vertex setup rate?

Have we come to any conclusion as to what the max setup rate is? It seems to come down to either roughly 230 Mverts/s or roughly 450 Mverts/s for the 430 MHz version.

Or is it that in a real-world scenario anything beyond roughly 230 Mverts/s is unattainable, but reachable with sufficient vertex processing?

Realistically it's pretty unlikely except in extreme cases.
If you were doing a Z pass for static geometry and the triangles were all really small you might.
If you had a vertex shader that consisted of a simple transform and a bunch of copied texture coordinates, and again the tris were all very small, you might.

If it works like NV2A then it's more likely if you use tris instead of strips, since setup is more expensive.

Peak setup rate is a nice number for manufacturers to quote, but it's not a useful measure of any real performance.

Mintmaster · Feb 13, 2006

ERP said:
Peak setup rate is a nice number for manufacturers to quote, but it's not a useful measure of any real performance.

How close is G70 to reaching this number for clipped and culled polygons? IMO this is very important because for visible polygons you're likely to be bottlenecked by the pixel shader anyway, except for shadow maps or stencil passes.

Culled and clipped polygons tend to come in bunches, so the FIFO feeding the PS often gets emptied resulting in idling the most expensive part of the chip. Aside from vertex texturing, I'd say this is the biggest reason for having a unified shader architecture: blasting through hidden primitives.

ERP · Feb 14, 2006

Mintmaster said:
How close is G70 to reaching this number for clipped and culled polygons? IMO this is very important because for visible polygons you're likely to be bottlenecked by the pixel shader anyway, except for shadow maps or stencil passes.

Culled and clipped polygons tend to come in bunches, so the FIFO feeding the PS often gets emptied resulting in idling the most expensive part of the chip. Aside from vertex texturing, I'd say this is the biggest reason for having a unified shader architecture: blasting through hidden primitives.

Don't know on G70. Clipped, culled and degenerate polys were special-cased in earlier NVidia hardware and had a fixed cost of 2 clocks. Since this was the minimum time a useful vertex shader would take, it was never the bottleneck.

No idea on G70, but it's a trivial test if someone wants to write it: just keep submitting the same 1 offscreen vert as the index in a strip, with a trivial 1-instruction shader.
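The data side of such a test could look something like the Python sketch below. It only builds the index buffer and checks that every triangle is degenerate. The actual draw calls, shader and timing need a graphics API and are not shown. All names here, and the off-screen position, are assumptions for illustration.

```python
from array import array

# Hypothetical clip-space position outside the view volume (x > w), so the
# vertex is off-screen regardless of the viewport.
OFFSCREEN_VERTEX = (2.0, 0.0, 0.0, 1.0)

def degenerate_strip_indices(num_triangles: int, vertex_index: int = 0) -> array:
    """Index buffer for a triangle strip that references one vertex only.

    A strip of n indices describes n - 2 triangles, so num_triangles + 2
    copies of the same index are needed.
    """
    if num_triangles < 1:
        raise ValueError("num_triangles must be at least 1")
    return array("H", [vertex_index] * (num_triangles + 2))  # 16-bit indices

def strip_triangles(indices):
    """Expand a strip into (i0, i1, i2) triples, ignoring winding order."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]

indices = degenerate_strip_indices(1000)
triangles = strip_triangles(indices)
print(len(indices), len(triangles))  # 1002 1000
```

Every triple is `(0, 0, 0)`, so each triangle is both degenerate and off-screen, which is the case the test is meant to exercise.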

NVidia hardware (at least older NVidia hardware) attaches extra instructions to the end of vertex shaders, so you have to be careful not to become vertex shader bound in the test.
