R300 - Final Specs?

400M vertices/s * ~40 bytes/vertex = 16 GB/s. The only way it will ever approach 400M vertices/s is if you are redrawing the same vertices over and over again from the vertex cache. Even if we assume 24-byte vertices, you've got a bandwidth problem.
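The arithmetic is easy to sanity-check with a back-of-envelope sketch (the vertex sizes are the assumptions from the post above, not measured figures):

```python
# Back-of-envelope check: memory bandwidth needed just to fetch vertices
# at the claimed rate. Vertex sizes are assumptions: ~40 bytes (position,
# normal, a couple of texture coordinate sets) vs. a minimal 24 bytes.
VERTS_PER_SEC = 400e6

for bytes_per_vertex in (40, 24):
    gb_per_sec = VERTS_PER_SEC * bytes_per_vertex / 1e9
    print(f"{bytes_per_vertex}-byte vertices: {gb_per_sec:.1f} GB/s")

# 40-byte vertices alone need 16 GB/s of fetch bandwidth, before any
# texture or framebuffer traffic is counted.
```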


400M is probably the theoretical max if you take the minimum vertex transform and compute how many can be done per clock times the clock rate. It doesn't take into account triangle setup, bus bandwidth, etc.

Someone needs to release a reasonably complicated vertex and pixel shader benchmark, like SPEC CPU or Dhrystone, and let cards bench against it: e.g. "we get 13,000 VertexStones and 9,000 PixelStones."

The raw performance of the minimal vertex shader is a poor yardstick for gauging real performance.
 
The heatsink is wider because it cannot go thicker: the AGP slot on a motherboard lies parallel to the PCI slots. Unlike present-day CPUs, which have plenty of vertical space for a tall heatsink, add-in cards sit too close to one another, so instead of going higher the heatsinks get wider.
 
PC-Engine said:
The heatsink is wider because it cannot go thicker: the AGP slot on a motherboard lies parallel to the PCI slots. Unlike present-day CPUs, which have plenty of vertical space for a tall heatsink, add-in cards sit too close to one another, so instead of going higher the heatsinks get wider.

That's it ... I bet. Thanks. Still, the chip itself must be a monster-sized chip, comparatively speaking of course.
 
Well it consumes about 25W I believe, so dissipation is going to be very high. I'd expect some nice overclocks with the external power feed and scary-ass cooling.

MuFu.
 
"So the triangle count is somewhere in between 200-300 million(WOW!) where the geforce 4 ti is about 125 million. Right?"

As far as I know, every "poly/triangle" performance figure you see online is really quoted in vertices, so the 125 million number would be directly comparable to the 400 million number.
 
With a large and good enough vertex cache, and a well-optimized mesh, you could even approach 0.5 vertices/triangle. But I doubt that the triangle setup would handle 800 Mtri/s. It might not even handle 400 Mtri/s.

I agree with DemoCoder that it is likely a theoretical max under the assumptions he states. But fast VS implementations are not likely designed with that case in mind; the reason to increase VS throughput is to make really complex vertex programs fast.
 
hughJ said:
"So the triangle count is somewhere in between 200-300 million(WOW!) where the geforce 4 ti is about 125 million. Right?"

As far as I know, every "poly/triangle" performance figure you see online is really quoted in vertices, so the 125 million number would be directly comparable to the 400 million number.

Err, I think the 125 million number is the theoretical limit of the GeForce4 Ti in vertices. According to this, that number is attached to the GeForce4 Ti 4400; the GeForce4 Ti 4600 does 136 million vertices. So you're saying the R300 does roughly the same number of vertices per second? Hrm, that doesn't sound right.

http://www.nvidia.com/view.asp?PAGE=geforce4ti

GeForce4 Ti 4600
Vertices per Second: 136 Million
Fill Rate: 4.8 Billion AA Samples/Sec.
Operations per Second: 1.23 Trillion
Memory Bandwidth: 10.4 GB/Sec.
Maximum Memory: 128MB

GeForce4 Ti 4400
Vertices per Second: 125 Million
Fill Rate: 4.4 Billion AA Samples/Sec.
Operations per Second: 1.12 Trillion
Memory Bandwidth: 8.8 GB/Sec.
Maximum Memory: 128MB

GeForce4 Ti 4200
Vertices per Second: 113 Million
Fill Rate: 4 Billion AA Samples/Sec.
Operations per Second: 1.03 Trillion
Memory Bandwidth: up to 8 GB/Sec.
Maximum Memory: 128MB
 
You can easily inflate transformation numbers: multiply them by three. Each vertex goes through three spaces, so count the vertex transformation rate once for each space. Nobody said they would only count it once for all three spaces. Just a thought.
 
In a triangle mesh with "correct" topology, there are 6 triangles at every vertex (draw out a quad mesh and look at one of the internal vertices).

Since each tri requires 3 verts, and each vert is shared by 6 tris, that's 3/6 = 0.5 verts per tri.

The only topologies that actually achieve 0.5 verts per tri are spheres (which are actually slightly better) and toruses, but any sufficiently large mesh will approach it.

To exploit this you need a post transform cache.

But my guess would be that any quoted tri number is setup-limited, not transform-limited (as is the case with the GeForce4). With a post-transform cache, if you're playing number games, then transforms are basically free.
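The quad-mesh argument above can be checked numerically. A sketch (the grid construction is mine, not from the thread): an n x n grid of quads, each split into two triangles, has (n+1)^2 vertices and 2n^2 triangles, so the ratio falls toward 0.5 as the mesh grows.

```python
# Verts-per-triangle ratio of an n x n grid of quads, each quad split
# into two triangles: (n+1)^2 vertices, 2*n^2 triangles.
def verts_per_tri(n: int) -> float:
    vertices = (n + 1) ** 2
    triangles = 2 * n * n
    return vertices / triangles

for n in (2, 10, 100, 1000):
    print(f"{n:>4} x {n:<4} grid: {verts_per_tri(n):.4f} verts/tri")

# The ratio approaches 0.5, but only an ideal post-transform cache lets
# the hardware actually transform each shared vertex just once.
```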
 
Perhaps I am not thinking clearly (it's rather late), but how would you draw a 2D mesh with more triangles than vertices?
 
Geeforcer said:
If R300 is indeed 107M transistors on .15 micron @ 315MHz, one has to wonder WTH Matrox did wrong. It's not like there has been a major breakthrough with the .15 process in the month following Parhelia's release.

Maybe the engineers at ATI actually know a little about what they're doing? ;)

In all honesty, Matrox is a private company and a LOT smaller than either ATI or Nvidia. I suspect they just don't have the resources to fund serious R&D. Matrox is a very niche company, so hopefully they'll be fine. The only problem is if others start moving into their niche (ATI with HydraVision, etc.).

Maybe we'll see Matrox as a part of ATI before long.
 
Edward said:
hope trilinear and aniso will be supported now.

And I would like to know if 60+ fps at 1280x1024x32 with 16x aniso in UT2003 is possible.

Based on the scores we've seen with GF4 in UT2003, I'd say this is not just a possibility, but almost a guarantee.
 
Geeforcer:
I'll take your mesh on the first page as an example.
It's got 14 triangles and 13 vertices. (You counted edges when you got the number 26.)

First draw the upper row of triangles as a strip. A strip needs two vertices to start, then outputs one triangle per additional vertex, so that's 9 vertices for 7 triangles.
Then draw the lower row of triangles as a strip. That's 9 more vertices, but 5 of them have already been calculated and could still be in the cache.

Total: 9 + (9 - 5) = 13 vertices transformed.

13/14 < 1 vert/tri

And with larger meshes it would be even better.

An important point here, though: each strip must be short enough that its vertices don't drop out of the cache before the next strip tries to reuse them. Different cache policies and sizes would need different stripping to get optimal cache use.
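That bookkeeping can be simulated with a simple FIFO post-transform cache. A sketch; the index layout below is a hypothetical mesh consistent with the counts above (13 vertices, two 9-vertex strips sharing a 5-vertex middle row):

```python
from collections import deque

def transforms_needed(strips, cache_size):
    """Count vertex transforms for a sequence of triangle strips,
    given a FIFO post-transform cache of the given size."""
    cache = deque(maxlen=cache_size)  # holds recently transformed indices
    transforms = 0
    for strip in strips:
        for v in strip:
            if v not in cache:  # cache miss: transform and remember it
                transforms += 1
                cache.append(v)
    return transforms

# Hypothetical layout: top row A = 0..3, shared middle row B = 4..8,
# bottom row C = 9..12. Each strip alternates between its two rows.
upper = [4, 0, 5, 1, 6, 2, 7, 3, 8]     # 9 verts -> 7 triangles
lower = [4, 9, 5, 10, 6, 11, 7, 12, 8]  # 5 of these 9 are shared

print(transforms_needed([upper, lower], cache_size=10))  # -> 13
print(transforms_needed([upper, lower], cache_size=2))   # -> 18: the cache
# is too small, so every vertex of the second strip is transformed again
```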
 
Very approximately: on complex models rendered in a single state, you can consider vertex rate and triangle rate to be roughly the same.

Edges are very often broken, though, because the faces on either side of the edge between two vertices need independent texture coordinates. In that case vertex reuse is often quite low.
 