R300 - Final Specs?

400M vertices/s * ~40 bytes/vertex = 16 GB/s. The only way it will ever approach 400M vertices/s is if you are redrawing the same vertices over and over again from the vertex cache. Even if we assume 24-byte vertices, you've got a bandwidth problem.
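The arithmetic is easy to sanity-check with a back-of-envelope sketch (the vertex sizes are the assumptions from the post above, not measured figures):

```python
# Back-of-envelope check: memory bandwidth needed just to fetch vertices
# at the claimed rate. Vertex sizes are assumptions: ~40 bytes (position,
# normal, a couple of texture coordinate sets) vs. a minimal 24 bytes.
VERTS_PER_SEC = 400e6

for bytes_per_vertex in (40, 24):
    gb_per_sec = VERTS_PER_SEC * bytes_per_vertex / 1e9
    print(f"{bytes_per_vertex}-byte vertices: {gb_per_sec:.1f} GB/s")

# 40-byte vertices alone need 16 GB/s of fetch bandwidth, before any
# texture or framebuffer traffic is counted.
```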


400M is probably the theoretical max if you take the minimum vertex transform and compute how many can be done per clock times the clock rate. It doesn't take into account triangle setup, bus bandwidth, etc.

Someone needs to release a reasonably complicated vertex and pixel shader benchmark, like SPEC CPU or Dhrystone, and let cards bench against it: e.g. "we get 13,000 VertexStones and 9,000 PixelStones."

The raw performance of the minimal vertex shader is a poor yardstick for gauging real performance.
 
The heatsink is wider because it cannot go thicker: the AGP slot on a motherboard lies parallel to the PCI slots. Unlike present-day CPUs, which have plenty of vertical space for a tall heatsink, add-in cards sit too close to one another, so instead of going higher the heatsinks get wider.
 
PC-Engine said:
The heatsink is wider because it cannot go thicker: the AGP slot on a motherboard lies parallel to the PCI slots. Unlike present-day CPUs, which have plenty of vertical space for a tall heatsink, add-in cards sit too close to one another, so instead of going higher the heatsinks get wider.

That's it ... I bet. Thanks. Still, the chip itself must be a monster-sized chip, comparatively speaking of course.
 
Well it consumes about 25W I believe, so dissipation is going to be very high. I'd expect some nice overclocks with the external power feed and scary-ass cooling.

MuFu.
 
"So the triangle count is somewhere in between 200-300 million(WOW!) where the geforce 4 ti is about 125 million. Right?"

As far as I know, every "poly/triangle" performance figure you see online is really quoted in vertices, so the 125 million number would be directly comparable to the 400 million number.
 
With a large and good enough vertex cache, and a well-optimized mesh, you could even approach 0.5 vertices/triangle. But I doubt that the triangle setup would handle 800 Mtri/s. It might not even handle 400 Mtri/s.

I agree with DemoCoder that it is likely a theoretical max under the assumptions he states. But fast VS implementations are not likely designed with that case in mind; the reason to increase VS throughput is to make really complex vertex programs fast.
 
hughJ said:
"So the triangle count is somewhere in between 200-300 million(WOW!) where the geforce 4 ti is about 125 million. Right?"

As far as I know, every "poly/triangle" performance figure you see online is really quoted in vertices, so the 125 million number would be directly comparable to the 400 million number.

Err, I think the 125 million number is the theoretical limit of the GeForce4 Ti in vertices. According to this, that number is attached to the GeForce4 Ti 4400; the GeForce4 Ti 4600 does 136 million vertices. So you're saying the R300 does roughly the same number of vertices per second? Hrm, that doesn't sound right.

http://www.nvidia.com/view.asp?PAGE=geforce4ti

GeForce4 Ti 4600
Vertices per Second: 136 Million
Fill Rate: 4.8 Billion AA Samples/Sec.
Operations per Second: 1.23 Trillion
Memory Bandwidth: 10.4 GB/Sec.
Maximum Memory: 128MB

GeForce4 Ti 4400
Vertices per Second: 125 Million
Fill Rate: 4.4 Billion AA Samples/Sec.
Operations per Second: 1.12 Trillion
Memory Bandwidth: 8.8 GB/Sec.
Maximum Memory: 128MB

GeForce4 Ti 4200
Vertices per Second: 113 Million
Fill Rate: 4 Billion AA Samples/Sec.
Operations per Second: 1.03 Trillion
Memory Bandwidth: up to 8 GB/Sec.
Maximum Memory: 128MB
 
You can easily inflate transformation numbers: multiply them by three. Each vertex goes through three spaces, so count the vertex transformation rate once for each space. Nobody said they would only count it once for all three spaces. Just a thought.
 
In a triangle mesh with "correct" topology, there are 6 triangles at every vertex (draw out a quad mesh and look at one of the internal vertices).

Since each tri requires 3 verts, and each vert is shared by 6 tris, that's 3/6 = 0.5 verts per tri.

The only topologies that actually achieve 0.5 verts per tri are spheres (which are actually slightly better) and toruses, but any sufficiently large mesh will approach it.

To exploit this you need a post transform cache.

But my guess would be that any quoted tri number is setup-limited, not transform-limited (as is the case with the GeForce4). With a post-transform cache, if you're playing number games, then transforms are basically free.
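The quad-mesh argument above can be checked numerically. A sketch (the grid construction is mine, not from the thread): an n x n grid of quads, each split into two triangles, has (n+1)^2 vertices and 2n^2 triangles, so the ratio falls toward 0.5 as the mesh grows.

```python
# Verts-per-triangle ratio of an n x n grid of quads, each quad split
# into two triangles: (n+1)^2 vertices, 2*n^2 triangles.
def verts_per_tri(n: int) -> float:
    vertices = (n + 1) ** 2
    triangles = 2 * n * n
    return vertices / triangles

for n in (2, 10, 100, 1000):
    print(f"{n:>4} x {n:<4} grid: {verts_per_tri(n):.4f} verts/tri")

# The ratio approaches 0.5, but only an ideal post-transform cache lets
# the hardware actually transform each shared vertex just once.
```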
 
Perhaps I am not thinking clearly (it's rather late), but how would you draw a 2D mesh with more triangles than vertices?
 
Geeforcer said:
If R300 is indeed 107M transistors on .15 micron @ 315MHz, one has to wonder WTH Matrox did wrong. It's not like there has been a major breakthrough with the .15 process in the month following Parhelia's release.

Maybe the engineers at ATI actually know a little about what they're doing? ;)

In all honesty, Matrox is a private company and a LOT smaller than either ATI or Nvidia. I suspect they just don't have the resources to fund serious R&D. Matrox is a very niche company, so hopefully they'll be fine. The only problem is if others start moving into their niche (ATI with HydraVision, etc.).

Maybe we'll see Matrox as a part of ATI before long.
 
Edward said:
hope trilinear and aniso will be supported now.

And I would like to know if 60+ fps at 1280x1024x32 with 16x aniso in UT2003 is possible.

Based on the scores we've seen with GF4 in UT2003, I'd say this is not just a possibility, but almost a guarantee.
 
Geeforcer:
I'll take your mesh on the first page as an example.
It's got 14 triangles and 13 vertices. (You counted edges when you got the number 26.)

First draw the upper row of triangles as a strip. A strip needs two vertices to start, then outputs one triangle per additional vertex, so that's 9 vertices for 7 triangles.
Then draw the lower row of triangles as a strip. That's 9 more vertices, but 5 of them have already been calculated and could still be in the cache.

Total: 9 + (9 - 5) = 13 vertices transformed.

13/14 < 1 vert/tri

And with larger meshes it would be even better.

An important point here, though: each strip must be short enough that its vertices don't drop out of the cache before the next strip tries to reuse them. Different cache policies and sizes would need different stripping to get optimal cache use.
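That bookkeeping can be simulated with a simple FIFO post-transform cache. A sketch; the index layout below is a hypothetical mesh consistent with the counts above (13 vertices, two 9-vertex strips sharing a 5-vertex middle row):

```python
from collections import deque

def transforms_needed(strips, cache_size):
    """Count vertex transforms for a sequence of triangle strips,
    given a FIFO post-transform cache of the given size."""
    cache = deque(maxlen=cache_size)  # holds recently transformed indices
    transforms = 0
    for strip in strips:
        for v in strip:
            if v not in cache:  # cache miss: transform and remember it
                transforms += 1
                cache.append(v)
    return transforms

# Hypothetical layout: top row A = 0..3, shared middle row B = 4..8,
# bottom row C = 9..12. Each strip alternates between its two rows.
upper = [4, 0, 5, 1, 6, 2, 7, 3, 8]     # 9 verts -> 7 triangles
lower = [4, 9, 5, 10, 6, 11, 7, 12, 8]  # 5 of these 9 are shared

print(transforms_needed([upper, lower], cache_size=10))  # -> 13
print(transforms_needed([upper, lower], cache_size=2))   # -> 18: the cache
# is too small, so every vertex of the second strip is transformed again
```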
 
Very approximately: on complex models rendered in a single state, you can consider vertex rate and triangle rate to be roughly the same.

Edges are very often broken, though, because the faces on either side of the edge between two vertices need independent texture coordinates. In that case vertex reuse is often quite low.
 