R300 3DMARK2001 PROJECT COMPARE

Mummy · Aug 16, 2002

Well, one thing to consider is that with point sprites you aren't going to get any benefit from a TnL cache, so each vertex gets transformed on its own with no reuse; it's like having a non-indexed mesh. I also suppose hardware manufacturers don't care much about them (heh, who would use them in a game? Nobody, really; the PC game industry sucks), so they are probably not optimized much in hardware.

What do you guys think about the FP TnL test in 3DMark 2001? I was wondering whether using a simple vertex shader instead of the FP pipe would speed it up, since maybe the FP pipe isn't improved much compared to VS, and if they emulate it with shaders there may still be some slowdown due to beta drivers or poor optimization.

DemoCoder · Aug 16, 2002

Neeyik said:
The point sprite test, of course, has no polygons


That's not entirely true as DX8 handles point sprites by rasterising a quad that has each vertex with the same colour (with the option of the texture coordinates being the same too). It's not a poly in the normal sense of things but it's still got vertices.

Surely that's the generic software fallback case? Specific IHVs could implement point sprites in a more optimal way than just generating a quad.
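To make the quad-expansion path Neeyik describes concrete (the path DemoCoder suspects is only the generic fallback), here is a minimal Python sketch that turns one screen-space point sprite into four quad corners sharing one colour. The `SpriteVertex` type, the `expand_point_sprite` function, the downward-growing screen y axis and the (0,0)-(1,1) texture-coordinate layout are all illustrative assumptions, not the DX8 runtime's or any driver's actual code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpriteVertex:
    x: float
    y: float
    color: int  # packed ARGB; identical on all four corners
    u: float
    v: float

def expand_point_sprite(x, y, size, color, u=0.0, v=0.0, generate_texcoords=True):
    """Hypothetical sketch: expand one screen-space point sprite into a quad.

    Screen y is assumed to grow downwards. With generate_texcoords=True the
    corners get (0,0)..(1,1); otherwise every corner reuses the sprite's (u, v),
    mirroring the "same texture coordinates" option mentioned above.
    """
    h = size / 2.0
    corners = ((-1, -1, 0.0, 0.0), (1, -1, 1.0, 0.0),
               (-1, 1, 0.0, 1.0), (1, 1, 1.0, 1.0))
    out = []
    for dx, dy, cu, cv in corners:
        tu, tv = (cu, cv) if generate_texcoords else (u, v)
        out.append(SpriteVertex(x + dx * h, y + dy * h, color, tu, tv))
    return out
```

For example, `expand_point_sprite(100, 50, 8, 0xFFFFFFFF)` gives corners at (96, 46), (104, 46), (96, 54) and (104, 54). Whether real hardware ever materialises these four vertices, or rasterises the sprite directly from its single centre vertex, is exactly the open question in this exchange.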

JohnH · Aug 16, 2002

Each point sprite is composed of one unique vertex which is unshared. The poly throughput mesh is composed of indexed triangle lists which, if arranged to take account of the cache, can cost as little as 0.5 vertices per triangle (or if you have a very large vertex cache...). So, simple numbers: 1 million point sprites require 1 million vertices to be transformed, while 1 million triangles arranged as a "perfect" mesh require 0.5 million vertices to be transformed, i.e. a 2x T&L performance difference.
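JohnH's vertices-per-triangle figure can be checked with a simplified model of a post-transform vertex cache. The sketch below assumes a FIFO replacement policy and a regular grid mesh drawn row by row; the function names and the 64-entry cache size are illustrative assumptions, not the real parameters of any chip mentioned in this thread.

```python
from collections import deque

def transformed_vertices(indices, cache_size):
    """Count vertex transforms for an indexed triangle list, modelling the
    post-transform cache as a FIFO of `cache_size` entries (a simplification)."""
    cache = deque()
    cached = set()
    transforms = 0
    for idx in indices:
        if idx in cached:
            continue  # cache hit: reuse the already-transformed vertex
        transforms += 1
        cache.append(idx)
        cached.add(idx)
        if len(cache) > cache_size:
            cached.discard(cache.popleft())  # evict the oldest entry
    return transforms

def grid_indices(w, h):
    """Indexed triangle list for a w x h grid of quads, emitted row by row."""
    out = []
    stride = w + 1
    for i in range(h):
        for j in range(w):
            v00 = i * stride + j
            v01 = v00 + 1
            v10 = v00 + stride
            v11 = v10 + 1
            out += [v00, v10, v01, v01, v10, v11]  # two triangles per quad
    return out

idx = grid_indices(16, 16)            # 512 triangles, 17 * 17 = 289 unique vertices
mesh = transformed_vertices(idx, 64)  # each vertex stays cached until reused
sprites = 512                         # 512 point sprites: one unshared vertex each
print(mesh, mesh / 512, sprites / 512)  # 289 0.564453125 1.0
```

With the cache big enough to hold a row and a half, every grid vertex is transformed exactly once, about 0.56 transforms per triangle, and the ratio tends towards JohnH's 0.5 as the grid grows. Point sprites stay at 1.0 per primitive no matter how big the cache is. With a much smaller cache this row-by-row ordering loses reuse, which is why mesh ordering matters.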

Note that point sprites are still faster than using triangles or quads for your sprites, by a factor of 3 or 4 respectively. However, current DX point sprites don't work with texture pages, which means some developers don't like them (due to the extra state changes).

Back-face culling could be a factor as well, but only if rasterised poly throughput is the bottleneck...

Anyway, just my two penn'orth.
John.

Neeyik · Aug 16, 2002

It might be me just reading the DX8 SDK wrong but I get the distinct impression that this is how it's done for DX apps; in some ways it would explain the figures.

NVIDIA claim a vertex processing rate of 136Mverts/s; at 4 verts/quad that gives 34Mquads/sec. I know, I know, I know... these are all made-up figures, but I get around 30M point sprites per second in the tests.
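Neeyik's back-of-envelope reasoning can be written out directly. This Python sketch only restates the arithmetic in the post: it assumes vertex processing is the sole bottleneck and takes the quoted 136M vertices/s at face value. `sprite_rate` is a hypothetical helper name.

```python
CLAIMED_VERTEX_RATE = 136e6  # vertices/s, NVIDIA's figure as quoted above

def sprite_rate(vertex_rate, verts_per_sprite):
    """Upper bound on sprites/s if vertex processing were the only limit."""
    return vertex_rate / verts_per_sprite

for label, verts in (("expanded to 4 vertices", 4), ("1 vertex per sprite", 1)):
    print(f"{label}: {sprite_rate(CLAIMED_VERTEX_RATE, verts) / 1e6:.0f}M sprites/s")
# expanded to 4 vertices: 34M sprites/s
# 1 vertex per sprite: 136M sprites/s
```

The measured ~30M sprites/s sits much closer to the 4-vertex bound, which is the basis for Neeyik's suspicion. It is only suggestive, since other limits (setup, fill, drivers) could produce a similar number.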

Dio · Aug 16, 2002

Point sprites in general have 3 advantages:

1. They only require one position (plus an additional point size) to be run through the vertex shader, rather than 3 or 4.

2. You can do clipping optimisations - it is allowed to clip only to the actual point, rather than the point+size. With guard band clipping this should produce the same result in most cases.

3. They can trivially be rasterised as a quad instead of a pair of triangles, thereby doubling setup rate for just minor mods to a triangle rasteriser.

Not all hardware takes advantage of all these.

From an application point of view they are also a lot more convenient.
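Dio's first and third points can be put into a toy per-frame cost model. This is a Python sketch under simplifying assumptions: application-built quads are indexed, and the hardware actually exploits the point-sprite shortcuts, which Dio notes is not always the case. `ParticleCost` and both functions are hypothetical names, and the clipping optimisation in point 2 is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParticleCost:
    vertices_shaded: int  # vertices run through the vertex shader / T&L
    setups: int           # primitives handed to triangle (or quad) setup

def point_sprite_cost(n, quad_setup=True):
    """n point sprites: one position (+ size) each goes through the vertex
    shader. A quad-capable rasteriser sets each sprite up once; otherwise
    each sprite is split into two triangles."""
    return ParticleCost(vertices_shaded=n, setups=n if quad_setup else 2 * n)

def app_quad_cost(n):
    """n application-built quads: four vertices each, drawn as two triangles
    (assumes indexed quads, so no vertex is shaded twice within a quad)."""
    return ParticleCost(vertices_shaded=4 * n, setups=2 * n)
```

For 1,000 particles this gives 1,000 shaded vertices and 1,000 setups for point sprites on a quad rasteriser, against 4,000 vertices and 2,000 setups for app-built quads. These are Dio's "1 rather than 4" and "doubling setup rate" in numbers.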

Colourless · Aug 16, 2002

I don't know what to say... that system has scored about 10x what mine does.

-Colourless

Galilee · Aug 16, 2002

That is one insane CPU.

My GF4 Ti4200 and Athlon 2000+ get 11000 points. The same GF4 Ti4200 gets 14000 points with a 3GHz P4.
Where it really shines is in the Nature test. A Ti4600 gets somewhere around 80 fps with a CPU like that (and 15000 points). The R300 gets 120 fps.

Reverend · Aug 16, 2002

JohnH said:
Note, point sprites are still faster than just using triangles or quads for your sprite by a factor of 3 or 4 respectively. However current dx point sprites don't work with texture pages which means some developer don't like them (due to extra state changes).

Not just that. If point sprites are slower, that means all particles should be made using old-fashioned quads. Too bad you can't individually rotate, scale, etc. with point sprites, which is the kind of advantage you have with quads.
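Reverend's point about per-particle rotation and scaling is easy to see if the application builds each quad itself: every particle can carry its own size and angle, at the price of four vertices instead of one. A minimal Python sketch, done on the CPU purely for illustration; `rotated_quad` is a hypothetical helper.

```python
import math

def rotated_quad(cx, cy, half_size, angle):
    """Hypothetical helper: corners of a square particle quad of half-width
    `half_size`, rotated by `angle` radians about its centre (cx, cy)."""
    c, s = math.cos(angle), math.sin(angle)
    corners = []
    for dx, dy in ((-1, -1), (1, -1), (-1, 1), (1, 1)):
        ox, oy = dx * half_size, dy * half_size
        # Standard 2D rotation of the corner offset, then translate to the centre.
        corners.append((cx + ox * c - oy * s, cy + ox * s + oy * c))
    return corners
```

A point sprite, by contrast, is always an axis-aligned square around its centre, so there is no per-particle equivalent of `angle`.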

rhink · Aug 16, 2002

Interesting. The game tests are all pretty close, except Dragothic low detail and Nature. The R300 destroys the GF4 Ti4600 in single-texture fillrate (expected, with 8 pixel pipes...), but scores almost exactly the same in multitexture fillrate, even though the R300 is capable of what, 8 or 16 simultaneous textures compared to the Ti4600's 4? The only places the GF4 loses badly are pixel- and vertex-shader-intensive tests, especially pixel shader scenes. I figured a next-gen card like the R300 would dominate it across the board.

Of course FSAA and aniso are different matters altogether.

On second look, I just realized the Ti4600 scores here:
http://service.madonion.com/compare?2k1=3566685

are probably from a Ti4600 that is as crazily overclocked as the CPU it's running on.

JohnH · Aug 16, 2002

Rev, a quad will take 4x as much vertex processing and bandwidth to load, so if you can live with the limitations of point sprites you should still use them instead of quads.
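JohnH's "4x bandwidth" follows from the vertex count if both approaches use the same vertex format. This Python sketch assumes a hypothetical 24-byte vertex (float3 position, 32-bit colour, float2 texcoord) for both cases and ignores index data, which would add a little more to the quad path. `vertex_upload_bytes` is a made-up helper name.

```python
def vertex_upload_bytes(particles, verts_per_particle, stride_bytes):
    """Bytes of vertex data fetched per frame for a particle system."""
    return particles * verts_per_particle * stride_bytes

STRIDE = 24  # hypothetical layout: float3 position + 32-bit colour + float2 texcoord
N = 10_000   # hypothetical particle count

sprite_bytes = vertex_upload_bytes(N, 1, STRIDE)  # one vertex per point sprite
quad_bytes = vertex_upload_bytes(N, 4, STRIDE)    # four vertices per quad
print(sprite_bytes, quad_bytes, quad_bytes / sprite_bytes)  # 240000 960000 4.0
```

In practice a point-sprite vertex would carry a point size instead of a texcoord, so the exact ratio depends on the formats chosen. The 4x figure is the same-format case.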

Reverend · Aug 16, 2002

Of course, if you can live with the limitations. That's a given, isn't it?

Neeyik · Aug 16, 2002

R300 destroys the GF4 Ti4600 in single texture fillrate (expected, with 8 pixel pipes....), but scores almost exactly the same in multitexture fillrate... even though the R300 is capable of what, 8, or 16 simultaneous textures compared to the Ti4600's 4?

The R300 can apply 16 texture layers per pass compared to the GF4's four layers per pass. The multitexture fill rate test draws as many full-screen primitives as the number of layers that can be applied per pass dictates, so in the case of the R300 it's 4 whereas with the GF4 it will be 16. In other words, it takes (or should take) 4 passes for the R300 and 16 passes for the GF4 to complete one frame. However, how long does one pass last on each card? How well does each card handle the multiple passes?
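The pass counts follow from dividing the per-frame layer total by the layers a card applies per pass. This Python sketch uses the 64-layer total Neeyik gives further down the thread and the per-pass figures from this post. `passes_per_frame` is a hypothetical helper, and the sketch ignores any API limit on texture stages.

```python
import math

TOTAL_LAYERS = 64  # texture layers per frame in the fill rate test (per Neeyik)

def passes_per_frame(layers_per_pass, total_layers=TOTAL_LAYERS):
    """Full-screen passes needed if each pass applies `layers_per_pass` textures."""
    return math.ceil(total_layers / layers_per_pass)

print(passes_per_frame(16), passes_per_frame(4))  # 4 16
```

The pass count alone does not settle the score. As the post says, what matters is how long each pass takes on each card.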

pascal · Aug 16, 2002

Colourless said:
I don't know what to say... that system has scored about 10x what mine does.

-Colourless

My system isn't that fast either. But we can still play good games, right?

Xmas · Aug 16, 2002

Uhm, 2848.8 MTexels/s?
That's at least 356MHz for a 8x1 pipe design...
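Xmas's lower bound comes from dividing the reported texel rate by the number of texels the chip can produce per clock. This Python sketch assumes one texel per pipe per clock (an 8x1 design) and no rendering shortcuts. Both helper names are hypothetical.

```python
def min_core_clock_mhz(fill_rate_mtexels, pipes=8, textures_per_pipe=1):
    """Lowest core clock that could sustain the given texel rate, assuming
    each texture unit produces one texel per clock."""
    return fill_rate_mtexels / (pipes * textures_per_pipe)

def peak_fill_mtexels(core_mhz, pipes=8, textures_per_pipe=1):
    """Theoretical peak texel rate for a given core clock."""
    return core_mhz * pipes * textures_per_pipe

print(round(min_core_clock_mhz(2848.8), 1))  # 356.1
print(peak_fill_mtexels(325))                # 2600
```

At the 325 MHz core clock quoted later in the thread, an 8x1 part would peak at 2600 MTexels/s, below the reported 2848.8. That gap is why Xmas lists overclocking, skipped rendering, or a different pipeline layout as possibilities.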

alexsok · Aug 16, 2002

Xmas said:
Uhm, 2848.8 MTexels/s?
That's at least 356MHz for a 8x1 pipe design...

Then they overclocked the R9700pro or what?

The R9700pro is 325/620 (core/memory MHz).

Dave Baumann · Aug 16, 2002

Uhm, 2848.8 MTexels/s?
That's at least 356MHz for a 8x1 pipe design...

Ummm. That result is a little odd, but consistent. I don't think it's overclocked... (I'll shut my trap now!)

Xmas · Aug 16, 2002

DaveBaumann said:
Uhm, 2848.8 MTexels/s?
That's at least 356MHz for a 8x1 pipe design...


Ummm. That result is a little odd, but consistent. I don't think it's overclocked... (I'll shut my trap now!)

I guess you know a bit more about it than I do

Anyway, I see four possibilities here:
1. that result is faked/incorrect
2. the card was overclocked
3. it does not render everything (there are no "hidden surfaces" in that test)
4. it's not an 8x1 design

Neeyik · Aug 16, 2002

The Fill Rate tests apply 64 texture layers in total, so at a resolution of 1024 x 768 that's 64 times this resolution in texture pixels, as each texture is the size of the screen. Per frame, you have 50,331,648 texture pixels to be churned out. The tests record the average frame rate and then multiply this value by the number of texture pixels applied per frame to give you your fill rate score.

Now remember the case I mentioned a post or two before - the GF4 needs 16 passes to complete a frame whereas the R300 only needs 4. Assuming that the ATI card can do 4 longer passes quicker than the GF4 can do 16 shorter ones, it should get a higher fill rate test score as it will render more frames in one second (since each one takes less time).
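The scoring rule Neeyik describes (average frame rate times texels per frame) can be written out, and it can also be inverted to see what frame rate a given score implies. This Python sketch uses only the figures in the thread (1024 x 768, 64 screen-sized layers, the 2848.8 MTexels/s result). The helper names are hypothetical.

```python
def texels_per_frame(width=1024, height=768, layers=64):
    """Each of the layers is a screen-sized texture, per Neeyik's description."""
    return width * height * layers

def fill_rate_mtexels(avg_fps, **frame):
    """Score = average frame rate x texels applied per frame, in MTexels/s."""
    return avg_fps * texels_per_frame(**frame) / 1e6

def implied_fps(score_mtexels, **frame):
    """Average frame rate needed to produce a given score."""
    return score_mtexels * 1e6 / texels_per_frame(**frame)

print(texels_per_frame())              # 50331648
print(round(implied_fps(2848.8), 1))   # 56.6
```

So the disputed 2848.8 MTexels/s result corresponds to an average of roughly 56.6 frames per second in the test.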

Dave Baumann · Aug 16, 2002

Now remember the case I mentioned a post or two before - the GF4 needs 16 passes to complete a frame whereas the R300 only needs 4.

Although the hardware can do 16 textures per pass, it's still limited by DX8 (eight texture stages per pass), so I'd make that 8 passes.
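Dave's correction amounts to capping the per-pass layer count at what the API exposes before dividing. This Python sketch assumes Direct3D 8's maximum of 8 texture stages and the 64-layer total from earlier in the thread. `passes_under_api_limit` is a hypothetical helper.

```python
import math

DX8_MAX_TEXTURE_STAGES = 8  # Direct3D 8 exposes at most 8 texture stages

def passes_under_api_limit(hw_layers_per_pass, total_layers=64,
                           api_limit=DX8_MAX_TEXTURE_STAGES):
    """Passes per frame when the API, not the hardware, may cap layers per pass."""
    per_pass = min(hw_layers_per_pass, api_limit)
    return math.ceil(total_layers / per_pass)

print(passes_under_api_limit(16), passes_under_api_limit(4))  # 8 16
```

The GF4's 16 passes are unaffected, because its 4 layers per pass are already under the cap. Only the R300 drops from a theoretical 4 passes to 8.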

jb · Aug 16, 2002

DaveBaumann said:
Ummm. That result is a little odd, but consistent. I don't think it's overclocked... (I'll shut my trap now!)

Dave... what do you mean, consistent? To me that means the outcome of the test after several runs is close, which would then imply that you tested it yourself... do you have a new toy you're not telling us about????
