First few GFX benches

Uttar, don't take this as a hostile attack, but I've seen 3 separate threads now where you've posted essentially totally incorrect conclusions about hardware. (e.g. supersampling vs multisampling thread, etc)

Lay off him and everyone else. He is trying to learn and he doesn't need people putting him down for it. :devilish:

I'm pretty sure others here as well want to learn.
 
Doomtrooper,

3DMark was created to show off system capabilities under DX7/DX8 situations. I wouldn't think it's the application's fault if anyone draws wrong conclusions from its scores. In that sense, 3DMark is not the only application from which many have drawn conclusions.

Since you mentioned KYRO: if PVR were to create a benchmark suite tomorrow consisting of numerous iterations of their tech demos (FableMark, VillageMark, etc.), and someone saw a KYRO getting far better scores than numerous competing solutions, it would be his fault and his alone if he concluded that the KYRO is faster across the board than the rest.

Does that make sense as a parallel?
 
DemoCoder said:
Uttar, don't take this as a hostile attack, but I've seen 3 separate threads now where you've posted essentially totally incorrect conclusions about hardware. (e.g. supersampling vs multisampling thread, etc)

GPUs don't need to write post-transformed vertices back to video RAM. There is an on-chip FIFO that stores the last few post-transformed vertices. It's between 10 and 16 vertices depending on chip architecture. DirectX even includes an Optimize Mesh routine that essentially reorders your vertices to make optimal use of the vertex FIFO. It has a tremendous effect on performance.

The NVidia patent you refer to is talking about a small on-chip buffer, not about writing post-transformed vertices back to video memory (and reading them BACK! That would be a stupendously dumb design).
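(Aside, not part of the quote: here's a tiny sketch of what that post-transform FIFO buys you. It just simulates a plain FIFO cache over an index list; the mesh, the cache policy details and the sizes tried are assumptions for illustration, not taken from any driver or spec.)

```python
# Toy simulation of a post-transform vertex cache (a plain FIFO policy is assumed;
# real chips may differ). A "miss" means the vertex has to be (re)transformed.
from collections import deque
import random

def transformed(indices, cache_size):
    cache = deque(maxlen=cache_size)   # oldest entry falls out automatically
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
    return misses

# Hypothetical mesh: a 50x50 vertex grid, two triangles per cell.
tris = []
for y in range(49):
    for x in range(49):
        v = y * 50 + x
        tris += [(v, v + 50, v + 1), (v + 1, v + 50, v + 51)]

ordered = [i for t in tris for i in t]      # cell-by-cell order reuses recent vertices
random.shuffle(tris)
shuffled = [i for t in tris for i in t]     # random order mostly doesn't

for size in (10, 16, 24):                   # cache sizes mentioned in this thread
    print(size, transformed(ordered, size), transformed(shuffled, size))
```

The ordered index list needs far fewer transforms than the shuffled one, which is the "tremendous effect" the mesh-optimization routine is after.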

You know what makes me look the most stupid? That I knew the vertex cache was a FIFO holding between 10 (NV10) and 16 (NV20) vertices (BTW, I wonder how many the NV30 got...).
And that's precisely why I came up with that insane (and ridiculous) idea of storing thousands of vertices in memory before using the indices. And yes, it's completely dumb. But it was the only explanation I could find for that thing in the patent.

Your explanation actually makes sense: they would refer to a cache larger than most others by calling it "memory". I don't quite understand why they'd do that (they've always referred to caches as buffers before) - but it still sounds a lot more logical than everything I've said.

Everything is as most people (but me :oops: ) supposed: the vertices waiting to enter Triangle Setup sit in a vertex cache.
The *only* bandwidth cost of T&L, then, is reading static VBs (and IBs).
So, let's consider a 32-byte FVF, and the GFFX limit of 350M vertices/s at 60 FPS (5.8M vertices/frame).
That's, err, impossible if every vertex is unique: 5.8M x 32 bytes is about 177MB of vertex storage per frame :rolleyes:

Err... And let's consider the AGP 8X 2GB/s limit. At 60 FPS, that's only about 1M vertices/frame...

Okay, so assuming we can use 75% of AGP 8X (the rest being used for textures) and 64MB of video memory for stored vertices (that's really a best-case scenario; you couldn't play at high res at all with that) - how many vertices per frame and per second can we get?

We can get 2.75M vertices/frame and 165M vertices/s.
However, that costs 64MB x 60 = 3840MB/s - 24% of the GFFX's bandwidth!
And having 64MB for static data would probably not be reasonable unless you've got 256MB of RAM on your GPU. And 75% of AGP 8X would only be possible if there are nearly no texture uploads, so you'd also need 256MB of RAM on your GPU for that.
Also, it's highly unlikely every single static vertex is going to be drawn in a real-world game. In practice it might be something like 30% of them drawn in the same frame (resulting in barely 1.35M vertices/frame...)
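(A quick sanity check of the arithmetic above, with everything rounded: 32-byte vertices, 60 fps, 2GB/s AGP 8X, roughly 16GB/s of GFFX local bandwidth. All figures are assumptions for illustration.)

```python
# Rounded sanity check of the arithmetic above (MB = 10^6 bytes here).
FVF_BYTES = 32           # bytes per vertex
FPS       = 60
PEAK_VPS  = 350e6        # claimed GFFX peak, vertices per second

# If every one of those vertices were unique and stored once:
per_frame_mb = PEAK_VPS / FPS * FVF_BYTES / 1e6
print(f"unique vertex data per frame: ~{per_frame_mb:.0f} MB")   # ~187 MB (~178 MiB)

# 75% of AGP 8X (2 GB/s) for geometry, plus 64 MB of static VBs in video memory:
agp_verts   = 2000e6 * 0.75 / FPS / FVF_BYTES
local_verts = 64e6 / FVF_BYTES
per_frame   = agp_verts + local_verts
print(f"~{per_frame/1e6:.2f}M vertices/frame, ~{per_frame*FPS/1e6:.0f}M vertices/s")
# lands close to the 2.75M / 165M quoted above

# Re-reading the 64 MB of static data every frame costs local bandwidth:
print(f"{64*FPS} MB/s, about {64*FPS/16000:.0%} of ~16 GB/s")     # 3840 MB/s, 24%
```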

Sounds like the switch had to be made towards better polygons, not more polygons!

I really, really hope I didn't make an error here again. I'd look so stupid if I did... :( I did reread all of this, so I think it's unlikely there's any major error.

I'm sorry for having made so many errors in past threads and messages. I won't post things unless I'm absolutely certain of them, and have checked every single spot on the planet to make sure.


Uttar
 
NV30 can store 24 vertices in its cache if I remember correctly.
And as you found out, it's nearly impossible to reach such high vertex counts, since both memory and bandwidth become an issue (also remember that a 32-byte vertex is the bare minimum). Also remember that you need a minimal vertex shader to achieve this 350 million vertices per second mark (about 4 instructions - transformation + color output).
 
NV1X can store 16 vertices and NV2x can store 24 vertices in the vertex cache. You can see those numbers in the NvTriStrip lib source. Don't know about the NV30.
 
MDolenc said:
NV30 can store 24 vertices in its cache if I remember correctly.
And as you found out, it's nearly impossible to reach such high vertex counts, since both memory and bandwidth become an issue (also remember that a 32-byte vertex is the bare minimum). Also remember that you need a minimal vertex shader to achieve this 350 million vertices per second mark (about 4 instructions - transformation + color output).

Of course :)
But my point is that no synthetic benchmark could even get anywhere near 350 million vertices/second.
And if they're all dynamic, like in BenMark, it couldn't get more than 60M vertices/s (which might still result in higher triangle counts than 60M/s, because BenMark uses the vertex cache very effectively AFAIK).


Uttar

joozoo: I just checked, and we're both wrong.
http://developer.nvidia.com/docs/IO/1307/ATT/GDC2001_Optimization.pdf ( about 650KB )
I think it's on page 15:
TNT: 16 vertex cache entries
GeForce: 10 *effective* vertex cache entries
GeForce 3: 18 *effective* vertex cache entries

However, we get no number for the GF4, and one of its big highlights was better cache efficiency. I suppose the cache is the same size as on the GF3, but who knows...


Uttar
 
Uttar said:
But my point is that no synthetic benchmark could even get anywhere near 350 million vertices/second.
And if they're all dynamic, like in BenMark, it couldn't get more than 60M vertices/s (which might still result in higher triangle counts than 60M/s, because BenMark uses the vertex cache very effectively AFAIK).
You could reach the 350 million vertices per second mark, and BenMark would probably come quite close (it doesn't use dynamic vertices unless you tell it to). Of course, you might need to disable stuff like color writes :devilish:. You just render the same (static) object multiple times per frame (with a different matrix every time); you could also use 16-bit floats to save some memory and bandwidth (so each vertex would be 12 bytes or less)... Not that it would be particularly useful for anything...
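(Roughly, that approach pencils out like this; the object size and vertex packing below are made-up numbers, just to show the orders of magnitude involved.)

```python
# Rough feasibility check of the "same static object, many draws" idea.
VERT_BYTES = 12           # packed vertex (e.g. positions in 16-bit floats), as suggested above
TARGET_VPS = 350e6        # vertices per second we want to hit
FPS        = 60
OBJ_VERTS  = 100_000      # hypothetical static object size

draws_per_frame = TARGET_VPS / FPS / OBJ_VERTS        # ~58 draws of the same object
vb_footprint_mb = OBJ_VERTS * VERT_BYTES / 1e6        # ~1.2 MB of video memory
vb_read_bw      = TARGET_VPS * VERT_BYTES / 1e6       # ~4200 MB/s of vertex fetch

print(f"{draws_per_frame:.0f} draws/frame, {vb_footprint_mb:.1f} MB VB, "
      f"{vb_read_bw:.0f} MB/s of VB reads")
```

So the memory footprint is tiny and the vertex-fetch bandwidth is only a fraction of the card's local bandwidth, which is why the peak is reachable in a synthetic case and not in a real game.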
 
No, the ACTUAL vertex cache sizes are 16 for GeForce1/2 and 24 for GeForce3. I just double-checked. Just go to cvs1.nvidia.com and look for the NvTriStrip.h file.
 
joozoo said:
No, the ACTUAL vertex cache sizes are 16 for GeForce1/2 and 24 for GeForce3. I just double-checked. Just go to cvs1.nvidia.com and look for the NvTriStrip.h file.

If I understand correctly, there can be 6 vertices in processing, so they recommend optimizing as if the cache size were smaller.

That's where the 10 and 18 numbers come from.
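(In other words - using only the figures quoted in this thread - the optimizer should target the actual size minus the in-flight vertices:)

```python
# Effective post-transform cache size = actual size minus vertices in flight.
IN_FLIGHT = 6  # number of in-processing vertices quoted above
for chip, actual in (("GeForce 1/2", 16), ("GeForce 3", 24)):
    print(f"{chip}: actual {actual}, optimize for {actual - IN_FLIGHT}")
```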
 
Joozoo: Interesting. So, in each case, the effective count is 6 less than the actual count. Any idea why?
---
EDIT: Thanks Hyp-X for that explanation :) Makes sense, given that several nVidia patents say a single transform/lighting/vertex shading unit is able to work on 3 vertices at the same time in optimal conditions.
---

MDolenc: Oh, right, BenMark is static. I think I confused it with another, similar benchmark; not sure.

Hmm, yeah, it might be possible to reach 350 million vertices/second. It's easy to get a 24-byte FVF (by not using texturing), and you could transform & draw the same vertices many times.
That would make it possible to hit 350M vertices/s.

But then, why doesn't the Radeon 9700 Pro get more than 115M vertices/s in BenMark? Maybe there are too few vertices on screen, and fillrate becomes a problem. So yeah, disabling color writes... Or maybe there are too many triangles on screen, and triangle setup becomes the bottleneck.
Evyl! :devilish:


Uttar
 
Uttar said:
I'm sorry for having made so many errors in past threads and messages. I won't post things unless I'm absolutely certain of them, and have checked every single spot on the planet to make sure.

As I have been telling my players for years now, one of the best ways to learn is by making mistakes. No need to apologize. :)
 
breez said:
http://babelfish.altavista.com/babelfish/urltrurl?url=http%3A%2F%2Fwww.computer-trend.biz%2Ftests%2FxNews.php%3Fact%3Dshownews%26id%3D6&lp=de_en&tt=url

Can this be true? A 3DMark score that high isn't possible on today's CPUs.

It was determined, for a LOT of reasons and by a LOT of sites and people, that those were fakes.
Gainward PCBs are red, not green. All of those images are stolen from nVidia launch documents. Also, the Gainward manual is the one from a GeForce 4 (and not even one with AGP 8X...).

The Unreal score doesn't show the level, and the GF4 is way too fast at it.


Uttar
 
Reverend said:
Pssst, here's a secret... never compare 4xAA performance between an R300 and an NV30...


There is more to AA than performance; let's add IQ to that also... or let's move the comparison up to 6x AA.
 
How about performance and IQ? Doom, you almost sound like you think the FX will outperform the 9700 or something...
 
Ailuros said:
Doomtrooper,

3DMark was created to show off system capabilities under DX7/DX8 situations. I wouldn't think it's the application's fault if anyone draws wrong conclusions from its scores. In that sense, 3DMark is not the only application from which many have drawn conclusions.

Support for Microsoft DirectX®8.1.
A new DirectX8.1 feature test showing Vertex Shaders & Pixel Shaders 1.4. *
Featuring 3 game tests plus a 4th game test using DirectX®8.0 hardware accelerated features. *
Directx8.0 feature tests showing Vertex Shaders, Pixel Shaders 1.1 and Point Sprites. *
Two game tests use Ipion real-time physics by Havok.

* A fully DX8 compliant 3D accelerator is required for running this test


Since you mentioned KYRO: if PVR were to create a benchmark suite tomorrow consisting of numerous iterations of their tech demos (FableMark, VillageMark, etc.), and someone saw a KYRO getting far better scores than numerous competing solutions, it would be his fault and his alone if he concluded that the KYRO is faster across the board than the rest.

Does that make sense as a parallel?

No, that does not make a parallel IMO; any IHV could make a benchmark that would put their hardware in the best light, be it Nvidia or ATI.
This benchmark is not written by an IHV. It is supposed to be a neutral and fair benchmark, yet there is so much politics flying around it that it would make the CIA look like child's play.
I love the system requirements...

Intel® compatible 500MHz or faster processor

Intel compatible processor... LOL... it's called x86...

Politics baby.
 