OpenGL guy said:
Chalnoth said:
I'm just stating that drivers held back the GeForces in that test
And as I said before: how do you know it was the drivers that held it back?

What else?
Where are vertex buffers kept in modern games and game engines: video memory, AGP memory, system memory, or some hybrid of these?
Can anyone answer me? Thx.
Mummy said:
Where are vertex buffers kept in modern games and game engines: video memory, AGP memory, system memory, or some hybrid of these? Can anyone answer me? Thx.

Yes.
I can tell you how 3DMark creates its VBs and which FVF is set. In the high polygon test there are many vertex buffers of various lengths, but with these characteristics:

FVF 18 (0x12): D3DFVF_XYZ | D3DFVF_NORMAL
Pool: D3DPOOL_DEFAULT
Usage: 8 (D3DUSAGE_WRITEONLY)

The rest are:
FVF 258 (0x102): D3DFVF_TEX1 | D3DFVF_XYZ
FVF 274 (0x112): D3DFVF_TEX1 | D3DFVF_XYZ | D3DFVF_NORMAL

For the high polygon test with 8 lights it's the same, except:
FVF 322 (0x142): D3DFVF_TEX1 | D3DFVF_DIFFUSE | D3DFVF_XYZ
Usage: 0x208 (D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY)
plus a few 0x60 (96) byte VBs (around 3 of them).

Every VB created (except those very small ones in the high poly tests) uses D3DPOOL_DEFAULT and D3DUSAGE_WRITEONLY, so they SHOULD go in VRAM (it still depends on the drivers).
I can check other apps/games too - just point me to a demo of the game you want to test.
Zephyr said:
Thanks a million. Here are some interesting games and synthetic benchmarks: Jedi Knight 2, UT2003, Comanche 4, Serious Sam 2, Dungeon Siege, CodeCreatures Benchmark Pro, and Aquanox.

I can only speak for UT2003. I think the only card UT2003 is sort of limited by triangle throughput on is the Kyro II, because it doesn't have TnL but has enough fillrate (in comparison to other cards that lack TnL) to make that the bottleneck. Almost forgot the Radeon 7000. BTW, that's on low-end machines - with a 2.8 GHz P4 you most likely won't be bound by triangle throughput even with software vertex processing.
Saem said:
I remember there was mention of "recycling geometry" or something like that used in the Unreal engine build that'll be used in UT2k3. First off, is there such a thing or am I hallucinating, and if there is, where and how does the performance improvement come in?

We are using a system of prefabs, like Chalnoth mentions, that allows level designers to construct complex levels out of a set of already existing geometry. This geometry used to be instanced to save memory: it shared position/normal/base texture coords across instances and had separate streams for diffuse to allow for independent precomputed vertex lighting. As an optimization, at level load time all geometry is transformed into world space and sorted by material to generate large chunks of geometry that can be rendered without any state changes in a single DIP call. This uses much more memory (up to 20 MByte in some levels) but is a noticeable speedup due to the increased batch sizes for DIP calls. There is a setting in the ut2003.ini file you can fiddle with in the upcoming demo that allows you to switch between the two approaches. The default is using batching.
vogel said:
As an optimization, at level load time all geometry is transformed into world space and sorted by material to generate large chunks of geometry that can be rendered without any state changes in a single DIP call. This uses much more memory (up to 20 MByte in some levels) but is a noticeable speedup due to the increased batch sizes for DIP calls.

Interesting... I guess that would mean that the geometry throughput in UT2k3 isn't AGP-limited? That is assuming, of course, that not all the geometry can be stored in video memory when the geometry is transformed at level load time.

Chalnoth said:
Interesting... I guess that would mean that the geometry throughput in UT2k3 isn't AGP-limited?

With the biggest level it's slightly above 20 MByte of vertex data, so I guess most vertex buffers will actually end up in local memory on modern cards, though I doubt it actually matters as we are quite fillrate bound.

-- Daniel
Chalnoth said:
Interesting... I guess that would mean that the geometry throughput in UT2k3 isn't AGP-limited? That is assuming, of course, that not all the geometry can be stored in video memory when the geometry is transformed at level load time.

FWIW, I was playing a bit with our OpenGL renderer, and using AGP vs local memory for static geometry only has a minor impact on average framerate but lowers the maximum framerate by quite a bit. So for games like UT2k3, AGP 4x shouldn't be a bottleneck even if all vertex data is kept in AGP memory (which is only the case in OpenGL on NVIDIA cards)... that is, unless you stare at a wall.
Reverend said:
Hey Daniel, going a little off the path here, but Tim told me that while we can record our own demos *and* benchmark said demos, it may not be best since there'll be other kinds of overhead - he said that with the included UPT, pure rendering performance is measured, not AI and/or game code, which are things you guys don't attempt to benchmark since consistency of results can't be guaranteed.

I'll be glad to answer the question in a thread where it wouldn't be so blatantly off-topic

-- Daniel

vogel said:
I'll be glad to answer the question in a thread where it wouldn't be so blatantly off-topic

<grumbling> ... asshole...