Is AGP 4x the bottleneck for the Radeon 9700's triangle count?

Has ATI implemented dynamic vertex caching in D3D for the R300, or is it fixed? Just curious how much work has been done on this.
 
I believe ATI is playing the waiting game, so when the NV30 comes out they will release a set of drivers that boosts the poly count incredibly, sort of like what NVIDIA did with the GeForce 3 (as Chalnoth had stated).
 
Where do modern games and game engines keep their vertex buffers: in video memory, AGP memory, system memory, or some hybrid?

Can anyone answer me? Thx. :p

Yes :)

I can tell you how 3DMark creates its vertex buffers and which FVF is set in the High Polygon test:

many vertex buffers of various lengths, but with these characteristics:

FVF 18(0x12) D3DFVF_XYZ | D3DFVF_NORMAL
Pool D3DPOOL_DEFAULT
Usage 8 D3DUSAGE_WRITEONLY

the rest are:
FVF 258 (0x102) D3DFVF_TEX1 | D3DFVF_XYZ
FVF 274 (0x112) D3DFVF_TEX1 | D3DFVF_XYZ | D3DFVF_NORMAL

For the high polygon test with 8 lights it's the same, except:

FVF 322 (0x142) D3DFVF_TEX1 | D3DFVF_DIFFUSE | D3DFVF_XYZ
Usage 0x208 D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY

plus a few tiny 0x60 (96) byte VBs (about 3 of them).

Every VB created (except those very small ones in the high poly tests) uses D3DPOOL_DEFAULT and the D3DUSAGE_WRITEONLY flag, so they SHOULD go in VRAM (it still depends on the drivers).

I can check other apps/games too, just point me to a demo of the game you want to test.
 

Thanks a million. Here are some interesting games and a synthetic benchmark:

Jedi Knight 2, UT2003, Comanche 4, Serious Sam 2, Dungeon Siege, CodeCreatures Benchmark Pro, and Aquanox.

Although I listed so many games, checking just some of them at your convenience is fine. Thx again.
;) :) :p
 
Zephyr said:
Thanks a million. Here are some insteresting games and synthetic benchmark:

Jedi Knight 2, UT2003, Commanche 4, Serious Sam 2, Dungeon Siege, CodeCreatures Benchmark Pro, and Aquanox.
I can only speak for UT2003:

Static geometry (batches of world geometry sorted by material and transformed into world space) ends up in the managed pool, whereas dynamic content like skeletal mesh vertices each gets a separate vertex buffer in the default pool. Other dynamic content shares one dynamic buffer, which also resides in the default pool. Vertex buffers are allocated before textures, so my guess is they most likely end up in local memory.

The batched geometry (the majority of the polys in a scene) uses position, normal, diffuse, and as many texcoords as needed. So in the normal case (one set of texture coordinates) the stride is 36 bytes.

I think the only card on which UT2003 is somewhat limited by triangle throughput is the Kyro II, because it doesn't have TnL but has enough fillrate (in comparison to other cards that lack TnL) to make that the bottleneck.

-- Daniel
 
vogel said:
I think the only card on which UT2003 is somewhat limited by triangle throughput is the Kyro II, because it doesn't have TnL but has enough fillrate (in comparison to other cards that lack TnL) to make that the bottleneck.
Almost forgot the Radeon 7000. BTW, that's on low end machines - with a 2.8 GHz P4 you most likely won't be bound by triangle throughput even with software vertex processing ;)

-- Daniel
 
Vogel,

I remember there was mention of "recycling geometry" or something like that in the Unreal engine build that'll be used in UT2k3. First off, is there such a thing or am I hallucinating? And if there is, where and how does the performance improvement come in?
 
I know I've seen it a number of times where developers use "prefab" geometry many times in a level to lessen the memory footprint of the geometry.

Quick example: there might be a model for a length of pipe and a model for an elbow, and those could be put together in many different ways to make a complex labyrinth of pipes.
 
Saem said:
I remember there was mention of "recycling geometry" or something like that in the Unreal engine build that'll be used in UT2k3. First off, is there such a thing or am I hallucinating? And if there is, where and how does the performance improvement come in?
We are using a system of prefabs, like Chalnoth mentions, that allows level designers to construct complex levels out of a set of already existing geometry. This geometry used to be instanced to save memory; instances shared position/normal/base texture coords and had separate streams for diffuse to allow for independent precomputed vertex lighting.

As an optimization, at level load time all geometry is transformed into world space and sorted by material to generate large chunks of geometry that can be rendered without any state changes in a single DIP call. This uses much more memory (up to 20 MByte in some levels) but is a noticeable speedup due to the increased batch sizes for DIP calls.

There is a setting in the ut2003.ini file you can fiddle with in the upcoming demo that allows you to switch between the two approaches. The default is to use batching.

[Engine.GameEngine]
UseStaticMeshBatching=True

-- Daniel
 
vogel said:
As an optimization at level load time all geometry is transformed into world space and sorted by material to generate large chunks of geometry that can be rendered without any state changes in a single DIP call. This uses much more memory (up to 20 MByte in some levels) but is a noticeable speedup due to the increased batch sizes for DIP calls.

Interesting...I guess that would mean that the geometry throughput in UT2k3 isn't AGP-limited? That is assuming, of course, that not all the geometry can be stored in video memory when the geometry is transformed at level load time.

Or, is this only a benefit for, say, 128MB cards?
 
Chalnoth said:
Interesting...I guess that would mean that the geometry throughput in UT2k3 isn't AGP-limited? That is assuming, of course, that not all the geometry can be stored in video memory when the geometry is transformed at level load time.
With the biggest level it's slightly above 20 MByte of vertex data so I guess most vertex buffers will actually end up in local memory on modern cards though I doubt it actually matters as we are quite fillrate bound.

-- Daniel
 
vogel said:
With the biggest level it's slightly above 20 MByte of vertex data so I guess most vertex buffers will actually end up in local memory on modern cards though I doubt it actually matters as we are quite fillrate bound.

-- Daniel

I agree we're fillrate bound now... but when do you think we won't be, and what would the desired fillrate be?
 
vogel said:
Chalnoth said:
Interesting...I guess that would mean that the geometry throughput in UT2k3 isn't AGP-limited? That is assuming, of course, that not all the geometry can be stored in video memory when the geometry is transformed at level load time.
With the biggest level it's slightly above 20 MByte of vertex data so I guess most vertex buffers will actually end up in local memory on modern cards though I doubt it actually matters as we are quite fillrate bound.
FWIW, I was playing a bit with our OpenGL renderer, and using AGP vs local memory for static geometry only has a minor impact on average framerate but lowers the maximum framerate by quite a bit. So for games like UT2k3, AGP 4x shouldn't be a bottleneck even if all vertex data is kept in AGP memory (which is only the case in OpenGL on NVIDIA cards)... that is, unless you stare at a wall ;)

-- Daniel
 
Hey Daniel, going a little off the path here, but Tim told me that while we can record our own demos *and* benchmark said demos, it may not be best since there'll be other kinds of overhead. He said that with the included UPT, pure rendering performance is measured and not AI and/or game code, which is stuff you guys don't attempt to benchmark since consistency of results can't be guaranteed.

I'd like to offer Tim some counterarguments to what he said, but you know better, of course :) ;) . Question is, how reflective of gameplay (which is the most important thing about any benchmark, isn't it?) is the UPT? Not very, based on what Tim said, IMO.
 
Reverend said:
Hey Daniel, going a little off the path here but Tim told me that while we can record our own demos *and* benchmark said demo, it may not be best since there'll be other kinds of overhead - he said that with the included UPT, pure rendering performance is measured and not AI and/or game code which are stuff you guys don't attempt to benchmark since consistency of results can't be guaranteed.
I'll be glad to answer the question in a thread where it wouldn't be so blatantly off-topic :)

-- Daniel
 
vogel said:
Reverend said:
Hey Daniel, going a little off the path here but Tim told me that while we can record our own demos *and* benchmark said demo, it may not be best since there'll be other kinds of overhead - he said that with the included UPT, pure rendering performance is measured and not AI and/or game code which are stuff you guys don't attempt to benchmark since consistency of results can't be guaranteed.
I'll be glad to answer the question in a thread where it wouldn't be so blatantly off-topic :)

-- Daniel
<grumbling> ... asshole... :)

Please respond in the new thread in the Games forum at http://www.beyond3d.com/forum/viewtopic.php?t=2334
 