DirectX 9 limit: 500 objects in 1 frame (true?)

Hello All,

I read an article on HardOCP that talked about DX9 and DX10. On the second page it states that DX9 has a "small batch problem" that limits it to only about 500 objects per frame. Is this true?

If this is true, then Ageia will not be able to show tons of extra objects on the screen in DX9 games because of this limitation.

Also if this is true, then Ageia might get a big boost from running games in DX10 for the sole reason of not having to deal with the small batch problem.

Again if true, this looks like it could really limit Ageia.

Am I missing something, or way off base? Thanks,
Dr. Ffreeze
 
Yes, Direct3D 9 has a "context switch" overhead, which requires the developer to pay attention to how they arrange their Draw**() calls. It's a big topic, but you might have come across "batching" on various IHVs' slides over the last few years; batching is the primary way we avoid the context-switch overhead.
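The batching advice from those IHV slides boils down to sorting by render state so that runs of objects can share one draw call. A toy sketch in Python (the object/state representation and the cost constant are mine, purely illustrative, not real D3D code):

```python
from itertools import groupby

# Toy model: the state values and cost constant are invented for illustration.
CALL_OVERHEAD = 40_000  # ballpark CPU cycles per draw call

def batch(objects):
    """Sort objects by render state, then group runs that share a state
    so each group can be submitted with a single Draw*() call."""
    objects = sorted(objects, key=lambda o: o["state"])
    return [list(g) for _, g in groupby(objects, key=lambda o: o["state"])]

# 1,000 objects drawn with only 4 distinct render states (e.g. 4 materials).
objs = [{"id": i, "state": i % 4} for i in range(1000)]
groups = batch(objs)

naive_cycles   = len(objs) * CALL_OVERHEAD    # one draw call per object
batched_cycles = len(groups) * CALL_OVERHEAD  # one draw call per state group
print(len(groups), naive_cycles // batched_cycles)  # → 4 groups, 250x fewer calls
```

In this toy case 1,000 per-object draws collapse to 4 state-sorted batches; real engines obviously can't always sort this cleanly.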

Another common trick is "instancing", which allows you to encode many (thousands of) objects into a single draw call. I would imagine the "Physics Effects" stuff would be very well suited to this technology.

Yes, Direct3D 10 has a much lower per-draw overhead which effectively eliminates this problem.

There isn't really a fixed limit... there is a maximum number of primitives you can send in a single draw call, but even then you could send multiple draw calls...

hth
Jack
 
could this "context" overhead be the reason for the low fps when Ageia cards are used in current games? (as discussed elsewhere)
 
The small batch problem comes about when you submit draw packets that don't do much work. In this case the draw overhead takes up a significant amount of time and isn't hidden by GPU processing. I don't know what they mean by 500 objects. What do they mean by objects? Certainly not vertices or even triangles.
 
You can easily have about 3,600 soldiers on screen in Rome: Total War.
Above that there tends to be a dramatic slowdown, seemingly regardless of PC specs, resolution or detail settings.

The game engine itself can allow up to 38,400 soldiers in one battle, but as soon as you go over that 3,600 in a custom battle you get a warning that performance will be poor.

I have generally assumed this to be an API limitation somewhere.
 
chavvdarrr said:
could this "context" overhead be the reason for the low fps when Ageia cards are used in current games? (as discussed elsewhere)

It could be, but it depends on the game's implementation.

Games already do a lot of work to limit batch counts, including building primitive lists with more than a single object in them. This ought to alleviate the "drawing lots of stuff for physics" issue at the cost of some CPU time.
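A rough sketch of that trade-off (all the constants here are invented; this only models the idea of spending CPU to build bigger primitive lists in exchange for fewer draw calls):

```python
import math

# Illustrative model, all numbers invented:
CALL_OVERHEAD_CYCLES  = 40_000  # rough per-draw-call CPU cost
MERGE_COST_PER_OBJECT = 2_000   # hypothetical cost to append one object to a list

def frame_cost(num_objects, objects_per_batch):
    """CPU cycles per frame: draw-call overhead plus list-building cost."""
    calls = math.ceil(num_objects / objects_per_batch)
    return calls * CALL_OVERHEAD_CYCLES + num_objects * MERGE_COST_PER_OBJECT

unmerged = frame_cost(5000, 1)    # one draw call per physics object
merged   = frame_cost(5000, 50)   # 50 objects per primitive list
print(unmerged, merged)           # merging is ~15x cheaper in this model
```

The crossover point depends entirely on the real per-call and per-object costs, which is exactly the CPU-time trade-off mentioned above.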

IMO the batch cost issue is the primary reason that PC games are so CPU heavy in the first place.
 
When I was implementing an algorithm that needed very frequent render state changes, I got 250,000 draw calls per second on a lowly Athlon XP 2400+. I think 500 calls per frame is a bit of an exaggeration.

My guess is you could get away with 5,000 calls per frame on a newer processor, and still only use 50% of the CPU time.
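Working that guess through (the 60 fps target and the "twice as fast" CPU are my assumptions, not Mintmaster's):

```python
CALLS_PER_SEC = 250_000  # rate measured on the Athlon XP 2400+ above
FPS = 60                 # assumed target frame rate

# At 60 fps, the measured rate already allows ~4,166 calls per frame.
per_frame = CALLS_PER_SEC // FPS

# A CPU twice as fast, spending only half its time on submission, lands in
# the same place: the 2x speedup and the 50% budget cancel out.
per_frame_newer = (CALLS_PER_SEC * 2) // 2 // FPS
print(per_frame, per_frame_newer)  # → 4166 4166
```

So "5,000 calls per frame at 50% CPU" is roughly the Athlon's measured full-CPU rate carried over to a processor twice as fast.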
 
Mintmaster said:
When I was implementing an algorithm that needed very frequent render state changes, I got 250,000 draw calls per second on a lowly Athlon XP 2400+. I think 500 calls per frame is a bit of an exaggeration.
What sort of performance did you get, and what sort of batch-sizes were you using?

Common wisdom for optimal performance seems to be between 400 and 1,000 batches per frame. Any more than that and you're going to start incurring a lot of overhead.

Mintmaster said:
My guess is you could get away with 5,000 calls per frame on a newer processor, and still only use 50% of the CPU time.
It's not an exact science, so it's difficult to come up with a generalisation like this. What do you mean by 50% CPU time? 50% of the CPU spent submitting batches? Because *every* Direct3D application I've written tends to use every last CPU cycle it can get its hands on unless I put artificial limits in :D

It's a bit old now, but Nvidia's "Batch, Batch, Batch: What Does It Really Mean?" (PDF, PPT) might be of interest here. The stats are pretty dated now (1GHz CPU?? did anyone ever use stuff that slow?? :LOL:)

Jack
 
JHoxley said:
It's a bit old now, but Nvidia's "Batch, Batch, Batch: What Does It Really Mean?" (PDF, PPT) might be of interest here. The stats are pretty dated now (1GHz CPU?? did anyone ever use stuff that slow?? )
I think you missed the point of that presentation. If you're totally CPU bound due to draw calls, then your rendering speed is determined only by some linear scaling of draw calls to CPU speed. Hence Wloka's law: ~25k batches/sec per GHz, with 100% of the CPU spent only on submitting batches. If you have a brand spanking new 4 GHz machine, then great! You get to do ~100k batches/sec, without being able to do any other work whatsoever. If your game AI/physics/whatever takes up 50% of the time, then you only get ~50k batches/sec on your 4 GHz machine.
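Those numbers follow directly from the rule of thumb; as a one-liner (the function name is mine):

```python
def max_batches_per_sec(cpu_ghz, submit_fraction):
    """Wloka's rule of thumb: ~25,000 batches/sec per GHz of CPU,
    scaled by the fraction of CPU actually spent submitting batches."""
    return int(25_000 * cpu_ghz * submit_fraction)

print(max_batches_per_sec(4.0, 1.0))  # → 100000: the whole 4 GHz on submission
print(max_batches_per_sec(4.0, 0.5))  # → 50000: half the CPU left after AI/physics
```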
 
Bob's got the right idea, JHoxley.

It doesn't matter how big my batches were, as I'm saying I got a result higher than expected. Larger batch sizes would only reduce batch rate, even if poly rate increased. And 250k calls per second is the relevant performance metric. Not sure what you're asking for.

The best way to think about performance is in number of cycles for a task. Then you add up your cycles and divide the clock frequency by this sum. Draw calls take about 40K CPU cycles (some variance depending on the CPU, naturally) according to Wloka's law.

However, I didn't find the law very accurate for me. Off by a factor of 5 or so. Could be because I only had a small renderstate change (stencil buffer) between calls. BTW, I've read that presentation, along with another one called "Batching 4EVA!".
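That factor of 5 falls out of the cycle arithmetic (assuming the Athlon XP 2400+ clocks at about 2.0 GHz, which is my assumption):

```python
CLOCK_HZ       = 2.0e9    # Athlon XP 2400+ at roughly 2.0 GHz (assumed)
MEASURED_CALLS = 250_000  # draw calls/sec reported above
WLOKA_CYCLES   = 40_000   # ~40K cycles per call implied by Wloka's law

cycles_per_call = CLOCK_HZ / MEASURED_CALLS  # cycles each call actually cost
print(cycles_per_call, WLOKA_CYCLES / cycles_per_call)  # → 8000.0 5.0
```

So the measured cost was ~8K cycles per call against the ~40K the rule of thumb predicts, which is consistent with only a small render-state change between calls.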
 
I just did a simple test and was getting more than a million calls a second with glCallList() (all geometry onscreen); this is on a 2GHz machine.
 
OpenGL draw calls are much cheaper than Direct3D draw calls. No cost of switching to kernel space since most of the driver is in user space and no marshalling by an extra level of run time.
 
True, but it gives an indication of the sort of improvement D3D10 will most likely see (I'm guessing ~5x, i.e. it's not just a few percent).
 