New instancing demo

mongoled said:
euan said:
cool :)

Option 2 makes my PSU give out a faint but high-pitched noise, and option 4 gives out a lower pitch that sounds a bit like white noise. More fascinating than instancing itself. :LOL:

Heheheee, mine too,

You don't happen to have an nForce2-based mobo? My computer 'sings' to me depending on which parts of the system are being taxed. The last 3 tests are definitely more taxing on the memory subsystem, in increasing order 3 < 4 < 5.

To be more specific, the sound in test 5 tells me that my system memory is being hammered, whilst test 4 seems to hit more CPU cache and some system memory, and test 3 is most probably testing CPU cache (EDIT: this is wrong LOL). In tests 1 and 2 my computer ain't singing.

:)

Could someone explain at which point you are taking your readings? It would be cool if Humus could implement an average FPS counter; that would make assessment a lot easier by removing human error in recording the FPS.

Oh, and I found this version to be a little buggy: I had to mess about with minimize/maximize/fullscreen before I managed to get the effects to show properly. Sometimes I got the effects on half the screen, sometimes a black screen.

1. 460, dropping to 160
2. 385, dropping to 150
3. 425, dropping to 200
4. 125
5. 59

This is a fresh install: SP2 / Catalyst 4.9 / DX9.0c

9800 Pro > XT, 480/400
nForce2 @ 235x11.5 (2.7 GHz)

mong

Yeah, nForce2 mobo!

Just don't move the mouse when you start the demo. The FPS is very consistent on my system.
 
Alstrong said:
Should we be expecting an increased use of particle systems now? :D
No. As you can see, there are three ways here in which particles can be rendered quickly. Their performance is quite close, so a developer shouldn't really worry about which one renders fastest, but rather about which technique is easiest to use and/or has the least CPU overhead.

What you should take from this, however, is that geometry instancing is much faster than one draw call per particle. So, geometry instancing should clearly be used whenever it is unfeasible to use the other techniques to improve performance.

I believe one such case would be an RTS where you have many models of the same type on screen at once. It'd take too much time to pack them all into the same geometry buffer every frame just so they could be rendered in one call.
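To make the per-frame packing cost concrete, here is a hypothetical Python sketch (not code from the demo). It assumes each unit is placed by a simple translation offset; real units would also need rotation and animation. Every model vertex is copied once per instance, so the CPU work grows with instances × vertices and has to be redone every frame the units move.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def pack_instances(model: List[Vec3], offsets: List[Vec3]) -> List[Vec3]:
    """Hypothetical CPU-side expansion: copy the model once per instance,
    translating each copy by that instance's offset, into one flat buffer
    that a single draw call could then render."""
    packed: List[Vec3] = []
    for ox, oy, oz in offsets:        # one pass per instance...
        for x, y, z in model:         # ...over every vertex of the model
            packed.append((x + ox, y + oy, z + oz))
    return packed
```

For example, a 3-vertex triangle packed for 2 units yields a 6-vertex buffer; 500 units of a 100-vertex model would already mean 50,000 vertex copies per frame.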
 
euan said:
Yeah, nForce2 mobo!

Just don't move the mouse when you start the demo. The FPS is very consistent on my system.

Heehehee, knew it,

nforce2 = singing board

:D

No mouse movement, I just retested it: the frame rate only drops in tests 1-3, and it happens when the effect moves to the 'forefront' of the screen.
 
Alstrong said:
Should we be expecting an increased use of particle systems now? :D

I still don't quite understand when the best situation to use GI is... :?

Is it just huge amounts of simple objects (particles, low polygon objects)?
Yes, kind of. It allows you to draw the same geometry several times with one call (with different parameters for each instance), which saves the overhead of multiple calls. That overhead matters most relative to drawing time when the amount of geometry per call is small.

So it's of use for scenery or for strategy games where a lot of similar units are on screen (like the Total War games).
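The point about small geometry can be shown with a toy cost model. This is a hypothetical sketch: it assumes each draw call costs a fixed overhead and each triangle a fixed amount, and the numbers used with it are made-up units, not measurements from the demo.

```python
def frame_cost(num_objects: int, tris_per_object: int,
               call_overhead: float, tri_cost: float,
               instanced: bool) -> float:
    """Toy model: instancing collapses all objects into one call,
    while the triangle work stays exactly the same."""
    calls = 1 if instanced else num_objects
    return calls * call_overhead + num_objects * tris_per_object * tri_cost


def overhead_share(num_objects: int, tris_per_object: int,
                   call_overhead: float, tri_cost: float) -> float:
    """Fraction of the one-call-per-object frame spent on call overhead."""
    total = frame_cost(num_objects, tris_per_object,
                       call_overhead, tri_cost, instanced=False)
    return num_objects * call_overhead / total
```

With 1000 two-triangle objects, an overhead of 10 units per call and 1 unit per triangle, the per-object path costs 12000 units against 2010 for one instanced call, and about 83% of the former is pure call overhead. With 1000-triangle objects that share drops below 1%, so instancing buys almost nothing.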
 
Thanks Chalnoth, ET. :)

*thinks of Starcraft 2*

Does anyone know if either of the other solutions (2 or 3) was used for Warcraft 3?
 
Hmm, I get something like:

430
450
540
185
119

on a 6800 GT @ 400 MHz core. The first two go up and down a lot, though!

Athlon 64 FX-53, Win XP SP2 and 61.77 drivers.
 
No mouse movement, I just retested it: the frame rate only drops in tests 1-3, and it happens when the effect moves to the 'forefront' of the screen.
Same here... Around 280 FPS dropping to around 100 when the "stream" goes through the camera.
 
How the driver does instancing

ATI and nVidia could quite easily be doing instancing in different ways, and therefore it is quite feasible that one card will be better than another in terms of instancing.

I could imagine that for older products, instancing is done purely in the driver itself (basically issuing multiple dp2 calls internally). Drivers for more advanced cards could be half driver and half hardware, resulting in better performance, and finally the latest (or yet to be released) hardware will do it natively, resulting in the best performance.
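A hypothetical Python sketch of the difference between a native path and a pure-driver fallback. In Direct3D 9 the application side uses IDirect3DDevice9::SetStreamSourceFreq with D3DSTREAMSOURCE_INDEXEDDATA on the geometry stream and D3DSTREAMSOURCE_INSTANCEDATA on the per-instance stream; the functions below do not call that API, they only model the two strategies. The "shader" is an arbitrary callable standing in for the vertex shader, and the returned call count is the part that differs.

```python
from typing import Any, Callable, List, Sequence, Tuple

Shader = Callable[[Any, Any], Any]


def native_instanced_draw(indices: Sequence[int], vertices: Sequence[Any],
                          instances: Sequence[Any],
                          shader: Shader) -> Tuple[List[Any], int]:
    """Native path: a single call. The geometry stream is read per index,
    while the per-instance stream advances once per instance."""
    out = [shader(vertices[i], inst) for inst in instances for i in indices]
    return out, 1  # one call covers every instance


def emulated_instanced_draw(indices: Sequence[int], vertices: Sequence[Any],
                            instances: Sequence[Any],
                            shader: Shader) -> Tuple[List[Any], int]:
    """Driver fallback: one ordinary draw per instance, with that
    instance's data bound (e.g. as shader constants) before each draw."""
    out: List[Any] = []
    calls = 0
    for inst in instances:
        calls += 1  # each iteration stands in for one internal draw
        out.extend(shader(vertices[i], inst) for i in indices)
    return out, calls
```

Both produce identical output; only the number of underlying draws differs, which is why a driver-only implementation can expose the feature without delivering the speedup.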

If developers start using this feature (I know I would; I've been wanting it ever since I started using vertex shaders), then IHV support for it will develop and mature until it's just a standard feature.

Thanks for writing the demo, Humus, as now a lot more people know about instancing.

Try MSDN for more info
 
Baraclese said:
The application quits at startup with an invalid instruction... I guess that must be my old Thunderbird @ 800 MHz :(

Fixed. I had it compiling with SSE instructions to boost performance a bit. The new version will be a bit slower, like 10% or so, but it should run on any CPU now.

It should also run on 8500 and GF3 now, unless there's more stuff I have missed.
 
DegustatoR said:
Exactly. This demo probably doesn't use NV40's GI at all.

There's no difference between NV40 instancing and R300/R420 instancing, except that MS refused to add a caps bit for it and arbitrarily linked it to the entirely unrelated VS3.0, and therefore it must be queried through a FOURCC format on VS2.0 hardware.
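For reference, a FOURCC is four characters packed into a 32-bit value with the first character in the lowest byte, as the Win32 MAKEFOURCC macro does. As we understand ATI's R300/R420 instancing note, support is detected by passing the code 'INST' as a format to IDirect3D9::CheckDeviceFormat and enabled by setting D3DRS_POINTSIZE to that same value; treat those details as our recollection and confirm them against ATI's documentation. The Python sketch below only reproduces the packing arithmetic.

```python
def make_fourcc(code: str) -> int:
    """Pack a 4-character code the way MAKEFOURCC does:
    first character in the lowest byte (little-endian order)."""
    if len(code) != 4:
        raise ValueError("a FOURCC code is exactly four characters")
    value = 0
    for shift, ch in enumerate(code):
        value |= (ord(ch) & 0xFF) << (8 * shift)
    return value
```

For example, make_fourcc("INST") gives 0x54534E49, whose little-endian bytes spell b"INST".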
 
ellingsen1 said:
Could it be that the last few tests are very CPU dependent?

Yup. Performance could be a good deal higher for the instancing/constant instancing/VB upload paths. Moving from MSVC 6.0 to MSVC.NET bumped speed on all three, while the other two remained about the same. However, they rely on a good deal of CPU work done outside the app, in the DX runtime and the driver, so a faster CPU will help them.
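Constant instancing, as commonly described, replicates the model several times in a static vertex buffer with an instance index per vertex, uploads per-instance data into vertex shader constants, and lets the shader look its data up by that index. Each batch then needs its own constant upload and draw call, which is CPU work in the runtime and driver. A hypothetical Python sketch of how many calls that takes; the register counts used with it are example values, not figures from the demo.

```python
import math


def constant_instancing_calls(num_instances: int, regs_per_instance: int,
                              const_regs: int, reserved_regs: int,
                              copies_in_vb: int) -> int:
    """Draw calls needed when per-instance data lives in shader constants.

    One call covers at most as many instances as (a) fit in the constant
    registers left over after reserved ones and (b) have a replicated
    copy of the model in the vertex buffer.
    """
    free = const_regs - reserved_regs
    per_call = min(free // regs_per_instance, copies_in_vb)
    if per_call < 1:
        raise ValueError("not enough constant registers for one instance")
    return math.ceil(num_instances / per_call)
```

For example, with a 4×4 matrix per instance (4 float4 registers), 256 registers of which 16 are reserved, and 64 copies in the buffer, each call handles 60 instances, so 1000 instances need 17 calls.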
 
Alstrong said:
Should we be expecting an increased use of particle systems now? :D

I still don't quite understand when the best situation to use GI is... :?

Is it just huge amounts of simple objects (particles, low polygon objects)?

I don't think we'll see many more particle systems because of this. It was my first idea of an ideal usage scenario where instancing would show benefits, but after playing around with it, it's not the most ideal application after all. I've found that the ideal cases are models of perhaps 20-100 triangles. Larger than that, and instancing through constants seems to take the lead. If you've got plenty of spare CPU time, the full expansion to a vertex buffer becomes competitive too, or even wins. Instancing moves more work over to the GPU, so if the GPU is already the limiting factor, spending time on the CPU to ease the GPU burden will help performance. It's like how some people noticed T&L slowed things down back in the day: that wasn't necessarily because T&L itself was slow, just that the GPU was already the bottleneck and now you pushed more work onto it.
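The rules of thumb above could be folded into a path selector along these lines. This is a hypothetical Python sketch, not a measured or authoritative rule: the 100-triangle threshold and the "GPU-bound with spare CPU" case come from the post, while sending models under about 20 triangles to hardware instancing is an assumption, since the particle tests showed the paths running close together there.

```python
def choose_instancing_path(tris_per_model: int, gpu_bound: bool,
                           spare_cpu: bool) -> str:
    """Pick one of three rendering paths from rough heuristics."""
    # Spend spare CPU time to take work off a GPU that is already the limit.
    if gpu_bound and spare_cpu:
        return "vb_expansion"
    # Larger models: instancing through constants tended to take the lead.
    if tris_per_model > 100:
        return "constant_instancing"
    # ~20-100 triangles was the observed sweet spot; below 20 is an assumption.
    return "hardware_instancing"
```

Because the right answer depends on where each system's bottleneck is, a game could run a selector like this only as a default and still let the user override it.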
 
So this should more or less be a checkbox option in-game? i.e. let the user decide whether it's worth enabling.
 
Humus said:
It should also run on 8500 and GF3 now, unless there's more stuff I have missed.

And it did! It works with the GF3 now. :D
The instancing, as expected, didn't work, though.
 