Water! PhysX Position Based Dynamics

[H] says single 580 - does that mean single 580 for PhysX or single 580 doing rendering *and* PhysX?
 
You would probably want to do PhysX and rendering on the same GPU, as the trip down to system RAM and then back to another card's framebuffer would be insanely expensive!
 
> You would probably want to do PhysX and rendering on the same GPU, as the trip down to system RAM and then back to another card's framebuffer would be insanely expensive!

The practical use of dedicated nVidia cards for PhysX in games, with a substantial increase in performance, proves that it isn't an issue.
 
Can't wait to play all those single-character games on 2-square-meter levels filled with hyper-realistic water.

Just kidding.
 
> The practical use of dedicated nVidia cards for PhysX in games, with a substantial increase in performance, proves that it isn't an issue.

Oh, I forgot: mainstream GPU PhysX isn't real physics. It's basically just effects rendering that doesn't affect the rigid bodies in your simulation. So running it in SLI mode would definitely give a performance improvement.
 
> You would probably want to do PhysX and rendering on the same GPU, as the trip down to system RAM and then back to another card's framebuffer would be insanely expensive!

I had an AMD HD 5850 Crossfire setup doing the rendering alongside a separate Nvidia card dedicated to PhysX, and I never had any problems at all.
 
> I had an AMD HD 5850 Crossfire setup doing the rendering alongside a separate Nvidia card dedicated to PhysX, and I never had any problems at all.

Really? Nvidia says otherwise here: http://www.nvidia.com/object/physx_faq.html#q8

> Can I use an NVIDIA GPU as a PhysX processor and a non-NVIDIA GPU for regular display graphics?
> No. There are multiple technical connections between PhysX processing and graphics that require tight collaboration between the two technologies. To deliver a good experience for users, NVIDIA PhysX technology has been fully verified and enabled using only NVIDIA GPUs for graphics.
 
So the PCIe transfers aren't a bottleneck in this case? Two-way transfers to the dedicated PhysX device, and then finally sending the results to the main GPU for rendering, should be quite costly.
 
> So the PCIe transfers aren't a bottleneck in this case? Two-way transfers to the dedicated PhysX device, and then finally sending the results to the main GPU for rendering, should be quite costly.

What do you think is being transferred that is so costly? It's not like it's dumping the rendering data across the bus... it's just requesting that the PhysX computations be done by an external processor.
 
I have implemented my own rigid body simulator, so I have a basic understanding of how physics calculations are done. This is how I think it would work on dedicated physics hardware:

- Copy buffers from RAM to PhysX device memory: the initial positions, velocities, masses and other physical properties of all the objects to be simulated.

- The PhysX device simulates these objects, integrating them over the timestep. This produces their updated positions.

- The position buffer is read back from the PhysX device to RAM.

- It is then sent to the GPU for rendering.

So there are at least two PCIe transfers per frame just for the position buffer. The main GPU would not magically get the results from the PhysX device.
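To make those steps concrete, here is a rough sketch of the per-frame flow I have in mind. The function names are made-up stand-ins (the real PhysX SDK does not expose anything like this); they only mark where the transfers and the simulation step happen:

```cpp
#include <cstddef>

// Hypothetical placeholders for the four steps above -- NOT real PhysX SDK calls.
void physx_device_upload(const void* src, std::size_t bytes);          // PCIe transfer #1
void physx_device_step(float dt);                                      // integrate on the card
void physx_device_download(void* dst, std::size_t bytes);              // PCIe transfer #2
void renderer_update_vertex_buffer(const void* src, std::size_t bytes);

struct Particle { float x, y, z; };   // 12-byte position

void simulate_and_render_frame(Particle* positions, std::size_t count, float dt)
{
    const std::size_t bytes = count * sizeof(Particle);
    physx_device_upload(positions, bytes);             // step 1: RAM -> PhysX device
    physx_device_step(dt);                             // step 2: simulate the timestep
    physx_device_download(positions, bytes);           // step 3: PhysX device -> RAM
    renderer_update_vertex_buffer(positions, bytes);   // step 4: RAM -> rendering GPU
}
```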

Now that I think about it, a position vector is 12 bytes. To simulate a million particles, for example, we need 12 MB of data to be transferred per frame. Assuming a PCIe transfer speed of 4 GB/s, it takes around 3 ms to transfer one buffer. That results in about a 6 ms memory-transfer penalty. This is per frame, and is very costly in my opinion.
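Working the estimate through (the particle count and the 4 GB/s figure are assumptions, not measurements):

```cpp
#include <cstdio>

int main()
{
    const double particles     = 1.0e6;    // one million particles
    const double bytes_per_pos = 12.0;     // 3 floats * 4 bytes
    const double pcie_rate     = 4.0e9;    // assumed effective PCIe bandwidth, bytes/s

    const double buffer_bytes  = particles * bytes_per_pos;        // 12 MB
    const double one_way_ms    = buffer_bytes / pcie_rate * 1e3;   // ~3 ms
    const double round_trip_ms = 2.0 * one_way_ms;                 // ~6 ms per frame

    std::printf("buffer %.0f MB, one way %.1f ms, round trip %.1f ms\n",
                buffer_bytes / 1e6, one_way_ms, round_trip_ms);
}
```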

In an SLI configuration this penalty is not present, as each GPU renders its own frame and does both the PhysX and the graphics.
 
> Now that I think about it, a position vector is 12 bytes. To simulate a million particles, for example, we need 12 MB of data to be transferred per frame. Assuming a PCIe transfer speed of 4 GB/s, it takes around 3 ms to transfer one buffer. That results in about a 6 ms memory-transfer penalty. This is per frame, and is very costly in my opinion.

> In an SLI configuration this penalty is not present, as each GPU renders its own frame and does both the PhysX and the graphics.

From my own experience of running AMD 5850 Crossfire with a dedicated 9800GT for PhysX, I'm saying your opinion is wrong.
 
You are forgetting context switching.

> PhysX can run on the same GPU that's performing rendering, but the performance will be sub-optimal. This is because the GPU can't run PhysX and render your graphics at the same time: when PhysX calculations need to be performed, a "context switch" from rendering to CUDA must occur, and another context switch is required when switching back to rendering. Although these context switches occur very quickly (on the order of microseconds), they also must occur very frequently, and the context switching time extracts a noticeable performance penalty (one of the performance advantages of NVIDIA's forthcoming "Fermi" architecture is that context switching is much faster). Context switches take so much time that PhysX running on a relatively low-end GPU that's dedicated to the task will handily outperform PhysX running on a high-end GPU that's also performing rendering.

http://benchmarkreviews.com/index.p...sk=view&id=460&Itemid=72&limit=1&limitstart=4

Fermi's advantages :rolleyes:
> To tackle that latter sort of problem, Fermi has much faster context switching, as well. Nvidia claims context switching is ten times the speed it was on GT200, as low as 10 to 20 microseconds.

http://techreport.com/review/17670/nvidia-fermi-gpu-architecture-revealed/2

Some googling shows that Kepler cards are better off without a dedicated card, probably with the caveat that the game shouldn't be GPU constrained.
 
> From my own experience of running AMD 5850 Crossfire with a dedicated 9800GT for PhysX, I'm saying your opinion is wrong.

I would like to see a performance comparison chart. I never said it wasn't possible to do it in real time, but it would definitely be slower than keeping all the buffers on the main GPU itself.
 
> You are forgetting context switching.

> Some googling shows that Kepler cards are better off without a dedicated card, probably with the caveat that the game shouldn't be GPU constrained.

Context switching between CUDA and graphics should only be a matter of microseconds, as you mentioned. Within one frame, the context switch only needs to happen once: do the CUDA work first, then render the results. Either way, that cost is a lot less than transferring the results over the PCIe bus.

One optimization is possible when using a dedicated PhysX device that I did not think of before: while the physics calculations are running and the buffers are being transferred, the main GPU can start rendering other things in the meantime. This could hide most of the cost of these transfers!
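Roughly like this, if the physics card were driven with plain CUDA streams (that is an assumption on my part; the actual PhysX runtime schedules this internally):

```cuda
#include <cuda_runtime.h>

// Stand-in "integration" kernel -- a placeholder for the real solver.
__global__ void integrate_kernel(float3* pos, size_t n, float dt)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        pos[i].y -= 9.8f * dt * dt;   // dummy update just to keep the example complete
}

// h_positions / h_results should be pinned (cudaMallocHost) so the async
// copies can actually overlap with other work.
void physics_step_async(float3* d_positions,
                        const float3* h_positions, float3* h_results,
                        size_t count, float dt, cudaStream_t stream)
{
    const size_t bytes  = count * sizeof(float3);
    const unsigned grid = (unsigned)((count + 255) / 256);

    // Queue upload, kernel and readback on the physics card's stream ...
    cudaMemcpyAsync(d_positions, h_positions, bytes, cudaMemcpyHostToDevice, stream);
    integrate_kernel<<<grid, 256, 0, stream>>>(d_positions, count, dt);
    cudaMemcpyAsync(h_results, d_positions, bytes, cudaMemcpyDeviceToHost, stream);

    // ... and return immediately. While those run on the dedicated card, the
    // CPU keeps issuing draw calls for everything that doesn't need the new
    // positions; only right before drawing the particles do we wait with
    // cudaStreamSynchronize(stream).
}
```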
 