bit-tech Richard Huddy Interview - Good Read

Is he saying anything more than that you're able to run multiple instances of the API in a multithreaded app? ;)
At least it seems to confirm that the developer has to do all the work for a multithreaded CPU implementation on top of doing the GPU PhysX part.

We continue to invest substantial resources into improving PhysX support on ALL platforms--not just for those supporting GPU acceleration.

And yet they haven't made the most obvious optimization for one of the most common platforms.
 
It's true that Vantage's CPU test uses all available CPU cores: for every core it adds more "stuff".
The question is: my PC has a quad-core CPU. If Vantage's CPU test runs in a "dual-core mode", are all four cores in use or just two?
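For what it's worth, here is a minimal sketch (this is not 3DMark's actual code; the workload function and iteration count are made up) of how a CPU test can add one worker per detected core, which is what "for every core it adds more stuff" implies. A "dual-core mode" would presumably just clamp the thread count to two and leave the other cores idle:

```cpp
// Illustrative sketch only -- not 3DMark Vantage's actual code.
// One worker (and one chunk of "stuff") is spawned per detected
// hardware thread, so per-core utilization looks similar whether
// the machine has two cores or four.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<unsigned long long> g_work_done{0};

void do_stuff(int iterations) {
    // Placeholder for the per-core workload (AI, physics, etc.).
    unsigned long long local = 0;
    for (int i = 0; i < iterations; ++i)
        local += static_cast<unsigned long long>(i) * i;
    g_work_done += local;
}

int main() {
    // A "dual-core mode" would simply clamp this value to 2,
    // leaving the remaining cores idle.
    unsigned cores = std::thread::hardware_concurrency();

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < cores; ++i)
        workers.emplace_back(do_stuff, 100'000'000);
    for (auto& t : workers)
        t.join();

    std::printf("Ran %u workers, accumulated %llu\n",
                cores, g_work_done.load());
}
```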
 

Just came across that. Seems to corroborate what ChrisRay was saying earlier. Demirug made an excellent point as well - all the complaining about PhysX not maximizing CPU cores seems to completely ignore the fact that custom built physics implementations don't either. Maybe there's something to all this besides Nvidia's evil intentions to cripple CPU performance?
 
all the complaining about PhysX not maximizing CPU cores seems to completely ignore the fact that custom built physics implementations don't either. Maybe there's something to all this besides Nvidia's evil intentions to cripple CPU performance?

I thought the [H] video posted upthread was indeed stressing all of the CPU? "Using the CPU? Yes, it's definitely using all cores of the CPU"

One should only have to run Ghostbusters to get a yes or no on PhysX utilization. My bet would be that Ghostbusters on PC has much better CPU utilization than the aforementioned Batman and Sacred.

Unless, of course, the latter were designed to run with very low CPU utilization on Radeon-powered machines. We're not even talking about multicore utilization here; we're at a point where PhysX cannot even stress a single core.
 
[H] is using Task Manager, which is hardly an accurate way to tell how much an application is using multiple CPU cores (or how much work the CPU is doing).

The people who wrote and develop the API are telling their story and it seems to be confirmed by people who have actually used the SDK. Anything else amounts to guesswork.
 
The story being that PhysX can't (and never could) accelerate a single scene with multiple threads on x86. You can create multiple threads, but unless you want to simulate non-interacting systems it's useless...
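To make that concrete, here is a generic, hedged sketch (plain std::thread code, not the PhysX SDK) of what "multiple threads" buys you under that constraint: each thread owns and steps its own isolated scene, and nothing here parallelizes a single shared scene:

```cpp
// Generic sketch, not the PhysX API: each thread steps its own
// independent scene. This only works when the scenes never interact;
// it does nothing to speed up the simulation of one shared scene.
#include <thread>
#include <vector>

struct Scene {
    float elapsed = 0.0f;
    // ... bodies, joints, etc. for one isolated island of objects ...
    void step(float dt) { elapsed += dt; /* integrate this scene's objects */ }
};

void simulate(Scene& scene, int frames, float dt) {
    for (int i = 0; i < frames; ++i)
        scene.step(dt);
}

int main() {
    std::vector<Scene> scenes(4);        // four non-interacting systems
    std::vector<std::thread> threads;
    for (auto& s : scenes)               // one thread per scene
        threads.emplace_back(simulate, std::ref(s), 600, 1.0f / 60.0f);
    for (auto& t : threads)
        t.join();
}
```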
 
Just came across that. Seems to corroborate what ChrisRay was saying earlier. Demirug made an excellent point as well - all the complaining about PhysX not maximizing CPU cores seems to completely ignore the fact that custom built physics implementations don't either. Maybe there's something to all this besides Nvidia's evil intentions to cripple CPU performance?
The difference is that with, say, Batman we have proof positive that the operations PhysX is doing are the bottleneck, since it runs slower than PhysX on the GPU while not maxing out the CPU cores. We know it's relatively latency-insensitive and trivially parallelizable (it can run on a GPU, after all). We don't know what the bottlenecks in those other games are.
 
Tom's Hardware got a bit puzzled by PhysX on the CPU in their Batman review. Weird: put software PhysX on High and CPU utilization decreases instead of increases. Shouldn't PhysX use the CPU more, not less, when you set PhysX to High on the CPU?

[Image: Batman software PhysX CPU utilization chart]

Why is CPU utilization lower when PhysX is enabled? And why is CPU utilization so low at all? If the CPU is bottlenecking the PhysX calculations, shouldn't the increased load be pushing the CPU to its limits?
We're trying to get clarification from the developers at Rocksteady about this phenomenon--it's almost as though the game is artificially capping performance at a set level, and is then using only the CPU resources it needs to reach that level.
http://www.tomshardware.com/reviews/batman-arkham-asylum,2465-10.html
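If the "artificial cap" theory in that quote is right, the mechanism could be as mundane as a frame limiter. The sketch below is purely illustrative (the ~30 fps budget and the 8 ms of stand-in work are assumptions, not anything measured from the game): the main loop sleeps off whatever is left of a fixed frame budget, so Task Manager reports low CPU utilization even though the game is nominally busy:

```cpp
// Illustrative frame limiter, not Rocksteady's code: the main loop
// does its work, then sleeps off the rest of a fixed frame budget.
// The lower the capped frame rate, the more of each frame is spent
// asleep -- which shows up as low CPU utilization in Task Manager.
#include <chrono>
#include <thread>

int main() {
    using clock = std::chrono::steady_clock;
    const auto frame_budget = std::chrono::milliseconds(33); // ~30 fps cap (assumed)

    for (int frame = 0; frame < 300; ++frame) {
        const auto start = clock::now();

        // game_update(); physics_step(); render_submit();
        std::this_thread::sleep_for(std::chrono::milliseconds(8)); // stand-in for real work

        const auto elapsed = clock::now() - start;
        if (elapsed < frame_budget)
            std::this_thread::sleep_for(frame_budget - elapsed);   // idle out the frame
    }
}
```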
 
We don't know what the bottlenecks in those other games are.

Yes, that's exactly my point. Why is it that PhysX games qualify for such scrutiny yet other physics engines don't? There also aren't any examples of other physics engines doing particle simulations on the CPU with good performance.

Tom's Hardware got a bit puzzled by PhysX on the CPU in their Batman review. Weird: put software PhysX on High and CPU utilization decreases instead of increases. Shouldn't PhysX use the CPU more, not less, when you set PhysX to High on the CPU?

Yeah, that's really weird. I'm not one for conspiracy theories, so it could be something as simple as PhysX putting a higher load on the memory subsystem - after all, Nvidia did mention that SPH is much faster on Fermi due to better caching. Also, as Sontin pointed out, the PhysX thread(s) could be starving other threads that are normally free to run.
 
Yes, that's exactly my point. Why is it that PhysX games qualify for such scrutiny yet other physics engines don't? There also aren't any examples of other physics engines doing particle simulations on the CPU with good performance.

I suspect it's due to the fact that PhysX can be turned on/off and there are multiple levels of implementation which can be activated/deactivated in the game's settings. I have not personally seen the ability to turn off Havok physics in any game, so I guess that's the reason.

Since physics is another one of those ridiculously parallel workloads, it doesn't make sense for it not to be implemented on multiple cores, and the perceived conflict of interest between Nvidia's GPUs and the CPU implementation simply adds fuel to the fire.
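To illustrate the "ridiculously parallel" part with generic code (not PhysX; the particle count and chunking scheme are arbitrary): integrating independent particles can be split across however many cores are available, since no particle depends on another within the step:

```cpp
// Minimal sketch of the embarrassingly parallel part of physics:
// integrating independent particles. The array is split into one
// chunk per hardware thread; chunks never need to synchronize
// inside the step. Generic code, not PhysX.
#include <algorithm>
#include <thread>
#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

void integrate(std::vector<Particle>& p, size_t begin, size_t end, float dt) {
    for (size_t i = begin; i < end; ++i) {
        p[i].vy -= 9.81f * dt;   // gravity
        p[i].x  += p[i].vx * dt;
        p[i].y  += p[i].vy * dt;
        p[i].z  += p[i].vz * dt;
    }
}

void step_all(std::vector<Particle>& particles, float dt) {
    const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    const size_t chunk = (particles.size() + cores - 1) / cores;

    std::vector<std::thread> workers;
    for (unsigned c = 0; c < cores; ++c) {
        const size_t begin = c * chunk;
        const size_t end   = std::min(particles.size(), begin + chunk);
        if (begin < end)
            workers.emplace_back(integrate, std::ref(particles), begin, end, dt);
    }
    for (auto& t : workers)
        t.join();
}

int main() {
    std::vector<Particle> particles(200000, Particle{0.0f, 10.0f, 0.0f, 0.0f, 0.0f, 0.0f});
    for (int frame = 0; frame < 60; ++frame)
        step_all(particles, 1.0f / 60.0f);
}
```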
 
Yeah, that's really weird. I'm not one for conspiracy theories, so it could be something as simple as PhysX putting a higher load on the memory subsystem - after all, Nvidia did mention that SPH is much faster on Fermi due to better caching. Also, as Sontin pointed out, the PhysX thread(s) could be starving other threads that are normally free to run.

Both starving other threads, as Sontin argued, and an artificial cap, as Tom's Hardware argued, are plausible explanations. If you look at the performance results, the 4870 has the same framerate going from 1280x1024 to 2560x1600. Normally this would be a huge performance hit, but in Batman: AA the FPS is the same regardless of resolution:

[Image: Batman PhysX High benchmark at 1280x1024]

[Image: Batman PhysX High benchmark at 2560x1600]


That makes Batman even weirder. It's the only game I've seen where there is no performance hit going to 2560x1600. But if Sontin is right and it's the threads that are starved, shouldn't that affect the framerates more at a higher resolution?
 
Since physics is another one of those ridiculously parallel workloads, it doesn't make sense for it not to be implemented on multiple cores, and the perceived conflict of interest between Nvidia's GPUs and the CPU implementation simply adds fuel to the fire.

Agreed, I just don't get why the bar for PhysX games is set higher than for other games. I can't reconcile the uproar over PhysX with the fact that most entire game engines (including physics) do not scale past two cores. Take Crysis, for example:

[Image: Crysis CPU scaling benchmark chart]


http://www.guru3d.com/article/cpu-scaling-in-games-with-quad-core-processors/9
 
Agreed, I just don't get why the bar for PhysX games is set higher than for other games. I can't reconcile the uproar over PhysX with the fact that most entire game engines (including physics) do not scale past two cores. Take Crysis, for example:

It's an unfortunate situation. I'm trying to give Nvidia the benefit of the doubt here, but there's a lot of doubt, especially when a game with as much cross-vendor controversy as Batman: AA shows competing hardware as 100% PhysX-limited.

Game engines in their entirety may not scale particularly well past two cores, but many individual parts of a game engine can be scaled relatively easily across more than two cores, and physics is one of them.

Amdahl's Law dictates that a game engine can only scale until it's limited by its most serial component. In this case it's an incredibly parallel workload which is apparently triggering Amdahl's Law on the systems with AMD GPUs and CPU PhysX. This absolutely does not make sense.
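A quick back-of-the-envelope on Amdahl's Law, with made-up parallel fractions just to show the shape of the curve: the speedup ceiling is set entirely by the serial part, so a single-threaded physics step drags the whole frame down no matter how many cores are present:

```cpp
// Amdahl's Law with illustrative numbers: speedup = 1 / ((1 - p) + p / n),
// where p is the fraction of the frame that can run in parallel and n is
// the core count. If physics stays single-threaded, p shrinks and the
// ceiling drops, regardless of how parallel the workload could be.
#include <cstdio>

double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    const int cores[] = {2, 4, 8};
    const double parallel_fraction[] = {0.50, 0.90}; // assumed values, not measured

    for (double p : parallel_fraction)
        for (int n : cores)
            std::printf("p = %.2f, %d cores -> %.2fx speedup\n",
                        p, n, amdahl(p, n));
}
```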
 
But does that look like a well-threaded implementation?
Granted, I haven't played either Batman or Dark Void, but it does seem unlikely that either of them would have a single physics task that would maximise any one core on the low setting. Assuming that the developers have split the physics tasks into reasonably small workloads, why aren't those then split more evenly amongst the available cores?

Is this something that can easily be set up by specifying how many threads you want to run the tasks on, with the PhysX library handling the rest? Do you need to manage the PhysX threads yourself to balance them, or do you need to run multiple contexts of PhysX with all the overhead that would include?
If it is either of the first two options, then it points to the devs and how they have chosen to implement/use PhysX. If it is the latter, then it points to the library having a sub-par multicore CPU implementation.
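For reference, here is what the first option could look like in the abstract - a generic worker pool, emphatically not the PhysX SDK's real interface: the game hands over a list of small physics tasks plus a thread count, and the pool spreads the tasks across cores:

```cpp
// Generic worker-pool sketch of "specify the thread count and let the
// library handle the rest". This is NOT the PhysX SDK's actual API --
// just an illustration of the first option described above.
#include <algorithm>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

void run_tasks(const std::vector<std::function<void()>>& tasks, unsigned num_threads) {
    std::atomic<size_t> next{0};
    auto worker = [&] {
        // Each worker keeps grabbing the next unclaimed task index.
        for (size_t i = next.fetch_add(1); i < tasks.size(); i = next.fetch_add(1))
            tasks[i]();                       // each task is one small workload
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t)
        pool.emplace_back(worker);
    for (auto& t : pool)
        t.join();
}

int main() {
    std::vector<std::function<void()>> tasks;
    for (int i = 0; i < 256; ++i)
        tasks.push_back([] { /* e.g. solve one island of rigid bodies */ });

    run_tasks(tasks, std::max(1u, std::thread::hardware_concurrency()));
}
```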

It would also be very interesting to learn more about how the PS3 implementation is done - whether (and how) it uses the SPEs, and how it balances the work between them - to compare with how it runs on the CPU on a PC. If the PS3 version has a good system for assigning jobs to the SPEs, there is little reason why a similar system shouldn't be available on the PC's CPU (if it isn't).

Does anyone have information they can share which would help shed light on these questions?
 
Yeah, that's really weird. I'm not one for conspiracy theories, so it could be something as simple as PhysX putting a higher load on the memory subsystem - after all, Nvidia did mention that SPH is much faster on Fermi due to better caching. Also, as Sontin pointed out, the PhysX thread(s) could be starving other threads that are normally free to run.
Easy enough to test, although I put the odds somewhere between slim and none: underclock the memory (without SIMD it's not going to be the cache). I don't have the game, though.
 
One thing people also need to remember when comparing GPU to CPU physics is that all the rigid-body collisions with destructible architecture I have seen on the CPU in games result in fragments that disappear after 5-10 seconds.
From Ghostbusters to The Force Unleashed.
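The trick being described usually amounts to a lifetime budget per fragment. A generic sketch (not code from either game; the 7.5-second lifetime is an arbitrary value in the 5-10 second range mentioned above):

```cpp
// Sketch of the fragment-lifetime trick: each debris piece carries a
// timestamp and is culled a few seconds after it spawns, which keeps
// the CPU rigid-body count bounded. Generic illustration only.
#include <algorithm>
#include <vector>

struct Fragment {
    float age = 0.0f;          // seconds since the piece was spawned
    // ... transform, velocity, mesh handle, etc. ...
};

void update_debris(std::vector<Fragment>& debris, float dt, float max_lifetime = 7.5f) {
    for (auto& f : debris)
        f.age += dt;

    // Remove anything older than the lifetime budget -- the
    // "disappearing after 5-10 seconds" behaviour described above.
    debris.erase(std::remove_if(debris.begin(), debris.end(),
                                [max_lifetime](const Fragment& f) {
                                    return f.age > max_lifetime;
                                }),
                 debris.end());
}

int main() {
    std::vector<Fragment> debris(100);
    for (int frame = 0; frame < 600; ++frame)   // ~10 seconds at 60 fps
        update_debris(debris, 1.0f / 60.0f);
}
```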

So not only are you getting lower performance... you also get less fidelity and immersion with CPU physics.

An analogy would be that GPU physics is like running with AA and AF and getting 30+ FPS, while CPU physics is like running with no AA and AF and yet still only getting ~3 FPS when doing the same task.
 
It's an unfortunate situation. I'm trying to give Nvidia the benefit of the doubt here, but there's a lot of doubt, especially when a game with as much cross-vendor controversy as Batman: AA shows competing hardware as 100% PhysX-limited.

It's worse than that. Toss an NV PhysX GPU into that competing vendor's rig and PhysX will run fine, but AA is still disabled unless you hack the .inis to fool the program into thinking it's an NV card, and then, miraculously, AA works.

The "Nvidia workaround" was to force AA through CCC which gave very slow AA compared to the in-game MSAA. Nice trick to win the benchies.
 