GPU Physics vs. CPU Threading

fehu

Veteran
http://arstechnica.com/articles/paedia/cpu/valve-multicore.ars

Some time ago Valve hosted a summit where they claimed that the future of physics (among many other game-related things) is to be done only on the CPU, preferably a quad-core CPU.

This is a U-turn from the tendency to shift every possible task onto other cards, and particularly onto the GPU.

The technology they showed is already working, and if they spent money and time developing it, it means they believe it's the best thing they can do - and I believe the people at the Valve studios are very capable.

So what is the right way of doing this kind of stuff?
The CPU may be slower at handling physics, but it's what has been used for the last few years.
The GPU may be faster and could handle sound too, but if it does so many things the CPU sits idle and the graphics (the reason why I bought a costly video card...) may start slowing down...
 
Anything out of Gabe Newell's mouth should be taken with a grain of salt. Valve is very accomplished at packaging and selling hype over mundane stuff. They make great games, not because of technology, but because of great direction, story, gameplay, and content. But technologically, ho-hum. The Source engine, IMHO, is an ugly hack.
 
I read that article and didn't spot anything revolutionary - it's a nicely packaged and presented set of commonly understood ideas and principles. It is interesting news that they're going down this route - but what they're doing is probably not going to be hugely different to what many other studios are considering or even already working on.

So what is the right way of doing this kind of stuff?
There isn't really a single correct way to do this sort of thing. I think the only real consensus is that a multi-threaded approach is the way forward in at least the short to mid term - exactly how you utilize the multiple cores is open to interpretation.

The GPU may be faster and could handle sound too, but if it does so many things the CPU sits idle and the graphics (the reason why I bought a costly video card...) may start slowing down...
That would probably be a bad piece of software - even now, where single-core CPUs are still dominant, developers will ruthlessly profile their application's performance to determine where the bottlenecks are and attempt to compensate accordingly. If they're burning out the GPU and leaving the CPU idle then they'd better have a very good excuse ;)

Anything out of Gabe Newell's mouth should be taken with a grain of salt. Valve is very accomplished at packaging and selling hype over mundane stuff. They make great games, not because of technology, but because of great direction, story, gameplay, and content. But technologically, ho-hum. The Source engine, IMHO, is an ugly hack.
I concur with this - I've been at odds with plenty of people over HL2/Source before (fantastic artwork, average technology).

hth
Jack
 
In my humble opinion, the CPU is best suited for complex gameplay physics, while the GPU can handle the eyecandy physics.

Imagine a flight simulator. Thousands of parameters influence the flight behaviour, but it's not a massively parallel task suited for a GPU. On the other hand, flying through a cloud or over water could add some whirls, but it's not critical to the pilot that this is computed accurately.

Anyway, physics is overhyped. The moving leaves in Crysis seriously don't need a PhysX card or a DirectX 10 card. And dual-core CPUs don't just mean double the processing power for physics. If previously 10% of one core was allocated to physics, a game can now throw that 10% plus an entire extra core at it - 11x as much.
 
In my humble opinion, the CPU is best suited for complex gameplay physics, while the GPU can handle the eyecandy physics.
I don't think modern GPUs have to be consigned to "eye candy physics". There's no real reason why they can't be just as capable as their CPU counterparts nowadays.

Thousands of parameters influence the flight behaviour, but it's not a massively parallel task suited for a GPU.
True, although as soon as you want to scale up to N planes... Also note that there are probably a lot of parts to the equations of flight that *are* parallelizable, such as solving linear systems and differential equations.

On the other hand, flying through a cloud or over water could add some whirls, but it's not critical to the pilot that this is computed accurately.
I see no reason why the G80 (for example) can't compute something just as accurately as the CPU. Most of this sort of thing doesn't need double precision (Cell SPUs are fp32 for example), and it can be emulated if necessary. Plus I've seen some hints of GPUs supporting double precision "soon" as well.
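
For what it's worth, the usual building block for that kind of emulation is the "two floats per value" trick. A minimal sketch, with names of my own choosing - not any particular library's API:

Code:
#include <cstdio>

// Minimal sketch of emulating extra precision with pairs of fp32 values.
// two_sum() is Knuth's error-free transformation: it returns the rounded
// sum plus the exact rounding error, which double-float schemes build on.
struct TwoFloat { float hi, lo; };   // value ~= hi + lo

TwoFloat two_sum(float a, float b) {
    float s   = a + b;
    float bv  = s - a;
    float av  = s - bv;
    float err = (a - av) + (b - bv);
    return { s, err };
}

int main() {
    // 1.0f vanishes in a plain fp32 add at this magnitude; 'lo' recovers it.
    TwoFloat r = two_sum(1.0e8f, 1.0f);
    std::printf("hi = %.1f, lo = %.1f\n", r.hi, r.lo);
}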

Anyways I'm just not seeing why we need to put the hardware in a box anymore. Is there something fundamental about the way that GPUs are designed that you feel is inappropriate for physics calculations? In my experience, physics is actually one of the easier things to map to GPUs...
 
Valve studios capable? Sure, as long as it doesn't include anything related to netcode, hitboxes and hit registration :rofl: The future of physics processing for regular humans is on the GPU imo.
Graphics and physics should be calculated at the same level since they depend on each other heavily.
If they'd wanted to process physics on a second core they could have done that long ago. But they haven't.
There are loads of dual-core processors in users' PCs; for those without one there would simply be an option with lower-detail physics that fits on one core in total. But they still haven't done that. They have done it with GPUs though, so it's quite clear where they want to go (today) ;)
 
I don't think modern GPUs have to be consigned to "eye candy physics". There's no real reason why they can't be just as capable as their CPU counterparts nowadays.

The problem is latency. You push data and shader code to the GPU and wait for a result. Sometimes for multiple frames (30-50 ms).
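
Typically you end up structuring the frame something like the sketch below (all the gpu_* names are hypothetical stubs, not a real API), which is exactly why the results you consume end up a frame or two stale:

Code:
#include <cstdio>

// Toy sketch of why the results end up a frame or two old: the CPU submits
// this frame's GPU physics work and only consumes results from an earlier
// frame, so it never blocks on a readback. All gpu_* names are made-up stubs.
struct PhysicsJob { int id; };

void gpu_submit(PhysicsJob&, int frame)    { std::printf("submit frame %d\n", frame); }
bool gpu_ready(PhysicsJob&, int /*frame*/) { return true; /* pretend the GPU finished */ }
void apply_results(PhysicsJob&, int frame) { std::printf("apply  frame %d\n", frame); }

void run_frame(PhysicsJob& job, int frameIndex) {
    gpu_submit(job, frameIndex);              // queue this frame's work, don't wait
    int ready = frameIndex - 2;               // consume results ~2 frames (30-50 ms) later
    if (ready >= 0 && gpu_ready(job, ready))  // non-blocking readiness check
        apply_results(job, ready);            // gameplay sees slightly stale physics
    // ...meanwhile render, run AI, etc...
}

int main() {
    PhysicsJob job{0};
    for (int f = 0; f < 4; ++f) run_frame(job, f);
}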

The to-the-metal API/drivers might mitigate this by circumventing the graphics command queues and giving higher priority to compute tasks.

Cheers
 
I'm fairly sure both CUDA and CTM fix that problem; I'm pretty sure CUDA can even run at the same time as rendering if you have more than one shader cluster on your GPU. I'm less sure about CTM on R580 though.

Uttar
 
I don't think modern GPUs have to be consigned to "eye candy physics". There's no real reason why they can't be just as capable as their CPU counterparts nowadays.
As far as I know they still are not so hot for working with pointers, small data types (booleans, characters), lots of branching, and megabytes of instructions. The instruction latency on a CPU is only a few clock cycles (at ~3 GHz), while on a GPU it's tens of clock cycles (at ~1 GHz). And with a CPU's CISC instructions and out-of-order execution it can actually handle multiple operations per clock cycle. Thanks to the cache it doesn't have to wait for data very long either (try disabling it in the BIOS). Some algorithms might easily take over 100 times longer to execute on a GPU, and that's without the communication and scheduling overhead.

The only way the GPU can compete is when every clock cycle it has independent work for each of its pipelines. Then it's one huge number crunching machine. Graphics is one of these embarrassingly parallel tasks. But outside of eyecandy physics there's not much game code I would consider running on a GPU.
True, although as soon as you want to scale up to N planes... Also note that there are probably a lot of parts to the equations of flight that *are* parallelizable, such as solving linear systems and differential equations.
For 1,000 planes I might start considering using the GPU. But nowadays it's really adequate to compute the accurate flight model for the current plane, and let the rest follow some predefined route. The linear and differential systems are most likely still of quite low degree, so by the time the data is sent to the GPU, it could already have been processed efficiently with SSE. For moderately parallel algorithms it's still non-trivial to map them to the number cruncher. Beware of Amdahl's law.
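
As a quick sketch of what that law implies (the 80% figure below is just for illustration, not a measurement): if a fraction p of the physics work parallelizes and the rest stays serial, then

speedup(N) = 1 / ((1 - p) + p / N)

so with p = 0.8, no number of GPU pipelines N gets you past 1 / 0.2 = 5x.
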
I see no reason why the G80 (for example) can't compute something just as accurately as the CPU. Most of this sort of thing doesn't need double precision (Cell SPUs are fp32 for example), and it can be emulated if necessary. Plus I've seen some hints of GPUs supporting double precision "soon" as well.
I wasn't talking about numeric precision. I meant that for something like swirling smoke it doesn't matter much if an accurate physical model is used, or some good-looking approximation. The computations for the approximation could be so simple that the CPU is good enough for this parallel task. One core still offers several GFLOPS, plus all the pointer arithmetic and branching you need.
Anyways I'm just not seeing why we need to put the hardware in a box anymore. Is there something fundamental about the way that GPUs are designed that you feel is inappropriate for physics calculations? In my experience, physics is actually one of the easier things to map to GPUs...
The embarrassingly parallel eyecandy physics are definitely a good fit for running on the GPU.
 
As far as I know they still are not so hot for working with pointers, small data types (booleans, characters), lots of branching, and megabytes of instructions.
They are getting increasingly good at those things, and even the previous generation handled them pretty well. Plus I'm not convinced that these things are particularly ubiquitous in physics - I'm no expert, but I've done some work in the area.

Some algorithms might easily take over 100 times longer to execute on a GPU, and that's without the communication and scheduling overhead.
One could certainly hand-code an algorithm that runs terribly on a parallel machine (and vice versa might I add)... I'm not debating that. It has just been my experience that physics is one of the easier things to parallelize.

But outside of eyecandy physics there's not much game code I would consider running on a GPU.
I simply disagree. I guess we're going to have to start discussing the particular physics algorithms in use here to continue the debate.

The linear and differential systems are most likely still of quite low degree, so by the time the data is sent to the GPU, it could already have been processed efficiently with SSE.
All you're saying is that the CPU is fine for lesser workloads. That's cool with me, but then if it's not a problem on the CPU, of course we're not going to bother moving it elsewhere (unless we're on the Cell, where the main CPU is pretty weak).

For moderately parallel algorithms it's still non-trivial to map them to the number cruncher. Beware of Amdahl's law.
So find a more parallel algorithm :) Like I said, we're going to have to start talking specifics here I suppose. Amdahl's law - while it holds some truth - is increasingly being used as an excuse for a lack of algorithmic innovation. It's getting to the point where I'm going to have to see proof (i.e. theoretical computer science) that a problem is simply impossible to parallelize, and I can count such problems/proofs on one hand.

The computations for the approximation could be so simple that the CPU is good enough for this parallel task.
I guess I just don't see what you mean... sure if things are simple and cheap, don't bother. But this is beyond the realm of whether or not scaling up with CPU-like or GPU-like architectures is going to be more useful for physics in the long run.

The embarrassingly parallel eyecandy physics are definitely a good fit for running on the GPU.
I think we've shown that we can do much more than "embarrassingly parallel" problems on GPUs, and still get hefty speedups. Sorting, BLAS, FFT, neural networks, etc. are not what I'd call "embarrassingly parallel", and yet we can get order of magnitude improvements by running them on the GPU. There are many more complex examples at GPGPU.org (http://www.gpgpu.org/).
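
To make the sorting example concrete, the classic GPU-friendly approach is a bitonic sorting network. Here's a rough scalar sketch of the idea (my own illustration, not anyone's shipping code) - the point is that within each pass every compare-and-swap touches a disjoint pair, so the whole pass runs as one parallel step:

Code:
#include <algorithm>
#include <cstdio>
#include <vector>

// Rough sketch of a bitonic sorting network (n must be a power of two).
// Sorting isn't "embarrassingly parallel", but within each (k, j) pass every
// compare-and-swap touches a disjoint pair of elements, so the inner loop can
// run as one parallel step - which is why it maps well to GPUs.
void bitonic_sort(std::vector<int>& a) {
    const size_t n = a.size();
    for (size_t k = 2; k <= n; k <<= 1)           // size of the bitonic sequences
        for (size_t j = k >> 1; j > 0; j >>= 1)   // compare distance within a pass
            for (size_t i = 0; i < n; ++i) {      // independent pairs -> parallel
                size_t partner = i ^ j;
                if (partner > i) {
                    bool ascending = ((i & k) == 0);
                    if (ascending ? a[i] > a[partner] : a[i] < a[partner])
                        std::swap(a[i], a[partner]);
                }
            }
}

int main() {
    std::vector<int> v = {7, 3, 6, 1, 8, 2, 5, 4};
    bitonic_sort(v);
    for (int x : v) std::printf("%d ", x);        // prints 1 2 3 4 5 6 7 8
    std::printf("\n");
}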

Anyways I'd resist the tendency to dismiss parallelism simply because it involves rethinking some algorithms and code. In my experience it's quite possible to find parallel algorithms for even the most stubborn "scalar" problems.
 
What I was actually trying to say is that physics is hype. More specifically, throwing dedicated hardware at it is hype.

For the last couple of decades, games were perfectly capable of running physics calculations on the CPU. Now that gameplay isn't very innovative any more, the silver bullet is spending GFLOPS of computing power on physics, using dedicated hardware? Also, dual-core CPUs already allow vastly more physics calculations, but still there are people willing to pay $300 for a dedicated card... Look at a game like Half-Life 2 (or imagine its successor). What would you spend GFLOPS of physics calculations on? Stacking a thousand crates to climb through the window instead of three?

Physics is hype that sells physics cards and, recently, graphics cards. That's all. I'm eagerly awaiting news about a revolutionary game where complex physics is a primary factor in gameplay. But so far I haven't seen anything worth investing in dedicated hardware just for the physics.
 
The best example IMHO is the Smith-Waterman/HMMER-style algorithms being run on the GPU. These problems are not "embarrassingly parallel"; in fact, their dynamic programming nature and recursion make it tricky to extract top performance from the GPU.

In Smith-Waterman, for example, the value of a cell at (X,Y) is determined by the values at (X-1,Y), (X-1,Y-1), and (X,Y-1) (the cells to the left, upper-left, and top). That means you can't compute (X,Y) until you compute those, and the dependency chain spreads out across the problem space.

The traditional "parallelized" approach for SW is to sweep out anti-diagonals from the top-left corner, i.e. draw a line from the bottom-left to the upper-right of the grid. This is because the elements at (X,Y) and (X+1,Y-1) can be computed with no interdependencies, based only on data from earlier anti-diagonals.

However, the anti-diagonal approach doesn't maximize efficiency, so other tricks are needed to do many subproblems in parallel.
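
For what it's worth, here's a rough scalar sketch of the anti-diagonal sweep, just to make the dependency structure concrete (the scoring constants are arbitrary; a GPU version would give each cell on a diagonal to its own thread):

Code:
#include <algorithm>
#include <string>
#include <vector>

// Sketch: Smith-Waterman filled by anti-diagonals. Every cell on a diagonal
// can be scored independently, because (i,j) only needs (i-1,j) and (i,j-1)
// from the previous diagonal and (i-1,j-1) from the one before that.
// Simple linear gap penalty; the score values are purely illustrative.
int smith_waterman(const std::string& a, const std::string& b,
                   int match = 2, int mismatch = -1, int gap = -1) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    std::vector<std::vector<int>> H(m + 1, std::vector<int>(n + 1, 0));
    int best = 0;

    for (int d = 2; d <= m + n; ++d) {            // one anti-diagonal at a time
        int iLo = std::max(1, d - n);
        int iHi = std::min(m, d - 1);
        // In a GPU version this loop becomes one thread per (i,j).
        for (int i = iLo; i <= iHi; ++i) {
            int j = d - i;
            int diag = H[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = H[i - 1][j] + gap;
            int left = H[i][j - 1] + gap;
            H[i][j] = std::max({0, diag, up, left});
            best = std::max(best, H[i][j]);
        }
    }
    return best;                                  // best local alignment score
}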

P.S. on physics: with the exception of collision detection and updating data structures with new force constraints, GPUs are far faster than CPUs at the other steps of the physics calculation, such as solving constraint matrices.
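
To illustrate that last point with something concrete: an iterative solve in the style of Jacobi is the kind of thing that maps well, because every row update is independent. A bare-bones sketch (assuming a system diagonally dominant enough to converge - illustrative only, not a production constraint solver):

Code:
#include <vector>

// One Jacobi iteration for A x = b: each new x[i] depends only on the
// previous iterate x_old, so all rows can be updated in parallel (one thread
// or pipeline per row). Purely illustrative.
void jacobi_step(const std::vector<std::vector<double>>& A,
                 const std::vector<double>& b,
                 const std::vector<double>& x_old,
                 std::vector<double>& x_new) {
    const size_t n = b.size();
    for (size_t i = 0; i < n; ++i) {              // independent per row
        double sum = b[i];
        for (size_t j = 0; j < n; ++j)
            if (j != i) sum -= A[i][j] * x_old[j];
        x_new[i] = sum / A[i][i];
    }
}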
 
What I was actually trying to say is that physics is hype. More specifically, throwing dedicated hardware at it is hype.
I agree with that. Sorry for my misinterpretation of what you were saying :(

Physics is hype that sells physics cards and, recently, graphics cards.
I can't argue with that. I'll leave the usefulness of physics for game-play out of this as I'm certainly no expert on game design. All I was trying to say is that if you have a big workload of physics - for whatever reason - it's quite reasonable to do it on the GPU. I guess we were going off on different tangents...
 
Graphics and physics should be calculated at the same level since they depend on each other heavily.

The issue in game development and GPGPU game usage must surely be the lowest-common-denominator factor: the best-selling games aren't those that demand a high-end graphics card; heck, they're the ones that'll run on integrated graphics and are fun to play.

So based on that, dual core will become common before graphics cards with power to spare for physics calculations, and that is therefore the better target for developers who want to sell to the widest audience.
 
The best example IMHO is the Smith-Waterman/HMMER-style algorithms being run on the GPU.
For purely scientific physics calculations, I definitely agree that GPGPU can offer a significant advantage. But we're talking about games here. I can't imagine a game really needing Smith-Waterman or anything like it.
These problems are not "embarrassingly parallel"; in fact, their dynamic programming nature and recursion make it tricky to extract top performance from the GPU.
That's exactly what I'm saying. GPUs are only really good for the "embarrassingly parallel" physics problems. Not Smith-Waterman, but typically eye candy related things. Everything else still runs best on the CPU.

So a multi-core CPU can handle a moderate - arbitrarily complex - physics calculation load just fine. For a heavy - embarrassingly parallel - physics calculation load, the GPU can be used. But in my opinion you're really doing something wrong if you need the GPU for game physics. Even a ground-breaking game like Crysis should be happy with a dual-core for all physics. Heck, the A.I. load is probably higher (and it plays a bigger role in gameplay experience). Should we be hyping dedicated hardware for A.I. now?
 
That's exactly what I'm saying. GPUs are only really good for the "embarrassingly parallel" physics problems. Not Smith-Waterman, but typically eye candy related things. Everything else still runs best on the CPU.
Correct me if I'm wrong, but I think the example that DemoCoder gave was to show that GPUs can actually do a lot even with problems that *aren't* "embarrassingly parallel"... indeed even with problems that seem like they simply aren't going to be parallelisable at all.

From personal experience, I simply disagree that GPUs (and other parallel machines) are suitable only for a small subset of "embarrassingly parallel" problems. Those are certainly easily mapped to these machines, but as I mentioned *most* problems can be solved efficiently by a parallel machine.

Then the question arises: assuming we're working in an environment with both processors readily available, and that the implementations of both are done (or equally easy to do), why *not* run it on the GPU? You can argue with those assumptions in the context of games of course (and I'd agree), but on a theoretical level, I see no reason in the long run to continue to prefer CPU-like architectures for tasks that can clearly be run as efficiently (if not more so) on a highly parallel machine.

Getting back to the topic, it's clear to me that "CPU Threading" in its current form (synchronized caches, lock/unlock nonsense, task-level parallelism) simply will not scale up to the hundreds and thousands of processors that we're going to have available in the near future. Data parallelism will, so I think that's a good place to focus research and development.
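
To make that distinction concrete, here's a toy sketch (names and numbers are mine, purely for illustration): each worker owns a disjoint slice of the data, so there's nothing to lock, and the same pattern keeps scaling as the core count grows:

Code:
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Toy data-parallel particle integration: the array is split into disjoint
// slices, each thread owns its slice outright, and no locks or shared mutable
// state are needed. A sketch of the idea, not an engine design.
struct Particle { float x, v; };

void integrate_slice(std::vector<Particle>& p, size_t begin, size_t end, float dt) {
    for (size_t i = begin; i < end; ++i)          // no shared mutable state
        p[i].x += p[i].v * dt;
}

void integrate_data_parallel(std::vector<Particle>& p, float dt, unsigned workers) {
    std::vector<std::thread> pool;
    const size_t chunk = (p.size() + workers - 1) / workers;
    for (unsigned t = 0; t < workers; ++t) {
        const size_t begin = t * chunk;
        const size_t end   = std::min(p.size(), begin + chunk);
        if (begin < end)
            pool.emplace_back(integrate_slice, std::ref(p), begin, end, dt);
    }
    for (auto& th : pool) th.join();              // the only synchronization point
}

int main() {
    std::vector<Particle> particles(10000, Particle{0.0f, 1.0f});
    integrate_data_parallel(particles, 0.016f, 4);
}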
 
Even a ground-breaking game like Crysis should be happy with a dual-core for all physics. Heck, the A.I. load is probably higher (and it plays a bigger role in gameplay experience). Should we be hyping dedicated hardware for A.I. now?

What? Like this:
http://www.aiseek.com

I see no reason in the long run to continue to prefer CPU-like architectures for tasks that can clearly be run as efficiently (if not more so) on a highly parallel machine.

That's probably why Intel and AMD are planning on adding large parallel processing elements to future CPUs.
 
Correct me if I'm wrong, but I think the example that DemoCoder gave was to show that GPUs can actually do a lot even with problems that *aren't* "embarrassingly parallel"... indeed even with problems that seem like they simply aren't going to be parallelisable at all.
First of all, in my humble opinion the Smith-Waterman example was a bad one in the context of games. Sure, it's still faster when you use the whole GPU, but I want to keep 90% of its processing power for graphics! The only physics algorithms that are efficient enough to consume only a minor fraction of processing power are the embarrassingly parallel ones.
Then the question arises: assuming we're working in an environment with both processors readily available...
Right there lies the problem. As much as game developers and GPGPU enthusiasts would like it, not everyone has a G80. And as Thorburn already noted, not everyone will have one (or something similar) for a very long time, but multi-core is already mainstream. You can get a Pentium D for less than $100, and it's only a $40 upgrade on Dell (while a 7300LE costs $50 and an X1300 Pro costs $100 extra). So it makes a lot more sense for game developers to do the physics calculations on the extra core than on the GPU. Games that are meant to run on systems sold today are almost guaranteed to run on a dual-core CPU. But the GPU could be anything from a G80 to an X3000. That's anywhere from 20 to 500 GFLOPS, compared to 10 GFLOPS for one core of the cheap Pentium D. If you want to do the same thing on the GPU and keep 90% for graphics, you need at least 100 GFLOPS. And then we're not even doing anything new, and we're assuming that the algorithm is perfectly parallelizable for the GPU...
...and that the implementations of both are done (or equally easy to do), why *not* run it on the GPU? You can argue with those assumptions in the context of games of course (and I'd agree), but on a theoretical level, I see no reason in the long run to continue to prefer CPU-like architectures for tasks that can clearly be run as efficiently (if not more so) on a highly parallel machine.
I don't believe that in the long run mainstream GPUs will increase in performance faster than CPUs. CPUs have only just started to exploit thread parallelism, and the number of cores will double when transistor density doubles. GPUs are bound by the same advance in technology. So there's no reason to believe that game developers will be more inclined to use the GPU for physics in the future than they are today.
Getting back to the topic, it's clear to me that "CPU Threading" in its current form (synchronized caches, lock/unlock nonsense, task-level parallelism) simply will not scale up to the hundreds and thousands of processors that we're going to have available in the near future. Data parallelism will, so I think that's a good place to focus research and development.
Exactly. CPUs can still make a lot of architectural changes. By sacrificing cache area, simplifying out-of-order execution and trading branch prediction for hardware thread scheduling, there can be more functional units and peak performance goes up a lot. This way the CPU will look a lot more like a GPU, and the performance gap decreases. GPUs don't have that architectural freedom. All they can do is try to cram more functional units on the chip when the transistor budget increases.
 
Right there lies the problem. As much as game developers and GPGPU enthusiasts would like it, not everyone has a G80.
As I mentioned in the next sentence, I agree. I'm trying to keep this conversation theoretical, but I'm perfectly aware that practical solutions will have wildly different criteria (certainly in the short term). I'm comfortable with that.

This way the CPU will look a lot more like a GPU, and the performance gap decreases. GPUs don't have that architectural freedom. All they can do is try to cram more functional units on the chip when the transistor budget increases.
Oh I don't care whether you call it a CPU or a GPU... that's beside the point. CPUs will probably become more "GPU-like" in the ways that you mention, and GPUs are always evolving to become more programmable... the two will clearly become closer and closer together, perhaps negating the need for one or the other at some point. That's still beside the point that I was trying to make.

Maybe the confusion lies in the terms GPU and CPU... I'm merely using them to describe a current architecture. Those architectures and labels can change - and they will.

All I'm saying is that the current architecture of GPUs is quite suitable for physics work. Whether or not CPUs will evolve to become more like this, whether we need the extra power, whether CPUs or GPUs will take over the world... none of those are for me to say. I have some predictions of my own, but they are beyond the scope of this conversation really. Sorry if I miscommunicated my point. I think we're actually in agreement...
 
Anyway, physics is overhyped. The moving leaves in Crysis seriously don't need a PhysX card or a DirectX 10 card. And dual-core CPUs don't just mean double the processing power for physics. If previously 10% of one core was allocated to physics, a game can now throw that 10% plus an entire extra core at it - 11x as much.
IMO long shaders are overhyped; penetration during animation is more of an annoyance to me than poor skin tone.
 