Some thoughts on the physics situation


JF_Aidan_Pryde
27-Jun-2006, 19:13
It's been interesting to read about people's thoughts about the place of physics processors. I haven't really followed Ageia since their announcement in March. Just recently I was surprised to find that they've hard launched. So I decided to read up on this whole physics business. I think the situation is far more remarkable than what has been commented so far.

First, the debate about multi-core CPUs and GPUs doing physics. There is a fundamental problem here: currently both are saturated with work. The GPU is fully utilised in its current capacity; it has no room to do physics, irrespective of whether it can do so. (Multi-GPU setups are out for the same reason, if you believe in driving very high-res gaming.) In a well-designed multi-threaded game engine, the CPU should also be at near saturation, so it is likewise unavailable to do serious physics.

Second, the question of whether there is a need for 'serious physics', the kind that would require additional hardware. The way I think about this is that physics is really visual in its end result. So if there has always been a need for better graphics, and physics is essentially visual, then sure, there is a need for better physics. The difference is that physics is a different kind of graphics, solvable with a different kind of architecture, and that creates motivation for a separate ASIC.

It's hard to imagine that Ageia will, during the foreseeable future, profit from its chip sales. I don't know what kind of volume they need to be sustainable; maybe someone in the know can chip in here. But it's very interesting to think about from a marketing perspective.

It will take uncommon conviction to pay $300 for this product with the current level of game support. A video card is needed for 2D. The same can't be said for the PPU.
That said, 3dfx was also selling $300 boards with similarly rudimentary support when it launched. Many analysts predicted that it would fail, citing its lack of 2D, which made it not a 'compulsory' component of the computer. The PhysX is in much the same position. But $300 in 2006 terms is actually cheaper than 3dfx's Voodoo in 1996.

The comparison with the Voodoo, however, stops about there. The Voodoo made games run faster, the PhysX seems to do the opposite. The Voodoo provided a significant leap in visual quality, the PhysX is currently limited to moving a bunch of flying boxes. Perhaps we'll see better titles in the future.

But the interesting thing is that despite all this Ageia has managed to win OEM deals. Anyone who followed 3dfx (especially their downfall) will recall how poorly they did with OEMs. So the fact that Dell is shipping PhysX is pretty impressive.

Some closing thoughts about what NVIDIA and ATI might do. In both companies, the belief that GPU = God is very strong. David Kirk will gladly tell you why all ASICs other than GPUs are no good, and why the problems that they try to solve, be it raytracing or physics, are better solved on the GPU. This is, of course, just marketing. The only reason he's saying that is because the GPU is already there and it has some programmability. The GPU cannot hope to do both graphics and physics and do both well. So if there is demand for dedicated physics, then neither graphics company should hope to solve it by releasing 'support' with their current GPU architectures. That strategy may provide a stopgap measure but it will not make them competitive with Ageia.

The way I see it now (early morning, subject to change when I sober up), both NVIDIA and ATI want and will get into the Physics ASIC business. After all, both are looking for new revenue streams, both are desperate to differentiate their products and both are constantly looking for reasons to make people upgrade. This is why they have developer support, to help make games that will force people to upgrade their graphics cards. But while graphics has become much harder to differentiate, physics is still in its infancy.

So I think in the end it's not about whether Ageia's card has a place but whether their engineers can defend their turf against the onslaught of engineering resources of NVIDIA and ATI. As it is, it's hard to imagine that Ageia can outdo them both.

_xxx_
27-Jun-2006, 20:21
One little point you missed: in a well designed, multithreaded game engine, physics is just one of the well designed threads.

There is no future for any kind of ASIC there nor will it ever be viable, given that everything but the DVD-drives ;) will be capable of doing some kind of "good enough" physics as it seems.

Seriously, pretty much every kind of multi CPU/GPU environment will be able to run physics fairly well. And AGEIA is here to stay, but as a pure software supplier as I see it.

EDIT: and this topic has been beaten to death already, why didn't you just post it in one of the active threads?

Nick
27-Jun-2006, 23:38
Although it has already been beaten to death, I'll try to give some direct opinions...
First, the debate about multi-core CPUs and GPUs doing physics. There is a fundamental problem here and that is currently both are saturated with work.
No game as of yet fully uses my dual-core Athlon 64 X2. Furthermore, Intel's Core 2 Duo will bring a whole new level of floating-point performance, and will bring dual-core to the mainstream market with very aggressive prices. On top of that, Direct3D 10 will lower the CPU overhead.
The GPU is fully utilised in its current capacity; it has no room to do physics, irrespective of whether it can do so.
GPUs are constantly extended with features that actually lower FPS (anti-aliasing, HDR, etc). Yet, people do activate them if it improves gameplay. Likewise, I believe that they are willing to sacrifice a few percent performance for physics if it improves the game experience. Your mileage may vary, but even then it's more cost effective to invest your money into a better graphics card than in a PhysX card. The GFLOPS per dollar is way higher. With Direct3D 10 that processing power can be used for physics processing very efficiently.

Also think of it this way: Assume a graphics card has to sacrifice 10% of its performance for peak physics processing (i.e. equivalent to a PhysX card). Then, if a game requires twice as much physics processing, using a PhysX card would halve your framerate, while with a graphics card it would only go from 90% to 80%. Adding a third chip means adding a third potential bottleneck...
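A rough sketch of the arithmetic in the paragraph above, taking its assumptions at face value: one "unit" of physics load is whatever a PhysX card can do at peak, the GPU gives up 10% of its frame rate per unit, and a dedicated card becomes the bottleneck once the load exceeds one unit. The function names and the 100 fps baseline are made up for illustration; none of this is measured data.

```python
def fps_with_gpu_physics(base_fps, physics_load, gpu_share_per_unit=0.10):
    """Frame rate when the GPU itself spends part of its time on physics.

    physics_load = 1.0 means "as much physics as one PhysX card at peak"
    (the post's assumption, not a benchmark result).
    """
    remaining = 1.0 - gpu_share_per_unit * physics_load
    return base_fps * max(remaining, 0.0)


def fps_with_ppu(base_fps, physics_load):
    """Frame rate when a dedicated card does the physics.

    Up to load 1.0 the card keeps up; beyond that every frame has to
    wait for it, so the frame rate scales down with the load.
    """
    return base_fps / max(physics_load, 1.0)


if __name__ == "__main__":
    for load in (1.0, 2.0):
        print(load, fps_with_gpu_physics(100.0, load), fps_with_ppu(100.0, load))
```

Under these assumptions, doubling the physics load takes the GPU case from 90% to 80% of the baseline, while the dedicated-card case drops from 100% to 50%.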
The difference is that physics is a different kind of graphics, solvable with a different kind of architecture, and that creates motivation for a separate ASIC.
Physics only needs additions and multiplications. The same we find on CPUs (which we've been using for physics for decades), and GPUs. I highly doubt there is any operation on the PhysX chip that we don't have on a CPU or GPU. So it doesn't justify a separate ASIC with a 'specialized' architecture.
So I think in the end it's not about whether Ageia's card has a place but whether their engineers can defend their turf against the onslaught of engineering resources of NVIDIA and ATI. As it is, it's hard to imagine that Ageia can out do them both.
Don't forget about Intel and AMD. Games won't instantly use more physics processing than a CPU can handle. Don't underestimate their efficiency for physics processing either.

Nick
27-Jun-2006, 23:49
And AGEIA is here to stay, but as a pure software supplier as I see it.
If they're smart, they should already have teamed up with Microsoft to provide the software implementation of the DirectX Physics API they're working on. On the other hand, Microsoft would probably rather pay just once for it instead of paying royalties for an essentially free API. And I don't think AGEIA's current software implementation matches up to Havok's. Also, Havok seems to have some good deals in the console market. Maybe AGEIA is simply doomed. :p

JF_Aidan_Pryde
28-Jun-2006, 04:32
Although it has already been beaten to death, I'll try to give some direct opinions...

GPUs are constantly extended with features that actually lower FPS (anti-aliasing, HDR, etc). Yet, people do activate them if it improves gameplay. Likewise, I believe that they are willing to sacrifice a few percent performance for physics if it improves the game experience.

The difference is that NV and ATI had to put in extra hardware to allow for HDR and AA. They certainly didn't just start re-using existing resources to do them. For early SSAA, they tried to do that but that was also why it wasn't competitive. So if NV and ATI want to do competitive physics, they have to re-architect their GPU or use another chip.


Also think of it this way: Assume a graphics card has to sacrifice 10% of its performance for peak physics processing (i.e. equivalent to a PhysX card).
I know this is what ATI has been saying but I can't see this being workable in a single card scenario. What they are saying, that Ageia's 125M ASIC with 128MB of dedicated memory is about as useful as 10% of the processing time on a mid-end GPU, just doesn't add up. Either ATI is just marketing (likely) or Ageia's implementation is botched. If the PhysX truly is that insignificant, then it has no place in the market.


Physics only needs additions and multiplications. The same we find on CPUs (which we've been using for physics for decades), and GPUs. I highly doubt there is any operation on the PhysX chip that we don't have on a CPU or GPU. So it doesn't justify a separate ASIC with a 'specialized' architecture.

Prior to 3dfx, we'd been using the CPU for 3D for decades too. 3D graphics is likewise just a bunch of MADs. I agree that Ageia is stuck in a really tough spot: on one side, CPUs with full programmability and an increasing number of cores are pushing them aside. On the other, GPUs with ever increasing programmability, and now cores, are pushing them from the other side. If the GPU just modestly re-aligns itself for better physics, which is not very different from adding hardware to support T/L, shaders or HDR, then the motivation for a separate card will be seriously compromised.

_xxx_
28-Jun-2006, 07:51
JF, your understanding of how stuff works is quite a bit lacking. The post above makes little sense.

Chalnoth
28-Jun-2006, 07:58
The problem with your analysis that multi-core CPU's will be used up and unavailable for physics is that for multi-core CPU's to be well-used, they need workloads that are amenable to parallelization. There aren't all that many things that are easy to parallelize, but physics is one of them. Thus, I claim that a game that does the physics on the CPU will make better use of multi-core than a game that does not.

Nick
28-Jun-2006, 11:12
The difference is that NV and ATI had to put in extra hardware to allow for HDR and AA.
Doesn't matter. I could have just said "more realistic effects with longer shaders" for the same argument. If people get the option to improve gameplay, they will enable it even if it sacrifices performance (up to a certain point). This counts for both longer shaders and physics. To crank up the performance they're much better off investing in the CPU and/or GPU than in a PhysX card.
So if NV and ATI want to do competitive physics, they have to re-architect their GPU or use another chip.
Why? There isn't anything vital missing on the GPU to do physics efficiently. Certainly not for Direct3D 10 cards, which support integer shader operations and direct unsampled memory access. And a unified architecture can balance different tasks.
What they are saying, that Ageia's 125M ASIC with 128MB of dedicated memory is about as useful as 10% of the processing time on a mid-end GPU, just doesn't add up.
ATI and NVIDIA have been designing chips for over a decade and use the latest 90 nm technology. AGEIA just slapped something together to get it on the market, using 130 nm technology. The numbers look reasonably good but it wouldn't have made any difference if they used 64 MB or 256 MB of memory, and it's not hard to reach 125 million transistors if you waste it on inefficient SIMD units and cache-like structures.
If the PhysX truly is that insignificant, then it has no place in the market.
First adopters will buy anything as long as the marketing can convince them. And I have to say AGEIA's marketing is pretty impressive. Unfortunately (for them), it's not going to last. They had some pretty bad reviews and won't be ready for the mainstream market any time soon.
Prior to 3dfx, we've been using CPU for 3D for decades too. 3D graphics is likewise just a bunch of MADs.
Wrong. You can't compare GPU versus CPU with PPU versus CPU. I've been working on software renderers for many years now, and it has become very clear that CPUs are quite capable of arithmetic shader instructions (MAD), but very bad at texture sampling (TEX). A texture sampler on the GPU can do one sample per clock cycle, while a CPU needs tens of clock cycles. The TEX instruction has a very long latency, but it's fully pipelined. A CPU has to emulate this with tens of instructions and can only continue after they've all finished.

Physics is a totally different story. It's plain math and there is no magic instruction like TEX that justifies using dedicated hardware. It needs mostly standard floating-point additions and multiplications. And a 3 GHz dual-core CPU is plenty good at that. GFLOPS are a direct indication of physics processing capability, and Core 2 Duo should beat PhysX P1 with just one core. Furthermore, don't underestimate CPU efficiency. Cache latencies are incredibly low and tuned SSE code can get really close to the theoretical GFLOPS. Last but not least, communication between cores is way faster than communication with an add-on card.

The laws of physics are against AGEIA. ;)

Parousia
28-Jun-2006, 12:20
The way I think about this is that physics is really visual in its end result. So if there has always been a need for better graphics, and physics is essentially visual, then sure, there is a need for better physics.
I can see your point but to me somehow graphics is distinctly different from physics.

Mate Kovacs
28-Jun-2006, 19:55
Physics only needs additions and multiplications.
Well, as long as you're talking about unconstrained motion. E.g. collision detection and response would be a bit problematic with just those. :)
I highly doubt there is any operation on the PhysX chip that we don't have on a CPU or GPU.
Even if that's true, it doesn't mean that there isn't any operation that we don't have on a CPU or GPU, though it would be useful for physics.

OpenGL guy
28-Jun-2006, 20:30
You can't compare GPU versus CPU with PPU versus CPU. I've been working on software renderers for many years now, and it has become very clear that CPUs are quite capable of arithmetic shader instructions (MAD), but very bad at texture sampling (TEX). A texture sampler on the GPU can do one sample per clock cycle, while a CPU needs tens of clock cycles. The TEX instruction has a very long latency, but it's fully pipelined. A CPU has to emulate this with tens of instructions and can only continue after they've all finished.
It's actually even worse for the CPU as GPUs are doing bilinear (or better) filtering in a single cycle :) So that's 4 (or more) samples per clock per pixel with filtering.

Geo
29-Jun-2006, 00:53
Which gpu is that doing better than bilinear filtering per cycle?

OpenGL guy
29-Jun-2006, 01:05
Which gpu is that doing better than bilinear filtering per cycle?
S3 had single cycle trilinear on a few chips and I believe the original GeForce did as well. PowerVR had some chips that could do single cycle trilinear for compressed textures.

Blazkowicz
29-Jun-2006, 01:20
CPUs will get more cores and SPE-like units, and GPUs will be even bigger SIMD monsters, much more suited to physics with the DX10 generation I think (multitasking, data output mechanism, FP32 rendertargets...)

PPU chips are doomed, or can only exist for one generation.

Demirug
29-Jun-2006, 07:35
PPU chips are doomed, or can only exist for one generation.

If a PPU can offer you more (usable) calculation power for less money than a GPU at the same job, it can survive if its makers are smart enough to support common interfaces. But I don’t believe that a future PhysX chip can challenge a future GPU.

_xxx_
29-Jun-2006, 08:33
Even if the PPU had ten times the calculating power it wouldn't help. The reason is the limitations of the PCI bus, introduced latency etc. In order for it to work as supposed, there would have to be a much faster connection between the PPU and the rest of the system as well as some tweaks in the gfx-driver to help that. I don't think nV and ATI will ever "help" there since they compete against it.

Nick
29-Jun-2006, 15:05
Well, as long as you're talking about unconstrained motion. E.g. collision detection and response would be a bit problematic with just those. :)
I meant to write "mostly" additions and multiplications. But I thought it was clear...
Even if that's true, it doesn't mean that there isn't any operation that we don't have on a CPU or GPU, though it would be useful for physics.
If you know any, please enlighten me (note that there has to be a significant advantage, and it has to be used a lot, just like TEX on a GPU)!

AlStrong
29-Jun-2006, 15:46
Even if the PPU had ten times the calculating power it wouldn't help. The reason is the limitations of the PCI bus, introduced latency etc. In order for it to work as supposed, there would have to be a much faster connection between the PPU and the rest of the system as well as some tweaks in the gfx-driver to help that. I don't think nV and ATI will ever "help" there since they compete against it.


APP (Accelerated Physics Port) ? :wink: But seriously, what about PCI-E; isn't that fast enough?

Mate Kovacs
29-Jun-2006, 16:32
If you know any, please enlighten me (note that there has to be a significant advantage, and it has to be used a lot, just like TEX on a GPU)!
Efficient and robust collision detection is the tough part of a dynamics simulator (integrating the equations of unconstrained motion is a piece of cake, especially when it comes to game physics).

So IMO, physics HW should have functions dedicated to collision detection *, like closest features tracking between polygonal models, some sort of broad-phase culling, etc.

EDIT: * And I mean, among other things, of course.
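To make "some sort of broad-phase culling" concrete, here is a minimal sweep-and-prune sketch in 2D. It is one common broad-phase technique, not a description of what the PhysX chip or any particular engine does. Boxes are sorted by their minimum x, and only boxes whose x-intervals overlap are compared on y. The box layout `(min_x, max_x, min_y, max_y)` and the function name are assumptions made for this example.

```python
def sweep_and_prune(boxes):
    """Return the set of index pairs (i, j), i < j, whose AABBs overlap.

    boxes: list of (min_x, max_x, min_y, max_y) tuples (assumed layout).
    Touching boxes count as overlapping.
    """
    # Visit boxes in order of where they start on the x axis.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active = []   # boxes whose x-interval may still overlap later ones
    pairs = set()
    for i in order:
        min_x, max_x, min_y, max_y = boxes[i]
        # Drop boxes that ended on x before this one starts.
        active = [j for j in active if boxes[j][1] >= min_x]
        for j in active:
            # x already overlaps; confirm on y.
            if boxes[j][2] <= max_y and min_y <= boxes[j][3]:
                pairs.add((min(i, j), max(i, j)))
        active.append(i)
    return pairs


if __name__ == "__main__":
    demo = [(0, 2, 0, 2), (1, 3, 1, 3), (10, 11, 0, 1), (1.5, 2.5, 5, 6)]
    print(sweep_and_prune(demo))
```

The work here is sorting, list maintenance and comparisons rather than long runs of multiply-adds, which is the kind of data-structure handling this post is pointing at.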

NocturnDragon
29-Jun-2006, 16:55
Which gpu is that doing better than bilinear filtering per cycle?

GeForce 1 (single cycle trilinear)

_xxx_
29-Jun-2006, 17:23
APP (Accelerated Physics Port) ? :wink: But seriously, what about PCI-E; isn't that fast enough?

16x could be, but it's just 1x and not even here yet. Only the PCI version is out so far. And I say "could" because, if it were really done properly, it would need a proprietary bus to the gfx card and require seamless integration within the gfx drivers.

Demirug
29-Jun-2006, 17:56
16x could be, but it's just 1x and not even here yet. Only the PCI version is out so far. And I say "could" because, if it were really done properly, it would need a proprietary bus to the gfx card and require seamless integration within the gfx drivers.

As long as you don’t want to transfer fully skinned objects 1x PCIe would be fine.
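A back-of-the-envelope budget for that claim, as a sketch only. The bus figures are theoretical peaks (about 133 MB/s for 32-bit/33 MHz PCI, shared by all devices, and about 250 MB/s per direction for one first-generation PCIe lane); real throughput is lower. The per-item sizes and the 60 fps target are assumptions chosen for this example.

```python
PCI_BYTES_PER_S = 133_000_000      # 32-bit, 33 MHz PCI, theoretical peak, shared bus
PCIE_X1_BYTES_PER_S = 250_000_000  # PCIe 1.x, one lane, one direction, theoretical peak

# Assumed payloads (hypothetical, for illustration only):
BYTES_PER_RIGID_BODY = 7 * 4  # position (3 floats) + orientation quaternion (4 floats)
BYTES_PER_VERTEX = 3 * 4      # position only, as for a fully skinned/deformed mesh


def items_per_frame(bytes_per_s, bytes_per_item, fps=60):
    """Upper bound on how many items fit through the bus in one frame."""
    return bytes_per_s // (fps * bytes_per_item)


if __name__ == "__main__":
    print("rigid bodies/frame, PCIe x1:", items_per_frame(PCIE_X1_BYTES_PER_S, BYTES_PER_RIGID_BODY))
    print("rigid bodies/frame, PCI:    ", items_per_frame(PCI_BYTES_PER_S, BYTES_PER_RIGID_BODY))
    print("vertices/frame, PCIe x1:    ", items_per_frame(PCIE_X1_BYTES_PER_S, BYTES_PER_VERTEX))
```

On these numbers, sending back one transform per rigid body leaves a lot of headroom, while sending every vertex of every deformed mesh uses the budget up far faster, which is the distinction the post draws. Latency is a separate question that this estimate does not cover.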

Additional chipsets could support fast point to point connections between PCIe cards.

If I understand nVidia right, they want to push the physics data from the physics card to the graphics card over an SLI bridge. This would be hard to beat.

_xxx_
29-Jun-2006, 18:13
As long as you don’t want to transfer fully skinned objects 1x PCIe would be fine.

For doing somewhat correct/realistic physics with the whole gameworld? I severely doubt it. I'm talking about absolutely every vertex here having physics-influenced properties/behaviours, which is what I would consider "correct".

Physics that is as "correct" relative to reality as today's gfx would be the least I'd expect there.

Chalnoth
29-Jun-2006, 18:18
I don't see why you need the physics meshes to be as high-detail as the visual meshes.

Nick
30-Jun-2006, 00:29
So IMO, physics HW should have functions dedicated to collision detection *, like closest features tracking between polygonal models, some sort of broad-phase culling, etc.
This doesn't need much more than standard floating-point additions and multiplications. Besides, I believe AGEIA designed its hardware to be generic enough to handle any physical calculation of the present and the future. It's fully programmable, but I very much doubt it has advanced out-of-order execution or threading to deal with long latency instructions.

_xxx_
30-Jun-2006, 08:27
I don't see why you need the physics meshes to be as high-detail as the visual meshes.

In that case, why would you need a PPU?

EDIT: if we're talking about "realistic" physics, it would have to be high detail. That's what Ageia's advertising, at least. If physics should NOT be that complex, that denies the need for a PPU, since your PC will be able to calculate low-detail physics without problems and without the aid of the PPU; see HL2 or such (which is what I'd describe as low-detail physics).

Mate Kovacs
30-Jun-2006, 09:23
So IMO, physics HW should have functions dedicated to collision detection *, like closest features tracking between polygonal models, some sort of broad-phase culling, etc.
This doesn't need much more than standard floating-point additions and multiplications.
I disagree.
If it's unable to efficiently handle the kind of data structures * that are needed for today's collision detection algorithms, then the fp add/mul performance isn't going to have any relevance.

* For which you need so much more than fp add/mul, IMHO.
EDIT: So it seems to me that it all comes down to how you interpret "much more", which is kind of wishy-washy for an argument to be based on it. :)

Nick
30-Jun-2006, 13:31
I disagree.
If it's unable to efficiently handle the kind of data structures * that are needed for today's collision detection algorithms, then the fp add/mul performance isn't going to have any relevance.

* For which you need so much more than fp add/mul, IMHO.
I'm sorry but you're not making much sense to me. Could you try to be more specific and use facts instead of humble opinions?

For all I know, software physics engines are limited by floating-point performance. In particular, well optimized physics engines are SSE limited. And Core 2 Duo will have two times more SSE execution units, which are twice as wide as a Pentium 4's. So compared to a single-core it's eight times faster. I can't see how this can be improved with 'dedicated' hardware.
So it seems to me that it all comes down to how you interpret "much more", which is kind of wishy-washy for an argument to be based on it. :)
Not really. Any other instruction I can think of which is useful for physics, is already present on CPUs and GPUs. So if the PPU has any exclusive instruction, it won't be used a lot. For GPUs it's instantly obvious to point out the need for a fully pipelined TEX instruction because it's hard to emulate with generic instructions. Now where is this 'obvious' specialized instruction for PPUs?

stepz
30-Jun-2006, 14:44
Unfortunately I don't have enough physics experience to offer straight facts, but I do have a (pretty well based) opinion.

AFAIK efficient collision detection and other similar physics algorithms need efficient access to advanced data structures (i.e. at least scatter-gather would be needed) and heavy vector based floating point processing power. GPUs do have the heavy floating point lifting equipment, but fail on the advanced data structures front. It seems that even in D3D10, scatter would be insanely slow. CPUs on the other hand are really good at data structures but fall short on the processing power front.

Another issue with GPUs might be branching granularity, but that is only a gut feel not based on anything solid.

The Ageia PPU architecture is significantly different from both GPU and (modern x86) CPU architecture. It's actually eerily similar to the Cell architecture. I feel that this is for a reason. You really can't get by with only streaming writes, which is the GPU credo, and with complex OOO CPUs you just don't have the room for enough parallel threads and vector units.

PS: Nick, given your assembly programming background, I'm surprised you got this wrong: Core2 architecture has twice the per clock vector power of P4, not four times. P4 does one SSE2 vector op per cycle when Fadds & Fmuls are interleaved. Core2 can do one Fadd and one Fmul each cycle, so twice the power and you still need 50/50 add/mul mix. Architecturally I think they have the same number of units but twice as wide and now on different ports. It's still 4 times as fast in total as a single core P4 though.

Mate Kovacs
30-Jun-2006, 15:47
For all I know, software physics engines are limited by floating-point performance.
Yeah, because they use efficient collision detection algorithms, so the integration code becomes the bottleneck. Which is obviously not the same as getting limited by collision detection done the dumb way (n^2), being unable to handle the data structures needed for an advanced algorithm efficiently.

In other words, if you can't do collision detection efficiently (being unable to handle the data structures necessary), you'll be slow (practically as well as theoretically) no matter how fast your fp add/mul is.

GPUs are not really good at handling those darn data structures.
Some posts from the neighbourhood:
http://www.beyond3d.com/forum/showthread.php?p=773607#post773607
http://www.beyond3d.com/forum/showthread.php?p=775117#post775117
http://www.beyond3d.com/forum/showthread.php?p=773275#post773275

CPUs are, on the other hand, well-suited to handle them, but they're not designed specifically for physics, so they're not especially good at doing e.g. fp-hungry stuff.

(Remember, I did not say that you don't need fp add/mul, my point was that you need "much more" than just those, and that fp performance is irrelevant if you don't have the functionality to make use of efficient collision detection algorithms. EDIT: BTW, you don't need fp operations to perform e.g. an AABB sweep test.)
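One way to read the remark about AABB tests not needing fp operations, as a sketch: if box bounds are stored as integers (for example on a fixed grid, which is an assumption of this example), the overlap test is nothing but integer comparisons. The tuple layout and function name are hypothetical.

```python
def aabb_overlap(a, b):
    """True if two axis-aligned boxes overlap (touching counts).

    a, b: (min_x, min_y, min_z, max_x, max_y, max_z) with integer
    coordinates (assumed layout). Only comparisons are used: no adds,
    no multiplies.
    """
    return all(a[k] <= b[k + 3] and b[k] <= a[k + 3] for k in range(3))


if __name__ == "__main__":
    box_a = (0, 0, 0, 10, 10, 10)
    box_b = (5, 5, 5, 15, 15, 15)
    box_c = (11, 0, 0, 12, 1, 1)
    print(aabb_overlap(box_a, box_b), aabb_overlap(box_a, box_c))
```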

Not really. Any other instruction I can think of which is useful for physics, is already present on CPUs and GPUs. So if the PPU has any exclusive instruction, it won't be used a lot. For GPUs it's instantly obvious to point out the need for a fully pipelined TEX instruction because it's hard to emulate with generic instructions. Now where is this 'obvious' specialized instruction for PPUs?
I'm not talking about single instructions, I'm talking about functionality. The whole architecture has to be designed with dynamics simulation in mind, to be able to carry out all the computations necessary for a simulation 'tick' on its own (no need for a CPU to detect those darn collisions, etc).

EDIT: And once more: I don't know if AGEIA's PhysX actually implements those facilities. I'm just saying that even if it does not, that's still no proof that we don't need PPUs that do.

Mate Kovacs
30-Jun-2006, 16:26
In particular, well optimized physics engines are SSE limited. And Core 2 Duo will have two times more SSE execution units, which are twice as wide as a Pentium 4's. So compared to a single-core it's eight times faster. I can't see how this can be improved with 'dedicated' hardware.
After you've done with collision detection, all the integration can be done in parallel, so it needs some kind of streaming model. But before that, you need random memory access, branching and stuff like that to utilise efficient collision detection algorithms.

So it'd be like if you did the collision detection (and response) on a CPU, the integration on a GPU, but you didn't have to send all the data back and forth between them.

PS: I can't be "much more" specific than that. :)
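For the integration half of that split, a minimal sketch of one semi-implicit Euler step. Each body is updated from its own state only, with nothing but additions and multiplications, which is why this part streams well. The dict layout, the gravity constant and the absence of any forces other than gravity are simplifying assumptions for this example, not how any particular engine stores its data.

```python
def integrate(bodies, dt, gravity=(0.0, -9.81, 0.0)):
    """Advance every body by one time step (semi-implicit Euler).

    bodies: list of {"pos": (x, y, z), "vel": (x, y, z)} (assumed layout).
    Returns a new list; no body reads another body's state, so the loop
    could be split across cores or shader invocations.
    """
    out = []
    for body in bodies:
        # Velocity first, then position using the updated velocity.
        vel = tuple(v + g * dt for v, g in zip(body["vel"], gravity))
        pos = tuple(p + v * dt for p, v in zip(body["pos"], vel))
        out.append({"pos": pos, "vel": vel})
    return out


if __name__ == "__main__":
    print(integrate([{"pos": (0.0, 10.0, 0.0), "vel": (0.0, 0.0, 0.0)}], dt=0.5))
```

The collision detection and response stage that has to run before this step is where the branching and random memory access come in.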

Nick
30-Jun-2006, 19:19
AFAIK efficient collision detection and other similar physics algorithms need efficient access to advanced data structures (i.e. at least scatter-gather would be needed) and heavy vector based floating point processing power. GPUs do have the heavy floating point lifting equipment, but fail on the advanced data structures front. It seems that even in D3D10, scatter would be insanely slow.
Scatter can be turned into gather by writing out the indices and then performing a gather to do the updates. This will become much easier with Direct3D 10, which supports integer operands and output streams.
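A CPU-side sketch of that idea, to show the shape of the transformation rather than any actual Direct3D 10 code. A direct scatter writes `out[dest[i]] = values[i]`. The gather version first writes out (destination, value) records, then lets every output element look up its own record. Sorting plus binary search is just one way to do the lookup, and the sketch assumes each destination is written at most once.

```python
from bisect import bisect_left


def scatter(values, dest, size, fill=0.0):
    """Reference version: each input writes to an arbitrary output slot."""
    out = [fill] * size
    for v, d in zip(values, dest):
        out[d] = v
    return out


def scatter_as_gather(values, dest, size, fill=0.0):
    """Same result using only reads at computed addresses (gather).

    Assumes destinations are unique.
    """
    # Pass 1: write out the indices alongside the values, ordered by destination.
    records = sorted(zip(dest, values))
    keys = [d for d, _ in records]
    # Pass 2: every output element j fetches the record addressed to it, if any.
    out = []
    for j in range(size):
        k = bisect_left(keys, j)
        out.append(records[k][1] if k < len(keys) and keys[k] == j else fill)
    return out


if __name__ == "__main__":
    print(scatter_as_gather([1.0, 2.0, 3.0], [4, 0, 2], size=5))
```

The extra sort and the per-element search are the price of avoiding arbitrary writes, so how fast this runs in practice depends on the hardware.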
CPUs on the other hand are really good at data structures but fall short on the processing power front.
Core 2 Duo is going to be a really big step to improve that.
The Ageia PPU architecture is significantly different from both GPU and (modern x86) CPU architecture. It's actually eerily similar to the Cell architecture. I feel that this is for a reason. You really can't get by with only streaming writes, which is the GPU credo, and with complex OOO CPUs you just don't have the room for enough parallel threads and vector units.
It's similar to Cell except that its clock frequency is only a fraction of it! A CPU makes up for its lack of high parallelism with a high clock frequency and high efficiency.
Nick, given your assembly programming background, I'm surprised you got this wrong: Core2 architecture has twice the per clock vector power of P4, not four times. P4 does one SSE2 vector op per cycle when Fadds & Fmuls are interleaved. Core2 can do one Fadd and one Fmul each cycle, so twice the power and you still need 50/50 add/mul mix. Architecturally I think they have the same number of units but twice as wide and now on different ports. It's still 4 times as fast in total as a single core P4 though.
The information that's available now is a bit limited to determine the exact configuration. But as far as I know a Pentium 4 can process only 64-bit per clock cycle, because it has only 64-bit wide SSE units (requiring a second cycle to start the other half of the instruction), and only one port for both addition and multiplication. From what I can tell, Core 2 Duo has two 128-bit SSE units each on a different port. It's probably one adder and one multiplier, but with interleaved instructions that's four times faster than a Pentium 4. So that's four times faster for a single-core, eight times faster for a dual-core.
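The two accountings in this exchange can be laid side by side with a small sketch. It counts single-precision operations per clock as units × lanes × cores and ignores clock speed, memory and instruction mix entirely. The unit counts plugged in are the ones claimed in the posts, not verified hardware specifications, and the function name is made up.

```python
def flops_per_clock(units, unit_width_bits, cores=1, float_bits=32):
    """Peak float operations started per clock under a simple model."""
    lanes = unit_width_bits // float_bits
    return units * lanes * cores


if __name__ == "__main__":
    # Accounting in the post above: P4 sustains 64 bits of SSE work per clock.
    p4_a = flops_per_clock(units=1, unit_width_bits=64)
    # Accounting in the quoted reply: P4 sustains one 128-bit op per clock.
    p4_b = flops_per_clock(units=1, unit_width_bits=128)
    # Both agree on Core 2: one 128-bit add plus one 128-bit mul per clock.
    core2 = flops_per_clock(units=2, unit_width_bits=128)
    core2_dual = flops_per_clock(units=2, unit_width_bits=128, cores=2)
    print("64-bit P4 model: ", core2 / p4_a, core2_dual / p4_a)
    print("128-bit P4 model:", core2 / p4_b, core2_dual / p4_b)
```

The first model gives the 4x per core and 8x dual-core figures; the second gives 2x per core and 4x dual-core. The whole disagreement comes down to what a Pentium 4 sustains per clock.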

In practice, dual-core brings a whole lot more than double the processing power. If previously a game used say 25% of a single core to do physics processing, then moving that task to the second core of a dual-core allows four times more physics processing. Combined with the wider execution units of Core 2 Duo that's pretty phenomenal. Add to this that the 25% has been freed up for other tasks (on top of the overall more efficient architecture), and that Direct3D 10 resolves some driver bottlenecks, and doing complex physics on the CPU becomes really interesting!
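The headroom argument as arithmetic, in a small sketch. It treats the result as an upper bound: it assumes physics scales perfectly onto the second core and ignores synchronisation between the cores. The `per_core_speedup` parameter is a hypothetical knob for folding in a per-core gain such as the wider SSE units; the 25% figure is the post's own example.

```python
def physics_headroom(share_before, share_after=1.0, per_core_speedup=1.0):
    """How many times more physics work fits per frame (upper bound).

    share_before: fraction of one core spent on physics originally.
    share_after:  fraction of a core available to physics afterwards
                  (1.0 = the whole second core).
    per_core_speedup: assumed gain from a faster core, if any.
    """
    return (share_after / share_before) * per_core_speedup


if __name__ == "__main__":
    print(physics_headroom(0.25))                        # second core only
    print(physics_headroom(0.25, per_core_speedup=4.0))  # plus a claimed 4x per core
```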

stepz
30-Jun-2006, 20:19
Scatter can be turned into gather by writing out the indices and then performing a gather to do the updates. This will become much easier with Direct3D 10, which supports integer operands and output streams.
I'd hazard a guess that this would be pretty slow. But in essence, yes its possible.

Core 2 Duo is going to be a really big step to improve that.

It's similar to Cell except that its clock frequency is only a fraction of it!
Slower, but wider. Anyway, I'm not taking an opinion one way or the other on the Ageia PPU. Just saying that physics processing wants a distinctly different model from either GPU or CPU, at least in their current form. I don't know the feature plans of GPU makers, but AMD has stated interest in doing asymmetric multicore processors, with some processors being small vector cores.


The information that's available now is a bit limited to determine the exact configuration. But as far as I know a Pentium 4 can process only 64-bit per clock cycle, because it has only 64-bit wide SSE units (requiring a second cycle to start the other half of the instruction), and only one port for both addition and multiplication.
AFAIK sufficient information is available about the pipeline configuration. You're right that the P4 has 64-bit SSE units on one port, but the port is used only for issue. The P4 (like the current K8) can schedule a 128-bit add on one cycle and a 128-bit mul on the next.

From what I can tell, Core 2 Duo has two 128-bit SSE units each on a different port. It's probably one adder and one multiplier, but with interleaved instructions that's four times faster than a Pentium 4.
Correct: the vector ports are Fadd + mov, Fmul/Fdiv + mov, and shuffle + mov.

Nick
30-Jun-2006, 20:35
In other words, if you can't do collision detection efficiently (being unable to handle the data structures necessary), you'll be slow (practically as well as theoretically) no matter how fast your fp add/mul is.
This is true for the GPU, but it's going to improve considerably with Shader Model 4.0 and unified architectures.
CPUs are, on the other hand, well-suited to handle them, but they're not designed specifically for physics, so they're not especially good at doing e.g. fp-hungry stuff.
True for a Pentium 4, not much longer for a Core 2 Duo.

My only point is that AGEIA gets a lot of competition from different sides. And my prediction is that it's not going to survive it, because there's nothing unique enough to throw dedicated hardware at. Give CPUs more floating-point power, or GPUs more programmability, and you have perfect physics processors. And that's what's already happening...

Nick
30-Jun-2006, 20:47
So it'd be like if you did the collision detection (and response) on a CPU, the integration on a GPU, but you didn't have to send all the data back and forth between them.
With a PPU you also have to send data back and forth to the CPU (and then to the GPU), just at another stage. And I wouldn't be surprised if AGEIA still did some processing on the CPU.

So it still seems better to me to either do all the processing on a powerful CPU, or do some of the processing on the CPU and let the GPU handle the rest and immediately start rendering the results (and send them back to the CPU to update the scene).

Nick
30-Jun-2006, 21:00
You're right that the P4 has 64-bit SSE units on one port, but the port is used only for issue. The P4 (like the current K8) can schedule a 128-bit add on one cycle and a 128-bit mul on the next.
So the port can take a 128-bit instruction for each execution unit every two clock cycles, starting a 64-bit operation on each execution unit every cycle? So it can sustain one 128-bit instruction every clock cycle (interleaved)? Interesting! It makes sense of course, to keep all execution units busy. And in this case Core 2 Duo would indeed be exactly two times faster (per clock per core).
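The "exactly two times" figure can be checked with a small calculation. This is only a sketch using the unit widths and counts claimed in this thread (one adder plus one multiplier per core, kept busy by interleaved instructions); none of the numbers are taken from vendor documentation, and the function name is made up.

```python
def sp_ops_per_clock(unit_width_bits: int, units: int) -> float:
    """Peak single-precision operations started per clock, per core.

    Assumes every unit starts a new operation each cycle, i.e. adds and
    multiplies are perfectly interleaved as described in the post above.
    """
    lanes_per_unit = unit_width_bits / 32  # 32-bit floats per unit
    return lanes_per_unit * units


pentium4 = sp_ops_per_clock(unit_width_bits=64, units=2)   # adder + multiplier
core2 = sp_ops_per_clock(unit_width_bits=128, units=2)     # adder + multiplier

print(pentium4)          # 4.0
print(core2)             # 8.0
print(core2 / pentium4)  # 2.0
```

Under these assumptions the per-clock, per-core ratio is 2, and the earlier estimate of 4 only follows if the Pentium 4 cannot overlap its add and multiply units.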

I never owned a Pentium 4... :D

stepz
30-Jun-2006, 21:30
edit: Checked it over; according to Hans de Vries (http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html) it's the same on K8 (Athlon 64). And K7, if I'm not mistaken.

Mate Kovacs
30-Jun-2006, 22:05
Give CPUs more floating-point power, or GPUs more programmability, and you have perfect physics processors. And that's what's already happening...
I mostly agree on this. You can pretty much get away with more fp power (and parallelism) on the CPU or more flexibility on the GPU side, but I still don't think it means that a dedicated physics HW couldn't be more efficient by any means. If you define efficiency as "how little work you have to do on existing stuff", then any PPU will be 'inefficient', compared to modified CPUs/GPUs, of course.

With a PPU you also have to send data back and forth to the CPU (and then to the GPU), just at another stage.
Well, I'd argue that it depends on the type of the PPU. :) You could make such a PPU that'd need the whole stuff only once, then only the external forces/thrusts acting at the beginning of each tick, so it wouldn't need to send anything back to the CPU, only explicit queries or notifications would be necessary.
EDIT: And it'd need a direct link to the GPU, of course.

And I wouldn't be surprised if AGEIA still did some processing on the CPU.
Me neither, sadly. :)

So it still seems better to me to either do all the processing on a powerful CPU, or do some of the processing on the CPU and let the GPU handle the rest and immediately start rendering the results (and send them back to the CPU to update the scene).
The method using the GPU as well sounds more convincing to me, because the integration is a piece of cake for any stream processing thingy. But I guess you'd need SM4.0 to do it efficiently. I mean, if you have e.g. rigid bodies, then you integrate the positions/orientations in a stream, and besides sending the results back to the CPU, you want to render a model according to each position/orientation pair, but probably a different one for each body. Is this even possible with SM4.0? IIRC it's not possible with the simple geometry instancing present in SM3.0.
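To illustrate why the integration step suits a stream processor, here is a minimal sketch in which the same small kernel runs independently on every body. Semi-implicit Euler is my choice for the example, not something the post specifies; orientation is left out, and all names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def integrate_body(pos: Vec3, vel: Vec3, acc: Vec3, dt: float) -> Tuple[Vec3, Vec3]:
    """One kernel invocation: reads one body, writes one body."""
    new_vel = (vel[0] + acc[0] * dt, vel[1] + acc[1] * dt, vel[2] + acc[2] * dt)
    new_pos = (pos[0] + new_vel[0] * dt,
               pos[1] + new_vel[1] * dt,
               pos[2] + new_vel[2] * dt)
    return new_pos, new_vel


def integrate_stream(positions: List[Vec3], velocities: List[Vec3],
                     accelerations: List[Vec3], dt: float):
    """Apply the kernel to every element; no element depends on another."""
    results = [integrate_body(p, v, a, dt)
               for p, v, a in zip(positions, velocities, accelerations)]
    return [r[0] for r in results], [r[1] for r in results]


new_pos, new_vel = integrate_stream(
    positions=[(0.0, 0.0, 0.0)],
    velocities=[(1.0, 0.0, 0.0)],
    accelerations=[(0.0, -10.0, 0.0)],
    dt=0.5,
)
print(new_pos)  # [(0.5, -2.5, 0.0)]
print(new_vel)  # [(1.0, -5.0, 0.0)]
```

Because each output depends only on the matching input element, the loop can be split across as many parallel units as are available; the hard part raised in the post (drawing a different model per body from the results) is not addressed by this sketch.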

JF_Aidan_Pryde
01-Jul-2006, 16:46
I wonder how 'big' of a problem physics is, especially compared to graphics.

For graphics, in the beginning, it was mostly about texture sampling. Today it has grown to involve vertex and pixel shaders, multiple textures, AA and stencil units. When the first graphics accelerators were released, one could still get by with software renderers; letting the CPU do point sampling was still a feasible fallback. But with the years, the graphics load stacked up. Now it's inconceivable that the CPU can do all the work that it's currently doing plus all the shader, texture and AA work.

What about physics? Are there a whole bunch of improvements that can be had with additional hardware, as analogous to graphics?

The improvement in graphics also came from better shader models. We went from flat, gouraud to phong as hardware improved. Is there a similar pattern with physics algorithms?

Without really understanding the scope of physics, the range of available algorithms, their complexity and how well they map to specialized hardware, I don't think we can really predict if physics hardware has a future.

SPM
01-Jul-2006, 20:42
AFAIK efficient collision detection and other similar physics algorithms need efficient access to advanced data structures (i.e. at least scatter-gather would be needed) and heavy vector-based floating point processing power. GPUs do have the heavy floating point lifting equipment, but fail on the advanced data structures front. It seems that even in D3D10, scatter would be insanely slow. CPUs on the other hand are really good at data structures but fall short on the processing power front.

Another issue with GPUs might be branching granularity, but that is only a gut feel not based on anything solid.

The Ageia PPU architecture is significantly different from both GPU and (modern x86) CPU architecture. It's actually eerily similar to the Cell architecture. I feel that this is for a reason. You really can't get by only with streaming writes, which is the GPU credo, and with complex OOO CPUs you just don't have the room for enough parallel threads and vector units.


Sounds like a good reason to use Cell on a PPU - with the PPE managing the data structures, a high bandwidth between the PPE and SPEs on the same chip, and scatter/gather list DMA to feed the SPEs, it should solve the problem of relatively low bandwidth between the PC's CPU and the PPU card. Cell is also considerably more powerful than the Ageia PPU, and should be a lot cheaper - if Ageia picks up defective Cell chips unusable on the PS3 because more than 2 SPEs have failed. I can't see why Ageia isn't doing this, unless Sony is preventing them from doing this.

SPM
01-Jul-2006, 21:00
Without really understanding the scope of physics, the range of available algorithms, their complexity and how well they map to specialized hardware, I don't think we can really predict if physics hardware has a future.

My thoughts exactly. I think a programming/modelling language for game physics, or extensions to existing programming/modelling languages/tools, would be more suited to exploiting the flexibility that is possible with physics implementations. Perhaps an extensible and end-user-modifiable set of C++ objects, or physics extensions to Cg or animation tools, would be more appropriate than an API.

Chalnoth
01-Jul-2006, 21:12
Why would you want a special language dedicated to physics?

SPM
01-Jul-2006, 21:41
Why would you want a special language dedicated to physics?

Because you would want to integrate physics into animation, and having user-modifiable objects incorporating physics properties would allow you to modify the physics and apply it in a custom way to suit the gameplay. For example, you could use rag-doll animations for distant objects that are hit, and physics-based animations for close-up objects. You could apply pseudo-physics to an object instead of an animated sequence to give more realism, rather than full physics, which would require more computing power. For example, rather than animating individual particles in an explosion in an aircraft, you may create an animation of the explosion and modify it based on a physics-based formula, e.g. to account for wind shear. Without this kind of flexibility you would run into the excessive computing overhead required to model everything with physics in real time, and you may end up with explosions and other effects looking the same, and therefore artificial, in all games that use the same API.

Chalnoth
01-Jul-2006, 22:20
Which you could program in C++ as well.

SPM
03-Jul-2006, 20:25
Which you could program in C++ as well.

Actually, a standard, well-thought-out, reusable source code object library in C++, consisting of physics primitives and more complex objects built up from those primitives, would be ideal (preferably GPLed so you could adapt the standard objects to suit). This would certainly be preferable to a set of API calls that allow manipulation of a fixed set of physics objects.

Chalnoth
03-Jul-2006, 23:53
Actually, a standard, well-thought-out, reusable source code object library in C++, consisting of physics primitives and more complex objects built up from those primitives, would be ideal (preferably GPLed so you could adapt the standard objects to suit). This would certainly be preferable to a set of API calls that allow manipulation of a fixed set of physics objects.
I completely disagree. As I've said a few times, physics is really, really simple in terms of what you can do. So I think it's highly amenable to an API structure.

For example, you can use the following variables to completely describe the motion of any rigid body:

1. Mass
2. Moment of Inertia tensor (3x3 symmetric matrix, so need only store 6 numbers)
3. Velocity (3 numbers)
4. Angular velocity (3 numbers)
5. Position (3 numbers)
6. Rotation (3 numbers)
7. Sum of all forces (3 numbers)
8. Sum of all torques (3 numbers)

Any rigid object, no matter how complex, can have its motion exactly calculated from the above terms. You can simplify the calculations if your object is a simple geometric form (often done in physics API's). Once you tack on to the above connections to other objects (constraints that will lead to applied forces), and collision detection, you will have a complete physics engine for use with rigid bodies.
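As a rough illustration of how compact that state is, the eight items in the list can be written down as a single record. This is only a sketch: the class name, field names and the ordering of the six inertia terms are my own assumptions, and only the translational part of the motion is computed.

```python
from dataclasses import dataclass, fields
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RigidBodyState:
    mass: float
    # Symmetric 3x3 tensor stored as 6 numbers (assumed order: xx, yy, zz, xy, xz, yz).
    inertia: Tuple[float, float, float, float, float, float]
    velocity: Vec3
    angular_velocity: Vec3
    position: Vec3
    rotation: Vec3
    force: Vec3    # sum of all forces
    torque: Vec3   # sum of all torques

    def scalar_count(self) -> int:
        """Count the individual numbers stored for one body."""
        total = 0
        for f in fields(self):
            value = getattr(self, f.name)
            total += len(value) if isinstance(value, tuple) else 1
        return total

    def linear_acceleration(self) -> Vec3:
        """Newton's second law, a = F / m, for the translational part."""
        return (self.force[0] / self.mass,
                self.force[1] / self.mass,
                self.force[2] / self.mass)


body = RigidBodyState(
    mass=2.0,
    inertia=(1.0, 1.0, 1.0, 0.0, 0.0, 0.0),
    velocity=(0.0, 0.0, 0.0),
    angular_velocity=(0.0, 0.0, 0.0),
    position=(0.0, 0.0, 0.0),
    rotation=(0.0, 0.0, 0.0),
    force=(4.0, 0.0, -2.0),
    torque=(0.0, 0.0, 0.0),
)
print(body.scalar_count())         # 25
print(body.scalar_count() * 4)     # 100 (bytes, if stored as 32-bit floats)
print(body.linear_acceleration())  # (2.0, 0.0, -1.0)
```

So the whole list comes to 25 numbers per body, or about 100 bytes in single precision. The angular update (which needs the inertia tensor and the torque) is deliberately left out of this sketch.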

Anyway, what I'm attempting to say here is that problems with game physics are sufficiently simple and few in their possible combinations that they are amenable to an API interface. Sure, you might consider a more advanced physics engine that deals with, say, fluids, deformable solids, or breakable solids, but those are also similarly-simple and tenable problems to tackle in a completely general way within an API.

MipMap
08-Jul-2006, 10:13
[QUOTE=Nick] Physics only need additions and multiplications.

Try to do physics calculations without division - it's like binary without the '1's
:lol:

MipMap
08-Jul-2006, 10:32
[QUOTE=JF_Aidan_Pryde]I wonder how 'big' of a problem physics is, especially compared to graphics.

Graphics is about shading and rendering millions of pixels derived from 100,000's of polygons (currently).

Physics is about calculating (probably) 100,000's to millions of interactions between irregular volumes with mass.

Although I cannot provide hardcore facts here, it is much easier to shade a polygon than to calculate a volume interaction (such as collision between one curved object and another).

(If you are in any doubt about the above statement then check out some of Dinesh Manocha's publications on the topic at this link http://www.cs.unc.edu/~dm/)

As the demand for scenes containing large quantities of such interactions grows (lots of explosions etc) (assuming it will, this is not a given) hardware acceleration for these operations could be a good idea.

I don't see many gamers shelling out 300$ for an add-in board however - but I could imagine a graphics board with a 50$ extra chip on board...

Maybe we are really seeing the birth of a new computer peripheral chip here - it sort of reminds me of when sound cards came out - there were nearly as many people saying "who needs a sound card" or "who would pay 200$ for a sound card" then as there are people saying the same about physics accelerators now...

MipMap
08-Jul-2006, 10:39
problems with game physics are sufficiently simple ...

I agree in principle for forward calculations such as "given this object with this mass and velocity what will its position be in 10 sec." but inverse calculations are not so straightforward.

Thus, calculating if two curved surfaces collide is not so easy. Currently, tacky heuristics are used in most games which allow you, for example, to stick the barrel of your gun through the wall and shoot your opponent, since the game engine cannot discriminate the intersection accurately...
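For contrast, the cheap end of the spectrum looks something like the bounding-sphere overlap test below. This is an assumed example of the kind of shortcut meant here (the post doesn't name a specific heuristic), and the function name is hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spheres_overlap(center_a: Vec3, radius_a: float,
                    center_b: Vec3, radius_b: float) -> bool:
    """True if two bounding spheres touch or intersect.

    Compares squared distances, so no square root is needed.
    """
    dx = center_a[0] - center_b[0]
    dy = center_a[1] - center_b[1]
    dz = center_a[2] - center_b[2]
    dist_sq = dx * dx + dy * dy + dz * dz
    reach = radius_a + radius_b
    return dist_sq <= reach * reach


print(spheres_overlap((0.0, 0.0, 0.0), 1.0, (1.5, 0.0, 0.0), 1.0))  # True
print(spheres_overlap((0.0, 0.0, 0.0), 1.0, (3.0, 0.0, 0.0), 1.0))  # False
```

A test like this is a handful of multiplies and adds, but it only answers the question for the spheres, not for the shapes inside them. A long, thin object such as a gun barrel fits its bounding sphere badly, which is one way the clipping artifacts described above can arise.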

Chalnoth
08-Jul-2006, 10:59
By simple I meant that there are few enough combinations for game physics to be amenable to an API format. Yes, collision detection is the hardest problem in game physics.

P.S. End those quotes in your posts with [/quote]...

Blazkowicz
08-Jul-2006, 12:14
Thus, calculating if two curved surfaces collide is not so easy. Currently, tacky heuristics are used in most games which allow you, for example, to stick the barrel of your gun through the wall and shoot your opponent, since the game engine cannot discriminate the intersection accurately...

I'm under the impression that in most games, bullets are fired from the player's point of view (which may be consistent neither with the gun's muzzle nor the actual playermodel's eyes :)), and in any good game the shot is stopped or more or less weakened by a wall depending on the projectile's "penetrativity" and the wall material (don't hide behind a wooden door).

but I'm only considering FPS and I'm particularly thinking about counterstrike 1.5 :oops:

_xxx_
08-Jul-2006, 14:27
I'm under the impression that in most games, bullets are fired from the player's point of view (which may be consistent neither with the gun's muzzle nor the actual playermodel's eyes :)),

Yeah, the guns in Q3 and D3 engine games are about 1m in front of the player in mid-air. I hate that.

IgnorancePersonified
11-Jul-2006, 04:04
from elitebastards.com I got this link (http://enthusiast.hardocp.com/article.html?art=MTA5NywxLCxoZW50aHVzaWFzdA==)

All dressed up with no-one to blow is a phrase that comes to mind.

Fred
14-Jul-2006, 11:12
Chalnoth, that's a prohibitively large amount of information. You can store that information per object, but not per fragment, which you ultimately need and want if your engine is going to look anything remotely realistic. The bandwidth alone necessary is immense, never mind the calculations that are going to be involved in nonlinear regimes.

That's the catch with physics. We can use easy-going algorithms that approximate lighting and so forth b/c our eye is easily tricked *sometimes*. Not so with our brain's ability to predict physical phenomena. Every time a ball hits the edge of a square and doesn't deform and scatter appropriately, we get an impulse that says 'fake'.

The whole trick then is to decide what level of coarse graining we can get away with. Presumably something that's adaptable, as some scenes can be dealt with per object, whereas others will require per fragment. I know of no good way this can be decided by hardware; only a software programmer can do it right. Hence my aversion to physics hardware and hardcoded limitations.

Chalnoth
14-Jul-2006, 11:27
Chalnoth, that's a prohibitively large amount of information. You can store that information per object, but not per fragment, which you ultimately need and want if your engine is going to look anything remotely realistic. The bandwidth alone necessary is immense, never mind the calculations that are going to be involved in nonlinear regimes.

That's the catch with physics. We can use easy-going algorithms that approximate lighting and so forth b/c our eye is easily tricked *sometimes*. Not so with our brain's ability to predict physical phenomena. Every time a ball hits the edge of a square and doesn't deform and scatter appropriately, we get an impulse that says 'fake'.

The whole trick then is to decide what level of coarse graining we can get away with. Presumably something that's adaptable, as some scenes can be dealt with per object, whereas others will require per fragment. I know of no good way this can be decided by hardware; only a software programmer can do it right. Hence my aversion to physics hardware and hardcoded limitations.
Just talking rigid body dynamics, there's no reason at all to store things like the moment of inertia and whatnot per-fragment. Nobody's really going to notice if any ball smaller than a kickball doesn't deform when it hits the edge of a cube. Deformable objects are another beast entirely.

IgnorancePersonified
11-Sep-2006, 01:44
Anandtech (http://www.anandtech.com/video/showdoc.aspx?i=2828) have an update with City of Villains.

Skrying
11-Sep-2006, 02:13
Article points out why Ageia will fail.

It'll be only for the ultra high end, and because it takes extra work to add to a game's code, game developers won't see the benefit when only 1% of their players will end up using one.

The article proved that if you're running a single core and looking to get the best improvement then going to dual core is better than picking up a PhysX.

RacingPHT
11-Sep-2006, 04:41
Scatter can be turned into gather by writing out the indices and then performing a gather to do the updates. This will become much easier with Direct3D 10, which supports integer operands and output streams.

I seriously doubt that.. it's hard (or impossible) to emulate an append operation, or anything else that relies heavily on pointers. For example, how do you build a tree whose leaves contain a list of nearby objects on the GPU?

Scatter may be easier with geometry shaders, which support an "append" operation, but it is still very restricted in other respects. Such work may still land on the CPU and need a lot of inter-chip communication, with pipeline stall penalties.
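To make the objection concrete, here is what such a structure looks like on the CPU. The sketch uses a uniform grid rather than a tree, purely to keep it short (that substitution is mine), and the function names are hypothetical. The point is the data-dependent `append` in the build loop.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


def build_grid(positions: List[Vec3], cell_size: float) -> Dict[Cell, List[int]]:
    """Bucket object indices by the grid cell that contains them."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for index, (x, y, z) in enumerate(positions):
        cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        # Variable-length list, address chosen by the data: an append.
        grid[cell].append(index)
    return grid


def nearby(grid: Dict[Cell, List[int]], cell: Cell) -> List[int]:
    """Collect objects in the given cell and its 26 neighbours."""
    cx, cy, cz = cell
    found: List[int] = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                found.extend(grid.get((cx + dx, cy + dy, cz + dz), []))
    return sorted(found)


positions = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (5.5, 0.5, 0.5)]
grid = build_grid(positions, cell_size=1.0)
print(nearby(grid, (0, 0, 0)))  # [0, 1]
print(nearby(grid, (5, 0, 0)))  # [2]
```

On a CPU the append is a pointer bump; without scatter or an append primitive, each list's length and write position have to be worked out some other way first, which is the restriction the post is describing.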

SPM
30-Sep-2006, 19:13
I completely disagree. As I've said a few times, physics is really, really simple in terms of what you can do. So I think it's highly amenable to an API structure.

For example, you can use the following variables to completely describe the motion of any rigid body:

1. Mass
2. Moment of Inertia tensor (3x3 symmetric matrix, so need only store 6 numbers)
3. Velocity (3 numbers)
4. Angular velocity (3 numbers)
5. Position (3 numbers)
6. Rotation (3 numbers)
7. Sum of all forces (3 numbers)
8. Sum of all torques (3 numbers)

Any rigid object, no matter how complex, can have its motion exactly calculated from the above terms. You can simplify the calculations if your object is a simple geometric form (often done in physics API's). Once you tack on to the above connections to other objects (constraints that will lead to applied forces), and collision detection, you will have a complete physics engine for use with rigid bodies.

Anyway, what I'm attempting to say here is that problems with game physics are sufficiently simple and few in their possible combinations that they are amenable to an API interface. Sure, you might consider a more advanced physics engine that deals with, say, fluids, deformable solids, or breakable solids, but those are also similarly-simple and tenable problems to tackle in a completely general way within an API.


Yes, physics primitives are simple, but the reason I suggested a documented object library built on a set of more primitive objects, rather than a fixed API, is that you can go into the source code of the standard physics library and tweak things to change behaviour and effects the way you want, rather than be limited to the features of the API. I am talking about more developed and specific objects like explosions, rag dolls, smoke trails, water coming out of a hose etc., i.e. a set of partly done and documented effects which you could then tweak and customise, or add to, in order to suit your application. The point is, if you are dealing with things that are that simple, why do you need an API either - APIs will be more restrictive than object libraries.

Guden Oden
01-Oct-2006, 04:08
The article proved that if you're running a single core and looking to get the best improvement then going to dual core is better than picking up a PhysX.
Proved schmoved.

Most people don't have PCs with mobos that accept dual core CPUs. A PPU however needs only an empty PCI slot (and a molex power connection), and most people have several unfilled slots.

Skrying
01-Oct-2006, 04:41
Proved schmoved.

Most people don't have PCs with mobos that accept dual core CPUs. A PPU however needs only an empty PCI slot (and a molex power connection), and most people have several unfilled slots.

$50 motherboard or $300 physics card. Hmmmmmmm.

Even better, $50 motherboard, $180 CPU (that's a Core 2 Duo) and better general performance increases and still cheaper.

So.... what were you trying to prove again?