PPU = good for vertex processing?

Neeyik

I was just thinking about the now-ubiquitous PPU and wondering about applications other than accelerating the calculations required for physics modelling; the first that popped into my mind was vertex processing, the reason being that it's a chip full of parallel 32-bit FPUs. The other reason for thinking this is the way that Intel are behind the technology - their graphics products perform all vertex processing via the CPU; how feasible would it be for a system with an Intel GPU and an AIB PPU to offload the vertex workload to the physics chip?
 
I think a PPU would only be good for low-end vertex work.

A PPU has a fairly complex array of functional blocks - so out of the 125m transistors they seem to be talking about for initial devices, there might only be 10% that could usefully do vertex work (i.e. the right kind of functionality arranged in the right kind of pipeline).

There's also the question of quantity of data. Consider low- to mid-range graphics versus high end physics: 30 frames per second (of graphics), each scene with 500,000 vertices versus 30 ticks per second (of physics) with 50,000 objects. Hey, I'm just guessing here.

If you assume that only 10% of a PPU can do vertex work, then it seems like it would be 2 orders of magnitude off target.
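
Quick back-of-the-envelope with the same guessed numbers (nothing official here, just the assumptions above):

Code:
#include <cstdio>

int main() {
    // Guessed workloads, not specs: low/mid-range graphics vs. high-end physics.
    const double verticesPerSec = 30.0 * 500000.0; // 15 million vertices per second
    const double objectsPerSec  = 30.0 * 50000.0;  // 1.5 million physics objects per second
    const double usableFraction = 0.10;            // assume ~10% of the PPU suits vertex work
    // (15e6 / 1.5e6) / 0.10 = 100, i.e. roughly two orders of magnitude short.
    std::printf("shortfall: %.0fx\n", (verticesPerSec / objectsPerSec) / usableFraction);
    return 0;
}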

Anyway, I base all this on nothing more than a glance at the block diagrams in the patent. It seems to me that you'd only get budget-level graphics out of a PPU-accelerated vertex system: CPU-PPU-GPU. And that's using a pretty expensive PPU...

Jawed
 
DiGuru said:
It would be very desirable if the PPU could do collision detection.

If it cannot do collision detection, it will not be a PPU.

If PhysX can do anything that NovodeX does, it will support collision detection.
 
Demirug said:
DiGuru said:
It would be very desirable if the PPU could do collision detection.

If it cannot do collision detection, it will not be a PPU.

If PhysX can do anything that NovodeX does, it will support collision detection.

Yes, but where are the vertices used to do the collision detection? Who will calculate the results? Or do you send bounding boxes to the PPU?
 
The PPU needs an object description that contains geometry data. There are many ways to do this: it can be anything from a simple bounding box to a full triangle mesh.
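
Purely as a sketch of what such a description might look like (the names are invented, not from any actual SDK):

Code:
#include <vector>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Hypothetical per-object description handed to the PPU - anything from a box to a full mesh.
struct CollisionShape {
    enum Type { BoundingBox, ConvexHull, TriangleMesh } type;
    Vec3              halfExtents; // used when type == BoundingBox
    std::vector<Vec3> vertices;    // used for hulls and meshes
    std::vector<int>  indices;     // triangle list, used when type == TriangleMesh
};

struct PhysicsObject {
    Vec3           position;
    Quat           orientation;
    Vec3           linearVelocity;
    float          mass;
    CollisionShape shape;
};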
 
Demirug said:
The PPU needs an object description that contains geometry data. There are many ways to do this: it can be anything from a simple bounding box to a full triangle mesh.

And what does it return?

What I mean is that you want the PPU to do the collision detection to avoid sending the updated data over again and again. And if it can do that and return position updates / changed geometry, it can do vertex processing. And since it's massively parallel, it will surely be better at it than the CPU, won't it?
 
Given the relatively large amount of memory on the PPU, I'd guess it holds the entire scene in that memory and on the CPU side you just send updated external forces. I'd assume it returns positions and velocities.
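
If that guess is right, the per-tick traffic might look something like this (function names are made up, just to show which way the data flows):

Code:
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct BodyState { Vec3 position; Vec3 velocity; };

// Hypothetical driver entry points - stubbed out here, not a real API.
void ppu_apply_force(int /*bodyId*/, const Vec3& /*force*/) {}
void ppu_step(float /*dt*/) {}
std::vector<BodyState> ppu_read_back_states() { return {}; }

void physics_tick(float dt, const std::vector<std::pair<int, Vec3>>& externalForces) {
    // The scene geometry already lives in PPU memory; only forces go out each tick...
    for (const std::pair<int, Vec3>& f : externalForces)
        ppu_apply_force(f.first, f.second);
    ppu_step(dt);
    // ...and only positions and velocities come back.
    std::vector<BodyState> states = ppu_read_back_states();
    (void)states; // the application would hand these to the renderer along with its meshes
}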

I still do not have a clear understanding of what a PPU actually does, nor how fast it really is. The patent, as I read it, describes little more than a large N-way vector processor with a command queue.

If that's actually what it is, it could probably do vertex work better than a general-purpose processor, though not as well as a GPU, but any efficiency you gained would be lost transferring the vertices to and from it.
 
Yes, that would be my guess as well: that it does the entire scene management and sends it directly to the GPU when ready. Otherwise, there is not very much you could gain that wouldn't be lost to the additional data transfers, as far as I understand things.

Edit: Alpha textured objects would be an interesting case. I don't think it will be able to handle that.
 
I think something isn't quite right about the above description although I may be misunderstanding things.

If a GPU can handle 500,000 vertices or so, then let's just go with that.

The highest number I've seen is that a PPU is supposed to handle up to 40,000 objects, but since 50,000 was used above, let's use that.

Now, each object would consist of a minimum of 4 vertices, forming a tetrahedron (a pyramid with a triangular base), which uses the fewest vertices and faces that can enclose a volume.

That would mean the smallest number of vertices a PPU could have available to it is 200,000. If cubes/boxes were used as the smallest available object, then the number of vertices available to work with is 400,000, since 8 vertices are needed to form an enclosed six-faced box. Given that Ageia was non-specific about this, we've no idea what type(s) of objects were assumed in the numbers that have been circulating. One could argue that the objects could, on average, be more or less complex than boxes, so the range would be:

200,000 <= # of vertices <= 400,000, or possibly >= 400,000

I need help as to whether being able to manipulate 200,000 vertices would be of value given the losses from needing to transfer data (it's getting close, at least, right? Close to the numbers of an older GPU/VPU?), but I think 400,000 or better would be worth it. (I'm still guessing, of course.)

The main problem with these numbers is that, first and foremost, a PPU is meant to accelerate physics interactions/calculations, making vertex processing a secondary task that will never get to use all of the part's processing power.

Even initially, when it makes the most sense for the PPU to be doing hair simulations etc. that only add aesthetic/visual flair to games, I think the PPU would be fully leveraged to that task, if for no other reason than to show its value and future potential. Once the PPU is allowed to directly affect gameplay - or rather, once gameplay requires interactions only a PPU could provide - I would think most people would not be happy with losing the aforementioned aesthetics, so the PPU would be under even more pressure to do more interactions instead of vertex processing.

My guess is that the PPU would be too busy doing other things even if it could be put to the task of doing vertex processing. Maybe I'm off, but that's how it looks to me.
 
scificube said:
The main problem with these numbers is that, first and foremost, a PPU is meant to accelerate physics interactions/calculations, making vertex processing a secondary task that will never get to use all of the part's processing power.
That's my main point.

One might argue that in amongst all the other work that the PPU does, early vertex rendering work might "drop out", sort of providing the triangle data for free. The PPU takes account of character animation, destructible objects and collisions, say, but it doesn't take account of the viewport or the level of detail required for graphics rendering. That, I think, is where the problems start.

If, on the other hand, you dedicate the entire PPU solely to graphics work, you still have a question over how appropriate the PPU's computational pipeline(s) are to graphics. And the level of detail.

As I said, I only glanced at the patent. I dare say I haven't the stomach for deconstructing it... I don't have the link for it, now - mainly because I just don't want to get involved in that level of detail - I'm surprised no-one else has tackled it...

Maybe it's the scary maths. Seems scarier than graphics...

Jawed
 
But if the PPU isn't able to directly manipulate the meshes, how is it going to do its stuff? Let's take something simple: a pile of boxes. One of them is hit by something. How are we going to compute the forces?

To be able to calculate things, the PPU needs to know there is a pile of boxes that are going to interact. Because if you only send the command "I've got this box here, it is hit by this force vector, calculate the movement", you will find that it then collides with some of the other boxes in the pile. So you end up sending an essentially endless stream of commands to the PPU. That won't work.

Another approach is to just off-load the actual calculations to the PPU. In that case, you use it as a co-processor for vector calculations. But the actual calculations are peanuts, compared to the amount of data you need to shift to and from the PPU, across the slow PCI(e) bus to tell it everything it needs to know to calculate a simple transaction of force. That won't work either.

And if you only make it calculate single forces, it needs to update all the position information and directions, send that back to the CPU, and update the scene that's in main memory. So, the CPU has to do the same transformations as well!

So, you NEED the PPU to be able to "see" the whole scene.
 
The patent describes an exemplary embodiment of the FPE inside the PPU as consisting of an integer unit, 4 scalar FPUs, and 4 4-way SIMD FPUs (AFAICT they are controlled with a single VLIW instruction stream). In terms of floating point power, that's roughly the equivalent of 4 "vertex shader units". IMO the notable difference is that each of the functional units in the FPE can exchange data with each other as well as with various memory banks. The programmer also has explicit control over what data gets sent where (via the DME), making it way more flexible than the vertex shader model. I would actually expect a PPU to be quite good at vertex work.

Static geometry can very well be identical between GPU and PPU, but any kind of dynamic stuff (skinned characters) is probably sent as rigid bodies { position, orientation, velocity, bounding volume, constraints (i.e. joint types), links to other rigid bodies } or particles.

The way I envision things is that the driver tracks the complete set of active and inactive rigid bodies and decides which ones need to be sent to the PPU for processing, receiving position/velocity/orientation updates in return. In the case of soft bodies (water surfaces, cloth), though, it could make sense for the PPU to just send a GPU-compatible mesh back to the driver.
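
Putting rough types on that (everything below is invented for illustration, not from the patent or the SDK):

Code:
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Roughly what might be sent to the PPU per dynamic object, per the guess above.
struct RigidBodyDesc {
    Vec3          position, velocity;
    Quat          orientation;
    Vec3          bvHalfExtents;   // bounding volume
    std::uint32_t constraintType;  // e.g. a joint type
    std::uint32_t linkedBodyId;    // link to another rigid body, if any
};

// What might come back each simulation tick.
struct RigidBodyUpdate {
    std::uint32_t id;
    Vec3          position, velocity;
    Quat          orientation;
};

// Driver-side bookkeeping: the full set of bodies, plus the subset chosen for this tick.
struct DriverState {
    std::vector<RigidBodyDesc> bodies;
    std::vector<std::uint32_t> activeIds;
};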
 
According to the NovodeX SDK manual, the core runtime doesn't need to know the exact shape of the object you want to do collision detection on. Most of the time it just uses a set of bounding volumes to "approximate" the original mesh (and the info it returns is just the updated position/orientation of the whole mesh, not one for each individual vertex). It's not 100% accurate, but the result is acceptable, especially in fast-moving scenarios. If the PPU works in the same way as the NovodeX runtime, it's obviously not suitable for vertex shading.
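
i.e. the application would just place the whole mesh itself with the single returned transform, rather than getting per-vertex results back - something like this (illustrative only, assumed names):

Code:
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; }; // assumed to be a unit quaternion

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
Vec3 add(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 scale(const Vec3& a, float s)     { return { a.x * s, a.y * s, a.z * s }; }

// Rotate v by q and translate by pos: v' = v + w*t + (u x t), with t = 2*(u x v).
Vec3 transformVertex(const Vec3& v, const Quat& q, const Vec3& pos) {
    Vec3 u{ q.x, q.y, q.z };
    Vec3 t = scale(cross(u, v), 2.0f);
    return add(add(add(v, scale(t, q.w)), cross(u, t)), pos);
}

// One position/orientation pair from the runtime is enough to place every vertex of the mesh.
void placeMesh(const std::vector<Vec3>& restPose, const Quat& q, const Vec3& pos,
               std::vector<Vec3>& worldOut) {
    worldOut.resize(restPose.size());
    for (std::size_t i = 0; i < restPose.size(); ++i)
        worldOut[i] = transformVertex(restPose[i], q, pos);
}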

I think a better way to utilize the PPU's power would be to do occlusion culling on it. The basic operations behind collision detection and some kinds of occlusion culling are quite similar; I don't think it would take much work to make the PPU and the NovodeX API usable for culling purposes.
 
Sorry, I obviously wasn't clear.

The scene the PPU sees is going to be much simpler than the one the GPU is rendering.

I also doubt that the PPU talks directly to the GPU. It almost certainly just returns positions and velocities for each of the simulated bodies to the application, and the application passes them to the renderer along with the appropriate meshes.
 
ERP said:
It almost certainly just returns positions and velocities for each of the simulated bodies to the application, and the application passes them to the renderer along with the appropriate meshes.

That would be my guess; at least, the current software version works this way.
 
ERP said:
Sorry, I obviously wasn't clear.

The scene the PPU sees is going to be much simpler than the one the GPU is rendering.

I also doubt that the PPU talks directly to the GPU. It almost certainly just returns positions and velocities for each of the simulated bodies to the application, and the application passes them to the renderer along with the appropriate meshes.

How could the PPU talk directly to the GPU? The two chips don't know anything about each other's private protocols.

You need the CPU to take the data from the PPU and convert it to a format that the GPU can work with. To have everything run at the same time, each chip (GPU, PPU) needs to work on a different frame. This adds one extra frame of latency.
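
Roughly like this (made-up names, just to show where the extra frame comes from):

Code:
// While the PPU simulates frame N, the CPU converts frame N-1 and the GPU draws it,
// so what ends up on screen always lags the simulation by one frame.
struct SimResults {};
struct GpuBuffers {};

// Hypothetical glue, stubbed for illustration.
bool       game_running()                      { return false; }
void       ppu_start_simulation(int /*frame*/) {}
SimResults ppu_read_results(int /*frame*/)     { return {}; }
GpuBuffers convert_for_gpu(const SimResults&)  { return {}; }
void       gpu_render(const GpuBuffers&)       {}

void frame_loop() {
    int frame = 1;
    ppu_start_simulation(0); // prime the pipeline with frame 0
    while (game_running()) {
        ppu_start_simulation(frame);                     // PPU: works on frame N
        SimResults prev    = ppu_read_results(frame - 1); // frame N-1 is what's ready now
        GpuBuffers buffers = convert_for_gpu(prev);       // CPU: repack for the GPU
        gpu_render(buffers);                              // GPU: draws frame N-1
        ++frame;
    }
}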
 
They don't know anything about each other until you reprogram the PPU ... software knows everything you know, and the PPU is almost certainly software-driven.
 
psurge said:
The patent describes an exemplary embodiment of the FPE inside the PPU as consisting of an integer unit, 4 scalar FPUs, and 4 4-way SIMD FPUs (AFAICT they are controlled with a single VLIW instruction stream). In terms of floating point power, that's roughly the equivalent of 4 "vertex shader units". IMO the notable difference is that each of the functional units in the FPE can exchange data with each other as well as with various memory banks. The programmer also has explicit control over what data gets sent where (via the DME), making it way more flexible than the vertex shader model. I would actually expect a PPU to be quite good at vertex work.

Static geometry can very well be identical between GPU and PPU, but any kind of dynamic stuff (skinned characters) is probably sent as rigid bodies { position, orientation, velocity, bounding volume, constraints (i.e. joint types), links to other rigid bodies } or particles.

The way I envision things is that the driver tracks the complete set of active and inactive rigid bodies and decides which ones need to be sent to the PPU for processing, receiving position/velocity/orientation updates in return. In the case of soft bodies (water surfaces, cloth), though, it could make sense for the PPU to just send a GPU-compatible mesh back to the driver.

Yes, I agree with that (and with most of the other posts as well ;)). That would make the most sense.

I think there are three different considerations:

1. Is the PPU only as good as the current software implementation, or is it able to do more and better calculations? In the latter case, you need the objects to match the "original" ones more closely, so very rough bounding boxes might work less well, and it might be better to just send the complete meshes over to it.

2. Is the PPU able to do more advanced stuff, like destructible objects, cloth and bouncing boobies? I assume so, as those would be the main selling points. If it cannot do those, what's the use of it? And if it can do that, it must be able to calculate and return meshes.

3. If the CPU has to do too much work and/or transfer too much data (calculate bounding boxes -> send to PPU -> wait -> get data back -> perform transforms according to the new positions and vectors -> send to GPU; see the sketch below), it might only make sense if the PPU can do it much better than the CPU could in that time.
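
Spelling out the round trip from point 3 (all names invented):

Code:
struct Scene {};
struct Bounds {};
struct Updates {};

// Hypothetical calls, stubbed so the flow is visible.
Bounds  compute_bounding_boxes(const Scene&)     { return {}; }
void    ppu_upload(const Bounds&)                {}
void    ppu_step(float /*dt*/)                   {}
Updates ppu_download_updates()                   { return {}; }
void    apply_transforms(Scene&, const Updates&) {}
void    submit_to_gpu(const Scene&)              {}

// The CPU-heavy path: if every tick looks like this, the PPU has to beat the CPU by a wide
// margin just to pay for the extra work and the bus traffic.
void naive_tick(Scene& scene, float dt) {
    Bounds bounds = compute_bounding_boxes(scene); // CPU work
    ppu_upload(bounds);                            // send to PPU over PCI(e)
    ppu_step(dt);                                  // wait
    Updates u = ppu_download_updates();            // get data back
    apply_transforms(scene, u);                    // CPU transforms to new positions/vectors
    submit_to_gpu(scene);                          // only then hand off to the GPU
}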

So, it would actually make sense to send over the whole scene as-is up front, and have the PPU do the transformations and send it to the GPU. Or at least do as much of the work as possible and return (transformed parts of) the updated scene to the CPU. Although the first possibility would restrict the extra effects the CPU can do, you could substitute most of them with shaders. And it is much faster. The only things that might be hard to do this way are complex sub-object skinning (when the PPU sends it to the GPU) and alpha-textured 2D meshes for the collision detection.

But then again, we don't know and it is all just speculation at this time. ;)
 