*spin-off* Naughty Dog, SPUs, Radiosity

The PhyreEngine GDC 2011 slides have the most comprehensive coverage of SPU rendering. Everything from vertex culling, skinning, screen tile categorization, particles, volumetric lighting, SSAO, and DoF to MLAA is presented in a coherent framework:
http://research.scee.net/files/presentations/gdc2011/2011-GDC-PhyreEngine.pdf

It has some performance numbers for these techniques too.

EDIT: I am stupid. Pasted the wrong link! :oops:

The real one is here:
http://www.scribd.com/doc/16104972/Deferred-Lighting-and-PostProcessing-on-PS3

The first link is an overview of PhyreEngine 3.0, which talks more about their high level features and Vita support.

Hmm... many other PS3 technical slides here too:
http://www.technology.scee.net/presentations

... including a Move SDK presentation. Didn't know the SDK already supports PS Eye voice recognition. >_<
 
Seems like geometry is costly in memory on the NGP.

Also, all those cool techniques make me wish for a dual or quad Cell as the PS4 CPU, with a massively improved PPU, double the local storage, and some other tweaks to the SPUs.
 
If they go with SPUs, Sony may need to figure out how to couple the SPUs more tightly with the GPU, like compute shaders. That way SPU work and GPU work can be combined into "one pass" easily. Anyway, this is next-gen speculation stuff.

The techniques in those slides are awesome indeed. It's a rather old presentation; I didn't pay attention to it until a few days ago when looking at the Battlefield 3 slides. I think some of the basic "Work Smarter" principles can be applied to the 360 and WiiU too. The Battlefield 3 engine (FB2?) improved on the basic Edge library, e.g. for triangle culling, DICE also implemented more custom culling (not occlusion culling).
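For reference, the core of that kind of per-triangle culling is just a signed-area test on post-transform screen-space vertices. The sketch below is a plain C++ illustration with made-up types, not Edge's or DICE's actual code; real SPU jobs work on DMA'd vertex batches in local store and also do frustum and too-small-to-rasterize rejection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: a post-transform vertex with 2D screen coordinates.
struct ScreenVert { float x, y; };

// Returns a compacted index list containing only the triangles that survive
// backface and zero-area rejection (assuming counter-clockwise front faces).
std::vector<uint32_t> cullTriangles(const std::vector<ScreenVert>& verts,
                                    const std::vector<uint32_t>& indices)
{
    std::vector<uint32_t> visible;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        const ScreenVert& a = verts[indices[i + 0]];
        const ScreenVert& b = verts[indices[i + 1]];
        const ScreenVert& c = verts[indices[i + 2]];

        // Twice the signed area of the triangle in screen space.
        float area2 = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);

        // Non-positive area means backfacing or degenerate: cull it.
        if (area2 > 1e-6f)
        {
            visible.push_back(indices[i + 0]);
            visible.push_back(indices[i + 1]);
            visible.push_back(indices[i + 2]);
        }
    }
    return visible; // only this shortened index list is handed to the GPU
}
```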

The volumetric lighting part talks about how they implement it on the SPUs by sampling the shadowmap (instead of relying on the artist to place/fake the effect).
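The idea is essentially a ray march from the camera towards each pixel's world position, testing samples against the light's shadow map and accumulating the lit fraction. A minimal sketch of that, assuming the shadow lookup is supplied as a callback (the names are invented for the example, not taken from the PhyreEngine slides):

```cpp
#include <functional>

struct Vec3 { float x, y, z; };

static Vec3 lerp(const Vec3& a, const Vec3& b, float t)
{
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

// March from the camera to the pixel's world position, counting how much of
// the ray is directly lit according to the shadow map. The result drives the
// in-scattering ("god ray") term, with no artist-placed fake geometry needed.
// shadowTest: returns 1.0 if the world-space point is lit, 0.0 if shadowed.
float volumetricLight(const Vec3& camPos, const Vec3& pixelPos, int steps,
                      const std::function<float(const Vec3&)>& shadowTest)
{
    float lit = 0.0f;
    for (int i = 0; i < steps; ++i)
    {
        float t = (i + 0.5f) / steps;   // real versions jitter/dither this
        lit += shadowTest(lerp(camPos, pixelPos, t));
    }
    return lit / steps;                 // scaled later by fog density etc.
}
```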

Will see if I can find any Geomerics-on-SPU presentation.
 
Can't find anything specific to SPU Geomerics yet. But I found Bizarre Creations' SPU-assisted rendering slides from a year ago:
http://www.spuify.co.uk/?p=355

Pretty interesting too. The lighting runs on only 3 SPUs and is slightly faster than RSX (there is other work shared between the two). Not sure if they max out the h/w. ^_^
 
Also, all those cool techniques make me wish for a dual or quad Cell as the PS4 CPU, with a massively improved PPU, double the local storage, and some other tweaks to the SPUs.
Recent fully programmable GPUs (Fermi and AMD GCN) are more efficient at all the tasks presented in that document. These chips have super fast local work memory in their shader modules (for data sharing between threads), substantially faster branching than the current console GPU hardware is capable of, and synchronization primitives (barriers, atomics, etc.). They also have full access to the texture sampler units (fast filtering, fast texture decompression, fast untiling) and fast random access to the whole memory space (good latency hiding).

So you can share SSAO calculations with neighboring pixels, do efficient light cone culling on the GPU (for really efficient screen-space tiled deferred shading), implement very efficient sorting algorithms such as radix sort (simple bitonic sort is really slow), etc.

And you don't have to worry about the high latency of mixed GPU->CPU->GPU calculation, reducing your g-buffer bit depth to the minimum because of bandwidth issues, or trying to implement some custom software texture filtering/caching system to approximate 16x anisotropic filtering for your high-quality EVSM shadows.
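To make the tiled-deferred point concrete, here is a minimal CPU-side sketch of per-tile light culling; it assumes lights are already projected to screen space with a screen-space radius. A real compute-shader version would instead build a per-tile frustum from the depth buffer (min/max Z per tile) and keep the light list in fast shared memory, one thread group per tile. The types and names are invented for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative only: a light already projected to screen space.
struct ScreenLight { float x, y, radius; };

constexpr int TILE = 16; // pixels per tile side

// Build one light-index list per 16x16 screen tile, so shading later only
// loops over the few lights that actually touch each tile.
std::vector<std::vector<uint32_t>>
buildTileLightLists(int width, int height, const std::vector<ScreenLight>& lights)
{
    int tilesX = (width + TILE - 1) / TILE;
    int tilesY = (height + TILE - 1) / TILE;
    std::vector<std::vector<uint32_t>> tiles(tilesX * tilesY);

    for (int ty = 0; ty < tilesY; ++ty)
    for (int tx = 0; tx < tilesX; ++tx)
    {
        float minX = float(tx * TILE), maxX = minX + TILE;
        float minY = float(ty * TILE), maxY = minY + TILE;

        for (uint32_t li = 0; li < lights.size(); ++li)
        {
            // Closest point on the tile rectangle to the light centre.
            float cx = std::clamp(lights[li].x, minX, maxX);
            float cy = std::clamp(lights[li].y, minY, maxY);
            float dx = lights[li].x - cx;
            float dy = lights[li].y - cy;
            if (dx * dx + dy * dy <= lights[li].radius * lights[li].radius)
                tiles[ty * tilesX + tx].push_back(li); // light touches this tile
        }
    }
    return tiles;
}
```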

For current-generation consoles, you have to do what you have to do to save some precious GPU cycles. But it's not likely something that would be efficient for next generation, unless of course we have a console with too much CPU power. But I doubt that.
 
Sure, but there are limits to how much you can have the GPU do at one time. So having a CPU that's capable of it, even at slower rates, will always be a boon as well.
 
Sure, but there are limits to how much you can have the GPU do at one time. So having a CPU that's capable of it, even at slower rates, will always be a boon as well.
But then alternatively you could spend those CPU transistors on the GPU instead, so it can do more... ;) Who'd want to be a console hardware designer with all these choices to make?!
 
Which in turn you'd use to make prettier graphics, leaving cool things like physics by the wayside, and on and on in circles we go. :p
 
But then alternatively you could spend those CPU transistors on the GPU instead, so it can do more... ;) Who'd want to be a console hardware designer with all these choices to make?!

But considering you're opting for a two-chip design as opposed to an APU, if your GPU is already a monster chip such that making it any larger would make it far too big or too hot for a console, why not have a more capable CPU to go alongside it and supplement its functionality, as well as providing a more capable general computing resource for stuff like physics, AI etc.?

I'm keen to see pretty graphics next gen as much as anyone else, but I'm even keener to see what devs can do with a much stronger CPU for stuff like better animation, physics, AI etc... i.e. all the stuff that got thrown by the wayside this gen for the sake of making games more pretty (and we were doing so well this gen with games like RF: Guerrilla and its GeoMOD destruction... where did we go wrong :()
 
That's different to Xenus's idea of a CPU being used for graphics work, and Sebbbi saying CPUs are superseded in those techniques by current GPUs. To be specific, if the intention is to include a monster CPU with a view to it operating like Cell in PS3, you'd get better graphics results using those transistors on the GPU, but then you'd lose versatility for other tasks.
 
That's different to Xenus's idea of a CPU being used for graphics work, and Sebbbi saying CPUs are superseded in those techniques by current GPUs. To be specific, if the intention is to include a monster CPU with a view to it operating like Cell in PS3, you'd get better graphics results using those transistors on the GPU, but then you'd lose versatility for other tasks.

Oh I agree, Shifty, that a GPU would be more efficient at the graphics-related tasks than a beefier Cell chip. I wasn't directly commenting on Xenus's idea of beefing up Cell to make it even more capable for graphics tasks.

My post was more towards a scenario of choosing a two-chip solution with fixed chip area for both GPU and CPU; once you've chosen your GPU to be as big as you can conceivably run in a console without it being too hot and big, then why not have a Cell-based CPU chip, but improved for both graphics and general-purpose computing tasks?

I think mine is a slightly different premise: once you have a GPU designed to be as monstrous as you can run, a CPU chip with the versatility of CELL that can be used both for graphics and for running general game code faster wouldn't be redundant, would it? Devs would always find a use for it.

Edit:
Perhaps my premise is flawed, but it seems to me that CELL this gen has been limited in its employment by having to make up for the deficiencies of RSX. If liberated from that next gen, I'd be interested to see what devs could do with an improved Cell design on stuff like physics, AI etc. Or perhaps an entirely different CPU design would be even more efficient than CELL for this?

Or maybe i've gone way too far off topic :p
 
The whole concept of CPU vs GPU is flawed. You need to consider how much bandwidth you need, how much fixed function versus flexible cores, how much integer vs floating point and at what precision, how much branching logic etc.

From there, you design the data pipeline, and the way logic is allowed to interact with that data pipeline. That should inform how much of what kind of logic goes on which chips. Which part of that you call CPU or GPU is, in my view, pretty irrelevant. ;)
 
That's different to Xenus's idea of a CPU being used for graphics work, and Sebbbi saying CPUs are superseded in those techniques by current GPUs. To be specific, if the intention is to include a monster CPU with a view to it operating like Cell in PS3, you'd get better graphics results using those transistors on the GPU, but then you'd lose versatility for other tasks.

Ultimately it's about balance...

Most modern games leverage a large proportion of the CPU's compute power massaging massive amounts of data (vert transforms, animations, occlusion, culling, data streaming, etc.) to feed the GPU...

Unless you have a sufficiently balanced system, you're going to end up with bottlenecks whereby you have all this GPU horsepower to burn and not enough CPU cycles to feed it efficiently...
 
My post was more towards a scenario of choosing a two-chip solution with fixed chip area for both GPU and CPU; once you've chosen your GPU to be as big as you can conceivably run in a console without it being too hot and big, then why not have a Cell-based CPU chip, but improved for both graphics and general-purpose computing tasks?
If you're limited by GPU power and heat, then yes, chuck in as big a CPU as possible! :mrgreen:

Edit:
Perhaps my premise is flawed, but it seems to me that CELL this gen has been limited in its employment by having to make up for the deficiencies of RSX. If liberated from that next gen, I'd be interested to see what devs could do with an improved Cell design on stuff like physics, AI etc.
Yeah, if Cell weren't having to pick up the slack from RSX, it'd have a lot to offer non-graphics engine work. Maybe just an improved Cell with a decent OoO PPE would be enough for next gen, alongside a decent GPU.

Or maybe i've gone way too far off topic :p
Oh yeah, this is an ND software tech thread, not a next-gen design thread! :oops:
 
The GPU is better suited for massively parallel, math-heavy raw processing. CPUs are complex and consume a lot of power because they are designed to run complex, unoptimized, single-threaded code that has lots of branches. If you use a CPU to run simple, highly optimized code loops that have no branching and don't need complex sequential processing, you are wasting a lot of power and transistors. Of course Cell's SPUs are better in this case than common cache-coherent consumer CPUs, since they are simpler and have a higher math/logic density. But GPUs are almost pure math beasts, with very few transistors wasted on control logic.

You can never have too much GPU processing capacity in a gaming console. And with the new fully programmable GPUs, you can always use the remaining cycles to help the CPU in math heavy large scale calculations. So the GPU will never have any idle cycles. The more the merrier.

The next Visual Studio will have C++ AMP (Accelerated Massive Parallelism), and AMD will have their own C++-compatible GPGPU language extension. CUDA is also evolving all the time, so we are starting to get really good development tools for parallel GPU processing.
 
Not sure that's totally valid. GPUs only have massive throughput through massive parallelism. If you want to perform the same maths on 16 'objects' at once, great. But if you want to perform different maths on 8 'objects', then until individual shader units in a GPU can process independently, you're better off with highly clocked CPUs. I'm thinking particularly of audio synthesis as an example here, something I've long wanted to implement on Cell as a musical instrument. Each sound is typically a very linear set of calculations, which doesn't strike me as a good fit for GPGPU. What little I've seen about GPGPU in audio is things like spatial modelling, but not waveform generation.
 
Not sure that's totally valid. GPUs only have massive throughput through massive parallelism. If you want to perform the same maths on 16 'objects' at once, great. But if you want to perform different maths on 8 'objects', then until individual shader units in a GPU can process independently, you're better off with highly clocked CPUs.
Agreed. GPUs are only good for massively parallel operations (thousands of threads). To mask the memory latency you need many more threads than you have shader cores. If you have thousands of objects (physics-enabled debris, for example), then GPU processing is a pretty good idea. But running a single shader program for only 16 threads (16 objects) is basically just going to cause a big pipeline bubble (lots of wasted GPU cycles).
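As a rough illustration of why the thread count matters so much (all numbers here are made up for the example, not from any specific GPU):

```cpp
#include <cstdio>

int main()
{
    // Back-of-the-envelope latency hiding with assumed, illustrative numbers.
    const int shaderCores       = 320;  // ALU lanes on a hypothetical GPU
    const int memoryLatencyCyc  = 400;  // cycles for a cache-missing fetch
    const int aluCyclesPerFetch = 20;   // useful math between memory reads

    // Each lane wants roughly latency/work threads queued up, so that while
    // one thread waits on memory the others keep the ALUs busy.
    int threadsPerLane  = memoryLatencyCyc / aluCyclesPerFetch; // ~20
    int threadsInFlight = shaderCores * threadsPerLane;         // ~6400

    std::printf("Threads needed to hide latency: %d\n", threadsInFlight);
    // A 16-thread job leaves almost all of that machinery idle.
    return 0;
}
```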

The CPU is definitely better for serial (single-threaded) operations, where you apply the same code only to a small instance of data, or where you have a high data dependency on previous calculations. Many algorithms, however, can be parallelized very efficiently. Parallel algorithm design is still a bit of a dark science compared to well-known serial algorithm design, but luckily a lot of parallel algorithm research has already been done for supercomputers (starting from the early 60s), and many of the same algorithms are indeed pretty easy to port to GPUs (and even perform very nicely). It just requires a pretty different mindset, as every algorithm step needs to be parallelized (single GPU threads are very weak alone).
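A tiny illustration of that mindset shift is the classic sum reduction: instead of one loop adding one value per iteration, the work is restructured into log2(N) steps where every addition inside a step is independent, which is exactly the shape a GPU wants. The sketch below runs serially in plain C++, purely to show the structure:

```cpp
#include <cstddef>
#include <vector>

// Serial thinking: sum N values one at a time, each step depending on the last.
// Parallel thinking: pair values up and halve the problem each step; on a GPU
// every addition within a step would run on its own thread.
float parallelStyleSum(std::vector<float> v)
{
    if (v.empty())
        return 0.0f;
    for (std::size_t stride = 1; stride < v.size(); stride *= 2)
        for (std::size_t i = 0; i + stride < v.size(); i += 2 * stride)
            v[i] += v[i + stride]; // all adds in this inner loop are independent
    return v[0];
}
```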
 
I want an RPG where NPCs have daily schedules like in Ultima 7. The bakerman wakes up, goes to his bakery, puts the flour and water together to make bread, then you can buy it from him. Rob his house while he's away, kill him in his sleep, or make bread yourself.
It's been what, 20 years? I realize it's a lot more complex to do with 3D graphics and skeletal animation and speech and sound effects - but still, that world felt so much more alive than a lot of stuff we have today. And every time there's a choice, it's always about remaking the same dumb NPC with more polygons and textures and whatever else contributes to the looks, instead of trying to create the illusion of life through actions.
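To be fair, the underlying mechanism is pretty simple; it's the 3D content, animation, and audio that make it expensive today. Purely as an invented illustration (none of this is how Ultima 7 actually stored it), a daily schedule is little more than a table mapping time of day to a place and an activity:

```cpp
#include <string>
#include <vector>

// Invented example types: one entry says where an NPC should be and what he
// should be doing from a given hour onwards.
struct ScheduleEntry { int startHour; std::string place; std::string activity; };

struct Npc
{
    std::string name;
    std::vector<ScheduleEntry> schedule; // sorted by startHour, never empty

    // Pick the latest entry whose start time has passed.
    const ScheduleEntry& currentTask(int hour) const
    {
        const ScheduleEntry* current = &schedule.front();
        for (const ScheduleEntry& e : schedule)
            if (e.startHour <= hour)
                current = &e;
        return *current;
    }
};

// Usage: a baker who sleeps, bakes, sells what he baked, then relaxes.
// Npc baker{ "Baker", { { 0, "home",   "sleep"      },
//                       { 6, "bakery", "bake bread" },
//                       { 9, "bakery", "sell bread" },
//                       {20, "tavern", "relax"      } } };
```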
 