DICE's Frostbite 2

Let's not get too caught up in the Engine Wars folks...

Clearly engines are going to be pushing the hardware in different ways to achieve different end goals. Whether that's immediately tangible in terms of what you see on the screen is going to be the trade-off.
 
Looking at the number of SPU code permutations, it seems like they're doing a huge amount of work on the SPUs, and the slides say they're doing post-processing effects (depth-of-field etc., I assume) on the GPU rather than the SPUs. Most PS3 games seem to have gone the opposite route, offloading post-processing onto the SPUs. I'm curious what the reasoning was for offloading shading to the SPUs rather than what's "normally" done by the likes of PS3 exclusives.

Probably because it matches their run-time better? The SPUs may be busy with another frame at the same time. I remember the RSX is pretty capable at loads of pixel work too.


Yes, this is the only part of the "comparison" that's interesting to me. I'd like to understand why DICE felt material shading etc. was better suited to the SPUs than the GPU, where other developers, even first parties, have offloaded different things. That part isn't explained much in the slides. They basically just say they need to offload the GPU to push the overall graphics presentation.

I think in some subsystems, they have it running on both GPU and CPU. So it may be up to the specific games to decide which route suits them better.
 
Looking back at the slides for "Parallel Futures of a Game Engine" (http://www.slideshare.net/repii/parallel-futures-of-a-game-engine-2478448), I imagine that the shading code fits the description of "stateless pure functions, no side effects, data-oriented."
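To make "stateless pure functions, no side effects, data-oriented" concrete, here's a minimal sketch; all type and function names are illustrative, not DICE's actual code. The point is that the kernel reads only its inputs and writes only its outputs, which is what makes it easy to move between GPU and SPU:

```cpp
#include <cstddef>

// Hypothetical G-buffer sample and light source; names are illustrative only.
struct GBufferSample { float albedo[3]; float normal[3]; };
struct Light        { float dir[3];    float color[3]; };

// A "stateless pure function": the output depends only on the inputs,
// no globals, no side effects, so the same kernel could run per-tile
// on an SPU or per-pixel on a GPU without changing its meaning.
inline void shade(const GBufferSample& g, const Light& l, float out[3])
{
    float ndotl = g.normal[0]*l.dir[0] + g.normal[1]*l.dir[1] + g.normal[2]*l.dir[2];
    if (ndotl < 0.0f) ndotl = 0.0f;
    for (int c = 0; c < 3; ++c)
        out[c] = g.albedo[c] * l.color[c] * ndotl;
}

// Data-oriented driver: shade a flat chunk of samples in one pass.
void shadeChunk(const GBufferSample* in, std::size_t n, const Light& l, float (*out)[3])
{
    for (std::size_t i = 0; i < n; ++i)
        shade(in[i], l, out[i]);
}
```

Because there's no hidden state, chunks can be processed on any core, in any order, in parallel.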
 
It'll have to come down to the individual game, as the shift to SPUs does incur a memory cost, after all. In the case of CryEngine 3, they have the whole lighting chain to consider as well, from RSM generation to LPV rendering into a 3D texture etc. The dependency there for the final accumulation might be a headache (I'm not sure).
 
As in just inputs and outputs, stateless pure functions? I think so. As long as the data can be "streamed" in chunks systematically for processing, it should be fine.

The other possible considerations are the source and destination data locations (in system or video RAM), accuracy and complexity needs (is it branchy?), whether the RSX or SPUs are busier at that moment (and in the near future), plus what's the next thing to do.
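As a rough sketch of what "streamed in chunks" looks like in practice, here's a double-buffered tile loop. This is illustrative, not SPU code: on a real SPU the memcpy calls would be asynchronous DMAs that fetch tile N+1 while tile N is being shaded; plain memcpy stands in so the control flow is visible on any platform.

```cpp
#include <cstring>
#include <cstddef>

const std::size_t TILE = 64 * 64;   // floats per 64x64 tile, one channel

// Stand-in for the shading work done on a tile held in local store.
void processTile(float* tile, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) tile[i] *= 2.0f;
}

void streamTiles(const float* src, float* dst, std::size_t numTiles)
{
    static float local[2][TILE];    // two "local store" buffers
    if (numTiles == 0) return;
    std::memcpy(local[0], src, TILE * sizeof(float));        // prefetch tile 0
    for (std::size_t t = 0; t < numTiles; ++t) {
        std::size_t cur = t & 1, nxt = cur ^ 1;
        if (t + 1 < numTiles)                                // start next fetch
            std::memcpy(local[nxt], src + (t + 1) * TILE, TILE * sizeof(float));
        processTile(local[cur], TILE);                       // work current tile
        std::memcpy(dst + t * TILE, local[cur], TILE * sizeof(float)); // write back
    }
}
```

With real async DMA, the fetch of the next tile overlaps the processing of the current one, hiding the transfer latency.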
 
Yeah... the first parties have been doing stuff like this since U2 and KZ2, maybe earlier. SPU culling, for example, has been around since the beginning (it's one of the earliest Edge tools). But they refined it and matched it up with other tech here. Then they bring in the artists to make levels suited to the engine's characteristics and design direction.

It bears repeating that 3rd parties have been doing it for *ages* as well. In fact some of them did it (gasp) before ND or any first party on PS3 did. I wonder if people find that surprising. I still have no clue why 3rd parties continue to be thrown under the bus; they are just as competent and capable as anyone else. Maybe this is a lesson in marketing: the 1st parties constantly trumpet their code publicly whereas 3rd parties who do much of the same stuff don't, and as a result you have the "don't use the SPUs" type of nonsense recurring over and over again.


At the presentation they said they did it this way because rendering directly into system memory was so slow that it justified spending the time on the copy from local memory. I'm guessing they didn't bother transferring the baked lighting output because they're already summing the results of the SPU deferred pass with the results of lights calculated on the GPU, so there was no need to copy it across.

Later on, Matt Swoboda said during his PhyreEngine talk that they had measured local-to-system memory transfers for a 1280x720 buffer at 0.7ms, and he wasn't sure how they were doing it in only 1.3ms. Either way they must have hit a serious performance cliff with 4xMRT in local memory. I've never tried more than 2 simultaneous RTs on PS3, personally.
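For context on those figures, the back-of-the-envelope arithmetic is straightforward (my own calculation, not from the talk):

```cpp
// A 1280x720 buffer at 4 bytes per pixel, moved in a given number of ms.
double bufferBytes(int w, int h, int bytesPerPixel) {
    return double(w) * h * bytesPerPixel;   // 1280 * 720 * 4 = 3,686,400 bytes
}

// Effective transfer rate in GB/s for moving `bytes` in `ms` milliseconds.
double effectiveGBps(double bytes, double ms) {
    return (bytes / 1e9) / (ms / 1e3);
}
```

A 32bpp 720p buffer is about 3.5 MB, so moving one in 0.7ms works out to roughly 5.3 GB/s effective bandwidth.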

Interesting. Yeah, 1.3ms does sound way too long, hence why I presumed they were packing data from that discarded buffer into the other ones (i.e., doing more than just a memory copy), and also because resolves on 360 are around 0.4ms or so, that figure seemed way out there.

I'm curious to see what tile size they go with on PC and 360 as well, or if they need to tile at all. There's no choice but to tile on PS3 to make it usable on the SPUs, but it seems like there would be some redundancy that way. There must be an optimal processing size for a given platform, and 64x64 seems in the ballpark for PS3; I don't think they can go much bigger than that.

On 360 I guess it depends on whether they do this on VMX or the GPU, or a bit of both. They are keeping a fully functioning RSX implementation around (as 3rd parties have already been doing for years), which means they already have a fully working version on the 360 GPU. So what will they do? I wonder if they'll move the simplest part of the process to VMX with a small 32x32 tile size (or whatever works best with its register set for speed), and in parallel do the rest on the GPU with a far larger tile size, or perhaps with no tiling at all. Really curious to see the different implementations.
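The tile-count arithmetic is easy to sketch (illustrative only; resolutions and tile sizes are the ones discussed above):

```cpp
// Tiles needed to cover a span of pixels; edge tiles are partial,
// hence the round-up division.
int tilesAcross(int pixels, int tileSize) {
    return (pixels + tileSize - 1) / tileSize;
}

// Total tile jobs for a w x h frame at a given square tile size.
int tileJobs(int w, int h, int tileSize) {
    return tilesAcross(w, tileSize) * tilesAcross(h, tileSize);
}
```

At 1280x720, a 64x64 tile gives 20 x 12 = 240 jobs per frame, while 32x32 gives 40 x 23 = 920, so halving the tile edge roughly quadruples the number of jobs (and thus the per-job overhead paid).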

Regarding MRT, yeah, I had never done 4 either, just two, with one in XDR and the other in GDDR. I was told once by another coder that if you do 4xMRT on PS3 you should put two buffers in XDR, one at the top of XDR memory and one at the bottom, and the other two in GDDR in the same way, one at the bottom and one at the top, to maximize speed. I never looked into whether that was just an old wives' tale, or if there was any truth to it regarding speed benefits.


And I feel happily vindicated, as I always believed programmability and clever developers would push the envelope. ;)

Don't feel too vindicated :) You'll always be able to do more with fully programmable hardware compared to fixed function; I don't think anyone ever disputed that. But fully programmable means less performance per watt than fixed function, so it's not a silver bullet. The stuff you see taking many milliseconds on the SPUs could be done far faster with dedicated hardware, using less power in the process.
 
All the deferred shading stuff from DICE looks like it was done by a single programmer, and she was hired only about a year ago...
 
True, but based on the flow of discussion, we already know third parties like DICE are doing it. ^_^

Not sure about throwing third parties under the bus... We also know everyone, including the third parties, has been "forced" to use the SPUs since day one, because PPU performance is low and the RSX has a vertex limit. So everyone has to use the SPUs one way or another. That doesn't necessarily mean all would champion their use though. Third parties may have multiple platforms to worry about, after all; their goal is platform parity.

OTOH, Sony is on the hook to deliver the Edge tools to help all PS3 developers (low-level animation, SPU culling, ...). Deferred rendering and SPU post-processing were heavily featured in KZ2 first. MLAA debuted in GoW3, although The Saboteur has another custom SPU AA that's pretty good too; it seems to have been applied late in the dev cycle. OTOH, DICE's SPU occlusion culling came earlier than KZ3's. So perhaps over time, some third parties are starting to differentiate themselves on the SPUs. It looks like Sony is pushing for stereoscopic 3D this cycle so far.
 
I think they still have tons of work to do/reveal for a real game. The content needs to match up with the engine capability. Curious about their streaming system too.
 
Yeah. They haven't said anything about the virtual texturing system yet, which was also mentioned in their slides. It would be interesting to see how it compares to Brink (in a non-combative manner). Obviously it's quite different from MegaTexture, given the destructible environments.

The gameplay that was shown looked fairly polished, if brief. They didn't show the destruction yet, with the exception of that building falling over; there weren't any clips of walls or structures being chipped away or broken.
 
Someone needs to do a distributed PS3 renderer (for me ! for me !). :devilish:

The Cell programming model would be a good fit.
 
There's supposed to be virtual texturing as well. Remember how a PR guy said their tech allows the game to look as good on the consoles as on the PC, even with limited memory? The SPU slides also seem to point to it.
 
Don't know about most developers, but it is possible to end up with a smaller PS3 team even if it needs more development effort. You don't always get the resources you need. OTOH, more people doesn't always mean more success; you need the "right" people to build a good PS3 team.

Y'know, maybe I would have believed this in the early days of the PS3, when it was being outsold by the GBA. But for the last 3+ years we've seen solid effort put into PS3 development, outside of the random lesser ports. So I HIGHLY doubt this is the case even now.

Here's a quote about the development of the MT Framework:

The PC/360 version was developed by 3 people and then 5, and for the PS3 version 4 people were added.

So they start with 3, then 5 while working on the PC/360 version, then add an additional 4, making a total of 9 people for the PS3 version versus 5 for the PC/360. This can be found in this forum in the game developer presentation thread. It was back in 2007, and it's a good example of what I believe was, and still is, the norm.
 
Hmm, would collecting those numbers really mean anything? Doesn't the efficiency depend on how the data is stored and organized, and at what precision, which would vary by game?
 
Repi's DX11 paper is now up at: http://publications.dice.se/

Well, one of the slides explicitly mentions Frostbite 2 for future EA titles. Is it possible that one day I'll be playing a sports game running on the Frostbite engine?

Compute shaders: a 16x16 thread group for each tile? One thread per pixel? Do modern GPUs really have the threading performance to handle the overhead of threading at such a small granularity? I guess the answer is yes?
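The dispatch arithmetic helps answer that question (my own worked numbers, not from the paper):

```cpp
// Thread groups needed to cover a span of pixels at one thread per pixel;
// rounded up so edge tiles are covered.
int groupsFor(int pixels, int groupSize) {
    return (pixels + groupSize - 1) / groupSize;
}

// 16x16 = 256 threads per group, a multiple of both a 32-thread warp and
// a 64-thread wavefront, so the granularity maps well onto current GPUs.
const int kThreadsPerGroup = 16 * 16;
```

For 1280x720 that's a Dispatch(80, 45, 1): 3,600 groups of 256 threads each, which is far from "small" from the GPU scheduler's point of view.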
 
That could be interesting; we're starting to get a pretty nice collection of run times for various graphics processes on Cell (thanks to studios like DICE and Sony's exclusive studios).

That's useless, because all the timings given are heavily asset- and load-dependent. It's not like we could extrapolate from BF3's timings to C2's, or vice versa.
 
*Shrug* We won't know until we look at what those 4 PS3 developers were doing. They may need to undo existing work, and develop PS3 specific components from scratch. Sometimes they may also need to "live" with early decisions that may not make sense for PS3.
 
I almost spilled my drink upon seeing the first terrain screenshot ;)
Very impressive, especially considering how much trouble we have building any kind of terrain; we usually use per-shot matte paintings instead, because it's so complicated to get good results...
 