Lighting of Killzone: Shadow Fall Slides

Too bad they didn't talk about the audio h/w given that audio data occupies the largest chunk in system memory.

Wondering what they do with the remaining 2 CPU cores.

I remember a fleeting rumor on GAF that Sony was disabling a core for yield. I kind of doubt it, since I've not heard anything like that for Durango, but it's a thought: one core disabled and one for the OS, or two for the OS...

And I guess not every hardware decision between 720/PS4 needs to be the same anyway.
 
Check some of the items where they show time in cycles and microseconds.
The simplest math is when they highlight a single item on one core, like the sAllocatorMutex::Lock slide.
It's 68,233 cycles / (42 × 1/1,000,000 s) ≈ 1.6 billion cycles/second, i.e. a clock of roughly 1.6 GHz.

edit:
It works the same for the soldier update slide, which has multiple packets that are handily summed up for us.
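
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the two inputs are the figures quoted from the sAllocatorMutex::Lock slide. Since the profiler shows whole microseconds, the result is an estimate, not an exact clock.

```python
def clock_rate_hz(cycles: int, microseconds: float) -> float:
    """Estimate a clock rate from a profiler sample given in cycles and microseconds."""
    seconds = microseconds / 1_000_000
    return cycles / seconds

# Figures quoted above for the sAllocatorMutex::Lock slide.
rate = clock_rate_hz(68_233, 42)
print(f"{rate / 1e9:.2f} GHz")
```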

Thanks, makes sense and it was nigh impossible to view on mobile.
 
Slide 11
Our approach on PS4 is similar to what we did to PS3.

We’re using our own job scheduler to distribute the load over 6 CPU’s.

A lot of the old SPU jobs easily fell into place, but it was mostly the rendering code.

We spent most of the time porting gamecode and AI to jobs and then a lot of time fixing crashes.

As of now all of our code now runs in parallel using jobs.
Where do you read that? It's not in the pdf linked in this thread.

Edit: Is it in that 582MB sized Powerpoint presentation you linked in the other thread?
Edit2: Indeed, the slide itself claims they are distributing the jobs over all cores (as in the pdf), but the annotations say that this actually means 6 threads.
I based my sentence above on the screenshots of the profiler, which show only 6 threads.
 
Their highlights.

‣PS4 is really easy to program for!
‣Wide multithreading is a must, consider using jobs
‣Be nice to the OS thread scheduler and avoid spinlocks
‣GPU is really fast!
‣Watch your vertex shader outputs
‣Don't be afraid of using conditionals
‣GDDR5 bandwidth is awesome!
‣If you map your memory properly
‣Use the smallest pixelformat for the job
‣Use compute (and tell us about your experiences)
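
To put a rough number on "use the smallest pixelformat for the job", here is a small sketch of render-target footprint per format. The 1920x1080 resolution and the selection of formats are assumptions chosen for illustration, not figures taken from the slides.

```python
# Bytes per pixel for a few common render-target formats (illustrative selection).
BYTES_PER_PIXEL = {
    "RGBA8": 4,        # 4 channels x 8 bits
    "R11G11B10F": 4,   # packed float format, 32 bits total
    "RGBA16F": 8,      # 4 channels x 16-bit float
    "RGBA32F": 16,     # 4 channels x 32-bit float
}

def target_size_mib(width: int, height: int, fmt: str) -> float:
    """Footprint of one render target in MiB, ignoring padding and tiling."""
    return width * height * BYTES_PER_PIXEL[fmt] / (1024 * 1024)

# Assumed 1080p target; swap in the real resolution as needed.
for fmt in BYTES_PER_PIXEL:
    print(f"{fmt:>10}: {target_size_mib(1920, 1080, fmt):6.2f} MiB")
```

Every full read or write of a target scales by roughly the same factor, so a format half the size costs about half the memory and half the bandwidth each time it is touched.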
 
I'm more surprised to see only 670k polys with lighting in city landscape scene.
Also only 1700 drawcalls for GPU in the same scene.
And 1000 jobs seems low.

---
Michal Valient is the new Carmack...

Lol? In what way?
 
Michal Valient is one of today's top engine developers. And he makes the best technical presentations!
 
I doubt we can use a static shot of some dirt to conclude we have reached diminishing returns at the beginning of next gen. ^_^

It's like focusing on Kate Upton's (or Cindy Crawford's) mole, and then concluding that all women look like crap.

Probably need to look at how things move and "communicate".


Nice slides btw. I don't understand all of it though. ^_^
At least it's clear how they tally up the memory map size.

Also confirmed that they moved from the mostly single threaded PPU code to the now jobs-oriented multi-threaded CPU code.

The real beauty is in the microdetail. The problem is that so is the ugliness! :)
 
Diminishing in comparison to PS2 to PS3 rubble pile jump? :) Could be, yes. But I think it's still a nice upgrade. The pic doesn't show it well since it's downsized and with the KZ3 asset rendered in the KZSF engine too. Walk up close to that pile and it'll become clear imo. That's what we should all hope and expect of next gen, more details!

I agree.
Though I am a "detail lover" so maybe I am biased ;)
 
Correct me if I'm mistaken here, but from those images up above it looks like the KZ:SF demo is using 4.5GB total?

So... does this mean Guerrilla had access to 8GB dev kits, or what's going on? It's nowhere near 8GB of course, but it's well above 4GB.

http://www.vgleaks.com/orbis-devkits-roadmaptypes/

Even the initial devkit had 8GB of RAM. I'm sure they developed the game with 4GB in mind, but now they are not optimizing it that much. :LOL:

Edit: Not 8 but 10.2 GB (System + Graphics Memory)
 
Only 670k polys for the environment; that's doable on current gen. In fact, the Uncharted games push a little over a million polys per frame for the environment.

Also that rubble in Killzone 3 looks like this:
http://abload.de/img/killzone36bsi0.png

You can see how a high poly count is needed here, so there are no diminishing returns.
 
I was quite pleased to see that the KZ4 rubble didn't use displacement mapping.
Properly used polygons can look a lot better, with a lot better shader efficiency.
 
Shifty, we have a very similar discussion split across 2 threads because the content was linked to in both. Any way we can combine it into 1 (perhaps the other Killzone presentation specific thread)?

Where do you read that? It's not in the pdf linked in this thread.

Edit: Is it in that 582MB sized Powerpoint presentation you linked in the other thread?
Edit2: Indeed, the slide itself claims they are distributing the jobs over all cores (as in the pdf), but the annotations say that this actually means 6 threads.
I based my sentence above on the screenshots of the profiler, which show only 6 threads.

Yes sir, it's the full PPTX, which has some very interesting little videos of things like their particle system demonstrating what they refer to as force fields. I have no idea if that's how it's normally done by other devs, but seeing it in action was very cool, IMO.

http://www.vgleaks.com/orbis-devkits-roadmaptypes/

Even the initial devkit had 8GB of RAM. I'm sure they developed the game with 4GB in mind, but now they are not optimizing it that much. :LOL:

Edit: Not 8 but 10.2 GB (System + Graphics Memory)

I have a sneaking suspicion that whether the retail PS4 was going to have 4 or 8 GB total, the demo code they ran on the devkits would have been the same. The current lack of memory optimization seems to be a common theme on some slides.

Slide 9
As you see on the slide, we have an alarming amount of render targets in the demo.

I honestly have no idea what’s in there but it looks like a good thing to dive into.

Slide 7
This is all we have in the system memory.

Sound is a mix of compressed and uncompressed in-memory samples, there is much more sound streaming from disk.

Physics meshes used to be a size issue on previous platforms.

Now they’ve become smaller and the memory is so much bigger they’re insignificant.

We still have to find out what goes into ‘various’, we haven’t really optimized for memory yet
 
Digital Foundry has their own analysis of the Postmortem up. They also have their own speculation about the job scheduler.

Guerrilla has evolved the model it developed for PlayStation 3 - it has one thread set up as an "orchestrator" (this would have been the PPU on the PS3), scheduling tasks which are then parallelised over every core. This is the so-called "jobs-based" technique that was used in a great many current-gen titles in order to make the most of the 360's six threads and the PS3's six available SPUs. In going "wide" across many cores, Guerrilla has upped the ante: 80 percent of rendering code was "jobified" on PS3, 10 per cent of game logic and 20 per cent of AI code. On PS4, those stats rise to 90 per cent, 80 per cent and 80 per cent respectively.

Interestingly, Guerrilla's presentation explicitly refers to "every" core being used, but the screenshots of the profiling tools - developed by the team itself owing to the work-in-progress nature of Sony's own analysis software - only seems to be explicitly identifying five worker threads. As of right now, we have no real idea of how much CPU time the PS4's new operating system sucks up and how much is left to game developers, and we understand that the system reservation is up in the air. However, the profiling tool shows that in the here and now there are indeed five workers threads, plus the "orchestrator" and each of them is locked to a single core. The inference we can draw right now is that while OS reservation hasn't been locked down, developers have access to at least six of the eight cores of the PS4's CPU.

I didn't realize one of the threads out of the six was just the orchestrator (leaving 5 worker threads).
 
They are actually labeled worker0 to worker5, so Eurogamer should check their counting abilities. ;)
If one looks a bit more closely, the job scheduler uses only a tiny amount of resources, and it also schedules jobs to its "own" core.
 
Ah, now I see what you mean (didn't notice that before). I take it you're referring to that second tier of tasks under the manager and scheduling jobs for Worker 0? You might want to shoot Richard or DF a tweet. He may have simply missed that.
 
Yeah... it would be silly to have one thread dedicated only for scheduling. Of course it schedules tasks for itself as well :)
 
They should add CryEngine to the list of physically based engines too :p There is even a new art-tech presentation that focuses on this quite a lot.
 
Probably a confusion in terminology. The orchestrator may just be the main thread, which coordinates the execution of (high level) app/game activities. The scheduler may run on all threads in a "time sliced" fashion. Schedulers are typically very lightweight. All 6 threads (including the main thread) can be scheduled to run jobs as long as they are free. Some jobs may be "bound" to specific thread(s).
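
A minimal sketch of that arrangement, assuming a shared job queue drained by five pool threads plus the main thread. This is not Guerrilla's scheduler: the names (`run_jobs`, `drain`) and the thread-safe-queue design are illustrative assumptions, and thread binding and jobs that spawn further jobs are left out.

```python
import queue
import threading

def drain(jobs: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull jobs until the queue is empty; any thread, including main, can call this."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        value = job()
        with lock:
            results.append((threading.current_thread().name, value))

def run_jobs(job_list, worker_count: int = 5) -> list:
    """Run every job on `worker_count` pool threads plus the calling (main) thread."""
    jobs: queue.Queue = queue.Queue()
    for job in job_list:
        jobs.put(job)  # all jobs are queued before any thread starts
    results: list = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=drain, args=(jobs, results, lock), name=f"pool-{i}")
        for i in range(worker_count)
    ]
    for w in workers:
        w.start()
    drain(jobs, results, lock)  # the "orchestrator" thread takes jobs too
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    out = run_jobs([lambda n=n: n * n for n in range(1000)])
    print(len(out), "jobs run on", len({name for name, _ in out}), "thread(s)")
```

In CPython the GIL keeps these threads from executing Python bytecode in parallel, so this shows the structure (no thread reserved purely for scheduling), not the speedup.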
 
I think we will see a lot of jobs move to the GPU over the life of the console, much like we saw things being moved onto the SPUs in Cell.

Good read!
 