http://gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine
Video of the GDC 2015 talk about parallelizing the Naughty Dog engine with fibers.
Great stuff, my summary:
- All code is jobified (yes, all), except slow I/O, which runs on ordinary system threads. Any job can yield to other jobs in the middle of its execution
- Max 160 fibers (fiber = partial thread + stack) can be used at the same time
- PROBLEM: 3 months before the TLOUR release they were running at ~25 fps and heavily CPU bound: because of locks, the cores were far from 100% used (the GPU was not a problem at all).
- SOLUTION: to use nearly 100% of the cores with jobs (which they apparently achieved, judging by their graphs), split each frame into 3 stages (game logic, rendering logic, GPU exec) and work on 3 consecutive frames simultaneously during the 16.6 ms frame time: see 33:17 in the video
- Frame-centric design to make it simpler to work on several frames simultaneously: a new concept of frames with uncontended resources. Up to 16 frames tracked at most (only state, not data)
- PROBLEM: this eats a lot of memory! They are quickly running out of memory...
- SOLUTION: Tagged heap using only 2 MB blocks, very technical here...
- INPUT LAG: ~66 ms? (3 × 16.6 ms frame time + scan-out; not sure here, it's at 54:25 if someone wants to help me), but still a shorter input lag than TLOU on PS3
- Code your own fibers library! 5 or 6 functions max. "I am not a fan of PS4 fibers library...do your own fibers library"
- They paid no attention at all to cache coherency ("sounds weird coming from us"); they were 100% focused on keeping the cores busy. That echoes the Infamous SS devs' complaints about the CPU: they similarly had trouble keeping the cores busy, but apparently Naughty Dog found better solutions.
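
Since the tagged heap part is the most technical bit, here is a minimal sketch of how I understand the general idea. It is not Naughty Dog's code. The only detail taken from the talk is the 2 MB block size; the class name `TaggedHeap`, its methods, and the use of a plain integer as the tag (for example one tag per frame and stage) are my own assumptions for illustration. The point is that memory is handed out in whole blocks owned by a tag, and there is no per-pointer free: everything under a tag is released at once when that frame's work is done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a tagged heap; names and layout are assumptions.
class TaggedHeap {
public:
    // Fixed block size: 2 MB, the figure mentioned in the talk.
    static constexpr std::size_t kBlockSize = 2 * 1024 * 1024;

    TaggedHeap() = default;
    TaggedHeap(const TaggedHeap&) = delete;
    TaggedHeap& operator=(const TaggedHeap&) = delete;

    ~TaggedHeap() {
        for (auto& entry : blocksByTag_)
            for (void* block : entry.second) std::free(block);
        for (void* block : freeBlocks_) std::free(block);
    }

    // Hand out one whole block owned by `tag`, reusing a pooled block if any.
    void* allocBlock(std::uint64_t tag) {
        std::lock_guard<std::mutex> lock(mutex_);
        void* block = nullptr;
        if (!freeBlocks_.empty()) {
            block = freeBlocks_.back();
            freeBlocks_.pop_back();
        } else {
            block = std::malloc(kBlockSize);
        }
        if (block != nullptr) blocksByTag_[tag].push_back(block);
        return block;
    }

    // Release every block owned by `tag` back to the pool in one call.
    // Returns how many blocks were released.
    std::size_t freeTag(std::uint64_t tag) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = blocksByTag_.find(tag);
        if (it == blocksByTag_.end()) return 0;
        const std::size_t count = it->second.size();
        freeBlocks_.insert(freeBlocks_.end(), it->second.begin(), it->second.end());
        blocksByTag_.erase(it);
        return count;
    }

    // Number of blocks currently owned by `tag`.
    std::size_t liveBlocks(std::uint64_t tag) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = blocksByTag_.find(tag);
        return it == blocksByTag_.end() ? 0 : it->second.size();
    }

    // Number of released blocks waiting to be reused.
    std::size_t pooledBlocks() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return freeBlocks_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::vector<void*>> blocksByTag_;
    std::vector<void*> freeBlocks_;
};
```

With something like this, the game-logic stage of frame N would call `allocBlock(tagForFrameN)` whenever it needs scratch memory, and a single `freeTag(tagForFrameN)` once the frame has left the pipeline returns all of it, so nobody has to track individual allocations across the frames in flight. A real engine would presumably also sub-allocate inside each block (per thread, to avoid taking the lock on every small allocation); that part is left out here.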