Digital Foundry Article Technical Discussion Archive [2014]

Face-Off: Final Fantasy 14: A Realm Reborn on PS4

http://www.eurogamer.net/articles/digitalfoundry-2014-final-fantasy-14-ps4-face-off

Previously, producer/director Naoki Yoshida said that the developers were targeting a native 1080p presentation for this PS4 version, with similar graphics quality to the PC version running on maximum settings. Taking a look at the framebuffer, we can indeed confirm a full HD resolution, backed up by a fairly standard FXAA implementation.

Image quality is a match for the PC version, right down to the slight texture blur and shimmering across sub-pixel elements of the scene. The pixel precision afforded by 1080p ensures scenery and characters in the near field appear reasonably clean and well-presented, although it fails to prevent the appearance of jaggies elsewhere.

All the niceties are in the right place, at least. Texture detail, character modelling and the majority of the effects work - from smoke and particles to reflections and transparencies - all appear identical across PS4 and PC, with both formats delivering a sharper and vastly more detailed representation of the game world than the murky-looking PS3 release. This allows fine details and intricate artwork to come through virtually unscathed by comparison, resulting in a huge visual advantage over the last-gen console.

The PC version operates with 16x AF in our shots, while the reduced effect on PS4 leads to texture details becoming blurred when viewed from sharp angles, although otherwise the artwork remains relatively crisp and clear.

A Realm Reborn does hit 60fps in some circumstances, but the truth is it's inconsistent. For the most part the game averages 30-45fps across an extended gameplay session, very occasionally dropping into the twenties during large battles featuring many players. V-sync staves off screen-tear though, and even this uneven frame-rate is miles higher than PS3, with the knock-on effect that control is much more responsive and there's less judder all round.
 
I wonder why they didn't mention the broken texture filtering in the game.
Using 16x AF makes the scenery a flickering mess.
I certainly preferred to use pure trilinear, as it was almost stable.
 
I wonder if an SSD would help PS4/XB1 with streaming draw issues? Anyone here using an SSD (PS4/XB1) with Trials Fusion? Outcome?

Both of my PCs use SSDs... I can't picture myself using anything less (gaming-wise), other than for massive storage needs with a standard HDD.
 
I wonder if an SSD would help PS4/XB1 with streaming draw issues? Anyone here using an SSD (PS4/XB1) with Trials Fusion? Outcome?
Xbox 360 streaming is faster if you install the game on a USB stick (the same is true for Trials Evolution and Fusion). Haven't tested SSDs on next gen consoles.

I am also improving our last-level (compressed) macroblock caches and prefetching logic for a future patch. The first patch already improved the streaming performance (reviewers didn't have the patch). Basically you only see blurry textures on full restarts.
 
sebbbi, would you be able to tell us whether you're now mainly ROP limited on the X1? Or is there fill headroom to potentially increase resolution further in future revisions of the game?
 
Xbox 360 streaming is faster if you install the game on a USB stick (the same is true for Trials Evolution and Fusion). Haven't tested SSDs on next gen consoles.

I am also improving our last-level (compressed) macroblock caches and prefetching logic for a future patch. The first patch already improved the streaming performance (reviewers didn't have the patch). Basically you only see blurry textures on full restarts.

Sounds good. I might purchase an SSD for my PS4 later this week and see how things pan out.
 
sebbbi, would you be able to tell us whether you're now mainly ROP limited on the X1? Or is there fill headroom to potentially increase resolution further in future revisions of the game?
ROPs don't matter that much anymore in modern graphics rendering. On any modern GPU, you should use compute shaders for anything other than triangle rasterization. Compute shaders write directly to memory (instead of using ROPs). I would estimate that a future engine would spend at least 2/3 of its GPU time running compute shaders (for graphics rendering). Of course, the remaining 1/3 of the frame might be partially ROP bound.

However, AMD GCN is quite good at g-buffer rendering, as it has full-rate 64-bit ROP output. This allows you to bit-pack two 32-bit g-buffers into a single 64-bit g-buffer and practically double your ROP rate (compared to other architectures). Bit-packing instructions are also very fast (shift + and/or in a single cycle). HDR particle rendering hasn't been ROP bound on any modern GPU (HDR particle rendering is always bandwidth bound, even on a 290X or Titan). That leaves us with shadow maps... Yes, shadow maps are often ROP bound and/or geometry bound, depending on your content and shadow map resolution. However, in our case (Trials Fusion) we already use exactly the same shadow map resolution on both Xbox One and PS4. So we can definitely manage that pretty well.
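The 64-bit bit-packing trick described above can be sketched as follows. This is an illustrative Python sketch, not actual engine code: in a real shader it would be a couple of HLSL shift/or instructions writing to a 64-bit render target, and the function names here are hypothetical.

```python
# Pack two logical 32-bit g-buffer values into one 64-bit word, so a
# single full-rate 64-bit ROP export carries both render targets.

def pack_gbuffer(rt0: int, rt1: int) -> int:
    """Pack two 32-bit g-buffer values into a single 64-bit word."""
    assert 0 <= rt0 < 2**32 and 0 <= rt1 < 2**32
    return (rt1 << 32) | rt0  # shift + or: single-cycle ops on GCN

def unpack_gbuffer(packed: int) -> tuple:
    """Recover the two 32-bit values (shift + and, also single cycle)."""
    return packed & 0xFFFFFFFF, (packed >> 32) & 0xFFFFFFFF
```

On hardware with full-rate 64-bit color output (as claimed above for GCN), exporting one packed 64-bit value per pixel instead of two 32-bit values effectively doubles the g-buffer fill rate.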

Disclaimer: Some of the info above about compute shaders doesn't hold for Trials Fusion. In a cross-generation game you need to make some compromises to support DirectX 10(.1) PCs and last-generation consoles. This mainly means that we use pixel shaders for some of the post-processing steps. However, only a very small percentage of pixel shaders are ROP bound. Do anything more complex in the pixel shader and you are automatically bound by something other than ROPs. ROP-bound shaders need to be very simple (shadow map rendering is a good example).

A good way to fight being ROP (and BW) bound is to combine multiple effects into a single shader. For example, our post-processing shader does tone mapping, exposure adjustment, color tinting, color correction (programmable 3D lookup for saturation / contrast / selective colorization / sepia / filmic effects / etc), gamma, bloom combine, DOF combine, and color space conversions, all combined into a single shader. The tile-based lighting does basically the same thing for lighting (combines all lights in a single pass). This is really efficient for bandwidth and is never ROP bound.
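The bandwidth argument for combining effects can be made concrete with a toy model. This is a hedged Python sketch (not Trials Fusion code, and the function names are invented): it counts one frame read plus one frame write per pass, which is why N separate post-effects cost roughly N times the memory traffic of one combined pass.

```python
# Toy model: each separate post-process pass reads and writes the whole
# frame; a combined pass chains all effects per pixel in one read/write.

def separate_passes(frame, effects):
    """Apply each effect as its own full-frame pass; count memory traffic."""
    traffic = 0
    for effect in effects:
        frame = [effect(p) for p in frame]
        traffic += 2 * len(frame)  # one read + one write of every pixel
    return frame, traffic

def combined_pass(frame, effects):
    """Apply all effects per pixel in a single pass."""
    def chain(p):
        for effect in effects:
            p = effect(p)
        return p
    return [chain(p) for p in frame], 2 * len(frame)
```

With, say, tone mapping, color correction and gamma modeled as three chained per-pixel functions, the separate-pass version touches memory three times as often while producing the same image, which matches the "combine everything into one shader" advice above.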
 
sebbbi:
In the DF article it mentions that you leveraged your virtual texturing system, something you implemented in the previous Trials games according to other interviews with you.

For Fusion [assuming time to learn or build was not a factor], would leveraging hardware PRT/Tiled Resources instead of your own virtual texturing system have resulted in any performance differences?
 
Anyone here using an SSD (PS4/XB1) with Trials Fusion?

Yes, a 256GB Samsung 830 Series in the PS4.


I don't have anything to compare it to, as I have no before and after. Give me a video along with the name of the course it's showing and I can try to do a comparison. Better still, I can take a video of my own and post that to Youtube when the next update hits. Then you can compare the non-SSD video to the SSD video for yourself.

Only sebbbi can tell you if texture streaming is bandwidth bound (which the SSD probably won't fix, since as I understand it there's a soft cap on the transfer rate) or if seek time is the bigger limiting factor (which an SSD should help address nicely). Or if there's some other, bigger, limiting factor altogether.
 
Yes, a 256GB Samsung 830 Series in the PS4.
I'm very interested in the results. I dropped a 1TB 7200rpm HDD in mine on day one, so I'm keen to see the disk I/O performance deltas between standard 5400rpm, 7200rpm, hybrid and SSD drives.
 
Xbox 360 streaming is faster if you install the game on a USB stick (the same is true for Trials Evolution and Fusion). Haven't tested SSDs on next gen consoles.

Is level loading in your game optimised for sequential reads of one big file (aka HDD friendly), or is it lots of small files where an SSD can really show its latency advantage?
 
Is level loading in your game optimised for sequential reads of one big file (aka HDD friendly), or is it lots of small files where an SSD can really show its latency advantage?

Even a single big file can require a good many seeks once the hard drive has fragmented enough.
 
1) That shouldn't be an issue for new XB1s.
2) I hope the consoles have decent fragmentation reduction. It's not really an issue with the occasional big file added. Perhaps background defrag of messed-up drives while doing low-stress activities?
 
Is level loading in your game optimised for sequential reads of one big file (aka HDD friendly), or is it lots of small files where an SSD can really show its latency advantage?
All our levels are created inside the game using exactly the same in-game level editor the users are using (to create user-created content). The level files are usually around 50 KB in size, and contain all the game logic info (triggers/events) + object/camera placement + animation. We load the 50 KB file in one go, build data structures based on that, and then we start to simulate. We keep the loading screen on until all our streaming subsystems tell us that they are finished. We stream textures, meshes, objects, terrain heightmap, etc. all on the fly. Starting a new level is not much different from warping the camera to a new location in the world, except that during level load we ensure that everything is streamed at full quality before we start, while a camera jump might show some blurry textures or lower-LOD objects occasionally.

It needs to be done in this way, because the levels don't exist offline, so there's no way of optimizing the package layout based on level file data usage patterns. There are already over 10k user-created levels, and obviously we can't optimize layout based on data that doesn't exist at game shipping time. But this isn't a big problem for us; the loading times of the levels are usually around 5 seconds, with the exception of the first level. The first level usually takes more time to load (~10 seconds), because it loads the bike and the rider, etc. — stuff that stays in the caches when you switch from one level to another.

SSD does help with loading times, because the data access patterns can be quite erratic. We do, however, analyze the data loading patterns at runtime and order the loading operations to reduce hard drive seek times. So we get quite close to the performance of replicating all level data to a big linear chunk without needing to spend that extra storage space or restrict our data sets in any way. Every single level can access all our data (10 GB worth of uncompressed textures/meshes if the level designer chooses so; there are no texture/mesh memory budget limits at all). This allows our level designers to go crazy... for example in Trials Evolution we had one level (called Gigatrack) that circled the whole 8 square kilometer game world (the Fusion game world is twice as big, 16 square kilometers).
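The "order the loading operations to reduce seek times" idea above is essentially elevator-style scheduling. A minimal sketch, assuming pending reads are known by their byte offset in the package file (the function name and tuple layout are hypothetical, not sebbbi's actual code):

```python
# Elevator-style ordering of pending reads: serve everything at or past
# the current head position in ascending offset order (one forward
# sweep), then wrap around for the offsets behind the head. This turns
# erratic seek patterns into a mostly sequential pass over the disk.

def order_reads(pending, head_pos=0):
    """pending: list of (offset, size) tuples; returns them in seek order."""
    ahead = sorted(r for r in pending if r[0] >= head_pos)
    behind = sorted(r for r in pending if r[0] < head_pos)
    return ahead + behind

reads = [(900, 4), (100, 8), (500, 2)]
print(order_reads(reads, head_pos=400))  # [(500, 2), (900, 4), (100, 8)]
```

On an SSD the ordering barely matters (seeks are nearly free), which is consistent with the observation that an SSD helps most when access patterns are erratic.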
 
I've been wondering if you have plans to make other genres of games with your engine? I'm not very good at the game but think it's quite beautiful and has quite good tech.
 
For Fusion [assuming time to learn or build was not a factor] would have leveraging hardware PRT/Tiled Resources as opposed to leveraging your own virtual texturing system resulted in any performance differences?
Hardware PRT only implements sampling from a sparse texture, it doesn't offer you a complete virtual texturing system. You still need to analyze the scene (texture pages needed), copy the page list back to CPU (and remove duplicates), update the GPU data, load from the HDD, manage the HDD data (optimize access patterns), etc yourself (basically all the hard parts). With hardware PRT, you save around 4 ALU instructions (compared to the optimal shader based indirection+sampling implementation). This is only a minor gain in performance, and in our case we weren't ALU bound in our g-buffer rendering stage, so there wasn't a gain at all.

The biggest thing with hardware PRT is that you don't need to add borders (for filtering) to your virtual texture pages. This simplifies many things considerably (and even saves HDD space), but it also means that hardware PRT needs a completely separate data set from software virtual texturing. This is not good for content production, since PC DirectX doesn't support hardware PRT (DX 11.2 does, but it requires Windows 8). We didn't want to double our virtual texture export times, just to support a feature that didn't bring any noticeable performance gains for us.

One big plus of software virtual texturing is that you can do the indirection (virtual->physical address mapping) whenever you want. With hardware PRT you never see the physical address in the shader. Physical addresses need fewer bits to store. This is very important if you need to store UVs into the virtual texture somewhere.
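The shader-based indirection sebbbi contrasts with hardware PRT can be sketched roughly like this. This is an illustrative Python version with made-up constants (page size, border width, table layout); the real thing is a handful of ALU instructions plus a page-table fetch in the pixel shader.

```python
# Software virtual texturing indirection: a page table maps a virtual
# page coordinate to the physical page's origin in the texture cache,
# then the fractional part addresses texels inside that page. The
# filtering border is the part hardware PRT lets you skip.

PAGE_SIZE = 128  # payload texels per page side (assumed)
BORDER = 4       # filtering border around each cached page (assumed)

def sample_virtual(u, v, page_table, vt_pages):
    """Map normalized virtual UV to physical texel coords in the cache."""
    px, py = int(u * vt_pages), int(v * vt_pages)  # which virtual page
    fx = u * vt_pages - px                         # position inside page
    fy = v * vt_pages - py
    phys_x, phys_y = page_table[(px, py)]          # the indirection step
    tx = phys_x + BORDER + fx * PAGE_SIZE          # skip past the border
    ty = phys_y + BORDER + fy * PAGE_SIZE
    return tx, ty
```

This also illustrates the point about address widths: the physical coordinates returned here index a small cache texture, so they need far fewer bits than full virtual-texture UVs would.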
 
I've been wondering if you have plans to make other genres of games with your engine? I'm not very good at the game but think it's quite beautiful and has quite good tech.
Players have created FPS games, racing games, etc already in Trials Evolution (the older Xbox 360 Trials game).

FPS: http://www.youtube.com/watch?v=pd1OgUHe3OQ
Car games: http://www.youtube.com/watch?v=Tdf1VS-FU_w

I am eagerly waiting to see what the players do with the new (more advanced) editor and the next gen consoles in Trials Fusion.
 
Did you opt for EVSM again? (or full implementation vs Evo's lower precision on 360).

It looked like you were updating the SMs depending on distance too - could PC users get a full-speed update option? :)

Would SDSM* have been too expensive :?:
*sample distribution shadow maps

edit:

Actually, what's going on with the shadow cascades here :?:

http://cfa.gamer-network.net/2013/a.../5/360_014.bmp.jpg/EG11/quality/80/format/jpg
http://cfa.gamer-network.net/2013/a.../5/PS4_014.bmp.jpg/EG11/quality/80/format/jpg
 
Do anything more complex in the pixel shader and you are automatically bound by something else than ROPs ... Good way to fight against being ROP (and BW) bound is to combine multiple effects into a single shader.
Avalanche Studios seems to agree with this; at GDC they basically said the only way not to be ROP bound is to use long shaders. For short shaders they suggested using compute shaders to bypass the ROPs completely and write straight through UAVs.

As hardware has gotten increasingly more powerful over the years, some parts of it have lagged behind. The number of ROPs (i.e. how many pixels we can output per clock) remains very low. While this reflects typical use cases where the shader is reasonably long, it may limit the performance of short shaders. Unless the output format is wide, we are not even theoretically capable of using the full bandwidth available. For the HD7970 we need a 128bit format to become bandwidth bound. For the PS4 64bit would suffice.

The solution is to use a compute shader. Writing through a UAV bypasses the ROPs and goes straight to memory. This solution obviously does not apply to all sorts of rendering, for one we are skipping the entire graphics pipeline as well on which we still depend for most normal rendering. However, in the cases where it applies it can certainly result in a substantial performance increase. Cases where we are initializing textures to something else than a constant color, simple post-effects, this would be useful.
http://www.humus.name/Articles/Perss...timization.pdf
 