Digital Foundry Article Technical Discussion Archive [2014]

Someone tweeted the issue to one of the system's designers at Sony; he said there was nothing he could think of in the system itself that would limit the use of AF, so it must be something the individual devs are doing. But no one seems to have asked the devs themselves what the issue is.

I have a GTX 680, but even on older cards the impact of AF was pretty much non-existent. Even cranking it to 16x resulted in maybe a 1.5-2 fps drop. Every other setting has a significantly more noticeable impact than AF; it's the one option I thought would be a given for every next-gen title, especially considering how jarring its absence can be to the visual quality when you're not used to it (from being a PC gamer).
 
Every modern video compression codec already relies heavily on reprojection from last-frame data, and only stores full frames very infrequently. This is how you achieve high quality at a low bit rate. Video codecs usually only move data in 2D (while a 3D engine can move and reproject along any shape). This discussion is basically about using the same kinds of techniques for game rendering that video compression has used for a long time. People tend to think that Blu-ray looks good (even at 24 fps, rarely regenerating more than 20% of the pixels per frame), so using similar methods to save cost in real-time graphics rendering is definitely a good idea, and shouldn't degrade image quality too much.
If I recall correctly, the GOP length on Blu-ray encodes can be as much as 48 frames: two seconds between full-frame encodes.
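For illustration only (this is a generic sketch, not any particular codec), the predicted-frame reconstruction step boils down to something like this: shift blocks of the previously decoded frame along encoder-supplied motion vectors, then add a stored residual. Names and block size here are illustrative assumptions:

```python
import numpy as np

def reconstruct_p_frame(prev_frame, motion_vectors, residual, block=16):
    """Rebuild a predicted (P) frame from the previously decoded frame.

    prev_frame     -- (H, W) float array, the last decoded luma plane
    motion_vectors -- (H//block, W//block, 2) integer (dy, dx) per block
    residual       -- (H, W) float array of corrections stored by the encoder
    Assumes H and W are multiples of the block size.
    """
    h, w = prev_frame.shape
    predicted = np.empty_like(prev_frame)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block, bx // block]
            # Fetch the matching block from the previous frame, clamped to the image.
            sy = min(max(by + int(dy), 0), h - block)
            sx = min(max(bx + int(dx), 0), w - block)
            predicted[by:by + block, bx:bx + block] = \
                prev_frame[sy:sy + block, sx:sx + block]
    # The residual corrects whatever motion compensation alone could not predict.
    return predicted + residual
```

In a 48-frame GOP, a step along these lines (or a bidirectional variant for B-frames) runs for roughly 47 of every 48 frames; only the I-frame is stored in full.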
 
Though, the efficiencies and accuracies of video encodes are sort of a different case.

With video encodes, you have the "correct" high-fidelity target result on hand, and the encoding doesn't necessarily have to be done all that quickly.

With temporal reprojection in games, you don't have said "correct" target result on hand. You're using things like motion buffers to guess what it should have been, and it needs performance to fit into small time slices on a periodic render cycle.

Then of course there's the massive amount of fast camera motion that can happen constantly in some games.

I'll be pretty impressed if a game developer in the remotely near future manages to get even vaguely respectable results with only 20% new pixels per frame in a competitive FPS :smile:
 
But you'll also have a LOT more information on hand. You know how the camera moved. You know the depth of every pixel. Every correct motion vector, etc. None of this is available in video encoding; there you only have the temporal difference of each pixel.

But let's not forget that we're encoding videos offline. You have "unlimited" time per frame to process it effectively, whereas you have 16 or 33 ms per frame in real-time rendering. I don't think this generation of GPUs is fast enough to come even close to high-end video encoding tricks like these. Or rather, the GPU doesn't just do this, it does a lot of other stuff as well; there won't be enough left for "perfect" processing.
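To illustrate what that extra information buys you: with last frame's depth and both cameras' matrices, a pixel can be pushed through the old inverse view-projection and the new view-projection to find where it lands this frame, with no motion search at all. A rough sketch, assuming 4x4 column-vector matrices and a 0..1 depth buffer (none of this is from any particular engine):

```python
import numpy as np

def reproject_pixel(px, py, depth, inv_view_proj_prev, view_proj_curr,
                    width, height):
    """Find where last frame's pixel (px, py) lands in the current frame.

    depth              -- value read from last frame's depth buffer (0..1)
    inv_view_proj_prev -- inverse view-projection matrix of the previous camera
    view_proj_curr     -- view-projection matrix of the current camera
    Assumes static geometry; only the camera has moved.
    """
    # Pixel centre -> normalized device coordinates of the previous frame.
    ndc_prev = np.array([2.0 * (px + 0.5) / width - 1.0,
                         1.0 - 2.0 * (py + 0.5) / height,
                         depth,
                         1.0])
    # NDC -> world space, using the previous camera.
    world = inv_view_proj_prev @ ndc_prev
    world /= world[3]
    # World -> clip space of the current camera, then perspective divide.
    clip = view_proj_curr @ world
    ndc_curr = clip[:3] / clip[3]
    # Back to pixel coordinates in the current frame.
    return ((ndc_curr[0] * 0.5 + 0.5) * width,
            (0.5 - ndc_curr[1] * 0.5) * height)
```

This only holds for static geometry under camera motion; anything that moves on its own is why engines also keep a per-pixel motion vector buffer, as mentioned above.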
 
But you don't actually need that extra information, because you have the target material. It might speed up the encoding process, but it certainly isn't necessary for getting a good result. The result is what matters; how you get there has a lot of leeway when processing time is unconstrained.

Conversely, if you had the target material in the video game case, you'd just output the target material. :D
 
Also, you don't have a compression target increasing encode complexity. The end result can use as much bandwidth as you like, so the reconstruction algorithm can very easily be real-time. You just adapt the reconstruction algorithm to fit your available GPU resources and time slice.
 
People tend to think that Blu-ray looks good (even at 24 fps, rarely regenerating more than 20% of the pixels per frame)
What? I thought the Blu-ray feed is not compressed! Why compress at all when you have that much space available to you?

If we assume a 90-minute movie at 1080p @ 30 fps, with 32-bit color precision each frame is 7.9 MB, so the amount of space needed is a little more than 20 GB for the raw uncompressed video data, and much less if you went with 24-bit color. Add a few more GB for the audio data and you could fit the whole movie on a 25 GB disc. Most discs are at least 50 GB nowadays, so why compress at all? Playback performance?
 

Your math is way off.

1920 x 1080 x 3 x 24 x 60 x 90 yields about 750 GB for a 90-minute movie.
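Spelled out, that back-of-the-envelope calculation:

```python
width, height = 1920, 1080
bytes_per_pixel = 3                 # 24-bit RGB, no chroma subsampling
fps = 24
seconds = 90 * 60                   # 90-minute movie

total_bytes = width * height * bytes_per_pixel * fps * seconds
print(total_bytes / 2**30)          # ~750 GiB, nowhere near fitting on a 50 GB disc
```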
 
Size of uncompressed video:

8-bit @ 1920x1080 @ 24 fps = 95 MB per second, or 334 GB per hour.

I'm sure Blu-ray feeds are much longer than a couple of minutes.
 
Sorry, I must have gotten it all wrong; I wrote the post before sleeping.

There are 86,400 frames in a 60-minute movie (at 24 fps); if each frame is 5.9 MB (at 24-bit), the result is 509,760 MB of data, which is about 500 GB. No way that could fit anywhere. Again, sorry.
 
Even last-gen consoles managed to interpolate/extrapolate entire frames with superb results, though it never made it into a commercial game (the Star Wars dev had it running successfully). I think the new-gen consoles are capable of doing it. That said, extrapolating entire frames is possibly less compute-intensive, since you don't need to render any geometry or update everything every 16 ms, but it would be more prone to artefacts (e.g. overshooting at the end of big movements).
 

And some of the movies, and especially the TV, you see on Blu-ray comes from 50 Mbit MPEG-2 recordings :)
 
If interlacing is used on Infamous, as Alstrong may have hinted, along with Killzone, I wonder if it's the PS4's "extra compute" architecture in play. Guerrilla went out of their way to call it out as "computationally expensive".

Perhaps devs see a way to leverage otherwise unused extra compute resources on the PS4 to create pseudo-1080p, and shift the bottleneck to somewhere else in the pipeline where they'd rather it be.
 

I don't think it's interlacing in play. But it may be some form of temporal AA artifacts and/or heavy motion blur.

To the detriment of image quality. But that's OK. The more effects, the better, right?
 
Striped artefacts could easily be removed from the final image by tweaking the interlacing parameters: compare the selected pixel value to its neighbours; if they are similar to each other and this value is radically different, ditch it and lerp between the two real pixel values on either side (sketched below). That would of course have IQ issues, but my point is that interlaced artefacts themselves can't be relied upon to identify an interlaced reconstruction process. It's definitely possible that a game could use IIR (interlaced image reconstruction*) without producing stripes.

* and TIR = temporal image reconstruction for frame interpolation, and DIR would be dithered image reconstruction, etc. I suppose post-FX AA would become Selective Edge Image Reconstruction.
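For what it's worth, a toy sketch of that neighbour test (the threshold and axis are arbitrary assumptions, not from any shipped game): if a pixel's two neighbours along the interlacing axis agree with each other but the pixel itself is radically different, replace it with the average of those neighbours.

```python
import numpy as np

def destripe(img, threshold=0.1):
    """Suppress single-line interlacing artefacts in a (H, W) float image.

    If a pixel's two vertical neighbours agree with each other but the pixel
    itself is radically different, it is assumed to be a stale interlaced line
    and is replaced by the average (lerp at t=0.5) of those neighbours.
    For column-interleaved rendering you would use left/right neighbours instead.
    """
    out = img.copy()
    above = img[:-2, :]
    below = img[2:, :]
    centre = img[1:-1, :]
    neighbours_agree = np.abs(above - below) < threshold
    centre_differs = np.abs(centre - (above + below) * 0.5) > threshold
    mask = neighbours_agree & centre_differs
    # Write the blended value back only where the stripe test fired.
    out[1:-1, :][mask] = ((above + below) * 0.5)[mask]
    return out
```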
 
Fair enough, but let's wait for proper captures before saying it's "1920x540" for sure. ;) At the moment it might be temporal AA, and that's just from some crap captures which might be subject to frame blending during encoding, so who knows. There's room for error here.
 
I'm curious which parts of the rendering process GG indicated ran faster with reprojection.

The geometry portion of the process isn't going to change much. The reduced resolution of any given frame may actually worsen SIMD utilization, since the reduced pixel density means that the rasterizer is going to be sending more pixel batches that have more of their stride straddling triangle edges.
Alternatively, geometric detail could be kept more coarse so that they keep above GCN's preferred >16 pixel triangle size.
Cutting the number of pixels being rasterized per frame probably saves a bigger fraction of the workload than is lost to fragment inefficiency.

I don't know if the half-width resolution allows the rasterizer to favor one particular orientation for how it steps through the screen space. Its footprint is rectangular, but I think it can go for a vertical or horizontal orientation. The higher vertical resolution might encourage the rasterizer to orient itself in that direction.

The math load of the reprojection versus the full pixel processing path is an unknown.
If there's a lot of complex surface shading going on, reprojecting might still cut out more math and texture memory references despite the multiple frame data reads.
However, in other situations where the frame has smaller fractions of the screen taken up by complex surfaces, it may be heavier.
On the other hand, and perhaps more critically, the load presented by reprojecting may be more predictable, since the portion of the workload that is highly variable is cut in half.

Reading motion vectors and multiple stored frames does consume bandwidth, but this should be a more regular sort of access that might be amenable to tiling or other cache bandwidth optimizations.
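For a rough sense of scale on that last point, here's a back-of-the-envelope estimate under assumed buffer formats (the formats and the 60 fps figure are guesses for illustration, not anything GG has stated):

```python
width, height = 1920, 1080
fps = 60

# Assumed formats, purely illustrative:
motion_vector_bytes = 4   # e.g. two fp16 components per half-width pixel
color_bytes = 4           # e.g. RGBA8 per pixel of the previous output frame

# Reprojection reads the previous full-resolution frame plus a motion vector
# buffer for the half-width frame, every frame.
per_frame = (width * height * color_bytes
             + (width // 2) * height * motion_vector_bytes)
per_second = per_frame * fps
print(per_second / 2**30)  # ~0.7 GiB/s of extra reads at 60 fps
```

Even doubling that for writes and a second history frame, it's a small slice of the PS4's ~176 GB/s of GDDR5 bandwidth; the more interesting question is how regular the access pattern is, as noted above.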
 

It just renders normally at half resolution; I don't think the blog indicates any sort of special method of stepping through the scene.

The motion compensation (or temporal reprojection) is comparatively trivial because it has the camera and movement vectors, so you are not searching for per-pixel motion vectors between two frames (which is very costly). Also, because it's striped, if the pixels from the old frame are too different from the new pixels, it just blends the new pixel values and ignores the old ones (and I believe being at 960x1080 makes this process trivial).

According to GG, multiple past native 960x1080 frames are combined into one 1920x1080 frame, with the current 960x1080 frame representing the most accurate data and taking precedence over old pixels, to generate the 1920x1080 frame for output (which is then reused as the "background" for the next 960x1080 frame).

The saving comes from being able to render fewer pixels per frame (exactly 50% fewer), where performing the temporal reprojection is much cheaper than rendering 50% more pixels.

If this is making any sense.

TL;DR:
Half resolution + temporal reprojection (moving the old frame into the position of the new one and combining them) costs roughly 50% of rendering a native 1920x1080 frame. Reusing expensive pixels from past frames with motion compensation produces a good approximation of native 1080p rendering.
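A toy CPU-side sketch of a combine step in that spirit (my interpretation, not GG's actual implementation): the freshly rendered half-width columns are written straight out, and the missing columns are fetched from the previous combined frame along per-pixel motion vectors, falling back to a blend of the fresh neighbours when the history pixel disagrees too much.

```python
import numpy as np

def combine_half_frames(new_half, prev_full, motion_x, motion_y, threshold=0.1):
    """Build a (H, 2W) frame from a (H, W) half-width render plus history.

    new_half   -- (H, W) freshly rendered columns (assumed to be the even columns)
    prev_full  -- (H, 2W) previous combined output frame
    motion_x/y -- (H, 2W) per-pixel motion in pixels, current frame -> previous frame
    Toy sketch only: nearest-neighbour fetch, one rejection test, no sub-pixel filtering.
    """
    h, w = new_half.shape
    out = np.zeros((h, 2 * w), dtype=new_half.dtype)
    out[:, 0::2] = new_half                     # fresh pixels take precedence

    ys, xs = np.mgrid[0:h, 1:2 * w:2]           # the odd columns we must fill
    # Follow the motion vectors back into the previous output frame.
    src_x = np.clip(np.rint(xs + motion_x[ys, xs]).astype(int), 0, 2 * w - 1)
    src_y = np.clip(np.rint(ys + motion_y[ys, xs]).astype(int), 0, h - 1)
    history = prev_full[src_y, src_x]

    # Fallback: blend of the freshly rendered neighbours either side.
    left = out[:, 0:2 * w - 1:2]
    right = np.roll(out[:, 0::2], -1, axis=1)
    fallback = 0.5 * (left + right)

    # Keep history only where it still resembles the fresh neighbourhood.
    reject = np.abs(history - fallback) > threshold
    out[:, 1::2] = np.where(reject, fallback, history)
    return out
```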
 
It just renders normally at half resolution; I don't think the blog indicates any sort of special method of stepping through the scene.
The GPU's rasterizer determines coverage for a triangle in rectangular chunks of pixels. The goal for good utilization is to make sure the batch of pixels that comes out of this stage has as many pixels as possible inside the triangle, as the part of the rectangle that lies well outside of it may wind up becoming multiple SIMD lanes that are dead for the wavefront. The mechanics of wavefront packing are something of a mystery to me, however.


The saving comes from being able to render fewer pixels per frame (exactly 50% fewer), which is much more than the cost of doing the temporal reprojection.
That should be the common case, as there's going to be a floor of arithmetic work and memory references that is proportional to the screen's size, not its content.
The heavy-load case is a screen dominated by complex materials, which hopefully require more ALU and memory accesses than the motion vector check, the motion vector recalculation, and the multiple frame reads and writes.
 