PS4 Pro, checkerboard rendering

Are the generated pixels made from the previous frame, next frame, surrounding pixels, or a mixture of something?
 
So basically checkerboard rendering at 4K takes about the same power as native rendering at 1440p? In other words, half?
 
It could be interesting to update your OP with this image (credits to @HTupolev):

Render the red pixels, reconstruct the green pixels.

lnYHDdx.png
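A minimal sketch of that red/green pattern (my own illustration, not Sony's exact scheme, and some variants work on 2x2 pixel quads rather than single pixels):

```python
# Simplified per-pixel checkerboard: "red" pixels are rendered this frame,
# "green" pixels are reconstructed. Odd frames flip the pattern, so every
# pixel is freshly rendered every other frame.
def is_rendered(x, y, frame):
    return (x + y) % 2 == frame % 2

# Frame 0 renders (0,0), (1,1), (2,0), ...; frame 1 renders the complement.
```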

Done. I also linked the two variants of checkerboard rendering from Ubisoft and Valve.
 

Well, it sounds like it takes more information than just the current frame, which to me suggests it must have some kind of temporal component. That would require the previous frame and motion vectors.

I think it depends on the particular game and engine. Different rendering pipelines are structured differently; for some pipelines, the cost of adding checkerboard rendering would be very low, because they are already computing a lot of the information that checkerboard rendering needs.

Read more: http://wccftech.com/blow-checkerboa...y-free-might-better-uses-power/#ixzz4JxpWtODs

It also sounds like a single feature instead of a group of smaller improvements to the GPU.

It is definitely true that if you had a game running on the original PS4, and the developer wants to do the most straightforward thing to make the game look better on the Pro, that developer could enable checkerboard rendering and the game will look better and run faster; so it’s “free” in that sense.

Read more: http://wccftech.com/blow-checkerboa...y-free-might-better-uses-power/#ixzz4Jxq4GT97
 
So basically checkerboard rendering at 4K takes about the same power as native rendering at 1440p? In other words, half?

Well, the way Rainbow Six Siege does it, they actually render at 1/4 the resolution with 2xMSAA to get half of the samples, and save them all to a single render target that's 1/2 the final resolution. For example, render at 960x540 w/ 2xMSAA and save to a 960x1080 render target. I think Valve's solution, which uses a 2x2 grid of pixels, actually renders half the pixels. But maybe the same 2xMSAA trick could work. Don't know. I'm kind of thinking no, just because of the way the pixels are grouped, but maybe I'm wrong.
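The sample-count arithmetic behind the Siege trick, spelled out (resolutions as in the post above):

```python
# Quarter-resolution rendering with 2xMSAA yields exactly half the samples
# of native 1080p, i.e. the same shading coverage as a checkerboard.
native_w, native_h = 1920, 1080
quarter_pixels = (native_w // 2) * (native_h // 2)   # 960x540
samples = quarter_pixels * 2                         # with 2xMSAA
assert samples == (native_w * native_h) // 2         # half of 2,073,600

# And the comparison raised earlier in the thread: checkerboarded 4K shades
# slightly more pixels than native 1440p.
cb_4k = (3840 * 2160) // 2        # 4,147,200 shaded pixels
native_1440p = 2560 * 1440        # 3,686,400 shaded pixels
print(samples, cb_4k, native_1440p)
```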
 
Done. I also linked the two variants of checkerboard rendering from Ubisoft and Valve.
Great. Here is the link to the Sony patent about their "uprendering multimedia content" method.

https://patents.google.com/patent/US20160005344A1/en

Are the generated pixels made from the previous frame, next frame, surrounding pixels, or a mixture of something?

From the patent:
defining multiple shifted images of the source image...and coalescing pixels from each of the reference image and the shifted images creating an uprendered image having a higher resolution than the reference image.
So apparently several images are needed to reconstruct the final image. Maybe that's what is not 'free' according to Blow. It depends on whether your engine already provides those frames.

In this theoretical example provided in the patent, we can see that 3 different frames are needed. It's worth noting that this specific example is quite different from the checkerboard pattern used in the games shown, but the principle should be very similar.
NV1brxt.jpg
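A rough sketch of the coalescing step as I read the patent (my interpretation, not code from the patent): a reference image and three half-pixel-shifted images are interleaved into an image twice the width and height.

```python
import numpy as np

def uprender(ref, shifted_x, shifted_y, shifted_xy):
    # Interleave the reference image and the three shifted images into a
    # grid with 2x the resolution in each dimension.
    h, w = ref.shape
    out = np.empty((h * 2, w * 2), dtype=ref.dtype)
    out[0::2, 0::2] = ref         # reference samples
    out[0::2, 1::2] = shifted_x   # half-pixel shift, horizontal
    out[1::2, 0::2] = shifted_y   # half-pixel shift, vertical
    out[1::2, 1::2] = shifted_xy  # half-pixel shift, diagonal
    return out
```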
 
I'm assuming they improved memory compression to make this possible, and the Pro shouldn't require 4x the memory bandwidth (176 GB/s x 4 = 704 GB/s, a number the Pascal Titan X is not even close to) to hit something just above 1440p (2x1080p = 4.1 million rendered pixels vs ~3.7M for 1440p).

Well that math is off. On the PS4 1080p rendering doesn't use all 176 GB/s. A chunk of that total 176 GB/s goes to the CPU and then you have portions lost to overhead sharing between CPU and GPU. It's what's left that goes to the GPU.
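To put rough numbers on that split (the CPU share and contention loss below are illustrative assumptions, not published PS4 figures):

```python
total_bw = 176.0         # GB/s: PS4's GDDR5 peak, the only published number here
cpu_share = 20.0         # GB/s consumed by CPU traffic (assumed)
contention_loss = 0.10   # fraction lost to CPU/GPU arbitration overhead (assumed)

gpu_bw = (total_bw - cpu_share) * (1.0 - contention_loss)
print(round(gpu_bw, 1))  # GB/s actually left for the GPU under these assumptions
```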
 
Well that math is off. On the PS4 1080p rendering doesn't use all 176 GB/s. A chunk of that total 176 GB/s goes to the CPU and then you have portions lost to overhead sharing between CPU and GPU. It's what's left that goes to the GPU.

Which is precisely why improved memory compression and CPU/GPU bandwidth-contention improvements (in comparison to the PS4) are a more logical approach to solving the issue (and most probably what is happening in the Pro) than just more bandwidth; this is not a PC where you just keep throwing brute force at it until it works. I'm thinking the main reason they didn't opt for higher-clocked GDDR5 is power consumption, aside from the cost.
 
@Globalisateur So this half-pixel shift: from the diagrams I'm assuming the shifts are down, right and diagonal. Is the shift just an average of the neighboring pixels?

I'm not sure this is a good candidate for "checkerboard" rendering. Just cursory reading, it doesn't seem like it would be great quality. More like an upscale.

The only information that method requires is the current frame. That makes it a good candidate for a hardware implementation of upscaling. But I don't think it jibes with Jonathan Blow's comments very well. I can't think of any reason a renderer would already be creating sub-pixel-shifted versions of the final frame buffer. I think when he was talking about engines that already create the information the hardware needs, it's probably something else.
 
Which is precisely why improved memory compression and CPU/GPU bandwidth-contention improvements (in comparison to the PS4) are a more logical approach to solving the issue (and most probably what is happening in the Pro) than just more bandwidth; this is not a PC where you just keep throwing brute force at it until it works. I'm thinking the main reason they didn't opt for higher-clocked GDDR5 is power consumption, aside from the cost.

Exactly, however I still would like someone who does have the numbers down to share details so we might know if there is an absolute upper-bound to what is realistically possible. I don't want it glossed over with sayings of "work smarter, not more"; that's not what we focus on in the technical discussions.

Specifically I want an analytical breakdown on what limitations are to be expected on a system with their given bandwidth specifications. The first could be assumptions based on original PS4 tech as absolute worst case scenarios. Then we could see how much of an improvement is needed in memory-compression or other bandwidth savings technology in the PS4 Pro to make various rendering options viable.
 
Great. Here is the link to the Sony patent about their "uprendering multimedia content" method.

https://patents.google.com/patent/US20160005344A1/en



From the patent: So apparently several images are needed to reconstruct the final image. Maybe that's what is not 'free' according to Blow. It depends if your engine already provides those frames.

In this theoretical example provided in the patent we can see that 3 different frames are needed. It's worth noting that this specific example is quite different than the checkerboard pattern used in the games shown but the principle should be very similar.
NV1brxt.jpg
That's for PS2 emulation.
 
I'm assuming they improved on memory compression to make this possible and the pro shouldn't require 4x memory bandwidth (176 gb/s x 4 = 704 gb/s, a number the pascal titan x is not even close to) to hit something just above 1440p (2x1080p = 4.1 million rendered pixels vs 3.6m for 1440p).

Bandwidth is an interesting point.

Exactly, however I still would like someone who does have the numbers down to share details so we might know if there is an absolute upper-bound to what is realistically possible. I don't want it glossed over with sayings of "work smarter, not more"; that's not what we focus on in the technical discussions.

Specifically I want an analytical breakdown on what limitations are to be expected on a system with their given bandwidth specifications. The first could be assumptions based on original PS4 tech as absolute worst case scenarios. Then we could see how much of an improvement is needed in memory-compression or other bandwidth savings technology in the PS4 Pro to make various rendering options viable.

Exactly, that is what's most interesting to me, especially with regard to possible improvements to anisotropic filtering: how much does Polaris texture compression contribute to lowering bandwidth requirements, and could that freed bandwidth be used to bump the AF to 8-16x?

EDIT: OK, based on that (http://wccftech.com/amd-rx-480-polaris-10-full-slide-deck-leak/) I'd say the peak gain is 40%, with typical gains probably half of that, i.e. 20%. Wonder if that's enough to improve AF?
 
I don't know if the hw was bumped enough in the right areas.
It was only bumped in the cheap areas. Literally "what can we get for nothing?" With the process shrink, they could fit in more CUs. End of. More RAM, BW, CPU, etc. were off the cards, other than what little they could clock things higher. The extra CU power is all they could manage without a significant engineering task, and the end result is a very imbalanced console, only it doesn't matter because it's a half-baked, half-gen solution that doesn't set out to be as balanced and efficient as a proper console iteration.

What will be interesting to see is what pure compute brings. We have a like-for-like comparison between a PS4 with 18 CUs and a Pro with 36, so comparing titles we'll see just what difference a more powerful GPU can manage in the same general system. Unlike a PC with the GPU swapped out, games that target the 4Pro should optimise for the GPU.
 
http://wccftech.com/blow-checkerboard-rendering-ps4pro-isnt-completely-free-might-better-uses-power/
Different rendering pipelines are structured differently; for some pipelines, the cost of adding checkerboard rendering would be very low, because they are already computing a lot of the information that checkerboard rendering needs.
Any idea what additional information it's using for the reconstruction, which simpler engines don't provide?
It has to be substantial buffers, considering he's saying a straight upscale is a better choice if your engine doesn't have this data, and that there's a bandwidth/memory trade-off.
 
http://wccftech.com/blow-checkerboard-rendering-ps4pro-isnt-completely-free-might-better-uses-power/

Any idea what additional information it's using for the reconstruction, which simpler engines don't provide?
It has to be substantial buffers, considering he's saying a straight upscale is a better choice if your engine doesn't have this data, and that there's a bandwidth/memory trade-off.

The only things I've ever heard of being used in reconstruction are frames and motion vectors. So unless they're doing something really out of the ordinary, I'm guessing it's either the bandwidth used by copying motion vectors and frames to the hardware "scaler", for lack of a better word, or it's the memory needed to store that data and they don't have it available. Maybe they don't have motion vectors in the right format or precision... or something.
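A hedged sketch of what such a reconstruction pass might look like (the actual PS4 Pro mechanism isn't public; this is just the frames-plus-motion-vectors idea above, on plain Python lists with integer motion vectors):

```python
def reconstruct_pixel(curr, prev, motion, x, y, frame):
    # curr/prev: 2D lists of pixel values; motion: 2D list of (mx, my)
    # motion vectors in pixels. Checkerboard parity decides whether the
    # pixel was rendered this frame or has to be reconstructed.
    h, w = len(curr), len(curr[0])
    if (x + y) % 2 == frame % 2:
        return curr[y][x]                  # rendered this frame
    mx, my = motion[y][x]
    px, py = x - mx, y - my                # where this pixel was last frame
    if 0 <= px < w and 0 <= py < h:
        return prev[py][px]                # temporal reprojection
    # History off-screen: fall back to averaging spatial neighbours.
    left = curr[y][x - 1] if x > 0 else curr[y][x + 1]
    right = curr[y][x + 1] if x < w - 1 else left
    return (left + right) / 2
```

A real implementation would also have to reject stale history (disocclusions, lighting changes), which is exactly where the quality differences between games would show up.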

Edit:
It's going to be pretty hard to guess what the hardware is, but at least we can sort of look at the quality in those Horizon screens. Probably not likely there will be any uncompressed 4k video out there any time soon, and probably not many of us that could view it anyway.

Also would be handy to see a game where we could compare a pc version or something, to really see the quality of it.
 
I'll assume here Sony's checkerboard rendering™ reconstructs the missing samples temporally, much like TAA.
In that case, temporal reconstruction requires motion vectors of considerably higher precision than what you generally need for motion blur alone, and you can't skip any object or motion type (vertex displacement and whatnot) like is commonly done when MB is the only concern.
Precision aside, many games simply don't have a velocity buffer at all. Any title without TAA or MB certainly lacks it. Adding such a feature doesn't only incur the memory and bandwidth cost of maintaining and writing to/reading from a velocity buffer, but also the non-trivial computational cost of generating it (by either computing all animation twice per frame, or caching it somewhere within your pipeline, and comparing every vertex's past position with its current one).
For an engine that never bothered with such a feature, the dev is faced not only with the performance issue of fitting this extra work into their rendering budget, but with the non-negligible task of developing all of that. I assume most games that don't support TAA might skip the work of doing proper temporal reconstruction, and might just do cheap interpolation between the existing samples with some clever algorithm to try to hallucinate some data.
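To make the velocity-buffer work concrete, here's a toy sketch of the per-vertex math (my illustration: positions already in NDC, output in pixels; a real engine does this in a shader using last frame's transform matrices):

```python
def screen_velocity(curr_ndc, prev_ndc, width, height):
    # Upstream, the same vertex is transformed with both this frame's and
    # last frame's matrices; here we take the two NDC results and turn
    # their difference into a screen-space motion vector in pixels.
    def to_pixels(p):
        return ((p[0] * 0.5 + 0.5) * width, (p[1] * 0.5 + 0.5) * height)
    cx, cy = to_pixels(curr_ndc)
    px, py = to_pixels(prev_ndc)
    return (cx - px, cy - py)

# A vertex that moved from x = -0.5 to x = 0.0 in NDC on a 1920x1080
# target has moved 480 pixels to the right.
```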
 
... I assume most games that don't support TAA might skip the work of doing proper temporal reconstruction, and might just do cheap interpolation between the existing samples with some clever algorithm to try to hallucinate some data.

Lol. Nice post.

So it seems the likely candidate is hardware that takes current frame buffer and previous frame and uses a velocity buffer to reconstruct the other half of the current frame.
 
I don't think they have any fixed-function hw specifically tailored for checkerboard rendering, solely on the basis that it doesn't feel like a very Cerny thing to do. When they say they have special hardware to aid that kind of stuff, I understand it as certain architectural changes to how rasterization is done, or to other steps of the pipeline, that allow this kind of algorithm to run more optimally.
 