ID buffer and double-rate (DR) FP16

From what I recall of the DF Scorpio article, Scorpio does possess hardware support for checkerboard rendering and related techniques. Here are a couple of quotes from the article:
"Microsoft didn't delve too deeply into specifics on the checkerboarding support that Scorpio possesses at the hardware level. However, Andrew Goossen tells us that the GPU supports extensions that allow depth and ID buffers to be efficiently rendered at full native resolution, while colour buffers can be rendered at half resolution with full pixel shader efficiency."
"We are perfectly happy with developers choosing a bunch of other techniques that are possible. We have hardware techniques for making checkerboarding very efficient. If developers want to go for checkerboarding, that's great. We've also heard from a bunch of our partners that they're actually finding that they prefer TAA [temporal anti-aliasing] with upscaling rather than checkerboarding."
 
Too bad they didn't ask a question about the ID buffer or FP16 and how developers used them in the Pro versions. I am very surprised they didn't, actually. It sounds like an odd omission and makes it an incomplete interview.
I'd assume there isn't much to write about if you didn't use FP16 and didn't use the ID buffer.
 
"We've also heard from a bunch of our partners that they're actually finding that they prefer TAA [temporal anti-aliasing] with upscaling rather than checkerboarding."
I am also a fan of integrating a temporal upsampler into the TAA shader. It avoids many checkerboarding issues (such as sawtooth edges, checkerboard artifacts and 2:1 aspect ratio post-processing), doesn't require hacking with the MSAA hardware and is simpler to integrate into existing pipelines.
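To make the idea concrete, here is a minimal CPU-side sketch of the accumulation at the heart of a TAA-style temporal upsampler. The 2x2 upsample factor, the 4-position jitter sequence and the blend factor are all assumptions for illustration, not any shipping implementation; a real version also reprojects the history with motion vectors and rectifies it against the current frame.

```cpp
// Minimal CPU-side sketch of TAA-style temporal upsampling (illustrative
// only). Assumes a 2x2 upsample and a 4-position jitter sequence; a real
// implementation also reprojects the history with motion vectors and
// rectifies it against the current frame.
#include <cstdio>
#include <vector>

struct Image {
    int w, h;
    std::vector<float> px;                       // single channel for brevity
    Image(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) {}
    float& at(int x, int y)       { return px[y * w + x]; }
    float  at(int x, int y) const { return px[y * w + x]; }
};

// Blend one jittered half-res frame into the full-res history buffer.
// (jx, jy) in {0,1}^2 selects which full-res pixel each sample covers.
void accumulate(const Image& lowRes, Image& history, int jx, int jy, float blend)
{
    for (int y = 0; y < lowRes.h; ++y)
        for (int x = 0; x < lowRes.w; ++x) {
            float& h = history.at(2 * x + jx, 2 * y + jy);
            h += (lowRes.at(x, y) - h) * blend;  // exponential history blend
        }
}

int main()
{
    Image history(8, 8);
    const int jitter[4][2] = { {0,0}, {1,0}, {0,1}, {1,1} };
    for (int frame = 0; frame < 16; ++frame) {
        Image lowRes(4, 4);
        for (float& p : lowRes.px) p = 1.0f;     // stand-in for the rendered frame
        accumulate(lowRes, history, jitter[frame % 4][0], jitter[frame % 4][1], 0.5f);
    }
    printf("history(3,3) = %.4f\n", history.at(3, 3)); // converges toward 1.0
}
```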

Whether or not you want to use an id-buffer is a completely different discussion. Modern TAA implementations have pretty good occlusion detection algorithms using purely color and depth. Whether you want to spend extra memory bandwidth to write and read an id-buffer depends on your failure cases. If you can already hide failures with existing data (depth and color), you don't want more data. But there are more uses for an id-buffer than detecting reprojection occlusion and/or reconstructing more spatial resolution. If your rendering pipeline is designed around an id-buffer, then hardware support is obviously a nice bonus.
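As a minimal sketch of that trade-off, here are the two history-rejection tests side by side. The function names and the depth tolerance are made up for illustration, and a static camera is assumed; real TAA reprojects with motion vectors before comparing.

```cpp
// Sketch of the two history-rejection tests discussed above (illustrative
// only; names and the tolerance are made up). A static camera is assumed,
// so the "reprojected" texel equals the current one; real TAA reprojects
// with motion vectors before comparing.
#include <cmath>
#include <cstdint>
#include <cstdio>

// Depth-only test: reject history when the reprojected depth disagrees
// with the current depth beyond a relative tolerance (a disocclusion).
bool depthRejects(float currDepth, float prevDepth, float tolerance = 0.01f)
{
    return std::fabs(currDepth - prevDepth) > tolerance * currDepth;
}

// ID-buffer test: exact object identity, but it costs the extra bandwidth
// of writing and reading the id-buffer every frame.
bool idRejects(uint32_t currId, uint32_t prevId)
{
    return currId != prevId;
}

int main()
{
    // Same surface, slightly different depth after reprojection: keep history.
    printf("depth keeps history: %d\n", !depthRejects(10.0f, 10.05f));
    // A different object moved in front: both tests reject the history.
    printf("depth rejects: %d, id rejects: %d\n",
           depthRejects(10.0f, 4.0f), idRejects(7u, 42u));
}
```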

Whether HZD uses an id-buffer and/or checkerboarding doesn't mean that they are taking more or less advantage of the PS4 Pro hardware. These are just technical choices. I am confident that they chose the type of temporal upsampler and reconstruction method that suited their content and rendering pipeline best. The results speak for themselves. PS4 Pro gives you more options, but not using all of these extra options doesn't mean that you didn't choose the best way to render your game on that platform.
 
Decima engine slides.
Guerrilla Games
Slide 48 confirms that they didn't use native-res depth or the ID buffer. They said that with the native-res hints (including the ID buffer) they could only have achieved 1800p. Instead they invented a new pixel-corner shading trick to reach 2160p.

No mention of fp16. But I would assume that they used fp16 where it suited their needs and gave perf boosts; that's not very important info for a SIGGRAPH presentation. New rendering algorithms are the main topic of SIGGRAPH, not instruction-level optimization.
 

I assume that Insomniac's temporal injection and Ubisoft's For Honor are both based on TAA with temporal upscaling, and they look absolutely fantastic. My question then is: if TAA with temporal upscaling gives that much better a result, avoids a lot of checkerboarding issues, and can be used with a variable resolution instead of checkerboarding's fixed half resolution, why is checkerboarding more commonly used? Is it easier?
 
I am not willing to comment or speculate about Ubisoft technology, since I left Ubisoft only two years ago. I am sure Insomniac and Ubisoft will eventually do presentations about their techniques. Then we can compare them. The Decima Engine presentation (link above) clearly shows that new research is important. There are so many ways to improve these algorithms. We are just getting started.
 
How many games are using checkerboarding?
 
Quite a few: a lot of Sony's first-party titles, BF1, CoD:IW, Watch Dogs, the upcoming AC, Anthem, Destiny 2, and so on. Why do you ask?
I think he's asking because he wants to know approximate adoption rates, to validate the hypothesis that checkerboarding is more prominent, etc.

It's entirely possible that some games are a better fit for CBR and others for TAA. I'm not sure if either is particularly easier to implement.
 
Checkerboarding is a catch-all at the moment. I'm interested to know actual numbers of genuine, vanilla checkerboarding versus other options, to see if it genuinely is much more common. As for reasons why it'd be more common: Sony provide a solution for developers to use, and it's possibly an easier fit in current WIPs. It takes a while for new ideas to propagate as well - how long ago did these example games integrate CB, and how much effort would it be to refactor them for something else?

Or in short, the latest, greatest solution isn't always possible to adopt in a timely, cost-effective fashion, and you just have to wait for the next iteration of games.
 

That's why I asked sebbbi, as I know he is a developer and might know if it's genuinely easier or not, but I understand his reason for not answering as he does not want to speculate.
 
Slide 48 confirms that they didn't use native-res depth or the ID buffer. They said that with the native-res hints (including the ID buffer) they could only have achieved 1800p. Instead they invented a new pixel-corner shading trick to reach 2160p.

Can't see the slide from my device; do they describe the technique in it, and whether it is used exclusively?

I could read that statement as indicating that with native-res hints alone they couldn't reach 2160p, but by using that new technique as well they could. Unless the technique is fully described and mentioned as being used alone by itself, such hints could be used in conjunction with it, that is, to aid the technique.

The ID buffer is said to aid both reconstruction techniques (I think it's not exclusive to checkerboarding, though I might be wrong) as well as temporal anti-aliasing, both used in Horizon, IIRC, and both affecting the quality of the final 4K output. So the ID buffer would have to be used in neither reconstruction nor temporal anti-aliasing for it to have played no part.
 
More of a personal question to you, sebbbi, but which do you prefer:

implementation/R&D of new algorithms, or instruction-level optimization? Is one particularly easier than the other?

Or are they for the most part joined? As in, you could implement a new algorithm you learned, but it's not optimized and you need to go back and perform instruction-level optimization on the new algorithm anyway.
 
Over the years I have noticed that I am pretty good at low-level optimization myself. Give me good tools and I am going to find the performance bottlenecks and figure out ways to reduce them. When we started our own company and adopted Unreal Engine, I immediately noticed that I can analyze performance and do low-level optimizations efficiently on other people's code too. It doesn't need to be made by my team. I consider memory bandwidth and layout optimizations to be low-level optimizations. Pure ALU optimizations aren't as effective anymore as they were on last-gen consoles. Usually the best bang for the buck is achieved by optimizing memory accesses or register usage, or by improving load balancing or parallelism. Splitting the workload efficiently among multiple shader passes is important for all of these.
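As a hypothetical example of the kind of memory layout optimization meant here, converting an array-of-structures to a structure-of-arrays lets a pass that only touches one field stream exactly the bytes it needs instead of dragging the whole struct through the cache:

```cpp
// Hypothetical example of a memory layout optimization: converting an
// array-of-structures (AoS) to a structure-of-arrays (SoA) so a pass that
// touches only one field streams contiguous memory instead of strided loads.
#include <vector>

struct ParticleAoS {                  // AoS: position and color interleaved
    float x, y, z;
    float r, g, b, a;
};

struct ParticlesSoA {                 // SoA: each field stored contiguously
    std::vector<float> x, y, z;
    std::vector<float> r, g, b, a;
};

// A position-only pass over AoS data pulls 28 bytes per particle through
// the cache but uses only 4 of them; the SoA version touches only the y array.
void advanceAoS(std::vector<ParticleAoS>& ps, float dt)
{
    for (ParticleAoS& p : ps) p.y -= 9.81f * dt;
}

void advanceSoA(ParticlesSoA& ps, float dt)
{
    for (float& y : ps.y) y -= 9.81f * dt;
}

int main()
{
    std::vector<ParticleAoS> aos(1000, ParticleAoS{0, 100, 0, 1, 1, 1, 1});
    ParticlesSoA soa;
    soa.y.assign(1000, 100.0f);
    advanceAoS(aos, 0.016f);          // strided access pattern
    advanceSoA(soa, 0.016f);          // contiguous access pattern
}
```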

But I also find it fun to research new rendering methods. Our SIGGRAPH presentation (http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf) was a great example of that. Inventing new rendering methods is much harder than doing low-level optimizations, because it requires larger-scale understanding. And you need a new idea. That's hard. Then you need time to iterate on this idea further. Usually doing one thing differently means that you soon find that you need to do other things differently as well. This requires more good ideas to solve the problems you encounter. It's not steady progress. Sometimes you make big jumps, and sometimes your awesome ideas fail to deliver after years of research. And sometimes you simply don't have time to finish the system in time for the current project. The most important thing to realize in the game tech business is that the game comes first. Tech serves the game, not the other way around. There's not enough time for experimentation during game projects, unless you can guarantee results. That's why big companies like EA have created separate technology research teams (EA SEED).

New rendering methods obviously also require low-level optimization passes. Usually even more so than traditional methods, because you end up running more stuff in compute shaders versus traditional pixel+vertex shader setups. Also, sometimes you find out during the low-level optimization pass that your rendering method is flawed on current hardware. A good example is the triangle strip clustering presented at the beginning of our SIGGRAPH presentation. GCN 1/2/3 have very poor triangle strip rendering performance (the B3D Suite benchmark also shows this clearly). GCN4 fixes this, but the base console versions are still GCN2 based. A different algorithm is shown two slides later to sidestep this problem (output an index buffer + use multidraw). Unfortunately this algorithm doesn't work on DirectX 11... except that Nvidia and AMD added backdoor APIs to do multidraw on DX11. Sometimes you find that hardware and/or software (the API) lacks features needed to make your research efficient. We had a list of important future features at the end of that presentation. I am glad to see that DX12 SM 6.0 and SM 6.1 added most of the features on our list to PC DirectX. Vulkan also now has extensions for many of these features. One thing that I forgot to put on my list was updating virtual memory mappings directly on the GPU side using a mapping array generated by a compute shader. This is still at the top of my wish list...
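For illustration, here's a rough CPU-side sketch of that "output index buffer + multidraw" idea. All types and names are hypothetical; in the actual technique this compaction runs in a compute shader on the GPU and the resulting ranges feed a multidraw call.

```cpp
// Rough CPU-side sketch of the "output index buffer + multidraw" idea:
// visible clusters append their indices into one shared index buffer and
// record a draw range each for a later multidraw call. All types and names
// here are hypothetical; the actual technique runs this compaction in a
// compute shader on the GPU.
#include <cstdint>
#include <vector>

struct Cluster {
    std::vector<uint32_t> indices;    // triangle-list indices for this cluster
    bool visible;                     // result of a prior culling pass
};

struct DrawRange { uint32_t first, count; };   // one record per multidraw entry

void buildIndexBuffer(const std::vector<Cluster>& clusters,
                      std::vector<uint32_t>& outIndices,
                      std::vector<DrawRange>& outDraws)
{
    outIndices.clear();
    outDraws.clear();
    for (const Cluster& c : clusters) {
        if (!c.visible) continue;     // culled clusters emit nothing
        outDraws.push_back({ (uint32_t)outIndices.size(),
                             (uint32_t)c.indices.size() });
        outIndices.insert(outIndices.end(), c.indices.begin(), c.indices.end());
    }
}

int main()
{
    std::vector<Cluster> clusters = {
        { {0, 1, 2, 2, 1, 3}, true },          // visible quad (two triangles)
        { {4, 5, 6},          false },         // culled cluster
    };
    std::vector<uint32_t> indices;
    std::vector<DrawRange> draws;
    buildIndexBuffer(clusters, indices, draws);
    // indices now holds only the visible cluster; draws[0] = { 0, 6 }.
}
```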
 