ID buffer and DR FP16

But it would be able to run it at XO settings, whatever they may be, running at 1080p

Sure, at 1080p. But Horizon at 1080p on XB1 would already mean some serious downgrades. PS4 games are substantially more demanding at 1080p.

So native is possible, and just because it's using CBR doesn't mean it couldn't run the XO version at native 4K. It's using CBR because the studio has invested in that tech for their engine, which means you get more for a slight drop in IQ.

Ok, let's do some basic calculations:

6 vs 1.31 TF = +358%, or x4.58

2160p vs 1080p = +300%, or x4

2160p vs 900p = +476%, or x5.76

So far, I don't know of a single game announced at native 4K on X that runs at 900p on XB1.

So, based on these numbers, native 4K should still be possible on X even for a game slightly below 1080p, but not for one at 900p.

Also, 2160c (checkerboard) seems to be the maximum resolution possible on Pro with AAA games.

Anyway, I could be wrong and you could be right, but all the data so far are more in line with my opinion.
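To make the arithmetic above explicit, here's a quick back-of-the-envelope check in Python. It assumes the usual headline figures (6.0 TF for the X, 1.31 TF for the XB1) and plain 16:9 pixel counts, and of course ignores bandwidth, CPU and everything else, so treat it as a sketch of the ratios only:

```
# Back-of-the-envelope: GPU compute scaling vs pixel-count scaling (assumed TF figures).
def pixels(height):
    """Pixel count of a 16:9 frame with the given height."""
    width = height * 16 // 9
    return width * height

flops_ratio = 6.0 / 1.31                       # ~4.58x more compute on X vs XB1
px_1080p_to_4k = pixels(2160) / pixels(1080)   # 4.0x more pixels
px_900p_to_4k = pixels(2160) / pixels(900)     # ~5.76x more pixels

print(f"Compute ratio:  {flops_ratio:.2f}x")
print(f"1080p -> 2160p: {px_1080p_to_4k:.2f}x pixels")
print(f" 900p -> 2160p: {px_900p_to_4k:.2f}x pixels")
# ~4.58x compute covers the 4.0x jump from 1080p, but falls short of the 5.76x jump from 900p.
```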
 

Raw pixel-draw calculations don't necessarily tell the whole story. The XBO-X's larger memory pool/bandwidth, non-eSRAM design, beefier GPU and good ol' game optimization more than likely can run those original XB1 900p games at native 4K. However, if those games are seriously CPU bound, and the developer's goal is frame rate first, then maybe checkerboard rendering is the best option.
 
If an engine already has CBR implemented, then they will likely use it even if the game could run natively, as it gives them the option to push the graphics some more.

I also suspect that if an engine doesn't have CBR and it's a 900p game, they may aim for 1800p native for similar reasons.
 

That's dependent on the title and how demanding it is and how well the devs understand and exploit the base X1 hardware. Shadow of Mordor is 900p on the X1 yet native 4k on the PS4 Pro.

900p on the XB1 doesn't automatically mean CB is required to hit 4K.
 

I didn't know that Shadow of Mordor ran at native 4k on Pro. If we want to be accurate, it's dynamic 4k though, but native 4k most of the time according to DF.

So, you have a point there. However, it's a cross-gen game.

About the Pro :

4.2 vs 1.84 TF = +128%, or x2.28

2160c vs 1080p = +100% or x2

So, at least for the Pro, performance is in line with the raw numbers in the best-case scenarios (many games don't even hit 2160c).
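Same quick check for the Pro, assuming the usual 4.2 TF vs 1.84 TF figures and treating 2160c as half the pixels of native 2160p (since checkerboarding shades every other sample per frame):

```
# Pro vs base PS4: compute scaling vs pixel scaling (2160c = half of native 4K, assumed TF figures).
flops_ratio = 4.2 / 1.84            # ~2.28x more compute
px_1080p = 1920 * 1080
px_2160c = (3840 * 2160) // 2       # checkerboard shades half the samples each frame

print(f"Compute ratio:  {flops_ratio:.2f}x")
print(f"1080p -> 2160c: {px_2160c / px_1080p:.2f}x pixels")   # 2.0x
```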
 
There was some speculation on 32 ROPs not being quite right
You mean there aren't 32 ROPs on the original PS4? I thought those were certain..


although 64 vs 32 seems more than a little off.
Why?
The PS4 targets 1080p whereas the Pro targets 1440p/1800p (+checkerboard). That's between 1.8x and 2.9x higher resolution, so why shouldn't the Pro have 2.24x more fillrate as it does with everything else?


There's a GPU memory controller that then plugs into the main memory controller.
If that is the case, the GPU memory clients would directly attach to a channel in that controller, and that would then plug into the actual memory controller.
Then this means the ROPs aren't directly attached to the memory channels, so the ROP count isn't dependent on the bus width as it is in discrete GPUs.


Not really; you can't take it that literally, or we would also have 512-bit memory (twice the MCs) and many other things doubled.
This isn't a discrete GPU, it's an APU. Memory channels aren't necessarily connected to the ROPs.
 
Why do I always see the ID buffer in PS4 Pro and DR FP16 being lumped together? Aren't these two separate things?
 
That's dependent on the title and how demanding it is and how well the devs understand and exploit the base X1 hardware. Shadow of Mordor is 900p on the X1 yet native 4k on the PS4 Pro.

900p on the XB1 doesn't automatically mean CB is required to hit 4K.
That's been my point all along. XO games at XO settings can run at native 4K on the 1X, even 900p ones, but 900p is not guaranteed and may need work to get there; how much is game dependent.
And they may choose not to run it native, but that doesn't mean it couldn't.
 
Why do I always see the ID buffer in PS4 Pro and DR FP16 being lumped together? Aren't these two separate things?

I mentioned the two because they are advantages present in the PS4 Pro for certain techniques, and also the only ones I know of. If you know of other tech present in the console, it would be interesting to discuss that too. Actually, double-rate FP16 is also present in phones and in the Nintendo Switch, so whatever optimization it represents for the PS4 Pro can probably apply to the Switch as well.
 
You mean there aren't 32 ROPs on the original PS4? I thought those were certain..

Why?

The PS4 targets 1080p whereas the Pro targets 1440p/1800p (+checkerboard). That's between 1.8x and 2.9x higher resolution, so why shouldn't the Pro have 2.24x more fillrate as it does with everything else?

Then this means the ROPs aren't directly attached to the memory channels, so the ROP count isn't dependent on the bus width as it is in discrete GPUs.

This isn't a discrete GPU, it's an APU. Memory channels aren't necessarily connected to the ROPs.

There is a speculative article from AnandTech, based on Digital Foundry's Project Scorpio reveal, that talks about the XBO-X's ROPs and memory controller configuration.

What makes things especially interesting though is that Microsoft didn’t just switch out DDR3 for GDDR5, but they’re using a wider memory bus as well; expanding it by 50% to 384-bits wide. Not only does this even further expand the console’s memory bandwidth – now to a total of 326GB/sec, or 4.8x the XB1’s DDR3 – but it means we have an odd mismatch between the ROP backends and the memory bus. Briefly, the ROP backends and memory bus are typically balanced 1-to-1 in a GPU, so a single memory controller will feed 1 or two ROP partitions. However in this case, we have a 384-bit bus feeding 32 ROPs, which is not a compatible mapping.

What this means is that at some level, Microsoft is running an additional memory crossbar in the SoC, which would be very similar to what AMD did back in 2012 with the Radeon HD 7970. Because the console SoC needs to split its memory bandwidth between the CPU and the GPU, things aren’t as cut and dry here as they are with discrete GPUs. But, at a high level, what we saw from the 7970 is that the extra bandwidth + crossbar setup did not offer much of a benefit over a straight-connected, lower bandwidth configuration. Accordingly, AMD has never done it again in their dGPUs. So I think it will be very interesting to see if developers can consistently consume more than 218GB/sec or so of bandwidth using the GPU.
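For what it's worth, the headline figures in that quote fall out of simple bandwidth arithmetic. Assuming 6.8 Gbps GDDR5 on Scorpio's 384-bit bus and the XB1's 256-bit DDR3-2133 (both assumptions on my part), here's a quick sketch of where 326 GB/s, ~4.8x and ~218 GB/s come from:

```
# Where the quoted bandwidth figures come from (assumed memory specs).
gddr5_gbps_per_pin = 6.8                 # assumed GDDR5 data rate on Scorpio
scorpio_bus_bits = 384
ddr3_gbps_per_pin = 2.133                # DDR3-2133 assumed for the original XB1
xb1_bus_bits = 256

scorpio_bw = scorpio_bus_bits / 8 * gddr5_gbps_per_pin   # ~326.4 GB/s
xb1_bw = xb1_bus_bits / 8 * ddr3_gbps_per_pin            # ~68.3 GB/s

print(f"Scorpio: {scorpio_bw:.1f} GB/s ({scorpio_bw / xb1_bw:.1f}x the XB1's DDR3)")
# A straight 256-bit slice of that bus (a 1:1 match for 32 ROPs) would be:
print(f"256-bit slice: {scorpio_bw * 256 / 384:.1f} GB/s")   # ~217.6 GB/s, i.e. the ~218 GB/s figure
```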
 
You mean there aren't 32 ROPs on the original PS4? I thought those were certain..
It was speculation on whether the Pro had 32, which was described vaguely as being not quite right.
I would interpret 64 as being very not right.

Why?
The PS4 targets 1080p whereas the Pro targets 1440p/1800p (+checkerboard). That's between 1.8x and 2.9x higher resolution, so why shouldn't the Pro have 2.24x more fillrate as it does with everything else?
Bandwidth didn't scale nearly as much, even with compression. Die area or room to add the custom ID buffer path might also factor into it. There's no description of the Pro's fill rate, although the 32-ROP calculations haven't been contradicted yet.

Then this means the ROPs aren't directly attached to the memory channels, so the ROP count isn't dependent on the bus width as it is in discrete GPUs.
The non-console APUs have such low ROP and DRAM channel counts that it's not clear if they are out of sync, although that's not what I've been concerned about.
We already know that discretes can have mismatched ROP and channel counts--Tahiti and Tonga.
 
I have not seen this claimed.
I haven't seen details more specific than the headline numbers, but it doesn't strike me as helpful to make this split.
Getting 32 ROPs to map to a 384-bit bus has already been done, while getting 2 MiB of L2 cache to match this has not. 128 bits of GDDR5 seems like overkill for the CPU portion as well.
I was trying to figure out where I saw that. Could be mistaken, but I may have assumed that based on the bandwidth allotments being a multiple of a single channel. Doubling bandwidth figures from XB1 resulted in the equivalent of 4 channels for the CPU portion and 8 for the GPU portion. I'd agree that seems to be too much bandwidth for the CPU, but texturing wouldn't necessarily adhere to ROP channel assignments. Eight cores could use the equivalent bandwidth of the ROPs with DCC.

That's true, but it's not as simple as it sounds. Running fp16 instead of fp32 code doesn't directly allow the CU to run more concurrent threads (the maximum is still 40 waves). fp16 will save some register space, and this can lead to more concurrency (more waves fit in the register file at the same time). Also fp16 math completes the heavy ALU portions twice as fast. However, if the shader was bound by something other than ALU, this actually means that the ALU can't hide as much latency anymore. If the GPU simply runs more waves of the same kernel, FP16 doesn't help at all in this case (every wave just waits more). But if a CU has a mixed workload (only possible on AMD GPUs) containing waves from multiple kernels, then FP16 helps, because waves hitting the bottleneck will reach the next memory/filter operation sooner -> wait sooner, allowing waves from other kernels to run more frequently on the same CU.
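To put rough numbers on the register-file point, here's a simple GCN-style occupancy estimate in Python. It assumes 64 KB of VGPRs per SIMD, 64-wide waves, 32-bit VGPRs, 4 SIMDs per CU and the 10-wave-per-SIMD cap (i.e. 40 per CU), and ignores allocation granularity, so it's a sketch of the trend rather than exact numbers:

```
# Rough GCN-style occupancy estimate: how per-thread VGPR usage limits waves in flight.
VGPR_FILE_BYTES_PER_SIMD = 64 * 1024   # 64 KB of vector registers per SIMD (assumed)
WAVE_SIZE = 64                         # threads per wave
VGPR_BYTES = 4                         # each VGPR holds 32 bits per thread
MAX_WAVES_PER_SIMD = 10                # hardware cap: 4 SIMDs -> 40 waves per CU

def waves_per_cu(vgprs_per_thread):
    """Resident waves on a CU for a given per-thread VGPR count (granularity ignored)."""
    bytes_per_wave = WAVE_SIZE * vgprs_per_thread * VGPR_BYTES
    waves_per_simd = min(MAX_WAVES_PER_SIMD, VGPR_FILE_BYTES_PER_SIMD // bytes_per_wave)
    return 4 * waves_per_simd

# Hypothetical shader: 96 VGPRs in pure fp32; packing some values as fp16 drops it to ~64.
print(waves_per_cu(96))   # 8 waves per CU  -> little latency hiding
print(waves_per_cu(64))   # 16 waves per CU -> more waves available to hide memory latency
```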

FP16 is definitely better with games/engines using lots of async compute or compute overlap.
I didn't mean to imply it will increase concurrency, but that it benefits concurrently scheduled work, ALU work being the sum of all concurrent waves.

Just to be clear here, FP16 should never be slower. Exception being additional steps to pack registers. The latency hiding ability should be greater or equal, never actually worse. The ratio will change, but absolute latency will remain the same in the worst case. So waves may spend more time waiting, the execution time won't be any longer.

The maximum may currently be 40 waves, but who's to say a refreshed product won't change that? Hypothetically, a future console/tablet with higher throughput could be running additional or completely independent work. It wouldn't be unreasonable to add a high-priority task for tone mapping that wasn't in the original app. Compatibility for future hardware.

No console has ever had turbo clocks based on TDP. Saving power doesn't directly give you any performance gains. Power saving is however a very important strategy on mobile phones. Throttling can result in more than a 50% GPU performance drop on modern flagship phones. Double-rate FP16 is great when you need to race to sleep. But lately desktop GPU IHVs have also introduced TDP-based turbo clocks: modern Nvidia desktop GPUs have pretty high turbo clocks. AMD still doesn't, but Vega's 1.5+ GHz clock rate suggests that AMD is following suit. It is definitely going to be worth thinking about TDP in future high-end GPU code. Saving bandwidth and ALU in shaders where those are not the bottleneck is going to be wise.
No current console, but it does still reduce power. Even if the result is just saving battery life on a future tablet or mobile device like you mentioned.

Nvidia has high turbo clocks, but they also insert NOPs to reduce power. Increasing voltage can decrease performance without lowering clockspeed. Increase clocks by 50% and insert a NOP every third cycle. Equivalent compute performance, but non-compute components would be improved. Similar to changing clock multipliers on various components. High advertised clocks for marketing with negligible performance increase that gets written off as sub-linear scaling.
 
Also fp16 math completes the heavy ALU portions twice as fast.

This is only the case when you vectorize your shaders (again, after having gone scalar previously), because, contrary to VLIW, the FP16 ops are vector instructions and need to execute the same operation for all participants. This can be seen as equivalent to "doubling" the wavefronts.
Otherwise, for a regular shader setup, you face the same problems extracting performance as with a C++ auto-vectorizing compiler: mostly you just pray the operations line up enough to give some improvement. Yes, in shaders they should line up more often than in regular C++ code, but it's still very dependent on the piece of code.
 
I'd imagine a completely optimized game for 4Pro could come pretty close to what Scorpio can do.
The realities are fairly straightforward though.
a) Checkerboarding offers very good returns on image quality for its performance impact (some rough numbers after this post).
b) FP16 will assist with a few areas of the rendering pipeline, but if we're talking about achieving the same level of image quality between the two, then this feature has less impact.
c) Checkerboarding can be accomplished without an ID buffer, as shown by Ubisoft with Rainbow Six Siege. So the "huge impact on performance" statement needs some more context as to what, exactly.

Where it gets interesting is that, because checkerboarding offers such a good return on image quality for its cost, there's no reason not to use it on the 1X in this case as well, which scales in the X1X's favour: an arbitrary 50% of a larger number still benefits the hardware with the larger number.

The issue with checkerboarding in general is that it is _not_ a hardware feature. Each company can implement checkerboarding differently, with or without an ID buffer. There is no consistency in the performance or quality of each company's algorithm. At least, I don't think a standard has been developed for it yet.

In situations where a company does not have the capacity to build checkerboard rendering, native 4K can be achieved on the X1X with much higher probability than on the 4Pro. Other factors like texture resolution demand memory and bandwidth, and the X1X has the memory to support 4K assets, where the 4Pro is less likely to find success.

Rounding it out, I'd prefer not to debate marketing talk; I think a lot of us agree that the messaging is not exactly fair or clear. But at the end of the day the Xbox One X will have fewer asterisks in achieving 4K framebuffer games than the 4Pro, and it will likely keep this trend. You will see a great many CBR titles up to 1800p, with a final upscale to 2160p, on the 4Pro. In these situations, the X1X will probably clear CBR straight through to 2160p. I expect this to be the case for a lot of graphically challenging titles.
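To put rough numbers on point (a): assuming checkerboarding shades half the grid each frame and ignoring the resolve cost, the shaded-sample counts look like this, which is why the image-quality return per unit of cost is so attractive:

```
# Shaded samples per frame: native vs checkerboard (resolve cost ignored, assumed half-grid shading).
native_1080p = 1920 * 1080        # ~2.07M
native_2160p = 3840 * 2160        # ~8.29M
cbr_2160p = native_2160p // 2     # ~4.15M shaded, the rest reconstructed

print(f"Native 2160p vs native 1080p: {native_2160p / native_1080p:.1f}x")   # 4.0x
print(f"CBR 2160p vs native 1080p:    {cbr_2160p / native_1080p:.1f}x")      # 2.0x
```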

Sony's first-party studios are some of the best in the business. For example, Digital Foundry has commented that Ratchet & Clank's solution for generating 4K is superior to most checkerboard implementations they've seen, iirc. Did it depend on the ID buffer? How about Horizon's?

We cannot dismiss the ID buffer as providing no advantage performance- or quality-wise.

The Xbox One X does have slightly superior CPU performance and notably more bandwidth.
With a PS4 Pro slim likely at $400 or below, will it offer a notable enough difference to justify the higher price?
It could be that the ID buffer presents no benefit and part of the performance difference is already made up by FP16, but it could also be that the ID buffer introduces quality or performance improvements that put it ahead.

Performance of a particular shader is often limited by a single bottleneck (or combination of two). Most common bottlenecks are ALU, texture filtering, memory latency, memory bandwidth, fillrate and geometry front end. Double rate FP16 only helps if the shader main bottleneck is ALU. FP16 registers also helps a bit with memory latency, since 16 bit registers use 50% less register file storage than 32 bit registers -> GPU has better occupancy -> more threads can be kept ready to run -> better latency hiding capability.

People look too much at the GPU's peak FLOP rate. FP16 doubles this theoretical number, but it's important to realize that FP16 doesn't double the number of TMUs or ROPs, or the memory bandwidth. When GPU manufacturers scale up a GPU, they scale all of these up together. GPUs with more FLOPs also have more TMUs, more ROPs, more bandwidth and fatter geometry front ends. Marketing departments like to use FLOP count as a simple number to describe the GPU performance level, but this creates the illusion that FLOP count is the only thing that matters. If the other parts didn't scale up equally, the performance advantage would be very limited.

Thus doubling the peak FLOP rate by FP16 doesn't suddenly make a GPU equivalent to another GPU with double FLOP rate, unless all other parts of the GPU are also scaled up.

FP16 is a very useful feature for the developers, but mixing it up with FLOP based marketing is simply confusing the consumers.

I assume the PS4 Pro had improvements in other areas to handle the higher resolution.

As people might guess, I really like id-buffers. But I am not going to participate in the discussion how much better a hardware implementation is versus an optimized software implementation (with MSAA/EQAA/DCC tricks). I don't believe there's enough public documents about these hardware features available to discuss in public forums. And the devil is obviously in the details (as it tends to be when discussing high performance graphics tricks).

But I recommend reading my thread about id-buffering and other future rendering techniques (pure tech instead of console wars):
https://forum.beyond3d.com/threads/modern-textureless-deferred-rendering-techniques.57611/

Paraphrasing
"Thought it was native, couldn't find the sort of artifacts you usually find with upscaling"
"it's never looked this good, this great before..."

Paraphrasing
"image improves dramatically on pro"
"Far more impressive then other solutions we've seen due to superior temporal antialiasing"

The best 4k techniques appear to be taking place on playstation 4 pro. How much is due to the id buffer? Would be ironic if software checkerboard exhibits more artifacts, and the better more artifact free pixels appear on the pro.
 
With checkerboarding, for the most part, all developers get the "render every other grid cell" part down consistently.

How they resolve the grids that aren't rendered will differ from developer to developer and from game to game.

That's why the term checkerboarding is not representative of some standard. It will drastically differ from title to title. So I don't know or see ID buffer being this consistent win for all titles for that reason.

From what I understand, having that ID buffer greatly helps with very complex resolves during reconstruction. If the same thing can be handled in software with a slight performance impact, then the ID buffer isn't enabling a feature that isn't available elsewhere; it just avoids a performance hit.

I've never heard of a situation in which software could not create the same functionality as fixed function hardware, unless it's operating at this weird bit level that is inaccessible to programming.
 

Sure, things can be done in software, but the question is how much of a performance penalty they carry. A hardware implementation can sometimes be orders of magnitude faster. When you see most software solutions from excellent developers paling in comparison, it could be that higher-quality solutions aren't viable in software, or perhaps aren't worth the performance cost.

The PS2 could do lots of stuff in software, but the performance penalty was significant compared to the original Xbox.
 
I don't know. As Sebbbi mentioned earlier, it's safeguarded for now; he'll speak to it when the info is out there, but not before. I don't know if anyone else here can speculate on its possible performance improvements, but I haven't read/seen evidence of the ID buffer enabling CBR techniques that aren't possible on PC and 1X. And I haven't read/seen evidence of ID-buffer reconstruction techniques providing a huge advantage over software-based ones such that non-ID-buffer platforms "won't go there".
 

True, we still don't know, but if it remains the case that the PS4 Pro keeps delivering superior checkerboarding with almost no artifacts that looks native, while software on other platforms doesn't deliver something similar, it may be that mimicking the functionality of the ID buffer in software, or doing alternative solutions, is not practical.

Right now what we know is that the superior implementations are on the PS4 Pro. Whether that's a result of dev know-how, available hardware features, or a combination of the two remains to be seen.

PS

Given the quality of the output, I personally believe checkerboarding and similar reconstruction techniques are the future. PS5 checkerboarding will allow more effects in play than 4K native, let alone 8K native. It allows consoles to punch above their weight, as far as I can see.
 
Fully agree. Reconstruction is the future. But in my opinion checkerboarding will simply be a quick intermediate step towards better reconstruction algorithms. The PS4 Pro having extra hardware to support checkerboarding is great, but my guess is that by the time the PS5 launches, we will already have better algorithms available.
 