ID buffer and DR FP16

In my opinion we will have to wait for 3rd party titles like Anthem and Metro to release to see the true advantages and disadvantages of both platforms.

I for one think using the XoneX's power to push beyond just a high pixel count is the way to go. Anthem is using checkerboarding of some sort to hit 4k. I think the reason for this may have something to do with the impressive level of detail and high end "PC like" settings on display in the trailer.

It comes down to developers using the extra memory bandwidth and shader power to render a title with Xbox One settings at native 4k, or using this extra power to render PC ultra settings in a title that renders with checkerboarding or dynamic res.

I would prefer the latter each and every time!

PS: I don't see devs using checkerboard rendering on the Xbox One X unless it provides a substantial benefit in freeing up memory and GPU usage. So, without firsthand experience, it doesn't seem like the shader and memory savings of checkerboarding would be enough to put the Pro on par with the OneX.
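For anyone weighing that cost/saving tradeoff, here's a toy sketch of what checkerboarding does to the per-frame workload. This is my own illustration in CUDA, nothing like the Pro's actual pipeline (which works on 2x1 pixel quads and reconstructs with motion vectors plus the ID buffer):

```cuda
#include <cuda_runtime.h>

// Toy checkerboard sketch (hypothetical illustration only).

__device__ float3 shadePixel(int x, int y)
{
    // Stand-in for the real material/lighting shader.
    return make_float3(x * (1.0f / 3840.0f), y * (1.0f / 2160.0f), 0.0f);
}

// Pass 1: shade only the half of the pixels whose checkerboard parity
// matches this frame: half the ALU/texture work of a native-res pass.
__global__ void shadeCheckerboard(float3* color, int w, int h, int frameParity)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    if (((x + y) & 1) == frameParity)
        color[y * w + x] = shadePixel(x, y);
}

// Pass 2: fill the unshaded half. Horizontal neighbours always have the
// opposite parity, so they were shaded this frame. This averaging stands in
// for the extra reconstruction cost (a real version reprojects last frame's
// pixels instead of blurring like this).
__global__ void reconstructHoles(float3* color, int w, int h, int frameParity)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    if (((x + y) & 1) != frameParity) {
        int xl = max(x - 1, 0), xr = min(x + 1, w - 1);
        float3 l = color[y * w + xl];
        float3 r = color[y * w + xr];
        color[y * w + x] = make_float3(0.5f * (l.x + r.x),
                                       0.5f * (l.y + r.y),
                                       0.5f * (l.z + r.z));
    }
}
```

Half the pixels shaded per frame is the saving; the reconstruction pass (and the extra buffers a real implementation keeps around) is the cost being weighed above.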
 
Combined with dynamic scaling, according to DF. And um, I just checked the 4K images available of this game... But I think I'll wait for proper PNGs from the final version of the game before giving any opinion...

http://www.eurogamer.net/articles/digitalfoundry-2017-anthem-the-real-deal-for-xbox-one
Actually, in the video piece for this article, DF state that the use of checkerboarding and dynamic res in Frostbite comes from a GDC talk about the engine's possible features on Project Scorpio.

They go on to state in the video that after poring over the video feed they didn't see one frame drop below 2160p.
 
Actually, in the video piece for this article, DF state that the use of checkerboarding and dynamic res in Frostbite comes from a GDC talk about the engine's possible features on Project Scorpio.

They go on to state in the video that after poring over the video feed they didn't see one frame drop below 2160p.
Because of the very bad image quality of the footage (from the 4K images I have seen on DF and Gamersyde) I really don't see how they could give such a definitive statement for the whole demo...
 
This implies that the Pro's iGPU has 2x everything there is on the PS4's GPU, not only the CU and TMU counts. That would mean there are 64 ROPs in it.
Otherwise turning off half the ROPs in the Pro would put it running with only 16 ROPs, which is half of what the PS4 has and would mean hitting an obvious bottleneck in base compatibility mode.
There was some speculation on 32 ROPs not being quite right, although 64 vs 32 seems more than a little off.
Some interesting interpretations are that items not used in compatibility mode, like the ID buffer, might take over part of the ROP hardware in full mode. That might mean there's some asymmetry, or it's symmetric with the Pro not actually getting 64 pixels per clock in standard fill rate.
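For rough numbers, pixel fill rate is just ROPs × clock (assuming the usual one pixel per ROP per clock, and assuming compatibility mode also drops back to the PS4's 800 MHz, which I'm guessing at here):

$$\text{PS4: } 32 \times 0.8\,\text{GHz} = 25.6\ \text{Gpix/s}$$
$$\text{Pro with 16 ROPs in compat mode: } 16 \times 0.8\,\text{GHz} = 12.8\ \text{Gpix/s}$$
$$\text{Pro with 64 ROPs at 911 MHz: } 64 \times 0.911\,\text{GHz} \approx 58.3\ \text{Gpix/s}$$

That 12.8 figure is the obvious bottleneck being described: half the base console's fill rate in the very mode that's supposed to match it.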

Honest question: aren't all APUs/SoCs using a memory crossbar / ringbus anyway?
How can the CPU cores access the system RAM if each memory channel is directly connected to a ROP like in modern discrete GPUs?
I think reviews for the existing APUs mentioned there's something of a hacky relationship between the GPU and CPU domain. There's a GPU memory controller that then plugs into the main memory controller.
If that is the case, the GPU memory clients would directly attach to a channel in that controller, and that would then plug into the actual memory controller.

I'm not sure what happens with the consoles.
Vega's change in making ROPs L2 clients and adding the Infinity Fabric might make the multiple-controller setup unnecessary.
 
Because of the very bad image quality of the footage (from the 4K images I have seen on DF and Gamersyde) I really don't see how they could give such a definitive statement for the whole demo...
Oh I don't disagree. They said they were given direct feed but only at 29.something fps. Plus, this slice of the gameplay might not have enough performance hits to warrant a drop in resolution. The finished product could wind up having a lot of instances of a dynamic framebuffer.
 
Ubisoft says the new Assassin's Creed runs equally on both X and Pro... I guess they'd need to use FP16, but only the Pro has it at double rate per clock cycle... Big mistake by MS.


Not really equally. AC Origins uses dynamic scaling and will scale down less on X1X, according to the developer. Buried somewhere in this article.

https://www.windowscentral.com/xbox-one-x-demonstrates-real-value-it-ps4-pro-competitor-or-not?amp

The developer told me that both versions use dynamic resolution scaling to maintain frame rate stability, so more intense scenes might see the 4K resolution drop below momentarily to keep the game running smoothly. He said that the Xbox One X version's resolution would most likely drop below True 4K far less often, and perhaps not at all when compared to the PS4 Pro,

We're getting into splitting hairs at that point. But any way you slice it, X1X is just more powerful. How much of the extra power gets used will depend.
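For reference, "dynamic resolution scaling to maintain frame rate stability" boils down to a feedback loop like this toy host-side controller (all names and constants are mine, purely for illustration):

```cuda
#include <algorithm>

// Toy dynamic-resolution controller (hypothetical; not any engine's code).
struct DynResState {
    float scale    = 1.0f;   // 1.0 = native 2160p; 0.9 ~ 1944p, etc.
    float minScale = 0.7f;   // resolution floor chosen by the developer
    float budgetMs = 33.3f;  // frame budget for a 30 fps target
};

// Called once per frame on the host with the measured GPU frame time.
void updateRenderScale(DynResState& s, float gpuFrameMs)
{
    if (gpuFrameMs > s.budgetMs * 0.95f)       // nearly over budget: back off
        s.scale = std::max(s.minScale, s.scale - 0.05f);
    else if (gpuFrameMs < s.budgetMs * 0.80f)  // clear headroom: creep back up
        s.scale = std::min(1.0f, s.scale + 0.01f);
    // The frame then renders at (3840 * s.scale) x (2160 * s.scale).
    // Pixel cost falls with scale^2, so a small scale drop buys a big saving.
}
```

A bigger GPU just means the controller hits its floor less often, which is exactly the developer's claim about X1X scaling down less.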
 
Double rate FP16 only helps if the shader's main bottleneck is ALU.
It should help in more situations than "only" a single shader bottleneck. Async would balance the load from all concurrent shaders. So even if an individual shader isn't ALU bound, it should benefit any concurrent work.

Indirectly, even if no ALU bottleneck exists, packed FP16 could still result in faster performance by consuming less power, allowing for an increase in clock speed that will directly help with any bottleneck. That's difficult to quantify, but with cards throttling to a specific TDP it's a valid optimization strategy.
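To make "double rate" concrete, CUDA exposes the same packed-math idea through half2, so here's what it looks like there (an illustration, not console shader code; __hfma2 needs sm_53 or later):

```cuda
#include <cuda_fp16.h>

// Packed FP16: one half2 register holds two 16-bit values, and one
// instruction does the multiply-add on both lanes at once: twice the
// arithmetic per instruction, and half the register space, versus fp32.

__global__ void scaleBiasFp16(const __half2* in, __half2* out,
                              __half2 scale, __half2 bias, int n2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)
        out[i] = __hfma2(scale, in[i], bias);  // two fp16 FMAs per instruction
}

__global__ void scaleBiasFp32(const float* in, float* out,
                              float scale, float bias, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = fmaf(scale, in[i], bias);     // one fp32 FMA per instruction
}
```

Same instruction count, twice the math, and half the register footprint, which also feeds into the occupancy discussion further down.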

If the L2 number is accurate, it is interesting for two reasons. The first is that it is not a straightforward distribution among 12 channels, and the other is that the highest per-slice capacity listed for GCN with Tahiti is too small to get to 2 MiB at that channel count.
Wasn't the bus more along the lines of 256+128 bits for the GPU and CPU portions of the APU respectively? One memory controller, but partitioned for the different components. The GPU being roughly Polaris 10, with a CPU attached to additional channels. ROPs would therefore map to a 256-bit bus directly.
 
It comes down to developers using the extra memory bandwidth and shader power to render a title with Xbox one settings at native 4k or using this extra power to render PC ultra settings in a title that renders with checker boarding or dynamic res.

I highly doubt that even at XB1 settings, the X has the power to hit native 4k in sub-native 1080p games : "With that being our focus, we’re running at 4K 30FPS for Campaign/Horde and 4K 60FPS for Versus with adaptive scaling to ensure a rock-solid frame rate that fans expect from our head to head multiplayer."

http://www.neogaf.com/forum/showpost.php?s=194d898d9bafcbe6f6e044c89e713c9a&p=240179660&postcount=1
 
Wasn't the bus more along the lines of 256+128 bits for the GPU and CPU portions of the APU respectively? One memory controller, but partitioned for the different components. The GPU being roughly Polaris 10, with a CPU attached to additional channels. ROPs would therefore map to a 256-bit bus directly.
I have not seen this claimed.
I haven't seen details more specific than the headline numbers, but it doesn't strike me as helpful to make this split.
Getting 32 ROPs to map to a 384-bit bus has already been done, while getting 2 MiB of L2 cache to match this has not. 128 bits of GDDR5 seems like overkill for the CPU portion as well.
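The awkwardness is easy to check. Tahiti's 768 KiB over 12 channels is 64 KiB per slice, and taking 128 KiB as the largest per-slice capacity commonly listed for GCN (my assumption for what the earlier post refers to):

$$2\,\text{MiB} \div 12 \approx 170.7\,\text{KiB per slice (not a power of two)}$$
$$12 \times 128\,\text{KiB} = 1.5\,\text{MiB} < 2\,\text{MiB}$$

So either the slice count doesn't match the channel count, or the per-slice capacity is something GCN hasn't shipped before.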
 
I highly doubt that even at XB1 settings, the X has the power to hit native 4k in sub-native 1080p games : "With that being our focus, we’re running at 4K 30FPS for Campaign/Horde and 4K 60FPS for Versus with adaptive scaling to ensure a rock-solid frame rate that fans expect from our head to head multiplayer."

http://www.neogaf.com/forum/showpost.php?s=194d898d9bafcbe6f6e044c89e713c9a&p=240179660&postcount=1
Yeah, I've seen that quote from the Coalition, but the way it reads is unclear. To me it reads that the campaign will run at 4k 30fps and the multiplayer will run at dynamic res and 60fps. It isn't clear though. And I'm not saying that all 1080p XOne games will run at native 4k on the OneX. I'm just saying that I would prefer checkerboard or dynamic res with higher quality geometry, draw distance and effects over native 4k any day.
 
Yeah, I've seen that quote from the Coalition, but the way it reads is unclear. To me it reads that the campaign will run at 4k 30fps and the multiplayer will run at dynamic res and 60fps.

I understood the same thing :

- Native 4k for the single player mode
- Dynamic 4k for the multiplayer mode

And it's quite understandable since the multiplayer doesn't run at 1080p on XB1. There's the same dynamic resolution on the base model.
 
I highly doubt that even at XB1 settings, the X has the power to hit native 4k in sub-native 1080p games : "With that being our focus, we’re running at 4K 30FPS for Campaign/Horde and 4K 60FPS for Versus with adaptive scaling to ensure a rock-solid frame rate that fans expect from our head to head multiplayer."

http://www.neogaf.com/forum/showpost.php?s=194d898d9bafcbe6f6e044c89e713c9a&p=240179660&postcount=1
Gears on 1X is not running XO settings, so if anything this goes against what you're saying.
Gears on 1X is running with much improved settings at 4K, and that's on top of the fact that even on XO it had dynamic res implemented. Not sure how often it dropped resolution, but we also don't know how often it drops on 1X either.
 
Gears on 1X is not running XO settings, so if anything this goes against what you're saying.
Gears on 1X is running with much improved settings at 4K, and that's on top of the fact that even on XO it had dynamic res implemented. Not sure how often it dropped resolution, but we also don't know how often it drops on 1X either.

According to DF, the single player mode runs at 1080p most of the time on XB1.

But indeed you're right, I read too fast. However, the biggest improvements seem to be reserved for the single player mode: "Many of the improvements to Campaign also make it to Versus and Horde, including 4K, HDR, higher resolution textures, improved draw distances, and Dolby Atmos Support."
 
According to DF, the single player mode runs at 1080p most of the time on XB1.

But indeed you're right, I read too fast. However, the biggest improvements seem to be reserved for the single player mode: "Many of the improvements to Campaign also make it to Versus and Horde, including 4K, HDR, higher resolution textures, improved draw distances, and Dolby Atmos Support."
I think there were also things like higher geometry.
Somewhere on the site there's a list of changes, but here it is in interview format:
https://gearsofwar.com/en-us/community/gears-4-xbox-one-x
 
I think there were also things like higher geometry.

Yeah, but like I said, those improvements seem to be only present in the single player mode.

On the multiplayer, you only have better textures + better draw distance.
 
Yeah, but like I said, those improvements seem to be only present in the single player mode.

On the multiplayer, you only have better textures + better draw distance.
Many of the improvements to Campaign also make it to Versus and Horde, including 4K, HDR, higher resolution textures, improved draw distances, and Dolby Atmos Support.
That doesn't actually mean these are the only things.
But even if it were, the higher draw distance is something that affects geometry.
Also, fact is, maybe it's CPU bound for that game.
At 30fps we know for a fact it has a lot higher quality settings, and AC:O and D2 (can't remember which one we were talking about) both seem to be 30fps games, so at XO settings they may have reached native 4k with optimization and work.

Gears has surpassed that. Shame we don't get to see it already though. They should be providing high quality captures to DF for our entertainment.
 
At 30fps we know for a fact it has a lot higher quality settings, and AC:O and D2 (can't remember which one we were talking about) both seem to be 30fps games, so at XO settings they may have reached native 4k with optimization and work.

Gears has surpassed that.

You don't know what the resolution of ACO is on XB1. I guess it's 900p, so it's not comparable to Gears 4, which runs at 1080p most of the time on XB1, at least for the single player.

But more generally, native 4k has 100% more pixels than 2160c (best effort on Pro). I don't see how the X could output 100% more pixels than the Pro without some serious downgrades.

If we take HZD as an example, I think that native 4K is impossible on X even with the same assets used by the Pro. I could be wrong though, but once again, we're talking about 100% more pixels.
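The pixel arithmetic behind that 100% figure:

$$3840 \times 2160 = 8{,}294{,}400\ \text{pixels native}$$
$$\text{2160c shades half per frame: } 8{,}294{,}400 \div 2 = 4{,}147{,}200$$

So native 4K is exactly 2x, i.e. 100% more shaded pixels than checkerboarded 2160.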
 
You don't know what the resolution of ACO is on XB1. I guess it's 900p, so it's not comparable to Gears 4, which runs at 1080p most of the time on XB1, at least for the single player.

But more generally, native 4k has 100% more pixels than 2160c (best effort on Pro). I don't see how the X could output 100% more pixels than the Pro without some serious downgrades.

If we take HZD as an example, I think that native 4K is impossible on X even with the same assets used by the Pro. I could be wrong though, but once again, we're talking about 100% more pixels.
Correct, and that's why, when you said the 1X couldn't do AC:O at XO settings at native 4k, I disputed that and said you don't know that.
I expect any XO 1080p game at XO settings and fps can easily run on the 1X, and 900p may be possible with work and optimization.

You then tried to use Gears to prove your point, but Gears is doing a lot more than just XO settings (for MP we're unsure just how much more, but still higher than XO settings). Which in turn proved it could do it.

Now you're using HZD, and saying it would require a serious downgrade. I'm sure it would.
But it would be able to run it at XO settings, whatever those may be, at 1080p, and possibly 900p. How much work would be required if it was running at 900p is unknown (and it may not be possible).

The point is the 1X can run XO games at native 4k at the same settings; 900p games may also be possible but would require work.
Which may not be worth investing in when 1800p upscaled may be good enough, or not even worth trying if the engine already implements CBR.
It didn't do too badly scaling up a 720p60 game at XO settings to native 4k at around 40fps without much optimization, I believe.

So native is possible, and the fact that it's using CBR doesn't mean it couldn't run the XO version at native 4k. It's using CBR because the studio has invested in that tech for their engine, which means you get more for a slight drop in IQ.
 
This implies that the Pro's iGPU has 2x everything there is on the PS4's GPU, not only the CU and TMU counts. That would mean there are 64 ROPs in it.
Otherwise turning off half the ROPs in the Pro would put it running with only 16 ROPs, which is half of what the PS4 has and would mean hitting an obvious bottleneck in base compatibility mode.

Not really, you can't take it that literally, or we would also have 512-bit memory (twice the MCs) and many other things doubled.
If you look at how AMD typically makes other cards based on the same physical die, they disable SPs/TMUs and keep the ROPs unchanged (like the 7870 vs 7850, or the 480 vs 470, and so on). I think 32 ROPs is pretty much the correct spec for the PS4 Pro.
 
It should help in more situations than "only" a single shader bottleneck. Async would balance the load from all concurrent shaders. So even if an individual shader isn't ALU bound, it should benefit any concurrent work.
That's true, but it's not as simple as it sounds. Running fp16 instead of fp32 code doesn't directly allow the CU to run more concurrent threads (the maximum is still 40 waves). fp16 will save some register space, and this can lead to more concurrency (more waves fit in the register file at the same time). Also, fp16 math completes the heavy ALU portions twice as fast.

However, if the shader was bound by something other than ALU, this actually means that the ALU can't hide as much latency anymore. If the GPU simply runs more waves of the same kernel, FP16 doesn't help at all in this case (every wave just waits more). But if a CU has a mixed workload (only possible on AMD GPUs) containing waves from multiple kernels, then FP16 helps, because waves hitting the bottleneck will reach the next memory/filter operation sooner -> wait sooner, allowing waves from other kernels to run more frequently on the same CU.

FP16 is definitely better with games/engines using lots of async compute or compute overlap.
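To put rough numbers on the register-space point, using the usual GCN figures (64 KiB of VGPRs per SIMD, one VGPR = 64 lanes × 4 B = 256 B, at most 10 waves per SIMD, 4 SIMDs per CU; the 48-VGPR shader is just a made-up example):

$$\text{waves per SIMD} = \min\left(10,\ \left\lfloor \frac{65536}{\text{VGPRs} \times 256} \right\rfloor\right)$$

A 48-VGPR fp32 shader gets ⌊65536/12288⌋ = 5 waves per SIMD (20 per CU). If packed fp16 halves that to 24 VGPRs, you get ⌊65536/6144⌋ = 10 waves per SIMD (40 per CU, the hardware cap), so double the waves available to hide latency.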
Indirectly, even if no ALU bottleneck exists, packed FP16 could still result in faster performance by consuming less power, allowing for an increase in clock speed that will directly help with any bottleneck. That's difficult to quantify, but with cards throttling to a specific TDP it's a valid optimization strategy.
No console has ever had turbo clocks based on TDP, so saving power doesn't directly give you any performance gains there. Power saving is however a very important strategy on mobile phones: throttling can result in more than a 50% GPU performance drop on modern flagship phones, and double rate FP16 is great when you need to race to sleep.

But lately, desktop GPU IHVs have also introduced TDP-based turbo clocks. Modern Nvidia desktop GPUs have pretty high turbo clocks; AMD still doesn't, but Vega's 1.5+ GHz clock rate suggests that AMD is following suit. It is definitely going to be worth thinking about TDP in future high end GPU code. Saving bandwidth and ALU in shaders where those are not the bottleneck is going to be wise.
 