Digital Foundry Article Technical Discussion [2021]

Transcript from the video I pasted

I have no disagreement here. I'm just trying to say specifically that the additional sampler feedback (collecting misses) features in Xbox are useful but not mandatory. There are many ways one can go about implementing rendering, even going to the extreme of UE5 or Dreams, which change the geometry representation and even use software rendering implemented as compute shaders.

I wonder how much UE5, for example, uses SFS, or is UE5 using a completely custom renderer implemented in compute? UE5 is a good example as it's probably the engine that depends most heavily on streaming, at least if we discount Sony first-party titles/engines like Ratchet & Clank or Spider-Man: Miles Morales.
 
So I'm gathering from this that the 2.5x multiplier claimed by Microsoft isn't likely to be in reference to the best alternative streaming methods but rather to something more naive like the Unity example posted earlier?
You're ultimately relying on the GPU hardware to do texture sampling, and you cannot peek at what was sampled. So you make decisions based upon what you think is being sampled.

The best systems will just do better at guessing what was sampled and making decisions from there. With SFS you are provided feedback on how you did, and you can choose what to do with it. Full link above on how it works. The flowchart is fairly high level but useful in describing why a developer would want it.

You sample tiles, and with SFS it tells you what you actually got, which may not be what you wanted. You then make another request for those tiles based upon knowing what you sampled previously, or whatever it was you wanted, for instance. The key here, from the game demos shown earlier, is that SFS provides you feedback on your samples, from which you create requests for new tiles to replace the ones you didn't want. You can unload those resources, and that is where the savings are happening. It is not solving visibility; it is solving the decision of which MIP tiles it should have loaded and where.
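A minimal sketch of what the shader side of that loop can look like, assuming Shader Model 6.5 sampler feedback as exposed in HLSL; the binding names are made up, and the CPU-side readback/tile-mapping work is only hinted at in comments:

```hlsl
// Pixel shader sketch: sample whatever is resident, and record what this
// sample *wanted* into a feedback map that the app reads back later.
Texture2D<float4>                           g_albedo   : register(t0);
SamplerState                                g_sampler  : register(s0);
FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP> g_feedback : register(u0);

float4 main(float2 uv : TEXCOORD0) : SV_Target
{
    // Log the mip/tile this sample would ideally have used.
    g_feedback.WriteSamplerFeedback(g_albedo, g_sampler, uv);

    // Sample from whatever tiles are currently mapped; a not-yet-streamed
    // region falls back to the coarser mips the app keeps resident.
    return g_albedo.Sample(g_sampler, uv);
}

// On the CPU the app then resolves and reads back the feedback map,
// compares the requested mips against what is resident, queues tile loads
// for the misses, and evicts tiles nothing asked for. That unloading is
// where the memory savings come from.
```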
 
Here is another angle. Let's say we do ray tracing instead of raster. We get exact hits to triangles and textures. If we use mostly ray tracing, SFS is not needed, as ray tracing provides us the hits. The issue then becomes how long it takes to fetch the misses; a similar issue with misses happens with SFS. Ray tracing makes things more difficult, though: any kind of frustum/visibility-based optimization goes in the trash bin, as traced rays can hit things outside the frustum (lights, reflections, ...).

For ray tracing we could first collect the hits, fetch textures as best we can, and then shade in another pass. What is missed is missed, same as with SFS. Missed things become available later, and perhaps they are still needed or perhaps not. We would still want to predict what is needed to avoid misses. A miss is an unfortunate thing when it happens.
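A sketch of that two-pass idea. The HitRecord layout, the g_hits buffer and g_residentMinMip are hypothetical, and a prior raygen/closest-hit pass is assumed to have filled the hit buffer; the shading pass then clamps each sample to the mips that are actually resident, so a miss just looks blurrier until the data streams in:

```hlsl
// Compute pass that shades previously collected ray hits.
struct HitRecord
{
    float2 uv;          // texture coords at the hit point
    float  desiredLod;  // e.g. derived from ray cone / hit distance
    uint   pixel;       // flattened output pixel index
};

StructuredBuffer<HitRecord> g_hits     : register(t0);
Texture2D<float4>           g_material : register(t1);
SamplerState                g_sampler  : register(s0);
RWStructuredBuffer<float4>  g_shaded   : register(u0);

cbuffer StreamingState : register(b0)
{
    float g_residentMinMip; // coarsest mip guaranteed to be in memory
    uint  g_hitCount;
};

[numthreads(64, 1, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= g_hitCount)
        return;

    HitRecord h = g_hits[id.x];

    // Clamp to what is resident: a "miss" shades with a coarser mip now,
    // while the streamer is asked (elsewhere) to bring in the finer tiles.
    float lod = max(h.desiredLod, g_residentMinMip);
    g_shaded[h.pixel] = g_material.SampleLevel(g_sampler, h.uv, lod);
}
```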
 
Here is another angle. Let's say we do ray tracing instead of raster. We get exact hits to triangles and textures. If we use mostly ray tracing, SFS is not needed, as ray tracing provides us the hits. The issue then becomes how long it takes to fetch the misses; a similar issue with misses happens with SFS. Ray tracing makes things more difficult, though: any kind of frustum/visibility-based optimization goes in the trash bin, as traced rays can hit things outside the frustum (lights, reflections, ...).

For ray tracing we could first collect the hits, fetch textures as best we can, and then shade in another pass. What is missed is missed, same as with SFS. Missed things become available later, and perhaps they are still needed or perhaps not. We would still want to predict what is needed to avoid misses. A miss is an unfortunate thing when it happens.
Depends on whether you're referring to visibility hits or LOD hits. You can hit, but still sample an undesirable LOD.
 
Sorry for the terrible formatting in my previous post; I am sitting on an old tablet.

Did a quick pass at making it a bit more seemly, but possibly still missing nuances and natural breaks from the video.
 
Depends on whether you're referring to visibility hits or LOD hits. You can hit, but still sample an undesirable LOD.

I think that is the same between RT and SFS. One would want to keep the low(er)-level mip maps always in RAM to be able to sample something. Missed data is unlikely to come in the same frame the miss happened. It is probably many frames before the missed data is in RAM and available to be used.
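One common way to get that "always have something to sample" fallback looks roughly like the sketch below (not any particular engine's code; the per-tile g_minMipMap and g_tilesPerTexture are assumptions): keep the coarsest mips permanently mapped and clamp every sample by a small residency map that the streamer updates as tiles arrive, typically several frames after the miss was detected.

```hlsl
// Pixel shader sketch: never sample finer than what is resident for the
// tile this UV lands in.
Texture2D<float4> g_tex       : register(t0);
Texture2D<uint>   g_minMipMap : register(t1); // one texel per tile: coarsest resident mip
SamplerState      g_sampler   : register(s0);

cbuffer TileInfo : register(b0)
{
    float2 g_tilesPerTexture; // dimensions of the residency map
};

float4 SampleResident(float2 uv)
{
    // What this pixel would like to use...
    float wantedLod = g_tex.CalculateLevelOfDetail(g_sampler, uv);

    // ...versus the coarsest mip guaranteed to be mapped for this tile.
    uint2 tile = (uint2)(uv * g_tilesPerTexture);
    float residentMip = (float)g_minMipMap[tile];

    // The low mips kept permanently in RAM guarantee this returns
    // *something* sensible while the missed tiles stream in.
    return g_tex.SampleLevel(g_sampler, uv, max(wantedLod, residentMip));
}

float4 main(float2 uv : TEXCOORD0) : SV_Target
{
    return SampleResident(uv);
}
```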
 
I think that is the same between RT and SFS. One would want to keep the low(er)-level mip maps always in RAM to be able to sample something. Missed data is unlikely to come in the same frame the miss happened. It is probably many frames before the missed data is in RAM and available to be used.
I suspect a lot of this depends on how you want to sample textures: whether you want to use the 3D pipeline to sample or your own compute shader to sample.
I don't know if SFS is available to use in a compute shader.

i.e. (see the sketch below)
SampleLevel() is available for invocation in a compute shader.
Sample() is not available in a compute shader.

Going further, just reading through: unless I missed something, SFS requires Tiled Resources to run, which many VT systems have never wanted to adopt.
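For what it's worth, a quick illustration of that restriction as a trivial compute shader (made-up bindings and resolution):

```hlsl
Texture2D<float4>   g_tex  : register(t0);
SamplerState        g_samp : register(s0);
RWTexture2D<float4> g_out  : register(u0);

[numthreads(8, 8, 1)]
void main(uint3 id : SV_DispatchThreadID)
{
    float2 uv = (id.xy + 0.5) / float2(1920.0, 1080.0);

    // Fine in compute: the mip level is given explicitly.
    float4 c = g_tex.SampleLevel(g_samp, uv, 0.0);

    // Won't compile in compute (prior to SM 6.6 compute-shader derivative
    // support): Sample() needs the implicit derivatives that only
    // pixel-shader 2x2 quads provide.
    // float4 d = g_tex.Sample(g_samp, uv);

    g_out[id.xy] = c;
}
```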
 
XSX has a modest advantage in RT mode, as expected (as ray/triangle intersection depends on CU count and clock).
[attached image: frame-rate comparison]
 
XSX has a modest advantage in RT mode, as expected (as ray/triangle intersection depends on CU count and clock).
[attached image: frame-rate comparison]
The frame rates are capped in that demo. How do you know how large the difference is when XSX is capped at 60 and PS5 is below the cap?
Dips and peaks are not expected to be the same between the two.
 
The frame rates are capped in that demo. How do you know how large the difference is when XSX is capped at 60 and PS5 is below the cap?
Dips and peaks are not expected to be the same between the two.
Maybe it's closer to the theoretical 20-25%? ;) Who knows, but both consoles dip to 50ish in some scenes.
 
Maybe it's closer to the theoretical 20-25%? ;) Who knows, but both consoles dip to 50ish in some scenes.
That's fine, I was just curious how you look at the clamping problem, that's all. Honestly I don't know how close or far they are, but dips and peaks aren't the same, unfortunately. As per the other thread, RT is a fixed calculation here regardless of resolution, so rasterization speed will matter as well, and that is going to be dependent on other factors. If you want to single out the RT aspect, you'd have to separate out the rasterization aspect.

At lower resolutions in this title, on the PC side of things, OlegH found RT takes longer the lower the resolution, indicating a CPU bottleneck. That's basically why I don't like to use this as an RT benchmark. Since they both use CBR, the rendering resolution may be lower, possibly leading to a CPU bottleneck which could cap RT performance.

Which is why the dips sometimes intersect or get very close to one another. I do wonder if that is a CPU issue, etc. Not sure. Too little information to go on.
 
That's fine, I was just curious how you look at the clamping problem, that's all. Honestly I don't know how close or far they are, but dips and peaks aren't the same, unfortunately. As per the other thread, RT is a fixed calculation here regardless of resolution, so rasterization speed will matter as well, and that is going to be dependent on other factors. If you want to single out the RT aspect, you'd have to separate out the rasterization aspect.

At lower resolutions in this title, on the PC side of things, OlegH found RT takes longer the lower the resolution, indicating a CPU bottleneck. That's basically why I don't like to use this as an RT benchmark. Since they both use CBR, the rendering resolution may be lower, possibly leading to a CPU bottleneck which could cap RT performance.

Which is why the dips sometimes intersect or get very close to one another. I do wonder if that is a CPU issue, etc. Not sure. Too little information to go on.
My experience on PC has always been that RT has a larger percentage hit to performance at lower resolutions. I've always assumed that this was because it's more of a fixed cost, and the raster time of each frame would be lower at lower resolutions.
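A toy example with made-up numbers, assuming the RT work really is resolution-independent: if the frame is raster time plus a fixed RT cost, then

$$\frac{t_{RT}}{t_{frame}} = \frac{4\,\text{ms}}{8\,\text{ms} + 4\,\text{ms}} \approx 33\%\ \text{at 4K},\qquad \frac{4\,\text{ms}}{3\,\text{ms} + 4\,\text{ms}} \approx 57\%\ \text{at 1080p}$$

so the same 4 ms of RT is a much larger slice of the frame at the lower resolution, even though its absolute cost didn't change.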
 
The frame rates are capped in that demo. How do you know how large the difference is when XSX is capped at 60 and PS5 is below the cap?
Dips and peaks are not expected to be the same between the two.
About a 5% advantage for XSX in this game in the exactly like-for-like scenes shown by VGTech (when both are dropping). Most gameplay scenes shown by DF weren't exactly like for like.
 
My experience on PC has always been that RT has a larger percentage hit to performance at lower resolutions. I've always assumed that this was because it's more of a fixed cost, and the raster time of each frame would be lower at lower resolutions.
This is what I assumed too, actually; I always assumed RT would vary with resolution, considering the nature of rays cast per pixel. But I dunno; finding out RT was a fixed % of the frame, it made sense to look at it the way you do. But then finding out that lower resolution had a longer RT time was confusing.
 
About a 5% advantage for XSX in this game in the exactly like-for-like scenes shown by VGTech (when both are dropping). Most gameplay scenes shown by DF weren't exactly like for like.
You're only looking at dips, though. That's like taking a massive one-million-frame dataset, clamping the max value of both series to 60, comparing the minimum values, and making a claim on performance based on the remaining non-clamped data as a representation of the whole population.

No one would ever do that. While that is certainly how players experience the game, it is not an evaluation of how successfully the hardware is running the game.
 
This is what I assumed too, actually; I always assumed RT would vary with resolution, considering the nature of rays cast per pixel. But I dunno; finding out RT was a fixed % of the frame, it made sense to look at it the way you do. But then finding out that lower resolution had a longer RT time was confusing.
That is confusing -- you mean longer in total, not longer proportionally? Something weird is up.

As far as fixed costs, you can shoot fewer rays per pixel, but you could also choose to shoot a fixed count, I guess. Additionally, as far as fixed costs go, dealing with the BVH tree (on GPU or on CPU -- theoretically you could do either one, not sure what the RT APIs permit) is going to be the same regardless of resolution.
 
My experience on PC has always been that RT has a larger percentage hit to performance at lower resolutions. I've always assumed that this was because it's more of a fixed cost, and the raster time of each frame would be lower at lower resolutions.

Could you be seeing the fixed cost (overhead?) of ray tracing? Building the BVH, for example, takes the same amount of time irrespective of display resolution. Another thing could be some fixed HW/driver overhead that is more visible at lower resolutions.

The second thing that comes to mind is the effect of caching. Lowering the rendering resolution makes the rays more divergent, as they still have to cover the same area with fewer rays. These more divergent rays could be worse for the cache and cause relatively worse performance. Maybe in the higher-resolution/higher-ray-count case the HW/driver can group rays together in a more cache-friendly way. This could lead to a lowered resolution not giving a linear performance increase. The potential bottleneck due to cache misses can happen both in going through the BVH and in the shading steps.
 
You're only looking at dips, though. That's like taking a massive one-million-frame dataset, clamping the max value of both series to 60, comparing the minimum values, and making a claim on performance based on the remaining non-clamped data as a representation of the whole population.

No one would ever do that. While that is certainly how players experience the game, it is not an evaluation of how successfully the hardware is running the game.

If there was/is any 'additional headroom' it should be quite visible. Seeing that both are dropping down into the 50s (RT mode) lets me know that any additional or 'worthwhile' headroom isn't there. Capping both at 60 fps was to guarantee 60 fps, which neither system is quite capable of maintaining while RT is enabled. We can talk about potential max values and uncapped framerates all day and their relevance to the bigger picture when comparing performance metrics, but as of now, whatever additional headroom XBSX has over PS5 while running RE8 in RT mode has manifested itself as the 9% framerate advantage which Rich mentioned.
 
If there was/is any 'additional headroom' it should be quite visible
How would it be visible, though, if you clamped the headroom at 60?

I mean, hypothetically, what if I clamped the frame rate to 55 fps? You'd see a perfect straight line until the really bad dip, and when they both dip they're within 1 fps of each other, at 54 and 53 respectively. Is their performance gap < 3%?

I'm not trying to say that XSX is performing better. I'm just trying to ensure there is a separation of arguments:
a) PS5 performs more or less like XSX in the game (true; practically there's no difference imo, 5% is not enough to really matter)
b) PS5 performs more or less like XSX with respect to the settings they have provided and within a range of 9% (true; from an experience perspective you're unlikely to notice without a metric counter)
c) The additional RT units that XSX has only manifest as a 9% increase in performance over the clockspeed differential on PS5 for ray tracing (false; the metrics are not an indication of how the individual components are working towards their final output)

You can't prove (c), because at the very least you'd need to see the whole thing uncapped to really know what's going on beneath the hood.

Typically XSX has been a poor performer with a lot of alpha; how do you separate it dipping because it has issues with alpha while PS5 does not? That may be a scenario where PS5 is making up time on RT by being better at rasterization, since RT is a fixed cost. While I'm not saying it is, I'm just saying you can't use this to describe how well the hardware performs on RT. Dips are not equivalent (see Hitman 3).

I mean, there really isn't enough RT computation here to really put the RT units to the test.
 