Digital Foundry Article Technical Discussion [2021]

I've not followed it too closely, tbh, but everything I've seen mentioned was more focused on the VRAM-multiplying effect. If I'd stopped and thought about it I should have realised it would have the same impact on streaming/loading too, but it never really dawned on me.
Next step: stop and think about what this will do to the GPU, because with SFS it will process a lot less data.
 
Aren't you describing the functionality of the custom mip blending hardware in the Series consoles rather than SFS itself?

No, I'm not. Basically I'm saying that many games look at camera movement, player position, player movement, object position, object movement, etc. Based on this information the needed tiles from the texture(s), at the correct mipmap levels, are loaded and cached. Loading and caching is done as tiles, not as full textures. If something is missed, a lower-detail mip level is used instead. SFS helps this by providing a neat way to collect misses and then load more tiles. However, this loading happens after the miss, so the data will inevitably come too late and potentially produce a pop-in effect. A ton of games are already using streaming without relying on SFS. I view SFS as a thing that can help, but it doesn't replace the traditional way of figuring out what needs to be streamed.
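To make the heuristic side of this concrete, here's a toy sketch (Python; all names and numbers are illustrative, not any engine's API) of picking the mip level to stream purely from object distance and screen resolution, the kind of prediction games do without any sampler feedback:

```python
import math

def needed_mip(texture_px, texel_world_size, distance,
               screen_height_px, fov_y_rad):
    """Estimate which mip a streaming system should fetch for an object,
    purely from camera distance - no sampler feedback involved. All
    names and numbers are illustrative."""
    # World-space size covered by one screen pixel at this distance.
    world_per_pixel = 2.0 * distance * math.tan(fov_y_rad / 2.0) / screen_height_px
    # Each mip halves resolution, so the mip level is the log2 of how
    # many texels land in a single pixel.
    texels_per_pixel = world_per_pixel / texel_world_size
    mip = math.floor(math.log2(max(texels_per_pixel, 1.0)))
    return min(mip, int(math.log2(texture_px)))  # clamp to the mip chain
```

With a 4096-px texture whose texels cover 1 cm in world space, this picks mip 0 at 10 m and mip 3 at 100 m on a 1080p screen with a 60° vertical FOV: exactly the "fetch only the needed mip levels" idea.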

My gut feeling is that 10 years from now streaming will rely heavily on neural networks. What to stream feels like a problem a neural network can solve better than a human (it's statistics about what is needed). Training can be done as a GAN: one network produces a prediction of what is needed, while the other has the source of truth via SFS. Then you keep training and improving your network until the result beats the heuristic/human. This allows training similar to what AlphaZero uses for Go/chess. It's not easy, but it's clearly doable and I bet it will happen.
 
No, I'm not. Basically I'm saying that many games look at camera movement, player position, player movement, object position, object movement, etc. Based on this information the needed tiles from the texture(s), at the correct mipmap levels, are loaded and cached. Loading and caching is done as tiles, not as full textures. If something is missed, a lower-detail mip level is used instead. SFS helps this by providing a neat way to collect misses and then load more tiles. However, this loading happens after the miss, so the data will inevitably come too late and potentially produce a pop-in effect. A ton of games are already using streaming without relying on SFS. I view SFS as a thing that can help, but it doesn't replace the traditional way of figuring out what needs to be streamed.

My gut feeling is that 10 years from now streaming will rely heavily on neural networks. What to stream feels like a problem a neural network can solve better than a human (it's statistics about what is needed). Training can be done as a GAN: one network produces a prediction of what is needed, while the other has the source of truth via SFS. Then you keep training and improving your network until the result beats the heuristic/human. This allows training similar to what AlphaZero uses for Go/chess. It's not easy, but it's clearly doable and I bet it will happen.

I don't think this is how it works. I may be wrong, but I got the impression that this is a scenario where SF will tell you what to load and at what level (which mips, and even which parts of those mips).

https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html
 
No, I'm not. Basically I'm saying that many games look at camera movement, player position, player movement, object position, object movement, etc. Based on this information the needed tiles from the texture(s), at the correct mipmap levels, are loaded and cached. Loading and caching is done as tiles, not as full textures. If something is missed, a lower-detail mip level is used instead. SFS helps this by providing a neat way to collect misses and then load more tiles. However, this loading happens after the miss, so the data will inevitably come too late and potentially produce a pop-in effect. A ton of games are already using streaming without relying on SFS. I view SFS as a thing that can help, but it doesn't replace the traditional way of figuring out what needs to be streamed.

My gut feeling is that 10 years from now streaming will rely heavily on neural networks. What to stream feels like a problem a neural network can solve better than a human (it's statistics about what is needed). Training can be done as a GAN: one network produces a prediction of what is needed, while the other has the source of truth via SFS. Then you keep training and improving your network until the result beats the heuristic/human. This allows training similar to what AlphaZero uses for Go/chess. It's not easy, but it's clearly doable and I bet it will happen.
SFS isn't responsible for resolution after a miss. SFS provides precise data every time you want to see what you sampled from a texture; there's no guesswork involved. It is a way of telling you what the hardware sampled when you asked it to sample something. How developers choose to use that data, and how many times they want to see/store the results of the sample, is up to them. Effectively, the feedback system is designed to hand back to the developer information (which mip was sampled on a tile, and where on that tile it was sampled) to improve the accuracy of their guesses about which tiles are needed next (thus the name Sampler Feedback).

SFS doesn't necessarily resolve pop-in. The goal of SFS is to reduce the amount of committed memory in the residency list (if you are streaming tiles) by providing better feedback on which mips you should be using and where on the tiles you are sampling, thus reducing the overhead of committed resident tiles which may not be in use.
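As a rough illustration of consuming that feedback, here's a toy sketch (Python; the data structures are purely hypothetical stand-ins, not the D3D12 API) of diffing a per-tile "min mip sampled" feedback map against the currently resident tiles to plan loads and evictions:

```python
def plan_tile_io(feedback_min_mip, resident):
    """Hypothetical residency planner. feedback_min_mip maps a tile
    coordinate to the most detailed mip the hardware reported sampling
    there this frame (None if the tile was never touched); resident maps
    tile -> most detailed mip currently in memory. Lower number = more
    detailed mip. Returns (loads, eviction candidates)."""
    loads, evictions = [], []
    for tile, sampled in feedback_min_mip.items():
        have = resident.get(tile)
        if sampled is None:
            if have is not None:
                evictions.append(tile)  # resident but unused: candidate to free
        elif have is None or have > sampled:
            loads.append((tile, sampled))  # need a more detailed mip than we hold
    return loads, evictions
```

The point is the eviction side as much as the load side: the feedback tells you which committed tiles were never actually sampled, which is where the residency savings come from.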
 
In DF's Returnal piece, I appreciate them talking through the graininess of the image. I'd clocked it, but not really thought about why it had that element to it, given the temporal stability in the other parts of the image.

(and I think the game looks great. Good trade off I think)
 
SFS isn't responsible for resolution after a miss. SFS provides precise data every time you want to see what you sampled from a texture; there's no guesswork involved. It is a way of telling you what the hardware sampled when you asked it to sample something. How developers choose to use that data, and how many times they want to see/store the results of the sample, is up to them. Effectively, the feedback system is designed to hand back to the developer information (which mip was sampled on a tile, and where on that tile it was sampled) to improve the accuracy of their guesses about which tiles are needed next (thus the name Sampler Feedback).

I think we are trying to say the same thing about how SFS works. We seem to disagree on where it leads, though. Basically SFS samples, misses and then fetches. If possible, it's better to predict which tiles are needed and fetch them ahead of time.

What I'm claiming is that a good solution is predictive (heuristic/DNN). The prediction can be made better by feeding it data from SFS. I wouldn't replace prediction with simple "just fetch what was sampled" logic, especially if there is no way to cancel in-flight requests. In some cases the data might no longer be needed in future frames by the time it arrives. This can happen if the camera/player moves, an object rotates, an object moves out of the frustum, an object is destroyed, etc.
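The cancel-in-flight idea can be sketched in a few lines (Python; entirely illustrative - real engines sit on async file or DirectStorage-style APIs, and none of these names come from any real library):

```python
class TileStreamer:
    """Toy predictive streamer. Prefetch requests are tracked in flight,
    and requests whose tile has since dropped out of the predicted
    working set are cancelled before the I/O completes."""

    def __init__(self):
        self.in_flight = {}  # tile -> frame the request was issued on

    def prefetch(self, predicted_tiles, frame):
        # Issue requests for tiles the predictor expects to need soon.
        for tile in predicted_tiles:
            self.in_flight.setdefault(tile, frame)

    def cancel_stale(self, still_needed):
        # Camera moved, object rotated away or was destroyed: drop
        # in-flight requests whose data would arrive already useless.
        stale = [t for t in self.in_flight if t not in still_needed]
        for t in stale:
            del self.in_flight[t]
        return stale
```

Without the cancellation step, every mispredicted prefetch still costs bandwidth and memory when it lands, which is exactly the failure mode being described.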

The prediction process feels like something a neural net could do very well. Of course, the neural net could take SFS as input in addition to object movement/rotation, object size, camera movement, etc. I really like a neural net here, as the ground truth is so easy to generate. A GAN should be able to do a good job here...
 
Alex's explanation of SFS was great. I hadn't appreciated its relevance to streaming bandwidth before now, but it can act as a multiplier there in the same way it can for VRAM capacity.

Does the PS5 have any equivalent of this? Even if it's not 100% as effective? If not, then I could imagine the XSX having a big advantage in titles that use SFS. Even the PS5's SSD speed advantage could be nullified.
Essentially this is what has been in this forum for months now ;)
One does it by brute force (via bandwidth) and one by trying to reduce what is really needed (and bandwidth; ~2.5 GB/s is still massive). The latter has the advantage that the GPU and GPU memory (and GPU memory bandwidth) don't have to deal with much of the "waste". And I really can't say how much the additional "S" (from SFS) saves on GPU resources. They are making additional steps (filtering in one pass) not available on PC so far, and I really can't say how much of a difference this will make. This is also a step that can save a lot of GPU resources.
 
Basically SFS samples, misses and then fetches. If possible, it's better to predict which tiles are needed and fetch them ahead of time.
What SFS samples and/or misses isn't necessarily the point. I look at SVT systems behaving like VT systems: the GPU operates as though all the textures are resident in memory, but in reality only so much is actually resident and the rest is paged off on slower storage. There's nothing SFS can do to change that: if the memory is resident then no swap will occur, and vice versa.

What SFS is doing is providing very accurate feedback on both the locations of samples and the mip level. So you're effectively optimizing which mips you need and where precisely you need them. Consider it additional granularity on your tile-based system. Because of this granularity, you can release tiles you don't actually need until you need them; there isn't necessarily any more pop-in than in any other VT system, and theoretically there should be less. You can see how it works in this Radeon video:

Without SFS, you can see the effect on how much memory must be spent on whole tiles not currently in view, and how coarse the tiles are as the demo flies through here with Unity.
(Though you need to rewind backwards to see the fly-through.)

If you think about tiles per mip, the closer you get to mip 0, the more dramatically the number of tiles for the texture increases. As you go to higher mip levels, say 2-10, the number of tiles needed to represent that mip drops dramatically. So there is always going to be a boundary point at which you need to swap in lower mip levels (load more tiles) or move to higher mip levels (reduce the number of tiles). Without SFS it's very difficult to determine how to do this accurately, so the end result is often to just keep those non-visible tiles in cache, as you see with Unity.
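The tiles-per-mip arithmetic is easy to sketch (Python; the 128x128-texel tile size is a simplifying assumption, matching 64 KiB tiled-resource tiles at 4 bytes per texel - other formats use different tile dimensions):

```python
import math

def tiles_per_mip(texture_px, tile_px=128):
    """Square tiles needed for each mip of a square texture. The
    128x128-texel tile is an assumption (64 KiB tiled-resource tiles
    at 4 bytes per texel); adjust tile_px for other formats."""
    tiles = []
    size = texture_px
    while size >= 1:
        per_side = max(1, math.ceil(size / tile_px))
        tiles.append(per_side * per_side)
        size //= 2
    return tiles
```

For a 4096x4096 texture this gives 1024 tiles for mip 0, 256 for mip 1, 64 for mip 2, and a single tile for everything from mip 5 up: exactly the boundary effect described above, where mip 0/1 residency dominates the memory cost.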
 
Without SFS, you can see the effect on how much memory must be spent on whole tiles not currently in view, and how coarse the tiles are as the demo flies through here with Unity.

I don't agree with this. This only happens in the worst case with the worst possible implementation. It's very possible to create a better-than-brute-force algorithm to stream in textures. It can be small things like looking at the size and distance of an object and pulling in only the needed mip level(s). Or it can be better and take into account object position and rotation, and fetch only the specific tiles that are going to be visible. It's not like we haven't had megatexturing etc. available for a long time...

It will be interesting to see how, for example, Unreal Engine 5 has solved streaming. I doubt it requires SFS despite the whole engine being very streaming-heavy...
 
https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html

Without Sampler Feedback

For background, the general texture space shading algorithm does not require sampler feedback. The texture space shading process works like this:

  1. Consider a 3-D object or 3-D scene element which should be shaded in texture space.

  2. Allocate a target texture of suitable resolution for how close the object will tend to be relative to the camera.

  3. Determine a scheme for mapping locations on the surface of that object, in world space, to areas of that target texture. Fortunately, real scenarios often have the notion of {U, V} co-ordinates per object, and a {U, V} unwrapping map to act as this scheme.

  4. Draw the scene, targeting the target texture. For this pass, it may be desirable to simply run a compute shader instead of a conventional graphics render, using a pre-canned mapping of geometry-to-target-space with no notion of a “camera”. This pass would be the pass in which expensive lighting operations are used.

  5. Draw the scene once again, targeting the final target. The object is rasterized to the screen. Shading the object is a simple texture lookup which already contains the result of the scene’s lighting computations. This is a far less expensive rendering operation compared to the previous step.

With Sampler Feedback
With the general texture-space shading algorithm described in the above section one challenge is, for the first pass, knowing which areas of the target texture to shade. Naively, one could shade the entire texture, but it could be expensive and often unnecessary. Perhaps, shading the entire texture would mean shading all facets of an object, even when only about half the facets can be viewed in the scene. Sampler feedback provides a means of reducing the cost of the shading pass.

Integration of sampler feedback with texture space shading means splitting up the first pass into two, yielding a three-pass algorithm if implemented straightforwardly. With sampler feedback, the texture-space shading operation would work like this:

Steps 1 through 3 are the same as in the above section.

  4. Draw objects straightforwardly to the final target in screen space. For each object with which texture-space shading will be used, keep a feedback map of which areas of the objects' target texture would be updated.

  5. For objects with which texture-space shading will be used, draw the scene targeting the objects' target texture. This pass would be the pass in which expensive lighting operations are used. But do not shade areas of the target texture not included in the feedback map.

  6. Draw the scene once again, targeting the final target. The object is rasterized to the screen. Shading the object is a simple texture lookup which already contains the result of the scene's lighting computations.
The ability to skip shading operations in step 5 above comprises a performance savings made available by sampler feedback.
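A toy sketch of the saving in that step (Python; the feedback map and shading function are illustrative stand-ins, not the D3D12 interface):

```python
def shade_texture_space(feedback_map, expensive_shade):
    """feedback_map: 2-D list of booleans, True where the screen-space
    pass recorded a sample of the object's target texture. Only those
    texels get the expensive lighting pass; the rest keep a sentinel and
    cost nothing. Returns (target texture, number of texels shaded)."""
    target = [[None] * len(row) for row in feedback_map]
    shaded = 0
    for y, row in enumerate(feedback_map):
        for x, touched in enumerate(row):
            if touched:
                target[y][x] = expensive_shade(x, y)  # stand-in for lighting
                shaded += 1
    return target, shaded
```

If roughly half an object's facets are back-facing or occluded, the feedback map skips roughly half the lighting work, which is the performance saving the spec is describing.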
 

I think we can all agree: on XSX a single character in the entire game has 0.1mm longer eyelashes = instant win
So it's checkerboarded, meaning it's running at 1920x2160 and 1280x1440 internally?

The interlaced option in the PC menu should give the same experience, would it not?
 
I don't agree with this. This only happens in the worst case with the worst possible implementation. It's very possible to create a better-than-brute-force algorithm to stream in textures. It can be small things like looking at the size of an object and pulling in only the needed mip level(s). Or it can be better and take into account object position and rotation, and fetch only the specific tiles that are going to be visible. It's not like we haven't had megatexturing etc. available for a long time...
It's not about tile visibility though; that's sort of what I'm getting at. Here is a simple diagram I ripped off Nvidia:
[image: RB31Lq7.jpg]


The sampling pattern here is 4 dots. Black dots are misses for visibility; green dots are hits. Imagine doing the same thing for texture sampling: you get a hit, so you load that part of the texture, great. But if you're using tiled resources you may want to know where (which of the 4 dots) scored a hit, and at which mip level. That way the system tells you precisely, during render, which mip to call on and precisely what part of that tile you actually need. Think about the upper part of that triangle where you are only scoring 2 out of 8 hits. A decision needs to be made about which mip level and which tiles are used to represent that little bit of polygon. Without SFS you are not provided this information, so what do you do? You still need to come up with a way to deal with it. SVT systems have no issue resolving these cases, but that doesn't necessarily mean they are super efficient with tile and mip selection. They may select more tiles than necessary (if you assume a hit means all 4 dots are hits, you load 4 tiles instead of 1). That's where SFS helps to improve both tile and mip selection: it tells you where you sampled and at what mip level. What the engine decides to do with that information is up to the developers.
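The over-fetch being described can be sketched in a few lines (Python, purely illustrative): without feedback you load the union of every tile a filtered sample *could* have touched; with feedback you load only what was actually read:

```python
def conservative_tiles(sample_footprints):
    """Without feedback: each filtered sample might have touched any
    tile in its footprint, so load the union of all candidates."""
    needed = set()
    for footprint in sample_footprints:
        needed |= footprint
    return needed

def feedback_tiles(sampled_tiles):
    """With sampler feedback: the hardware reports the tiles it
    actually read, so load exactly those."""
    return set(sampled_tiles)
```

For a sample whose footprint straddles a 2x2 tile neighbourhood but only reads one tile, the conservative path commits 4 tiles where the feedback path commits 1.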
 
It's not about tile visibility though, that's sort of what I'm getting at. Here is a simple diagram I ripped off nvidia
[image: RB31Lq7.jpg]


The sampling pattern here is 4 dots. Black dots are misses for visibility; green dots are hits. Imagine doing the same thing for texture sampling: you get a hit, so you load that part of the texture, great. But if you're using tiled resources you may want to know where (which of the 4 dots) scored a hit, and at which mip level. That way the system tells you precisely, during render, which mip to call on and precisely what part of that tile you actually need. Think about the upper part of that triangle where you are only scoring 2 out of 8 hits. A decision needs to be made about which mip level and which tiles are used to represent that little bit of polygon. Without SFS you are not provided this information, so what do you do? You still need to come up with a way to deal with it. SVT systems have no issue resolving these cases, but that doesn't necessarily mean they are super efficient with tile and mip selection. They may select more tiles than necessary (if you assume a hit means all 4 dots are hits, you load 4 tiles instead of 1). That's where SFS helps to improve both tile and mip selection: it tells you where you sampled and at what mip level. What the engine decides to do with that information is up to the developers.

One does the math based on visible geometry and the textures attached to it to figure out what is needed. Based on the size of the rendered geometry, the right mip levels can be fetched. There is also going to be a cache, so it's OK to fetch a little too much speculatively. One can also build heuristics based on the motion and rotation of objects to decide which tiles to fetch or not fetch.

Something like meshlets could be tremendously useful in figuring this stuff out/optimizing it when SFS is not available or not desired. Meshlets would likely make things easier, as where appropriate we can work at the meshlet level for streaming/discarding/rendering rather than caring about individual triangles.
 
Revealing DF analysis; looking forward to the whole game. Yet another case where the spreadsheet released by other analysts didn't really capture the experience.
 
There is also going to be a cache, so it's OK to fetch a little too much speculatively. One can also build heuristics based on the motion and rotation of objects to decide which tiles to fetch or not fetch.
Yeah, absolutely. And it's not that SVT systems today are piss-poor or anything like that; they're fantastic. But if a developer is unsatisfied or wants to make further optimizations in parts of their SVT pipeline, sampler feedback is a tool they can use to get actual feedback instead of having to guess what to do with edge cases.
 
One does the math based on visible geometry and the textures attached to it to figure out what is needed. Based on the size of the rendered geometry, the right mip levels can be fetched. There is also going to be a cache, so it's OK to fetch a little too much speculatively. One can also build heuristics based on the motion and rotation of objects to decide which tiles to fetch or not fetch.

Something like meshlets could be tremendously useful in figuring this stuff out/optimizing it when SFS is not available or not desired. Meshlets would likely make things easier, as where appropriate we can work at the meshlet level for streaming/discarding/rendering rather than caring about individual triangles.

Transcript from the video I pasted

“No matter how you manage texture residency, whether it's full mip chain, or partial mip chain, you have some decisions to make about what to load, and when. Like, when do you load that 4k mip0?

Well, maybe something tried to sample from it. But how would you know that? Because samplers are opaque, you don't have a built-in way of knowing.

You can try to calculate mip level selection yourself. Maybe you can be really clever. Maybe you can emulate the filter mode of the sampler that you're using in your shader. Maybe you can try to be really precise about emulating it, but it's really, really hard.

And if you're using tiled resources, it's even harder, because it's not enough just to know what mip level you're going to end up with; you need to know where on that mip level. So if you combine that with the use of, say, anisotropic filtering, trying to emulate that choice of where you sample from, and all the set of places it could have sampled from, is prohibitively hard. It's just deal-breakingly hard.

So, enter sampler feedback. Sampler feedback is a way of opening up that black box, so you can find out what mips you tried to sample from and it goes a step further than that. It will tell you what parts of those mips. So, one thing to note is that sampler feedback is not a complete overhaul of sampling hardware, but it's an extension to it. It's a GPU hardware feature that extends existing hardware designs, and gets you something new out of what used to be that closed black box.”
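The "calculate mip level selection yourself" the transcript mentions is the classic LOD estimate from screen-space UV derivatives. A sketch for the trilinear case (Python, illustrative; anisotropic filtering is exactly where it becomes "deal-breakingly hard"):

```python
import math

def mip_from_derivatives(dudx, dvdx, dudy, dvdy, tex_w, tex_h):
    """Classic GPU-style LOD estimate from screen-space UV derivatives,
    i.e. what you'd have to emulate yourself without sampler feedback.
    Trilinear case only; anisotropic filtering is far messier."""
    # Texel-space footprint of one screen pixel along each screen axis.
    ddx = math.hypot(dudx * tex_w, dvdx * tex_h)
    ddy = math.hypot(dudy * tex_w, dvdy * tex_h)
    rho = max(ddx, ddy)  # the largest footprint axis drives the LOD
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0
```

Even this simple version only tells you the mip, not where on the mip you sampled; reproducing the sampler's exact footprint per tap is the part sampler feedback replaces.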
 
Yeah, absolutely. And it's not that SVT systems today are piss-poor or anything like that; they're fantastic. But if a developer is unsatisfied or wants to make further optimizations in parts of their SVT pipeline, sampler feedback is a tool they can use to get actual feedback instead of having to guess what to do with edge cases.

So I'm gathering from this that the 2.5x multiplier claimed by Microsoft isn't likely to be in reference to the best alternative streaming methods, but rather to something more naive like the Unity example posted earlier?
 