Velocity Architecture - Limited only by asset install sizes

Really cool tech. So does this take away from GPU power for other tasks? So essentially trading GPU power for more efficient memory and bandwidth?
What do you mean?
Are you talking about PC or console? A bit more context would help.

Apart from GPU decompression on PC, it won't take anything away from the GPU as far as I can tell. Unless you try processing all the SFS feedback, when they say doing it stochastically gives good results with less load. I think they talked about discarding about 99%.
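
To get a feel for why throwing away roughly 99% of the feedback can still work, here is a toy Python simulation. It is only a sketch: the sample count, the tile count and the 1% keep fraction are made-up numbers, and real hardware does not work on Python lists. The point is just that when each visible tile is touched by many samples, a random 1% of them still hits nearly every tile.

```python
import random

def sampled_tiles(samples, keep_fraction, rng):
    """Return the tile IDs still seen after randomly keeping a fraction of samples."""
    return {tile for tile in samples if rng.random() < keep_fraction}

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical frame: 1,000,000 texture samples spread over 500 visible tiles.
NUM_TILES = 500
samples = [rng.randrange(NUM_TILES) for _ in range(1_000_000)]

full = set(samples)                       # what processing 100% of the feedback finds
kept = sampled_tiles(samples, 0.01, rng)  # what a random 1% of the feedback finds

coverage = len(kept) / len(full)
print(f"tiles found with 1% of samples: {len(kept)} of {len(full)} ({coverage:.1%})")
```

A tile that is only touched by a handful of samples can still be missed in a given frame, but it would most likely be picked up a frame or two later.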
 

You're also saving CPU power on a PC, as the CPU normally does the decompression. And from my limited search, lossless decompression is around 40-200x faster on a GPU than on a CPU.


https://on-demand.gputechconf.com/gtc/2016/posters/GTC_2016_Algorithms_AL_11_P6128_WEB.pdf

It shows how much latency can be added by decompressing on a CPU when moving data from SSD to GPU memory, and how GPU decompression can reduce that latency.
 
It should do for sure.
Is SFS pretty much automatic, or is there a bit of work on the dev end to use it?
Well, SF is more or less based on "info" (feedback) that tells the engine what is needed and what is not. So it must be actively integrated; it is not something that works automatically. Just like mesh shaders: if the engine/game does not use it, it is more or less "useless" and things are done the traditional way.
 
Yes, the decompressor is a hardware block, as said before. The lack of other IO hardware takes you from 100% of the SSD speed to the 20% real result. See Cerny's slide. No software can overcome that 80% without eating many CPU resources.


You can't quote Cerny, who compares PS5 to PC, and then apply that comparison to XSX. They put a lot of work into overcoming various limitations and maximising the SSD. Everything is explained here

 
Yes, the decompressor is a hardware block, as said before. The lack of other IO hardware takes you from 100% of the SSD speed to the 20% real result. See Cerny's slide. No software can overcome that 80% without eating many CPU resources.
This is really not true, or at least it embellishes reality. You do not need 100% of the IO bandwidth all the time. Normally you only need a fraction of the available bandwidth, but when you need it, you want to have it as fast as possible.
Even without a hardware block (and the Xbox does have one), in real-life workloads it might still only make a minor difference. E.g. Microsoft concentrated more on only loading things that are really needed, so IO bandwidth and IO operations become even less of a limiting factor.
 
The other reason why hardware blocks are good, and something folks don't appreciate until they debug a decompression routine, is the cache cost of CPU decompression. If you're already tight on cache running your massive open world, adding CPU decompression means more cache pollution. CPU decompression is fast because it leverages cache.
 


The recent Game Stack presentation has clarified a lot of things, and we now know more or less how XVA actually works:
(1) It now seems increasingly clear that parallelism was a fundamental design philosophy of not only the GPU but the whole system. This is a stark difference from Mark Cerny's "fast and narrow" approach as expounded in The Road to PS5. That MS holds such a view is not surprising, as parallelism is now viewed as the future of high-end computing by most of the big IT players (there is a very helpful talk by John Hennessy on this very subject).
(2) Sampler feedback outperforms existing texture streaming solutions by a 2.5x multiplier. It is basically the difference between guessing visibility and knowing it for sure; one can be more aggressive with the texture budget in the latter case.
(3) Sampler feedback enables extreme granularity. The bulk of the data requested is a collection of tiles and, in keeping with the batch-like functioning of a GPU, those requests will occur in batches.
(4) DirectStorage, through the Windows storage stack, enables processing of those many small requests one batch at a time, which jibes with the optimal functioning of NVMe drives and cuts down dramatically on CPU overhead.
(5) Importantly, DirectStorage cuts down on latency by optimising path length, bypassing the indirection of the filesystem, volume and FTL layers. This is most likely being achieved through Flashmap, which is tailor-made for that exact function (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/flashmap_isca2015.pdf). If true, this would confirm memory mapping of at least a portion of the SSD.

How about that for brute force?
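
For anyone who wants points (3) and (4) in concrete terms, here is a minimal Python sketch of the batching idea: collect the tile requests a frame produces, drop repeats, and hand them to storage in fixed-size batches instead of one at a time. It is an illustration under assumptions, not the DirectStorage API; `batch_requests` and the tile IDs are hypothetical names and values.

```python
from itertools import islice

def batch_requests(tile_ids, batch_size):
    """Yield de-duplicated tile requests in batches of at most batch_size."""
    # dict.fromkeys drops repeated tile IDs while keeping first-seen order.
    unique = iter(dict.fromkeys(tile_ids))
    while chunk := list(islice(unique, batch_size)):
        yield chunk

# Hypothetical feedback for one frame: several samples asked for the same tiles.
requested = [3, 1, 3, 2, 1, 5, 4]
batches = list(batch_requests(requested, batch_size=3))
print(batches)
```

Seven raw requests collapse into five unique tiles submitted as two batches, which is the kind of reduction in per-request overhead the list above describes.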
 
Here is a video tech demo of XVA with emphasis on the benefits of SFS over just using XVA:

This is very nice. He talks about the speed of feedback and not seeing things being loaded. How quickly does the feedback come, in real time or per frame? That demo seemed to be running in the high hundreds to low thousands of FPS. That is a far cry from 60 or 30 fps, where the feedback is slower and there is more to load per frame.

The multiplier for memory and IO does not change, but will the overall experience?

The SSD will be the same, but I wonder whether it works as well if the feedback is delayed by 33 ms.

Just my musings.

We need some true current-generation games using all the next-gen console tech. It should be amazing.
 
The greater the frametime, the better SFS should work, as it gives a little more time to DMA the requested tile from the SSD. It is at high framerates that I think SFS may get into trouble.
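
To put rough numbers on that, here is a back-of-the-envelope Python sketch. The 64 KB tile size and the 2.4 GB/s raw SSD throughput are my assumptions, not figures from this thread, and the calculation ignores request latency, compression and everything else competing for IO, so treat the results as upper bounds only.

```python
import math

TILE_BYTES = 64 * 1024      # assumed tile size (64 KB)
RAW_BYTES_PER_SEC = 2.4e9   # assumed raw SSD throughput (2.4 GB/s)

def frame_time_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

def tiles_per_frame(fps, bytes_per_sec=RAW_BYTES_PER_SEC, tile_bytes=TILE_BYTES):
    """Upper bound on whole tiles that can arrive within one frame at full throughput."""
    return math.floor(bytes_per_sec / fps / tile_bytes)

for fps in (30, 60, 1000):
    print(f"{fps:>4} fps: {frame_time_ms(fps):6.2f} ms per frame, "
          f"up to {tiles_per_frame(fps)} tiles")
```

Under those assumptions the per-frame tile budget shrinks from over a thousand tiles at 30 fps to a few dozen at 1000 fps, which is the squeeze at high framerates described above.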
 
Really cool tech. So does this take away from GPU power for other tasks? So essentially trading GPU power for more efficient memory and bandwidth?
On the PC the GPU does the decompression, since it doesn't have a decompression block like the Series X! Still much better than using a CPU! And even better, the data is only decompressed once it reaches VRAM.
 
At this point, is Sampler Feedback Streaming only available on Xbox Series X/S?
I have heard that it will come to PC via DirectX 12 Ultimate, but at this point it hasn't?
Microsoft said they added specific hardware to the Xbox alone for SFS, from what I recall.
James Stanard said that only Sampler Feedback was a DirectX 12 Ultimate feature, and not the streaming.

So the question is, can it be applied to Nvidia GPUs, for instance, if they don't have the same hardware as Xbox?
I haven't seen Nvidia advertise it, but I have seen reports that it is coming to PC via DX12U.
I think people are confusing SF with SFS.
So is it coming to PC, or is Stanard right that it's not a DX12U feature?
 


I could be wrong, but I will give it a try. It looks like SFS requires DirectStorage; DS on XSX/S is built with Flashmap as its backbone, and IMO here is the problem.
I don't think this can be achieved with a simple DX upgrade. Flashmap is not a simple IO improvement; it completely redesigns how the SSD is accessed. Perhaps it will come later as an OS update? Maybe MSFT is not planning to release it on PC, I have no idea. There is nothing hardware-wise that says it cannot be done, though.
In the paper about sampler feedback that @Ronaldo8 linked on the previous page, I don't see anything hardware-specific to AMD. I think it shouldn't be a problem for a modern Nvidia GPU.
 

Sampler feedback is a feature already available in RTX 20 series cards (introduced 2 years back). The only hardware customizations (although significant) on the Series consoles not included (as of now) in available GPUs are the specialized texture filters and the feedback map implemented in caches (though I guess the latter can still be implemented somehow?).
Flashmap, if it is indeed the solution adopted by MS, is a purely software implementation that enables SSD memory mapping and the merging of the FTL and filesystem layers into a single one (a software wrapper that treats every file like a single small SSD). PCs and datacenters are the more obvious deployment environments to be honest.
 
So in saying all that, could SFS be implemented on GPUs that don't have the same customizations as Series X/S?
 

People from MSFT have called the SSD in XSX/S virtual memory on multiple occasions, so it only seems logical. But nothing is confirmed.

https://thegeek.games/2020/01/02/xbox-series-x-the-ssd-will-be-used-as-virtual-ram-too/

"Thanks to their speed, developers can now use the SSD practically as virtual RAM. The SSD access times come close to the memory access times of the current console generation. .....
A graphic designer no longer has to worry about when GDDR6 ends and when the SSD starts. "


"PCs and datacenters are the more obvious deployment environments to be honest." I am really looking forward to it, this tech would make my life so much easier.
 