Velocity Architecture - more than 100GB available for game assets

Discussion in 'Console Technology' started by invictis, Apr 22, 2020.

  1. LiveGamer

    Newcomer

    Joined:
    Jan 29, 2021
    Messages:
    10
    Likes Received:
    5
    Really cool tech. So does this take away from GPU power for other tasks? So essentially trading GPU power for more efficient memory and bandwidth?
     
  2. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,512
    Likes Received:
    2,855
    What do you mean?
    Are you talking about PC or console? A bit more context would help.

    Apart from GPU decompression on PC, it won't take anything away from the GPU as far as I can tell. Unless you try processing all the SFS feedback, when they say doing it stochastically gives good results with less load. I think they said they discard about 99%.
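
    A toy Python sketch of why discarding most feedback can still work. This is not the SFS hardware or any D3D12 API: `record_feedback`, `simulate` and `KEEP_PROBABILITY` are made-up names, and the sample counts are assumptions. The point is only that a visible tile gets sampled many times per frame, so keeping roughly 1% of the writes still tends to find almost every tile.

```python
import random

KEEP_PROBABILITY = 0.01  # assumption: keep ~1% of feedback writes, discard ~99%

def record_feedback(samples, keep_probability=KEEP_PROBABILITY, rng=None):
    """Return the set of tile ids seen after randomly discarding most samples."""
    rng = rng or random.Random(0)
    seen = set()
    for tile_id in samples:
        # Each texture sample only writes feedback with a small probability.
        if rng.random() < keep_probability:
            seen.add(tile_id)
    return seen

def simulate(num_tiles=200, samples_per_tile=2000, seed=1):
    """Compare tiles found with full feedback vs. stochastic feedback."""
    rng = random.Random(seed)
    # Hypothetical workload: every visible tile is sampled many times.
    samples = [t for t in range(num_tiles) for _ in range(samples_per_tile)]
    full = record_feedback(samples, keep_probability=1.0, rng=rng)
    sparse = record_feedback(samples, keep_probability=KEEP_PROBABILITY, rng=rng)
    return len(full), len(sparse)
```

    Calling `simulate()` returns how many distinct tiles each approach found; tiles that are only sampled a handful of times are the ones a stochastic scheme can miss for a frame.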
     
  3. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,704
    Likes Received:
    1,945
    You're also saving CPU power on a PC, as the CPU is normally doing the decompression. And from my limited search, lossless decompression is around 40-200x faster on a GPU than on a CPU.


    https://on-demand.gputechconf.com/gtc/2016/posters/GTC_2016_Algorithms_AL_11_P6128_WEB.pdf

    Shows how much latency can be added by decompressing on a cpu when calling data from SSD to gpu memory. And how GPU decompression can reduce that latency.
     
    #463 dobwal, Apr 22, 2021
    Last edited: Apr 22, 2021
    PSman1700 likes this.
  4. invictis

    Newcomer

    Joined:
    May 28, 2013
    Messages:
    105
    Likes Received:
    63
    It should do for sure.
    Is SFS pretty much automatic, or is there a bit of work on the dev end to use it?
     
  5. Allandor

    Regular Newcomer

    Joined:
    Oct 6, 2013
    Messages:
    614
    Likes Received:
    549
    Well, SF is more or less based on "info" (feedback) that tells the engine what is needed and what is not. So it must be actively integrated. It is not something that works automatically. Just like mesh shaders: if the engine/game does not use it, it is more or less "useless" and things are done the traditional way.
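
    To make the "must be actively integrated" point concrete, here is a minimal sketch of the kind of decision an engine would have to make from the feedback. It is an illustration only: `plan_streaming` and both dictionaries are hypothetical, and real engines read a GPU feedback map through D3D12 rather than Python dicts.

```python
def plan_streaming(feedback_min_mip, resident_min_mip):
    """Compare requested vs. resident mip per texture region.

    Both arguments map a region id to a mip level (0 = most detailed).
    Returns (to_load, to_evict) lists of (region, mip) pairs.
    """
    to_load, to_evict = [], []
    for region, wanted in feedback_min_mip.items():
        have = resident_min_mip.get(region)
        if have is None or wanted < have:
            # The GPU sampled a more detailed mip than is resident: stream it in.
            to_load.append((region, wanted))
        elif wanted > have:
            # More detail is resident than was sampled: candidate for eviction.
            to_evict.append((region, have))
    return to_load, to_evict
```

    If the engine never runs a step like this, the feedback is simply ignored and textures stream the traditional way.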
     
    Kugai Calo likes this.
  6. mr magoo

    Newcomer

    Joined:
    May 31, 2012
    Messages:
    193
    Likes Received:
    322
    Location:
    Stockholm

    You can't quote Cerny, who compares PS5 to PC, and then compare it to XSX. They put a lot of work into overcoming various limitations and maximising the SSD. Everything is explained here

     
    RagnarokFF and PSman1700 like this.
  7. Allandor

    Regular Newcomer

    Joined:
    Oct 6, 2013
    Messages:
    614
    Likes Received:
    549
    This is not really true, or at least it embellishes reality. You do not need 100% of the IO bandwidth all the time. Normally you only need a fraction of the available bandwidth, but when you need it, you want it as fast as possible.
    Even without a hardware block (Xbox at least has one), it might still make only a minor difference in real-life workloads. E.g. Microsoft concentrated more on loading only the things that are really needed, so IO bandwidth and IO operations become even less of a limiting factor.
     
    PSman1700 likes this.
  8. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    14,987
    Likes Received:
    11,086
    Location:
    London, UK
    The other reason why hardware blocks are good, and something folks don't appreciate until they debug a decompression routine, is the hit to the cache from CPU decompression. If you're already tight on cache running your massive open world, throwing CPU decompression on top means more cache contamination. CPU decompression is fast because it leverages cache.
     
    Allandor and iroboto like this.
  9. Ronaldo8

    Regular Newcomer

    Joined:
    May 18, 2020
    Messages:
    284
    Likes Received:
    348
    The recent Game Stack presentation has clarified a lot of things and we now know more or less how XVA actually works:
    (1) It now seems increasingly clear that parallelism was a fundamental design philosophy of not only the GPU but the whole system. This is a stark difference from Mark Cerny's approach of fast and narrow as expounded in The Road to PS5. That MS would hold such a view is not surprising, as parallelism is now viewed as the future of high-end computing by most of the big IT players (there is a very helpful talk by John Hennessy on this very subject).
    (2) Sampler feedback outperforms existing texture streaming solutions by a 2.5x multiplier. Basically the difference between guessing visibility and knowing it for sure; one can be more aggressive with the texture budget in the latter case.
    (3) Sampler feedback enables extreme granularity. The bulk of the data requested is a collection of tiles and, in keeping with the batch-like functioning of a GPU, those requests will occur in batches.
    (4) DirectStorage, through the Windows storage stack, enables processing of those many small requests one batch at a time, which jibes with the optimal functioning of NVMe drives and cuts down dramatically on CPU overhead.
    (5) Importantly, DirectStorage cuts down on latency by optimising path length, bypassing the indirection of the filesystem and the FTL of volume layers. This is certainly being achieved through Flashmap, which is tailor-made for that exact function (https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/flashmap_isca2015.pdf). If true, this would confirm memory mapping of at least a portion of the SSD.
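
    Points (3) and (4) can be sketched as a toy request queue: many small tile reads are collected and then submitted as one batch. This is not the DirectStorage API. `BatchQueue`, `enqueue`, `submit` and `demo` are hypothetical names, and an in-memory buffer stands in for the SSD.

```python
import io

TILE_SIZE = 64 * 1024  # 64 KB, the tile size used by D3D tiled resources

class BatchQueue:
    """Hypothetical queue: collect many small reads, submit them together."""

    def __init__(self, source):
        self.source = source      # any seekable binary file-like object
        self.pending = []         # (offset, size) pairs waiting for submit()
        self.submissions = 0      # how many times the submit cost was paid

    def enqueue(self, offset, size=TILE_SIZE):
        self.pending.append((offset, size))

    def submit(self):
        """Service all pending reads in offset order and return the payloads."""
        self.submissions += 1
        results = {}
        for offset, size in sorted(self.pending):
            self.source.seek(offset)
            results[offset] = self.source.read(size)
        self.pending.clear()
        return results

def demo():
    source = io.BytesIO(bytes(4 * TILE_SIZE))   # stand-in for a 4-tile texture file
    queue = BatchQueue(source)
    for tile_index in (3, 0, 2):
        queue.enqueue(tile_index * TILE_SIZE)
    tiles = queue.submit()                      # one submission for three tiles
    return queue.submissions, sorted(tiles)
```

    The design point is that the per-submission cost is paid once per batch rather than once per tile, which is where the CPU overhead saving would come from.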

    How about that for brute force?
     
  10. turkey

    Veteran Newcomer

    Joined:
    Oct 21, 2014
    Messages:
    1,096
    Likes Received:
    876
    Location:
    London
    This is very nice. He talks about the speed of feedback and not seeing things being loaded. How quick does the feedback come, real time or per frame? That demo seemed to be running in the high hundreds to low thousands of FPS. A far cry from 60 or 30fps, slower feedback and more to load.

    The multiplier for memory and IO does not change, but will the overall experience?

    The SSD will be the same, but I wonder whether it works as well if the feedback is delayed by 33ms.

    Just my musings.

    We need some true current-generation games using all the next-gen console tech; it should be amazing.
     
  11. Ronaldo8

    Regular Newcomer

    Joined:
    May 18, 2020
    Messages:
    284
    Likes Received:
    348
    The greater the frametime, the better SFS should work as it will give a little bit more time to DMA the requested tile from the SSD. It is at high framerates that I think SFS may get into trouble.
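
    A back-of-the-envelope way to see this, as a sketch under stated assumptions: 64 KB tiles and 2.4 GB/s, the raw SSD figure Microsoft has quoted for Series X. Latency, compression and request overhead are ignored, and `tiles_per_frame` is a made-up helper.

```python
TILE_BYTES = 64 * 1024   # 64 KB tiles, as used by D3D tiled resources
RAW_BANDWIDTH = 2.4e9    # bytes/s; assumption: Series X raw SSD throughput

def tiles_per_frame(fps, bandwidth=RAW_BANDWIDTH, tile_bytes=TILE_BYTES):
    """Upper bound on tiles transferable within one frame time, ignoring latency."""
    frame_time = 1.0 / fps
    return int(bandwidth * frame_time // tile_bytes)
```

    With these numbers the bound is about 1220 tiles per frame at 30fps, 610 at 60fps and only 36 at 1000fps, so the per-frame budget shrinks quickly as the framerate climbs.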
     
  12. mr magoo

    Newcomer

    Joined:
    May 31, 2012
    Messages:
    193
    Likes Received:
    322
    Location:
    Stockholm
    Thank you for sharing this, great read.
     
    thicc_gaf likes this.
  13. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    120
    Likes Received:
    106
    On the PC the GPU is doing the decompression since it doesn’t have the decomp block like in the Series X! Still much better than using a CPU! And even better the data is decompressed when it reaches VRAM.
     
  14. Ronaldo8

    Regular Newcomer

    Joined:
    May 18, 2020
    Messages:
    284
    Likes Received:
    348
    mr magoo likes this.
  15. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    186
    Likes Received:
    181
    Sampler Feedback doesn't, decompression (on PC) does.
     
  16. invictis

    Newcomer

    Joined:
    May 28, 2013
    Messages:
    105
    Likes Received:
    63
    At this point, is Sampler Feedback Streaming only available on Xbox Series X/S?
    I have heard that it will come to PC via DirectX 12 Ultimate, but at this point it hasn't?
    Microsoft said they added specific hardware to the Xbox alone for SFS, from what I recall.
    James Stanard said that only Sampler Feedback was a DirectX 12 Ultimate feature, and not the streaming.

    So the question is, can it be applied to Nvidia GPUs, for instance, if they don't have the same hardware as Xbox?
    I haven't seen Nvidia advertise it, but I have seen reports that it is coming to PC via DX12U.
    I think people are confusing SF with SFS.
    So is it coming to PC, or is Stanard right that it's not a DX12U feature?
     

  17. mr magoo

    Newcomer

    Joined:
    May 31, 2012
    Messages:
    193
    Likes Received:
    322
    Location:
    Stockholm
    I could be wrong, but I will give it a try. It looks like SFS requires DirectStorage; DS on XSX/S is built with Flashmap as a backbone, and IMO here is the problem.
    I don't think this can be achieved with a simple DX upgrade. Flashmap is not a simple IO improvement; it completely redesigns how the SSD is accessed. Perhaps it will come later as an update to the OS? Maybe MSFT is not planning to release it on PC, I have no idea. There is nothing hardware-wise that says it cannot be done, though.
    In the paper about sampler feedback that @Ronaldo8 linked on the previous page I don't see anything hardware-specific to AMD. I think it shouldn't be a problem for a modern Nvidia GPU.
     
  18. Ronaldo8

    Regular Newcomer

    Joined:
    May 18, 2020
    Messages:
    284
    Likes Received:
    348
    Sampler feedback is a feature already available in RTX 20 series cards (introduced 2 years back). The only hardware customizations (although significant) on the Series consoles not included (as of now) in available GPU cards are specialized texture filters and the feedback map implemented in caches (though I guess the latter can still be implemented somehow?).
    Flashmap, if it is indeed the solution adopted by MS, is a purely software implementation that enables SSD memory-mapping and the resolution of the FTL and filesystem layers into a single one (a software wrapper that treats every file like a singular small SSD). PCs and datacenters are the more obvious deployment environments to be honest.
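
    For readers unfamiliar with memory mapping, here is a small Python sketch of the general idea of addressing storage like memory. It uses the ordinary OS `mmap` facility, not FlashMap and not anything confirmed about the Xbox; `read_via_mmap`, `demo` and the file contents are made up for illustration.

```python
import mmap
import os
import tempfile

def read_via_mmap(path, offset, length):
    """Read bytes from a file by slicing a memory mapping instead of calling read()."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The OS pages data in on demand when the slice is touched.
            return mapped[offset:offset + length]

def demo():
    # Hypothetical asset file: an 8-byte header, a 12-byte "tile", then a footer.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"header--" + b"TEXTURE_TILE" + b"--footer")
        path = tmp.name
    try:
        return read_via_mmap(path, 8, 12)
    finally:
        os.remove(path)
```

    The application just indexes into the mapping and the OS decides what actually gets read from the drive; FlashMap, as described in the linked paper, is about collapsing the layers underneath that kind of access.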
     
  19. invictis

    Newcomer

    Joined:
    May 28, 2013
    Messages:
    105
    Likes Received:
    63
    So in saying all that, could SFS be implemented on GPUs that don't have the same customizations as Series X/S?
     
  20. mr magoo

    Newcomer

    Joined:
    May 31, 2012
    Messages:
    193
    Likes Received:
    322
    Location:
    Stockholm
    People from MSFT have called the SSD in XSX/S virtual memory on multiple occasions, so it only seems logical. But nothing is confirmed.

    https://thegeek.games/2020/01/02/xbox-series-x-the-ssd-will-be-used-as-virtual-ram-too/

    "Thanks to their speed, developers can now use the SSD practically as virtual RAM. The SSD access times come close to the memory access times of the current console generation. .....
    A graphic designer no longer has to worry about when GDDR6 ends and when the SSD starts. "


    "PCs and datacenters are the more obvious deployment environments to be honest." I am really looking forward to it, this tech would make my life so much easier.
     