Velocity Architecture - more than 100GB available for game assets

Discussion in 'Console Technology' started by invictis, Apr 22, 2020.

  1. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    184
    Likes Received:
    181
    Isn't the only implication of such "split" memory config being that when the game engine's allocator requests new pages, it needs to specify whether it wants higher bandwidth?
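    To make the question above concrete, here is a minimal sketch of an allocator that takes a bandwidth hint per request. Everything in it is an assumption for illustration: the names `SplitPoolAllocator`, `Pool` and `wants_high_bandwidth`, the page counts, and the fallback policy are hypothetical, and it does not reflect the actual Xbox allocator API.

```python
from enum import Enum


class Pool(Enum):
    FAST = "fast"  # hypothetical high-bandwidth region
    SLOW = "slow"  # hypothetical lower-bandwidth region


class SplitPoolAllocator:
    """Toy page allocator over two pools with different bandwidth."""

    def __init__(self, fast_pages, slow_pages):
        # Track only free page counts; a real allocator tracks addresses.
        self.free = {Pool.FAST: fast_pages, Pool.SLOW: slow_pages}

    def alloc(self, pages, wants_high_bandwidth):
        """Reserve `pages` pages, preferring the pool named by the hint.

        Falls back to the other pool when the preferred one is full
        (an assumed policy, not a documented one).
        """
        if wants_high_bandwidth:
            order = (Pool.FAST, Pool.SLOW)
        else:
            order = (Pool.SLOW, Pool.FAST)
        for pool in order:
            if self.free[pool] >= pages:
                self.free[pool] -= pages
                return pool
        raise MemoryError("neither pool has enough free pages")
```

    With small allocations the hint is the only extra decision the engine makes; the fallback path is where a request could end up in the pool it did not ask for.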
     
    iroboto and BRiT like this.
  2. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    IIRC HBCC would have handled this issue really well, or whatever DMA controllers they're using in the Series X and PS5. If you think about it as a virtual address space and not the physical address space, the OS should be able to load the 250MB into the right physical memory from disk I/O. The displaced 200MB would simply be retrieved when needed from its physical location on the SSD, but would remain part of the overall virtual address space. Unless I misunderstood what you were saying.
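    The demand-paging idea described above can be sketched roughly as follows. This is only a toy model under stated assumptions: `DemandPagedMemory` is a hypothetical name, a plain dict stands in for the SSD, and least-recently-used eviction is an assumed policy, not something confirmed for HBCC or either console.

```python
from collections import OrderedDict


class DemandPagedMemory:
    """Toy virtual address space: every page lives in the backing store,
    and only `resident_capacity` pages are held in RAM at once."""

    def __init__(self, resident_capacity, backing_store):
        self.capacity = resident_capacity
        self.backing = backing_store    # page_id -> data, stands in for the SSD
        self.resident = OrderedDict()   # page_id -> data, oldest use first
        self.faults = 0

    def read(self, page_id):
        if page_id in self.resident:
            # Hit: mark the page as most recently used.
            self.resident.move_to_end(page_id)
            return self.resident[page_id]
        # Miss: count a fault and page the data in on demand.
        self.faults += 1
        if len(self.resident) >= self.capacity:
            # Evict the least recently used page. Its contents still
            # exist in the backing store, so nothing is lost.
            self.resident.popitem(last=False)
        data = self.backing[page_id]
        self.resident[page_id] = data
        return data
```

    The displaced page stays addressable the whole time; reading it again simply costs another trip to the backing store.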
     
    thicc_gaf likes this.
  3. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    If you assume Hitman 3 is not optimized for the PS5, you'd have to assume the same for the Series X. As far as we know, both versions were optimized for their systems to the best of IO Interactive's ability. And as far as we're concerned, we haven't seen, nor will we ever see, how the Velocity Architecture loads Spider-Man. I still think the actual SSD is faster in the PS5, but the Series X is definitely performing just as well or even better despite having half the effective throughput.

    It's honestly impressive thus far. My thoughts are that later on, when engines are more optimized, we'll start seeing a 2x advantage but at very low load times, say 2 vs 4 seconds. But in games with such low load times, like FIFA and NBA, it's been a draw, with games loading in under 3 seconds on both machines.
     
    thicc_gaf and PSman1700 like this.
  4. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    184
    Likes Received:
    181
    When you allocate memory you are unlikely to do so at such a big granularity. Unless you absolutely need a whole 10GB of content to be ready for command at a given time, this is probably a non issue.
     
  5. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,986
    Likes Received:
    15,717
    Location:
    The North
    Yea, I think this is true. I guess I was just making a simple example to understand really.
    But I would hope the granularity is small enough to make this issue a non-factor.
     
  6. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,986
    Likes Received:
    15,717
    Location:
    The North
    I dunno, maybe? Unfortunately I don't understand enough to really know what's happening behind the scenes, the documentation isn't exhaustive.
     
  7. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    Yeah, HBCC or SFS would handle what you described really well. I wrote a post about it on Reddit a while ago: efficient demand paging of data into RAM. Do you have a link to the documentation?
     
  8. cheapchips

    Veteran Newcomer

    Joined:
    Feb 23, 2013
    Messages:
    1,791
    Likes Received:
    1,926
    Ant-Man and the Pube Forest?
     
    rntongo and Nesh like this.
  9. thicc_gaf

    Regular Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    324
    Likes Received:
    246
    Half of the suspected effective throughput, at that; people really need to read through the FlashMap papers. I probably have them bookmarked, but I'm not searching my bookmarks for them right now xD.

    I'm surprised there's no HBCC in either system, but I guess their designs did not require it. I wonder if MS's handling of the fast/slow memory pools is similar to how Intel has apps implement memory control using Optane Persistent Memory in App Direct mode, or in Memory Mode.

    IIRC the latter just treats the DRAM as a last-level cache and the system sees it and the Optane memory as one large contiguous memory pool, while App Direct mode has the software and OS treat them as two separate memory pools. Given the challenges some 3P devs seem to be having, I'm guessing MS could have the OS and game apps handle the fast and slow memory pools in something equivalent to the App Direct mode seen with Optane memory & DRAM.
     
    #429 thicc_gaf, Jan 22, 2021
    Last edited: Jan 22, 2021
  10. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    If I had to guess, the DMA controllers in the Series X do equivalent or similar work to HBCC. I think the SFS hw in the GPU identifies what pages will be needed and the DMA controller makes sure they're resident in RAM. Otherwise all assets remain part of the virtual address space.


    I have no idea how this works.
     
  11. Kugai Calo

    Newcomer

    Joined:
    Mar 6, 2020
    Messages:
    184
    Likes Received:
    181
    I think the SF (no Streaming) hardware only: 1) records which part & which mip level is sampled and 2) makes sampling of the feedback map more accurate by having a custom filter. The Streaming part, I would imagine, involves the CPU reading back the feedback map and encoding DirectStorage commands accordingly.
     
    thicc_gaf likes this.
  12. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,456
    Likes Received:
    2,805
    Doesn't sound right to me.
    SF is a DX12U feature
    SFS is a XS feature that includes the custom filter.
     
    thicc_gaf and tinokun like this.
  13. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,688
    Likes Received:
    1,919
  14. Moik

    Newcomer

    Joined:
    Dec 2, 2020
    Messages:
    25
    Likes Received:
    28
    Twas implied to be hardware specific at one time. (or at least tasks that could be done on cpu, elsewhere)
    I wonder if it's a feature that's in AMD hardware, but hasn't been uncovered in DX.
     
    #434 Moik, Jan 22, 2021
    Last edited: Jan 22, 2021
  15. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,456
    Likes Received:
    2,805
    SF is supported in all DX12U hardware.
    SFS is custom additions made by MS for XS, features like the custom filter, etc.

    Unsure how much performance SFS will give above and beyond SF; it may just help with simpler development.
    But I suspect there will be benefits beyond simpler development, otherwise it may not have been worth the effort to add it, as SF would need to be implemented in the PC engine anyway.
     
    function, tinokun and RagnarokFF like this.
  16. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    Whatever page of data the custom filter identifies as not resident in RAM will have to be demand paged in. The DMA controllers would play a part in this process.
     
    thicc_gaf and function like this.
  17. Moik

    Newcomer

    Joined:
    Dec 2, 2020
    Messages:
    25
    Likes Received:
    28
    Jay wrote: "SF is supported in all DX12U hardware.
    SFS is custom additions made by MS for XS, features like the custom filter, etc."

    Yep, I grabbed the wrong tweet initially.
    I know a Sony engineer had mentioned writing a shader to query which textures are needed, and the sampler can report if the texture is resident in memory. (twas a generic comment, but maybe some insight into what they have in hardware (or not) on the PS5 side.)
     
  18. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,688
    Likes Received:
    1,919
    Read the MS doc I linked. There is no mention of the Xbox. XS may have customizations for SFS, but this doc makes no declarations that SFS is limited to consoles.


    How to adopt Sampler Feedback for Streaming
    To adopt SFS, an application does the following:

    • Use a tiled texture (instead of a non-tiled texture), called a reserved texture resource in D3D12, for anything that needs to be streamed.
    • Along with each tiled texture, create a small “MinMip map” texture and small “feedback map” texture.
      • The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
      • The feedback map represents a per-region desired mip level for the tiled texture; it represents what needs to be loaded.
    • Update the mip streaming engine to stream individual tiles instead of mips, using the feedback map contents to drive streaming decisions.
    • When tiles are made resident or nonresident by the streaming system, the corresponding texture’s MinMip map must be updated to reflect the updated tile residency, which will clamp the GPU’s accesses to that region of the texture.
    • Change shader code to read from MinMip maps and write to feedback maps. Feedback maps are written using special-purpose HLSL constructs.
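    The relationship between the two maps in the steps above can be sketched as a small planning pass. This is a pure-Python simulation under stated assumptions, not the D3D12 API: flat lists stand in for the per-region maps, lower mip numbers mean higher detail, and the function names `plan_streaming` and `apply_plan` are hypothetical.

```python
def plan_streaming(min_mip, feedback):
    """Compare what is loaded (MinMip map) with what was requested
    (feedback map) and return (loads, evictions) as (region, mip) pairs."""
    loads, evictions = [], []
    for region, (resident, desired) in enumerate(zip(min_mip, feedback)):
        if desired < resident:
            # More detail requested than is loaded: stream in the
            # missing finer mips for this region.
            loads.extend((region, mip) for mip in range(desired, resident))
        elif desired > resident:
            # Less detail requested: the finer mips can be dropped.
            evictions.extend((region, mip) for mip in range(resident, desired))
    return loads, evictions


def apply_plan(min_mip, loads, evictions):
    """Return the updated MinMip map once the streaming system has made
    tiles resident or nonresident, so sampling is clamped accordingly."""
    updated = list(min_mip)
    for region, mip in loads:
        updated[region] = min(updated[region], mip)
    for region, mip in evictions:
        updated[region] = max(updated[region], mip + 1)
    return updated
```

    In a real engine the loads would become I/O requests and the MinMip map would only be updated after each tile actually arrives; here both happen in one step to keep the sketch short.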
     
    #438 dobwal, Jan 23, 2021
    Last edited: Jan 23, 2021
    thicc_gaf, Moik, function and 2 others like this.
  19. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,456
    Likes Received:
    2,805
    You're probably right, I remember the whole conversation about what is and isn't custom.
    Only thing I know for sure is the custom filter.
    MS would sometimes mangle VA, SFS etc. when talking about customizations and improvements.

    I'm still not clear what they actually meant when they said they have the full RDNA2, but then I've not been following anything for a couple of months now.

    Do we know which games/engines are using tiled resources? As those would be the candidates to be updated to SFS part of VA sooner rather than later.
     
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,986
    Likes Received:
    15,717
    Location:
    The North
    There is a custom filter for XSX, however, wrt SFS.

    The hardware will pull the tile from SSD as required, but will also generate an approximately matching coloured tile to insert into the frame in case the texture doesn't arrive in time, to avoid any sort of hiccuping. The next frame, the tile will be there.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.