Isn't the only implication of such a "split" memory config that when the game engine's allocator requests new pages, it needs to specify whether it wants the higher-bandwidth pool?
I mean, I guess I'm just thinking about fragmentation in memory: you've got 200MB left in each pool but the next set of assets requires 250MB, for instance; neither pool can hold it, and now you're playing Tetris to make things work. It's a slightly different argument from bandwidth, and this problem doesn't exist with a unified setup. So if you need to make space, the question is how, and what performance impact the system will suffer for making space. Or you can forgo it and try to load more things just in time (which brings us back to the argument around why the Velocity Architecture is needed).
Maybe this is not fully optimized. Loading is slow in Hitman 3, around 6 to 7 seconds, compared to Spider-Man: Miles Morales, for example, where it is around 2 seconds. In an optimized title on PS5 the CPU is not involved in I/O at all.
When you allocate memory you are unlikely to do so at such a big granularity. Unless you absolutely need a whole 10GB of content to be ready on command at a given time, this is probably a non-issue.
Yea, I think this is true. I guess I was just making a simple example to understand it, really.
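To make the worry concrete, here is a toy sketch (hypothetical pool sizes and a deliberately naive allocator, nothing the consoles actually do) of how two half-empty pools can still fail a request that a unified pool would satisfy:

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Naive bump allocator over a fixed pool, standing in for the Series X's
// "fast" (GPU-optimal) and "slow" (standard) regions. All numbers are
// made up to reproduce the 200MB/250MB example from the thread.
struct Pool {
    const char* name;
    uint64_t    capacity;
    uint64_t    used;
    bool tryAlloc(uint64_t bytes) {
        if (used + bytes > capacity) return false;
        used += bytes;
        return true;
    }
    uint64_t freeBytes() const { return capacity - used; }
};

int main() {
    const uint64_t MB = 1024ull * 1024ull;
    Pool fast = {"fast", 1000 * MB, 800 * MB};  // 200MB free
    Pool slow = {"slow", 1000 * MB, 800 * MB};  // 200MB free

    const uint64_t request = 250 * MB;  // next batch of assets

    // 400MB is free in total, but no single pool can hold 250MB:
    if (!fast.tryAlloc(request) && !slow.tryAlloc(request)) {
        printf("Allocation of %" PRIu64 " MB failed despite %" PRIu64
               " MB free overall.\n",
               request / MB, (fast.freeBytes() + slow.freeBytes()) / MB);
        // A real engine would now evict, compact, or split the allocation;
        // that is the "Tetris" a unified pool never forces at pool level.
    }
    return 0;
}
```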
IIRC HBCC would have handled this issue really well, or whatever DMA controllers they're using in the Series X and PS5. If you think about it as a virtual address space and not the physical address space, the OS should be able to load the 250MB into the right physical memory from disk I/O. The displaced 200MB would simply be retrieved when needed from the physical location on the SSD, but would remain present in the whole virtual RAM. Unless I misunderstood what you were saying.

I dunno, maybe? Unfortunately I don't understand enough to really know what's happening behind the scenes; the documentation isn't exhaustive.
Yeah, HBCC or SFS would handle what you described really well. I wrote a post about it on Reddit a while ago: efficient demand paging of data into RAM. Do you have a link to the documentation?
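What's being described is essentially demand paging against a big virtual address space. A minimal sketch of the idea using the plain Win32 virtual memory API (illustrative only; the consoles' memory managers and DMA engines are not publicly documented):

```cpp
#include <windows.h>
#include <cstdio>

int main() {
    const SIZE_T GB = 1024ull * 1024ull * 1024ull;

    // Reserve a large virtual range with no physical backing at all.
    // The "whole virtual RAM" exists, but no physical pages are spent yet.
    void* base = VirtualAlloc(nullptr, 4 * GB, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) return 1;

    // Commit (give physical backing to) only the 250MB actually needed now.
    const SIZE_T chunk = 250ull * 1024ull * 1024ull;
    void* hot = VirtualAlloc(base, chunk, MEM_COMMIT, PAGE_READWRITE);
    if (!hot) return 1;
    printf("4GB reserved, %zu MB resident.\n", chunk >> 20);

    // Later, decommit to release the physical pages while keeping the
    // virtual addresses reserved; the data would be re-read from the SSD
    // next time. The console DMA engines automate this kind of shuffling.
    VirtualFree(hot, chunk, MEM_DECOMMIT);
    return 0;
}
```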
Either I'm missing something, or this is the perfect tech for an Ant-Man game.
If you assume Hitman 3 is not optimized for the PS5, you'd have to assume the same for the Series X. As far as we know, both games were optimized for their systems to the best of IO Interactive's ability. And as far as we're concerned, we haven't seen, nor will we ever see, how the Velocity Architecture loads Spider-Man. I still think the actual SSD is faster in the PS5, but the Series X is definitely performing just as well or even better despite having half the effective throughput.
It's honestly impressive thus far. My thoughts are that later on, when engines are more optimized, we'll start seeing a 2x advantage, but at very low load times, say 2 vs. 4 seconds. But in games with such low load times, like FIFA and NBA, it's been a draw, with games loading in under 3 seconds on both machines.
I'm surprised there's no HBCC in either system, but I guess their designs did not require it.

If I had to guess, the DMA controllers in the Series X do the equivalent or similar work to HBCC. I think the SFS hardware in the GPU identifies what pages will be needed, and the DMA controller makes sure they're resident in RAM. Otherwise all assets remain as part of the virtual address space.
I wonder if MS's handling of the fast/slow memory pools is similar to how Intel has apps implement memory control using Optane Persistent Memory in App Direct mode or in Memory Mode.
IIRC the latter just treats the DRAM as a last-level cache, and the system sees it and the Optane memory as one large contiguous memory pool, while App Direct mode has the software and OS treat them as two separate memory pools. I'm guessing, given the challenges some 3P devs seem to be having, MS could have the OS and game apps handle the fast and slow memory pools in an equivalent to the App Direct mode seen with Optane memory & DRAM.
I think the SF (no Streaming) hardware only: 1) records what part & which mip level is sampled, and 2) makes sampling of the feedback map more accurate by having a custom filter. The Streaming part, I would imagine, involves the CPU reading back the feedback map and encoding DirectStorage commands accordingly.
Doesn't sound right to me.
SF is a DX12U feature.
SFS is an XS feature that includes the custom filter.
Twas implied to be hardware-specific at one time (or at least that tasks that could be done on the CPU were done elsewhere).

SF is supported in all DX12U hardware.
">"></script>
I wonder if it's a feature that's in AMD hardware but hasn't been exposed in DX.
Whatever page of data the custom filter identifies as not resident in RAM will have to be demand-paged in. The DMA controllers would play a part in this process.
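A rough sketch of what that CPU-side pass could look like, with a hypothetical loadTileFromDisk standing in for an encoded DirectStorage request (this is an assumption about the flow, not the actual XS implementation):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// One entry per texture region: which mip level the GPU said it wanted
// (feedback map) vs. which mip level is actually resident (MinMip map).
struct Region {
    uint8_t desiredMip;   // written by the GPU via sampler feedback
    uint8_t residentMip;  // maintained by the streaming system
};

// Hypothetical stand-in for encoding a DirectStorage request for one tile;
// here it just logs what a real streamer would enqueue.
void loadTileFromDisk(size_t regionIndex, uint8_t mip) {
    printf("enqueue: region %zu, mip %u\n", regionIndex, (unsigned)mip);
}

// Per-frame streaming pass: read back the feedback map and demand-page in
// any detail the GPU wanted that is not yet resident.
void streamTiles(std::vector<Region>& regions) {
    for (size_t i = 0; i < regions.size(); ++i) {
        Region& r = regions[i];
        // Lower mip index = higher detail. If the GPU wanted more detail
        // than is resident, enqueue the missing mip tiles.
        for (uint8_t mip = r.residentMip; mip > r.desiredMip; --mip) {
            loadTileFromDisk(i, mip - 1);
        }
        // Once the I/O completes, the MinMip map would be updated so the
        // GPU's sampling is clamped to what is actually resident.
    }
}

int main() {
    // Example: region 0 wants mip 0 detail but only mip 3 is resident;
    // region 1 already has what it needs.
    std::vector<Region> regions = { {0, 3}, {2, 2} };
    streamTiles(regions);
    return 0;
}
```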
SF is supported in all DX12U hardware.
SFS is the custom additions made by MS for XS, features like the custom filter, etc.
Unsure how much performance SFS will give above and beyond SF; it may just help with simpler development.
But I suspect there will be benefits beyond simpler development, otherwise it may not have been worth the effort to add, as SF would need to be implemented in the PC engine anyway.
Read the MS doc I linked. There is no mention of the Xbox. XS may have customizations for SFS, but this doc makes no declarations that SFS is limited to consoles.
How to adopt Sampler Feedback for Streaming
To adopt SFS, an application does the following:
- Use a tiled texture (instead of a non-tiled texture), called a reserved texture resource in D3D12, for anything that needs to be streamed.
- Along with each tiled texture, create a small “MinMip map” texture and small “feedback map” texture.
- The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
- The feedback map represents the per-region desired mip level for the tiled texture; it represents what needs to be loaded.
- Update the mip streaming engine to stream individual tiles instead of mips, using the feedback map contents to drive streaming decisions.
- When tiles are made resident or nonresident by the streaming system, the corresponding texture’s MinMip map must be updated to reflect the updated tile residency, which will clamp the GPU’s accesses to that region of the texture.
- Change shader code to read from MinMip maps and write to feedback maps. Feedback maps are written using special-purpose HLSL constructs.
You're probably right, I remember the whole conversation about what is and isn't custom.

There is, however, a custom filter for XSX wrt SFS.
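For concreteness, here is roughly what the first step in the doc quoted above (creating a reserved/tiled texture) looks like through the public D3D12 API. A minimal sketch assuming an already-created ID3D12Device; the format and sizes are arbitrary, and this is generic DX12U code, not anything console-specific:

```cpp
#include <windows.h>
#include <d3d12.h>

// Create a tiled ("reserved") texture: the full mip chain gets a virtual
// address range up front, but no heap memory is mapped to it yet. The
// streaming system later maps and unmaps individual tiles as they become
// resident or nonresident.
ID3D12Resource* CreateStreamableTexture(ID3D12Device* device,
                                        UINT width, UINT height,
                                        UINT16 mipLevels)
{
    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    desc.Width            = width;
    desc.Height           = height;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = mipLevels;
    desc.Format           = DXGI_FORMAT_BC7_UNORM;  // example format
    desc.SampleDesc.Count = 1;
    // Reserved resources must use the 64KB undefined swizzle layout.
    desc.Layout           = D3D12_TEXTURE_LAYOUT_64KB_UNDEFINED_SWIZZLE;

    ID3D12Resource* texture = nullptr;
    HRESULT hr = device->CreateReservedResource(
        &desc,
        D3D12_RESOURCE_STATE_COMMON,
        nullptr,  // no optimized clear value for a sampled-only texture
        IID_PPV_ARGS(&texture));
    return SUCCEEDED(hr) ? texture : nullptr;
}
```

Tiles of that texture would then be mapped and unmapped with ID3D12CommandQueue::UpdateTileMappings as the feedback maps drive the streaming decisions described in the doc.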