Velocity Architecture - more than 100GB available for game assets

Discussion in 'Console Technology' started by invictis, Apr 22, 2020.

  1. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,990
    Likes Received:
    15,720
    Location:
    The North
    is this confirmed? Just reading through the docs, I was sort of in contention about this one (originally assuming the CPU was allowed to access whatever it wanted as well)

    I'll have to check Hot Chips, but I think that was an assumption we have traditionally made; I haven't found hard proof of it. It's open to interpretation until then.

    edit: wait. Yeah, you're right. With the CPU being in charge of I/O, it's the one making the call to move data into memory, so it's got full access to both pools.

    You are right that the challenge is probably just maximizing the memory; you are likely to have a much harder time filling it to the full allocation now that it's divided into two pools. Like packing a single moving truck vs. two smaller trucks: tougher with big furniture, etc.
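The moving-truck analogy can be made concrete with a toy packing sketch. Nothing here reflects actual GDK allocator behavior; the buffer sizes and the first-fit policy are invented purely to show how the same total capacity becomes harder to fill once it's split:

```python
# Toy illustration of the moving-truck analogy: a set of buffers that fits
# a single 13.5 GB pool can fail first-fit packing into a 10 GB + 3.5 GB
# split. Sizes in GB; all numbers are hypothetical.

def first_fit(buffers, pools):
    """Place each buffer (largest first) into the first pool with room;
    return the buffers that could not be placed."""
    free = list(pools)
    unplaced = []
    for b in sorted(buffers, reverse=True):
        for i, f in enumerate(free):
            if b <= f:
                free[i] -= b
                break
        else:
            unplaced.append(b)
    return unplaced

buffers = [4.0, 4.0, 3.0, 2.5]          # 13.5 GB of assets in total
print(first_fit(buffers, [13.5]))       # unified pool: everything fits -> []
print(first_fit(buffers, [10.0, 3.5]))  # split pools: the 2.5 GB buffer is left over
```

Real allocators are far smarter than first-fit, but the fragmentation pressure from the split is the same in kind.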

    As for variable clocks, I think the reason XSX is fixed is the way it's trying to run more than one virtualized environment on Azure. It can run four Xbox One instances at the same time, for example. Variable clocks would just mess that up entirely.
     
    #381 iroboto, Jan 18, 2021
    Last edited: Jan 18, 2021
    PSman1700 and thicc_gaf like this.
  2. thicc_gaf

    Regular Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    324
    Likes Received:
    246
    I think what's being referred to isn't even really related to bandwidth, but to how the data for the GPU pool of memory actually gets into that pool. Since the CPU only has access to 6 GB from six of the ten chips (and of that, 2.5 GB is reserved for the OS anyway), I've seen that as basically a 3.5 GB physical window of RAM to shuffle data for the GPU memory through, but realistically it would be even "smaller", because some of that 3.5 GB is also going to be used for CPU code and audio data. At least, if the way I'm seeing it is close to accurate.
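For reference, the pool sizes and bandwidths being thrown around in this thread fall straight out of the Series X chip configuration. A back-of-the-envelope sketch (all figures are the publicly stated ones: ten GDDR6 chips on a 320-bit bus at 14 Gbps):

```python
# Series X GDDR6 configuration: six 2 GB chips + four 1 GB chips on a
# 320-bit bus (ten 32-bit channels) at 14 Gbps per pin.
chips_gb = [2] * 6 + [1] * 4
total = sum(chips_gb)             # 16 GB overall
fast = 10 * 1                     # first 1 GB of every chip, interleaved 10-wide: 10 GB
slow = total - fast               # upper 1 GB of the six 2 GB chips: 6 GB
fast_bw = 10 * 32 * 14 / 8        # all ten chips in parallel: 560 GB/s
slow_bw = 6 * 32 * 14 / 8         # only six chips: 336 GB/s
os_reserved = 2.5                 # OS reservation comes out of the slow region
game_slow = slow - os_reserved    # 3.5 GB of "standard" memory left for games
print(total, fast, slow, fast_bw, slow_bw, game_slow)
```

Which is where the 10 GB @ 560 GB/s, 3.5 GB @ 336 GB/s, and 13.5 GB total game allocation numbers in the posts below come from.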

    Devs are likely aware of where the data is in memory once it's populated; it's more a question of how quickly that data can actually get to where it needs to be between the two pools. Once the GPU data is in the GPU memory pool it's basically smooth sailing, but the differing pool sizes & bandwidths might be creating some growing pains for devs in actually getting GPU-bound data into the GPU part of the memory pool. All that aside, compared to PC an APU design like Series X still has a clear advantage over what PC does with stuff like BAR or shuffling data over a PCIe bus, since it's still essentially a hUMA design in other respects. This partitioning of memory into fast & slow blocks, which also affects the physical capacity of both, feels like it kind of virtualizes some NUMA quirks into the package though, at least IMO.

    If the GPU could directly access storage (in practice), would that resolve a lot of this? That's what DirectStorage is supposed to help with: directly accessing data in storage to populate VRAM, bypassing system RAM and the copy process. So we know the Series X is capable of this. However, it's also a part of the Velocity Architecture, and DirectStorage won't start deploying until later this year. If the VA timescale is what you're speculating, then this feature probably won't even be leveraged for a while, even if the hardware is capable of it.

    Which might be a bit of an issue going forward until it can actually be leveraged, but I guess we'll have to wait and see.

    Sounds about right; it also comes with some inconvenient timing for MS, I would guess. Again, I think this is a reason they scaled back so much on their own 1P cross-gen support: if some of the features critical to VA can't really be leveraged by games still being coded with 8th-gen consoles as their base, then the only way to break out of that early is to cut off 8th-gen support. Which, for Microsoft in particular, I always felt was the better option for a lot of reasons.

    But then they also have to consider their Xbox/PC/mobile (through streaming) cross-platform initiative, and that acts as a bit of an anchor here, because solutions like DirectStorage won't even be supported on most PCs except with virtually-impossible-to-buy RDNA2 and RTX 30 series GPUs, which won't make up a significant chunk of that gaming market for a very long time. So while they could push their 1P ahead to focus on 9th-gen and accelerate use of VA and its features if they'd like or need to, that still probably creates some lag for the PC side of things.

    Although I don't want to sound like I'm putting all of it on VA myself; it seems that utilization of VA isn't necessarily at the heart of things regarding certain performance metrics in various 3P games on the system at the moment.

    Yeah, and if it's a game that also needs to work on 8th-gen platforms, any sort of SSD-level streaming of data (particularly non-texture data) needs a mirrored equivalent for the older systems, with their much more limited HDD I/O. I know games like Until Dawn got a recent update to significantly cut loading times on PS4, but loading data isn't the same thing as streaming it in, and this highlights that.

    And like you also say, PC would be a limiting factor as well, partly because stuff like DirectStorage and GPUDirect Storage are either too niche to really program around, or simply aren't even available to use yet.

    So it basically comes down to cross-gen software relying on the CPU for pipelines related to rendering setup (is this another way of phrasing draw lists/draw call instructions? XBO had support for ExecuteIndirect, which might be some of the GPU-oriented task support in 8th-gen systems you were suggesting), and in that scenario a fully unified pool is always going to win out.

    With something like Series X there's just a large chunk of physical memory exclusively set aside for a processor component that these cross-gen games aren't even designed to leverage for that type of task work. And given that the CPU speeds between Sony's and MS's systems are virtually identical in SMT mode, that's going to affect MS more, since the smaller physical RAM amount (and therefore bandwidth) dedicated to the CPU in their setup is magnified, with no additional CPU clock speed to power through and make up the difference (again, in SMT).

    Some or all of this also applies to BC games, I figure, but in that case games can rely on the non-SMT clock for the CPU, those games use less physical memory, XBO/One X games weren't running as fast on the CPU side of things in the first place, and there are probably several other things whose intricate details I don't understand well enough to talk about.
     
  3. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,990
    Likes Received:
    15,720
    Location:
    The North
    This is probably using the Velocity Architecture, given how close they can zoom in.
    Pretty insane materials/budget setup. I guess they can keep things cheap if they keep the entire game to that condo.
     
    #383 iroboto, Jan 18, 2021
    Last edited: Jan 18, 2021
    mr magoo, Nesh, Malo and 2 others like this.
  4. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,770
    Likes Received:
    21,055
    Wrong. The CPU has access to ALL memory, all 16 GB. Same with the GPU: it has access to ALL memory, all 16 GB. A game has access to 13.5 GB that it can use on Series X. The GPU and CPU can both access all 13.5 GB.
     
    tinokun likes this.
  5. thicc_gaf

    Regular Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    324
    Likes Received:
    246
    But that's access as in copying data to/from/within the RAM, right? Not access as in the CPU working with data across the full 16 GB; otherwise, why designate a CPU/audio-optimized pool and a GPU-optimized pool?

    Because if it's the former, there's no disagreement or point of curiosity there. The latter, though, at least IMO, opens up some discussion: if cross-gen games are not optimizing allocation of GPU-bound data to fit in the GPU-optimized 10 GB pool, the games are still doing certain tasks that haven't been transitioned to a GPU-friendly pipeline, and those games are barely using parts of VA to "make up" for that, it could explain pretty well some of the discrepancies in 3P cross-gen performance on the platform in these early months.
     
  6. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,770
    Likes Received:
    21,055
    It's a single memory pool. It's full access, to do with whatever you please. Only the speed of the memory is different.

    It would be suboptimal to place data in the faster speed section if it's only ever accessed by the CPU.
    It would be suboptimal to place data in the slower speed section if it's only ever accessed by the GPU.

    Not everything will be that clear and easily classified. It's a bit of a balancing act in deciding where to place data that's used by both the CPU and the GPU. It all depends on how the data is used, as in how often the GPU and CPU touch it.
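That balancing act could be sketched as a trivial heuristic. To be clear, this is not anything from the GDK; the function, the touch-count inputs, and the threshold are all invented to illustrate the kind of decision being described:

```python
# Hypothetical placement heuristic for the balancing act above: put a buffer
# in the fast (GPU-optimal) pool when the GPU touches it often enough
# relative to the CPU. The ratio threshold is arbitrary.

def choose_pool(gpu_touches_per_frame, cpu_touches_per_frame, ratio=4.0):
    """Return 'fast' for GPU-heavy buffers, 'slow' otherwise."""
    if cpu_touches_per_frame == 0:
        return "fast"  # GPU-only data is the easy case
    if gpu_touches_per_frame / cpu_touches_per_frame >= ratio:
        return "fast"
    return "slow"

print(choose_pool(120, 0))   # render target, GPU-only        -> fast
print(choose_pool(60, 2))    # texture with rare CPU readback -> fast
print(choose_pool(1, 30))    # gameplay state, CPU-heavy      -> slow
```

The hard cases are exactly the ones in the middle, where both processors touch the data regularly and no threshold is obviously right.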
     
    milk, Silent_Buddha, Jay and 3 others like this.
  7. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,770
    Likes Received:
    21,055
    As for how Xbox One X games that use more than 10 GB get mapped to memory on Xbox Series X, that's a more interesting question. A One X enhanced game would have access to 12 GB.

    I suspect (I would need to go through Hot Chips and the release notes again) that the program executable is placed in the slower pool (up to 3.5 GB for games) and then everything else is mapped to the 10 GB faster pool. I don't know if there are any limits on how large an "executable" can be, but I doubt any would be over 3.5 GB. Even if one were, I suspect you simply map the first 3.5 GB to slower memory and anything over that to the remaining 10 GB of faster memory.
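That guess can be sketched in a few lines. This is only a model of the speculation above, not of how the BC layer actually maps memory; the constants come from the pool sizes discussed in the thread, and the function name is made up:

```python
# Sketch of the guess above for mapping a One X enhanced title's 12 GB
# allocation onto Series X pools: the first chunk (executable etc.) into
# the 3.5 GB standard pool, everything else into the 10 GB fast pool.
SLOW_GAME_GB = 3.5
FAST_GAME_GB = 10.0

def map_onex_allocation(total_gb):
    """Split a legacy allocation across the two Series X game pools."""
    slow = min(total_gb, SLOW_GAME_GB)
    fast = min(total_gb - slow, FAST_GAME_GB)
    assert slow + fast == total_gb, "allocation exceeds the 13.5 GB game budget"
    return {"slow": slow, "fast": fast}

print(map_onex_allocation(12.0))  # {'slow': 3.5, 'fast': 8.5}
```

A 12 GB One X allocation fits comfortably this way, with room to spare in the fast pool either direction.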
     
    thicc_gaf likes this.
  8. liams

    Regular Newcomer

    Joined:
    Jul 1, 2020
    Messages:
    316
    Likes Received:
    265
    Is it possible for devs to treat the entirety of the RAM as slow RAM? If they were struggling with allocating things between the fast and slow pools, they might just treat the whole thing as a unified pool. Which might be why there have been some performance issues in some multi-platform games; the devs may not have had enough time to get familiar with it the first go-round.
     
  9. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,770
    Likes Received:
    21,055
    They have operations that can migrate buffers between the pools, but one operation does not function as desired, and the performance of the workaround in the GDK is slower than desired in certain situations. These are some of the issues listed in the leaked XDK release notes from 2020.

    Remapping memory reservations from fast to slow memory fails when using XMemVirtualAlloc
    • Using XMemVirtualAlloc fails when trying to remap a memory reservation from fast memory to slow memory. You can work around this limitation by using XMemAllocatePhysicalPages or by using a completely different reservation for slow memory.
    • XMemAllocatePhysicalPages performs significantly slower on Anaconda with the Advanced memory configuration than with the Standard memory configuration. This will be addressed in a future GDK.
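The control flow described in those notes might look roughly like the sketch below. The lowercase functions are stand-ins I wrote to mimic the real XMemVirtualAlloc / XMemAllocatePhysicalPages GDK calls; their signatures and failure behavior are invented for illustration, with the stubs hard-coded to fail the same way the notes describe:

```python
# Sketch of the release-note workaround: remapping a fast-memory reservation
# to slow memory fails, so fall back to allocating physical pages directly.
# All function names and signatures here are hypothetical stand-ins.

class RemapError(Exception):
    """Stands in for the failure the release notes describe."""

def xmem_virtual_alloc(reservation, size, pool):
    # Stub: the real call fails when remapping an existing fast reservation to slow.
    if reservation is not None and pool == "slow":
        raise RemapError
    return ("virtual", pool, size)

def xmem_allocate_physical_pages(size, pool):
    # Stub for the fallback path (noted as significantly slower under the
    # Advanced memory configuration).
    return ("physical", pool, size)

def remap_to_slow(reservation, size):
    """Try the remap; on failure, use the documented physical-pages workaround."""
    try:
        return xmem_virtual_alloc(reservation, size, pool="slow")
    except RemapError:
        return xmem_allocate_physical_pages(size, pool="slow")

print(remap_to_slow("fast_reservation", 256))  # falls back to physical pages
print(remap_to_slow(None, 256))                # a fresh slow reservation succeeds
```

The second listed workaround, a completely separate reservation for slow memory, corresponds to the `reservation=None` path here.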
     
    Jay and thicc_gaf like this.
  10. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    I haven't been able to read everything you've posted thoroughly, but it makes the most sense. In essence it's unified memory to the CPU/GPU, but with slower and faster access to certain physical RAM. So XMemVirtualAlloc is a library function that remaps virtual RAM addresses of objects or primitives to faster or slower addresses, and XMemAllocatePhysicalPages maps the actual physical locations in memory? So is it a case of developers not using these functions properly, in addition to inefficiencies in their implementations? Or is it simply the inefficiencies in the implementations that are causing issues, and once sorted by MSFT devs, the game code will function much better?
     
    thicc_gaf likes this.
  11. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    I just think that would bottleneck the whole system. The memory controller is built to handle data using the split memory pools; better to fully utilize that. But again, if your data ends up in the slower memory pool you're not going to get full performance. So it's a catch-22 until they get their software in order.
     
  12. thicc_gaf

    Regular Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    324
    Likes Received:
    246
    That's probably what they're doing, given that even if the One X has 12 GB, not all of that is used for the game.

    In fact I think only 9 GB is used for games, as the other 3 GB is used for the OS (even if they bumped it to 9.5 GB or 10 GB for games, that would still fit in the 560 GB/s memory pool), so the entirety of the game's allocation can go in the GPU-optimized pool and stay there.
     
    BRiT likes this.
  13. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    The game code will most likely be in the slower pool of RAM (it should be in static memory), and objects created at runtime should be stored for the most part in the game-optimal RAM (10 GB): textures, geometry, whatever.
     
  14. rntongo

    Newcomer

    Joined:
    May 23, 2020
    Messages:
    119
    Likes Received:
    106
    Considering the slower memory pool still has higher memory bandwidth than the One X, I don't think they need to update the game code to fully utilize the faster and slower pools of RAM. The games could simply run on the Series X and have much better performance.
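The claim checks out on paper. A quick comparison using the public figures (One X: 12 GB of GDDR5 on a 384-bit bus at 6.8 Gbps; Series X slow pool: six GDDR6 channels at 14 Gbps):

```python
# Even the Series X "slow" pool out-runs the One X's whole memory system.
onex_bw = 384 * 6.8 / 8       # One X: 384-bit GDDR5 at 6.8 Gbps -> 326.4 GB/s
sx_slow_bw = 6 * 32 * 14 / 8  # Series X slow pool, six 32-bit channels -> 336 GB/s
print(onex_bw, sx_slow_bw, sx_slow_bw > onex_bw)
```

So even data that lands in the wrong pool sees slightly more bandwidth than a One X game ever had, before counting the 560 GB/s fast pool at all.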
     
    milk and BRiT like this.
  15. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    14,879
    Likes Received:
    10,988
    Location:
    London, UK
    Something is being lost in translation. The big advantage of modern console architectures is that the CPU and GPU are on the same die and both have access to a unified pool of RAM. I can't think of any compelling reason to implement some arbitrary 'allocation' of RAM ranges to the CPU or GPU.
     
    milk likes this.
  16. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,958
    Likes Received:
    930
    Location:
    Somewhere over the ocean
    Either I'm missing something, or this is the perfect tech for an Ant-Man game, or both.
     
    rntongo, cheapchips and iroboto like this.
  17. mr magoo

    Newcomer

    Joined:
    May 31, 2012
    Messages:
    138
    Likes Received:
    242
    Location:
    Stockholm
    My immediate reaction was "I need to vacuum my apartment!" ... looks amazing though.
     
  18. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,458
    Likes Received:
    2,805
    The ranges will be associated with the speed.
    They're just called GPU and CPU pools as those are their main uses.
     
    DSoup likes this.
  19. Allandor

    Regular Newcomer

    Joined:
    Oct 6, 2013
    Messages:
    588
    Likes Received:
    524
    Well, memory fragmentation would be one. GPU data is mostly really short-lived (at least in the future). That part has a higher chance of creating many fragments.
     
    BRiT likes this.
  20. DSoup

    DSoup meh
    Legend Veteran Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    14,879
    Likes Received:
    10,988
    Location:
    London, UK
    I think we're talking about different things. The way I read iroboto's post ("Right, memory that is allocated to the GPU may actually need to be allocated to the CPU, but because of the split pool, developers can't use it and need workarounds to reallocate memory to it.") was that Xbox has formal memory-addressing arbitration. This seems unlikely, and would be contrary to performance needs. This is why I think something has been lost in translation.
     