Velocity Architecture - Limited only by asset install sizes

These non-power-of-two amounts can probably mean fewer cost-reduction opportunities in the future, no?

Given that Sony and MS are already using non-clamshell memory for PS5 and XSX, and they need their current bus widths, there are probably limited opportunities for cost reduction on memory anyway.

I suppose it's possible that MS could go from 320-bits @ 14 ghz, and drop to 256-bits at 18 ghz, so go from 10 chips to 8, but that would assume 18 ghz memory becomes cost effective and that the additional raw BW it would give (beyond 320-bit @ 14) could compensate for fewer memory channels and 20% less GPU L3 cache.

So ... maybe?
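A quick back-of-the-envelope check of that trade-off, purely on raw numbers (assuming standard 32-bit GDDR6 channels, and reading the "ghz" figures as Gbps per pin):

```python
# Raw GDDR6 bandwidth = bus width (bits) / 8 * per-pin data rate (Gbps)
def bandwidth_gb_s(bus_bits, data_rate_gbps):
    return bus_bits / 8 * data_rate_gbps

print(bandwidth_gb_s(320, 14))  # 560 GB/s - current XSX setup, 10 chips
print(bandwidth_gb_s(256, 18))  # 576 GB/s - hypothetical 8-chip setup
```

So the raw figure barely moves (+16 GB/s), which is why the question really comes down to cost per GB and whether losing two memory channels (and the associated cache) hurts more than that small gain helps.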
 
Given that Sony and MS are already using non-clamshell memory for PS5 and XSX, and they need their current bus widths, there are probably limited opportunities for cost reduction on memory anyway.

I suppose it's possible that MS could go from 320-bits @ 14 ghz, and drop to 256-bits at 18 ghz, so go from 10 chips to 8, but that would assume 18 ghz memory becomes cost effective and that the additional raw BW it would give (beyond 320-bit @ 14) could compensate for fewer memory channels and 20% less GPU L3 cache.

So ... maybe?

Right... This is a bit beyond my depth, but considering BC with both high-end and low-end products (Portable?) 10+ years ahead, and also servers, maybe keeping memory amounts at powers of two could make compatibility easier for stuff far enough ahead that we can't even predict it.
 
Why make Series X 20 GB when S is 10? Devs need to optimize for S memory anyway.
This as well. Honestly, it gives me more reason to wish it didn't exist. I wish they'd spent that extra development time and effort funding the start of a AAA title. All I'm playing on my Series X is old games at the moment, yet on the PS5 they're getting AAA first-party third-person titles in about a month or two.
 
Given that Sony and MS are already using non-clamshell memory for PS5 and XSX, and they need their current bus widths, there are probably limited opportunities for cost reduction on memory anyway.

I suppose it's possible that MS could go from 320-bits @ 14 ghz, and drop to 256-bits at 18 ghz, so go from 10 chips to 8, but that would assume 18 ghz memory becomes cost effective and that the additional raw BW it would give (beyond 320-bit @ 14) could compensate for fewer memory channels and 20% less GPU L3 cache.

So ... maybe?
I think you meant 14Gbps and 18Gbps?
 
Right... This is a bit beyond my depth, but considering BC with both high-end and low-end products (Portable?) 10+ years ahead, and also servers, maybe keeping memory amounts at powers of two could make compatibility easier for stuff far enough ahead that we can't even predict it.

It's worth noting (I think, probably) that XSX is still a power-of-two quantity of memory by capacity, if not by memory chip count. So if they're flexible enough to move to smaller buses at higher clocks, they could possibly move back to 8 memory chips.

In the long term, and with at least a couple of memory density increases left, it's quite possible that future BC devices would simply have more RAM than is needed, so exactly matching wouldn't be necessary.
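For reference, here's the chip-count arithmetic behind that (capacities only; the eight-chip revision is purely hypothetical):

```python
# Series X today: ten 32-bit GDDR6 chips of mixed density.
xsx_chips = [2] * 6 + [1] * 4      # GB per chip
print(sum(xsx_chips))              # 16 GB total - a power of two by capacity

# Hypothetical eight-chip revision on a 256-bit bus.
revised_chips = [2] * 8
print(sum(revised_chips))          # 16 GB - same capacity, two fewer channels
```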
 
I am so lost trying to follow. Is this regarding the current Velocity Architecture and cost reductions on future current-gen Series machines, or next-generation speculation?
 
I am so lost trying to follow. Is this regarding the current Velocity Architecture and cost reductions on future current-gen Series machines, or next-generation speculation?
I'm also not following the correlation between memory configuration and Velocity Architecture.
 
I think folks are trying to discuss the balance between SSD bandwidth for Xbox and memory quantity - and it's kind of moved onto what next gen might look like.

I think the cost reductions bit was just an OT spinoff about Xbox memory config. I'm guilty of indulging that too, so my bad. I'll pipe down.
 
Likely if SX was 20GB, SS would be more; like 12GB all at full speed, or maybe 14GB with a different bus width? A bit too early in the morning for me to entertain the different capacity and width implications.
Keeping the buses the same, symmetrical chip setups would have:

a) made memory management significantly easier
b) removed the need to rely on the devkits being extremely optimal to obtain good performance from memory
c) avoided the memory-shuffling commands they have in the API to swap data between the slow and fast pools, reducing the likelihood of random hitching
d) made BC a non-issue with everything being uniform; BC going forward into the next generation will have memory swaps for seemingly no reason now.

BC performance may also have been improved when upgrading last-gen titles, as the memory layout becomes a bit more challenging with the split pools.

Overall, MS may have determined that the end result was the same in the long run, but I'm not sure if they are hitting their internal performance targets right now. Performing at about the level of, or worse than, a 5700 is not where I think they expected XSX to land.

With respect to Velocity Architecture, it's not exactly clear whether a split-pool layout would have equal or worse performance than a uniform memory layout. It clearly can't be better than a uniform pool, however.
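For anyone joining in, this is where the 560/336 figures come from: each 32-bit GDDR6 chip at 14 Gbps contributes 56 GB/s, the 10 GB "GPU optimal" region is interleaved across all ten chips, and the 6 GB "standard" region only spans the six 2 GB chips (the chip breakdown here is the commonly cited one; treat it as the assumed layout):

```python
per_chip_gb_s = 32 / 8 * 14       # 56 GB/s per 32-bit chip at 14 Gbps

gpu_optimal = 10 * per_chip_gb_s  # 10 GB striped across all ten chips -> 560 GB/s
standard = 6 * per_chip_gb_s      # 6 GB only on the six 2 GB chips    -> 336 GB/s
print(gpu_optimal, standard)
```

A uniform setup (e.g. ten 2 GB chips) would have kept the full 560 GB/s across all 20 GB, which is the comparison being made above.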
 
Keeping the buses the same, symmetrical chip setups would have:

a) made memory management significantly easier
b) removed the need to rely on the devkits being extremely optimal to obtain good performance from memory
c) avoided the memory-shuffling commands they have in the API to swap data between the slow and fast pools, reducing the likelihood of random hitching
d) made BC a non-issue with everything being uniform; BC going forward into the next generation will have memory swaps for seemingly no reason now.

BC performance may also have been improved when upgrading last-gen titles, as the memory layout becomes a bit more challenging with the split pools.

Overall, MS may have determined that the end result was the same in the long run, but I'm not sure if they are hitting their internal performance targets right now. Performing at about the level of, or worse than, a 5700 is not where I think they expected XSX to land.

With respect to Velocity Architecture, it's not exactly clear whether a split-pool layout would have equal or worse performance than a uniform memory layout. It clearly can't be better than a uniform pool, however.

Memory swaps? For what? Moving data around in RAM eats bandwidth in and of itself.

I can see the need to move data around to improve bandwidth or latency, but that would be due to caveats that go beyond 336 GBps vs 560 GBps.

The XSX CPU never has access to 560 GBps regardless of where the targeted data sits in RAM. That's been relatively true of any AMD APU-based console (the PS5 might not be limited in such a fashion). The CPU buses only had access to a fraction of the bandwidth that the RAM on that hardware provided. And the GPUs couldn't pull data from allocated memory used by CPU caches at max bandwidth. Even memory allocated for CPU and GPU sharing had limited bandwidth.

I can see limitations like those (which present disparities much larger than 560 vs 336 GBps) possibly still existing in the new hardware and encouraging memory swaps.

However, a bunch of memory swaps isn't going to make 10 GB of VRAM at 560 GBps and 2 GB of VRAM at 336 GBps appear as 12 GB at 560 GBps. Shuttling data back and forth for such purposes is just going to lower overall bandwidth over time.

The difference in bandwidth between the two pools seems too small to encourage such a mechanism. Just directly request the data in the slow pool and live with 336 GBps of bandwidth.
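A rough illustration of that bandwidth tax (the amount copied per frame is made up, purely to show the shape of the trade-off):

```python
# Copying a resource between pools costs a read at one pool's rate and a write
# at the other's, and both come out of the same frame's bandwidth budget.
slow_bw, fast_bw = 336.0, 560.0   # GB/s
frame_time = 1 / 60.0             # seconds at 60 fps

copy_gb = 0.5                     # hypothetical data shuffled per frame
cost = copy_gb / slow_bw + copy_gb / fast_bw
print(f"{100 * cost / frame_time:.0f}% of the frame's memory time")  # ~14%
```

That is bandwidth spent moving data rather than rendering with it, which is the argument for just reading the slow pool in place.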
 
Memory swaps? For what? Moving data around in RAM eats bandwidth in and of itself.

I can see the need to move data around to improve bandwidth or latency, but that would be due to caveats that go beyond 336 GBps vs 560 GBps.

The XSX CPU never has access to 560 GBps regardless of where the targeted data sits in RAM. That's been relatively true of any AMD APU-based console (the PS5 might not be limited in such a fashion). The CPU buses only had access to a fraction of the bandwidth that the RAM on that hardware provided. And the GPUs couldn't pull data from allocated memory used by CPU caches at max bandwidth. Even memory allocated for CPU and GPU sharing had limited bandwidth.

I can see limitations like those (which present disparities much larger than 560 vs 336 GBps) possibly still existing in the new hardware and encouraging memory swaps.

However, a bunch of memory swaps isn't going to make 10 GB of VRAM at 560 GBps and 2 GB of VRAM at 336 GBps appear as 12 GB at 560 GBps. Shuttling data back and forth for such purposes is just going to lower overall bandwidth over time.

The difference in bandwidth between the two pools seems too small to encourage such a mechanism. Just directly request the data in the slow pool and live with 336 GBps of bandwidth.
uhh your mailbox ;)
 
This as well. Honestly, it gives me more reason to wish it didn't exist. I wish they'd spent that extra development time and effort funding the start of a AAA title. All I'm playing on my Series X is old games at the moment, yet on the PS5 they're getting AAA first-party third-person titles in about a month or two.

Same thing I've thought, but in relation to some of the acquisition money they've set aside, like the purported $40 billion for TikTok. That certainly could've been earmarked for a bit more on the consoles or towards getting a few new 1P games ready for launch. Bleeding Edge could've benefited from a delay and launched in the Fall alongside the new consoles and XBO; it might've been a hit if they had done so.

I just hope by the latter half of this year whatever disorganization XGS has had is resolved. Getting back on acquisition talk, though, I'm not even particularly crazy about the new rumors. It'll be a big get if they snag a major dev/pub, but it also means we'll probably have to wait a few years before seeing anything from that acquisition. Though I suppose this changes if MS are already aware of certain developments pre-acquisition and make arrangements to secure those (like what I THINK they've done with the new Indiana Jones games, which should be releasing sooner rather than later; they have a new movie coming next year, I think).

I think folks are trying to discuss the balance between SSD bandwidth for Xbox and memory quantity - and it's kind of moved onto what next gen might look like.

I think the cost reductions bit was just an OT spinoff about Xbox memory config. I'm guilty of indulging that too, so my bad. I'll pipe down.

Well, more specifically, the SSD raw/compressed bandwidth to memory capacity and SSD raw/compressed bandwidth to memory bandwidth ratios, assuming similar ratios would at least want to be maintained for future designs if not improved. But yeah, it was leaning a bit into 10th-gen speculation and I'm guilty of that slippery slope as well; there's already a thread for that.

Dunno if there's really too much to talk about with VA ATM on the technical side that hasn't already been touched on, since we don't have any new details. Has anyone managed to source any documentation on the specific SK Hynix chip in the drive? That could give us a lot of new info on latency figures, random access times, etc. We're seeing roughly similar performance between MS and Sony's SSDs, at least when it comes to load times of 3P multiplats; if we had access to more info on the NAND chip itself we'd be able to figure out what the channel count is, among other things, and maybe draw some conclusions WRT PS5's NAND devices, too.

Speaking of which, has anyone found any documentation on the Toshiba/Kioxia NAND chips the PS5 uses, or the specifics of its DDR cache chip?

Keeping the buses the same, symmetrical chip setups would have:

a) made memory management significantly easier
b) removed the need to rely on the devkits being extremely optimal to obtain good performance from memory
c) avoided the memory-shuffling commands they have in the API to swap data between the slow and fast pools, reducing the likelihood of random hitching
d) made BC a non-issue with everything being uniform; BC going forward into the next generation will have memory swaps for seemingly no reason now.

BC performance may also have been improved when upgrading last-gen titles, as the memory layout becomes a bit more challenging with the split pools.

Overall, MS may have determined that the end result was the same in the long run, but I'm not sure if they are hitting their internal performance targets right now. Performing at about the level of, or worse than, a 5700 is not where I think they expected XSX to land.

With respect to Velocity Architecture, it's not exactly clear whether a split-pool layout would have equal or worse performance than a uniform memory layout. It clearly can't be better than a uniform pool, however.

Maybe on its own it doesn't, but perhaps when combined with the GPU clocks being somewhat low for a higher-end RDNA 2 card (even if part of the reason is for power consumption targets), the clocks could exacerbate whatever complications lie in a split pool?

In the end it's on Microsoft to improve the performance situation; they clearly have the resources, but it's also a matter of priorities, and if they can fix it, they can't take too long. VRS Tier 2 support in new 1P games is a good start, but I know it's nagging them that they can't claim the "best place to play 3P games" title at this time, since more often than not they have not been consistently outperforming the competition and are in some ways lagging behind. The differences might be smaller than ever, but they're still there, and analysis puts a magnifying glass on things the vast majority otherwise wouldn't be able to spot on their own (I sure likely wouldn't).

The question is, will they have their Sega moment? By the time Sega got the new SGL stuff ready for Saturn, they put out games like Virtua Fighter 2 that ran better than anything on PS1 that year, and it was pretty noticeable. Microsoft needs that type of bump IMO, but I dunno if they'll get it considering some of the big games coming from Sony 1P this year (provided there are no delays). But if they can at least get things to par, where they're more consistently pulling out edges even if those aren't that grand in the scheme of things, they'll probably feel more assured.
 
Maybe on its own it doesn't, but perhaps when combined with the GPU clocks being somewhat low for a higher-end RDNA 2 card (even if part of the reason is for power consumption targets), the clocks could exacerbate whatever complications lie in a split pool?
So, speaking with Dobwal, yeah, the issues are probably not clock-speed related at all. There are some possibilities for things I interpreted wrong, but the main concern seems to be understanding the reservation amounts for the game standard pool and the GPU-optimal pool. It's not exactly clear either if the CPU can access the GPU-optimal pool, while the GPU can access both pools.

This may cause some issues for developers if the amount of available memory in the standard pool is not large enough; you're left trying to figure out how to fit everything in there, and while a variety of options exist, all of them could have performance implications. From what I can read in the documentation with Dobwal's help, to circumvent this problem the early 2020 kits allocated even more of the GPU-optimal pool to the standard pool, so at first it was a 7GB slow pool and a 9GB fast pool, and they have slowly been giving memory back to the GPU-optimal pool over time; as of June it was a 6.5GB slow pool and a 9.5GB fast pool. Not sure if things have changed since then.

So there is a lot of pressure to get something like Velocity Architecture working to decrease the pressure on memory allocation for the GPU-optimal pool. And it seems like MS is working on a method to allow developers to reassign GPU-optimal pool memory to the standard memory pool.

If there were a unified pool, developers wouldn't have to deal with this; they would just work within their bounds. So it's been a little more complicated, it appears.
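To make the "two buckets" problem concrete, here's a minimal budgeting sketch; the pool sizes are the devkit figures quoted above, and the check itself is purely hypothetical, not an actual GDK API:

```python
# Hypothetical budget check against the June devkit split quoted above.
STANDARD_POOL_GB = 6.5    # CPU-accessible "standard" pool
GPU_OPTIMAL_GB = 9.5      # fast "GPU optimal" pool

def fits(cpu_visible_gb, gpu_only_gb):
    # A unified pool would only need the total to fit; here each bucket is fixed.
    return cpu_visible_gb <= STANDARD_POOL_GB and gpu_only_gb <= GPU_OPTIMAL_GB

# 7 GB of CPU-touched data plus 6 GB of GPU-only data is 13 GB, comfortably
# under 16 GB, but it still doesn't fit because the first bucket overflows.
print(fits(7.0, 6.0))   # False
```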
 
So, speaking with Dobwal, yeah, the issues are probably not clock-speed related at all. There are some possibilities for things I interpreted wrong, but the main concern seems to be understanding the reservation amounts for the game standard pool and the GPU-optimal pool. It's not exactly clear either if the CPU can access the GPU-optimal pool, while the GPU can access both pools.

This may cause some issues for developers if the amount of available memory in the standard pool is not large enough; you're left trying to figure out how to fit everything in there, and while a variety of options exist, all of them could have performance implications. From what I can read in the documentation with Dobwal's help, to circumvent this problem the early 2020 kits allocated even more of the GPU-optimal pool to the standard pool, so at first it was a 7GB slow pool and a 9GB fast pool, and they have slowly been giving memory back to the GPU-optimal pool over time; as of June it was a 6.5GB slow pool and a 9.5GB fast pool. Not sure if things have changed since then.

So there is a lot of pressure to get something like Velocity Architecture working to decrease the pressure on memory allocation for the GPU-optimal pool. And it seems like MS is working on a method to allow developers to reassign GPU-optimal pool memory to the standard memory pool.

If there were a unified pool, developers wouldn't have to deal with this; they would just work within their bounds. So it's been a little more complicated, it appears.


Any guess as to when we might see the first implementations of the Velocity Architecture, using sampler feedback streaming and the like?
 
Any guess as to when we might see the first implementations of the Velocity Architecture, using sampler feedback streaming and the like?
I wouldn't expect anything until years 2-3. Aside from not knowing the state of the GDK, and COVID, generally speaking engines have a hard time adopting so much change so quickly. Some major DX12 features are only now being implemented in engines, more than 6 years after DX12 was released. Access to FL12_1 is already going to be an improvement, more so with everything coming from DX12U.

There may be faster adoption this time around since consoles and PC are finally aligned, and it may be worthwhile for developers to speed up feature support and start leaving behind a larger portion of the population. But for the population sizes that support these features to be profitable, they'd need to land around year 2-3 of this generation; so two GPU generations after Ampere and RDNA 2, and three years into PS5 and Series X|S.

I wouldn't expect any support earlier unless the games have been designed exclusively for the consoles for a long time now, without cross-gen support.
 
I wouldn't expect anything until years 2-3. Aside from not knowing the state of the GDK, and COVID, generally speaking engines have a hard time adopting so much change so quickly. Some major DX12 features are only now being implemented in engines, more than 6 years after DX12 was released. Access to FL12_1 is already going to be an improvement, more so with everything coming from DX12U.

There may be faster adoption this time around since consoles and PC are finally aligned, and it may be worthwhile for developers to speed up feature support and start leaving behind a larger portion of the population. But for the population sizes that support these features to be profitable, they'd need to land around year 2-3 of this generation; so two GPU generations after Ampere and RDNA 2, and three years into PS5 and Series X|S.

I wouldn't expect any support earlier unless the games have been designed exclusively for the consoles for a long time now, without cross-gen support.
I guess the problem is the long development time of games these days. Because of that it is much harder to adapt to newer stuff.
E.g. in the PS360 gen, games tended to need 2-3 years. In the PS4/XB1 gen, it was more like 3-5 years. It can go faster (with smaller games) and by reusing stuff (like engines, ...), but it gets more and more complicated to get a game done because games have scaled in every way. Better sounds, more sounds, higher-quality assets, more assets, bigger worlds, more NPCs, better AI, ...
This all has a time and complexity cost.
E.g. Nintendo has a high output of games on the Switch, but most of them are "just" remasters of Wii U versions that not many played before (because of low Wii U sales). So their output of new games is actually not that high. No Metroid (not even on the Wii U), no new Mario Kart (it is soon 7 years old), no new Pikmin (also 7 years old), ... so even they have output problems with new stuff.

Btw, the only thing that did not scale that well is disc space (or SSD space). A bit less is needed (because packing data for performance is no longer essential), but overall... it is not much.
 
So, speaking with Dobwal, yeah, the issues are probably not clock-speed related at all. There are some possibilities for things I interpreted wrong, but the main concern seems to be understanding the reservation amounts for the game standard pool and the GPU-optimal pool. It's not exactly clear either if the CPU can access the GPU-optimal pool, while the GPU can access both pools.

This may cause some issues for developers if the amount of available memory in the standard pool is not large enough; you're left trying to figure out how to fit everything in there, and while a variety of options exist, all of them could have performance implications. From what I can read in the documentation with Dobwal's help, to circumvent this problem the early 2020 kits allocated even more of the GPU-optimal pool to the standard pool, so at first it was a 7GB slow pool and a 9GB fast pool, and they have slowly been giving memory back to the GPU-optimal pool over time; as of June it was a 6.5GB slow pool and a 9.5GB fast pool. Not sure if things have changed since then.

So there is a lot of pressure to get something like Velocity Architecture working to decrease the pressure on memory allocation for the GPU-optimal pool. And it seems like MS is working on a method to allow developers to reassign GPU-optimal pool memory to the standard memory pool.

If there were a unified pool, developers wouldn't have to deal with this; they would just work within their bounds. So it's been a little more complicated, it appears.

That's interesting. So correct me if I'm wrong, but the way it's described here makes it sound like any data intended for the GPU and its memory pool needs to first be in the "slower" memory pool for CPU & audio, which seems more or less like how things are typically done on PC, with data going into system RAM and then being copied to the GPU's VRAM. But we also know that newer GPUs will have ways of avoiding this step with things like GPUDirect Storage and DirectStorage.

Again, maybe I've misinterpreted, but that sounds to be the case with Series X's memory setup, the main difference being that data isn't getting shuffled over PCIe. I'm... not exactly sure how this will pan out, but it certainly sounds like it comes with a good number of complications that could've been avoided by just going with 20 GB of memory. I assume the costs of developing, implementing and deploying VA are still less than that extra 4 GB across all Series X systems, but it does also sound like this adds some extra work for developers to manage, which can be tricky when time is money.
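To put a very rough number on that extra hop on PC (the rates below are ballpark assumptions, not measurements: ~7 GB/s for a fast PCIe 4.0 NVMe read and ~25 GB/s of practically attainable host-to-VRAM copy bandwidth):

```python
asset_gb = 1.0
nvme_read_gb_s = 7.0       # assumed fast NVMe sequential read
pcie_upload_gb_s = 25.0    # assumed practical system RAM -> VRAM copy rate

staged = asset_gb / nvme_read_gb_s + asset_gb / pcie_upload_gb_s  # two hops
direct = asset_gb / nvme_read_gb_s                                # one hop
print(f"staged: {staged*1000:.0f} ms, direct: {direct*1000:.0f} ms")  # ~183 vs ~143
```

On the consoles the second hop simply doesn't exist, since the GPU reads the same physical GDDR6 the data was decompressed into.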

I wouldn't expect anything until years 2-3. Aside from not knowing the state of the GDK, and COVID, generally speaking engines have a hard time adopting so much change so quickly. Some major DX12 features are only now being implemented in engines, more than 6 years after DX12 was released. Access to FL12_1 is already going to be an improvement, more so with everything coming from DX12U.

There may be faster adoption this time around since consoles and PC are finally aligned, and it may be worthwhile for developers to speed up feature support and start leaving behind a larger portion of the population. But for the population sizes that support these features to be profitable, they'd need to land around year 2-3 of this generation; so two GPU generations after Ampere and RDNA 2, and three years into PS5 and Series X|S.

I wouldn't expect any support earlier unless the games have been designed exclusively for the consoles for a long time now, without cross-gen support.

That's kind of disappointing, because again if the way I've interpreted what you said in the other post is accurate, then this issue of fast/slow RAM allocation could become more of a strain for a majority of larger games before it starts to decrease in terms of required micromanagement for devs. I think MS needs to prioritize acceleration of VA feature support for 1P and 3P games; the fact that a majority of their 1P is not even cross-gen anymore (outside of potentially Halo Infinite and Grounded) suggests they might be more willing to do this internally, and assist 3P developers with implementing VA support faster.

The new consoles are already outpacing PS4 and XBO; several Switch games now leverage cloud co-processing to pick up heavy-duty rendering slack, and the platform itself is selling gangbusters. So in terms of marketshare I think even by late summer this year all three platforms will be in a good enough place for dropping cross-gen support to accelerate on its own, or for cross-gen support to take a big backseat in priority. The only unknown variable is PC; not because hardcore/core PC gamers don't want to upgrade, but because they simply can't due to lack of availability. Availability that's even worse than with the new consoles, thanks to crypto miners.
 
That's interesting. So correct me if I'm wrong, but the way it's described here makes it sound like any data intended for the GPU and its memory pool needs to first be in the "slower" memory pool for CPU & audio, which seems more or less like how things are typically done on PC, with data going into system RAM and then being copied to the GPU's VRAM. But we also know that newer GPUs will have ways of avoiding this step with things like GPUDirect Storage and DirectStorage.

Again, maybe I've misinterpreted, but that sounds to be the case with Series X's memory setup, the main difference being that data isn't getting shuffled over PCIe. I'm... not exactly sure how this will pan out, but it certainly sounds like it comes with a good number of complications that could've been avoided by just going with 20 GB of memory. I assume the costs of developing, implementing and deploying VA are still less than that extra 4 GB across all Series X systems, but it does also sound like this adds some extra work for developers to manage, which can be tricky when time is money.
edit: ignore this reply lol btw. it's completely based on wrong information.

Right, memory that is allocated to the GPU may actually need to be allocated to the CPU, but because of the split pool developers can't use it freely and need workarounds to reallocate memory. The way memory is mapped, developers can't just use what's available without planning or designing around it: instead of filling a single bucket, they've now got two separate buckets to play with. Most of the earlier discussion around the disadvantages of the split pool tended to centre on the "average" bandwidth between the two pools, but size considerations were never really discussed (and honestly, without access to documentation I wouldn't have suspected this either).

The issue is likely going to crop up the most during the transition period of games. These are games that still traditionally use the CPU for a lot of GPU-adjacent functions, like culling, animation, etc. So if the CPU needs to access these memory locations, this information needs to sit in the slow pool for it to do its updates. I think the traditional thought process here is that the GPU needs a huge amount of memory, which it may in the future, but as of this moment with cross-gen you may not see so much budget being put towards super-high-quality assets, so the amount of VRAM required by the GPU may be lower, like 7-8 GB, and the CPU may use the rest. But with Series X|S you are locked into how much you have in both areas, so you need careful planning on how to do it, which is difficult when you also need to make considerations for last generation.

This may explain why there are random major drops on XSX with some of these launch titles. They simply ran out of room on the CPU or GPU side and needed to perform some terrible workaround to get it all to fit, i.e., relying on the SSD to stream level/animation/vertex data in for new monsters etc. while slowly unloading parts of the level.

That being said, however, the most critical features for GPU dispatch are included in the older generations of hardware (at least for Xbox One it is confirmed, and for PS4 it's sort of assumed), so it's really about re-writing their rendering pipelines, as PC is holding them back in this regard.

In the end it's only speculation, just my thoughts on Series X|S performance so far. If (or rather, when) games get to GPU-driven pipelines, then animation, vertex processing, culling, ordering, etc. can all be performed by the GPU, improving bandwidth by moving that particular data to GPU-optimal memory and out of the standard memory pool, and freeing the CPU up to do other things or do a better job at holding higher framerates, working on AI, or processing other tasks.

It's not really Velocity Architecture that needs to be adopted. I mean, that's one way to attack the issue, but that's a texturing solution. What about mesh information? Animation information? What needs to be addressed is the move to GPU-driven rendering pipelines.
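A minimal sketch of that budgeting argument: tag each resource by whether the CPU touches it every frame, and see how much is forced into the CPU-visible standard pool before and after moving skinning/culling onto the GPU. The resource names and sizes are made up purely for illustration:

```python
# (resource, size in GB, CPU writes it every frame?)
resources = [
    ("textures",           5.0, False),
    ("static meshes",      2.0, False),
    ("animation data",     1.5, True),   # CPU-skinned today
    ("culling/visibility", 0.5, True),
    ("game state / AI",    1.0, True),
]

def standard_pool_need(res):
    # Anything the CPU updates per frame has to live in the standard pool.
    return sum(size for _, size, cpu_touched in res if cpu_touched)

print(standard_pool_need(resources))   # 3.0 GB today

# GPU-driven pipeline: skinning and culling move to the GPU, so that data can
# migrate to the GPU-optimal pool; only game state still needs the standard pool.
gpu_driven = [(n, s, c and n == "game state / AI") for n, s, c in resources]
print(standard_pool_need(gpu_driven))  # 1.0 GB
```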
 
On the Series X the CPU and GPU can reference data from anywhere; it's just that the maximum speed at which the data can be referenced is different (560 vs 336 GBps). The CPU has an upper bound far lower than the slower memory pool anyway.

But the devs need to be aware of where the memory they're using sits if it's frequently used by the GPU.

It's an entirely different story on PC with physically separate memory pools. Some of that is slightly mitigated by Resizable BAR implementations, but it's still slower than if the memory were directly accessible by both CPU and GPU.
 