Velocity Architecture - Limited only by asset install sizes

Thank you.



I am not against what you're saying, but at the end of the day you want a large amount of RAM so that data sits closer to the CPU/GPU. Disk storage is just an I/O device, while RAM is the actual global memory of the CPU. That's the most important thing, alongside enough memory bandwidth for whatever processor will be accessing it. You want higher memory bandwidth for highly parallel workloads on GPUs, so more memory bytes per cycle on GPUs than on CPUs.

But I was focused on what developers wanted after MSFT & Sony ascertained the amount of RAM they needed (at least 16GB), i.e. the data path between memory and disk I/O. So memory bandwidth didn't really play a role in this argument. We know disk I/O was definitely becoming a bottleneck because of the use of HDDs. The aim of adding SSDs is not to have them replace RAM or constantly fill up RAM; it's simply to have them fast enough that devs can utilize the RAM more efficiently, with larger working sets in RAM and much better demand paging. In the end it is much more cost effective to use an SSD with decompression hardware to augment the disk I/O than to chase the fastest possible SSD. And that's what they did, and what they will do with 10th-gen hardware as well.
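To put rough numbers on "fast enough," here's a quick back-of-the-envelope sketch in Python. The RAM split and drive speeds are assumed ballpark figures for illustration, not official specs.

# Rough sketch: how quickly can storage refill a chunk of RAM?
# The working-set size and drive speeds below are assumptions.

def refill_time_s(working_set_gb, drive_gbps):
    """Seconds to stream a working set from the drive into RAM."""
    return working_set_gb / drive_gbps

game_visible_gb = 13.5  # assumed game-visible portion of a 16 GB console

for name, drive_gbps in [("HDD ~0.1 GB/s", 0.1),
                         ("raw NVMe 2.4 GB/s", 2.4),
                         ("raw NVMe 5.5 GB/s", 5.5)]:
    t = refill_time_s(game_visible_gb, drive_gbps)
    print(f"{name}: refill {game_visible_gb} GB in {t:.1f} s")

The point isn't the absolute numbers, just that once the refill time drops from minutes to a couple of seconds, devs can keep a much smaller slice of "just in case" data resident and page the rest on demand.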

Right, so I guess the question is what will be considered the right amount of storage bandwidth for 10th-gen systems. There are some ratios that can be worked out by comparing bandwidth against capacity through the SSD < RAM < cache hierarchy, and I think on some level the people at Microsoft and Sony have those in mind, along with how much storage bandwidth is needed to maintain those ratios going into a new gen, or to tighten certain ratios to speed up parts of the data path. Then there's also the question of how much bandwidth is needed to ensure sufficient transfer rates for lossless vs. lossy compression.

Thinking more about it, those ratios I mentioned will probably have a direct impact on how much RAM they will want for 10th-gen systems to begin with, since the storage bandwidth would need to scale to a certain level to keep the ratios intact. They wouldn't want to rely on a crapshoot of hoped-for NAND bandwidth improvements that can't be guaranteed to be there, since that would make it harder to calculate the BOM and the volumes they would need to order at to start getting the discounts that come with economies of scale.
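As a concrete illustration of that ratio idea, here's a quick sketch using the public ballpark figures for the 9th-gen machines and an assumed 32 GB for 10th gen; it's just arithmetic, not a prediction.

# Back-of-envelope: hold the raw-storage-bandwidth : RAM-capacity ratio of the
# 9th-gen machines constant while RAM doubles. Figures are public ballpark specs.

ninth_gen = {
    "PS5":      {"ram_gb": 16, "raw_ssd_gbps": 5.5},
    "Series X": {"ram_gb": 16, "raw_ssd_gbps": 2.4},
}
tenth_gen_ram_gb = 32  # assumed

for name, spec in ninth_gen.items():
    ratio = spec["raw_ssd_gbps"] / spec["ram_gb"]   # GB/s of raw I/O per GB of RAM
    needed = ratio * tenth_gen_ram_gb
    print(f"{name}: {ratio:.3f} GB/s per GB of RAM -> ~{needed:.1f} GB/s raw to hold the ratio")

Holding the PS5-style ratio at 32 GB lands around 11 GB/s raw, which is roughly where the 12 GB/s figure discussed further down comes out.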

Do you think NVRAM might see a future with 10th-gen systems? I thought about that before in earlier speculation but went back on it because outside of Intel and Micron no one else really produces it at large capacities or quantities, and there are already SSDs matching the bandwidth figures of most Optane or X100 drives. Though, the benefit with NVRAM is that it offers true random access (something Microsoft and Sony in particular are trying to mimic with parallelized channels for their storage, though I'm curious how Sony have addressed any added latency from using a larger number of slower NAND devices. Guess not using PCIe for the interconnect of the internal drive is one such way) and significantly better latency, both of which are things I think fit into serving what you'd want with 10th-gen systems.

If you could procure NVRAM at a 4:1 or even 3:1 ratio in terms of capacity compared to RAM, especially if it's GDDR-based instead of something more expensive like HBM, I wonder if it would be genuinely worth including at least for Microsoft in a 10th-gen system, even if Sony just focused more on exploiting higher-quality NAND devices through low-level direct interconnects on the board (if they don't, say, get their ReRAM up and running because that seems really far behind).
 
For RAM you'd want to at least double it for 10th-gen systems, so that's the minimum we should expect. We're also likely going to have more accelerators for ray tracing and ML, so the GPUs will require even more memory bandwidth. If you're developing a system that requires high memory bandwidth, you're going to want to use HBM. And with a move to cloud computing, HBM makes a lot more sense since it brings down energy costs in the servers as well. So MSFT, for example, would benefit from lower server costs by using HBM in its next-gen Xbox and in the servers that power its cloud gaming division. You can't beat the high-bandwidth, low-latency, low-energy benefits of HBM. Just take a look at the Fugaku supercomputer: it's using HBM2.

So now it comes to the storage. The cost of NAND is going down significantly, as MSFT's Xbox team pointed out at Hot Chips. So SSDs may be expensive on the PS5 and Series X at the start (they're going to get much cheaper during the gen), but they will likely be the same or a lower percentage of system cost by 10th gen. With double the RAM, MSFT & Sony would only need to hit SSD speeds of around 12GB/s before decompression. Gen 4 SSDs are already at 7GB/s at the start of this gen. A 12GB/s SSD could be achieved on 10th-gen systems through either PCIe Gen 4 x6 or PCIe Gen 5 x3. And the beauty of this is they could simply ship 1-2TB models and users would be able to expand the storage via an M.2 expansion bay.
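A quick sanity check on the lane math; the per-lane figures below are theoretical (16/32 GT/s with 128b/130b encoding), before protocol overhead, so real drives land a bit lower.

# Theoretical per-lane throughput for PCIe Gen 4 and Gen 5.
per_lane_gbps = {"PCIe Gen 4": 16 * 128 / 130 / 8,   # ~1.97 GB/s
                 "PCIe Gen 5": 32 * 128 / 130 / 8}   # ~3.94 GB/s

for gen, lanes in [("PCIe Gen 4", 6), ("PCIe Gen 5", 3)]:
    peak = lanes * per_lane_gbps[gen]
    print(f"{gen} x{lanes}: ~{peak:.1f} GB/s peak")
# Both configurations land around ~11.8 GB/s, i.e. the ~12 GB/s ballpark.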

So that makes the case for NVRAM hard to justify unless it offers a huge bandwidth improvement over regular SSDs at a low cost. I just don't see that happening. You'd have to rewrite the OS to abstract the data transfer from NAND SSD -> NVRAM -> RAM, and all sorts of other things, for marginal benefits. So faster regular SSDs are the way to go. I don't see NVRAM playing a role.
 
Yeah, I don't see NVRAM making too big a splash either. It would have a big bandwidth and latency advantage over NAND; the question is the cost reduction over time. Intel and Micron are the only companies that make NVRAM at large-ish volumes, but it's still niche compared to NAND-based SSDs. Unless Micron's X100 drives are a big success I don't see NVRAM getting much use outside of server markets, and that will keep the costs somewhat high, since businesses are willing to pay premiums for such things, just like they currently do for HBM2E.

Which...actually is just the main reason I am still unsure if BOTH Sony and Microsoft leverage HBM-based memory for 10th-gen systems. It really comes down to what advancements are made with GDDR7 when it hits around 2nd generation chips or the such. Capacities need to double per module, I/O bandwidth has to double as well, in order to make it viable, while still keeping same or lower power consumption over GDDR6 chips. I think Microsoft would be more willing to eat the costs on a hypothetical HBM3 (not HBMNext, that's just Micron's version of HBM2E) than Sony, if the choice came down to GDDR7 with big gains and reasonable bandwidth on a 256-bit bus (or 288-bit at most; Sony seems to prefer narrower memory buses for GDDR) and cheaper than HBM3.

I'm still half-and-half on those SSD bandwidth ideas though; they do make sense on one hand, but on the other hand I think it's also worth considering what ratio of bandwidth (with and without decompression) to RAM is sufficient in relation to RAM bandwidth to the GPU in order to maintain or improve certain ratios we see on the 9th-gen systems. But in any case, decompression algorithms should get even better and be better implemented in software/hardware stacks to expedite that process even moreso.
 
There was an interview that the Xbox dev team did, IIRC it was the Hot Chips presentation, where they said that they were initially considering using HBM memory when the project started, but the wider market for HBM beyond graphics cards never materialised, so the cost of the HBM itself was just too high.

One thing that I think they might do is have tiered memory, so instead of 16 GB of GDDR5 RAM like both consoles have, you would have 10 GB GDDR5 and say 12 GB GDDR4. This wouldn't be GDDR5 for the graphics card and GDDR4 for the CPU/OS, but some dev-allocatable system.

The other thing I am surprised that neither of them did is have a separate RAM chip for the OS at a slower speed for a lower cost; you could have had 16 GB GDDR5 for the games and 4 GB GDDR4 dedicated to the OS. I am especially surprised that Microsoft didn't go this route, since that way they could have had a completely common OS between the Series X and S with the same amount of OS RAM for both.
 
Which...actually is just the main reason I am still unsure if BOTH Sony and Microsoft leverage HBM-based memory for 10th-gen systems. It really comes down to what advancements are made with GDDR7 when it hits around 2nd generation chips or the such. Capacities need to double per module, I/O bandwidth has to double as well, in order to make it viable, while still keeping same or lower power consumption over GDDR6 chips. I think Microsoft would be more willing to eat the costs on a hypothetical HBM3 (not HBMNext, that's just Micron's version of HBM2E) than Sony,

HBM2 is, I think, $140 for 16GB right now. Whatever HBM memory is available in 2026 will be much more affordable. If they'll be doubling memory, then 32GB at a cost of around $100 is possible with HBM. I haven't heard anything about GDDR7, but there's no way it will have the high-bandwidth, low-latency, low-energy advantages of HBM. MSFT is definitely in a better position to use HBM since they have huge cloud gaming ambitions. The only way I see "GDDR7" being a viable option is if it offers significant cost benefits. Otherwise HBM is much better, even in terms of signal integrity. Cost is the only issue with HBM.

if the choice came down to GDDR7 with big gains and reasonable bandwidth on a 256-bit bus (or 288-bit at most; Sony seems to prefer narrower memory buses for GDDR) and cheaper than HBM3.

I can see Sony going with GDDR7 if HBM is still expensive. But honestly, if you have larger die areas for RT and ML acceleration, you need much higher bandwidth, so even they should go with HBM. I think they chose a 256-bit bus due to the performance of their GPU; the Series X GPU has more CUs and thus needs more memory bandwidth, so a wider memory bus. But yeah, let's wait and see.
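For what it's worth, here's a rough bandwidth-per-TFLOP comparison using the public 9th-gen figures (the Series X number is for its GPU-optimal pool); the takeaway is just that a bigger GPU wants proportionally more bandwidth.

# Peak memory bandwidth per TFLOP, public ballpark figures.
gpus = {
    "PS5":      {"tflops": 10.28, "bw_gbps": 448},
    "Series X": {"tflops": 12.15, "bw_gbps": 560},
}
for name, g in gpus.items():
    print(f"{name}: {g['bw_gbps'] / g['tflops']:.1f} GB/s per TFLOP")
# Both land in the mid-40s GB/s per TFLOP, so scaling the GPU up means
# scaling the bus width or the memory speed up with it.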

I'm still half-and-half on those SSD bandwidth ideas though; they do make sense on one hand, but on the other hand I think it's also worth considering what ratio of bandwidth (with and without decompression) to RAM is sufficient in relation to RAM bandwidth to the GPU in order to maintain or improve certain ratios we see on the 9th-gen systems. But in any case, decompression algorithms should get even better and be better implemented in software/hardware stacks to expedite that process even moreso.

yeah for those we'll have to wait and see. I was just basing my estimates on doubling of RAM.
 
Do you have a source for the HBM2 costs by chance? I've been trying my damnedest to find price listings for HBM but it has been virtually impossible. The only pricing quotes I've seen for HBM or HBM2 are from old articles talking about AMD's costs for HBM on Radeon VII, and a single post that said "something" about assumed HBM2 costs. I've also come across reports of various clients willing to "pay premiums" for HBM2 and HBM2E, those reports weren't too far back.

Sadly I can't find any listing for HBM or HBM2 prices on DRAMExchange, can't find it listed on wholesaler websites either like Mouser. Because of that I've actually been a bit hesitant on the claims of HBM2 being relatively affordable now. Maybe it is for clients of big server and data markets but the markup costs from the manufacturers (only SK Hynix and Samsung make HBM memories at this time; Micron's going to in the future with HBMNext which is their version of HBM2E) don't make it sound too affordable at prices that could fit a gaming console. Though, Microsoft did say at Hot Chips they considered HBM2 for Series systems but JEDEC wouldn't clear certain things with them so they dropped the idea. That might support your claim of HBM2 being around the price you mention because I don't see Microsoft wanting to eat too big of costs for 16 GB of the stuff.

And, if that's the case, then it does open the door for at least HBM2E in 10th-gen systems. The highest-performing of it is SK Hynix's, I think it's 460 GB/s per 8-Hi stack. You'd need higher bandwidth per stack for any 10th-gen console however.
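Rough math on how many HBM2E-class stacks a hypothetical 10th-gen bandwidth target would take if per-stack bandwidth didn't improve; the targets here are assumptions purely for illustration.

# Stacks needed for a few assumed bandwidth targets at HBM2E-class speeds.
import math

per_stack_gbps = 460   # roughly SK Hynix HBM2E, 8-Hi, as mentioned above
per_stack_gb = 16      # 8-Hi stack of 2 GB dies

for target_tbps in (1.0, 1.5, 2.0):   # assumed 10th-gen targets
    stacks = math.ceil(target_tbps * 1000 / per_stack_gbps)
    print(f"{target_tbps} TB/s -> {stacks} stacks, {stacks * per_stack_gb} GB of capacity")
# Which is why you'd want more bandwidth per stack rather than just more stacks.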

The thing though is that the bandwidth can be done one of two ways: either high-bandwidth, relatively low-latency off-chip memory at large relative capacity, or extremely high-bandwidth, extremely low-latency on-chip memory cache at smaller capacities. I think out of the two, with the way things are moving in semiconductor GPU, CPU and embedded system fields in general, they'd choose the latter, especially for a console as the nodes shrink and packaging techniques become better over time.

There's a reason why AMD skipped over going with wider GDDR6 buses and faster off-chip memory and instead is focusing on Infinity Cache. The more you can allocate to capacity and density of data on the chip, the better. Only when you hit the limit there does focusing a lot on off-chip memory bandwidth start to make more sense. AMD's architecture seems like it's a good fit for things like IC, so they don't need as much reliance on a fat off-chip bus and bandwidth, at least when you look at performance between the RDNA cards and Nvidia's RTX 30 cards for rasterization performance, especially when you also factor in Smart Access Memory (which is basically mimicking aspects of hUMA that the consoles naturally already have).

So while I don't think HBM would be chosen for 10th-gen systems for necessarily the same reasons you might, I do agree that if the pricing is right, there's zero reason for Sony or especially Microsoft to choose GDDR7 over something like improved HBMNext or (maybe affordable) HBM3/HBM4. Because you'd get similar pricing, and all of the benefits you already bring up, which complement a bigger prioritization of on-chip cache capacity and bandwidth/latency even better.
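A toy model of the cache side of that argument, with assumed hit rates: only misses have to go out to DRAM, so the off-chip bandwidth you need scales with the miss rate.

# Required off-chip bandwidth ~= (1 - hit_rate) * bandwidth the cores consume.
# Both numbers below are assumptions, not measured figures.
total_request_gbps = 900   # assumed demand from the shader cores

for hit_rate in (0.0, 0.4, 0.6):
    dram_needed = (1 - hit_rate) * total_request_gbps
    print(f"hit rate {hit_rate:.0%}: ~{dram_needed:.0f} GB/s of off-chip bandwidth needed")

Whether a console-sized die can spare enough area for a cache that actually sustains those hit rates at 4K is the open question.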
 
There was an interview that the Xbox dev team did, IIRC it was the Hot Chips presentation, where they said that they were initially considering using HBM memory when the project started, but the wider market for HBM beyond graphics cards never materialised, so the cost of the HBM itself was just too high.

Yeah, I remember this being mentioned, but I don't think it was actually about costs. I think @rtongo might be right (or somewhere in the ballpark) with the price estimate for HBM2 (not HBM2E, though, which is worth considering because a 16 GB stack of HBM2 would not have given Microsoft the bandwidth they get from 16 GB of GDDR6 on a 320-bit bus).

IIRC, it was more because of some specifications or something like that with JEDEC and Microsoft just decided to say screw it since it was taking too long, and went with GDDR6 instead. I've got no idea what that could've been about in particular since their response was extremely vague.

One thing that I think they might do is have tiered memory, so instead of 16 GB of GDDR5 RAM like both consoles have, you would have 10 GB GDDR5 and say 12 GB GDDR4. This wouldn't be GDDR5 for the graphics card and GDDR4 for the CPU/OS, but some dev-allocatable system.

You mean 10 GB GDDR6 and 12 GB GDDR5 ;) ? Hmm...that could be interesting. I think though if they'd want to have specific memory for the CPU/OS, why not just go with DDR5? Series S kind of does something like this already but with same memory type, there's the 8 GB GDDR6 (4 chips) for the GPU and a single 2 GB GDDR6 (1 chip) for the CPU/audio/OS.

I don't know how that would necessarily work without shadow copying data into the GPU pool from the CPU one, though it seems like things like SAM on PC are already addressing that in nUMA setups, and the consoles are inherently hUMA so is that something I'm concerned with for no reason? Otherwise if it would be of concern, I guess GDDR6/GDDR5/GDDR4 tiered memory could work if it'd avoid that potential quirk.

The other thing I am surprised that neither of them did is have a separate RAM chip for the OS at a slower speed for a lower cost; you could have had 16 GB GDDR5 for the games and 4 GB GDDR4 dedicated to the OS. I am especially surprised that Microsoft didn't go this route, since that way they could have had a completely common OS between the Series X and S with the same amount of OS RAM for both.

Yeah I'm kind of surprised neither of them did that, either. Maybe it was just too much to include in the design? I think one of the reasons though, could be because if the OS is actively residing on one of the pools of memory, and you're allocating non-OS stuff like games to the other pool, would the OS need to copy a portion of itself into the other memory pool if the memory types are different?

Or at the very least, commands for processes using the other memory pool would have to originate in the slower pool and then be transferred to the faster one, like on PC. So I guess that's the reason. Actually, the resource contention issue there feels more like a core/thread one; having to reserve a whole core and two threads for the OS kind of sucks no matter what. But the opposite way of resolving that, having an ARM chip to run the OS off of, presents its own issues, because now you're talking about a dual-CPU system and that's a whole other beast.
 
Do you have a source for the HBM2 costs by chance? I've been trying my damnedest to find price listings for HBM but it has been virtually impossible. The only pricing quotes I've seen for HBM or HBM2 are from old articles talking about AMD's costs for HBM on Radeon VII, and a single post that said "something" about assumed HBM2 costs. I've also come across reports of various clients willing to "pay premiums" for HBM2 and HBM2E, those reports weren't too far back.

Sadly I can't find any listing for HBM or HBM2 prices on DRAMExchange, can't find it listed on wholesaler websites either like Mouser. Because of that I've actually been a bit hesitant on the claims of HBM2 being relatively affordable now. Maybe it is for clients of big server and data markets but the markup costs from the manufacturers (only SK Hynix and Samsung make HBM memories at this time; Micron's going to in the future with HBMNext which is their version of HBM2E) don't make it sound too affordable at prices that could fit a gaming console. Though, Microsoft did say at Hot Chips they considered HBM2 for Series systems but JEDEC wouldn't clear certain things with them so they dropped the idea. That might support your claim of HBM2 being around the price you mention because I don't see Microsoft wanting to eat too big of costs for 16 GB of the stuff.

And, if that's the case, then it does open the door for at least HBM2E in 10th-gen systems. The highest-performing of it is SK Hynix's, I think it's 460 GB/s per 8-Hi stack. You'd need higher bandwidth per stack for any 10th-gen console however.



You can read about the HBM prices here

https://semiengineering.com/whats-next-for-high-bandwidth-memory/

I watched Hot Chips but didn't listen to the Q&A after, so you're right. Glad they are not stuck on one DRAM architecture. It makes HBM much more possible in the future.

If you're going to be doing real-time ray tracing and you have a separate accelerator just for that, you need as much bandwidth as you can get. Same for AI, e.g. if you need to upscale images to 8K in real time. So if they went with 4 stacks (32GB RAM) of HBM3 memory they could hit 2TB/s of memory bandwidth!! All this depends on what game developers can take advantage of, tbh. A crazier option would be 4 stacks of 16GB each to hit 64GB of RAM at the same memory bandwidth. That would be insane. We can only dream at this point. Otherwise, even if they went with HBM2 it would be at least 1.2TB/s of memory bandwidth without signal integrity issues! But for 10th-gen systems they will likely spend most of the BOM on the CPU/GPU (along with accelerators for RT & AI) and memory. NAND, or whatever other disk I/O technology, should ideally be one of the lowest costs. So SCM like ReRAM or whatever wouldn't be as good an investment.
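The arithmetic behind those figures, with per-stack bandwidths assumed roughly per generation (these are ballpark assumptions, not quotes from any roadmap):

# Total bandwidth and capacity = stacks * per-stack figures (assumed).
configs = [
    ("HBM2, 4 stacks x 8 GB",  4, 8,  307),   # ~307 GB/s per stack
    ("HBM3, 4 stacks x 8 GB",  4, 8,  512),   # assumed ~500+ GB/s per stack
    ("HBM3, 4 stacks x 16 GB", 4, 16, 512),
]
for name, stacks, gb_per_stack, bw_per_stack in configs:
    total_gb = stacks * gb_per_stack
    total_tbps = stacks * bw_per_stack / 1000
    print(f"{name}: {total_gb} GB, ~{total_tbps:.1f} TB/s")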

The thing though is that the bandwidth can be done one of two ways: either high-bandwidth, relatively low-latency off-chip memory at large relative capacity, or extremely high-bandwidth, extremely low-latency on-chip memory cache at smaller capacities. I think out of the two, with the way things are moving in semiconductor GPU, CPU and embedded system fields in general, they'd choose the latter, especially for a console as the nodes shrink and packaging techniques become better over time.

IIRC CPUs are more latency sensitive than GPUs; a cache miss has a higher performance penalty on a CPU than on a GPU. So although it's generally true that you want to increase the size of on-chip memory, there has to be a balance. Otherwise you end up with MSFT's ESRAM situation. You always want to increase memory bandwidth for the GPU! MSFT learnt this the hard way and I don't think they'll ever make that mistake again. On-chip memory is also very expensive. I'm surprised AMD GPUs have 128MB of "Infinity Cache" but the performance gains are marginal. The RTX 3080 (with 760GB/s of memory bandwidth) only has 5MB of L2 cache and no L3 cache. On the other hand, the 6800 XT (512GB/s of memory bandwidth) has a whopping 128MB of L3 cache, but it still doesn't perform as well as the 3080, let alone blow it out of the water. So for highly parallel processors like GPUs it's best to always increase the memory bandwidth.

There's a reason why AMD skipped over going with wider GDDR6 buses and faster off-chip memory and instead is focusing on Infinity Cache. The more you can allocate to capacity and density of data on the chip, the better. Only when you hit the limit there does focusing a lot on off-chip memory bandwidth start to make more sense. AMD's architecture seems like it's a good fit for things like IC, so they don't need as much reliance on a fat off-chip bus and bandwidth, at least when you look at performance between the RDNA cards and Nvidia's RTX 30 cards for rasterization performance, especially when you also factor in Smart Access Memory (which is basically mimicking aspects of hUMA that the consoles naturally already have).

They did it for cost reasons. A 256 bit bus is cheaper. The huge on chip memory on the AMD GPUs hasn't produced proportionately larger performance increases. The figures I've seen are about 10% increased fps. I bet in the future they may reduce its size or even eliminate it once they can ship cards with HBM memory.
 
There's a reason why AMD skipped over going with wider GDDR6 buses and faster off-chip memory and instead is focusing on Infinity Cache. The more you can allocate to capacity and density of data on the chip, the better. Only when you hit the limit there does focusing a lot on off-chip memory bandwidth start to make more sense. AMD's architecture seems like it's a good fit for things like IC, so they don't need as much reliance on a fat off-chip bus and bandwidth, at least when you look at performance between the RDNA cards and Nvidia's RTX 30 cards for rasterization performance, especially when you also factor in Smart Access Memory (which is basically mimicking aspects of hUMA that the consoles naturally already have).
Yes, there is a reason. Cache is relatively easy to produce (not much can go wrong, and defects can always be compensated for with some spare cache); it takes up a lot of die area but does not produce much heat. So it can be used to make the chip, and therefore the cooling area, a bit bigger without much risk of a defect in that extra area killing the chip. This allows the GPU to spread its heat more efficiently.
There is a reason why Sony tried with the PS5 to spread the heat away as fast as possible (e.g. through liquid metal).

Also, the much bigger cache reduces cache misses and bandwidth needs. This can be used to reduce the size of the memory interface, which is even more complicated to handle at 7nm than it was at 14/12nm before.

But what has all this to do with Velocity?
 
You know how it goes sometimes; talk about one technical feature just slips into another, then into a third and next thing you know we're talking about caches and memory for 10th-gen systems xD

64 GB of RAM for 10th-gen is probably out of the question; we only saw a 2x increase from 8th to 9th gen, and RAM prices are likely to keep rising as the floor for lower-end smartphones, laptops, tablets, and APUs moves upward, meaning more products competing with top-end phones, GPU cards and consoles for the upper range of memory. So even if HBM3 becomes more affordable, more companies and products competing to secure that memory will keep prices up. If things like cryptocurrency mining continue to grow the way they are now, and GPUs continue to be the preferred tool for it, then memory manufacturers can keep prices artificially high, because they know GPU manufacturers are going to be putting in lots of orders and flipping those cards around for potentially inflated profit margins.

Sickening in a way but, that's just how companies like to do things :/. I think 32 GB for 10th-gen is more realistic for those reasons. Agreed with pretty much all the rest tho.

IIRC CPUs are more latency sensitive than GPUs; a cache miss has a higher performance penalty on a CPU than on a GPU. So although it's generally true that you want to increase the size of on-chip memory, there has to be a balance. Otherwise you end up with MSFT's ESRAM situation. You always want to increase memory bandwidth for the GPU! MSFT learnt this the hard way and I don't think they'll ever make that mistake again. On-chip memory is also very expensive. I'm surprised AMD GPUs have 128MB of "Infinity Cache" but the performance gains are marginal. The RTX 3080 (with 760GB/s of memory bandwidth) only has 5MB of L2 cache and no L3 cache. On the other hand, the 6800 XT (512GB/s of memory bandwidth) has a whopping 128MB of L3 cache, but it still doesn't perform as well as the 3080, let alone blow it out of the water. So for highly parallel processors like GPUs it's best to always increase the memory bandwidth.

It might be too early to write off IC just yet; it only just came about with RDNA 2 cards and I'm sure AMD will refine it for RDNA 3 and onward. I'd have to look at more benchmarks between the RDNA 2 cards and the RTX 30 ones, but I remember seeing some for rasterized performance in some testing cases where one of the RTX cards (either 3070 or 3080) was losing by 20 or so frames to the 6800 or 6800 XT.

They did it for cost reasons. A 256 bit bus is cheaper. The huge on chip memory on the AMD GPUs hasn't produced proportionately larger performance increases. The figures I've seen are about 10% increased fps. I bet in the future they may reduce its size or even eliminate it once they can ship cards with HBM memory.

If you read Allandor's reply, there are other benefits besides cost savings to why they did IC. They still hit within the ballpark of Nvidia's cards on rasterized performance (and keep in mind this is AMD; being competitive with Nvidia on rasterized performance after so long is a win in and of itself), but along with saving on costs it lowers chip complexity and reduces heat. RDNA 2 and onward are going to focus a lot on pushing clocks, and to do that you need thermal budget to work with.

If that thermal budget can be increased with resources freed up to pushing higher clocks by cutting down reliance on off-chip memory to a degree, while at the same time "making up" a lot of that lost performance with on-chip memory increases, then that's a viable approach to take and it seems to be AMD's guiding philosophy going forward. I think if you pair that with stuff like SAM and the benefits of the Zen line working along with the benefits of the RDNA line, that in time will lead to some additional performance gains for RDNA 2, RDNA 3 etc. cards.

I guess to try and move things back to the Velocity Architecture, has anyone seen the video from the guy who modified the files in his Series X drive to remove the boot-up sequences and shave down boot time to 16 seconds? Think it's pretty interesting if that can lead to unofficial mods/hacks to change other parts of the file system or config settings for the OS that could maybe also tap into the VA hardware/software, maybe lead to general OS and QoL performance beating out Microsoft's own.

Don't mean anything jailbreak-like tho; the guy in question didn't jailbreak anything (though with Dev Mode being a thing could you technically jailbreak anything?).
 
The good thing is that at least we should expect a 2x increase to 32GB. With the 64GB I'm just being hopeful since 16GB of HBM2 is already at $120 right now. The cost was $120 per GB just 5 years ago. But yes 32GB is more realistic. The cost of the memory and APU will make up most of the BOM on 10th gen consoles.
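In per-GB terms, using the figures quoted in this thread:

# HBM2 price drop in $/GB (numbers as quoted above, treated as rough).
old_per_gb = 120.0          # roughly $120 per GB about 5 years ago
now_per_gb = 120.0 / 16     # ~$120 for a 16 GB stack today
print(f"${old_per_gb:.2f}/GB -> ${now_per_gb:.2f}/GB, about a {old_per_gb / now_per_gb:.0f}x drop")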

I agree it's too early to write off Infinity Cache. Having larger on-chip memory is always a plus, and AMD is performing excellently at a lower price point. It's better value for money, but I think it's very clear right now that DLSS and hardware-accelerated RT are the way to go for future GPUs. So you have to consider the cards as a whole and not just switch off DLSS. It's just that now AMD can compete without an emphasis on AI upscaling. But that acceleration requires significantly more memory bandwidth. Add virtual reality and the need for higher memory bandwidth increases even more. So that will be one of the biggest factors in future GPUs, and it's why HBM makes the most sense moving forward. I think AMD has been pretty clear about increasing memory bandwidth in the future with HBM!!

If you emphasize on-chip memory for the GPU and sacrifice off-chip memory bandwidth, you bottleneck the system. Conversely, you could increase memory bandwidth, make only marginal improvements to the size of the on-chip caches, and get significantly better performance. That's what happened with the PS4 and Xbox One.

I guess to try and move things back to the Velocity Architecture, has anyone seen the video from the guy who modified the files in his Series X drive to remove the boot-up sequences and shave down boot time to 16 seconds? Think it's pretty interesting if that can lead to unofficial mods/hacks to change other parts of the file system or config settings for the OS that could maybe also tap into the VA hardware/software, maybe lead to general OS and QoL performance beating out Microsoft's own.

Don't mean anything jailbreak-like tho; the guy in question didn't jailbreak anything (though with Dev Mode being a thing could you technically jailbreak anything?).

I had to look it up and see. The load times from the SSD are very impressive.
 
We can expect any other amount of memory too, like 40GB, depending on density and bus width.

Well, that also depends on the memory technology. HBM, for example, is usually done in multiples of two in terms of capacity, and I think the current capacity limit per die is either 2 GB or 4 GB, with stacks up to 12-Hi for HBM2E. So you can't get 40 GB that way, because stack heights only come in certain steps: 4-Hi, 8-Hi or 12-Hi. You can have multiple stacks, but I think you'd probably want them all to be the same height, mainly for performance reasons.

GDDR is a bit different of course; MS actually wanted 20 GB for Series X but costs forced their hand (IMO I think that was a bit of a mistake because it's not like they COULDN'T have afforded the extra 4 GB per system, but I digress), and with clamshell mode they could be doing 40 GB for the Azure implementation of the system. But you need at least a 320-bit bus to do that, since chip capacities are either 1 GB or 2 GB. Even then, you still might have to go with clamshell mode unless you do some ridiculous 640-bit memory bus... I don't think that's ever been done with any system or GPU on GDDR-based memory, and it likely never will be.
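A little sketch of which capacities a GDDR configuration can actually land on, given one 32-bit channel per chip, 1 GB or 2 GB densities, and optional clamshell; it's just counting, but it shows why the "round" options cluster the way they do.

# Capacities reachable for a given bus width with uniform chip density.
def gddr_capacities(bus_bits):
    chips = bus_bits // 32          # one chip per 32-bit channel
    caps = set()
    for density_gb in (1, 2):
        caps.add(chips * density_gb)        # normal mode
        caps.add(chips * density_gb * 2)    # clamshell: two chips per channel
    return sorted(caps)

for bus in (256, 320, 384):
    print(f"{bus}-bit bus: {gddr_capacities(bus)} GB")
# 320-bit gives 10 / 20 / 40 GB; mixing densities (as Series X does) adds
# in-between totals like 16 GB, at the cost of a split-speed memory pool.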

Agreed; I guess what I want to convey then is, all of these things you mention are important, to the point where it won't come down to picking or choosing one over the other. A balanced design will likely implement a focus on both great on-chip capacity & performance, and off-chip memory bandwidth/latency, to ensure things like RT and VR/AR workloads can be satisfied.

I'm not too worried about DLSS, as in, I think in time AMD will have a very good implementation that the 10th-gen systems can leverage out of the box as hardware acceleration. Microsoft's use of Super Resolution already differs from AMD's since MS's is hardware-based through DirectML, so I suspect AMD will be leveraging whatever Microsoft is able to do on that front to integrate into future RDNA GPU cards.
 
GDDR is a bit different of course; MS actually wanted 20 GB for Series X but costs forced their hand (IMO I think that was a bit of a mistake because it's not like they COULDN'T have afforded the extra 4 GB per system, but I digress...

Perhaps power, heat, and size were major contributing factors to the decision.
It seems they REALLY wanted that oddball minitower shape.
I don't know if an additional 4GB would have necessitated a significant redesign, but the Series X does seem to be pushing out all the heat that it's able to. A good design, but not a lot of thermal headroom remaining.
 
It would be a bit odd for them (IMO) to focus so much on the minitower design if what they said about designing for function first was true; even if that extra 4 GB meant going with a modified aesthetic, with their design philosophy that should not have been a question worth debating. Otherwise, they were in fact letting the aesthetic drive the design in some part over whatever the functionality required; that's a mistake they made big time with the 360: an incredible aesthetic, but they were hellbent on it even when the actual tech needed more room to breathe.

That aside tho, I agree it was likely the combination of power/heat/size, as well as costs, that drove them to go with 16 GB instead of 20 GB. I'm just wondering if it really would have been that much more in the long run for the extra 4 GB, slightly costlier cooling and maybe a slightly bigger console. Considering the insane profits Microsoft makes through Azure and their software and services, and the fact they were willing to buy a company like TikTok for $40 billion (IIRC), I'd say some of that earmarked TikTok money could have covered the extra costs for another 4 GB and slightly more cooling. But that's just me.
 
For a single RAM chip you're talking about 3-4 watts, so an extra chip or two isn't going to substantially change the console's cooling solution; worst case, you drop the GPU frequency by however many tens of MHz you need to make up the power difference.
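Quick sanity check on that trade-off; the chip wattage and GPU power are assumptions, and power is treated as roughly linear in frequency, which is only a first-order approximation.

# Extra RAM power traded for GPU clock, to first order.
extra_chips = 2
watts_per_chip = 3.5        # assumed per-chip draw
gpu_power_w = 200.0         # assumed GPU power budget
gpu_clock_mhz = 1825.0      # Series X GPU clock

extra_w = extra_chips * watts_per_chip
clock_drop_mhz = gpu_clock_mhz * (extra_w / gpu_power_w)
print(f"~{extra_w:.0f} W extra -> roughly {clock_drop_mhz:.0f} MHz off the GPU clock")
# ~7 W works out to a few tens of MHz, consistent with the point above.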
 
I was thinking the same. Seems they decided to just cut costs any way they could, and knowing that 16GB was more than enough, they decided to design around that. A 384-bit bus was expensive, so they stuck with 16GB of RAM and a 320-bit bus and designed around that. That's what I think.
 
Why make the Series X 20 GB when the S is 10? Devs need to optimize for the S's memory anyway.

Likely if the SX were 20GB, the SS would be more, like 12GB with everything at full speed, or maybe 14GB with a different bus width? A bit too early in the morning for me to entertain the different capacity and width implications.
 
These non-power-of-two amounts probably mean fewer cost-reduction opportunities in the future, no?
 