Yeah, I think the simplest way (in terms of customisation work and silicon cost) is to add their SFS sampling mode to those already supported by the Texture Mapping Units. All the data they need (textures, tile map, residency map) is stored in texture format. I think the tile and residency maps will be stored in "game memory", as the game needs to be able to see them, and they might be useful to developers in some other way if they have access to them. (In the past I've suggested using the SFS data to decide on model / geometry LOD, as it could indirectly tell you about the needed level of detail in dynamic-res games.)
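Just to make the idea concrete, here's a tiny software sketch of what that sampling mode is conceptually doing (all names here are made up by me; the real work happens inside the TMU and the actual SFS API, this is not it):

```python
# A minimal sketch of a residency-map-aware texture fetch (hypothetical names).
# The residency map is a tiny texture, one texel per tile of the big texture,
# storing the finest mip level actually resident for that tile.

feedback = []  # (u, v, desired_mip) requests, later used to drive streaming

def sample_with_residency(residency_map, tiles_x, tiles_y, u, v, desired_mip):
    # Find the residency-map texel covering this UV coordinate.
    tx = min(int(u * tiles_x), tiles_x - 1)
    ty = min(int(v * tiles_y), tiles_y - 1)
    finest_resident_mip = residency_map[ty][tx]

    # Clamp the requested mip to what's actually in memory, so we never
    # read from a non-resident tile.
    mip = max(desired_mip, finest_resident_mip)

    # Record what the shader *wanted*; this feedback tells the streamer which
    # tiles to load next (and could hint at geometry LOD, as suggested above).
    feedback.append((u, v, desired_mip))
    return mip  # a real sampler would now fetch from this mip level

# Example: a 4x4-tile texture where only mip 2 and coarser are resident.
residency_map = [[2] * 4 for _ in range(4)]
print(sample_with_residency(residency_map, 4, 4, 0.3, 0.7, desired_mip=0))  # -> 2
```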
Ah, this clears up a ton! So the way it's described here could suggest this is one of the other customizations MS might've done for the GPU, although there's the chance that AMD adopts this across the RDNA 2 PC GPUs while other parts, like the mip-blending hardware, stay exclusive to the Series systems, at least going by what Xbox engineers were saying on Twitter.
Yeah, I used to really like Sega. It's hard for people to understand, I think, what it was like to be gaming on the MD and SNES and then walk into an arcade and BAM, see Daytona USA running on a sit-down cab with something like a 50-inch screen and a great audio system turned up high. It was mindblowing! It's a long, long time since anything has completely redefined what games are for me. Those types of jumps don't exist these days (although the move to something like XVA / DirectStorage is still most welcome!).
I actually remember doing exactly that xD! Had a Mega Drive (Genesis technically over here), and would often go to the arcade on the naval base where they just had so many amazing games with that experience you simply couldn't get at home, even with the then-next-gen consoles of PS1/Saturn/N64 (aside from the 3D, which the arcades still did better). Makes me sad that arcades fell off; I do think they could still play a big role in driving the market forward and providing the kind of full-fat experience you can't really get in the home unless you have a REALLY good (and expensive) setup, which most gamers don't.
Like, people are already struggling to make sure they have 4K/120-capable TVs for the Series X and PS5, and that's just one component of several in a good gaming setup. An arcade equivalent would have a much better convenience factor. Then add in the real-estate problem when it comes to floorspace; that's another area where arcades still have the advantage, because not all gamers have large houses or rooms with large amounts of floorspace.
Having those sorts of venues could easily enable some new mindblowing tech, like full VR/AR/location-based gameplay designs that wouldn't be easy to translate to the home, but... not enough visionaries in the gaming field are looking at arcades as a blue-ocean market, I guess :S.
If the industry can support a high-end console with a high enough BOM, I suppose an HBM derivative isn't off the cards. As you say, BW, speed, power and footprint are all in its favour. Generally, lower bandwidth comes from being further away (in both electrical and physical terms) from the processor, and HBM sits on an interposer with a stupidly wide bus built on silicon. Something off-package like GDDR6 is unlikely to match something on an interposer like HBM.
Where traditional memory types hold up is that with enough cache, a wide enough bus and enough space for the DRAM on the board, you can get a better compromise for the cost and risk. For example, Radeon VII was HBM2, but the Radeon 6900 is GDDR6 with a phat SRAM cache. Will this new approach hold up? Hopefully.
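Rough numbers to put that comparison in perspective (public specs, just back-of-the-envelope arithmetic):

```python
# Peak bandwidth ~= bus width (bits) x per-pin data rate (Gbps) / 8.
def peak_bw_gbs(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8

# Radeon VII: 4 HBM2 stacks, 4096-bit total at ~2 Gbps per pin -> ~1 TB/s.
print(peak_bw_gbs(4096, 2.0))   # 1024.0 GB/s

# Radeon 6900: 256-bit GDDR6 at 16 Gbps -> 512 GB/s raw, topped up by the big SRAM cache.
print(peak_bw_gbs(256, 16.0))   # 512.0 GB/s
```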
Yeah, I think AMD are onto something with IC (Infinity Cache). If it works out well (even if it has issues in RDNA 2, they can refine it for RDNA 3), that could draw others like Nvidia to develop equivalents. They could refine it to work even smarter at smaller capacities, or get results in later generations with "slower" L3$ caches that are comparable to what they get in the first generation. That might open up the chance of seeing it in future gaming consoles as well.
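For what it's worth, the first-order reason a big on-die cache can stand in for raw bus width looks something like this (very crude model; the hit rate and cache bandwidth figures are placeholders of mine, not AMD numbers):

```python
# Crude effective-bandwidth model: hits are served by the on-die cache,
# misses go out to GDDR6. All figures below are illustrative placeholders.
def effective_bw_gbs(hit_rate, cache_bw_gbs, dram_bw_gbs):
    return hit_rate * cache_bw_gbs + (1 - hit_rate) * dram_bw_gbs

# e.g. 512 GB/s of GDDR6 behind a hypothetical ~1600 GB/s cache:
for hit_rate in (0.0, 0.5, 0.75):
    print(hit_rate, effective_bw_gbs(hit_rate, 1600, 512))
# A smaller cache means a lower hit rate, and effective bandwidth slides
# back towards the raw DRAM number - hence "smarter at smaller capacities".
```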
I think Nvidia's shown through GDDR6X that there's still room for GDDR to grow, though there's no way to tell where the ceiling is at this time. Whatever the ceiling is, I think they'll probably hit it by the end of the decade.
I mean, there are all kinds of whizz-kid silicon guys profiling this stuff and trying to predict the best set of tradeoffs to head towards. GDDR stacking might work, but there are inherent problems associated with running at high frequencies, using lots of power, stacking, and cooling. HBM works well with stacking because it uses lower clocks and less power, and it makes up for that with a tremendously phat bus that can only work on an interposer. When your memory has to go across a board and up through pins / solder bumps under a chip package, you're going to struggle with bus widths beyond a certain point.
Speaking of HBM, that could run into its own issues in the future. I've been reading a lot into FGDRAM (Fine-Grained Dynamic Random Access Memory); there are some great research and thesis papers exploring it and highlighting limitations in HBM technologies. If GDDR really does hit a wall sooner rather than later, HBM could be a great substitute provided costs go down, but for server and big-data markets, unless there are radical changes in HBM architecture, something like FGDRAM might have to step up to keep pushing larger bandwidths and wider buses with better access granularity.
Question is, who's the first to develop a successful implementation of it? Or it could just end up that the ideas surrounding FGDRAM get brought forward into a future HBM spec; that's also possible. It'd be neat if you and some of the other super-technical posters around here shared some thoughts on things like FGDRAM, if you've taken a look into it. I'll have to re-read the papers at a future date, for sure.
Yeah, the success (or failure) of RDNA 2 may give us some insight into the future. High-end GPUs are much higher-margin parts than consoles, so that probably gives them more area to play with on the silicon. Then again, the PHYs for external DRAM take up die area too, and as transistors get smaller that area accounts for more and more cache that could be put on-chip instead. But yeah, if GDDR hits a wall, and die area remains expensive with little cost reduction, eventually hardware vendors will be driven towards something more radical.
Maybe faster SSDs and persistent memory can save us.
I think you're on the money with SSDs; can't see them going away. Costs should continue to scale down. If there's any use of persistent memory, it'll probably be as a large-ish (16 GB - 32 GB) byte-addressable replacement for the DRAM or SRAM cache on the flash controller's side. Hopefully with lower latencies than 1st-generation Optane DC Persistent Memory, which doesn't have terrible latencies in and of itself; I just think you'd probably want smaller amounts of it acting as a large cache in this instance.
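A rough feel for why even a modest byte-addressable tier in front of the NAND could pay off (all latency figures below are placeholders I picked for illustration, not measurements):

```python
# Crude average-read-latency model for an SSD controller with a
# persistent-memory cache in front of the NAND. Numbers are illustrative only.
def avg_read_latency_us(hit_rate, pm_latency_us, nand_latency_us):
    return hit_rate * pm_latency_us + (1 - hit_rate) * nand_latency_us

# Say the persistent-memory tier answers in ~1 us and a NAND read takes ~80 us:
for hit_rate in (0.0, 0.5, 0.9):
    print(hit_rate, avg_read_latency_us(hit_rate, 1.0, 80.0))
# Even a middling hit rate pulls the average down a lot, which is the appeal of
# a 16-32 GB byte-addressable cache over a much smaller DRAM buffer.
```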
XVA seems to comprise a fast(ish) SSD, DirectStorage using virtual memory addresses, SFS, and a custom decompression block which I expect will need games to be compiled/built in a certain way to use.
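Purely to show the shape of that pipeline as I understand it (hypothetical names throughout, not the actual DirectStorage/SFS API): SFS feedback decides which tiles to request, the requests go down to the SSD, and the decompression block unpacks them on the way into GPU memory.

```python
# Hypothetical sketch of an XVA-style streaming loop; none of these names are
# real API calls, just the rough shape of the data flow described above.

def stream_tiles(feedback_requests, tile_table, ssd, decompress, gpu_memory):
    for tile_id in feedback_requests:           # 1. SFS feedback: tiles the GPU wanted
        if tile_id in gpu_memory:               #    skip anything already resident
            continue
        offset, size = tile_table[tile_id]      # 2. translate tile -> location on the SSD
        compressed = ssd.read(offset, size)     # 3. small, direct read from the SSD
        tile_data = decompress(compressed)      # 4. the custom hardware block's job
        gpu_memory[tile_id] = tile_data         # 5. map the tile into the virtual texture

# Toy usage with stand-ins for the hardware pieces:
class FakeSSD:
    def read(self, offset, size):
        return b"\x00" * size

tile_table = {7: (4096, 65536)}   # tile 7 lives at byte offset 4096, 64 KiB compressed
gpu_memory = {}
stream_tiles([7], tile_table, FakeSSD(), lambda blob: blob, gpu_memory)
print(sorted(gpu_memory))  # [7]
```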
Soooo..... other than using an SSD and having a very fast CPU, I'd agree with you that BC likely won't take advantage of most of XVA unless specifically patched to do so.
I think it will be a similar case for PS5 BC too. So I think BC games will mostly be limited (or boosted if you want to look at it that way!) by the CPU.
We're actually seeing some instances of BC titles running better on PS5 vs Series X, but that seems to have more to do with the fact PS5 gets the PS4 version to work with, while Series X has the XBO version (generally lower framerate and/or resolution vs. the PS4 version) or the One X version (generally higher native resolution than even PS4 Pro, but sometimes worse framerate as a result) to work with.
One take I'm not agreeing with that some are trying to go for, though, is that variable frequency is affecting the SSD performance on PS5's side and that somehow is making for the (generally) longer load times for BC games there vs. Series X. That take doesn't make a lot of sense to me; it's already been well described that variable frequency is a CPU/GPU thing, nothing to do with the SSD or, in PS5's case, the SSD I/O hardware block. AMD's own version of variable frequency that they showed off at the RDNA 2 event is also CPU/GPU related.
Granted, BC games aren't stressing the SSDs in either PS5 or Series X, but it's just weird to see people rationalizing BC load times down to the SSD being affected by variable frequency and SmartShift, or to variable frequency as-is affecting load times of BC games on PS5, because I don't think any of these unoptimized BC titles are pushing the GPU to its limits, if much at all. Half the GPU is disabled anyway for BC on PS5 (IIRC), so there's virtually no way the GPU would have workloads stressing it enough to require power from the CPU's power budget (and therefore lower the CPU profile's performance, which actually would affect BC).
That's also accounting for the fact PS5's CPU doesn't have a non-SMT mode (not that it'd be needed for BC; its clock is still much faster than the PS4 or Pro CPUs, though maybe the additional clock headroom Series X gets from running BC games in non-SMT mode does help some with loading times there for non-optimized BC games?).