Hmm okay; well this is probably one of those instances where they went with something that reinforces the Series X design as serving well for streaming multiple One S instances. With that type of setup, it's probably better to have a Raster Unit per Shader Array. How this might impact frontend performance I'm not sure, but I suppose one consequence is that developers need to balance workloads across the Shader Arrays to keep them all occupied.
It's not that the location of the Raster Unit is better or worse, but the change from RDNA1 to RDNA2 is an improvement. By better, I mean RDNA2 Raster Units can scan convert triangles covering anywhere from 1 to 32 fragments, using coarse grained and fine grained rasterisation (rather than up to 16 fragments with a single scan converter in RDNA1):
https://forum.beyond3d.com/posts/2176807/
So XSX with RDNA1 Raster Units isn't as efficient at shading small triangles (CUs waste fragment shading cycles). Combine that inefficiency with XSX having RDNA1 CUs as well, and your raw Teraflops end up underutilised.
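To make the coarse grained / fine grained idea concrete, here's a toy two-level rasteriser in Python: a coarse pass trivially rejects whole tiles with a conservative edge test, and a fine pass only does per-pixel coverage inside tiles that survive. This is purely my own illustration of the principle (the tile size, function names and the split between the stages are arbitrary choices), not a claim about how RDNA1 or RDNA2 hardware actually divides the work:

```python
TILE = 8  # coarse tile size in pixels, an arbitrary choice for this sketch

def edge(v0, v1):
    # Edge function E(p) = A*x + B*y + C; E >= 0 on the inside for this winding.
    return (v0[1] - v1[1], v1[0] - v0[0], v0[0] * v1[1] - v0[1] * v1[0])

def tile_survives(A, B, C, x0, y0, x1, y1):
    # Coarse stage: evaluate the edge at the tile corner furthest "inside";
    # if even that corner is outside, the whole tile can be thrown away.
    x = x1 if A > 0 else x0
    y = y1 if B > 0 else y0
    return A * x + B * y + C >= 0

def rasterise(tri, width, height):
    edges = [edge(tri[0], tri[1]), edge(tri[1], tri[2]), edge(tri[2], tri[0])]
    covered = []
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            # Coarse grained: trivially reject tiles the triangle cannot touch.
            if not all(tile_survives(A, B, C, tx + 0.5, ty + 0.5,
                                     tx + TILE - 0.5, ty + TILE - 0.5)
                       for A, B, C in edges):
                continue
            # Fine grained: per-pixel coverage test within surviving tiles.
            for py in range(ty, min(ty + TILE, height)):
                for px in range(tx, min(tx + TILE, width)):
                    cx, cy = px + 0.5, py + 0.5
                    if all(A * cx + B * cy + C >= 0 for A, B, C in edges):
                        covered.append((px, py))
    return covered

# A small triangle only covers a handful of fragments; the coarse stage lets
# the rasteriser skip most of the screen without doing per-pixel tests.
print(len(rasterise([(2.0, 2.0), (7.0, 2.0), (2.0, 7.0)], 64, 64)), "fragments")
```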
Perhaps a lack of that consideration (likely due to lack of time) could be having some impact on 3P title performance on Series X devices, if, say, PS5 has its Raster Units set up more like what RDNA 2 seems to do on its frontend? Imagining that could be a thing. Also, I saw
@BRiT the other day bringing up some of the issues still present with the June 2020 GDK update, and it has me wondering whether there's a lack of efficiency/maturity in some of the tools that could assist with better scheduling of tasks across the four Raster Units, since the frontend might differ in that regard not only from PC but potentially from PS5 as well?
If PS5 has its Raster Units set up like RDNA2, then going by Navi21, instead of processing 4 triangles per cycle like XSX it would drop to 2 triangles per cycle, because Navi21 with 8 Shader Arrays only rasterises 4 triangles per cycle (halve that to 4 Shader Arrays to get PS5). Lower triangle throughput, but higher efficiency rasterising smaller triangles, with better utilisation of the CUs and raw Teraflops; your real-world geometry throughput sits closer to your (lower) theoretical peak. This was briefly covered in Cerny's presentation around small triangles.
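For what it's worth, here is that arithmetic laid out in a few lines of Python. The shader-array counts and the "one Raster Unit per two Shader Arrays" layout for a hypothetical RDNA2-style PS5 are the speculation from this post, not confirmed specs; the clocks are just the publicly announced 1.825 GHz (XSX, fixed) and up-to-2.23 GHz (PS5, variable):

```python
# Navi21: 8 Shader Arrays rasterising 4 triangles per clock.
navi21_shader_arrays, navi21_tris_per_clock = 8, 4
arrays_per_raster_unit = navi21_shader_arrays // navi21_tris_per_clock  # = 2

xsx_tris_per_clock = 4                             # one Raster Unit per Shader Array
ps5_tris_per_clock = 4 // arrays_per_raster_unit   # = 2, if laid out like Navi21

xsx_clock_ghz, ps5_clock_ghz = 1.825, 2.23
print("XSX peak:", xsx_tris_per_clock * xsx_clock_ghz, "Gtris/s")  # 7.3
print("PS5 peak:", ps5_tris_per_clock * ps5_clock_ghz, "Gtris/s")  # 4.46
```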
People mention tools, and they can have an impact on performance (optimised compilers specifically), but this applies to both XSX and PS5. However, neither XSX nor PS5 is a clean-slate design; both are based on AMD technology. There isn't the kind of multicore paradigm shift and new architecture we got with Cell and PS3, so I'm not expecting massive improvements from compilers. Significant improvements will come from engines making use of XSX's and PS5's custom features and strengths.
Yeah, I don't think there's any denying that at this point. The question is how this all ultimately factors into performance on MS's systems. I know the front-end on RDNA 1 was massively improved over the GCN stuff, but if it's assumed the RDNA 2 frontend is further improved still, that's additional improvement the Series systems miss out on. Maybe this also brings up the question of just what it takes to be "full RDNA 2", as well. Because I guess most folks would assume that means everything WRT frontend, backend, supported features, etc.
See above for frontend.
"Full RDNA2" is a marketing term to differentiate from a competitor's offering. Also, if you're full and all-encompassing RDNA2, you can't be fully customised at the same time. Both XSX and PS5 are custom designs, and they use different APIs, but their underlying capabilities are similar. Even Navi21 builds on previous architectures, as it isn't a clean-slate design itself.
But ultimately, if it's just about having the means to support most or all of the features of the architecture, then a lot of that other stuff likely doesn't matter too much provided it's up to spec (I'd assume even with some of Series X's setup, whatever RDNA 1 elements it still shares are probably improved over RDNA 1), and the silicon's there in the chip to support the features at a hardware level. So on the one hand, you could say it's claiming "full RDNA 2" on a few technicalities. On the other hand, it fundamentally supports all RDNA 2 features in some form in hardware, so it's still a valid designation.
See above for RDNA2. It's more a fulfilment of DX12 Ultimate, and stating something is "full" doesn't mean it's more performant. GPU manufacturers have a range of offerings from budget to enthusiast level, and over the years they have advertised new offerings with DX 9, 10, 11, 12, etc., yet some underperformed compared to previous offerings by being compromised in some way for cost. It's the overall package that counts for both XSX and PS5, not whether some arbitrary components from various timelines are included.
This is also probably something handled in the Geometry Engine, so I'm curious where exactly in the pipeline it would fall. Sure, it may be earlier in the pipeline than, say, VRS, but there's still some obvious work that has to be done before you can start partitioning parts of the framebuffer into varying resolution outputs: geometry primitives, texturing, ray-tracing, etc.
Maybe parts of it can be spread across different stages of the pipeline, so it would be better to refer to it as a collection of techniques falling under the umbrella of their newer foveated rendering designation.
The Geometry Engine being fully programmable will be important. Its culling capabilities are also important if it's being compared to a TBDR-like architecture. I'm expecting coarse grained and fine grained culling, unlike a TBDR, which would remove all occluded geometry before fragment shading.
There are plenty of VR patents dealing with various issues, so "a collection of them" is accurate. With foveated rendering you have to deal with two viewports, one for each eye, and you get primitives overlapping across tiles, which makes shading inefficient. So culling primitives efficiently is important so that you don't waste CU time later shading triangles that won't be visible. VRS is about shader efficiency, and all of this helps further down the pipeline.
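As an illustration of the sort of early primitive culling being talked about here, a generic sketch in Python: zero-area, back-face, off-screen and "covers no pixel centre" tests of the kind a compute or primitive-shader culling pass typically does. These are standard tests from GPU-driven pipeline write-ups, not anything specific to Sony's Geometry Engine or these patents:

```python
import math

def signed_area2(v0, v1, v2):
    # Twice the signed area; the sign gives the winding, zero means degenerate.
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v1[1] - v0[1]) * (v2[0] - v0[0])

def cull_triangle(v0, v1, v2, width, height):
    """Return a reason string if the triangle can be discarded early, else None."""
    area2 = signed_area2(v0, v1, v2)
    if area2 == 0:
        return "zero area"
    if area2 < 0:
        return "back-facing"  # assuming CCW front faces in this convention
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    if max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height:
        return "off screen"
    # Small-primitive cull: the bounding box is so tight that it contains
    # no pixel centre (centres sit at integer + 0.5), so nothing gets shaded.
    if math.ceil(min(xs) - 0.5) > math.floor(max(xs) - 0.5) or \
       math.ceil(min(ys) - 0.5) > math.floor(max(ys) - 0.5):
        return "covers no pixel centre"
    return None

tris = [
    ((10, 10), (30, 10), (10, 30)),         # kept
    ((10, 30), (30, 10), (10, 10)),         # back-facing under the CCW convention
    ((5.1, 5.1), (5.3, 5.1), (5.1, 5.3)),   # too small to cover a pixel centre
]
for tri in tris:
    print(cull_triangle(*tri, width=1920, height=1080))
```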
That's a very different design to Zen 2 or Zen 3. Zen is L1 and L2 per core, and L3 per CCX (be it 4 or 8 cores).
Cache nomenclature can vary. The patent describes L1 and L2 as potentially being the local caches, which means the "L3" I highlighted would effectively be a level-4 cache.
This would give the CPU 8 cores, each with local L1 and L2 caches, an L3 shared across all 8 cores (making this more like Zen 3), and an L4 shared with other components like the GPU.
This patent is showing a shared L2 per core cluster, and an L3 that's not associated with a core cluster but sits alone on the other side of some kind of bus or fabric or whatever. That's a pretty enormous change over Zen 2 / 3 and it would have major implications right up to the L1. And it's definitely not the same as having "Zen 3 shared L3" as stated in rumours. Very different!
Yes, the diagram shows that, but the patent describes variations for the CPU, as mentioned above. In addition, it mentions the GPU can also follow the same cache hierarchy and topology, up to 4 tiers of cache.
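To picture the topology being discussed (per-core L1/L2, an L3 shared by the core cluster, and a fourth tier shared with the GPU and other clients), here's a minimal Python model of a lookup walking the four tiers. The latencies, sizes and names are placeholders I've made up for illustration; the patent doesn't give numbers like these:

```python
from dataclasses import dataclass, field

@dataclass
class CacheLevel:
    name: str
    latency_cycles: int
    lines: set = field(default_factory=set)  # addresses currently resident

    def lookup(self, addr: int) -> bool:
        return addr in self.lines

def access(addr, private_levels, shared_levels):
    """Walk the hierarchy: per-core L1/L2 first, then the shared tiers, then memory."""
    total = 0
    for level in (*private_levels, *shared_levels):
        total += level.latency_cycles
        if level.lookup(addr):
            return level.name, total
    return "DRAM", total + 300  # placeholder memory latency

# One core's private caches plus the cluster-shared and SoC-shared tiers.
l1 = CacheLevel("L1", 4)
l2 = CacheLevel("L2", 14)
l3 = CacheLevel("L3 (cluster-shared)", 45)
l4 = CacheLevel("L4 (shared with GPU)", 90)

l3.lines.add(0x1000)  # pretend another core in the cluster already pulled this line in
print(access(0x1000, (l1, l2), (l3, l4)))  # hits in the shared L3
print(access(0x2000, (l1, l2), (l3, l4)))  # misses everywhere, goes to DRAM
```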
The patent strikes me more as being the PS4 CPU arrangement with knobs on, for the purpose of a patent.
This was one of my first thoughts. However, the PS4 arrangement doesn't make sense because:
- the patent is about backwards compatibility, and PS4 didn't have hardware BC
- PS4 launched in 2013, and patents for it would've been filed years before, circa 2010-2011
- the patent was filed in 2017, for an architecture not yet released, not for BC with PS4's predecessor from 2006, the PS3 (Cerny has better things to do)
- the patent was granted in 2019, leaving enough years of development to launch with PS5