> AFAIK ROP count is the same (64) on both consoles. Series X's run at 1.83GHz, PS5's run at 2.23GHz.

But bandwidth will be the bottleneck here, and PS5 has less of it. So raster performance will likely go to XSX in this case, as it has more.
> Is it really fast enough to do that though? You're looking at around 352 MB/frame at 60Hz.
>
> Edit: And that's assuming that all of the data you need is stored with the absolute maximum compression ratio. And that's a full 16ms read, so it requires at least one frame of buffering just to hide the read.

It's more about the absolute speed at which a player can turn around, so it would load a fraction every frame while turning. The maximum-bandwidth case is a full 180 at the max turning speed. If turning around takes a quarter of a second, the frame rate doesn't matter: you can load 1.3GB raw, up to 5.5GB into memory at full compression, within that 1/4 second. Does Kraken have lossy modes for images that reach 4:1? I wasn't familiar with that format until today...
Cerny did say in the presentation that if it takes a player 0.5 seconds to spin the camera behind them, that's enough time to load in ~4GB worth of textures from the SSD. If that's true it's insane... developers won't have to rely on corridors, level design or other tricks to hide streaming any more. It'll give so much more freedom.
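For anyone who wants to sanity-check those figures, here's a quick back-of-the-envelope script using the publicly quoted PS5 SSD numbers (5.5GB/s raw, roughly 8-9GB/s typical Kraken output, 22GB/s theoretical max); the turn durations are just the hypothetical ones used in the posts above:

[code]
# Back-of-the-envelope PS5 streaming math, using the figures quoted from the
# Road to PS5 talk. Turn durations are the hypothetical ones from this thread.
RAW_GBPS     = 5.5    # raw SSD read speed
TYPICAL_GBPS = 9.0    # rough "typical" Kraken-decompressed output (8-9 quoted)
MAX_GBPS     = 22.0   # theoretical max decompressor output

FRAME_TIME = 1 / 60   # seconds per frame at 60Hz

for label, gbps in [("raw", RAW_GBPS),
                    ("typical compressed", TYPICAL_GBPS),
                    ("max compressed", MAX_GBPS)]:
    print(f"{label}: {gbps * FRAME_TIME * 1000:.0f} MB per 60Hz frame, "
          f"{gbps * 0.25:.2f} GB per 0.25s turn, "
          f"{gbps * 0.5:.2f} GB per 0.5s turn")
[/code]

That lands on ~92MB to ~367MB per frame (roughly where the 352 MB/frame figure above comes from), ~1.4GB to 5.5GB in a quarter-second turn, and ~4.5GB in half a second at typical compression, i.e. the same ballpark as Cerny's ~4GB claim.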
Depends on the performance hit from moving between memory pools on XSX... if you exceed that 10GB and need to use the other 'slower' RAM, you could end up in trouble.
There was a reason Sony completely ditched split memory pools after PS3.
What is faster in the back end? Doesn't almost everything that matters (ROPs, TMUs) scale with CUs?
PS: we have (max) 40 CUs.
> According to the RDNA whitepaper, each "shader engine" has a 64-bit memory interface.

Hm? I don't see such a thing in it: https://www.amd.com/system/files/documents/rdna-whitepaper.pdf
From the RDNA whitepaper:

> The L2 cache is shared across the whole chip and physically partitioned into multiple slices. Four slices of the L2 cache are associated with each 64-bit memory controller to absorb and reduce traffic. The cache is 16-way set-associative and has been enhanced with larger 128-byte cache lines to match the typical wave32 memory request. The slices are flexible and can be configured with 64KB-512KB, depending on the particular product. In the RX 5700 XT, each slice is 256KB and the total capacity is 4MB.
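So the 64-bit granularity in the whitepaper is per memory controller (with its L2 slices), not per shader engine. Purely as an illustration of that passage's arithmetic, and speculatively extending it to a 320-bit part with the same 256KB slices:

[code]
# L2 sizing per the RDNA whitepaper: 4 slices per 64-bit memory controller.
SLICES_PER_64BIT_MC = 4
SLICE_KB = 256  # Navi 10 (RX 5700 XT) slice size; other products may differ

def l2_total_mb(bus_width_bits, slice_kb=SLICE_KB):
    controllers = bus_width_bits // 64
    return controllers * SLICES_PER_64BIT_MC * slice_kb / 1024

print(l2_total_mb(256))  # 4.0 -> matches the RX 5700 XT figure in the quote
print(l2_total_mb(320))  # 5.0 -> what a 320-bit part would get at the same ratio
[/code]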
> The performance hit will just be the decreased bandwidth I think; it's not really split memory pools, just a slower pool.

It's a unified pool on an imbalanced bus. To maximize bandwidth usage it needs all channels to get an equal number of requests over a time slice, to keep the queues in a healthy range. The simple no-brainer method is to spread the address space equally across all chips, which requires identical-size chips. The 10GB is mapped that way. Having an additional 6GB on only some of the channels means accesses to that region stall other requests, as if the entire bus were effectively running at the lower speed. If it's used very lightly, it won't have much impact, if at all. But if there's a lot of throughput to this partition, it will drag the average bandwidth down significantly.
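To put rough numbers on it: the 10GB region is striped across all ten 32-bit channels (560GB/s), while the extra 6GB lives only on the six 2GB chips (336GB/s). A deliberately crude serial model follows; it ignores how the real controller interleaves and overlaps requests, so it only illustrates the trend being described, not actual behaviour:

[code]
# Crude illustration of effective XSX bandwidth when a fraction of the GPU's
# traffic hits the 6GB "standard" region. Assumes the two regions are serviced
# back to back at their peak rates, which is NOT how the controller actually
# arbitrates -- it's only meant to show the trend the post above describes.
FAST_GBPS = 560.0   # 10GB region, striped across all 10 channels
SLOW_GBPS = 336.0   # 6GB region, only on the six 2GB chips

def effective_bandwidth(slow_fraction):
    fast_fraction = 1.0 - slow_fraction
    time = fast_fraction / FAST_GBPS + slow_fraction / SLOW_GBPS
    return 1.0 / time

for f in (0.0, 0.1, 0.3, 0.5):
    print(f"{f:.0%} of traffic in the slow region -> ~{effective_bandwidth(f):.0f} GB/s")
[/code]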
Developers won't split pools for rasterization. That's like purposefully gimping your performance.
> But bandwidth will be the bottleneck here, and PS5 has less of it. So raster performance will likely go to XSX in this case, as it has more.

Bandwidth was a bottleneck on the PS4 Pro's 64 ROPs, and that was using Polaris ROPs. I don't know if an equivalent bandwidth per pixel fillrate is still a bottleneck for RDNA2.
According to the RDNA whitepaper, each "shader engine" has a 64-bit memory interface.
So, to reach XSX's 320-bit bus we would need 5 shader engines, not 4 (the 36-CU PS5, with its 256-bit memory, would be 4).
Which gets us to 80 ROPs.
But it's all speculation.
> PS5 has more bandwidth per ALU.

But less bandwidth per compute throughput, which is a more important metric AFAIK.
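Both statements check out with the announced specs (36 CUs @ 2.23GHz with 448GB/s vs 52 CUs @ 1.825GHz with 560GB/s, 64 ALUs per CU):

[code]
# Bandwidth per CU/ALU vs bandwidth per FLOP, using the announced specs.
def stats(name, cus, clock_ghz, bw_gbps):
    alus = cus * 64                       # 64 stream processors per CU
    tflops = alus * 2 * clock_ghz / 1000  # 2 FLOPs per ALU per clock (FMA)
    print(f"{name}: {bw_gbps / cus:.1f} GB/s per CU, "
          f"{bw_gbps / alus * 1000:.0f} MB/s per ALU, "
          f"{bw_gbps / tflops:.1f} GB/s per TFLOP")

stats("PS5", 36, 2.230, 448)  # ~12.4 GB/s per CU, ~43.6 GB/s per TFLOP
stats("XSX", 52, 1.825, 560)  # ~10.8 GB/s per CU, ~46.1 GB/s per TFLOP
[/code]

So PS5 has more bandwidth per CU/ALU, while XSX has slightly more per unit of compute throughput.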
> If they want to keep pace with PS5 they will have to; PS5 may have less bandwidth, but it has more bandwidth per CU, as well as a single memory pool with a much faster I/O to swap assets in and out of said memory.

The first 10GB is a lot of space just for the framebuffers and textures to begin with. Given multiplatform development, it's unlikely that the majority will push it too hard, since not everyone has >8GB GPUs, so it'll mostly be used for the rather wasteful (uncompressed) stuff.
You can have multiple arrays per engine, up to 16 ROPs per array (per rasterizer). The likely configuration is 2 shader engines with 2 shader arrays each and 7 WGPs per array, so 28 WGPs in total for 56 CUs, with one WGP disabled per shader engine to get 26 enabled WGPs or 52 CUs, therefore 16*2*2 ROPs.
PS5 similarly uses the Navi 10 configuration of 5 WGPs per array: 18 WGPs / 36 CUs, 2 shader engines, 4 shader arrays, so 2*2*16 ROPs.
Fillrates might end up being a wash: despite Anaconda having higher blend-rate bandwidth (read + write), the higher core clock on PS5 will tend to give higher internal bandwidths, and that may come in handy with delta color compression throughput, although who knows, depending on the average compression there.
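The arithmetic behind that speculated layout, for anyone following along (the WGP-per-array counts are this post's guesses, not confirmed specs):

[code]
# CU/ROP arithmetic for the configurations speculated above.
def config(name, shader_engines, arrays_per_se, wgps_per_array,
           wgps_disabled, rops_per_array=16):
    wgps_total = shader_engines * arrays_per_se * wgps_per_array
    wgps_enabled = wgps_total - wgps_disabled
    cus = wgps_enabled * 2                                 # 2 CUs per WGP on RDNA
    rops = shader_engines * arrays_per_se * rops_per_array
    print(f"{name}: {wgps_total} WGPs physical, {wgps_enabled} enabled "
          f"= {cus} CUs, {rops} ROPs")

config("XSX (speculated)", shader_engines=2, arrays_per_se=2,
       wgps_per_array=7, wgps_disabled=2)   # 28 WGPs -> 52 CUs, 64 ROPs
config("PS5 (speculated)", shader_engines=2, arrays_per_se=2,
       wgps_per_array=5, wgps_disabled=2)   # 20 WGPs -> 36 CUs, 64 ROPs
[/code]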
> If they want to keep pace with PS5 they will have to; PS5 may have less bandwidth, but it has more bandwidth per CU, as well as a single memory pool with a much faster I/O to swap assets in and out of said memory.

CUs can pull from cache as well, so I'm not sure that's how you want to do it.
> Bandwidth was a bottleneck on the PS4 Pro's 64 ROPs, and that was using Polaris ROPs. I don't know if an equivalent bandwidth per pixel fillrate is still a bottleneck for RDNA2.

I think the fillrate calculations should be the same though, right? (I'm not sure how to account for compression, to be honest.)
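One rough way to compare is bytes of memory bandwidth per pixel of peak fillrate, ignoring compression and caches entirely, and assuming the 64-ROP counts discussed above for both new consoles:

[code]
# Peak pixel fillrate vs memory bandwidth, ignoring compression and caches.
def fill_vs_bw(name, rops, clock_ghz, bw_gbps):
    gpix_s = rops * clock_ghz   # Gpixels/s peak fillrate
    print(f"{name}: {gpix_s:.1f} Gpix/s, "
          f"{bw_gbps / gpix_s:.2f} bytes of bandwidth per pixel")

fill_vs_bw("PS4 Pro", 64, 0.911, 218)   # the Polaris baseline mentioned above
fill_vs_bw("PS5",     64, 2.230, 448)
fill_vs_bw("XSX",     64, 1.825, 560)
[/code]

By that crude metric PS5 has a bit less bandwidth per pixel of fillrate than the PS4 Pro did (~3.1 vs ~3.7 bytes) and XSX noticeably more (~4.8 bytes); whether that still translates into a ROP bottleneck on RDNA2, with its bigger caches and compression, is exactly the open question above.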
> DOOM 2016 "cleverly re-uses old data computed in the previous frames... 1331 draw calls, 132 textures and 50 render targets," according to a new article which takes a very detailed look at the process of rendering one 16-millisecond frame. An anonymous Slashdot reader writes:
>
> The normal map is stored in a R16G16 float format. The specular map is in R8G8B8A8; the alpha channel contains the smoothness factor.
>
> So DOOM actually cleverly mixes forward and deferred with a hybrid approach. These extra G-Buffers will come in handy when performing additional effects like reflections.

The more render targets, the greater the difference in performance will be. Luckily the difference in bandwidth isn't that large.
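For a sense of scale on those formats: R16G16F and R8G8B8A8 are both 4 bytes per pixel, so just writing those two G-buffer targets once per frame (no depth, no overdraw, no other targets, no compression, i.e. a deliberately naive lower bound) costs roughly:

[code]
# Naive write cost for the two G-buffer targets mentioned above:
# R16G16F normals (4 bytes/px) + R8G8B8A8 specular/smoothness (4 bytes/px).
BYTES_PER_PIXEL = 4 + 4

for name, w, h in [("1080p", 1920, 1080), ("4K", 3840, 2160)]:
    mb_per_frame = w * h * BYTES_PER_PIXEL / 1e6
    gbps_at_60 = mb_per_frame * 60 / 1000
    print(f"{name}: {mb_per_frame:.1f} MB per frame, "
          f"~{gbps_at_60:.1f} GB/s at 60fps (one write pass only)")
[/code]

Real frames re-read those targets for lighting and reflections and pile on many more render targets (the article counts 50 in DOOM), which is where the bandwidth gap between the consoles would actually start to show up.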
It's 36 CUs.
It's 40 CUs (4 disabled for yields) = 36.