Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Jay · Nov 24, 2020

My only real disappointment with XSS is sometimes the lack of options.

I can think of a couple ways they could be surfaced and implemented.

The concern isn't only the XSS itself: if there's a mid-gen or even next-gen refresh, the 'S' version could be locked out of using the additional power, the way the XSS is locked to the 1S versions of games.

Could get a mid gen RDNA3 6-8TF XSS, or next gen RDNA4 10TF XSS for example.

But these are launch games, so it'll be interesting to see how it progresses. COD is decent; it's just a shame the other options aren't available.
I wonder what the framerate would be like at 1080p without the 60fps lock on a VRR display.

liams · Nov 24, 2020

I agree, it will be interesting.
For another bold prediction, I don't think Microsoft will have another proper gen transition.
I think they will move to rolling mid-gen upgrades: every four years you get a mid-gen upgrade, and at any given time games will come out on the most current consoles and the previous ones.

So in 4 years we get the series x+ and s+, games will come out for them and the current x/s, but when the x++ and s++ come out the current x/s models will get sunsetted.

I think games will use ray tracing on Series S more going forward too; it's probably just a bit too much to juggle as a first attempt in the console space for these launch games.

thicc_gaf · Nov 24, 2020

liams said:
The other thing about the variable frequency that wouldn't be a problem for Sony but would be for Microsoft is this: even though I'm sure it won't affect games in any way, and to a game each PS5 is identical, the variable frequency of the CPU/GPU obviously produces variable amounts of compute. That's not a problem in games.

Microsoft, on the other hand, are going to be using these chips for non-gaming purposes in azure datacentres, they are going to be running any random virtual machine that someone spins up on them, so a set level of performance is incredibly important. Also, since the series x SOC is bigger than the PS5's SOC it will be better at transferring heat to the heat sink, because it has a larger surface area, this makes the small, passive heat sinks that are used in servers possible, where the airflow is provided by fans mounted onto the server case rather than the CPU cooler itself.

If you have seen a teardown of the Series X, I think it's important to see how the components could very easily slot into a high-density server. Take the dual motherboard design, for instance: just take the card with the SOC on it and plug it into a backplane, use the same heat sink that the Series X uses, slap a bunch together, and you have an ultra-dense blade server that has part commonality with the console that you were making millions of anyway. This dramatically reduces the build costs of the Xcloud servers.

I bet that the series X is the best bang for the GPU compute buck that Microsoft could possibly get for azure, with the added bonus of a decent CPU for running VM's at a lower power target. Also, there have been some rumblings about how Microsoft is producing substantially more silicon than sony is, by a lot. Since Microsoft evidently has less consoles available for purchase than sony they have to be going somewhere, so I think they want a full xcloud azure roll out by the middle of next year. That's my prediction

Sounds more like a quality analysis in my book :yep2:

boipucci said:
Continuing the Infinity Cache speculation... the latest DF face off (COD) results could perhaps be explained by IC?
PS5 locks 60fps 95% of the time in 4k/RT mode (dynamic 2160p-1800p) and it chugs in specific set pieces momentarily dropping to 45fps/50fps for a couple of seconds while the XSX mostly retains lock 60
On the 120 mode (1080p-1220p) things flip and PS5 holds a small performance advantage.

There are now two 4k modes where PS5 falls behind XSX: DMC5 & COD.
Using IC as an explanation for these results: at native 4k or close to it there are more cache misses, resulting in lower performance for PS5 when the misses are more abundant, similar to what we are seeing.
Though PS5 doesn't have these odd dips in the 4k mode RT disabled in COD, is possible the use of high resolution in combination with rt is too much for small pool IC incurring drops from the 16ms frame time when those cache misses occur something that optimization and tinkering may improve in the future

You might be right, and you certainly make me question the possibility.
I'm torn: depending on how you look at it, it seems either very possible or very unlikely. I will say, however, that your argument also applies to Navi21. Especially if they have to dedicate a sizeable portion of the die to IC, they'll make sure to be as efficient as possible with the rest of the chip. It is in AMD's best interests to keep die sizes as small as possible, irrespective of whether it is an SoC or a discrete GPU chip.

I'm speculating that the PS5 GPU would be a match for a Navi21 chip halved, whereas the XSX went with a different SE/SA layout to accommodate more DCUs and also made some changes to the front end.

Honestly if anything the 120 FPS stuff seen in Cold War footage seems it would suggest something CPU-related rather than GPU-related IMO. So maybe, MAYBE the rumors of unified L3$ on the CCXs' of the CPU could be true after all.

That said, it's not like we don't have games on Series X running 120 FPS with no issues; Gears 5 admittedly isn't a game natively built for the new consoles, but it looks at least as good as most of the launch games (and better than a few of them in fact), even in the 120 FPS mode with no big issues. Or at least, nothing any more extreme than any slight drops we've seen in some of the PS5 games at 120 FPS.

So we can't even really say for sure if the performance issues in that regard on Series X are due to lack of unified L3$ on CPU, as that shouldn't theoretically cause any issues for retaining silky smooth 120 FPS for well-optimized games. And with that said, I'm inclined to think, at least for the time being, those performance issues are more due to lack of particular optimizations from some of these 3P games. It's up to MS to provide the support, and it's also up to 3P devs to familiarize themselves more with the new GDK tools. But how long will that take, is the question...

DSoup said:
I'm not sure I follow. Any change to game code - including a patch - is going to require a recompile. What's the difference if the recompile is targeting modes BC1-BC3 or full PS5 on PlayStation platforms? A bunch of PS4 games, including PSVR games like Blood and Truth, have already had PS4 patches that unlock higher performance on PS5.

If you want to address all of that hardware though, that's when you need to re-target for PS5, because there simply aren't PS4 APIs for PS5 hardware, but I don't see how that's any more work. Bear in mind Mark Cerny said in his talk that devs can just ignore all the new stuff and that the time to triangle for PS5 is ~1 month, down from 1-2 months for PS4.

Well, maybe instead of recompile I probably should've just stuck to native port, the way the dev did. They seemed to make an emphasis on more work being required to upgrade performance for their game on PS5 compared to Series X, but it's all relative to them I'm sure. "More work" could in actuality mean not too much at all, but I think their team size plays a factor into things.

Speaking more directly on PS5 though, I think it's very evident Cerny's design philosophy is flexing its prowess right now in some of the performance results we're seeing in most of the 3P multiplats. Whatever they did in order to simplify aspects of the design process, combined with keeping their tools mostly the same (thus keeping things very familiarized for 3P devs so transition is pretty straightforward for them) deserves its due.

I don't think that necessarily means MS's design is some monster to work with; once more of the GDK tools become familiar with 3P devs they'll start tapping a lot more into stuff like mesh shaders and SFS that should dramatically ease up dev process on the MS systems (as well as PC). But it's the fact the tools are apparently so new and thus unfamiliar to at least some of the 3P devs that's stinging ATM.

thicc_gaf · Nov 24, 2020

j^aws said:
I don't think Cerny cared for IC (a marketing term), rather he designed the PS5 for low latency and data throughput. The IO Complex is the heart of PS5. You don't need IC for it, although the design is heavily focused around managing cache.

Well that's true to a large extent, it's practically what the Cache Coherency Engines are for; pin-point eviction of stale data in the GPU caches. I'm still curious if it's demand scrubbing or patrol scrubbing; @3dilettante had a pretty detailed post in one of these threads that went into cache scrubbing and the benefits (and drawbacks).

Just going off the little I know I'd assume Sony are using demand scrubbing because patrol scrubbing waits on idle periods, but I don't see why you'd want to have intentional idle periods on a gaming device where you want components being busy with work at all times.

MS already made the transition away from eSRAM going from X1 to X1X, and carried that through to XSX. I think they were burned by SRAM taking up too much die space that could have gone to logic. However, they messed up compared with what they had on the X360, which kept its large chunk of eDRAM off the main GPU die, so logic space wasn't wasted. They could've done something similar with the X1 and had an off-die module, but probably didn't due to cost and latency.

Could they have maybe used PS-RAM instead? Dunno how good the bandwidth and latency is on top-end PS-RAM off-die chips, but I've seen nice enough sizes (32 MB, 64 MB etc.) go for relatively pretty cheap prices on bulk through wholesalers, and those are small bulks compared to what a console would be buying quantity-wise.

They needed more bandwidth for 52 CUs, and the traditional way is just a wider bus. And they would've needed to wait for AMDs IC as well, and the risks involved.

Yeah, if it ain't broke don't fix it. That said, it's just a bit odd they couldn't be asked to wait for AMD on IC, if they have already apparently waited on them for so many other things for full RDNA2 compatibility. Though that could maybe ask the question of what is "full" in the first place, but different conversation. The thing is they did kind of go wider with Series X but it's actually still narrower than One X (384-bit).

Much like CPU cores and scaling them up, you can look at Shader Arrays or Shader Engines in GPUs as analogous cores, and scale them up in server environments. I think MS took the simpler, lower risk approach.

This should also (thankfully, hopefully) indicate that the frontend has been improved greatly for RDNA2, because that'd be an absolute requirement to make such a strategy even work (in addition to other improvements too, of course).

IC is new and a risk compared to traditional wider buses, and as they were burned by eSRAM, I don't think MS wanted to take that risk for scalability.

Tech flip-flops back and forth. Until you run benchmarks and games, you won't really know. And something that was old becomes new again. GPU fashion catwalk.

Yeah, like deferred rendering. It's crazy that Dreamcast was one of the first (probably the first) gaming system with it at the hardware level, but it's not seemingly become a thing baked into consoles again until PS5 and Series, mainly because of AMD's work nowadays. At least, that's my understanding of it. Not that AMD have a literal deferred rendering feature, just something analogous to it.

Regarding eSRAM or an equivalent, personally I still think there may've been some room for MS to try for it and keep scalability in mind. Just settle on a fixed amount large enough to help get the job done, but small enough to not add too much to production costs. Ultimately though that's why they hired a team of engineers in the first place; even considering other use-cases for the design in other divisions I'm sure they crunched all the data and ultimately settled on what seemed like a best approach without blowing up the budget.

boipucci · Nov 24, 2020

thicc_gaf said:
Honestly if anything the 120 FPS stuff seen in Cold War footage seems it would suggest something CPU-related rather than GPU-related IMO. So maybe, MAYBE the rumors of unified L3$ on the CCXs' of the CPU could be true after all.

That said, it's not like we don't have games on Series X running 120 FPS with no issues; Gears 5 admittedly isn't a game natively built for the new consoles, but it looks at least as good as most of the launch games (and better than a few of them in fact), even in the 120 FPS mode with no big issues. Or at least, nothing any more extreme than any slight drops we've seen in some of the PS5 games at 120 FPS.

So we can't even really say for sure if the performance issues in that regard on Series X are due to lack of unified L3$ on CPU, as that shouldn't theoretically cause any issues for retaining silky smooth 120 FPS for well-optimized games. And with that said, I'm inclined to think, at least for the time being, those performance issues are more due to lack of particular optimizations from some of these 3P games. It's up to MS to provide the support, and it's also up to 3P devs to familiarize themselves more with the new GDK tools. But how long will that take, is the question...

The first part of my comment was making the case that PS5 struggling more at ~4k coupled with RT to maintain <16ms frametimes could be a result of cache misses from IC in specific scenes (however, VGTech pointed out it's a bug).
BTW, I don't think XSX has a "problem" with 120fps, though I do have a theory for the odd drops.

Taking the odd drops out, these consoles are incredibly close in real-world performance, with 2% to 8% margins.

3dilettante · Nov 24, 2020

boipucci said:
Continuing the Infinity Cache speculation... the latest DF face off (COD) results could perhaps be explained by IC?
PS5 locks 60fps 95% of the time in 4k/RT mode (dynamic 2160p-1800p) and it chugs in specific set pieces momentarily dropping to 45fps/50fps for a couple of seconds while the XSX mostly retains lock 60
On the 120 mode (1080p-1220p) things flip and PS5 holds a small performance advantage.

There are now two 4k modes where PS5 falls behind XSX: DMC5 & COD.
Using IC as an explanation for these results: at native 4k or close to it there are more cache misses, resulting in lower performance for PS5 when the misses are more abundant, similar to what we are seeing.
Though PS5 doesn't have these odd dips in the 4k mode RT disabled in COD, is possible the use of high resolution in combination with rt is too much for small pool IC incurring drops from the 16ms frame time when those cache misses occur something that optimization and tinkering may improve in the future

You might be right, and you certainly make me question the possibility.
I'm torn: depending on how you look at it, it seems either very possible or very unlikely. I will say, however, that your argument also applies to Navi21. Especially if they have to dedicate a sizeable portion of the die to IC, they'll make sure to be as efficient as possible with the rest of the chip. It is in AMD's best interests to keep die sizes as small as possible, irrespective of whether it is an SoC or a discrete GPU chip.

I'm speculating that the PS5 GPU would be a match for a Navi21 chip halved, whereas the XSX went with a different SE/SA layout to accommodate more DCUs and also made some changes to the front end.

Dropping to 45-50 from 60 on a console with a CU throughput and bandwidth deficit of similar percentage could just mean that raw bandwidth and CU throughput make more of a difference in those instances. If it's a bandwidth limitation, it wouldn't be out of expectation for bandwidth-limited workloads without any additional cache. That might be an argument for the absence of any additional bandwidth. Even poorer hit rates would pull significant fractions of ~2TB/s of infinity cache bandwidth. Even cutting it down by 50-75% would give enough bandwidth to double the PS5's average bandwidth or at least exceed the Series X.
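
A rough back-of-the-envelope sketch of that last point, for anyone who wants to plug in their own numbers. The ~2TB/s figure is the one from the post, and the 448GB/s and 560GB/s DRAM figures are the ones quoted elsewhere in this thread. Treating cache bandwidth as simply additive with DRAM bandwidth is an assumption made here for illustration only; a real memory hierarchy does not behave that way.

```python
# Back-of-the-envelope only: cache bandwidth is treated as additive with
# DRAM bandwidth, which is a simplifying assumption for illustration.
PS5_DRAM_GBPS = 448    # PS5 memory bandwidth as quoted in this thread
XSX_DRAM_GBPS = 560    # Series X fast-pool bandwidth as quoted in this thread
IC_PEAK_GBPS = 2000    # the "~2TB/s" Infinity Cache figure from the post


def effective_bandwidth(dram_gbps, cache_peak_gbps, cut_fraction):
    """DRAM bandwidth plus whatever cache bandwidth survives the cut."""
    return dram_gbps + cache_peak_gbps * (1.0 - cut_fraction)


for cut in (0.50, 0.75):
    total = effective_bandwidth(PS5_DRAM_GBPS, IC_PEAK_GBPS, cut)
    print(f"cut {cut:.0%}: {total:.0f} GB/s, "
          f"doubles PS5: {total >= 2 * PS5_DRAM_GBPS}, "
          f"exceeds XSX: {total > XSX_DRAM_GBPS}")
```

Under those assumptions, even the 75% cut leaves the hypothetical total above both thresholds, which is the shape of the argument being made.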

boipucci said:
Btw isn't L2 included with phy/mc in rdna?

It probably shouldn't be. It's on the wrong side of the data fabric, and it's on either side of the command processor section and between the shader engines with Navi.

thicc_gaf said:
Well that's true to a large extent, it's practically what the Cache Coherency Engines are for; pin-point eviction of stale data in the GPU caches. I'm still curious if it's demand scrubbing or patrol scrubbing; @3dilettante had a pretty detailed post in one of these threads that went into cache scrubbing and the benefits (and drawbacks).

Perhaps a GDC presentation or a dev slide deck will mention what data flows or processes are targeted by the cache scrubbers. Using virtual memory allocation to buffer new SSD data could avoid many flushes, and oversubscribing memory is part of the process for the Series X's SFS (allocates a wide virtual memory range, which demand traffic then dynamically populates).

Deleted member 11852 · Nov 24, 2020

liams said:
The other thing about the variable frequency that wouldn't be a problem for Sony but would be for Microsoft is this: even though I'm sure it won't affect games in any way, and to a game each PS5 is identical, the variable frequency of the CPU/GPU obviously produces variable amounts of compute. That's not a problem in games.

This shouldn't be a problem for any application; the important thing is that performance is consistent across different deployments of the same hardware running the same code.

liams said:
Microsoft, on the other hand, are going to be using these chips for non-gaming purposes in azure datacentres, they are going to be running any random virtual machine that someone spins up on them, so a set level of performance is incredibly important. Also, since the series x SOC is bigger than the PS5's SOC it will be better at transferring heat to the heat sink, because it has a larger surface area, this makes the small, passive heat sinks that are used in servers possible, where the airflow is provided by fans mounted onto the server case rather than the CPU cooler itself.

One of the biggest cost challenges in running a server farm is energy consumption, so clocking low when the code doesn't demand more is critical. Intel, AMD and ARM server CPUs all run variable clocks. I wouldn't assume that just because Xbox Series X|S do not, the APU cannot; it would be kind of weird if it couldn't.

liams · Nov 24, 2020

DSoup said:
This shouldn't be a problem for any application; the important thing is that performance is consistent across different deployments of the same hardware running the same code.

I agree, it shouldn't be a problem, but if Microsoft used a SmartShift-style system that increases the performance of the GPU but hurts the CPU, you wouldn't have consistent performance. If, for argument's sake, you have a VM running on two CPU cores and training an AI model on the GPU, the chip would decrease CPU performance to boost GPU performance. This is fine for the VM using 2 CPU cores and all of the GPU, but all the other VMs would be affected as well. I know that this change in performance is analogous to a CPU increasing frequency opportunistically, but that is a function almost purely of cooling performance. I would imagine that the servers typically used in Azure stay at the same clock essentially all the time.

The Xbox SoCs are also the only CPUs with an iGPU that are intended for data centre use. It could be that adding more variability was just one headache too many to juggle.

Allandor · Nov 24, 2020

liams said:
I agree, it shouldn't be a problem, but if Microsoft used a SmartShift-style system that increases the performance of the GPU but hurts the CPU, you wouldn't have consistent performance. If, for argument's sake, you have a VM running on two CPU cores and training an AI model on the GPU, the chip would decrease CPU performance to boost GPU performance. This is fine for the VM using 2 CPU cores and all of the GPU, but all the other VMs would be affected as well. I know that this change in performance is analogous to a CPU increasing frequency opportunistically, but that is a function almost purely of cooling performance. I would imagine that the servers typically used in Azure stay at the same clock essentially all the time.

The Xbox SoCs are also the only CPUs with an iGPU that are intended for data centre use. It could be that adding more variability was just one headache too many to juggle.

Well, what they could do is boost frequencies when the GPU (and the power budget of the GPU) allows it. But at the same time, this would not work very consistently, as every SoC might reach different frequencies and some might only reach the specified frequencies.
And I'm really not a fan of variable stuff. Yes, it can help in some respects, but it makes things less reliable/predictable and therefore overcomplicates an already complicated system.

Deleted member 11852 · Nov 24, 2020

liams said:
I agree, it shouldn't be a problem, but if Microsoft used a SmartShift-style system that increases the performance of the GPU but hurts the CPU, you wouldn't have consistent performance. If, for argument's sake, you have a VM running on two CPU cores and training an AI model on the GPU, the chip would decrease CPU performance to boost GPU performance.

The CPU in Series X doesn't seem suitable for this type of virtualised server workload, for the exact reasons you state, but I presume that Microsoft intend to use the APU in other, more predictable code environments where it's running a set task. But it would still be valuable where the APU is part of a wider cluster of APUs all working on bits of a parallelizable problem; it doesn't really matter then if the CPU or GPU is demanding more power - that is kind of the point and advantage.

liams · Nov 24, 2020

DSoup said:
The CPU in Series X doesn't seem suitable for this type of virtualised server workload, for the exact reasons you state, but I presume that Microsoft intend to use the APU in other, more predictable code environments where it's running a set task. But it would still be valuable where the APU is part of a wider cluster of APUs all working on bits of a parallelizable problem; it doesn't really matter then if the CPU or GPU is demanding more power - that is kind of the point and advantage.

True, maybe I'm overthinking the complexity of it.
Microsoft's next big thing is Cloud PC, which there have been some leaks about, and which they literally have a working website for (link below). I wonder if that's their plan: run cloud PCs for remote office workers by day, and Xbox games at night? That would really flatten out the demand curve for the xCloud blades.
Also, when they mentioned that the Series X SoC would be running VMs, I'm pretty sure they also mentioned it would be running serverless tasks like Azure Functions.

https://cloudpc.microsoft.com/#

see colon · Nov 24, 2020

boipucci said:
Continuing the Infinity Cache speculation... the latest DF face off (COD) results could perhaps be explained by IC?
PS5 locks 60fps 95% of the time in 4k/RT mode (dynamic 2160p-1800p) and it chugs in specific set pieces momentarily dropping to 45fps/50fps for a couple of seconds while the XSX mostly retains lock 60
On the 120 mode (1080p-1220p) things flip and PS5 holds a small performance advantage.

There are now two 4k modes where PS5 falls behind XSX: DMC5 & COD.
Using IC as an explanation for these results: at native 4k or close to it there are more cache misses, resulting in lower performance for PS5 when the misses are more abundant, similar to what we are seeing.
Though PS5 doesn't have these odd dips in the 4k mode RT disabled in COD, is possible the use of high resolution in combination with rt is too much for small pool IC incurring drops from the 16ms frame time when those cache misses occur something that optimization and tinkering may improve in the future

I would have assumed PS5's higher fill rate/front end performance would help out in the higher frame rate modes while XSX's higher compute would help it have better performance while ray tracing.

manux · Nov 24, 2020

see colon said:
I would have assumed PS5's higher fill rate/front end performance would help out in the higher frame rate modes while XSX's higher compute would help it have better performance while ray tracing.

Ray tracing memory accesses are tied into tmu's on amd hw. I have no idea what tmu count ps5 has versus xbox. Likely ray tracing especially on the harder diverging ray tracing cases is memory bound.

Deleted member 7537 · Nov 24, 2020

manux said:
Ray tracing memory accesses are tied into tmu's on amd hw. I have no idea what tmu count ps5 has versus xbox. Likely ray tracing especially on the harder diverging ray tracing cases is memory bound.

Because TMUs are part of each Dual CU, 50% more but running at a lower clock. PS5 should have an advantage in performance on anything that is not tied to CU count.

edit: RDNA2 CU

Quadbitnomial · Nov 24, 2020

Some information on Sampler Feedback and Mesh shader on RDNA 2. Other stuff has been posted on the channel also like VRS

boipucci · Nov 24, 2020

3dilettante said:
Dropping to 45-50 from 60 on a console with a CU throughput and bandwidth deficit of similar percentage could just mean that raw bandwidth and CU throughput make more of a difference in those instances. If it's a bandwidth limitation, it wouldn't be out of expectation for bandwidth-limited workloads without any additional cache. That might be an argument for the absence of any additional bandwidth. Even poorer hit rates would pull significant fractions of ~2TB/s of infinity cache bandwidth. Even cutting it down by 50-75% would give enough bandwidth to double the PS5's average bandwidth or at least exceed the Series X.

True but some of those drops (33%) go beyond even the best case raw bandwidth gap (25%) if we assume xsx only uses 10GB total to make the most optimistic use of 560GB/s. Performance gap could be even bigger considering XSX is capped at 60.
Having said that, it's been confirmed that those odd frame drops are due to a bug which can be "solved" by restarting from checkpoint (memory leak?), which makes those previously problematic scenes a locked 60.
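
For reference, the percentages above can be reproduced with a few lines. This is just a sketch of the arithmetic; the 560GB/s, 448GB/s, 60fps and 45fps figures are the ones already quoted in this thread, and the helper names are made up for the example.

```python
def relative_gap(higher, lower):
    """How much larger `higher` is than `lower`, as a fraction of `lower`."""
    return higher / lower - 1.0


def frame_time_ms(fps):
    """Time budget for one frame at a given frame rate."""
    return 1000.0 / fps


bandwidth_gap = relative_gap(560, 448)  # XSX fast pool vs PS5, in GB/s
fps_gap = relative_gap(60, 45)          # locked 60 vs the worst drop to 45

print(f"bandwidth gap: {bandwidth_gap:.0%}")
print(f"frame rate gap: {fps_gap:.0%}")
print(f"frame time at 60fps: {frame_time_ms(60):.1f} ms")
print(f"frame time at 45fps: {frame_time_ms(45):.1f} ms")
```

Note that the 33% figure measures 60fps against 45fps as the baseline; measured the other way round (how far 45 sits below 60), the same drop is 25%.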

Could sporadic 1-5 fps drops at ~4k be explained by hypothetical cache misses? Keep in mind that if PS5 does have IC it would be ~32-50MB, so average bandwidth would be lower than the 6800's 1.4TB/s.

3dilettante said:
It probably shouldn't be. It's on the wrong side of the data fabric, and it's on either side of the command processor section and between the shader engines with Navi.

What I meant to say is that it was grouped together in the 5700 mc/phy area estimates I've seen.

see colon said:
I would have assumed PS5's higher fill rate/front end performance would help out in the higher frame rate modes while XSX's higher compute would help it have better performance while ray tracing.

I didn't mean 120fps in particular, just lower than native 4k

AbsoluteBeginner · Nov 24, 2020

There is no IC in PS5. I am pretty sure if it had it we would know by now. Happy to eat crow if in a few months someone X-rays it and there is 10% of the chip dedicated to on-die memory, but considering what we know about chip size (308mm²) and what Sony has confirmed the chip packs, I would not hold my breath.

Performance can easily be explained by well-known factors such as the full 16GB having 448GB/s, higher pixel fillrate, caches and front end scaling with clocks, tools and dev kits being with devs for almost 2 years, etc.

There is a 20% difference in TF, but PS5 has its own advantages for all things concerning clocks, given that both have 2 SEs. The only question is how much, going forward, we are going to be tilted to one or the other, but the difference should be negligible.
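
A quick sketch of where the TF gap and the clock advantage come from. The CU counts and clocks below are the publicly stated specs (52 CUs at 1.825GHz for XSX, 36 CUs at up to 2.23GHz for PS5) and are not taken from the post above; the PS5 figure is its maximum variable clock, so treat the result as a best case. The 64 lanes per CU and 2 FLOPs per lane per clock (fused multiply-add) are the usual way these peak numbers are derived.

```python
LANES_PER_CU = 64         # FP32 lanes per CU
FLOPS_PER_LANE_CLOCK = 2  # one fused multiply-add counts as 2 FLOPs


def tflops(cu_count, clock_ghz):
    """Peak FP32 throughput in TFLOPS for a given CU count and clock."""
    return cu_count * LANES_PER_CU * FLOPS_PER_LANE_CLOCK * clock_ghz / 1000.0


# Public specs, assumed here; the PS5 clock is its stated maximum.
xsx_tf = tflops(52, 1.825)
ps5_tf = tflops(36, 2.23)

tf_gap = xsx_tf / ps5_tf - 1.0   # XSX advantage in peak compute
clock_gap = 2.23 / 1.825 - 1.0   # PS5 advantage for anything scaling with clock

print(f"XSX {xsx_tf:.2f} TF, PS5 {ps5_tf:.2f} TF, TF gap {tf_gap:.0%}")
print(f"PS5 clock advantage: {clock_gap:.0%}")
```

So the compute gap lands a little under the 20% quoted, while anything that scales purely with clock rather than CU count tilts the other way by a similar margin.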

Globalisateur · Nov 24, 2020

AbsoluteBeginner said:
There is no IC in PS5. I am pretty sure if it had it we would know by now. Happy to eat crow if in a few months someone X-rays it and there is 10% of the chip dedicated to on-die memory, but considering what we know about chip size (308mm²) and what Sony has confirmed the chip packs, I would not hold my breath.

Performance can easily be explained by well-known factors such as the full 16GB having 448GB/s, higher pixel fillrate, caches and front end scaling with clocks, etc.

There is a 20% difference in TF, but PS5 has its own advantages for all things concerning clocks, given that both have 2 SEs. The only question is how much, going forward, we are going to be tilted to one or the other, but the difference should be negligible.

If PS5 has a unified L3 cache (CPU), then the performance gap in some CPU-heavy games could get bigger, as developers will inevitably push the CPUs more. Similarly, XSX could also start having some performance edge in more compute-heavy games.

j^aws · Nov 24, 2020

PSman1700 said:
I think a 64 or 128mb infinity cache certainly would have helped even the PS5.

No doubt, IC is a bandwidth-saving feature. It will eat up die space, and its cost effectiveness alongside the existing cache management on PS5 isn't obvious. For example, stacking two bandwidth-saving features doesn't mean you'll get a linear gain in cost effectiveness.

boipucci said:
This is the part where knowing PS5's & 6800/6900 i/o size would help determine how much if any amount of IC will be possible

You don't need to know that. I've shown you have a 290 sq mm die with no IC, and 15 sq mm free to fit in the aforementioned blocks. Also, you have not commented on any of my Multimedia logic queries.

We've discussed XSX SSD IO at 13 sq mm, and you've added an additional 5 sq mm to make 18 sq mm which you have incorporated into a 333 sq mm die. Removing 43 sq mm gives you 290 sq mm with zero IC.

You need to account for 15 sq mm before seeing how much IC to add. So, still missing is:

- SRAM for IO Complex
- Multimedia logic
- 2MB L2 cache
- Halved Command Processor, Geometry Processor and ACEs

After this, see if you have anything left. As I mentioned in my original post regarding die size, you don't have much wiggle room for a lot of SRAM. I know you are optimistic, but the numbers are in front of you - 15 sq mm still needs accounting for before IC.
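
The accounting above can be laid out as a small budget sketch. The 333, 43 and 15 sq mm figures are the ones from this post. The per-block areas are NOT measurements: they are made-up placeholders so the example runs, and anyone with real estimates should swap them in.

```python
HYPOTHETICAL_DIE_MM2 = 333  # hypothetical die discussed in the post
REMOVED_MM2 = 43            # area removed in the post
FREE_BUDGET_MM2 = 15        # free area the post says still needs accounting for

base_area = HYPOTHETICAL_DIE_MM2 - REMOVED_MM2  # die with zero IC


def area_left_for_ic(free_mm2, block_areas_mm2):
    """Free area remaining once the still-missing blocks are placed."""
    return free_mm2 - sum(block_areas_mm2.values())


# PLACEHOLDER areas, invented for illustration only (not estimates).
placeholder_blocks = {
    "SRAM for IO Complex": 3.0,
    "Multimedia logic": 4.0,
    "2MB L2 cache": 4.0,
    "Halved CP, Geometry Processor and ACEs": 3.0,
}

left = area_left_for_ic(FREE_BUDGET_MM2, placeholder_blocks)
print(f"die with no IC: {base_area} sq mm")
print(f"left for IC with placeholder blocks: {left} sq mm")
```

Whatever numbers go in, the point is the same: the four missing blocks come out of the 15 sq mm first, and only the remainder is available for extra SRAM.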

boipucci said:
Using 5700 i/o (~37mm) as baseline accounts for ~18mm2 that can be used for ps5 io after we halve navi21

We've discussed this already. You've added 5 sq mm to 13 sq mm for XSX SSD IO, making 18 sq mm incorporated in your 333 sq mm hypothetical die.

boipucci said:
Btw isn't L2 included with phy/mc in rdna?

Well, you could ignore it, but I wouldn't be too sure. The Hotchips XSX die shot doesn't clearly show SRAM within the MC block. There is a higher resolution die shot that seems to show cache structures within the MC block.

thicc_gaf said:
Well that's true to a large extent, it's practically what the Cache Coherency Engines are for; pin-point eviction of stale data in the GPU caches. I'm still curious if it's demand scrubbing or patrol scrubbing; @3dilettante had a pretty detailed post in one of these threads that went into cache scrubbing and the benefits (and drawbacks).

I'm also curious if these also affect the CPUs cache, as Cerny's presentation only block highlighted GPU caches.

thicc_gaf said:
Just going off the little I know I'd assume Sony are using demand scrubbing because patrol scrubbing waits on idle periods, but I don't see why you'd want to have intentional idle periods on a gaming device where you want components being busy with work at all times.

You won't see 100% utilisation in a gaming device, so idle time is expected. PS5 does have variable clocks to adjust for any idling and conserve power if required.

thicc_gaf said:
Could they have maybe used PS-RAM instead? Dunno how good the bandwidth and latency is on top-end PS-RAM off-die chips, but I've seen nice enough sizes (32 MB, 64 MB etc.) go for relatively pretty cheap prices on bulk through wholesalers, and those are small bulks compared to what a console would be buying quantity-wise.

Gamecube used 1T-SRAM, so there are other options.

thicc_gaf said:
Yeah, if it ain't broke don't fix it. That said, it's just a bit odd they couldn't be asked to wait for AMD on IC, if they have already apparently waited on them for so many other things for full RDNA2 compatibility. Though that could maybe raise the question of what "full" is in the first place, but that's a different conversation. The thing is, they did kind of go wider with Series X, but it's actually still narrower than One X (384-bit).

Cache is meant to be transparent in hardware, and 'full RDNA2' is just a marketing term. XSX is more a fulfilment of DX12 Ultimate.

thicc_gaf said:
This should also (thankfully, hopefully) indicate that the frontend has been improved greatly for RDNA2, because that'd be an absolute requirement to make such a strategy even work (in addition to other improvements too, of course).

I'm not convinced the XSX frontend is to RDNA2 specification. You can check the Rasteriser Units in the Hotchips block diagram: there are 4, and each Raster Unit is at the Shader Array level. Compare that to Navi21, where the Raster Units are at the Shader Engine level.

Also, check Navi21 Lite in the driver leaks (aka the XSX GPU): its driver entries show no change from RDNA1 in SIMD waves (CUs), nor in the Scan Converter/Packer arrangements.

thicc_gaf said:
Yeah, like deferred rendering. It's crazy that Dreamcast was one of the first (probably the first) gaming system with it at the hardware level, but it's not seemingly become a thing baked into consoles again until PS5 and Series, mainly because of AMD's work nowadays. At least, that's my understanding of it. Not that AMD have a literal deferred rendering feature, just something analogous to it.

FYI, PowerVR existed on PC before Dreamcast. Smartphones also use (or used) TBDR, and a handheld console, the PS Vita, also used TBDR with a PowerVR GPU.

Interestingly, Mark Cerny was involved with the Vita, and there are hints that PS5 has analogies. I haven't found anything concrete yet pointing to a TBDR architecture.

boipucci · Nov 24, 2020

AbsoluteBeginner said:
There is no IC in PS5. I am pretty sure if it had it we would know by now. Happy to eat crow if in a few months someone x-rays it and there is 10% of the chip dedicated to on-die memory, but considering what we know about chip size (308mm²) and what Sony has confirmed the chip packs, I would not hold my breath.

Performance can easily be explained by well-known factors such as the full 16GB having 448GB/s, higher pixel fillrate, caches and front end scaling with clocks, tools and dev kits being with devs for almost 2 years etc.

There is a 20% difference in TF, but PS5 has its own advantages for all things concerning clocks, given that both have 2 SEs. The only question is how much going forward we are going to be tilted to one or the other, but the difference should be negligible.
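
For reference, the headline figures in the post above can be roughly reproduced with some quick arithmetic. This is only a sketch: the CU counts, clocks, ROP counts and bus configuration below are assumptions taken from publicly quoted launch specs rather than from this thread, and the PS5 clock is the top of a variable range.

```python
# Rough peak-throughput arithmetic. All spec values are assumptions taken
# from publicly quoted launch specs, not from this thread.
SPECS = {
    "PS5": {"cus": 36, "clock_ghz": 2.23, "rops": 64},   # clock is a variable-frequency peak
    "XSX": {"cus": 52, "clock_ghz": 1.825, "rops": 64},
}


def tflops(cus, clock_ghz):
    """Peak FP32 TFLOPS: 64 lanes per CU, 2 FLOPs per lane per clock (FMA)."""
    return cus * 64 * 2 * clock_ghz / 1000.0


def pixel_fill_gpix(rops, clock_ghz):
    """Peak pixel fillrate in Gpixels/s: one pixel per ROP per clock."""
    return rops * clock_ghz


def bandwidth_gbs(bus_bits, gbps_per_pin):
    """Peak memory bandwidth in GB/s for a given bus width and pin speed."""
    return bus_bits * gbps_per_pin / 8.0


if __name__ == "__main__":
    for name, s in SPECS.items():
        print(name,
              round(tflops(s["cus"], s["clock_ghz"]), 2), "TF,",
              round(pixel_fill_gpix(s["rops"], s["clock_ghz"]), 1), "Gpix/s")
    # Assumed PS5 memory setup: 256-bit bus with 14 Gbps GDDR6
    print("PS5 bandwidth:", bandwidth_gbs(256, 14), "GB/s")
```

With these inputs the XSX comes out roughly 18% ahead on TF, while PS5 comes out roughly 22% ahead on peak pixel fillrate, which is the trade-off being described.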

You could well be right. I guess I'm not closing off the possibility until we get indisputable proof.
Answer me this though (and this goes for you too @j^aws ): let's say PS5 doesn't have IC. Why do you think they wouldn't invest an extra ~22mm2 for 32MB of IC? Perhaps it's too small an amount to make a performance difference that justifies the cost?
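
To put rough numbers on that question, here is a quick sketch using only figures quoted in this thread: ~22mm2 for 32MB of IC, and the 15 sq mm that j^aws says is left over. It assumes cache area scales linearly with capacity, which is a simplification, and it ignores the other blocks that still need to fit in that space.

```python
# Figures quoted in this thread; the linear area-per-MB scaling is an assumption.
IC_AREA_MM2 = 22.0     # estimated area for 32 MB of IC
IC_SIZE_MB = 32.0
FREE_AREA_MM2 = 15.0   # area said to be left over before any IC is added


def area_per_mb(area_mm2, size_mb):
    """Implied cache density in mm2 per MB."""
    return area_mm2 / size_mb


def max_cache_mb(free_area_mm2, mm2_per_mb):
    """Largest cache that fits if the free area held nothing else."""
    return free_area_mm2 / mm2_per_mb


def shortfall_mm2(needed_mm2, free_mm2):
    """Area missing to fit the requested block (0 if it fits)."""
    return max(0.0, needed_mm2 - free_mm2)


if __name__ == "__main__":
    density = area_per_mb(IC_AREA_MM2, IC_SIZE_MB)
    print("mm2 per MB:", density)
    print("max MB in free area:", round(max_cache_mb(FREE_AREA_MM2, density), 1))
    print("shortfall for 32 MB:", shortfall_mm2(IC_AREA_MM2, FREE_AREA_MM2), "mm2")
```

On these numbers a 32MB block alone overshoots the free area by about 7mm2, and just under 22MB would be the ceiling even if nothing else needed the space.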

j^aws said:
You don't need to know that. I've shown you have a 290 sq mm die with no IC, and 15 sq mm free to fit in the aforementioned blocks.

Yes, we do, because IO is included within that 260mm2 (half of Navi21). It would be 18mm2 if it remains the same size as on the 5700, but could be more (or less) depending on Navi21's actual IO size.
And again, we don't know the PS5 IO block size to determine whether it needs <18mm2 or >18mm2 (got any guesstimates?)

j^aws said:
Also, you have not commented on any of my Multimedia logic queries as well.

It's unknown; it could be off-die like you said.

j^aws said:
You need to account for 15 sq mm before seeing how much IC to add. So, still missing is:

- SRAM for IO Complex
- Multimedia logic
- 2MB L2 cache
- Halved Command Processor, Geometry Processor and ACEs

I don't know enough about the SRAM to make an estimate, but I'd guess it's a tiny amount, or maybe it's acting as Sony's take on IC?
I could be wrong, but don't the Command Processor, Geometry Processor and ACEs scale with SEs? Wouldn't they be using a beefier version with 4 SEs?

j^aws said:
Well, you could ignore it, but I wouldn't be too sure. The Hotchips XSX die shot doesn't clearly show SRAM within the MC block.

I mean that L2 is included in the ~30mm2 estimate based on the 5700, meaning it's bigger than just MC/PHY. The layout is different in RDNA2, true.
I hope you're not getting tired/annoyed with me. I realize how frustrating these types of discussions can get, and I'm probably wrong anyway; it's a loop of mental gymnastics without being 100% certain or having hard facts.
