Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

No, that slide recaps all the features talked about publicly. Just as much noise was made about SSD, RT, and 3D Audio.

If that were the case, they would have actually shown the effects, like with the Spider-Man trailer demonstrating their load-time feature. We wouldn't be sitting here discussing what kind of RT solution Sony uses, and whether it's the same as or different from AMD's, if Sony had actually said something concrete.

To me "PS5 has RT support" has the same quality as "PS5 has HDMI 2.1 support". A box full of surprises.
 
You deny the existence of 3D audio as well? That too wasn't demonstrated and only talked about. Only one aspect was showcased, which was the SSD loading speed. Everything else was only talked about. BC hasn't been showcased either. If you think Sony have only talked about SSD and BC, you've just missed some of the conversation. ;)
 
They haven't even shown or talked about the case. Are we sure they even have one? We know they have a logo because they showcased that.
Apparently only intelligent people can see it. If you haven't seen it...well...but I have. It's amazing looking. Really...er...good looking console, 'coz I'm smart.
 
The slide is from an AMD presentation:

https://www.tomshardware.com/news/a...noa-architecture-microarchitecture,40561.html

Btw, does the shared cache automatically mean an 8-core CCX, or are there other factors that make up the cluster?

Cheers, that seems like an official enough statement - thanks!
I had seen that image before, but not as part of the rest of that article, so I wasn't aware it was official.

Whilst I'm not a CPU arch expert, it would *in theory* be possible to make a 2 x 4-core CCX arrangement with a single shared L3,
but it wouldn't make much sense. So I'm pretty sure that a single shared L3 also means an 8-core CCX.

Honestly it ends up being a discussion about what makes up a CCX, and whether you consider the L3 part of it or not.
IMHO going to an 8-core CCX arrangement makes a lot of sense: a small increase in CCX complexity (although some of it in the hot path)
in exchange for the single-threaded case being improved by access to a larger shared cache, fewer levels of inter-core distance,
and less inter-CCX traffic on the IF bus too.
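To put a rough number on the "fewer levels of inter-core distance" point, here's a quick Python sketch that just counts pair topology; nothing Zen-specific, and the CCX sizes are simply the two layouts being discussed:

```python
from itertools import combinations

def cross_ccx_pairs(ccx_sizes):
    """Count core pairs whose traffic has to leave its CCX (i.e. cross the IF)."""
    # Tag every core with its CCX index, then count pairs in different CCXs.
    cores = [(ccx, i) for ccx, n in enumerate(ccx_sizes) for i in range(n)]
    return sum(1 for a, b in combinations(cores, 2) if a[0] != b[0])

print(cross_ccx_pairs([4, 4]))  # 16 of the 28 core pairs must cross CCXs
print(cross_ccx_pairs([8]))     # 0 -- all traffic stays behind the shared L3
```

So going 2x4 -> 1x8 removes the slow path for over half of all core pairs, which is the whole appeal.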
 
How can you not consider the L3? It's the L3 that connects the cores within a CCX together. Or am I wrong?

I thought cores within a CCX pass data to each other through the L3, while data is passed to other CCXs over the Infinity Fabric.

If 8 cores share an L3, each with its own L3 slice, how would that not be a CCX?
 

Sorry, perhaps I wasn't clear enough.
I totally agree with everything you said; considering the L3 as part of the CCX makes sense.

I was just allowing for the possibility of some crazy setup, unrelated to Zen, whereby you might have a CCX-like configuration that isn't determined by the L3.
E.g. if the L3 were a pure victim cache, and the memory controller sat between the L2 and L3, you could have a huge combined L3, which could in theory be faster than DRAM in the case of an L3 hit,
but it would be pseudo-independent of the CCX size, as multiple CCXs could possibly share a single L3 victim cache...

But I think we are getting waaaaay off topic here :)

A more useful/valuable discussion would be to theorize on the associativity and latency of the new combined 32MB L3.
Currently the 16MB L3 is 16-way at ~40 cycles; what cost does moving to a 32MB L3 come at? 32-way at ~50 cycles? 50 cycles is a looong wait...

But again, this is not related at all to next-gen consoles... maybe I'll float it in the Zen3 thread
 

How much inter-CCX traffic is typically going on? Even when the CCXs are on the same die, the traffic between the two still has to be routed through the IF and the IO die.

A single 8 core CCX on a die eliminates that round trip.
 
Consoles should at least have the benefit of having both CCXs, the IF traffic and the memory controller all on the same die.

That should, in theory, allow for reduced latencies and potentially a small gain in throughput.

I'm looking forward to seeing the 4xxx series APUs getting examined by the likes of Anandtech, especially if the consoles are going to have similarly reduced L3 caches.
 
Have we discussed the possible BOM of the two different design approaches?

That is, "narrow and fast" vs. "wide and slow", if the performance is very close.


For example, given the Xbox has 60~64 CUs with 56 CUs enabled @ 1.67GHz (12TF), some analysts already predict a $460~520 BOM.

Since some reports suggest the cooling solution may be relatively cheap, if Sony chooses something like

48 CUs with 44 CUs active @ 1.96GHz (11TF) with a better cooling solution, do we have any idea how big the BOM difference would be?
$20? Or $50?

What would be the actual performance difference between an 11TF high-frequency GPU and a 12TF lower-frequency GPU?
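The TF figures quoted above follow directly from the standard FP32 formula for GCN/RDNA-style GPUs (64 shader lanes per CU, 2 ops per clock for a fused multiply-add):

```python
def tflops(cus, ghz):
    """FP32 TFLOPS = CUs * 64 lanes * 2 ops per FMA * clock in GHz / 1000."""
    return cus * 64 * 2 * ghz / 1000

wide_slow   = tflops(56, 1.67)  # ~11.97 TF
narrow_fast = tflops(44, 1.96)  # ~11.04 TF
print(wide_slow, narrow_fast)
```

So the two hypothetical configs land within about 8% of each other on raw ALU throughput, which is what makes the BOM question interesting.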
 
Generally more clock speed is easier to scale for, as the whole pipeline speeds up together; whatever bottlenecks you had before carry over unchanged, making them easier to mitigate.
With wide and slow, you really have to watch where new bottlenecks could show up.
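To illustrate the point: only ALU throughput scales with CU count, while fixed-function rates (ROP fill, primitive rate) scale with clock alone, so the "narrow and fast" config can actually lead on those despite the lower TF number. The ROP count and prims/clock below are assumed values for illustration, not known specs:

```python
def gpu_rates(cus, ghz, rops=64, prims_per_clock=4):
    """Rough per-second rates; rops and prims_per_clock are assumptions."""
    return {
        "fp32_tflops":   cus * 64 * 2 * ghz / 1000,  # scales with CUs AND clock
        "gpixels_per_s": rops * ghz,                 # scales with clock only
        "gprims_per_s":  prims_per_clock * ghz,      # scales with clock only
    }

wide_slow   = gpu_rates(56, 1.67)  # ~12.0 TF, ~106.9 Gpix/s
narrow_fast = gpu_rates(44, 1.96)  # ~11.0 TF, ~125.4 Gpix/s
```

In other words: the wide chip wins on compute, the fast chip wins on fill and geometry, assuming those fixed-function blocks aren't also widened.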
 
Power-wise, wide and slow should scale better though?
 
However, if the cooling solution really costs only a few dollars (or is at least relatively cheap), then power consumption may not be a major BOM concern, and the "narrow and fast" console may achieve a lower BOM with roughly the same performance.
 

ND once described the access latency across the Jaguar L2 clusters on the PS4 as almost as bad as accessing DRAM. A single 8-core CCX may be the better option regardless of the increase in L3 latency.
 