PlayStation 5 [PS5] [Release: November 12, 2020]

In other news:
I'm not really surprised Sony is crushing it; they have a ton of momentum from the PS4. I do find it curious how they come up with these numbers, though. Did Sony have 80% more PS5s available for pre-order? I know the XBSX/XBSS are completely sold out everywhere as well, because I've been trying to order one on a daily basis. Is this based on polling?
I'm fairly positive that Sony is selling pre-orders well beyond launch date and I believe MS is trying to reserve as much as possible for launch day.

So I don't know if you can buy a PS5 on launch day if you don't have a pre-order. But you can definitely have a chance for Xbox Series. I was not guaranteed a launch day order for PS5, and I'm second block. I can only assume any orders taken after mine are progressively less likely to make launch day.
 
I'm fairly positive that Sony is selling pre-orders well beyond launch date and I believe MS is trying to reserve as much as possible for launch day.

So I don't know if you can buy a PS5 on launch day if you don't have a pre-order. But you can definitely have a chance for Xbox Series. I was not guaranteed a launch day order, and I'm second block.
Oh, OK that makes sense. Thanks for the info.
 
Copper has a higher thermal conductivity than Aluminum, but Aluminum has a higher heat capacity than Copper.
I meant higher than steel.
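For rough reference, approximate room-temperature handbook values (ballpark figures from memory; steel varies a lot by alloy):
Code:
# Approximate room-temperature material properties (ballpark handbook values).
#            thermal conductivity (W/m*K), specific heat (J/g*K)
materials = {
    "copper":   (400, 0.385),
    "aluminum": (237, 0.897),
    "steel":    (50,  0.49),   # plain carbon steel; stainless is much lower
}
for name, (k, c) in materials.items():
    print(f"{name:9s} k ~ {k:>3} W/(m*K), c ~ {c} J/(g*K)")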
Forgive me, but I don't understand this obsession with "17%". Even using (as presented above) a fixed 10.3 TF for PS5, and saying MS were lying about 12.155 TF and that it's actually 12.147, XSX is still 18% faster in pure compute.

Anyway, while I think you're right about CPU BW requirements being broadly similar for both consoles playing the same games, things get a bit more complicated in that the same CPU traffic is proportionally a greater consumer of BW on PS5, and that XSX has 25% more memory channels to schedule across and (probably) 25% more GPU L2 to potentially reduce pressure on the memory bus. Perhaps this won't amount to much in reality though.
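As a rough back-of-the-envelope sketch of what "proportionally greater" means here (the 40 GB/s CPU traffic figure is purely an assumed placeholder, not a measured number):
Code:
# Same absolute CPU traffic is a bigger slice of PS5's bus than XSX's fast pool.
cpu_traffic = 40.0   # GB/s, hypothetical placeholder
ps5_peak    = 448.0  # GB/s (256-bit GDDR6 @ 14 Gbps)
xsx_peak    = 560.0  # GB/s (10 GB "GPU-optimal" pool)

print(f"PS5: {cpu_traffic / ps5_peak:.1%} of peak bandwidth")  # ~8.9%
print(f"XSX: {cpu_traffic / xsx_peak:.1%} of peak bandwidth")  # ~7.1%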

It all seems rather complicated. Some of the shit-tier internet rumour mongers have claimed Sony invented the idea of larger caches and that they've used them in PS5, and that AMD didn't realise this worked and that they will copy this for RDNA3. That seems unlikely, but I guess there's still the possibility of some Sony Secret Sauce (SSS).
Don't forget the specific memory configuration on XSX. When the CPU is used (it uses the slower pool of memory), it will reduce the total available bandwidth, on top of the regular memory contention. And actually XSX has less L2 cache per CU, which means overall there will be more L2 misses on XSX (good thing it has more bandwidth).

In the end, only the games will show us the outcome of those constraints.
 
And actually XSX has less L2 cache per CU, which means overall there will be more L2 misses on XSX (good thing it has more bandwidth).
I think you mean L1, since that's what is shared among the CUs in a shader array.
L2 is matched to bus width on RDNA. So there's going to be much more L2 available. L2 is 5MB on XSX.
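If L2 really does scale with the memory controllers on RDNA (roughly 1 MB per 64-bit channel, as on Navi 10's 256-bit / 4 MB setup), the arithmetic would look like this; the PS5 figure is an assumption, not a confirmed spec:
Code:
# Assumed scaling: ~1 MB of L2 per 64-bit GDDR6 channel (Navi 10: 256-bit -> 4 MB).
def l2_from_bus(bus_width_bits, mb_per_channel=1.0):
    return (bus_width_bits // 64) * mb_per_channel

print(l2_from_bus(320))  # XSX, 320-bit bus -> 5.0 MB (matches the stated 5 MB)
print(l2_from_bus(256))  # PS5, 256-bit bus -> 4.0 MB (unconfirmed)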
 
For those of you wondering how effective AVX instructions can be, here's a particular test on my workstation with 2x P100s.
This isn't efficient, btw, but I'm expecting the compiler not to be complete shit. Nor am I writing pure CUDA; I did not specify blocks etc., so I'm not sure what the compiler is doing. I should be messing with blocks and threads, but I'll do that later (see the sketch after the launch calls below).

Code:
import math
import numpy as np
from numba import cuda, jit

# Numba-compiled version: nopython mode lets LLVM auto-vectorise the loop (AVX).
@jit(nopython=True)
def sqdiff(x, y):
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = np.cos((x[i] - y[i]) ** 2)
    return out

# Plain Python baseline: same loop, no compilation.
def sqdiff_n(x, y):
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = np.cos((x[i] - y[i]) ** 2)
    return out

# CUDA kernel: one GPU thread per element.
@cuda.jit
def sqdiff_c(x, y, out):
    i = cuda.grid(1)
    if i < x.size:
        res = (x[i] - y[i]) ** 2
        out[i] = math.cos(res)
So I'm just going to subtract y from x, square it, then take the cosine of it all.
I make it do two runs:
a 32-bit array
and a 64-bit array

sqdiff_c.forall(x32.shape[0])(x32,y32, out)
sqdiff_c.forall(x64.shape[0])(x64,y64, out2)
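For comparison, an explicit launch configuration would look something like this (the 256-thread block size is just an assumed starting point, not what forall chooses):
Code:
# Hypothetical explicit launch: pick a block size and derive the grid size from it.
threads_per_block = 256  # assumed; worth tuning per GPU
blocks_per_grid = (x32.shape[0] + threads_per_block - 1) // threads_per_block
sqdiff_c[blocks_per_grid, threads_per_block](x32, y32, out)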

The total time it takes to run both is below, in seconds:
When the array is only 150 elements long
---SIMD----
0.3253677561879158
---NORM ST----
0.0009309761226177216
----CUDA----
0.3576290234923363

@150000
---SIMD----
0.27464934438467026
---NORM ST----
0.871973555535078
----CUDA----
0.2349286787211895

@ 1500000
---SIMD----
0.24070119112730026
---NORM ST----
8.221116621047258
----CUDA----
0.29617665335536003

@150000000
---SIMD----
4.314507059752941
---NORM ST----
(not run; see below)
----CUDA----
3.303687706589699

Normal single threaded would take 800+ seconds, and crash the workstation, so I didn't bother.
You can see the SIMD AVX load is doing pretty well here against CUDA. I may go back to revise this, as I suspect it's not using the hardware properly on CUDA. But you can see how dramatically better AVX is than single-core math once you scale up.
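For context, here's a minimal sketch of how timings like these could be collected (the array setup and the use of perf_counter are my assumptions, not necessarily how the numbers above were produced; it assumes the definitions and imports from the code above):
Code:
import time
import numpy as np

n = 150_000
x32 = np.random.rand(n).astype(np.float32)
y32 = np.random.rand(n).astype(np.float32)
out = np.empty_like(x32)

start = time.perf_counter()
sqdiff_c.forall(x32.shape[0])(x32, y32, out)  # CUDA run (32-bit)
cuda.synchronize()                            # wait for the GPU to finish
print(time.perf_counter() - start)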
 
I think you mean L1, since that's what is shared among the CUs in a shader array.
L2 is matched to bus width on RDNA. So there's going to be much more L2 available. L2 is 5MB on XSX.
It doesn't work like that. XSX has 44% more CUs. So (with 4 MB of L2 cache) PS5 will have about 16% more L2 per CU (and those caches are clocked 22% higher). This is mostly what will determine the cache pressure and the L2 miss rate. More L2 misses will mean more pressure on GDDR6 bandwidth.
 
It doesn't work like that. XSX has 44% more CUs. So (with 4 MB of L2 cache) PS5 will have about 16% more L2 per CU (and those caches are clocked 22% higher). This is mostly what will determine the cache pressure and the L2 miss rate. More L2 misses will mean more pressure on GDDR6 bandwidth.
Cache size is more important than the clock speed of those caches, because the cost of going to GDDR6 is much larger than the clock-speed difference.
I would say having a larger L2 is going to be more critical to feeding the CUs. The 25% increase in cache size directly follows the increase in CUs at 22%. Clock speed shouldn't play a factor here.

edit: sorry, 44%; more L2 would have been ideal. But once again, more is more. RDNA writes to L2, so any time you don't need to travel out to system memory, the CUs can stay active.
 
Cache size is more important than the clock speed of those caches, because the cost of going to GDDR6 is much larger than the clock-speed difference.
I would say having a larger L2 is going to be more critical to feeding the CUs. The 25% increase in cache size directly follows the increase in CUs at 22%. Clock speed shouldn't play a factor here.
You are not paying attention here. Each CU needs access to the L1 and L2 cache to work. On PS5, each CU has about 0.111 MB of L2 cache. On XSX, each CU has about 0.096 MB of L2 to work with. So if you feed those CUs the same amount of things to do (ideally), the XSX CUs will have less L2 cache, so there will be more L2 misses, hence more accesses to GDDR6 memory to do the same work.

Again, XSX has 44% more CUs and only 25% more L2.
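Working through the arithmetic behind those per-CU figures (the 4 MB for PS5's L2 is an assumption from earlier in the thread, not a confirmed spec):
Code:
# Per-CU L2, assuming 4 MB L2 / 36 CUs for PS5 (unconfirmed) and 5 MB / 52 CUs for XSX.
ps5_l2_per_cu = 4.0 / 36   # ~0.111 MB per CU
xsx_l2_per_cu = 5.0 / 52   # ~0.096 MB per CU
print(ps5_l2_per_cu / xsx_l2_per_cu)   # ~1.16 -> about 16% more L2 per CU on PS5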
 
Worth keeping in mind that the ROPs also go through the L2 now.

The implications might be more for latency (going through stages of cache) and power consumption, although there may be more threads in flight to hide latency with more CUs working.
 
I'm fairly positive that Sony is selling pre-orders well beyond launch date and I believe MS is trying to reserve as much as possible for launch day.

So I don't know if you can buy a PS5 on launch day if you don't have a pre-order. But you can definitely have a chance for Xbox Series. I was not guaranteed a launch day order for PS5, and I'm second block. I can only assume any orders taken after mine are progressively less likely to make launch day.

Anecdotally, it would have been easy for me to get a preorder in for Xbox. For PS5 I have been unable to get a preorder in despite trying hard. I don't need an Xbox, as I want to have a PC + Sony console combo.
 
You are not paying attention here. Each CU needs access to the L1 and L2 cache to work. On PS5, each CU has about 0.111 MB of L2 cache. On XSX, each CU has about 0.096 MB of L2 to work with. So if you feed those CUs the same amount of things to do (ideally), the XSX CUs will have less L2 cache, so there will be more L2 misses, hence more accesses to GDDR6 memory to do the same work.
There are significantly more CUs operating simultaneously as a result however.
You're still looking at direct comparisons between the two systems, which is probably why you're hung up on the numbers there.
You should be looking at how much work they need to process, which is what the architectures are built around.

You can easily showcase a CPU, per instruction, per branch, per core, per L1, per L2, per system memory access, to be significantly better than any GPU.
But the point stands that there are at most 32 of those versus _thousands_ of processors on the GPU.

The GPU has significant overhead to get started, but once it gets started, its ability to consume massive amounts of work, as a result of having a massive number of processors, is what allows it to bolt ahead.

We can point at individual metrics all day long, but at the end of the day there are 44% more processors in the XSX. Hyper-tuning hardware helps, and it helps that PS5 is fairly general-purpose, flexible, and adaptable because of its clock-speed setup. But at a certain scale of load, having more compute is going to matter more than all the hyper-tuning you can do.

Smaller loads will definitely benefit the higher-performance processors more. That's not being debated. But each of these metrics conveniently leaves out how many more processors are working at the same time.

tl;dr: the 2080 probably has similarly more ideal characteristics with cache and memory bandwidth vs the 3080, and in some workloads, yes, the 3080 is poorly leveraged against the 2080. But in others, which are the workloads we are heading into, it's 2x the performance.

If cache was the greatest bottleneck to getting more compute we would have made caches the focus. But each generation, the compute gets larger and wider, the caches increase to support it and the clockspeeds generally improve only slightly.

I don't mind looking at these things, but I don't want to over-attribute something that every architecture will suffer from. It's not like PS5 solved cache hit/miss problems by clocking high and keeping processor counts low. We need more compute because we need more calculations going into next generation.
 
Talking about caches etc., we don't actually know how much cache is in the PS5, and remember it's got those GPU cache scrubbers, so I imagine those help with available cache. For all we know, PS5 could be utilising that leaked AMD Infinity Cache.
 
My understanding is that the cache scrubbers are there to avoid a complete flushing of the caches.

"Coherency comes up in a lot of places, probably the biggest coherency issue is stale data in the GPU caches," explains Cerny in his presentation. "Flushing all the GPU caches whenever the SSD is read is an unattractive option - it could really hurt the GPU performance - so we've implemented a gentler way of doing things, where the coherency engines inform the GPU of the overwritten address ranges and custom scrubbers in several dozen GPU caches do pinpoint evictions of just those address ranges."
 
My understanding is that the cache scrubbers are there to avoid a complete flushing of the caches.

Which in turn would result in more efficient usage of the caches. So even if it has less available cache, the more efficient usage of the caches would give you more cache to use at one time, no? Or am I missing something here?
 
Which in turn would result in more efficient usage of the caches. So even if it has less available cache, the more efficient usage of the caches would give you more cache to use at one time, no? Or am I missing something here?

You can get the same effect by using the CPU to invalidate cache lines. Using the CPU would mean spending an expensive CPU resource on something that could have been done more optimally. This really only matters if there is a ton of data being moved around all the time. If it's just a level load to RAM and off you go, the CPU would be a fine choice for invalidating cache lines.
 
I'm fairly positive that Sony is selling pre-orders well beyond launch date and I believe MS is trying to reserve as much as possible for launch day.
Why is this an either/or statement? Why wouldn't Sony be trying to reserve as many PS5 consoles as possible for launch day?
I don't think Sony is in the business of keeping warehouses filled with brand new PS5 consoles just because.


Again, XSX has 44% more CUs and only 25% more L2.
Do we have confirmed L2 amounts for the PS5?

Which in turn would result in more efficient usage of the caches. So even if it has less available cache, the more efficient usage of the caches would give you more cache to use at one time, no? Or am I missing something here?
Cache scrubbers should be a means to require fewer movements to the GDDR6 by leaving more cache available, resulting in fewer cache misses.
Whether or not these are effective is something we don't know. We do know AMD decided not to use them for RDNA2 PC graphics cards.
 