Is PS4 hampered by its memory system?

I think by giving CPUs a fixed budget, no matter what solution is picked, it should be fairly easy to avoid contention ...
 
Bear in mind the CPU is far more sensitive to latency than the GPU, so any contention is going to have a bigger impact. It's the effect on the CPU that I'm interested in.


I remember there was a discussion here about DDR3 vs GDDR5, and the general consensus and data showed that there is no significant difference in latency. IIRC GDDR5 has relatively higher latency in clock cycles, but it also runs at much higher clocks, in the 5.5~7 GHz (effective) range, while current DDR3 runs in the 1.6~2.3 GHz range. If you weigh the higher cycle latencies against the higher clocks, they cancel each other out. If you measure the actual time, I think they are generally the same.

Here's the info.

http://www.hynix.com/datasheet/pdf/dram/H5TQ1G4(8_6)3AFP(Rev0.1).pdf
http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf

GDDR5 timings as provided by the Hynix datasheet:
CAS = 10.6 ns
tRCD = 12 ns
tRP = 12 ns
tRAS = 28 ns
tRC = 40 ns

DDR3 timings for Corsair 2133 @ 11-11-11-28:
CAS = 10.3 ns
tRCD = 10.3 ns
tRP = 10.3 ns
tRAS = 26.2 ns
tRC = 36.5 ns

So latency-wise you probably shouldn't see a disadvantage from GDDR5; they're in the same ballpark.
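For anyone who wants to sanity-check those figures, this is roughly how the DDR3 nanosecond numbers fall out of the cycle counts. A quick sketch only: the inputs are the 11-11-11-28 timings and DDR3-2133's ~1066 MHz command clock; the GDDR5 values are quoted in nanoseconds by the datasheet, so they aren't recomputed.

```python
# Convert DDR3 cycle timings to nanoseconds.
# DDR3-2133 transfers data at 2133 MT/s, but commands run on the
# 1066.5 MHz command clock, so one command cycle is ~0.94 ns.

command_clock_mhz = 2133 / 2            # 1066.5 MHz
cycle_ns = 1000 / command_clock_mhz     # ~0.94 ns per cycle

timings_cycles = {"CAS": 11, "tRCD": 11, "tRP": 11, "tRAS": 28}
for name, cycles in timings_cycles.items():
    print(f"{name}: {cycles} cycles = {cycles * cycle_ns:.1f} ns")

# Prints CAS/tRCD/tRP ~= 10.3 ns and tRAS ~= 26.2 ns, matching the
# Corsair figures above and landing in the same ballpark as the GDDR5 part.
```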
 
I think by giving CPUs a fixed budget, no matter what solution is picked, it should be fairly easy to avoid contention ...

I don't think the split achieves that; I think it stops the GPU monopolising all the bandwidth. There are two measures getting confused here: one is bandwidth, the other is the latency of any given request. Each cycle the memory bus can only do one thing: service one client. Switching between clients has a latency cost, and I'm not sure there is a limit on that cost (e.g. 150 vs 20). I could be wrong...
 
All of this applies equally to the contention between the 8 CPU cores. You have 9 entities trying to access the memory at the same time, but 8 of them have priority over the GPU. I think contention between cores should logically dwarf any impact from the GPU, and it still doesn't seem to be an issue. The memory arbitration is probably designed to deal with this.
 
Code is more complex or dataset smaller.

I said something different: I said to group threads that access similar data in the same cluster, in order to maximize the potential benefit of the shared L2.
The dev should, on the other hand, optimize the code at the optimization stage to keep a coherent access pattern (easier cache reuse among threads in the same cluster?).

It might actually reduce the average number of RAM accesses from the CPUs, due to better local data coherency on average.
 
All of this applies equally to the contention between the 8 CPU cores. You have 9 entities trying to access the memory at the same time, but 8 of them have priority over the GPU. I think contention between cores should logically dwarf any impact from the GPU, and it still doesn't seem to be an issue. The memory arbitration is probably designed to deal with this.

So are all the GPU cores' memory requests funnelled through one aggregator? It makes sense to do that, but it's going to be very greedy... and there is still the problem I suggested in the OP: when the memory controller has to arbitrate a conflict, does it interrupt the in-flight GPU request, or does it make the CPU wait? Option one has to lead to increased contention; option two leads to increased latency for the CPU.
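Here's a toy Monte Carlo sketch of that exact trade-off. Everything in it is invented for illustration (burst length, switch penalty, CPU transfer size); it isn't how the PS4's controller actually works, it just shows why neither policy is free.

```python
import random

# Toy model: one CPU read arrives while the GPU has a burst in flight.
# The question from the post: interrupt the GPU burst, or make the CPU wait?
# All cycle counts below are made up for illustration, not real PS4 figures.

BURST = 8        # cycles in a GPU burst
SWITCH = 6       # cycles lost turning the bus around for a different client
CPU_XFER = 4     # cycles to service the CPU read itself
TRIALS = 100_000

wait_lat = preempt_lat = wait_waste = preempt_waste = 0.0

for _ in range(TRIALS):
    done = random.randrange(BURST)   # cycles of the GPU burst already completed
    remaining = BURST - done

    # Policy A: the CPU waits for the burst to finish.
    wait_lat += remaining + SWITCH + CPU_XFER
    wait_waste += SWITCH                  # only the turnaround is lost

    # Policy B: the CPU preempts; the partial burst is discarded and retried.
    preempt_lat += SWITCH + CPU_XFER
    preempt_waste += SWITCH + done        # turnaround plus the discarded work

print(f"wait:    CPU latency ~{wait_lat / TRIALS:.1f} cycles, "
      f"wasted bus cycles ~{wait_waste / TRIALS:.1f}")
print(f"preempt: CPU latency ~{preempt_lat / TRIALS:.1f} cycles, "
      f"wasted bus cycles ~{preempt_waste / TRIALS:.1f}")
```

With these made-up numbers, preempting (the post's option one) cuts the CPU's wait but wastes more bus cycles, while making the CPU wait (option two) keeps the bus efficient but adds latency to every CPU read.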
 
I remember there was a discussion here about DDR3 vs GDDR5, and the general consensus and data showed that there is no significant difference in latency. IIRC GDDR5 has relatively higher latency in clock cycles, but it also runs at much higher clocks, in the 5.5~7 GHz (effective) range, while current DDR3 runs in the 1.6~2.3 GHz range. If you weigh the higher cycle latencies against the higher clocks, they cancel each other out. If you measure the actual time, I think they are generally the same.

Here's the info.

http://www.hynix.com/datasheet/pdf/dram/H5TQ1G4(8_6)3AFP(Rev0.1).pdf
http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf

GDDR5 timings as provided by the Hynix datasheet:
CAS = 10.6 ns
tRCD = 12 ns
tRP = 12 ns
tRAS = 28 ns
tRC = 40 ns

DDR3 timings for Corsair 2133 @ 11-11-11-28:
CAS = 10.3 ns
tRCD = 10.3 ns
tRP = 10.3 ns
tRAS = 26.2 ns
tRC = 36.5 ns

So latency-wise you probably shouldn't see a disadvantage from GDDR5; they're in the same ballpark.
I wasn't really saying differing latencies were a big factor, more of a contributing factor. To be honest I don't know exactly what those timing parameters mean, but I can see roughly 10% in most of them, which in a stall situation could add 10% to the bottom line... and when you have logical dependencies between your threads, that would have more of an impact.

Having said that, even if the latencies were identical my point still stands: the more simultaneous requests made to the memory bus, the more contention. The PS4, without a dual-pool configuration, has loads of requests hitting its single pool.
 
Perhaps there are some oddities with GDDR5 that were not expected by AMD or Sony. Why else is AMD backing off its GDDR5-based Kaveri PC APU?

Sorry, I'm on a phone and quoting/bolding is difficult, but the guy who replied to you saying the CPU could be the loser in all this, that's the point I am making. Multi-core coding is difficult at best, and there are going to be logical dependencies between the threads. ANY increased latency of CPU requests could impact more than just the thread concerned. Multiply that by the number of threads and the potential for stalls (logical or physical) has to go up.

There is that too, but I'm just looking at Skyrim with 4K texture mods, which can run at 30 fps on a 7850, and I ask why so many PS4 games have unimpressive textures. We all know very high texture resolutions on PC only have a moderate impact on GPUs.
 
Why does contention induce additional latency just for PS4? :???:
It doesn't. But the more simultaneous requests you make, the more contention you have. If you'd found a way to get 3/4 of your requests going to a separate pool, you'd have less of a problem in the first place...
 
I said something different: I said to group threads that access similar data in the same cluster, in order to maximize the potential benefit of the shared L2.
The dev should, on the other hand, optimize the code at the optimization stage to keep a coherent access pattern (easier cache reuse among threads in the same cluster?).

It might actually reduce the average number of RAM accesses from the CPUs, due to better local data coherency on average.
It might do, but it's complex to do hey? And if you were aiming to squeeze every last drop of performance, you'd be doing that anyway, whatever the platform.
 
Instead of looking at it on a bandwidth basis, look at it on a command-clock/cycles basis. How many memory cycles (ballpark range) would something that saturates a 7850's bandwidth (like Skyrim with 4K texture mods) take?
Are the remaining command-clock/memory cycles sufficient to feed the latency-sensitive CPU with its tiny cache? If not, they'd have to compromise on feeding the GPU on a cycles basis.
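Putting some rough numbers on that question. Assumptions, not facts: the PS4's 256-bit GDDR5 bus at 5.5 Gbps/pin peaks around 176 GB/s, and a 7850-class card has roughly 154 GB/s of its own; sustained figures will be lower than peak, so treat this as an upper-bound sketch.

```python
# Rough "cycles basis" estimate: if a GPU workload needs 7850-class bandwidth,
# what share of the PS4's memory cycles does it eat, and what is left over?
# Assumed figures: 256-bit bus, 5.5 Gbps/pin (~176 GB/s peak), ~154 GB/s demand.

ps4_peak_gbs = (256 / 8) * 5.5      # 32 bytes per transfer * 5.5 GT/s = 176 GB/s
gpu_demand_gbs = 154                # a workload saturating a 7850's own bus

gpu_share = gpu_demand_gbs / ps4_peak_gbs
leftover_gbs = ps4_peak_gbs - gpu_demand_gbs

print(f"GPU would occupy roughly {gpu_share:.0%} of the memory cycles")
print(f"about {leftover_gbs:.0f} GB/s worth of cycles left for the 8 CPU cores")
# ...and that is before counting any cycles lost to CPU/GPU switching,
# which is the contention cost this thread is arguing about.
```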
 
We are entering conspiracy theory land. :rolleyes:
I would have said it's FUD land already, but no, it's a bit further up ahead...
 
I don't think the split achieves that; I think it stops the GPU monopolising all the bandwidth. There are two measures getting confused here: one is bandwidth, the other is the latency of any given request. Each cycle the memory bus can only do one thing: service one client. Switching between clients has a latency cost, and I'm not sure there is a limit on that cost (e.g. 150 vs 20). I could be wrong...

It means you've given the CPU a budget to work with. This then means you can think about priority, latency and whatnot. Every 'frame' (or its equivalent in clock cycles) you can say: prioritise up to x cycles for the CPU, then service the GPU. It's probably a huge oversimplification, but it seems unlikely, to say the least, that there wasn't a clear design decision made here.

And incidentally, I thought the days where a memory bus does 'only one thing' were long behind us, but I am certainly no low-level expert.
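A minimal sketch of that "budget per window" idea, purely to make it concrete. The window length and budget are invented numbers, and a real memory controller schedules per bank with reordering rather than scanning two queues, so this illustrates the policy, not the hardware.

```python
from collections import deque

WINDOW = 100      # arbitration window, in memory cycles (invented)
CPU_BUDGET = 20   # cycles reserved for CPU traffic in each window (invented)

def arbitrate(cpu_queue: deque, gpu_queue: deque) -> list:
    """Return the order of grants for one window: CPU first, up to its budget."""
    grants, cpu_used, cycle = [], 0, 0
    while cycle < WINDOW and (cpu_queue or gpu_queue):
        if cpu_queue and cpu_used < CPU_BUDGET:
            grants.append(cpu_queue.popleft())   # CPU wins while it has budget
            cpu_used += 1
        elif gpu_queue:
            grants.append(gpu_queue.popleft())   # otherwise the GPU gets the bus
        else:
            grants.append(cpu_queue.popleft())   # budget spent, but the GPU is idle
        cycle += 1
    return grants

# Example: a handful of CPU reads arriving alongside a long run of GPU requests.
order = arbitrate(deque(f"cpu{i}" for i in range(5)),
                  deque(f"gpu{i}" for i in range(30)))
print(order[:10])   # the CPU reads are granted up front, the GPU fills the rest
```

The point of a budget like this is that the CPU's worst-case wait is bounded by the window, while the GPU still soaks up every cycle the CPU doesn't claim.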
 
There are dozens upon dozens of research papers on this subject that present dozens upon dozens of solutions to tackle this issue. None that I have read state that this is a difficult problem to overcome, just that simply giving preference to the CPU or GPU when accessing memory isn't a robust solution.

That being said, it's going to be hard to determine the effect of memory contention on the PS4 if you don't account for the fact that the PS4 isn't a discrete CPU/GPU setup but probably an HSA design, which minimizes the data copying between separate memory pools that a discrete system normally needs, thereby reducing bandwidth pressure, memory contention and latency.
 
 
There is that too, but I'm just looking at Skyrim with 4K texture mods, which can run at 30 fps on a 7850, and I ask why so many PS4 games have unimpressive textures. We all know very high texture resolutions on PC only have a moderate impact on GPUs.

:rolleyes:

What is this thread? Have internet warriors discovered what game devs and Sony/AMD engineers have not? Maybe Sony found some holes they can exploit to reduce contention?
 
:rolleyes:

What is this thread? Have internet warriors discovered what game devs and Sony/AMD engineers have not? Maybe Sony found some holes they can exploit to reduce contention?

Why are you saying this? I don't get it. We are the last on earth to discover it.
We are discovering it long after the Sony/AMD engineers have complete knowledge and understanding of it, and long after the game studios have discovered it. We are the last on earth to discover it, and it's just a result of all the gameplay videos we've seen from all the games.
 