Could next-gen consoles focus mainly on the CPU?

Your 55% is mixing apples and oranges.

You can't get an interface's spec bandwidth in real-world usage, unless all you do is read the entire chip sequentially.

You would get only a few percent of peak performance with random 1-byte reads, because the maximum clock without dead cycles depends on the prefetch size. And there are also penalties every time it switches banks or changes read/write direction.

You'd get the same thing on any memory. GDDR6 manages to increase the prefetch while keeping data granularity the same (256 bits, just like GDDR5), so there's that.
Yep, simultaneous/constant back-and-forth reads/writes will kill the bandwidth dramatically.
I believe the 140 GB/s on that slide was a sort of average case of available bandwidth while the game code is going at it. Certainly not a worst-case scenario.
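
A quick toy sketch in Python of the small-random-read point above: if the chip only moves whole bursts, tiny reads throw most of each burst away. The 176 GB/s peak and 32-byte (256-bit) granularity are the figures from this thread; everything else is illustrative.

```python
# Toy model: effective bandwidth when every access moves a whole burst
# but only part of it is useful. Numbers are for illustration only.

peak_bw_gbs = 176     # nameplate interface bandwidth, GB/s
burst_bytes = 32      # minimum transfer per access (256-bit granularity)

for useful_bytes in (1, 4, 16, 32):
    efficiency = useful_bytes / burst_bytes   # useful fraction of each burst
    print(f"{useful_bytes:>2}-byte random reads: {efficiency:6.1%} useful "
          f"-> ~{peak_bw_gbs * efficiency:.1f} GB/s before bank/turnaround penalties")
```

With 1-byte reads that's ~5.5 GB/s at best, i.e. the "only a few %" mentioned above, and bank switches and read/write turnarounds only make it worse.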
 
Maybe more memory channels would mitigate this without creating PS3-style memory-path spaghetti.

I keep thinking that the significant silicon area it would take to add a CPU-only bus, an ESRAM pool, or HBM would be more usefully spent simply making the main RAM wider.
 
In a heterogeneous CPU approach we might have 8 Jaguars looking only at 16 GB of GDDR6 together with the GPU, and 8 Ryzens looking only at a dedicated 4 GB of DDR5... I know it sounds strange... Some kind of "bridge" between the two banks would be necessary to make it all work... Don't know how.
 
That 110 GB/s includes the CPU's BW. It's 100 GB/s for the GPU in this rough graph. Furthermore, the RAM is rated at 176 GB/s, no? So from the theoretical peak BW, we see a 55% reduction to what's actually available to the GPU when the CPU is busy. I can't recall the specifics of that 140 GB/s figure.

No.

The 176 GB/s is just a theoretical max of 5.5 GHz x 32 bytes. The 140 GB/s is the actual maximum bandwidth achieved by the SoC (by the GPU). And it's not even the same gigabytes: the 176 GB is 176 x 10^9, the 140 GB is 140 x 2^30 (~150 x 10^9).
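
Spelling that arithmetic out (a quick sketch using only the figures already quoted in this thread; GB = 10^9 bytes, GiB = 2^30 bytes):

```python
# Nameplate vs. achieved bandwidth, with the unit mix-up made explicit.

peak = 5.5e9 * 32                  # 5.5 GHz x 32 bytes
print(peak / 1e9)                  # 176.0 -> the "176 GB/s" nameplate figure

achieved = 140 * 2**30             # the measured 140 "GB" (binary gigabytes)
print(achieved / 1e9)              # ~150.3 -> the same figure in decimal GB
print(achieved / peak)             # ~0.85 -> ~85% of nameplate actually reached
```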

Yes, the impact on bandwidth is disproportionate to the CPU's bandwidth usage. Why is that? I don't know, but it could be:
1. The CPU has smaller, less effective transactions.
2. CPU memory transactions might also have priority over GPU memory transactions, which leaves less room for reordering to hit open pages.
3. Poorer locality, which might compound 2).

That is a result of a memory system built for a GPU with a CPU cluster attached as an afterthought. Will that be so in the future? I doubt it.

Cheers
 
The 176 GB/s is just a theoretical max of 5.5 GHz x 32 bytes. The 140 GB/s is the actual maximum bandwidth achieved by the SoC (by the GPU). And it's not even the same gigabytes: the 176 GB is 176 x 10^9, the 140 GB is 140 x 2^30 (~150 x 10^9).
For some reason this never occurred to me until you pointed this out.

So the SoC has no way to get to the theoretical maximum here? This really puts things into perspective for me.
 
For some reason this never occurred to me until you pointed this out.

So the SoC has no way to get to the theoretical maximum here? This really puts things into perspective for me.

Wait... what?

I can't ever seem to commit RAM tech to long-term memory, since this is the only place I ever discuss it and we only delve into the minutiae of the tech every now and again.

But if the PS4 is rated at 176 GB/s (10^9), i.e. 5.5 GHz x 32 bytes, then as long as the PS4 can run at 5.5 GHz and move 32 bytes in every cycle, the actual theoretical max bandwidth can be achieved.

Or am I missing something? Are bits lost every cycle?
 
But if the PS4 is rated at 176 GB/s (10^9), i.e. 5.5 GHz x 32 bytes, then as long as the PS4 can run at 5.5 GHz and move 32 bytes in every cycle, the actual theoretical max bandwidth can be achieved.

Measured over a single cycle (or a few), yes; measured over a second, no.

The PS4's GPU has 18 CUs, and each CU has four 16-wide SIMD execution units. Each of these execution units can issue a load or store per cycle; that's 72 memory transactions per cycle. The memory system consists of eight 1 GB x32 modules, each with 16 banks, for a total of 128 banks. If two memory transactions hit the same bank, one has to wait for the other to complete. The memory controller tries to reorder memory transactions to resolve bank conflicts (as well as to optimize for open pages in a given bank), but there will always be conflicts that induce stalls.
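
As a toy Monte Carlo of that effect: scatter 72 transactions uniformly over 128 banks and count how many land on a bank someone else already grabbed. Uniform addressing and no reordering are simplifications, so treat this as a sketch of the mechanism, not a measurement.

```python
import random

def stalled_fraction(transactions=72, banks=128, trials=10_000):
    """Toy model: each transaction picks a bank uniformly at random;
    any transaction hitting an already-targeted bank has to wait."""
    stalled = 0
    for _ in range(trials):
        targets = [random.randrange(banks) for _ in range(transactions)]
        stalled += transactions - len(set(targets))
    return stalled / (transactions * trials)

print(f"~{stalled_fraction():.0%} of transactions conflict in this toy model")
```

Even in this idealized case, roughly a fifth of the transactions queue behind another one.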

That means you will never ever reach the nameplate bandwidth outside of specially engineered Mickey Mouse benchmarks.

I'm guessing CPU transactions aren't reordered with GPU transactions; the GPU can handle memory latency quite well, a CPU not so much. The raw latency of GDDR5 is on the order of 50-something ns, but the GPU normally sees ~200 ns latency for memory operations because of the heavy buffering going on. CPU memory transactions throw a spanner into the bank/open-page optimization strategy, and memory bandwidth utilization drops.
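
A Little's-law sketch of why latency tolerance matters here: bytes in flight = bandwidth x latency, so keeping the bus full at those latencies takes a huge number of outstanding requests, which a GPU can supply and a CPU can't. The 64-byte request size is my assumption for illustration.

```python
# In-flight requests needed to sustain peak bandwidth (Little's law).

bandwidth = 176e9   # bytes/s, nameplate figure from this thread
line = 64           # bytes per request (assumed cache-line size)

for label, latency in (("raw GDDR5, ~50 ns", 50e-9),
                       ("GPU-observed, ~200 ns", 200e-9)):
    print(f"{label}: ~{bandwidth * latency / line:.0f} outstanding "
          f"{line}B requests to keep the bus full")
```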

HBM2 has 8 channels per stack, two pseudo-channels per channel and 16 banks per pseudo-channel, i.e. 256 banks per stack, and normally there are multiple stacks. With more banks you have fewer conflicts and higher utilization.
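
Same toy assumptions as the Monte Carlo above (uniform random addressing, no reordering), just done analytically and with the bank count changed, to show the direction of the effect:

```python
# Bank math from the line above, plus the expected-conflict formula:
# with n uniform random accesses over B banks, the expected number of
# distinct banks hit is B * (1 - (1 - 1/B)**n).

hbm2_banks = 8 * 2 * 16   # channels x pseudo-channels x banks = 256 per stack

def queued_fraction(n, B):
    distinct = B * (1 - (1 - 1/B) ** n)
    return (n - distinct) / n   # accesses that queue behind another

n = 72  # simultaneous transactions, reusing the PS4 example above
print(f"128 banks: {queued_fraction(n, 128):.1%} queued")
print(f"{hbm2_banks} banks: {queued_fraction(n, hbm2_banks):.1%} queued")
```

Doubling the bank count roughly halves the queued fraction in this toy model (~23% down to ~13%), consistent with the higher utilization claim.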

Cheers
 
So until HBM2 is cheap and reliable enough, we won't see new-gen consoles with the high bandwidth and much lower chance of read/write conflicts that would really give the Ryzens a purpose...
 
I'm pretty sure that ameliorating the memory-contention problems of the PS4 is among the first and most immediately obvious design goals for the team designing the PS5, so it may be fair not to worry much about that.
 
So until HBM2 is cheap and reliable enough, we won't see new-gen consoles with the high bandwidth and much lower chance of read/write conflicts that would really give the Ryzens a purpose...

HBM2 isn't reliable?
 
Reliable, but I don't know at what cost right now.
And performance?
Vega and Fiji weren't the fastest cards in the world. It might not have anything to do with their memory, but I still haven't seen a fast card with HBM1 or HBM2 memory.
Well, Vega and Fiji have a problem bringing their theoretical power to the table.
 
I think in a console environment the heavy use of highly optimised async compute would see much greater utilisation of the CUs and much greater throughput. That would possibly better justify the use of such expensive memory... though I still don't think we'll see HBM in a console.
 
I don't see how Sony and MS can engineer a new console now... Too many question marks. The current gen is too profitable. Maybe a new PS4 refresh at 7 nm is on the way... Maybe 10 TF and 16 Jaguar cores... and GDDR6 on a 256-bit bus.
 
They'll use 7 nm Ryzen + Vega/Navi. Going from Jaguar to Ryzen is pretty much a several-generation leap in perf. They don't have to "focus" on improving the CPU; AMD already did that.

I think the GPU will be Navi/NextGen, if not just slightly custom Navi.
 
The CPUs will be more of a focus than they were last time, anyway.

If it were up to me I would put the emphasis on CPU power and memory bandwidth rather than just a huge GPU, the bare-minimum CPU and merely adequate bandwidth. I would like 8 full-fat Zen cores, not Zen lite, and either GDDR6 with eDRAM (96-128 MB?) or GDDR6 on a massive bus. The GPU could be 8-10 TF and that'd be more than OK with adaptive resolution.

I'd like 100% perfect hardware BC (beefed-up Jaguar/AMD Pitcairn) but not at a cost to new games. I don't think they could beef up Jaguar enough for it not to hinder the rest of the system.
 
From a cost standpoint, wouldn't HBM be the smart choice for a console? Nothing would have to go off-chip, and you could have an HBM stack dedicated to the CPU and the rest to the GPU.
 