Could this gen have started any earlier?

Ok, I thought Bobcat had a single L2 cache for each pair of cores.


I'm still wondering how much the lack of cache coherency could hurt performance in console games given a developer's control over what the cores do.
For example, the Athlon II series are Phenom II CPUs with their L3 cache disabled. There's only L2 cache per core, just like Bobcat, so data between the cores must go through the RAM.
Yet, in PC games (where the developer only controls how many threads/processes there are; there's no fine control over each CPU), the gains per core seem pretty good, at least in these results:


[Benchmark images: per-core game performance scaling, Athlon II X4 vs. Core i7 870]


(note: the legend is wrong, it's an Athlon X4 and not a Phenom)

It's scaling pretty well, compared to a Nehalem i7 870 which has 8MB of L3 cache shared by all the cores.


In this 2011 console, would the lack of cache coherency hurt performance scaling with more cores? Definitely. Would it be so much that the console makers would prefer to have fewer, bigger, hotter and much more power-hungry cores?
 
Ok, I thought Bobcat had a single L2 cache for each pair of cores.
I'm still wondering how much the lack of cache coherency could hurt performance in console games given a developer's control over what the cores do.
There are two issues at play. One is the performance penalty of sharing cache lines, a penalty console developers have noticed. Since consoles have to lean heavily on concurrency to make up for a lack of straight-line speed, it shows up.
The other is that, regardless of how well such sharing performs, the CPUs are required to be coherent. This means that any independent CPU domain--in this case the L2 for either architecture--must plug into something that allows it to perform the required actions per AMD's MOESI protocol. To do otherwise is to be broken. AMD APUs have so far only ever had two such CPU-coherent integration points implemented, and since Bobcat is 1:1 and Jaguar is 4:1 in terms of cores per domain, the concern would be whether Bobcat could provide more than two weak cores at all.
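To make the first issue concrete, here is a minimal C sketch (mine, not from the thread) of the cache-line-sharing penalty: two threads increment independent counters, once packed into the same (assumed 64-byte) line and once padded onto separate lines. On a coherent MOESI machine the packed case forces the line to ping-pong between the cores' caches and runs far slower.

```c
/* False-sharing sketch; build with `gcc -O2 -pthread falseshare.c`.
 * The 64-byte line size is an assumption. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Two counters packed into the same cache line. */
static struct { volatile uint64_t a, b; } packed;

/* Two counters padded so each gets a line of its own. */
static struct { volatile uint64_t v; char pad[64 - sizeof(uint64_t)]; } padded[2];

/* Increment one counter ITERS times. */
static void *bump(void *arg)
{
    volatile uint64_t *c = arg;
    for (uint64_t i = 0; i < ITERS; i++)
        (*c)++;
    return NULL;
}

/* Run two bump threads on the given pair of counters and time them. */
static double run_pair(volatile uint64_t *x, volatile uint64_t *y)
{
    pthread_t t1, t2;
    struct timespec s, e;

    clock_gettime(CLOCK_MONOTONIC, &s);
    pthread_create(&t1, NULL, bump, (void *)x);
    pthread_create(&t2, NULL, bump, (void *)y);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    clock_gettime(CLOCK_MONOTONIC, &e);
    return (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) * 1e-9;
}

int main(void)
{
    printf("same line:      %.2f s\n", run_pair(&packed.a, &packed.b));
    printf("separate lines: %.2f s\n", run_pair(&padded[0].v, &padded[1].v));
    return 0;
}
```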

For example, the Athlon II series are Phenom II CPUs with their L3 cache disabled. There's only L2 cache per core, just like Bobcat, so data between the cores must go through the RAM.
AMD's caches can forward modified lines between each other without a round trip to memory.
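For what it's worth, a rough way to check that on any given part (my sketch, not something from the thread) is to ping-pong a flag between two threads on different cores and compare the per-hop time against DRAM latency; a cache-to-cache forward comes in well under a memory round trip.

```c
/* Line-transfer latency sketch; build with `gcc -O2 -pthread pingpong.c`.
 * The 64-byte line size is an assumption. Pin the threads to different
 * cores (e.g. with taskset) for the number to mean anything. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ROUNDS 1000000

/* The flag gets its own cache line so only it bounces between cores. */
static _Alignas(64) atomic_int flag = 0;

static void *ponger(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
            ;                                                   /* wait for ping */
        atomic_store_explicit(&flag, 0, memory_order_release);  /* pong */
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    struct timespec a, b;

    pthread_create(&t, NULL, ponger, NULL);

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < ROUNDS; i++) {
        atomic_store_explicit(&flag, 1, memory_order_release);  /* ping */
        while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
            ;                                                   /* wait for pong */
    }
    clock_gettime(CLOCK_MONOTONIC, &b);
    pthread_join(t, NULL);

    double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
    printf("avg one-way line transfer: %.1f ns\n", ns / ROUNDS / 2.0); /* two hops per round */
    return 0;
}
```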

Yet, in PC games (where the developer only controls how many threads/processes there are; there's no fine control over each CPU), the gains per core seem pretty good, at least in these results:
The Phenom is a quad core with desktop-class cores and a broader crossbar clocked for desktop speeds. Given the earlier concern about a physical ceiling on Bobcat's core count, the AMD 1-core score might be the one most applicable.

In this 2011 console, would the lack of cache coherency hurt performance scaling with more cores? Definitely. Would it be so much that the console makers would prefer to have fewer, bigger, hotter and much more power-hungry cores?
If you mean a lack of high-speed concurrency, it would be notable. No coherency at all would be a defective APU.
I speculate that Bobcat's limited performance, limited FP, and unclear path to upping the core count back in 2011 would have made it much harder to justify going with an AMD small core, and possibly harder to justify x86 at all. Cerny's research into the improvements in x86's performance scaling probably didn't conclude that Bobcat was the bar for good enough.

There were leaks or rumors of a version of the PS4's APU that used desktop cores, and the switch to Jaguar only made sense because Jaguar's vector resources were upgraded even as the max number of cores possible was quadrupled.
 
Ok, I'm convinced.
The PS4 from 2011 would have 4 K10 (maybe K10.5/Stars) cores at ~2GHz. AMD used to sell the Athlon II X4 600e with a 45W TDP and 2.2GHz clocks.

AMD's caches can forward modified lines between each other without a round trip to memory.
But Bobcat couldn't do that?
 
But Bobcat couldn't do that?
For Bobcat specifically I do not have a ready reference, but MOESI protocols are used by all of AMD's other CPUs, so it should.
It would be a question of how high the latency is. It's significant enough that the current consoles are optimized to minimize sharing, but then at least you have neighboring cores to run threads that share data.
If the rule is "avoid sharing data between L2s", that can constrain the amount of concurrency available, since following it can leave just one thread at a time that is allowed to run against a given L2. Then there's the question of what parallelism you can get if there's still a system-reserved core.
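As a sketch of what following that rule looks like in practice (my example, with the core numbering assumed from a Jaguar-like layout where cores 0-3 share one L2 and cores 4-7 the other), threads that share data get pinned into the same module:

```c
/* Thread-placement sketch (Linux affinity API); build with
 * `gcc -O2 -pthread pin.c`. The 0-3 / 4-7 module split is an assumption. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin a thread to a single core. */
static int pin_to_core(pthread_t thread, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Placeholder for work on data shared between the two threads. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

int main(void)
{
    pthread_t producer, consumer;

    pthread_create(&producer, NULL, worker, NULL);
    pthread_create(&consumer, NULL, worker, NULL);

    /* Keep the pair inside the first L2 module so the lines they
     * exchange never cross the slower inter-module path. */
    pin_to_core(producer, 0);
    pin_to_core(consumer, 1);

    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    return 0;
}
```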
 
No coherency at all would be a defective APU.
It's an APU that manages coherency through software to maximize die utilization for superior throughput in targeted tasks.

The issue isn't the hardware, it's the marketing team! :yep2:

User manual: "To bypass the cache, flush it by carefully reading from other memory locations."
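In that spirit, the "careful reading" trick is a real one; here is a tongue-in-cheek C sketch (mine, with the 512KB L2 and 64-byte line sizes assumed) of flushing by eviction:

```c
#include <stddef.h>
#include <stdint.h>

#define L2_SIZE   (512 * 1024)   /* assumed per-core L2 size */
#define LINE_SIZE 64             /* assumed cache line size  */

/* Scratch buffer larger than the cache, so walking it evicts everything. */
static volatile uint8_t scratch[2 * L2_SIZE];

/* "Flush" by touching one byte per line across the scratch buffer; after
 * the walk, whatever was resident before has been pushed out and the next
 * access to the real data goes to memory. */
uint64_t flush_by_reading(void)
{
    uint64_t sink = 0;
    for (size_t i = 0; i < sizeof(scratch); i += LINE_SIZE)
        sink += scratch[i];   /* the loads are the point, not the sum */
    return sink;              /* returned so the reads are not optimized away */
}
```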
 
Ok, I'm convinced.
The PS4 from 2011 would have 4 K10 (maybe K10.5/Stars) cores at ~2GHz. AMD used to sell the Athlon II X4 600e with a 45W TDP and 2.2GHz clocks.


But Bobcat couldn't do that?
Why not go with an improved Cell? It would have been relatively cheap and power efficient. A new member posted their work on the matter of asymmetric multi-processing (IIRC, though I feel I don't); I enjoyed reading it, and in that context Cell made sense.
For 2010 or 2011 I have a tough time thinking of anything but a slight evolution of Xenon or Cell. What sounds reasonable to me is either a quad-core Xenon or a Cell 2, both being much more conservative setups than the costly evolutions we were envisioning back then.
I think Sony had the better design with Cell; I'm convinced a 2 PPE + 4 SPE chip would have been a tiny, power-efficient little beast, though 6 SPEs or more might have ended up the best choice with respect to backward compatibility.
 
I am curious whether IBM's desire to bow out of its in-house microelectronics business had already formed by that time. Maybe it would have been more willing back then to iterate on the designs it had already provided.
 
Well, it might have been badly imbalanced, but it had its high points. Both Xenon and Cell came with a big issue: their per-core performance was not good at a time when multithreaded applications were not the norm. That is where the similarity ended. Xenon offered three weak CPUs, Cell only one; in turn devs had to fit onto the SPUs code that might not have been such a natural match, and there were cases where they simply could not spread the load, as some devs commented this year, a sad matter really... Beyond the core count there is also the amount of L2 available to the CPU, 512KB, which is tiny by 2010 or 2011 standards.
Now, if you double the CPU performance (or more), I would think it would make a big difference; the SPUs as accelerators are far from bad, and it seems there were tasks that were a pretty easy match for them.
I really think it would have been a little killer, better than a hypothetical quad-core Xenon or anything else doable (cost/power) at the time.
Another choice would be something based on the cores Nintendo uses, with or without the help of SPUs. SPUs could come in handy to cover some of what the CPU lacks; either way, add some cores.

OT: it is really sad to see that Nintendo made a great choice with the PowerPC 750CL back in the 6th-gen era, a CPU that could have been the successful basis for three generations of consoles, and then wasted its potential away. Nintendo had the tools to bring the pain to both MSFT and Sony, yet they chose to skimp like hell on their hardware, leaving core gamers out in the cold.
 