Thoughts on next-gen console CPUs: 8x 1.6 GHz Jaguar cores

Worth noting that in the same Twitter exchange Carmack said Sony made wise engineering choices, so I don't think he is dissatisfied with Jaguar.
Carmack's Twitter here anyway.
 
If power consumption scaled linearly with clock speed, the consoles would probably have had 4 cores at 3+ GHz. Unfortunately it doesn't, and I think power consumption was the main driver for choosing 8 cores at 1.6+ GHz over 4 cores at 3+ GHz.
While true, I'd think the primary concern was being able to fab the CPU at several fabs like TSMC instead of only GloFo. Piledriver/Steamroller cores are bound to processes not available at TSMC, while GloFo likely doesn't have the capacity to manufacture them cheaply.
That only leaves Bobcat/Jaguar, and these architectures simply won't scale to 3 GHz at all.
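
To put a rough number on the "doesn't scale linearly" point, here is a back-of-the-envelope sketch. It assumes the usual approximation that dynamic power goes as V^2 * f and that voltage has to rise roughly in proportion to clock, which gives about f^3 per core. The function name, the `voltage_exponent` knob and the 3.2 GHz figure are mine for illustration, not anything AMD or Sony published, and leakage is ignored entirely.

```python
def relative_dynamic_power(freq_ghz, base_freq_ghz=1.6, voltage_exponent=1.0):
    """Per-core dynamic power relative to a core at base_freq_ghz.

    Assumes P ~ V^2 * f and V ~ f^voltage_exponent (an approximation,
    not measured data). voltage_exponent=0 models the "linear" case
    where voltage never has to rise with clock.
    """
    ratio = freq_ghz / base_freq_ghz
    return ratio ** (1 + 2 * voltage_exponent)


# 8 slow cores vs 4 fast cores, in units of "one 1.6 GHz core".
eight_slow = 8 * relative_dynamic_power(1.6)
four_fast_cubic = 4 * relative_dynamic_power(3.2)
four_fast_linear = 4 * relative_dynamic_power(3.2, voltage_exponent=0.0)

print("8 x 1.6 GHz:", eight_slow)
print("4 x 3.2 GHz (V rises with f):", four_fast_cubic)
print("4 x 3.2 GHz (linear scaling):", four_fast_linear)
```

Under the linear assumption the two configurations draw the same power, which is the "if it scaled linearly" case. Once voltage has to rise with clock, the 4 fast cores come out several times more expensive in this toy model.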
 
John Carmack specifically addressed the question of more, slower cores vs fewer, faster cores recently on Twitter. His answer was that if total performance were equal he'd go for fewer, faster cores every time, but that the question got interesting if you could get 1.5x the peak power out of the slower, more numerous cores.
So 4 x Piledriver (2 modules) at something like 2.5 GHz (like on the faster binned 35W Richlands) would probably be preferable to 8 Jaguars at 1.6 GHz.
How do you follow that? John Carmack specifically addressed this case as "more nuanced", i.e. it isn't the clear call you assume. There are plenty of situations where you would prefer higher total throughput over higher single thread performance.
 
While true, I'd think the primary concern was being able to fab the CPU at several fabs like TSMC instead of only GloFo. Piledriver/Steamroller cores are bound to processes not available at TSMC, while GloFo likely doesn't have the capacity to manufacture them cheaply.
That only leaves Bobcat/Jaguar, and these architectures simply won't scale to 3 GHz at all.

Yes, that's a good point too. I really think both console companies are trying to make wiser choices when it comes to the long term cost reductions of their consoles. Simpler, lower clocked designs are going to be easier to move to smaller nodes and allow for quicker cost reductions. We should hit lower price points much quicker this upcoming gen.
 
For reference:

PCMark Vantage
4 cores, reference Kabini A6-5200: 5271 (17 watts)
2 cores, reference Brazos E2-1800: 2807 (18 watts)
2 modules / 4 cores, AMD A10-4600M (Trinity at 2.3 GHz, 3.2 GHz Turbo): 5552 (more or less 35 watts)

The fact that the dual core Bobcat @ 1.7 GHz scores more than half of what a quad core Trinity @ 2.3/3.2 GHz scores means you should already know not to judge gaming CPUs by those results. PCMark is a complete system benchmark that's influenced by all kinds of things.

Try and find some CPU limited gaming stuff - the Trinity should be about 2 to 3 times faster depending on the game.
 
How do you follow that? John Carmack specifically addressed this case as "more nuanced", i.e. it isn't the clear call you assume. There are plenty of situations where you would prefer higher total throughput over higher single thread performance.

I'm assuming that it's nuanced, that's what he's getting at. Even after years of multi-threaded game development it is still very common for a not so fast Intel 4 core / 4 thread processor to absolutely smash an 8 core AMD. Sometimes a dual core can do it, and sometimes a fast Intel dual core can still beat a slow Intel quad core (one with more than 1.5 times the "peak power", whether that means FLOPS, instructions per clock, BW, or whatever).

I'm taking Carmack to mean, very roughly, that there's a point where more cores begin to trump faster cores across the kind of workloads that games require. By logical extension there must also be a point where fewer, faster cores trump more cores even with less "peak power." In fact we can see that both of these are true and have been for years in the PC space. It all depends on the software.

I was simply looking for an example of where fewer, faster cores with fewer peak FLOPS or IPS or whatever would probably be faster, given the very rough figure of "1.5 times". The Piledriver based CPU I picked seemed to fit the profile: the 8 core Jaguar CPU @ 1.6 GHz has less than 1.5 times its "peak power", yet the Piledriver is probably better for running games, at least as we know them at the moment.
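
For what it's worth, here is the crude arithmetic behind that "less than 1.5 times" claim, as a sketch only. It uses cores x clock as a stand-in for "peak power", which assumes both architectures do the same work per clock. That is a big simplification and the helper name is just for illustration.

```python
def core_ghz(cores, clock_ghz):
    """Aggregate clock across all cores: a very rough peak-throughput proxy."""
    return cores * clock_ghz


jaguar_total = core_ghz(8, 1.6)      # 8 Jaguar cores at 1.6 GHz
piledriver_total = core_ghz(4, 2.5)  # 4 Piledriver cores (2 modules) at 2.5 GHz

ratio = jaguar_total / piledriver_total
print(f"Jaguar / Piledriver aggregate clock ratio: {ratio:.2f}")
print("Above Carmack's rough 1.5x mark?", ratio >= 1.5)
```

By this measure the 8 Jaguars have only about a 1.3x edge in raw aggregate clock, short of the rough 1.5x figure. Real per-clock throughput differs between the two cores, so treat it as a sanity check and nothing more.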

I bet if you stuck Rage on a quad core Trinity and an 8 core Jaguar PC (if such a thing existed) that the Trinity would fare better.

Edit: okay so apparently it wouldn't, because Rage seems to be capped at 60 fps, and even a Phenom II X2 can top it out ...

http://www.tomshardware.com/reviews/rage-pc-performance,3057-8.html
 
How do you follow that? John Carmack specifically addressed this case as "more nuanced", i.e. it isn't the clear call you assume. There are plenty of situations where you would prefer higher total throughput over higher single thread performance.

It depends on the problem and the programming. Many things don't scale linearly with core count, so people can typically get better real-world throughput from a setup with fewer, more powerful cores. Then again, if it's just a small number of cores (4 vs 8), the efficiency should be close. It's not like a supercomputer setup with thousands of cores, so the scaling gap may not be that big anyway. With more (lower power) cores, you can have other advantages like more parallelism and better power efficiency.

There is setup, switching and development overhead for threading. But both will incur some just because we are talking about a threaded environment in both cases (unless we are talking about 1 core vs n cores).
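
A quick way to see how much "it depends" is Amdahl's law. The sketch below assumes both core types do identical work per clock and that a fixed fraction of each frame parallelises perfectly. Neither is true of real games, and the function names and the 4 x 2.5 GHz vs 8 x 1.6 GHz figures are just the ones from this thread.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Classic Amdahl's law: speedup over one core of the same speed."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def effective_throughput(parallel_fraction, cores, clock_ghz):
    """Clock-scaled speedup; assumes equal work per clock on both CPUs."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


for p in (0.50, 0.80, 0.90, 0.95, 0.99):
    fast4 = effective_throughput(p, cores=4, clock_ghz=2.5)
    slow8 = effective_throughput(p, cores=8, clock_ghz=1.6)
    winner = "8 x 1.6 GHz" if slow8 > fast4 else "4 x 2.5 GHz"
    print(f"parallel fraction {p:.2f}: 4-core {fast4:.2f}, 8-core {slow8:.2f} -> {winner}")
```

In this toy model the 8 slower cores only pull ahead once a bit over 90% of the work is parallel; below that the 4 faster cores win. That is the "depends on the problem and the programming" point in numbers.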
 
One problem with scaling by core count on a PC is that the developers don't know how many threads the machine has when writing the code. That's not an issue on consoles, so it is easier to put 8 cores to good use there than on a PC. It is also often more efficient to find a static split than to try to scale the number of threads dynamically according to what the machine supports. In the latter case (PC) the work can easily become unbalanced, limiting the scaling at higher thread counts (while always using a high number of threads may make it less efficient from the start and may hurt synchronization latencies). PCs therefore favor fewer but faster cores to a greater extent than I would expect consoles to.
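
Here is a toy illustration of the unbalancing problem, assuming a frame is split into a fixed number of equal-cost jobs that run in lockstep rounds. The job count of 8 and the `efficiency` helper are hypothetical, and real engines use finer-grained job systems than this.

```python
import math


def efficiency(num_jobs, num_threads):
    """Fraction of thread time doing useful work when equal-cost jobs
    are run in rounds of up to num_threads jobs each."""
    rounds = math.ceil(num_jobs / num_threads)
    return num_jobs / (rounds * num_threads)


# A split tuned for 8 hardware threads, run on machines with other counts.
for threads in (2, 3, 4, 6, 8, 12):
    print(f"{threads:2d} threads: {efficiency(8, threads):.0%} busy")
```

A split tuned for exactly 8 threads keeps every thread busy on 8 (or 4, or 2), but on a 6 thread machine the second round leaves most threads idle, and on 12 threads a third of them never get work. On a console the count is known up front, so the split can be made to fit.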
 
There's a reason why Bulldozer isn't used, and I think it has to do with its low FPU performance. 8 Jaguar cores would theoretically deliver something like 2-4x the floating point performance of 2 Trinity modules at the same clock, depending on the operations. Floating point matters a lot in games, so it made sense to go with something that isn't Bulldozer derived.

Lower power consumption and die area are also a big plus.
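
Rough numbers behind the 2-4x figure, as a sketch. The per-cycle FLOP counts are assumptions based on the FPU widths as I understand them: a 128-bit add pipe plus a 128-bit multiply pipe per Jaguar core, and two 128-bit FMAC pipes shared per Trinity module. Check them against AMD's documentation before relying on them.

```python
# Assumed single-precision peak FLOPs per cycle (not verified vendor figures).
JAGUAR_FLOPS_PER_CORE = 8             # 4-wide add + 4-wide multiply each cycle
TRINITY_FLOPS_PER_MODULE_FMA = 16     # 2 FMAC pipes x 4 lanes x 2 ops (fused)
TRINITY_FLOPS_PER_MODULE_NO_FMA = 8   # same pipes doing plain adds or multiplies

jaguar_total = 8 * JAGUAR_FLOPS_PER_CORE
trinity_fma_total = 2 * TRINITY_FLOPS_PER_MODULE_FMA
trinity_no_fma_total = 2 * TRINITY_FLOPS_PER_MODULE_NO_FMA

ratio_vs_fma = jaguar_total / trinity_fma_total
ratio_vs_no_fma = jaguar_total / trinity_no_fma_total

print(f"8 Jaguar cores vs 2 Trinity modules, same clock, FMA code: {ratio_vs_fma:.0f}x")
print(f"8 Jaguar cores vs 2 Trinity modules, same clock, non-FMA code: {ratio_vs_no_fma:.0f}x")
```

Under those assumptions the range works out to 2x when the Trinity code uses fused multiply-add and 4x when it doesn't, which lines up with the "depending on the operations" caveat. Trinity's higher clock would claw some of that back in practice.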
 