Thoughts on next gen console CPUs: 8x 1.6GHz Jaguar cores

No it isn't. Again, watch the Hot Chips presentation: the IPC gain comes from a more aggressive front end (additional L2 predictor/fetch, more aggressive core prefetch/predictor), improved scheduling, and a big improvement in OOO load and store capabilities. They even go as far as giving the overall IPC improvement for each area.

Okay, so if they go that far, do they specifically say there is a 15% IPC improvement in the SIMD units on top of the doubling in width? When Hot Chips talk about doubling throughput, do they mean throughput being double, or 2.3x greater?

It might be useful if you posted a link to the presentation so I know what you're referring to. I wasn't able to find it on Google.

No, I'm talking about benchmarks which compare Bulldozer AVX vs FMA compiled code. Of course in an SR console devs would take every chance to write code that could be FMA'd, but really, how often is that going to be feasible?

And do those same benchmarks also show twice the performance when comparing roughly similar CPUs where one has twice the SIMD width (à la Bobcat and Jaguar)? If you would share your links it might aid the discussion.

And yet games developed for a Jaguar console won't be "normal" applications.

It's true that the benchmarks at Anand don't reflect the SIMD improvements of Jaguar over Bobcat, and in consoles that advantage will be apparent, but it certainly won't be a doubling of overall performance. The non-SIMD performance of Jaguar is fairly well represented by Bobcat (once you add in the 15% or so) and it's quite clear that there, it's not all that hot. That's why my A10 comparison was a bit more relevant. Since they have fairly comparable theoretical SIMD throughput, you can at best say that when it comes to SIMD they'll be roughly comparable, but in anything else the Jaguar will be a little slower in multithreaded code (8 cores vs 2 modules) and vastly slower in single threaded code.
 
Now that we have some form of discussion going... I'd like to say I find it hard to believe Jaguar will be 4 times faster than Xenon or 1.5 times faster than Bobcat (let alone 2 times faster)... not even Intel gets such gains going from one generation to the next... call me jaded... but I've been through such talks about closed box consoles... and how it is their special sauce... people said a lot about EE, Cell, RSX and Xenon back then... but it always ended as fanbois reading off the checklists... so unless, brother itsmydamnation, you are a games developer?... pardon me for it..

To share my concerns about CPU bottlenecks... it is a real problem even on the more powerful PC platform... you will need an i7 3930K to comfortably run BF3 or PS2 mp at >60fps... a lesser (but still jawesome) i5 3570K will send your fps down to the 40s under load... true story... but now AMD is telling Sony/MS game developers to run their code on 1.6GHz Jaguar cores... call me skeptical, but even with the CUs of GPGPUs... I fear AI/physics/performance will just not reach the next level. AFAIK... GPGPU physics are still for eye candy and not gameplay physics...

The average recommended PC requirements of the newest batch of games (uprezzed PS3/Xbox ports) hover around a quad core x86 CPU at 2.8GHz... the minimum requirements are lower at 2.4GHz dual cores... can we assume the 8 Jaguar cores are as good as an Intel 2.4GHz Core 2 Duo...? It is not a good thought either way...
 
call me skeptical, but even with the CUs of GPGPUs... I fear AI/physics/performance will just not reach the next level. AFAIK... GPGPU physics are still for eye candy and not gameplay physics...

That won't be how it works in the consoles. They'll likely both feature HSA, which from a gaming point of view is specifically about allowing GPGPU operations to be performed within timescales that make them usable in gameplay. Hopefully PCs will be able to do the same using HSA (or equivalent) APUs. I started a thread to explore that possibility but no-one seems interested in discussing it at the moment!

can we assume the 8 Jaguar cores are as good as an Intel 2.4GHz Core 2 Duo...? It is not a good thought either way...

Based on the Anandtech benchmarks, while single thread general performance is much higher on the C2D, the number of cores in the console CPUs would be more than enough to make up the difference in multithreaded code. SIMD performance would also be much higher than the C2D's meagre 38.4 GFLOPs.

EDIT: I'd say you're going to want at least a 3+GHz Core 2 Quad to make sure you can keep up with the new console CPUs.
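For reference, that 38.4 GFLOPs figure, and the equivalent number for the console CPUs, follow from the usual peak-throughput formula. This is just a sketch of the arithmetic; the per-cycle figures are assumptions from public specs (a Core 2 can issue a 4-wide SSE add and a 4-wide SSE multiply per cycle, and Jaguar's separate 128-bit add and multiply pipes give the same 8 single-precision FLOPs per cycle per core):

```python
# Peak single-precision SIMD throughput: cores * clock (GHz) * FLOPs/cycle/core.
# Per-cycle figures are assumptions based on public specs, not measured numbers.

def peak_sp_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak single-precision GFLOPs."""
    return cores * clock_ghz * flops_per_cycle

c2d = peak_sp_gflops(cores=2, clock_ghz=2.4, flops_per_cycle=8)     # 38.4
jaguar = peak_sp_gflops(cores=8, clock_ghz=1.6, flops_per_cycle=8)  # 102.4

print(f"C2D 2.4GHz: {c2d:.1f} GFLOPs vs 8x Jaguar 1.6GHz: {jaguar:.1f} GFLOPs")
```

Real code won't hit either peak, but it shows why the eight slow cores win on paper.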
 
The way I see it..

1) PC games on Trinity tend to be GPU limited, not CPU limited, and they're likely to use more CPU time than if they were developed for a console of similar hardware (driver overhead, etc). Therefore it's not the ideal balance for games, which makes sense since it's meant to be an adequate general purpose PC CPU too.
2) Consoles have pushed multithreading requirements pretty hard already last gen, and can easily make good use of 6+ threads. Therefore the new choice should favor cores over clock speed, and therefore smaller and lighter cores.
3) Consoles have for a while favored SIMD FP streaming performance over general purpose scalar stuff so the CPU should be balanced in that direction, and Jaguar is much more so than PD with its shared FPU.

So Jaguar makes sense over PD (or SR for that matter, whose improvements are mostly on the general purpose scalar code front), but there are some things that'd be nice..

1) Better than 1.6GHz clock speed.. some turbo at least would be good. While games may use 8 cores they won't use them all 100%, especially if one or two are dedicated to OS things.
2) An FMA instruction, which Jaguar lacks and which would be an easy perf win for consoles. But we don't know if this will be the case for the console version. Console designs don't have to follow release roadmaps that make sense for the PC world, so AMD could sneak something in at least a little early.
3) SMT. This was probably going to be out no matter what unless they went with PPC or MIPS again.. But it seems like a reasonable choice for consoles.
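The FMA point in 2) is easy to illustrate with instruction counting. A rough sketch (the workload and counts are illustrative assumptions, not measurements): the inner loop of a dot product or matrix multiply is a chain of multiply-adds, and without a fused instruction each one costs a separate multiply and add.

```python
# Illustrative sketch of why FMA is "an easy perf win": in multiply-add
# heavy code (dot products, matrix multiply, filters), a fused instruction
# halves the FP instruction count versus issuing mul and add separately.

def dot_op_count(n, has_fma):
    """FP instructions needed for an n-element dot product."""
    return n if has_fma else 2 * n  # one FMA vs one mul + one add per element

n = 1024
print(dot_op_count(n, has_fma=True))   # fused multiply-adds
print(dot_op_count(n, has_fma=False))  # separate muls and adds
```

FMA also performs a single rounding per operation, so it can be slightly more accurate as well as faster.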

Intel was always out of the question because they never would have licensed a decent core at a reasonable price. With AMD they get a big extra bonus of using the same vendor for CPU and GPU, and one who has developed a bunch of technologies for integrating the two on an SoC.

Cortex-A15 would have been an interesting situation. It uses 1.5W @ 1.7GHz on a leakage optimized (over power optimized) 32nm process.. I bet it would use under 5W @ 2.5GHz on a performance optimized 28nm process (possibly much less, possibly under 4W?), which would allow for under 40W for 8 cores full tilt which is a reasonable allotment. The IP configuration for 8 cores is in place already. But here they'd likely need to do a lot of the implementation/layout work themselves instead of getting it from AMD. They'd also need to work it with a memory controller from AMD or do their own - the one you can license from ARM is probably not sufficient for the task. And a bunch of middleware would have to be written for ARM/NEON where I'm sure it's already written for SSE4. So there'd be a lot of new work. But it'd have given a good bridge with future phone/tablet dev.
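The A15 power guess above can be sanity-checked with back-of-envelope scaling. Only the 1.5W @ 1.7GHz figure comes from the post; everything else here is an assumption. A naive linear frequency scale is a lower bound, since voltage must rise with clock and dynamic power goes roughly with f * V², so the "under 5W" bet leaves a bit over 2x headroom for voltage and process differences:

```python
# Hedged sketch of the Cortex-A15 budget guess. base figures from the post;
# the 5W-per-core worst case is the poster's bet, not a datasheet number.

base_w, base_ghz = 1.5, 1.7   # quoted: 1.5W @ 1.7GHz on 32nm
target_ghz = 2.5
cores = 8

linear_w = base_w * (target_ghz / base_ghz)  # naive lower bound per core
guess_w = 5.0                                # assumed worst case at 2.5GHz

print(f"linear scale: {linear_w:.2f}W/core, "
      f"8-core budget at {guess_w:.0f}W/core: {cores * guess_w:.0f}W")
```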
 
Now that we have some form of discussion going... I'd like to say I find it hard to believe Jaguar will be 4 times faster than Xenon or 1.5 times faster than Bobcat (let alone 2 times faster)... not even Intel gets such gains going from one generation to the next... call me jaded... but I've been through such talks about closed box consoles... and how it is their special sauce... people said a lot about EE, Cell, RSX and Xenon back then... but it always ended as fanbois reading off the checklists... so unless, brother itsmydamnation, you are a games developer?... pardon me for it..

To share my concerns about CPU bottlenecks... it is a real problem even on the more powerful PC platform... you will need an i7 3930K to comfortably run BF3 or PS2 mp at >60fps... a lesser (but still jawesome) i5 3570K will send your fps down to the 40s under load... true story... but now AMD is telling Sony/MS game developers to run their code on 1.6GHz Jaguar cores... call me skeptical, but even with the CUs of GPGPUs... I fear AI/physics/performance will just not reach the next level. AFAIK... GPGPU physics are still for eye candy and not gameplay physics...

The average recommended PC requirements of the newest batch of games (uprezzed PS3/Xbox ports) hover around a quad core x86 CPU at 2.8GHz... the minimum requirements are lower at 2.4GHz dual cores... can we assume the 8 Jaguar cores are as good as an Intel 2.4GHz Core 2 Duo...? It is not a good thought either way...

Should we be measuring the computational abilities of Durango and Orbis by simply looking at the Jaguar cores in isolation? Does the performance of APUs in the PC space even serve as a good model for the potential of APUs in the console space?

Correct me if I am wrong, but neither Jaguar nor Bobcat is manufactured in a strictly CPU core configuration for the PC space. APU CPUs aren't designed to run in isolation efficiently, as the GPGPU functionality is there to make up for the simplification of the CPU core design.

The whole development and software ecosystem of the console will be tailored to APU based hardware. AMD has already stated that one of its APU hardware issues in the PC space is that desktop OSes aren't designed to recognize GPUs as general purpose processors. The console space in general will more efficiently expose the computational abilities of APU hardware, as APUs in the PC space still make up a small portion of overall unit sales.

Not only will the console space push GCN use to make up for the shortcomings of the Jaguar cores, there will be an equal effort to make sure that the computational footprint used in the SIMD cores is as small as possible, as GPU functions will compete for that hardware. The PC space will actually benefit from that effort.
 
The Jaguar Hot Chips presentation is in the middle here.

http://youtu.be/_GXA38vFPXA


Also, I think we need a point of reference: general desktop/server performance against general desktop/server CPUs, or relative to Xenon/PPE and console gaming performance. We are only going to see quad core Temash/Kabini, yet the consoles will have 8 cores; that alone should show that console gaming workloads are very different to even desktop gaming workloads.

My question is, if people think Jaguar is so anemic, why did both MS and Sony pick it? You would assume they learnt a lot from the development of the last gen consoles. They would also know what developers want more of in terms of performance. Consoles are hopefully coming Q4; Jaguar is coming Q2 with SR in H2, so it would seem that both were options.


To share my concerns about CPU bottlenecks... it is a real problem even on the more powerful PC platform... you will need an i7 3930K to comfortably run BF3 or PS2 mp at >60fps... a lesser (but still jawesome) i5 3570K will send your fps down to the 40s under load... true story... but now AMD is telling Sony/MS game developers to run their code on 1.6GHz Jaguar cores... call me skeptical, but even with the CUs of GPGPUs... I fear AI/physics/performance will just not reach the next level. AFAIK... GPGPU physics are still for eye candy and not gameplay physics...

I average around 60 FPS @ 3840x1024 on 64 player servers using a 920 @ 3GHz, and I'm majorly GPU limited. It would seem BF3 scales performance with threads :) of which the consoles have many. But why do we keep comparing to PC gaming as the reference point? The consoles can run BF3 just fine on Xenon/Cell.
 
My question is, if people think Jaguar is so anemic, why did both MS and Sony pick it? You would assume they learnt a lot from the development of the last gen consoles. They would also know what developers want more of in terms of performance. Consoles are hopefully coming Q4; Jaguar is coming Q2 with SR in H2, so it would seem that both were options.

Cost. Heat. Size. Those are the major factors. With those being limiting factors you then try to get the CPU that you think is the best balance for the GPU that you choose (which is also limited by cost, heat, size).

Your 920 @ 3GHz, for example, would be overkill, not only in performance but in those 3 important aspects for a home console.

Regards,
SB
 
Cost. Heat. Size. Those are the major factors. With those being limiting factors you then try to get the CPU that you think is the best balance for the GPU that you choose (which is also limited by cost, heat, size).

Your 920 @ 3GHz, for example, would be overkill, not only in performance but in those 3 important aspects for a home console.

Regards,
SB

Which I completely agree with. The 8 cores of Jaguar @ 1.6GHz are likely using something around 15-25 watts. If we assume a 150-170 watt target for a console, that's 30-50 watts for memory controller/memory/I/O etc., and then a 90-100 watt GPU, which would be somewhere right around high-end Cape Verde / low-end Pitcairn, which is right in our FLOP range....... WHO WOULD HAVE THUNK IT!!?!?! :LOL:


Should also note that a 2.2GHz quad core PD Opteron is 35 watts and an 8 core @ 2.6GHz is 65 watts.
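The budget arithmetic above can be written out explicitly. All figures here are the post's rough assumptions, not measured numbers: a 150-170W total console budget, 15-25W for the 8 Jaguar cores, and 30-50W for memory controller/memory/IO.

```python
# Hedged sketch of the console power budget. Inputs are forum guesses:
# subtracting CPU and memory/IO from the total leaves the GPU allotment.

def gpu_budget(total_w, cpu_w, mem_io_w):
    """Watts left for the GPU after CPU and memory/IO are accounted for."""
    return total_w - cpu_w - mem_io_w

mid  = gpu_budget(total_w=160, cpu_w=20, mem_io_w=40)  # middle of each range
low  = gpu_budget(total_w=150, cpu_w=25, mem_io_w=50)  # pessimistic case
high = gpu_budget(total_w=170, cpu_w=15, mem_io_w=30)  # optimistic case

print(f"GPU budget: {low}-{high}W, mid estimate {mid}W")
```

The mid estimate lands right on the ~100W figure that points at high-end Cape Verde / low-end Pitcairn class parts.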
 
I think it was ERP who also raised the possibility that there might not have been an alternative CPU ready for 28nm.

32 nm would cost you too many transistors, and Steamroller is nowhere near ready yet.
 
I think it was ERP who also raised the possibility that there might not have been an alternative CPU ready for 28nm.

32 nm would cost you too many transistors, and Steamroller is nowhere near ready yet.

SR is releasing in H2 of this year, so it's not that far away from Jaguar being released, which is Q2.
 
That 6~9 months could make all the difference, as customisation for a console part wouldn't be instantaneous or risk free, plus as SR was further out there would have been more uncertainty around selecting it 2+ years ago.

AMD's roadmaps have wobbled a bit over the last 12 months, with SR disappearing and then reappearing on the 2013 schedule. For AMD's sake I hope it makes it. Not that Richland isn't impressive, it's just that AMD need to close the gap faster. Richland looks to be around Sandybridge on the CPU side and probably around Haswell on the GPU front.
 
I never said that Steamroller was off the roadmap, I said that it disappeared off the 2013 schedule for a while.

http://www.donanimhaber.com/islemci...-ailesi-icin-2013te-guncelleme-gorunmuyor.htm

The performance 32nm part is still AWOL. And we can't assume that just because AMD could squeeze out Kaveri this year, they'd have time to customise it and integrate it into a bespoke chip that they could mass produce and supply in time for shipping millions of consoles by October / November this year.
 
The performance 32nm part is still AWOL. And we can't assume that just because AMD could squeeze out Kaveri this year, they'd have time to customise it and integrate it into a bespoke chip that they could mass produce and supply in time for shipping millions of consoles by October / November this year.

That would purely be a function of how much MS and Sony want to pay, which of course would factor into any Jaguar vs SR comparison. Also, if the performance gap was large enough, would they wait 3 months for something that is going to last 7 or so years?
 
The A10-4600M is a 35W TDP CPU with "4" Piledriver cores at 2.3GHz (turbo up to 3.2),
and it has a full GPU inside with 384 SPs.

Keeping 2.3GHz as the target, and excluding the GPU and memory controller (shared with the external GPU?), couldn't they have an 8 core version (without L3 cache) at a pretty low TDP?

It would certainly be a lot faster. Or is there something else to consider, like size, or the design being optimized for the GlobalFoundries process?
 
Well, a BD/PD module on 32nm SOI was about 30mm² (including L2; about 18 without), so it would be twice the die size. The way Kabini is also getting an Ax name should tell you about its performance relative to both Trinity and Bobcat.
 
The Jaguar Hot Chips presentation is in the middle here.

http://youtu.be/_GXA38vFPXA

Cheers


itsmydamnation said:
Also, I think we need a point of reference: general desktop/server performance against general desktop/server CPUs, or relative to Xenon/PPE and console gaming performance. We are only going to see quad core Temash/Kabini, yet the consoles will have 8 cores; that alone should show that console gaming workloads are very different to even desktop gaming workloads.

Certainly compared to Xenon/Cell the new CPUs are a massive step up. Comparing core numbers to Temash and Kabini doesn't make sense though, since they are APUs aimed at the netbook and tablet markets, i.e. non-gaming systems.

At the high end, PCs are actually using 8 thread CPUs in a lot of cases (4 module Bulldozers and 4C/8T i7s). Not that it matters, since when it comes to gaming, the new console workloads will very quickly become PC workloads.

itsmydamnation said:
Also if the performance gap was large enough would they wait 3 months for something that is going to last 7 or so years?

I'd say no, because then they'd miss Christmas. So really their competitor would have closer to a year's lead on them when it comes to the big sales period, rather than 3 months. That's too much to risk IMO.

function said:
Richland looks to be around Sandybridge

That sounds a bit optimistic to me (assuming a dual core Sandybridge at that).
 
1) PC games on Trinity tend to be GPU limited, not CPU limited, and they're likely to use more CPU time than if they were developed for a console of similar hardware (driver overhead, etc). Therefore it's not the ideal balance for games, which makes sense since it's meant to be an adequate general purpose PC CPU too.

But pair up Trinity with a discrete 7970 and then benchmark that against a 7970 running with a top end Ivybridge at reasonable resolutions and you'll probably find yourself CPU limited. More CPU performance is always going to be a good thing, but obviously the trade off has to be made for size and power too.

Whether MS/Sony considered this trade off to be a good one in the case of Jaguar or whether they simply worried Steamroller wouldn't be ready in time I guess we'll never know.

2) Consoles have pushed multithreading requirements pretty hard already last gen, and can easily make good use of 6+ threads. Therefore the new choice should favor cores over clock speed, and therefore smaller and lighter cores.

I don't see why. The same performance across fewer threads makes the developer's life easier, so if it can be achieved in the same power/size envelope as well then I'd assume it to be preferable. Consoles went heavily threaded through necessity, not because it's necessarily better than fewer threads with the same total performance.

3) Consoles have for a while favored SIMD FP streaming performance over general purpose scalar stuff so the CPU should be balanced in that direction, and Jaguar is much more so than PD with its shared FPU.

But Piledriver's FPU is twice as wide and runs at a much faster clock speed, so it balances out. This is what ExtremeTech say about the FPU in Jaguar compared with Piledriver:

http://www.extremetech.com/gaming/142163-amds-next-gen-bobcat-apu-could-win-big-in-notebooks-and-tablets-if-it-launches-on-time

Extremetech said:
FPU performance won’t match Trinity — Jaguar can only decode two instructions per clock cycle, compared to four for the larger core — but it should substantially improve over Bobcat.
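The "balances out" claim can be sanity-checked with theoretical peaks. The per-cycle figures are assumptions from public descriptions of the cores, not benchmarks: a Piledriver module's shared FPU has two 128-bit FMAC units (8 single-precision FMAs, i.e. 16 FLOPs/cycle per module), while a Jaguar core has separate 128-bit add and multiply pipes (8 FLOPs/cycle per core, no FMA).

```python
# Hedged peak-FLOPs comparison: a 2-module A10-4600M vs 8 Jaguar cores.
# units = modules for Piledriver, cores for Jaguar; figures are assumptions.

def peak_sp_gflops(units, clock_ghz, flops_per_cycle):
    return units * clock_ghz * flops_per_cycle

a10_base  = peak_sp_gflops(2, 2.3, 16)  # A10-4600M at 2.3GHz base
a10_turbo = peak_sp_gflops(2, 3.2, 16)  # same part at 3.2GHz max turbo
jaguar8   = peak_sp_gflops(8, 1.6, 8)   # 8 Jaguar cores at 1.6GHz

print(f"A10 base: {a10_base:.1f}, A10 turbo: {a10_turbo:.1f}, "
      f"8x Jaguar: {jaguar8:.1f} GFLOPs")
```

On these assumptions the eight Jaguars only match the A10's peak at full turbo, which is roughly the "comparable SIMD throughput" point made earlier in the thread.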
 
But pair up Trinity with a discrete 7970 and then benchmark that against a 7970 running with a top end Ivybridge at reasonable resolutions and you'll probably find yourself CPU limited. More CPU performance is always going to be a good thing, but obviously the trade off has to be made for size and power too.

Whether MS/Sony considered this trade off to be a good one in the case of Jaguar or whether they simply worried Steamroller wouldn't be ready in time I guess we'll never know.

It isn't as simple as that though, as you've somewhat mentioned. They also have to worry about picking a CPU that is too powerful and then end up being always GPU bound. The optimal setup will be one where you are CPU and GPU bound in roughly equal proportions depending on the workload. Obviously some games will favor the CPU more and some will favor the GPU more. But in general you want them to be as balanced as possible.

Just like a 7970 on Trinity would likely be worthless, a 7850 on an i7 3970X would also be worthless.

An argument could surely be made that Jaguar is likely to be more CPU than GPU limited with a 7870 and possibly a 7850. But if you consider that one of the largest consumers of CPU power in a game is physics, then it starts to not seem so bad. At least on Orbis, 4 CUs appear to be reserved for just such a workload. Whether it'll actually be useful for interactive physics remains to be seen however (GPU compute units thus far haven't been good for interactive physics). If it isn't, then yes, we're likely to be CPU bound in most cases, at which point feel free to go hog wild with additional graphics effects or non-interactive GPU compute friendly physics effects.

Regards,
SB
 
At least on Orbis 4 CUs appear to be reserved for just such a workload. Whether it'll actually be useful for interactive physics remains to be seen however (GPU compute units thus far haven't been good for interactive physics).

That shouldn't be a problem for these consoles, given they'll both be HSA (presumably). The high speed link between CPU and GPU along with the shared memory space should make gameplay-affecting physics calculations on the CUs entirely possible.

That's partly the reason why it's not such a big deal that the Jaguars themselves don't have huge SIMD performance. Part of the philosophy behind HSA is that the GPU's SIMD resources can be used to supplement those of the CPU.

This should be possible on the PC as well using APUs, but I'm not sure of the practicalities in that space.
 