Thoughts on next gen consoles CPU: 8x1.6Ghz Jaguar cores

Discussion in 'Console Technology' started by gongo, Feb 2, 2013.

  1. gongo

    Regular

    Joined:
    Jan 26, 2008
    Messages:
    582
    Likes Received:
    12

    I find it woefully underpowered...and a potential bottleneck..? Is there even a turbo mode in this thingie 'cos 1.6GHz is W-T-F? How is the memory controller for it? How did AMD convince both Sony and MS to take it...? Flash back to IBM convincing them the in-order castrated PPE was a good evil...

    I don't know how it compares in a closed box...so here is a bench graph from AT comparing its predecessor to the venerable P4
    http://www.anandtech.com/bench/Prod....31.32.33.34.35.36.37.38.39.40.41.42.43.45.46

    Mumblings from the developer community are that the Jaguar CPU is a side/downgrade from the last gen console CPUs...

    Further thoughts?
     
  2. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,311
    Likes Received:
    411
    Location:
    Australia
    thoughts,

    1. 4x the number of cores, 2x the throughput per core of your P4 comparison.
    2. how is 8x ~1-1.X IPC threads @ 1.6GHz sideways from 6x 0.2 IPC threads?
    3. where are these mumblings?
    4. all the stuff the core can do many times better than Xenon's/PPE: branch prediction, moves between registers, prefetchers etc.
    5. there is a balance of cost/power/performance; within a given TDP (say 25-30 watts), what would you do for the CPU?
    6. a Jaguar CU doesn't have a memory controller; that is an SoC-level unit.
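
A rough sketch of the arithmetic behind point 2, using only the ballpark IPC figures quoted in the list. The 3.2 GHz Xenon clock and the 3 cores x 2 hardware threads layout are assumptions not stated in this thread, the Jaguar IPC is taken at the low end of the "~1-1.X" range, and the function name is purely illustrative.

```python
# Back-of-envelope aggregate instruction throughput.
# All inputs are forum estimates or assumptions, not measurements.

def aggregate_gips(threads: int, ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second summed over all hardware threads."""
    return threads * ipc * clock_ghz

# Xenon: assumed 3 cores x 2 hardware threads at 3.2 GHz, ~0.2 IPC per thread.
xenon = aggregate_gips(threads=6, ipc=0.2, clock_ghz=3.2)

# Jaguar: 8 cores at 1.6 GHz, low end (1.0) of the quoted IPC range.
jaguar = aggregate_gips(threads=8, ipc=1.0, clock_ghz=1.6)

print(f"Xenon  ~{xenon:.2f} GIPS")
print(f"Jaguar ~{jaguar:.2f} GIPS")
print(f"ratio  ~{jaguar / xenon:.1f}x")
```

Under those assumptions the eight Jaguar cores come out a bit over 3x ahead in aggregate, which is the sense in which the post disputes "sideways". Real workloads will not hit a flat IPC on either chip.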


    this thread is just as crap as 95% of the new threads popping up in this subsection.
     
  3. Hecatoncheires

    Newcomer

    Joined:
    Jan 11, 2013
    Messages:
    179
    Likes Received:
    0
    The new consoles are most likely to use a heterogeneous processor architecture. AMD calls it "HSA". What's special about HSA is that different kinds of cores, in this case x86 Jaguar cores and GCN streamprocessors, can be utilized to work together as a combined processor. AMD calls such a processor an "APU".

    X86 Jaguar cores are pretty smart, but you only have very few of them. On the other hand, GCN streamprocessors are extremely dumb, but you have literally hundreds of them in your APU. So you want your smart Jaguars to do runtime intensive tasks and your GCN streamprocessors to do parallelizable tasks. This video explains it in a very easy to understand way: HSA explained

    It's a little bit like the Cell in the PS3, which is also a heterogeneous processor. Heavily abstracted, one could say that instead of the PPE you will have the 8 Jaguars, and instead of the eight SPEs you will have a couple of GCN streamprocessors.
     
  4. gongo

    Regular

    Joined:
    Jan 26, 2008
    Messages:
    582
    Likes Received:
    12
    Now now...let's not go into the name calling bits...the purpose of this thread is to invite more experienced console developers to share their views on the new CPUs....we all know what AMD has said about HSA and what not....reading off their list sheet is nice and all...but the perspective is narrowed down to specifics...
     
  5. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,311
    Likes Received:
    411
    Location:
    Australia
    if you wanted to know what people think you would do something like:

    So I don't really understand the choice of Jaguar as a console CPU. From what I can see, performance per clock seems on the low side and it isn't designed to clock high. The large number of cores could be useful, but how likely are they to be utilised, and how much extra developer effort is there in getting good performance out of 8 weakish lower-clocked threads than, say, 4 more powerful and higher-clocked threads?

    post something like that and you might get the response you're pretending you're after.
     
  6. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,712
    Likes Received:
    131
    I agree 1.6GHz seems a little low (was expecting ~2GHz), but I don't know what other choice you were expecting them to make.

    Their options were basically:

    X64: Jaguar

    ARM: Cortex-A15

    I thought MS might go with ARM & possibly even design their own core like Apple, but that didn't happen.

    I suppose they could have waited for Cortex-A57 or Denver, but that would have pushed the launch even further back.
     
  7. gongo

    Regular

    Joined:
    Jan 26, 2008
    Messages:
    582
    Likes Received:
    12
    W-T-F....someone seems cranky today..it could just be me being exposed to faster consumer oriented CPUs today ...but now now...however you put it, let's stay on subject...

    Are there better alternatives..? How about Piledriver/Vishera from AMD...? That was the early rumored PS4 CPU..Intel hex-core was rumored to be in 720 at one point...but in the end both went with the stock low power Jaguar cores....no hint of special cpu sauce sadly even...

    Another question, how many and what devices are powered by Bobcat today?
     
  8. Jugix

    Newcomer

    Joined:
    Mar 4, 2008
    Messages:
    102
    Likes Received:
    2
    How would you address the added 100W heat output in a small closed box if the CPU was switched to Piledriver? Take half of the GPU CUs away and bitch about weak GPU? :twisted:
     
  9. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,798
    Likes Received:
    1,080
    Location:
    Guess...
    Maybe I'm misinterpreting your meaning here but Jaguar does not have 2x the throughput per core of Bobcat. It has an optimistic 15% improvement going off AMD's own slides.

    Obviously when you add in that 15% and take account of the fact that there are 4x as many cores in the consoles, the comparison doesn't look so great for the P4: the best case scenario based on those benchmarks (assuming linear scaling, which won't be the case, but whatever) has the console CPU ~4.5x faster than a single core P4 at 3.6GHz.

    That in itself doesn't really come across as great but a more interesting comparison IMO is this one with the A10-5800K:

    http://www.anandtech.com/bench/Product/328?vs=675

    In most cases even 4x the cores and a 15% improvement on top wouldn't quite be enough to match a pair of Piledriver modules (albeit at 3.8GHz). And obviously the single thread performance vs Piledriver is pretty dire at around 1/3rd.

    That certainly raises interesting questions around the decision of using Jaguar rather than the upcoming Steamroller cores that will feature in Kaveri.

    A high end Kaveri in the PC space will feature 2 Steamroller modules which, while comparable to the console CPUs overall, will still be noticeably faster in multithreaded scenarios and vastly faster in single threaded code. But then consider that the console CPUs double the standard PC Jaguar configuration, so if the same would have held true for Steamroller....
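
For reference, the linear-scaling estimate used earlier in this post (4x the cores plus AMD's quoted 15% IPC gain) works out as below. This is only a sketch: it assumes the Bobcat part in the benchmark is a dual-core running at a similar clock to the console CPU, that scaling with core count is perfectly linear (which the post itself says it won't be), and the function name is made up for illustration.

```python
def linear_scaling_factor(core_multiplier: float, ipc_gain: float) -> float:
    """Idealised speedup over the baseline chip: more cores times better IPC."""
    return core_multiplier * (1.0 + ipc_gain)

# 8 console cores vs an assumed 2-core Bobcat baseline, plus the 15% IPC claim.
factor = linear_scaling_factor(core_multiplier=4, ipc_gain=0.15)
print(f"best-case multithreaded speedup over the Bobcat result: ~{factor:.1f}x")
```

The result is an upper bound on multithreaded throughput only. Single-threaded performance gets just the IPC term.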
     
    Mobius1aic likes this.
  10. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,311
    Likes Received:
    411
    Location:
    Australia
    about 40 million devices: HTPCs, webtops etc., low end desktops. The point is Jaguar is a significant leap above Bobcat. An 8 core Jaguar should have about 4x the performance of Xenon in what was by far its strongest suit, let alone its weakest one. If code was only vectors then you might be disappointed, but it isn't, and no one has done a good job of comparing integer performance between the two.

    64bit FP ALUs vs 128bit, so really it's 2.3 times the throughput.
     
  11. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,798
    Likes Received:
    1,080
    Location:
    Guess...
    Where do you get 100w from? The entire Trinity APU comes in at around that so there's no way that the delta between 8 jaguar cores and 2 Piledriver modules is 100w.
     
  12. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,311
    Likes Received:
    411
    Location:
    Australia
    two Piledriver modules have a total of 4 128bit FMA units. 8 Jaguar cores have 8 128bit ADDs and 8 128bit MULs. Piledriver would have double the clock, but unless all your code is FMA, FP throughput would likely be lower. That is my guess as to why both Sony and MS picked Jaguar over PD/SR.
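
A simplified model of that argument, as a sketch rather than a measurement. It assumes 4 single-precision lanes per 128-bit unit, that an FMA unit issues one vector op per cycle (worth 2 flops per lane if fused, 1 if a plain add or mul), that Jaguar keeps both its ADD and MUL pipes busy every cycle, and it takes "double the clock" literally as a 2.0 ratio. All names are illustrative.

```python
LANES = 4  # single-precision floats in one 128-bit vector

def piledriver_flops_per_jaguar_cycle(fma_fraction: float,
                                      clock_ratio: float = 2.0) -> float:
    """Flops delivered by 2 Piledriver modules per Jaguar clock cycle.

    fma_fraction is the share of issued vector ops that are fused
    multiply-adds; the rest are plain adds or multiplies.
    """
    fma_units = 2 * 2                     # two modules, two 128-bit FMA units each
    flops_per_lane = 1.0 + fma_fraction   # 1 flop for add/mul, 2 for FMA
    return fma_units * LANES * flops_per_lane * clock_ratio

def jaguar_flops_per_cycle() -> float:
    """Flops per cycle for 8 Jaguar cores, one ADD and one MUL pipe each."""
    cores = 8
    return cores * LANES * 2

for f in (0.0, 0.5, 1.0):
    ratio = piledriver_flops_per_jaguar_cycle(f) / jaguar_flops_per_cycle()
    print(f"FMA fraction {f:.1f}: Piledriver/Jaguar = {ratio:.2f}")
```

With the clock ratio at exactly 2.0, the two only break even when every vector op is an FMA, which is the point being made above. Passing a higher `clock_ratio` moves the break-even point lower.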
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,798
    Likes Received:
    1,080
    Location:
    Guess...
    In SIMD capability yes but that's obviously not a particularly key measure in overall CPU performance - especially considering the HSA nature of the consoles. IPC is 15% higher like AMD themselves say.

    And I don't think you can combine the IPC and SIMD improvements in that way. The reality is that a doubling of width in the SIMD units isn't going to directly relate to twice the SIMD performance anyway since other elements of the core which haven't doubled will also factor in on the real world output.
     
  14. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    1,711
    Likes Received:
    698
    Location:
    Somewhere over the ocean
    I was thinking about the clock and turbo too
    To have a system with predictable performance you can't let it adjust the frequency on its own based on thermal room, but maybe they can provide an API to let the developer run one or more cores at max frequency only in a particular section of the code
    And btw if at least one core is dedicated to the OS, during game sections it must stay at 800MHz or an even lower power state, leaving more room to increase the other cores' frequency
     
  15. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,798
    Likes Received:
    1,080
    Location:
    Guess...
    At 3.8Ghz (A10-5800K speed) a pair of piledriver modules have a peak of 121.6 GFLOPS compared to the 102.4 GFLOPs of the 8 Jaguar cores in the consoles running at 1.6Ghz.
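
Those two peak figures can be reproduced with the sketch below. It assumes single precision (4 lanes per 128-bit unit), counts an FMA as 2 flops per lane, and treats each Jaguar core's ADD pipe plus MUL pipe as 2 flops per lane per cycle. The unit counts are the ones given earlier in the thread and the function name is illustrative.

```python
def peak_sp_gflops(units: int, flops_per_lane_per_cycle: int,
                   clock_ghz: float, lanes: int = 4) -> float:
    """Theoretical peak single-precision GFLOPS for a set of 128-bit units."""
    return units * lanes * flops_per_lane_per_cycle * clock_ghz

# 2 Piledriver modules = 4 x 128-bit FMA units, each FMA = 2 flops per lane.
piledriver = peak_sp_gflops(units=4, flops_per_lane_per_cycle=2, clock_ghz=3.8)

# 8 Jaguar cores, each with one 128-bit ADD and one 128-bit MUL per cycle.
jaguar = peak_sp_gflops(units=8, flops_per_lane_per_cycle=2, clock_ghz=1.6)

print(f"Piledriver x2 modules @ 3.8 GHz: {piledriver:.1f} GFLOPS")
print(f"Jaguar x8 cores       @ 1.6 GHz: {jaguar:.1f} GFLOPS")
```

Both are peak numbers: Piledriver only reaches its figure on pure FMA code, and Jaguar only with a perfectly balanced add/multiply mix.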
     
  16. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,311
    Likes Received:
    411
    Location:
    Australia
    So what needs to double that hasn't, given that at Hot Chips they said they doubled everything they needed to in order to double throughput? Also, unless games are very FPU heavy, Xenon and Cell would have been useless, so I'm going to guess they are FP heavy :lol: . Why would the 15% IPC improvement only apply to int code but not FPU?

    yes, which is pretty much what I said, but go look at benchmarks that show FMA vs just AVX/SSE4: nothing is seeing a performance doubling.
     
  17. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,463
    Likes Received:
    15,903
    Location:
    Under my bridge
    Games use whatever resources you have available. If the system architecture is FP strong, you use FP-heavy code. If it's branchy and memory strong, you use branchy and memory hungry code.
     
  18. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,712
    Likes Received:
    131
    The consoles are working with a finite power and heat budget. Every additional watt a Piledriver CPU would have used would have been one not available for the GPU or bandwidth.

    Jaguar supports SSE 4.2 and AVX, so it is about as feature complete as any X64 chip available today.
     
  19. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,798
    Likes Received:
    1,080
    Location:
    Guess...
    Fair enough, let's say it can sustain double the peak SIMD throughput for the sake of argument.

    Cell and Xenon were fairly useless. Okay, that's not true, they were decent SIMD performers, but unless you don't pay much attention to developers' posts on these boards you will know that is far from the main or only driver of CPU performance. You simply cannot just look at how many flops a CPU can push out and conclude its overall power from that. In fact that would be a fairly ludicrous thing to do. If that were the case Haswell would be jumping off the blocks with twice the general performance of Ivy Bridge regardless of the benchmark. Which we know is obviously not the case.

    The 15% IPC will come from tweaks to the overall pipeline, changes in the cache implementation and of course, from the doubling in width of the SIMD units themselves. To say you first double the SIMD units and then add an extra 15% for improved IPC is basically counting the same thing twice.

    Probably because those benchmarks aren't using FMA instructions. In consoles however if that's what's available then that's what developers will use where possible.

    Yes, in a console, Jaguar will have near twice the real world SIMD throughput of Bobcat, but that's not the only way to measure CPU performance and the fact is, in the benchmarks originally posted, the doubled SIMD performance wouldn't have made much difference in the overall comparison. You wouldn't have been seeing double Bobcat performance, you'd have been seeing 15% (at best) more performance.
     
  20. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,311
    Likes Received:
    411
    Location:
    Australia
    no it isn't, again watch the Hot Chips presentation: IPC comes from a more aggressive front end (additional L2 predictor/fetch, more aggressive core prefetch/predictor), improved scheduling and a big improvement in OOO load and store capabilities. They even go as far as giving the overall IPC improvement for each area.



    No, I'm talking about benchmarks which compare Bulldozer AVX vs FMA compiled code. Of course in a SR console devs would take every chance to write code that could be FMA'd, but really how often is that going to be feasible?


    Yes it will, and devs will try to use every last drop of it.

    And yet games developed for a Jaguar console won't be "normal" applications.


    edit: the other thing that was said is Jaguar would get quite a bit more than a 15% IPC increase on single threaded workloads, as a single core can use all of the L2 and the L2 predictor/prefetcher would in effect be dedicated.
     
    #20 itsmydamnation, Feb 2, 2013
    Last edited by a moderator: Feb 2, 2013