New AMD low power X86 core, enter the Jaguar

Discussion in 'PC Industry' started by liolio, Aug 28, 2012.

  1. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Obviously die cost would go up in this case.

    Power may not necessarily be lower: although your dynamic power may be equal (or even better, due to a lower voltage), the static (leakage) power may well increase, so it may end up as a tradeoff in peak power scenarios. Idle power will surely increase, so this, coupled with increased die costs, will dictate smaller being better.
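A rough sketch of that trade-off, assuming the usual first-order model (dynamic power proportional to C·V²·f, static power as V times leakage current). Every number below is hypothetical and chosen only to show the shape of the argument; none of them are measurements of any AMD part.

```python
def power_breakdown(c_eff, voltage, freq_hz, leak_current):
    """Return (dynamic, static) power in watts under a first-order model."""
    dynamic = c_eff * voltage ** 2 * freq_hz  # switching power: C * V^2 * f
    static = voltage * leak_current           # leakage power: V * I_leak
    return dynamic, static


# Hypothetical small die: fewer units, higher clock and voltage.
small = power_breakdown(c_eff=1.0e-9, voltage=1.00, freq_hz=500e6, leak_current=0.20)

# Hypothetical big die: twice the switched capacitance and leakage,
# but run at a lower clock and voltage.
big = power_breakdown(c_eff=2.0e-9, voltage=0.85, freq_hz=350e6, leak_current=0.40)

for name, (dynamic, static) in (("small", small), ("big", big)):
    print(f"{name}: dynamic={dynamic:.3f} W, static={static:.3f} W, "
          f"total={dynamic + static:.3f} W")
```

With these made-up inputs the dynamic power of the two configurations is nearly identical, while the static term is clearly higher on the bigger die. That static term is also what remains at idle.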
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Die size? Kabini is a bit smaller than an IVB dual-core, although the latter isn't an SOC.
    It's between the Apple A5 and A6 in size.
     
  3. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    After a long search, it seems Core2 and Nehalem can both decode 4 instructions: 3 simple and 1 complex fused micro-ops.

    In Sandy Bridge, Intel claims 5 instructions can be decoded (probably made possible by the new uop cache): 3 simple, 1 fused and 1 more macro-fused. My source here is Realworldtech's article; although it states there are 5 instructions, the diagrams show only 4!
    http://www.realworldtech.com/sandy-bridge/4/

    In Haswell it's basically the same: 5 instructions, 3 simple and 2 fused.

    In Bulldozer it is as you said.
     
  4. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Well, that "4" wasn't entirely correct. Even Core2 could theoretically, thanks to macro-op fusion, decode 5 instructions per clock (4 "normal" and 1 fused). And since those 5 decoded instructions would only be 4 uops, the rest of the chip can handle them easily as well (4 uops can go into the ROB per clock, and be retired as well). I guess since Sandy Bridge (where the uops can come from the uop cache) those 4 uops could theoretically represent more than 5 x86 instructions, but I don't think it would actually be possible to execute them at the same time (because macro-op fusion is mostly compare+jump, and Core2 to Ivy Bridge cannot execute more than one such instruction per clock). With Haswell it could possibly work (as it should be able to handle two branches per clock): the throughput would still be 4 uops per clock, but those could possibly represent 6 x86 instructions.
    But anyway, this is a highly theoretical value. The idea behind a CPU design is to increase real-world IPC. Any idiot can build an 8-wide in-order architecture (ok, not quite idiot-proof with x86 due to complex decoding) with a theoretical IPC of 8 that achieves 0.1 in practice and just burns power on all the unused parts of the chip. Intel did lots of things to increase real-world IPC since Core2 while not really making the design wider (with the exception of Haswell: while still restricted to 4 uops per cycle, there are now 8 execution ports instead of 6).
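The counting above can be written down as a tiny sketch. It only models the bookkeeping described in this post (a fixed number of uop slots per clock, with each macro-fused pair packing two x86 instructions into one slot). The function name and parameters are made up for illustration, and it says nothing about what a real decoder sustains.

```python
def peak_x86_per_clock(uop_slots, fused_pairs):
    """Upper bound on x86 instructions per clock.

    uop_slots   -- uops the pipeline accepts per clock (4 in the post)
    fused_pairs -- macro-fused compare+jump pairs among them, each
                   occupying one slot but counting as two instructions
    """
    if not 0 <= fused_pairs <= uop_slots:
        raise ValueError("fused_pairs must be between 0 and uop_slots")
    return uop_slots + fused_pairs


# One fused pair per clock (Core2 through Ivy Bridge, per the post).
one_branch = peak_x86_per_clock(uop_slots=4, fused_pairs=1)

# Two fused pairs per clock (the Haswell case speculated about above).
two_branches = peak_x86_per_clock(uop_slots=4, fused_pairs=2)

print(one_branch, two_branches)
```

In both cases the machine still moves 4 uops per clock; only the number of x86 instructions those uops stand for changes.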
     
  5. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Got it, thanks.
     
  6. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Somebody actually did it already -- http://en.wikipedia.org/wiki/MP6

    :lol:
     
  7. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    :shock: I have the netbook based on a 1GHz mp6 :razz:, turned into a SoC with a 1GHz CPU, 256K full-speed L2, 2D graphics and sound. It's a 1.2 watt x86 SoC; the laptop is called Gecko Edubook and dates from 2009, but it has to be repaired (if possible).

    I thought it was kind of a 486. Not strictly equal to the mp6, surely: for one thing, the Rise mp6 boasts about MMX, but the derived SoCs at best have compatibility at extremely reduced performance.
     
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Isn't the GPU power-gated in Kabini? I know that even power-gated units still leak a bit, that is the gate itself leaks some power, and the bigger the unit, the bigger the gate; the bigger the gate, the bigger the gate leakage. But is it significant enough to matter?

    After all, Apple seems to be pretty happy with that kind of trade-off (obviously on different designs).
     
  9. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Don't forget the clock of the 3.9W Temash part is already _very_ low (Apple easily exceeds that frequency, though I know frequency alone doesn't tell you much). I doubt you could gain anything at all by decreasing it further and using more units instead. Even the other 8/9W parts all don't exceed 300MHz (ok, one does with turbo), which is still very low for an architecture which apparently is designed to reach 1GHz (ok, maybe on a slightly different process?).
    So more CUs would probably only start being slightly helpful at 15W and certainly at 25W.
     
  10. Accord1999

    Newcomer

    Joined:
    Jun 21, 2003
    Messages:
    133
    Likes Received:
    6
    It's a piece of cake for IVB, I have a desktop Pentium 2020 at 2.9 GHz that uses <18W with both cores running Prime 95 torture testing and ~15.5W with Cinebench 11.5.
     
  11. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    and what about all your I/O etc? you using the on chip GPU?
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I agree that 2 CUs was the correct choice for Temash tablet APU. My critique was targeted towards Kabini APU.

    The Kabini notebook APU is just a (2x) overclocked Temash (no extra cores, and no extra CUs). If they had spent more resources to create/validate a separate SOC configuration (with 4 CUs) for Kabini (instead of just upping the clocks), they could have created an APU with both (slightly) lower TDP and (slightly) higher GPU performance. The manufacturing cost would of course have been slightly higher as well (+2 CUs require a small amount of extra die space).
     
  13. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    You would then need dual channel memory to increase that GPU performance. That'd be starting to be another class of system.
    If you want a low watt notebook with a faster GPU then you should probably look for an underclocked Richland.
     
    #233 Blazkowicz, May 25, 2013
    Last edited by a moderator: May 25, 2013
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Ok for Kabini only it probably would make sense. I guess though AMD didn't feel like manufacturing separate dies (or just always going with a bigger die), the gains might not have been worth that.

    Not really. We're not talking about doubling GPU performance, just something like 4 CUs at 350MHz instead of 2 CUs at 500MHz. Also, you could theoretically go to slightly higher clocked DDR3L-1866 if more CUs at lower clocks manage to save you some power.
    I wonder though if 4 CUs at low clocks would really be faster. Lowering clocks and adding more CUs might shift bottlenecks in the GPU elsewhere significantly (i.e. that one quad-ROP block now looks a bit underspecced, same for setup, which can only do 1/4 prim/clock), which might need more significant rearchitecting to make this worthwhile.
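To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the standard GCN figures of 64 ALU lanes per CU and one fused multiply-add (2 FLOPs) per lane per clock, and it treats the quad-ROP block as 4 pixels per clock. The 350MHz and 500MHz configurations are the hypothetical ones from this post, and the helper names are made up.

```python
LANES_PER_CU = 64   # GCN: four SIMD16 units per compute unit
FLOPS_PER_LANE = 2  # one fused multiply-add per lane per clock


def peak_gflops(cus, clock_mhz):
    """Theoretical single-precision ALU throughput in GFLOPS."""
    return cus * LANES_PER_CU * FLOPS_PER_LANE * clock_mhz / 1000.0


def peak_gpixels(rops, clock_mhz):
    """Theoretical pixel fill rate in Gpixels/s."""
    return rops * clock_mhz / 1000.0


# (CUs, MHz) for the two configurations compared in the post.
configs = {"2 CUs @ 500MHz": (2, 500), "4 CUs @ 350MHz": (4, 350)}

for name, (cus, mhz) in configs.items():
    print(f"{name}: {peak_gflops(cus, mhz):.1f} GFLOPS, "
          f"{peak_gpixels(4, mhz):.2f} Gpix/s with one quad-ROP block")
```

On paper the wider configuration gains about 40% ALU throughput, while everything that was not doubled (ROPs, setup) loses 30% with the clock. That is the bottleneck shift in question.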
     
  15. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
  16. RedVi

    Regular

    Joined:
    Sep 12, 2010
    Messages:
    407
    Likes Received:
    59
    Location:
    Australia
    I get the feeling from both Richland and Kabini products that they didn't want to go for too high-end expensive and potentially more power hungry memory. Richland Mobile was meant to launch with a DDR3 1833MHz capable part, but they scrapped it and upped the GPU clock instead.
     
  17. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Thanks for the comparison. That's 80% IPC of Stars (K10).

    Some clock normalized scores of Trinity compared to K10 (http://www.tomshardware.com/reviews/a10-5800k-a8-5600k-a6-5400k,3224-14.html):
    7-Zip 77%, Sandra Dhrystone 81%, Whetstone 97%. It's disheartening to notice that K10 is now 6 years old, and it is still the AMD chip with the best IPC.

    It would be great if someone had the time and energy to test K10 + Bulldozer + Piledriver + Jaguar at identical clocks (2.0 GHz). That would give us a clearer view of how AMD's IPC has progressed (or regressed) in recent years.
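For anyone who wants to redo this with their own runs, clock normalization is just score per GHz relative to a reference chip. A minimal sketch follows; the scores and clocks in it are placeholders, not the Tom's Hardware numbers, and the function name is made up. It also assumes the benchmark scales linearly with clock, which memory-bound tests will not.

```python
def relative_ipc(score, clock_ghz, ref_score, ref_clock_ghz):
    """Per-clock performance of a chip relative to a reference chip.

    Assumes the score scales linearly with core clock.
    """
    return (score / clock_ghz) / (ref_score / ref_clock_ghz)


# Placeholder numbers: a chip scoring 3040 at 3.8 GHz against a
# reference scoring 3000 at 3.0 GHz.
ratio = relative_ipc(score=3040, clock_ghz=3.8, ref_score=3000, ref_clock_ghz=3.0)
print(f"{ratio:.0%} of the reference chip's per-clock performance")
```

Running every chip at the same fixed clock, as suggested above, removes the linear-scaling assumption entirely.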
     
  18. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    I can do Deneb, but my Piledriver box runs ESXi, so it would be run on a guest. When I tested it after first getting it, though, I was within 1%-2% of non-VM'd machines.
     
  19. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    At the same time, it's really impressive how much performance they retained in Jaguar while keeping it small. Anandtech says each core is only 3.1 mm² (excluding L2). Techreport shows it approaching the IPC of a single-channel ULV i3 (though I suspect that platform was gimped in some other way).
     
  20. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Where did you read that?
     