Haswell vs Kaveri

Discussion in 'Architecture and Products' started by AnarchX, Feb 8, 2012.

  1. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    The mesa code (not just the gpu driver in the kernel) has also arrived. The changes may be big, but to me it looks like the architectural changes are certainly much smaller than gen3->gen4 (gen3->gen4 really changed everything completely, like radeon r5xx->r6xx). gen8 might be a major overhaul of the architecture but it still seems to resemble gen7.
     
  2. Paran

    Regular

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
    R5000 to R6000 was a minor update from VLIW5 to VLIW4. Even Gen6>Gen7 brought much bigger changes.

    This
    and this
    and this

    doesn't sound like something minor, as you seem to imply. And by the way, wrong thread. We don't have a Broadwell thread yet, I know.
     
  3. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    I believe mczak was referring to R500 to R600, i.e. the introduction of unified shaders, VLIW5, the ring bus, etc.
     
  4. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Just like Alexko said, when I said r5xx->r6xx I meant it (if you're talking technical things, you really don't care about marketing names, which sometimes don't align with architecture generations at all).


    I am _not_ implying these changes are minor. Just that they aren't quite as big as gen3->gen4 was. Because if you look at these two, you'd have trouble figuring out those two archs are somehow related at all. I don't dispute that the changes may be bigger than anything else since gen4.
     
  5. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    So Haswell -> Broadwell will be a bigger change than Sandy Bridge -> Ivy Bridge?
     
  6. Paran

    Regular

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
    By gen3->gen4, do you mean GMA3000>GMA4000? Can you explain why you think so? Can you judge all the Gen8 changes and improvements from this incomplete mesa code?


    At least he claims it.
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    Gen3 was what was in i915, i945 chipset. Usually called gma900, gma950, but also things like gma3150 and IIRC gma3000 indeed (but NOT gma X3000, what did I just say about marketing names...). In other words, a dx9 capable architecture, with no vertex shader units at all. Gen4 was i965 chipset, whose original name was GMA X3000, but there's other chipsets sailing under gen4 (usually called gen4x as they are not 100% identical though it was mostly bug fixes), g35/g45 come to mind.
    No. But you can see from that code that it is still somewhat similar to gen7. Try looking at gen3 code and find similarities there...
     
  8. homerdog

    homerdog donator of the year
    Legend Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,294
    Likes Received:
    1,075
    Location:
    still camping with a mauler
    Man, Intel has made so much progress on the graphics front in the last 5 years. The GMA9XX was the most terrible, awful thing you can imagine. It was barely serviceable even for light office use.
     
  9. Raqia

    Regular

    Joined:
    Oct 31, 2003
    Messages:
    508
    Likes Received:
    18
    It is a major change from today's paradigm, but if it offers comparable latency and much better power characteristics, density, and bandwidth, HMC sounds very, very desirable. It's been a while since memory has jumped in performance as much as CPUs or GPUs have over the past two decades.

    We're seeing adoption of this already: Power 8's architecture is specifically designed for this new wave even though it won't initially use HMC, and each of Power 8's off die Centaur memory controllers has the added benefit of acting like an L4 as well. Instead of the 8 controllers for the big iron Power 8, one or two similar controllers would probably suffice for a consumer part since stacked ram density should be high enough. You could also imagine the interface eventually being entirely on one package or on top of the APU die with a fixed amount of stacked dram, either acting as the main memory or if density is insufficient, another tier of memory between the APU and the DRAM (see the RSX die for PS3 Slim).

    I think the cost of having all the RAM on package should be less than the cost of adding a pair of DIMMs, even if it is based on HMC technology. I guess Intel's Iris Pro lot prices don't attest to this quite yet, but judging by the price of the Macbook Air, they're just ratcheting up prices for people who don't place major-sized orders. I agree that such technology won't make it to market if it doesn't have a compelling cost-to-benefit ratio in today's business environment.
     
    #849 Raqia, Nov 5, 2013
    Last edited by a moderator: Nov 5, 2013
  10. Does anyone know whether we should expect to get details about Kaveri during AMD's developer summit?
     
  11. Paran

    Regular

    Joined:
    Sep 15, 2011
    Messages:
    251
    Likes Received:
    14
  12. Raqia

    Regular

    Joined:
    Oct 31, 2003
    Messages:
    508
    Likes Received:
    18
    Perhaps this is a cut back version of Kaveri with only 512 shaders. I'll make the wild guess that they originally planned something with 768 shaders on a different socket or BGA with GDDR5, but cut down on execution risk by using their existing socket. The limited memory bandwidth would have meant that 768 shaders wouldn't have been properly fed so they went with 512 instead.
     
    #852 Raqia, Nov 12, 2013
    Last edited by a moderator: Nov 12, 2013
  13. Triskaine

    Newcomer

    Joined:
    Mar 28, 2010
    Messages:
    59
    Likes Received:
    57
  14. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    568
    Likes Received:
    104
    Without a TDP being given we cannot determine that.

    If AMD has decided that the target market for such a chip primarily wants 45W APUs at the expense of some performance then the lowered CPU and GPU clocks make a lot more sense.
     
  15. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    Only ~100 GFLOPS for the CPU? No AVX2/FMA? I guess it's only 2 "modules" of FP compute, but that's still well lower than a dual core Haswell.

    If you add the theoretical peak of a 4770R you get something like:
    CPU: 32 flops/cycle/core * 4 cores * 3.9 GHz = 499 GFLOPS
    GPU: 16 flops/eu/cycle * 40 eus * 1.3 GHz = 832 GFLOPS
    Total: 1331 GFLOPS
    Now even at 65W it probably can't maintain those clock speeds with everything powered up, but we're talking theoretical here to start with.
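
The arithmetic above can be reproduced with a minimal Python sketch. It only multiplies the figures quoted in this post (flops per cycle, unit counts, peak clocks); the helper name `peak_gflops` is made up for illustration, and the result is a theoretical ceiling, not a measured number.

```python
def peak_gflops(flops_per_unit_per_cycle: float, units: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPS: per-cycle throughput x unit count x clock in GHz."""
    return flops_per_unit_per_cycle * units * clock_ghz


# Figures quoted in the post for a 4770R, assuming both sides hold peak clocks.
cpu = peak_gflops(32, 4, 3.9)    # 4 cores at 32 flops/cycle each
gpu = peak_gflops(16, 40, 1.3)   # 40 EUs at 16 flops/cycle each
total = cpu + gpu

print(f"CPU {cpu:.1f} GFLOPS, GPU {gpu:.1f} GFLOPS, total {total:.1f} GFLOPS")
print(f"GPU:CPU ratio is roughly {gpu / cpu:.1f}:1")
```

Keeping the GPU:CPU ratio as a separate figure makes it easy to compare how lopsided each chip is, which is the point of the ratio comparison that follows.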

    And of course raw flops isn't everything (or even much), but the 100 GFLOPS for the CPU is surprising to me. Makes it a bit more obvious why they are so driven to offload stuff to the GPU. 7:1 ratio is a bit more serious than 2:1 :)

    Overall this seems well south of the next generation consoles too. I expected something a bit closer as a flagship to be honest, even taking into account the bigger cores.
     
  16. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Well there are two FMA-capable 128-bit FPUs per module, so four in total. 4×4 FMAs = 16 FMAs = 32 FLOP/cycle.

    32×3.5GHz (reasonable clock speed assumption) = 112 GFLOPS.
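
As a rough sketch of the count above, the per-cycle figure can be derived from the vector width. This assumes single-precision (32-bit) elements, which is what four FMAs per 128-bit FPU implies, and counts each FMA as two floating-point operations. The function name and the 3.5 GHz clock are illustrative assumptions taken from this post, not confirmed specifications.

```python
def flops_per_cycle(fpus: int, vector_bits: int, element_bits: int = 32) -> int:
    """Peak flops per cycle for FMA-capable FPUs of a given vector width."""
    lanes = vector_bits // element_bits   # elements processed per FPU per cycle
    fmas = fpus * lanes                   # fused multiply-adds issued per cycle
    return fmas * 2                       # one FMA = one multiply + one add


modules = 2                # CPU modules assumed in the discussion
fpus_per_module = 2        # two 128-bit FMA-capable FPUs per module, per the post
assumed_clock_ghz = 3.5    # the post's "reasonable clock speed assumption"

per_cycle = flops_per_cycle(modules * fpus_per_module, vector_bits=128)
cpu_gflops = per_cycle * assumed_clock_ghz

print(f"{per_cycle} FLOP/cycle, {cpu_gflops:.0f} GFLOPS at {assumed_clock_ghz} GHz")
```

Changing `element_bits` to 64 halves the result, which is why a quoted peak only means something once the precision is stated.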
     
  17. rSkip

    Newcomer

    Joined:
    Jan 10, 2012
    Messages:
    18
    Likes Received:
    35
    Location:
    Shanghai
    FINAL_Lisa_Opening_Keynote_Draft_-_v12.1tb.pdf
    [​IMG]
     
  18. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    Did I miss the link to the full slide deck? Or are all you guys just at the conference? :) If the latter, anything else interesting yet?
     
  19. I thought each module only had a single FPU unit ever since Bulldozer.

    1TFLOPs shouldn't be hard to attain with the desktop chip. A ~18% overclock would reach that and the latest unlocked APUs tend to be easily pushed higher than that.
    The performance advantage over Haswell should be maintained because of a better foothold in drivers and developer relations, but this thing will end up competing with Broadwell for most of its lifetime.

    Still no news about how they're going to handle memory bandwidth, which makes us all think they'll just go with dual-channel DDR3 and call it a day.

    Unless Kaveri miraculously lowers power consumption relative to its predecessor, it won't fit notebooks/ultrabooks, and with 2 CPU modules it doesn't seem to fit desktops that well either.
    Either AMD has plans to convince game developers to make heavy use of the iGPU for GPGPU tasks (TressFX on the iGPU?) or I see Kaveri as a chip with no place where it really fits.
    People who don't play games will settle for less and people who play games will need more. Kaveri ended up coming at the same time as the next-gen consoles so recommended spec requirements for games in the near future will soar.
     