Haswell vs Kaveri

Discussion in 'Architecture and Products' started by AnarchX, Feb 8, 2012.

  1. swaaye

    swaaye Entirely Suboptimal Legend

    HSA probably has no future if Intel doesn't support it.

    The Bulldozer architecture is certainly a strange bit of nonsense. It seems like they should have overhauled Stars instead of pumping more resources into BD. It probably just shows how thinly stretched AMD R&D is and so they just keep putting band aids on.
     
  2. 3dilettante

    3dilettante Legend Alpha

    Shared units can be done for hardware the design barely cares about.

    When dealing with something where performance matters, the architect who designed the concept wanted it for certain specific things:
    1) Enabling a tighter critical loop for a fireball-type architecture -- this has completely lost out to the physical realities of today, although this was clear much earlier than BD
    2) Enabling newer, interesting modes of execution such as speculative threading and other memory tricks. He flat out said CMT didn't make sense in the absence of doing things in a more interesting manner, and Bulldozer does nothing interesting.
    Problem is, nobody is doing any of these interesting things on an ongoing basis, or if they are they aren't doing CMT.
    If the interesting things that the designer of CMT said were needed to justify it are either impractical or don't need CMT at all, then CMT is pointless.

    The transistors themselves don't add cost. Kaveri isn't bigger than Trinity, and it's on a bulk process.
    It likely trends towards being mildly cheaper to make.

    Hmm, an amped high-clocking Jaguar.
    It probably needs some extra pipe stages. With the current 15-cycle mispredict latency, we could maybe squeeze in a few more stages. Sure, there's a performance hit, but maybe boosting the length to something like 18-19 cycles wouldn't be too bad.
    So its integer resources would be about 2 wide, with about two instructions per core per cycle and about 32KB of L1 Icache per core.
    Shared, long latency L2.
    It might be hard getting the L1 to clock as high, especially if we want to avoid losing performance with its very low associativity. Maybe a 4-way 16 KB L1 data cache.
    The higher end workloads might enjoy having a more flexible load/store situation than one dedicated store and one load pipe.
    Roughly 8 FLOPs per cycle per core.

    Does that sound about right?
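
    A rough way to size the cost of the longer pipeline mentioned above is a simple additive CPI model. This is only a sketch: the base CPI, branch fraction, and mispredict rate below are illustrative assumptions, not measured Jaguar figures; only the 15-cycle and 18-19-cycle penalties come from the post.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Additive model: base CPI plus the average mispredict stall per instruction."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def relative_slowdown(old_penalty, new_penalty,
                      base_cpi=1.0,          # assumed, not a Jaguar measurement
                      branch_fraction=0.2,   # assumed share of instructions that are branches
                      mispredict_rate=0.05): # assumed fraction of branches mispredicted
    """Fractional increase in cycles per instruction at equal clock speed."""
    old = cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, old_penalty)
    new = cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, new_penalty)
    return new / old - 1.0


if __name__ == "__main__":
    for new_penalty in (18, 19):
        slowdown = relative_slowdown(15, new_penalty)
        print(f"15 -> {new_penalty} cycles: {slowdown:.1%} more cycles per instruction")
```

    Under these assumed inputs the per-clock loss is a few percent, so the deeper pipeline would pay off only if it bought a clock increase larger than that. Different branch behaviour changes the answer proportionally.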

    Orbis would not be a good desktop chip. Kaveri might not be that compelling, but it doesn't need a completely custom platform and custom code that doesn't do a fraction of the things of a desktop to not fall on its face.

    Those still had four cores.
     
  3. Alexko

    Alexko Veteran Subscriber

    There's this:
    [​IMG]

    And that:
    [​IMG]
    Source: http://www.extremetech.com/computin...-wait-for-the-first-true-heterogeneous-chip/5
     
  4. swaaye

    swaaye Entirely Suboptimal Legend

    So if you are big on charts in LibreOffice Calc and decoding what must be gigantic JPEGs, then Kaveri is your chip. Oh boy.
     
  5. Alexko

    Alexko Veteran Subscriber

    You have to start somewhere. For the launch of a mid-range consumer chip produced by a company with what must be less than 20% market share these days, that's better initial support than I anticipated.

    Plus, Kaveri also does quite well in regular OpenCL, non-HSA applications.
     
  6. pjbliverpool

    pjbliverpool B3D Scallywag Legend

    I think it's a foregone conclusion that Intel will support the same functionality; it's simply the way the industry is heading as a whole. Ivy Bridge was already more "HSA-like" than Richland.

    Broadwell probably won't make that level of a leap, but I'd like to think Skylake will match Kaveri in HSA-style functionality. It seems to me that without a discrete GPU product line, HSA makes even more sense for Intel than it does for AMD.
     
  7. Gipsel

    Gipsel Veteran

    Either parts of AMD are severely confused about what process they use or 28SHP may be a PD-SOI process. An AMD representative confirmed to at least two different sites that 28SHP is an SOI process, and their shop says so too (though admittedly, on products.amd.com it is listed as 32nm, which is likely a copy-and-paste error).
     
  8. 3dilettante

    3dilettante Legend Alpha

    Just like how AMD's own feature table said pre-GCN cards had Mantle in direct contradiction to the next marketing tab over, I think we can conclude AMD doesn't have their A team updating the tables.

    AMD's CTO, slides at Kaveri's launch detailing the semi-custom 28nm SHP, and statements to investors say it's bulk.
     
  9. silent_guy

    silent_guy Veteran Subscriber

  10. Alexko

    Alexko Veteran Subscriber

    The talk that will be given at ISSCC to present Steamroller clearly states in its title that it's a bulk process. Plus, GloFo hasn't talked about SOI in a while.

    AMD also mentions very large performance gains in binary tree searches, since you don't have to flatten and copy the whole tree anymore. However, this hasn't been incorporated into any distributed application at this point. Generally speaking, I suppose that any kind of highly parallel processing on dynamic data structures can benefit significantly from HSA, which is a big change.
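
    To illustrate what "flatten and copy the whole tree" means, here is a minimal CPU-only sketch. It does not use any HSA or OpenCL API; the `Node`, `flatten`, and search function names are hypothetical. The flattened form stands in for what a non-shared-memory device would need (index-linked arrays), while the pointer form stands in for what shared virtual memory would let a device walk directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Insert a key into a plain (unbalanced) binary search tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search_pointers(root, key):
    """Walk the tree by following object references (the 'shared pointers' case)."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def flatten(root):
    """Copy the tree into parallel arrays; child links become indices, -1 means no child."""
    keys, left, right = [], [], []

    def visit(node):
        if node is None:
            return -1
        idx = len(keys)
        keys.append(node.key)
        left.append(-1)
        right.append(-1)
        left[idx] = visit(node.left)
        right[idx] = visit(node.right)
        return idx

    visit(root)  # pre-order, so the root lands at index 0
    return keys, left, right


def search_flat(keys, left, right, key):
    """Walk the flattened copy by index (the 'copy to device first' case)."""
    idx = 0 if keys else -1
    while idx != -1:
        if key == keys[idx]:
            return True
        idx = left[idx] if key < keys[idx] else right[idx]
    return False


if __name__ == "__main__":
    root = None
    for k in (50, 30, 70, 20, 40, 60, 80):
        root = insert(root, k)
    flat = flatten(root)
    print(search_pointers(root, 60), search_flat(*flat, 60))
```

    The `flatten` pass touches every node and must be redone whenever the tree changes, which is the overhead that shared pointers are meant to remove.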
     
  11. Gipsel

    Gipsel Veteran

    About as long as they didn't talk about the SHP process AMD uses (claims to use?).
    I'm aware of the arguments against it. I only brought it up because AMD evidently confirmed using SOI for Kaveri to some sites (heise.de and golem.de) that specifically asked about it.
     
  12. 3dilettante

    3dilettante Legend Alpha

    Which representatives confirmed SOI?
    Was one of them the CTO of AMD?
     
  13. Gipsel

    Gipsel Veteran

    Certainly not. That's why my first alternative to explain the different accounts was that "parts of AMD are severely confused about what process they use". We should know it for sure in 4 weeks at the latest.
     
  14. moozoo

    moozoo Newcomer

  15. 3dilettante

    3dilettante Legend Alpha

    There is pushback from the incumbent compute workhorses at Intel--the mainline cores. Events in recent years seem to indicate that some of those objections may have been at least partly overridden.

    Skylake's revamping of the GPU's memory handling to align it closer to x86 page tables is a significant step, as GCN did the same thing.
     
  16. DSC

    DSC Banned

    http://techreport.com/blog/25930/a-subjective-look-at-the-a8-7600-gaming-performance

    :lol:
     
  17. kalelovil

    kalelovil Regular

    I am somewhat disappointed Techreport did not seem to attempt more troubleshooting (Different memory? Different motherboard? etc.) or consulting AMD or other reviewers before publishing that (perhaps they did, but if so there is no note of it in the article).

    As it stands, if they did not, the article has the effect of (most likely unintentionally) spreading FUD.
     
    Last edited by a moderator: Jan 17, 2014
  18. Albuquerque

    Albuquerque Red-headed step child Veteran

    Yeah, I agree with kalelovil. Those issues sound like a bad DIMM to me, not a processor failure.
     
  19. Blazkowicz

    Blazkowicz Legend

    There's word that the Excavator APU (Carrizo) is on FM2+, with DDR3. That could be inferred, by the way: DDR4 arrives for servers in 2014 and only becomes available (and affordable) to the general public in 2015, so a 2014 APU probably has DDR3 no matter what.

    As for the PCIe bus, "Extend to Discrete GPU" should mean PCIe 3.0 is extended with support for coherency. Is there a name for that, and will any non-AMD product support it? I don't know. But that's a protocol update, and the PCIe x16 controller is right in the CPU (with socket FM1/FM2/FM2+, as with socket 1156/1155/1150), so at least I can see how a new motherboard or new socket is not needed.

    Yes, it's meaningless without an HSA APU (or conceivably, a low-cost CPU that is an HSA-compatible APU with the GPU disabled).
    It's a bit weird that Kaveri doesn't support "hUMA extended to discrete GPU"; maybe they can't even test and validate it (or don't want to, or it would be meaningless). Which GPUs are supported? I don't think we know; it could be Bonaire, Hawaii and up, or only future products (20nm GPUs).

    Possibly, Kaveri could support the feature with a BIOS update after Carrizo and next-gen dedicated GPUs are released.
     