AMD: Zen 2 (Ryzen/Threadripper 3000?, Epyc 8000?) Speculation, Rumours and Discussion

Discussion in 'PC Industry' started by ToTTenTranz, Oct 8, 2018.

  1. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Intel introduced AVX clock throttling with the high-core-count Haswell-EP Xeons.
     
    Lightman and Rootax like this.
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    There are a few places where it's not clear whether they simplified the arrows or whether there's something to be read into the diagram for the integer execution engine. The Load/Store block in particular has arrows that go to the retire queue, the forwarding mux, and the register file.
    Comparing the Zen and Zen 2 diagrams shows a change from paired arrows going into the register file and forwarding mux to a single arrow. The arrows from the integer units and load/store blocks to the retire queue now show one arrow from the integer block sharing an entry point with the originally independent load/store path.
    It could be to reduce visual clutter, or possibly a streamlining choice: the cost of routing a broader set of hardware paths weighed against the likelihood that they would all be used at once.
    The number of uops that can be dispatched to the renamer hasn't changed, so the wider later stages may make that a clearer bottleneck than before.

    The TAGE predictor is a level-two predictor, meaning it is accessed after the initial prediction by the perceptron. Perhaps Zen 3 has a similar arrangement, or the later addition in Zen 2 meant it was easier to fit the larger TAGE predictor one level further out from the inner prediction loop, since power was the supposed reason for keeping the perceptron as the initial predictor.
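    To illustrate the topology being described (a fast first-level guess that a larger, tagged second-level structure can override on a hit), here is a minimal sketch. The table size, tag hash, counters and the stand-in first-level predictor are all invented for illustration; this is not meant to describe AMD's actual implementation.

    #include <stdbool.h>
    #include <stdint.h>

    #define L2_ENTRIES 4096

    typedef struct {
        uint16_t tag;      /* partial tag derived from the branch PC */
        int8_t   counter;  /* signed saturating counter: >= 0 predicts taken */
        bool     valid;
    } l2_entry_t;

    static l2_entry_t l2_table[L2_ENTRIES];

    /* Stand-in for the first-level (perceptron) predictor: trivially "taken". */
    static bool l1_predict(uint64_t pc, uint64_t ghist)
    {
        (void)pc; (void)ghist;
        return true;
    }

    static uint32_t l2_index(uint64_t pc, uint64_t ghist)
    {
        return (uint32_t)((pc ^ ghist) % L2_ENTRIES);
    }

    /* The first level answers immediately; the larger tagged second level is
     * looked up afterwards and overrides the first guess only on a tag match. */
    bool predict_branch(uint64_t pc, uint64_t ghist)
    {
        bool l1 = l1_predict(pc, ghist);

        l2_entry_t *e = &l2_table[l2_index(pc, ghist)];
        if (e->valid && e->tag == (uint16_t)(pc >> 2))
            return e->counter >= 0;        /* second-level override */

        return l1;                         /* fall back to the fast prediction */
    }

    int main(void)
    {
        /* With an empty second-level table, the first-level guess is used. */
        return predict_branch(0x401000, 0xBEEF) ? 0 : 1;
    }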

    The number of ports and dispatch width hasn't changed with the FPU, so I don't think it does.

    I think this is the case, or at least I've not seen a strong enough distinction in terms of features or design behavior to make this appear any different from other cycles of integration and separation that happen over time.

    I think the cited mechanism is that the DVFS system uses activity monitors and built-in estimates of the power cost of instructions to determine which voltage and clock steps should be used, rather than a coarse change in clocking regime based on which category of instruction the decoder encounters.
    This may help in cases where instructions the front end would consider wide have internally lower costs for whatever reason. One possible area is using very wide AVX instructions to speed up memory copies and clears, where naive throttling of the core that makes sense for heavy ALU work hurts the memory optimization. However, I think more recent Intel cores have gotten better at subdividing the AVX categories so that fewer such optimizations are treated like very wide ALU ops.
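    To make the contrast concrete, here is a rough sketch of that kind of estimate-driven controller: per-instruction-class activity counts weighted by rough energy costs feed a power estimate, and the highest clock step that fits the budget is chosen. All of the class names, weights, frequency steps and the budget are invented for illustration, not anyone's actual tables.

    #include <stdio.h>

    enum { OP_SCALAR, OP_AVX128, OP_AVX256_MEM, OP_AVX256_ALU, OP_CLASSES };

    /* Invented relative energy cost per retired op of each class. */
    static const double energy_cost[OP_CLASSES] = { 1.0, 1.5, 1.8, 3.5 };

    /* Frequency steps (GHz) the controller may choose from, highest first. */
    static const double freq_steps[] = { 4.3, 4.0, 3.7, 3.4 };

    double pick_frequency(const unsigned long activity[OP_CLASSES],
                          double interval_s, double power_budget_w,
                          double joules_per_unit)
    {
        double energy_units = 0.0;
        for (int c = 0; c < OP_CLASSES; c++)
            energy_units += energy_cost[c] * (double)activity[c];

        /* Estimated power at the top step; scales roughly with frequency. */
        double est_power = energy_units * joules_per_unit / interval_s;

        for (unsigned i = 0; i < sizeof freq_steps / sizeof freq_steps[0]; i++) {
            double scaled = est_power * freq_steps[i] / freq_steps[0];
            if (scaled <= power_budget_w)
                return freq_steps[i];       /* highest step that fits the budget */
        }
        return freq_steps[sizeof freq_steps / sizeof freq_steps[0] - 1];
    }

    int main(void)
    {
        /* Example: a copy loop heavy in 256-bit loads/stores but light on wide
         * ALU work stays within the budget, so the top clock step is kept. */
        unsigned long copy_loop[OP_CLASSES] = { 200000, 0, 800000, 5000 };
        printf("chosen clock: %.1f GHz\n",
               pick_frequency(copy_loop, 0.001, 15.0, 5e-9));
        return 0;
    }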
     
    Gubbi, hoom, Lightman and 1 other person like this.
  3. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    2,947
    Likes Received:
    495
    So it could be they simplified/downgraded some bits that haven't been bottlenecks, to help make space for the extra bits?

    Yeah, that and it's probably the sort of thing they'd mention explicitly if it were double-rate.
     
  4. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,528
    Likes Received:
    862
    Intel has two clocking penalties, one for AVX-256 ALU instructions and one for AVX-512 ALU instructions. AVX-256 memory moves run at full speed (no transition to a slower clock), but AVX-512 memory moves incur the AVX-256 frequency penalty.

    If you try to optimize memory functions (memcpy/strcpy) with AVX-512 moves, you might very well end up with lower overall performance: your memory moves will be faster, but everything else runs 10-15% slower.
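    A quick back-of-the-envelope model of that trade-off: AVX-512 speeds up the memcpy fraction of the workload, while the frequency penalty stretches everything else. The speedup factor and clock penalty below are illustrative, not measured numbers.

    #include <stdio.h>

    /* Returns new runtime relative to the old one (1.0 = unchanged). */
    static double relative_runtime(double copy_fraction,   /* time spent in memcpy */
                                   double copy_speedup,    /* e.g. 1.5x faster copies */
                                   double clock_penalty)   /* e.g. 0.12 = 12% lower clock */
    {
        double copy_time  = copy_fraction / copy_speedup;
        double other_time = (1.0 - copy_fraction) / (1.0 - clock_penalty);
        return copy_time + other_time;
    }

    int main(void)
    {
        /* A workload that only spends 10% of its time copying: the copies get
         * faster, but the 12% clock penalty on the other 90% dominates. */
        printf("10%% copies: %.3f\n", relative_runtime(0.10, 1.5, 0.12));
        /* A copy-dominated workload (60%) can still come out ahead. */
        printf("60%% copies: %.3f\n", relative_runtime(0.60, 1.5, 0.12));
        return 0;
    }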

    Intel has hysteresis built into the frequency transitions. Before powering up the full width of the AVX-256/512 ALUs, the instructions are run through the narrower execution units (microcode!) until several thousand of them have been executed within a set interval; only then does the core power up the full-width ALUs and lower the frequency. There is also a delay after the last use before the frequency returns to normal.
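    Roughly, that hysteresis behaves like the little state machine below: wide ops are handled by the narrow datapath until enough of them show up inside a window, then the wide ALUs power up and the clock drops; after a quiet period the clock recovers. The thresholds and window sizes are invented placeholders.

    #include <stdbool.h>
    #include <stdint.h>

    #define UPGRADE_THRESHOLD  2000   /* wide ops inside the window before widening */
    #define WINDOW_CYCLES      50000  /* window in which they must occur */
    #define RELAX_CYCLES       700000 /* quiet cycles before clocks recover */

    typedef struct {
        bool     wide_mode;       /* wide ALUs powered, lower frequency */
        uint32_t wide_count;      /* wide ops seen in the current window */
        uint64_t window_start;
        uint64_t last_wide_cycle;
    } avx_license_t;

    void tick(avx_license_t *st, uint64_t cycle, bool wide_op_retired)
    {
        if (cycle - st->window_start > WINDOW_CYCLES) {
            st->window_start = cycle;       /* start a new counting window */
            st->wide_count = 0;
        }
        if (wide_op_retired) {
            st->wide_count++;
            st->last_wide_cycle = cycle;
            /* Until wide_mode is set, these ops run through the narrow units. */
            if (!st->wide_mode && st->wide_count >= UPGRADE_THRESHOLD)
                st->wide_mode = true;       /* power up wide ALUs, lower clock */
        } else if (st->wide_mode && cycle - st->last_wide_cycle > RELAX_CYCLES) {
            st->wide_mode = false;          /* quiet long enough: clocks recover */
        }
    }

    int main(void)
    {
        avx_license_t st = {0};
        for (uint64_t c = 0; c < 3000; c++)
            tick(&st, c, true);             /* a burst of wide ops */
        return st.wide_mode ? 0 : 1;        /* wide mode engaged after the burst */
    }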

    AMD's mode of operation seems less heavy handed.

    Cheers
     
    Lightman, AlBran and entity279 like this.
  5. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
  6. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    Was it always possible to select the IF clock speed?
     
  7. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,029
    Likes Received:
    3,101
    Location:
    Pennsylvania
    I don't believe so. It was never an option on my X370, but I don't have a high-end OC board. I don't remember ever seeing it mentioned in OC threads.
     
    Alexko likes this.
  8. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    IF in Zen 1 was fixed at 1/2 the DRAM transfer rate.
    So it looks like only 500-series mobos will be able to set an arbitrary IF divider. Legacy boards probably lack the dedicated clock generator for that. Dunno.
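    For reference, the 1:2 coupling means faster memory directly raises the fabric clock; a decoupled divider would break that link. A quick bit of arithmetic (DRAM speeds are just common DDR4 grades):

    #include <stdio.h>

    int main(void)
    {
        const int dram_mts[] = { 2133, 2666, 2933, 3200, 3600 };
        for (unsigned i = 0; i < sizeof dram_mts / sizeof dram_mts[0]; i++)
            printf("DDR4-%d -> IF clock %d MHz (1/2 transfer rate)\n",
                   dram_mts[i], dram_mts[i] / 2);
        return 0;
    }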
     
    Lightman and Alexko like this.
  9. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    That opens up some really, really interesting benchmarking options. Inter-CCX latency was said to be responsible for some performance pain points, so I'm eager to see how performance scales with IF clocks on those applications.
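    The kind of microbenchmark that would show it is a simple core-to-core ping-pong: pin two threads to chosen cores (same CCX vs. different CCX) and bounce a flag between them while varying the IF clock. A minimal Linux-specific sketch, where the core numbers are placeholders you'd adapt to the actual topology:

    /* build: cc -O2 -pthread pingpong.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 1000000

    static atomic_int flag = 0;

    static void pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        sched_setaffinity(0, sizeof set, &set);   /* pin the calling thread */
    }

    static void *ponger(void *arg)
    {
        pin_to_core(*(int *)arg);
        for (int i = 0; i < ITERS; i++) {
            while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
                ;                                  /* wait for ping */
            atomic_store_explicit(&flag, 0, memory_order_release);   /* pong */
        }
        return NULL;
    }

    int main(void)
    {
        int core_a = 0, core_b = 4;                /* e.g. cores in different CCXs */
        pthread_t t;
        pthread_create(&t, NULL, ponger, &core_b);

        pin_to_core(core_a);
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            atomic_store_explicit(&flag, 1, memory_order_release);   /* ping */
            while (atomic_load_explicit(&flag, memory_order_acquire) != 0)
                ;                                  /* wait for pong */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        pthread_join(t, NULL);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("round trip: %.1f ns\n", ns / ITERS);
        return 0;
    }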
     
    Lightman likes this.
  10. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Async clock domains always incur a latency penalty at the domain crossing. Overclockers will probably try to keep the IF and DRAM clocks synced as far as possible for latency-sensitive benchmarks.
    Kind of reminds me of the good old i875P chipset for the P4, which had a special "short path" mode when the FSB and DRAM were operating at the same clock.
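    As a toy illustration of why the synced case is attractive, here's a tiny model that charges a couple of synchroniser stages per crossing when the domains run async. The stage count, clocks and base latency are all made-up numbers, only meant to show the shape of the penalty, not its real magnitude.

    #include <stdio.h>

    static double ns_per_cycle(double mhz) { return 1000.0 / mhz; }

    int main(void)
    {
        const double base_ns  = 65.0;    /* hypothetical latency, domains synced */
        const double if_mhz   = 1467.0;  /* async IF clock */
        const double dram_mhz = 1800.0;  /* DDR4-3600 memory clock */
        const int    stages   = 2;       /* synchroniser flops per crossing */

        /* One crossing into the IF domain and one back out on the return path. */
        double penalty = stages * ns_per_cycle(if_mhz) + stages * ns_per_cycle(dram_mhz);
        printf("synced: %.1f ns, async: ~%.1f ns\n", base_ns, base_ns + penalty);
        return 0;
    }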
     
    Lightman and digitalwanderer like this.
  11. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,267
    Likes Received:
    1,783
    Location:
    Winfield, IN USA
    Pentium 4! :lol2::lol2::lol2:

    Ah thanks, I needed that. Hadn't thought of that CPU in a while, it was the reason I switched to AMD. :)
     
    Lightman likes this.
  12. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Lightman and Alexko like this.
  13. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    939
    Likes Received:
    398
  14. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Memory latency and write speed are quite terrible. Hopefully it will perform much better on X570 with final BIOS.
     
    Lightman and Alexko like this.
  15. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,029
    Likes Received:
    3,101
    Location:
    Pennsylvania
    Well we don't know the latency cost of having the separate I/O die.
     
  16. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,173
    Likes Received:
    576
    Location:
    France
    If this is legit (like, not due to a bug, early BIOS, or something like that), it seems the IPC gains or whatever are "wasted" by the memory performance? I'll wait for more reviews of course, but it's a bad first impression...
     
  17. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    939
    Likes Received:
    398
    Yes, the memory performance was surprising, but it may be a BIOS bug; they say they cannot OC it, which suggests the BIOS isn't 100% working yet. Although the gaming tests (capped, it seems) are not bad for that price. It's a preview and we'll need to see more.
     
    Lightman likes this.
  18. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    2,947
    Likes Received:
    495
    Is anyone really surprised that moving the memory controllers off-die makes for a big bump in memory latency? :neutral:

    What I'm seeing is the bottom chip of the new line-up, with 6 cores and clocks 100MHz lower (both base & turbo), hanging with & in a bunch of tests healthily beating the previous 8-core top model, both in single-thread and multi-thread.

    I think if this is legit then Intel is in a lot of trouble from the higher end :cool2:

    It'd be interesting to see what this core could do with an onboard memory controller though.
     
  19. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    542
    Likes Received:
    171
    I suspect we'll see that once the APUs arrive next year. If they are monolithic, I think they might beat the high-end Ryzens in most gaming loads.
     
    hoom likes this.
  20. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    2,947
    Likes Received:
    495
    Yeah I was thinking the same.
     