AMD: Zen 2 (Ryzen/Threadripper 3000?, Epyc 8000?) Speculation, Rumours and Discussion

Discussion in 'PC Industry' started by Deleted member 13524, Oct 8, 2018.

  1. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    Agreed.

    Though for consoles, this is an interesting take from AMD: instead of adding specialized hardware like tensor cores to support TensorFlow-style workloads, they're beefing up their compute units to support a variety of tasks. There could be a variety of reasons not to add tensor hardware; unfortunately I wouldn't know. But if this is the direction AMD wants to take to tackle deep learning, we could be getting some insight into what to expect for our next-generation consoles.
     
    chris1515 and beyondtest like this.
  2. beyondtest

    Newcomer

    Joined:
    Jun 3, 2018
    Messages:
    58
    Likes Received:
    13
    Does this likely mean that the next GPU will be about 20-25% better for the same watts?
     
  3. Tkumpathenurpahl

    Tkumpathenurpahl Oil Monsieur Geezer
    Veteran

    Joined:
    Apr 3, 2016
    Messages:
    1,910
    Likes Received:
    1,929
    Do you mean that in the sense that RTRT performance, in general, even on the RTX2080, is terrible?
     
  4. Not necessarily, because Vega 20 has other power-consuming features compared to Vega 10.

    Though I suggest you follow those questions in this thread.


    Yes.
    All examples we've seen from Nvidia themselves show the hybrid raster+RT demos running at 1080p/60 FPS on the RTX 2080 Ti.
    The RTX 2080 and RTX 2070 will perform below that.
     
    beyondtest and BRiT like this.
  5. beyondtest

    Newcomer

    Joined:
    Jun 3, 2018
    Messages:
    58
    Likes Received:
    13
    Oh, is that what's going on? If so, I guess they're doing what I mentioned a few pages back:

    https://forum.beyond3d.com/threads/...chnical-spin-2018.60604/page-169#post-2048236

    Would it be possible for, say, AMD to have a new type of shader core with improved ML capability in each of them (individually weak, but numerous) instead of Nvidia's three-part design of raster hardware, Tensor cores and RT cores?

    I hope that's what it means, because I'd be kinda happy, even though I don't know much about these things.

    Thanks.
     
  6. Tkumpathenurpahl

    Tkumpathenurpahl Oil Monsieur Geezer
    Veteran

    Joined:
    Apr 3, 2016
    Messages:
    1,910
    Likes Received:
    1,929
    Fair enough, but if the hardware's capable of RTRT in even the relatively limited capacity of the RTX2070, then at least it's in developers' hands for a generation.

    If the MI60's any indication of AMD's answer to RTX, we could be looking at a fairly versatile approach, letting developers slide quite easily along the hybrid rendering gradient.

    Want to use all 14.8TF for rasterisation? Go for it. Want to use 7 for rasterisation and the rest for RT? Go for it.

    Hopefully that's the case, anyway.
     
  7. beyondtest

    Newcomer

    Joined:
    Jun 3, 2018
    Messages:
    58
    Likes Received:
    13
    Wanna use some of that for more ML/AI tasks? Go for it.

    Pricing is still probably the major factor.

    I guess it still comes back to the question of whether a bank of general-purpose compute can match Nvidia's three-class system.
     
  8. Tkumpathenurpahl

    Tkumpathenurpahl Oil Monsieur Geezer
    Veteran

    Joined:
    Apr 3, 2016
    Messages:
    1,910
    Likes Received:
    1,929
    I'm not sure pricing is inevitably an issue. Probably to begin with, but once 7nm matures, a ~330mm2 die shouldn't be too expensive.
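A rough dies-per-wafer and yield estimate sketches why a ~330mm2 die on a matured process is manageable. The defect density here is an illustrative assumption (it falls as a process matures), not a foundry figure:

```python
import math

# Back-of-the-envelope yield for a ~330 mm^2 die on a 300 mm wafer.
# Defect density is an illustrative assumption, not a foundry figure.
wafer_diameter_mm = 300
die_area_mm2 = 330
defect_density_per_cm2 = 0.2   # assumed; drops as the process matures

wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
# Crude gross-die count, ignoring edge losses and scribe lines.
gross_dies = int(wafer_area_mm2 / die_area_mm2)
# Simple Poisson yield model: fraction of dies with zero defects.
yield_frac = math.exp(-defect_density_per_cm2 * die_area_mm2 / 100)
good_dies = int(gross_dies * yield_frac)
print(gross_dies, f"{yield_frac:.2f}", good_dies)
```

With these assumed numbers you get on the order of 200 candidate dies per wafer and roughly half of them defect-free; halving the defect density as 7nm matures raises the good-die count substantially.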

    As for whose approach is better, time will tell, but I'd put money on the traditional split of performance for Nvidia and price for AMD.
     
    vipa899 likes this.
  9. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    It's their answer to a flexible, high-performance deep learning GPU. Not to be confused with their answer to ray tracing.
     
    vipa899 likes this.
  10. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    3,262
    Likes Received:
    813
    Found a pic of the 2-chiplet consumer version
    [image]
    :lol2:

    Semi-seriously though, if packages are gonna be that huge, at some point doesn't it make sense to bring back CPU slots instead of giant sockets?
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    That could be a limit on placement: some elements, like the IO and, to a lesser extent, where the DDR lines leave the package, seem to keep similar positions within some margin of error. That margin gets larger near the IO die, which concentrates the PHYs for DRAM and IO closer to the middle.
    If the concern is how direct this is from the pins for power, there's a fair amount of distribution and metal used for Zen's voltage regulation that muddles things. Slide 19 from https://www.slideshare.net/AMD/isscc-2018-zeppelin-an-soc-for-multichip-architectures seems to show some power pins align with the CPUs, but a very large chunk of it would now be under the IO die, and another large chunk is concentrated on one side of the MCM.

    Slides 9 and 10 show a possibly undesirable amount of latency could be involved, if AMD hasn't adjusted more of its fabric's features. Possibly, the data fabric on the CPU chiplets will be simplified by having a number of its clients moved elsewhere. I'm curious whether some special-case handling is possible, since the coherent CPUs are known to be on one side of an IF package link and the home agents and routing hardware are on the other.

    I'm not sure if it's necessarily negative, but that seems to be accurate. It's been a very long time since density scaling was linked strongly to performance or power scaling, with 90nm or 65nm being the threshold where a number of vendors got hit by significant problems.
    Density governs how many transistors there can be in a chip, whereas performance in this context is more about the straight-line speed of individual circuits or pipelines. Only a subset of the overall set of transistors can take part in any local block or circuit, and in general more is not better due to each contributing some amount of delay or each increasing the length of wires required to connect them.

    There are other physical conditions and architectural choices that can influence how much power, transistors, or wires come into play within a given time window. The faster a desired clock period, the more costly tradeoffs will become.
    Focusing on parallelism can exploit the 50% density increase without travelling as far up the steep curve of pushing circuit clock speeds, if that option is available.
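The tradeoff above can be sketched with the standard dynamic-power relation P ≈ C·V²·f. All coefficients here are illustrative assumptions, not measured silicon data:

```python
# Toy comparison: spend a density windfall on more parallel units at the
# same clock, versus clocking one unit higher (which typically also needs
# a voltage bump). P ~ C * V^2 * f is standard; the numbers are made up.
def dynamic_power(capacitance, voltage, freq):
    return capacitance * voltage ** 2 * freq

base = dynamic_power(capacitance=1.0, voltage=1.0, freq=1.0)

# Option A: 1.5x the units at the same clock -> ~1.5x throughput, ~1.5x power.
parallel_power = 1.5 * base

# Option B: clock one unit 1.5x higher, assuming (illustratively) that
# voltage must rise ~15% to sustain it -> ~1.5x throughput at ~2x power.
clocked_power = dynamic_power(capacitance=1.0, voltage=1.15, freq=1.5)

print(parallel_power, round(clocked_power, 2))
```

Both options deliver the same nominal throughput gain, but the V² term makes the frequency route clearly more expensive in power, which is why wider-and-slower tends to win when the workload parallelizes.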

    It seems probable there's cache or SRAM on the IO die, at least because, unless AMD has changed its system topology for Infinity Fabric 2.0, the home agents that maintain memory ordering and handle coherence would be there. How fast some of these operations can be, if they are always a link or more away and on a die not optimized for speed, may be a question mark (perhaps more so for client-oriented products).
    A local L3 absorbing traffic before it traverses the IF links and contends for the centralized resources also seems worthwhile.
    There's also PCIe, the PSP(s?), encryption hardware, USB and disk controllers, potentially. AMD promised expanded enterprise features, which may belong closer to the memory controllers and IO complexes. One question I have is whether this changes what happens for some of the error handling, where it used to be that errors pertaining to off-chip links would have a nearby CPU and its local memory to handle them. If there's a link problem with a CPU chiplet, is there a resource on the IO die that can step in?
     
    Deleted member 13524 and vipa899 like this.
  12. DieH@rd

    Legend

    Joined:
    Sep 20, 2006
    Messages:
    6,387
    Likes Received:
    2,411
    Can we now extrapolate the general performance difference between a single gen-8 Jaguar core and a Zen 2 core?
     
  13. eddieobscurant

    Newcomer

    Joined:
    Dec 5, 2008
    Messages:
    8
    Likes Received:
    0
    Since AMD will need a new I/O die for Ryzen 3000 with 2 memory channels, wouldn't it be economically wiser to include a GPU in it (making all Ryzen 3000 parts APUs) using the same 7nm chiplets, instead of designing both a new I/O die and a new APU chiplet (either on 7nm or 12/14nm)?
     
  14. Laurent06

    Veteran

    Joined:
    Dec 14, 2007
    Messages:
    1,091
    Likes Received:
    491
    The full quote is:
    That's quite specific as far as benchmarking goes :) Have any other CPU benchmarks been revealed?
     
    Ethatron and Lightman like this.
  15. jlippo

    Veteran

    Joined:
    Oct 7, 2004
    Messages:
    1,744
    Likes Received:
    1,090
    Location:
    Finland
    This is certainly something I expect. (As a complete novice on CPU architectures.)
    Or CPU/GPU chiplets without IO chip.
     
  16. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    3,262
    Likes Received:
    813
    So what does this imply for consumer level chips?
    1 chiplet & a smaller IO chip? 2 chiplets & same IO chip? APUs only?
     
  17. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    627
    Likes Received:
    414
    DRAM latency would become worse than it is now, because of both the extra hop to the IO die and having to traverse the L4 tags before sending the request. However, the increased L3 size, and the L4 replying from SRAM, would reduce latency so long as you hit the caches. I think it would be a win for overall latency, but not nearly as big a one as the size of the caches makes it seem.
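A toy expected-latency model illustrates the tradeoff described above. The hit rates and cycle counts are made-up placeholders for illustration, not measurements of any real part:

```python
# Toy model: average memory latency with and without a hypothetical L4 on
# the IO die. All hit rates and cycle counts are illustrative placeholders.
def avg_latency(l3_hit, l4_hit, l3_lat, l4_lat, dram_lat):
    """Expected latency in cycles; l4_hit is the hit rate among L3 misses."""
    miss_l3 = 1.0 - l3_hit
    miss_l4 = 1.0 - l4_hit
    return (l3_hit * l3_lat
            + miss_l3 * l4_hit * l4_lat
            + miss_l3 * miss_l4 * dram_lat)

# Baseline, no L4: an L3 miss goes straight to DRAM over a shorter path.
baseline = avg_latency(l3_hit=0.6, l4_hit=0.0, l3_lat=40, l4_lat=0,
                       dram_lat=180)
# With a bigger L3 and an L4 on the IO die: the extra hop and tag lookup
# raise the DRAM-miss cost, but the caches absorb more of the traffic.
with_l4 = avg_latency(l3_hit=0.7, l4_hit=0.5, l3_lat=40, l4_lat=90,
                      dram_lat=210)
print(baseline, with_l4)
```

With these placeholder numbers the cached configuration still comes out ahead on average, but only modestly, matching the point that the win is smaller than the raw cache sizes suggest.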

    There is no reasonable way to put enough pins on a slot/card system to connect to DRAM. Should memory get integrated on package, as some sort of future HBM derivative, a slotted CPU would become feasible.
     
    hoom likes this.
  18. entity279

    Veteran Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,332
    Likes Received:
    500
    Location:
    Romania
    Maybe they would just leave the Ryzen topology unchanged, just plugging in the newer cores, newer IO, newer fabric?
    That likely takes more effort, but is it substantially more?
     
  19. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    627
    Likes Received:
    414
    That would imply 3 dies on 7nm.

    I personally strongly feel that a large part driving the shift to chiplets is that AMD wants to minimize the amount of different dies manufactured at the top-end fabs. I think that consumer Ryzens will be built using those same 7nm chiplets, just with different companion dies.

    Threadripper will use the same IO die as EPYC, allowing harvesting of dies with faults that would make them unusable in any EPYC product (mostly memory-channel faults).

    Ryzen APUs will use a separate GPU/memory controller/IO die, with a single Infinity Fabric link connecting to the CPU chiplet and 2 DRAM channels. This same die can do double duty as a low-end discrete GPU, since Infinity Fabric links can be reconfigured as PCIe.

    Ryzen desktop will either use the APU die (if it has a spare Infinity Fabric link), or its own die with just the DRAM controllers and other IO, with 2 chiplets.
     
    iMacmatician likes this.
  20. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Are there any specifics on infinity fabric link bandwidth ?

    Ryzen dies had 3 IF links, each 2x32 bits wide (running at 4x the DRAM command rate). Given the topology of EPYC 2, each chiplet only needs one IF link, so I'd expect it to be at least twice as wide as a single current link. I'd also expect the operating frequency to be decoupled from the DRAM command rate (because tying them together was never really a good idea). I.e., a 2x64-lane link running at 2 GHz (8 GT/s) would have 64 GB/s of bandwidth in each direction.
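The arithmetic above can be checked directly. The 64-lanes-per-direction width and 8 GT/s rate are the post's speculation, not confirmed AMD specifications:

```python
# Back-of-the-envelope Infinity Fabric link bandwidth.
# Assumed figures from the post above: 64 lanes per direction at 8 GT/s,
# one bit per lane per transfer -- speculation, not confirmed AMD specs.
lanes_per_direction = 64
transfer_rate_gts = 8          # GT/s (2 GHz clock at an 8 GT/s data rate)
bits_per_transfer = 1          # each lane moves one bit per transfer

gbits_per_s = lanes_per_direction * transfer_rate_gts * bits_per_transfer
gbytes_per_s = gbits_per_s / 8
print(f"{gbytes_per_s:.0f} GB/s per direction")  # 64 GB/s per direction
```

Note this ignores any link-layer encoding or protocol overhead, so the usable payload bandwidth would be somewhat lower.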

    Cheers
     