AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,247
    Likes Received:
    3,447
    They DID. They even released a video showing the settings and the fps at the same time.


    I remember Raja mentioning Ultra settings for SE4 as well.
     
    #1801 DavidGraham, Jun 1, 2017
    Last edited: Jun 1, 2017
  2. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    25 vs 12 TFLOPs against Titan Xp is a bit more than 100%, so it seems that way looking at compute, but TFLOPs aren't everything. We just need to see what other changes they made to boost performance. That 12/25 TFLOPs may not be counting everything if they added flexible scalars or changed the standard FMAs. Then consider the possibility that Vega 10 could be the mid-range product, with a dual-chip Infinity design as the high end, like with Ryzen. Raja said it was "possible", but they haven't announced anything.
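
    For reference, the napkin math behind those figures. This is only a rough sketch: the 4096 SPs at ~1.5 GHz for Vega 10 and the 3840 cores at ~1.58 GHz for Titan Xp are assumed round numbers, and an FMA counts as 2 FLOPs.

    ```python
    # Rough peak-rate arithmetic behind the 25 vs 12 TFLOPs comparison.
    # Shader counts and clocks are assumptions (4096 SPs at ~1.5 GHz for Vega 10,
    # 3840 CUDA cores at ~1.58 GHz for Titan Xp); one FMA counts as 2 FLOPs.

    def peak_tflops(lanes, clock_ghz, flops_per_lane_per_clock=2):
        return lanes * clock_ghz * flops_per_lane_per_clock / 1000.0

    vega10_fp32 = peak_tflops(4096, 1.5)     # ~12.3 TFLOPs FP32
    vega10_fp16 = vega10_fp32 * 2            # ~24.6 TFLOPs, only via packed FP16
    titanxp_fp32 = peak_tflops(3840, 1.58)   # ~12.1 TFLOPs FP32

    print(f"Vega 10:  {vega10_fp32:.1f} TF FP32 / {vega10_fp16:.1f} TF packed FP16")
    print(f"Titan Xp: {titanxp_fp32:.1f} TF FP32")
    ```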

    Polaris vs Pascal and Vega vs Volta may just be a coincidence, but consider "poor Volta", releases within a matter of months of each other, potentially similar deep-learning capabilities, page migration, die size, etc.

    Can't say I've seen that feature showing up in very many games. Just because something "can" be done doesn't mean it should be. It's the more reasonable page size that is significant. Nvidia mentioned access counters for a reason.

    It's more an argument of AMD allegedly pitting two (Polaris and Vega) architectures against Pascal. Seems a bit odd to be constrained on R&D and release two different architectures with different IP together. It seems obvious they skipped a Polaris high end to focus on the next generation architecture covering enthusiast to low-end and APU.
     
  3. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,247
    Likes Received:
    3,447
    What? I can't really tell if you are being serious or not. But you do realize that the 25 number is FP16; FP32 is just 12.5, which barely scrapes past the Titan Xp. And even then, AMD has always had an advantage in pure TFLOPs that never translates into a gaming advantage.
    You do realize that comparison doesn't work in your favor, right? Polaris was never a match for Pascal, so how can you state Vega is a match for Volta?! Are we allowing the discussion to mutate into astrology and oracle-like prophecies now?
     
    #1803 DavidGraham, Jun 2, 2017
    Last edited: Jun 2, 2017
    xpea, Razor1 and A1xLLcqAgt0qc2RyMz0y like this.
  4. ieldra

    Newcomer

    Joined:
    Feb 27, 2016
    Messages:
    149
    Likes Received:
    116
    Plus, since when did we start estimating the performance of products based on alliteration? lol, this is ridiculous.

    Now we're looking at packed FP16 throughput, comparing it to FP32 throughput on GP102, and claiming it's a wash... Vega and Volta begin with the same letter, so they must be a match.
     
  5. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Although they did happily show 4K and FPS with Sniper Elite 4 at the Financial Analyst Day; was it a single GPU? It looked like it was possibly using some kind of higher setting, judging by the shadows, but as someone mentioned there was no way to tell specifically what those settings were.
    That demo was running at or above 60 fps in the 20-second segment they showed (running along the path, climbing, stabbing an enemy), and at a much earlier event they demo'd Star Wars Battlefront 2 at 4K and 60 fps (albeit in a simpler map area).
    Not sure if I like the look of Prey or not, tbh; at times it just feels like it doesn't present the greatest visual fidelity compared to some other games.
    Cheers
     
    #1805 CSI PC, Jun 2, 2017
    Last edited: Jun 2, 2017
  6. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    I really wouldn't say that Vega is a match for Volta (even if in some specific areas it could give it a run for its money, just not in gaming); Navi and Vega 20 are going to be the ones there for Volta.

    The thing is, Vega has taken some delay (due to HBM2, maybe, I don't know), and after Polaris, which (in the versions released) was only intended to cover low-end to "mid"-end gear, this has left AMD in a desert at the high end. Maybe there was a big version of Polaris, maybe not; maybe Vega was initially the high end of the Polaris line, maybe not...

    We don't know whether there was a big Polaris or not... we just see the resulting product, not the initial plan.
     
    #1806 lanek, Jun 2, 2017
    Last edited: Jun 2, 2017
  7. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    399
    Likes Received:
    416
    Sorry, people keep repeating that, but I don't see it that way. We can't speak about the consumer space yet, but in HPC/hyperscale, Vega is annihilated by Tensor cores for the most important and hyped workflow of the decade. Vega lacks any communication interface between chips (aka NVLink) for better scaling. Vega lacks industry support like the HGX format promoted by the biggest OEMs and Microsoft; of course, because HGX uses NVLink and the SXM2 connector, Vega will never be compatible with it, and I'm not even talking about rack computing density, where Vega is miles behind. Finally, the CUDA ecosystem still has no serious answer from AMD...
    Don't get me wrong, Vega will attract some customers too, especially small ones looking for better value for money (does AMD have a choice?), but it will be nowhere close to the success and, most importantly, nowhere close to the revenue generated by Volta.
     
    pharma and Razor1 like this.
  8. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    AFAIK, this feature is only exposed in CUDA. I assume this is not something you can merge into the existing DirectX driver (and that Vega wouldn't be able to make use of a similar feature without a game making explicit use of it.)

    The counters are mentioned because they help the driver make a good decision about which page to swap out to main DRAM.
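
    A toy sketch of the idea (the "evict the least-touched page" policy below is a made-up simplification, not NVIDIA's actual heuristic):

    ```python
    # Toy illustration: with per-page access counters, the driver can evict the
    # page that was touched least when device memory fills up, instead of
    # guessing. The policy here is a made-up simplification for illustration.

    def pick_victim_page(access_counts):
        """Return the id of the resident page with the fewest recorded accesses."""
        return min(access_counts, key=access_counts.get)

    # page id -> number of accesses counted since the page was migrated in
    resident_pages = {0x1000: 412, 0x2000: 3, 0x3000: 87}

    victim = pick_victim_page(resident_pages)
    print(f"swap page {victim:#x} back out to main DRAM")  # picks 0x2000, the cold one
    ```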
     
    Razor1 and pharma like this.
  9. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,900
    Likes Received:
    2,226
    Location:
    Germany
    Yes and no. With the Doom demo at the 2016 RTG Summit, you could open the settings dialogue and check for yourself. So, while technically maybe they did not tell "the press" (I cannot confirm to whom each and every AMD employee talked), it very much qualifies in my book as "settings and fps".
     
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,854
    Likes Received:
    2,776
    Location:
    Finland
    You make a helluva lot of assumptions there, and I'm pretty sure at least some of them are false. The only thing true for certain is the lack of NVLink, which is a proprietary NVIDIA bus, and SXM 2.0, which is AFAIK a just-as-exclusive NVIDIA format, not an industry standard.
     
  11. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    399
    Likes Received:
    416
    No assumptions, just facts that you can easily check (Google is your friend).
    Regarding HGX becoming a standard among the hyperscalers:
    https://www.forbes.com/sites/patric...andard-for-aiml-cloud-computing/#7ad5c9fb4d59
    https://www.hpcwire.com/off-the-wire/nvidia-partners-manufacturers-advance-ai-cloud-computing/
    http://www.eetimes.com/document.asp?doc_id=1331798&page_number=1
     
    pharma likes this.
  12. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    How so? The only difference seems to be that Tensor throws a lot more silicon and ALUs at the problem. Strip all instructions not related to FMA from a SIMD and you have Tensor. That's a distinct possibility with a flexible scalar if AMD went that route. Put a pair of 32-bit FMA units capable of packed math in each SIMD lane, along with an L0 cache, and suddenly AMD has 4 Tensor'ish cores per CU with the ability to bond dice with Infinity. So >1000 mm² of silicon per GPU before considering a traditional "dual" part. All in a standard consumer part that can work for graphics, and it fits with the quoted ops/clock AMD has listed. Wouldn't be too different from Zen's FPU scaled out to a SIMD with SMT, if they went that route.
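
    For scale, one Volta "tensor op" is a 4x4 matrix multiply-accumulate, D = A*B + C, i.e. 64 FMAs per tensor core per clock. The plain-Python sketch below just spells out that arithmetic; it says nothing about how AMD would build an equivalent.

    ```python
    # One Volta "tensor op" spelled out: D = A*B + C on 4x4 matrices,
    # which is 4*4*4 = 64 fused multiply-adds per tensor core per clock.

    def tensor_op_4x4(A, B, C):
        D = [[0.0] * 4 for _ in range(4)]
        for i in range(4):
            for j in range(4):
                acc = C[i][j]
                for k in range(4):
                    acc += A[i][k] * B[k][j]   # one FMA; 64 of these in total
                D[i][j] = acc
        return D

    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(tensor_op_4x4(identity, identity, identity))  # 2.0 on the diagonal, 0 elsewhere
    ```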

    Besides, the big companies seriously into deep learning have all made their own custom hardware.

    Infinity Fabric and MCM Threadripper/Naples as a backbone doesn't exist? Even better, it doesn't require IBM's POWER line of CPUs, so x86 works. That's 8 GPUs per server with direct access to 8 memory channels and potentially better density and perf/watt.

    Covered above, but AMD's 3-petaflop racks seem respectable enough. Plus they aren't locked into deep learning, which covers most of the HPC market.

    Guess I didn't realize CUDA was that relevant to HPC, with less than 20% of supercomputers even using a GPU after all. CUDA for CPU acceleration, over the C/C++, Fortran, and other languages constituting the vast majority of applications, then?

    The HBCC demos would suggest it works on DirectX transparently. Plus the ability to directly interface with x86. It seems like Vega clusters sit on Infinity like Zen clusters with direct access. Only change is less coherent cache, but a CU should be able to read memory pointers with HBCC transparently caching data like LLC on a CPU.
     
  13. ieldra

    Newcomer

    Joined:
    Feb 27, 2016
    Messages:
    149
    Likes Received:
    116
    The point you appear to be missing is that Tensor cores are distinct from the rest of the ALU/FPU units and can run independently.
     
    pharma likes this.
  14. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,247
    Likes Received:
    3,447
    They told everyone; they released a video on the official AMD YouTube channel featuring an AMD marketing employee talking about the Doom 4K demo and showing both the Ultra settings and the fps counter.
     
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,854
    Likes Received:
    2,776
    Location:
    Finland
    You seriously think that one motherboard/rack design from one manufacturer will become the industry standard overnight? There are several OCP-compatible designs out there, including AMD's, and AMD already has Vega designs with major server companies.
    You don't know how fast Vega is in tensor tasks; you assume it gets absolutely "annihilated".
    Vega does have Infinity Fabric to communicate between GPUs and/or CPUs.
    AMD's ROCm supports all the major frameworks for machine learning, a variety of languages, etc.
    Vega can do 400 TFLOPS in 4U, so how many miles ahead is NVIDIA again?

    Has NVIDIA actually confirmed this? At least previously it was either the FP64 units or the FP32 units, not both at the same time, so is there a reason to believe it's now suddenly FP32+Tensor or FP64+Tensor instead of FP32 or FP64 or Tensor?
     
  16. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    I wouldn't say existing capability, but the redesign would in theory have four operands to read and broadcast, with writes absorbed by an accumulator and L0 registers.

    I was thinking more along the lines of two or three 64-wide waves per SIMD per cycle, possibly with the cadence. Similar to the Zen FPU with SMT, and a temporal scalar per SIMD running low-frequency (integer, SFU, etc.) instructions along with the traditional scalar work. In theory the RF could shrink, as the L0 would absorb some registers since they wouldn't need to be written. That would require some compiler work.

    Accumulation should bypass the write-back. It may also be possible to interleave matrices. Four Tensors in four clocks, but I need to study that more.

    I didn't mean to imply that there was, but Naples in theory already does that with Infinity, so the capability could be there. Even with PCIe the capabilities should be there, along with 64 or 128 lanes to connect all the GPUs. So Naples isn't required; it just happens to be really good for IO as an interconnect with that mesh. AMD leans on the CPU's fabric, as opposed to NVLink directly connecting processors.

    One aspect of a larger picture. There is more than just deep learning and GPU work in HPC. The vast majority of that market is CPU clusters with the corresponding code backing it up.
     
  17. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    .......................
     
    #1817 lanek, Jun 4, 2017
    Last edited: Jun 4, 2017
  18. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    If AMD hasn't talked about tensor tasks, I doubt they will be any good at them. Prior to Volta, if AMD had great tensor performance on Vega, they would have talked about it, because the only other product that is very good at those tasks is Google's TPU.
     
    #1818 Razor1, Jun 5, 2017
    Last edited: Jun 5, 2017
    CSI PC, DavidGraham, Lightman and 4 others like this.
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,854
    Likes Received:
    2,776
    Location:
    Finland
    AMD hasn't talked about tensor tasks a lot, but they have said they support them. I can't find the quote now, but I think it was in the Financial Analyst Day broadcast.
     
  20. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Vega should be close to the Pascal P100 (which is currently selling at $5000+) in tensor math. It has double-rate fp16 and quad-rate int8, similar to Pascal.

    Volta is obviously faster, but it's not available yet and is going to cost even more (rumors say around $10,000). If AMD is slower at inference, they certainly have room to price their product lower than Volta.

    source: https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/
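
    Napkin math for the "close to P100" part (a sketch only; the core counts and clocks below are assumed round numbers, and both parts get a 2x rate from packed/double-rate fp16):

    ```python
    # Rough fp16 peak comparison behind "Vega should be close to P100".
    # Core counts and clocks are assumed round numbers; 2 FLOPs per FMA and a
    # further 2x from packed/double-rate fp16.

    def fp16_tflops(lanes, clock_ghz):
        return lanes * clock_ghz * 2 * 2 / 1000.0

    print(f"P100    ~{fp16_tflops(3584, 1.48):.0f} TFLOPs fp16")   # ~21
    print(f"Vega 10 ~{fp16_tflops(4096, 1.50):.0f} TFLOPs fp16")   # ~25
    ```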
     
    Alexko, ieldra, BRiT and 3 others like this.