AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. ieldra

    Newcomer

    Joined:
    Feb 27, 2016
    Messages:
    149
    Likes Received:
    116
    Anarchist thinks that Volta MPS is a clear indication of the failure of Pascal (and prior) GPU uArch because it is clearly emulating ACEs, which establishes GCN's absolute superiority in the market. I know it sounds like a joke, but I'm very serious.
     
  2. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,553
    Likes Received:
    698
    For the same reason that Fermi GF100 needed a GF110/GF100b revision? This story with the Infinity Fabric is eerily similar to the interconnect problems Fermi allegedly suffered from, according to JHH. I wonder if we might still see a fixed Vega with much better performance. (By the way, where is Charlie now, saying Vega is broken and unfixable? :D He has been awfully quiet about GPUs of late; it made for nice drama :( ) I believe more in a reborn Vega, RV670 style (but still quite a big chip), than in miraculous drivers.
     
    #3962 Picao84, Aug 31, 2017
    Last edited: Aug 31, 2017
    Lightman likes this.
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    To be fair, Intel's drivers did not expose that under DX12 either for a long time. Strangely though, it was available under DX11.3 as "Pixel Shader Precision 16/32-bit" and "Other Stage Precision 16/32-bit" (found in log files for at least .4352 & .4404).

    Recent drivers (I don't remember which version exactly) made it available in DX12.

    Quite the contrary, every watt or fraction thereof saved in tightly constrained mobile chips frees up some leeway for higher clocks somewhere else on the chip.
     
    #3963 CarstenS, Aug 31, 2017
    Last edited: Aug 31, 2017
    BRiT and entity279 like this.
  4. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    While GF100b also enabled full-rate FP16, yes, it's eerie. Especially when you take into account that Fermi was Nvidia's first try at fully distributed geometry and Vega is AMD's first chip where geometry can be shared across the shader engines (and I am not talking about that very slim line indicating load balancing in former quad-engine Radeons).
     
    Picao84 likes this.
  5. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    37
    Likes Received:
    40
    Are you sure about that? Because I saw a few Reddit posts with people running benchmarks alongside Fiji to match the Instinct-based solution, and it seems that it's "disabled" on DX, but on Vulkan it shows as clearly enabled.
     
  6. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    I am sure about the DirectX part, yes. That's the game developers' API of choice and what the discussion was about. Good to know that it's enabled on Vulkan. Hope there's some massive uptake there.

    Seems like you're linking a private Reddit? I cannot see it.
    r/realAMD: "/r/AMD is full of anti AMD shills and shit posters, this is the place for real AMD enthusiasts."
     
    BRiT likes this.
  7. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    It's definitely a hindrance for asynchronous behavior and for efficiently accelerating synchronization. Nothing new there, and rather obvious with Volta now adding hardware capability for the "performance critical" parts of the MPS server. Interrupting the CPU for every warp dispatch when presented with asynchronous behavior isn't exactly ideal given the latency involved. Same issue with high-priority compute tasks. It's not exactly a secret that Nvidia has been dragging its feet in regard to low-level APIs.
     
  8. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    37
    Likes Received:
    40
    [IMG]
     
    Grall, T1beriu and CarstenS like this.
  9. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Thank you, I'll take a look at Sandra soon. :)
     
  10. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    There are a few reasons why PC games do not already support fp16. The first reason is that Vega was just launched. It is the first discrete GPU with 2x rate fp16. Most recent Intel GPUs also have 2x rate fp16. There was simply no reason to bring fp16 shader code to PC before (it adds testing cost).

    The second reason is that PS4 Pro was launched last Christmas. It takes time until developers add fp16 support to their shaders. Some AAA devs, such as DICE, have already started doing this, but it takes time to modify large AAA shader code bases to support fp16 on PS4 Pro. Vega is actually very good for PS4 Pro fp16 adoption, since fp16 optimizations now benefit both PS4 Pro and AMD's latest PC GPUs. You can now also test your fp16 code on PC (Vega GPU in the development workstation + PS4 Pro devkit), potentially improving your iteration time. I am sure we will see fp16 more in the future (on both PS4 Pro and PC). The fact that Intel GPUs also benefit from fp16 code is another bonus. Everybody knows that Nvidia will eventually follow suit, as they already have 2x rate fp16 on their mobile GPUs and their professional GPUs (*), so putting developer effort into fp16 code will benefit all PC GPUs in the future. Now is the right time to start spending effort on it.

    Packed math is managed by the compiler. The compiler handles it similarly to the old vec4/VLIW architectures. You don't need to manually write packed (vec2) code. You simply use the new half float types (min16float in HLSL) instead of the existing float types. The compiler packs two of them automatically into each 32-bit register. The compiler obviously needs to be clever to pack them in a way that allows the most efficient usage of 2x rate packed math instructions. But this vec2 packing is a simpler problem than the vec4 (or vec4+1) packing of previous generation GPUs. GPU compiler programmers already have experience with stuff like this.
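    As a sketch of what that looks like on the shader side (the function and variable names here are purely illustrative, not from any shipping code), the source-level change is usually just the type:

    ```hlsl
    // Hypothetical HLSL snippet: swapping float for min16float is the only
    // source change needed; on Vega the compiler may pack two 16-bit values
    // into one 32-bit register and emit 2x rate packed-math instructions.
    min16float3 ApplyFog(min16float3 color, min16float3 fogColor, min16float t)
    {
        return lerp(fogColor, color, t);
    }
    ```

    Note that min16float only guarantees *at least* 16 bits of precision, so the same shader still runs correctly (at full fp32) on hardware without native fp16 support.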

    (*) Nvidia Volta has new Tensor cores for machine learning. Nvidia doesn't need to keep 2x rate fp16 anymore as a professional feature for machine learning. Tensor cores are better for this task. I would guess that future consumer GPUs simply lack tensor cores (or have them disabled).
     
    Kej, Cat Merc, Grall and 6 others like this.
  11. monstercameron

    Newcomer

    Joined:
    Jan 9, 2013
    Messages:
    127
    Likes Received:
    101
    Intel may have a majority of the market but that doesn't mean gen9 has a majority within that majority.
    How would you define significant?
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Intel Gen9 supports 2x rate fp16. However, AAA game devs mostly concentrate on discrete GPUs, since most gamers playing AAA games have a discrete GPU. And non-AAA devs mostly don't care about low level optimizations such as fp16, since only some GPUs and OSes support it. Windows XP and Vista do not support fp16 types in HLSL (if min16float is used on these OSes, the game crashes). For these devs, broad hardware support is more important than getting some extra performance out of the latest GPUs.

    Intel's 2x rate fp16 support is mostly designed for mobile workloads. OpenGL ES uses fp16 by default (you need to explicitly declare highp if you need more precision). Intel tried to get a foothold in the mobile market with their CPUs and GPUs, but failed. Soon these features will also be useful in modern PC games. I would also assume that Windows 8 and Windows 10 desktop rendering use fp16 heavily, as they have been optimized for ultraportables and tablets. fp16 desktop composition saves power, and if Windows 10 uses it, that will also be a good thing for Vega.
     
    #3972 sebbbi, Aug 31, 2017
    Last edited: Aug 31, 2017
    Heinrich4, tinokun and monstercameron like this.
  13. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    They are capable of running them at least on low settings, if not better. FP16 would not only make it better but also save power in mobile.

    And there's no evidence of Ryzen mobile doing it significantly better. They would need integrated HBM2 to do so, and such memory is found only in extremely costly devices (Knights Landing/Tesla/Vega). In that regard Iris parts are just as capable.

    Skylake has been in the market since late 2015. I would think the volume is quite significant. While in terms of total PCs in the world that number may be a fraction, you are still talking numbers likely close to 100 million.

    I was only replying in the context that integrated AMD parts would help to proliferate the support. It would be way lower in terms of volume.

    Thanks for the explanation.
     
  14. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    I'm not suggesting it will be better so much as more reasonably priced. Iris Pro, like most of Intel's lineup with only four cores, is relatively expensive, limiting the share of the market that is capable. AMD may also field far larger APUs providing that performance.

    If AMD produces mid-range APUs, it could definitely proliferate support, as it would displace much of the discrete volume. The question is how high they go. Overtaking the 580/1060 may not be unreasonable depending on the designs that show up. A system substituting HBM2 for system memory would have a lot of bandwidth and not be that much more expensive.
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    From AMD's slides and the ISA doc's diagrams of the memory system, it's not clear if the Infinity Fabric is near any of the disabled features. The GPU's cache system appears to be its own domain, with the fabric between the L2 and memory controllers. That simplified arrangement, and the fact that the fabric is based on a mature protocol, doesn't seem to leave much room for it to cause problems. As a hardware fabric, it should be mostly invisible to software.

    Koduri stated that Vega's fabric is optimized for servers, but I'm not sure what would be limiting it other than perhaps some additional overhead for items like generally unused error correction or expanded addressing. In fact, I'm not sure what "server-optimized" really adds if all the fabric is doing is sitting between memory, GPU, and standard IO.
    There's the flash controller and I think the IO for that, though its impact should be modest.
     
    #3975 3dilettante, Aug 31, 2017
    Last edited: Aug 31, 2017
    Grall likes this.
  16. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,553
    Likes Received:
    698
    My comment about the Infinity Fabric was more related to its power consumption than to the disabled features. Although, by association, some features could have been disabled because they added further power consumption (whereas Infinity Fabric itself is not something you can disable). A bit far fetched, I know, but Vega suffers from high power consumption just like Fermi did, while introducing some sort of new interconnect, like Fermi did as well.
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    It's been some time, so I am not sure which interconnects were cited as being problematic. GPUs have quite a few, with Fermi having at least the intra-SM interconnect, a distribution interconnect for sharing geometry, and the connection to the caches.

    Unless Vega's Infinity Fabric is more invasive than described, it's a mesh with some rather predictable directionality for most of its traffic. Fermi's interconnects had various behaviors and potentially higher degrees of connectivity.
    For GCN, there was always some kind of link between the L2s and their respective memory controllers. It seems like the fabric slots itself as a midpoint between the L2 and controller (HBCC?) block, with some level of perpendicular traffic related to the relatively modest needs of the miscellaneous sections of the GPU. I presume that's more overhead than the prior bespoke connections, but it seems like it's not uprooting the really complex parts.
     
    Picao84 likes this.
  18. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    582
    Likes Received:
    285
    Aida64 showed (https://pbs.twimg.com/media/DH2ebZBXkAIEXs6.jpg:large), as any other application can with a few lines of code, that FP16 support under Direct3D (though not OpenCL) is disabled. cl_khr_fp16 is still supported under OpenCL on AMD GCN3 ISA GPUs (which include Polaris).
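    For reference, those "few lines of code" under Direct3D 12 look roughly like this (a minimal sketch; device creation is omitted and the device pointer is assumed valid):

    ```cpp
    #include <d3d12.h>

    // Query whether the driver exposes 16-bit minimum-precision support to
    // shaders. MinPrecisionSupport is a bitfield of
    // D3D12_SHADER_MIN_PRECISION_SUPPORT flags.
    bool SupportsFp16MinPrecision(ID3D12Device* device)
    {
        D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
        if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                               &options, sizeof(options))))
            return false;

        return (options.MinPrecisionSupport &
                D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT) != 0;
    }
    ```

    A driver can legitimately report no min-precision support here while still exposing fp16 through OpenCL's cl_khr_fp16, which is exactly the split described above.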
     
    Lightman and pharma like this.
  19. giannhs

    Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    37
    Likes Received:
    40
    But ATI was using fp16 in games back in the day, wasn't it?
     
  20. jra101

    Joined:
    Apr 6, 2016
    Messages:
    2
    Likes Received:
    3
    Vertex shaders default to highp precision. Fragment shaders don't have a default precision and you must specify highp/mediump/lowp either via the precision statement (precision highp float) or declare each variable in the shader with the required precision (mediump vec4 sum).
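    A minimal ES fragment shader showing both forms (illustrative only):

    ```glsl
    // Option 1: a default precision statement covers every float that follows.
    precision mediump float;

    varying vec4 color;   // mediump via the default above

    void main()
    {
        // Option 2: qualify an individual variable where more precision matters.
        highp float depth = gl_FragCoord.z;
        gl_FragColor = color * depth;
    }
    ```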
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.