Vulkan/OpenGL Next Generation Initiative: unified API for mobile and non-mobile devices.

Discussion in 'Rendering Technology and APIs' started by ToTTenTranz, Aug 11, 2014.

  1. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Wow, that's pretty big news.
     
    milk likes this.
  2. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
    SYCL sounds very nice.
     
  3. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    #283 sebbbi, May 18, 2017
    Last edited: May 18, 2017
  4. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,167
    Likes Received:
    3,570
    Seems like this news puts Direct3D in a really tough spot. Personally, I'd be happy if Mac supported Vulkan and we could just have every "PC" game show up on Windows, Mac, and Linux.
     
    Cat Merc and CaptainGinger like this.
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Vulkan is Windows + Linux + Android. Consoles have their own APIs. iOS and Mac have Metal.
     
  6. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,167
    Likes Received:
    3,570
    Yah, I'm a Mac user at home and wish they'd jump on Vulkan. I know they won't, but having all three desktop environments in line would be great for gaming.
     
  7. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    579
    Likes Received:
    283
    I am not fully convinced: to me it would make more sense to have a single universal SPIR-V high-level interface (like an intermediate API, maybe with a new rebranded shading language) which could be used by both OpenCL and Vulkan.
    Merging all the rest of the APIs looks like a big no-no to me. Remember also that OpenCL is not meant to run on GPUs only.
    Of course I may be wrong; are there any more detailed papers or slides about this?
    edit: is SYCL really the answer? Adding abstraction and overhead on top of a low-overhead API?
     
    #287 Alessio1989, May 18, 2017
    Last edited: May 18, 2017
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Agreed. Also, OpenCL and Vulkan have different memory models. I just hope that we get a better C++-style shading language for Vulkan, with generics and other modern features, like Metal and CUDA have. HLSL and GLSL were designed for DX9 hardware, to describe small (<64 instruction) pixel/vertex shader programs. We need a better language for compute shaders.
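    As a concrete (purely illustrative) example of what generics buy you - plain C++ standing in here for a Metal/CUDA-style kernel language - one compare-exchange template covers every element type, where HLSL today needs a copy-pasted version per type:

        #include <utility>

        // Generic compare-exchange: works for float, int, a custom key/value
        // struct, etc., as long as operator< is defined. In HLSL this helper
        // would have to be duplicated (or macro-generated) once per type.
        template <typename T>
        void compare_swap(T& a, T& b)
        {
            if (b < a)
                std::swap(a, b);
        }

        // A tiny 3-element sorting network built from the same generic helper.
        template <typename T>
        void sort3(T& a, T& b, T& c)
        {
            compare_swap(a, b);
            compare_swap(b, c);
            compare_swap(a, b);
        }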
     
    BRiT likes this.
  9. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,979
    Likes Received:
    844
    Location:
    Planet Earth.
    I'll disagree here. HLSL & GLSL are C derivatives; although they lack pointers [which reduces their versatility], they by no means prevent good programming practice.
    You have always been able to forward-declare structs & functions, and you can append them to generate your programs without having to rely on that horrible pre-processor [which really should be forbidden]; it encourages good design. (Well, it would without the horror that a string pre-processor is.)
    But I agree that a C with generics (and pointers) would be much more powerful and close to my ideal language (be it for GPU or CPU).

    I'm interested in the scan and reduce functions, the pipelines and the ability for the GPU to drive itself to some extent.
    I can't wait to have CPU & GPU living their own lives with some async communication channel. (MPI like conceptually)
     
  10. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    There are lots of stupid limitations. Let me list some...

    - I can't pass references to groupshared memory arrays as function parameters. The only way to access a groupshared memory array is by hardcoding its name (the global declaration). Thus it becomes impossible to write generic reduction functions (scan/sum/avg/etc.), even for a hardcoded type.
    - I can't reuse the same groupshared memory region for multiple differently typed arrays. If I want to first process int4 data in groupshared mem and then float3 data in groupshared mem, I need to declare both arrays separately -> a much bigger groupshared mem allocation.
    - There are no generic types. For example: if I want to write a sorting function, I need to copy & paste the whole implementation for each different type.
    - There are no lambdas/closures. For example: I have a very well optimized (complex) ray tracer loop, but I want to customize the inner loop sampling function. A call to a lambda/closure parameter would be a perfect solution, but HLSL doesn't support it. The lambda would obviously be inlined, so the resulting shader would be identical to copy & paste.

    I don't need classes or inheritance or silly things like that. C with generics + lambdas is fine for me. I only need features that make it possible to write reusable basic function libraries for compute shaders. Currently I have lots of macro hacks and copy & paste. I want to get rid of this bullshit (a sketch of the kind of helper I mean is below).
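    A rough sketch of that kind of reusable helper (plain C++ standing in for a hypothetical generics + lambdas shading language; this is not valid HLSL, and the names are made up): the shared array is passed as a parameter, the element type is generic, and the combine operation is a lambda that gets inlined.

        #include <cstddef>

        // Stand-in for a groupshared array: in a C++-style shading language this
        // would be a reference to LDS/groupshared storage passed into the helper.
        template <typename T, typename Op>
        T group_reduce(T* shared_mem,        // the groupshared array
                       std::size_t count,    // number of valid elements
                       std::size_t lane,     // this thread's index in the group
                       Op combine)           // user-supplied lambda, inlined by the compiler
        {
            // Classic tree reduction; a real shader would insert a group barrier
            // between iterations (GroupMemoryBarrierWithGroupSync in HLSL terms).
            for (std::size_t stride = count / 2; stride > 0; stride /= 2)
            {
                if (lane < stride)
                    shared_mem[lane] = combine(shared_mem[lane],
                                               shared_mem[lane + stride]);
                // barrier();
            }
            return shared_mem[0];
        }

        // The same helper then gives sum, min, max, ... for any element type:
        // float total = group_reduce(lds, 256, lane,
        //                            [](float a, float b) { return a + b; });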
     
  11. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Lightman and pharma like this.
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,710
    Likes Received:
    2,440
    Interesting comparison between AMD and NV in Vulkan on Linux: the site used a GTX 1060 vs an RX 580 with 3 different CPUs - a Celeron G, a Pentium G, and an i7 7700K. They tested 5 Vulkan games (Dota 2, Talos Principle, Mad Max, Serious Sam 3, and Dawn Of War 3), and they tested the OpenGL path as well.

    https://www.phoronix.com/scan.php?page=article&item=kblcpus-gl-vlk&num=2

    Generally speaking, NVIDIA was much faster in both the OpenGL and Vulkan paths; its lead in Vulkan extended anywhere from 20% to 57% depending on the title. Some of the Vulkan games were 10~15% slower than OpenGL on both AMD and NVIDIA, though. Others were faster with Vulkan.

    NVIDIA's OpenGL and Vulkan CPU overhead was smaller, and the GTX 1060 was able to extract more performance out of the Celeron CPU in both Vulkan and OpenGL.
     
    tuna and pharma like this.
  13. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,124
    Likes Received:
    901
    Location:
    still camping with a mauler
    I just ran the 3D Mark API Overhead feature test (V2.0) on my i7-3770K (4GHz all cores) + GTX970 (OCed a bit). I'm still on Win7 so no DX12 but I found the Vulkan result interesting.

    DirectX 11 single-thread:  2,464,400 draw calls per second
    DirectX 11 multi-thread:   3,057,329 draw calls per second
    Vulkan:                   17,128,458 draw calls per second

    I've always heard that NVIDIA cards (especially pre-Pascal) don't gain much from DX12/Vulkan but I guess it does make a bit of difference. 3 million vs 17 million lmao.
     
    Lightman likes this.
  14. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
    That has nothing to do with the hardware; that's the CPU overhead of the runtime.
     
    no-X likes this.
  15. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,372
    Likes Received:
    239
    Location:
    NY
    There are two reasons why DX12/Vulkan have been "disappointing" ("don't gain much") thus far:

    1) While you can create micro-benchmarks showcasing the cost of "high" API overhead, in real life this essentially never happens. In fact, the vast majority of the time you are bottlenecked by the GPU. And in the cases where you are bottlenecked by the CPU, it's never because of API overhead (usually physics, game logic, etc.). So while the API overhead reductions that DX12/Vulkan have made are very real, we just haven't found a practical use case yet to really utilize that potential. I do blame MS/IHVs/etc. for making this seem like a big issue that DX12/Vulkan could solve. It's not even a small issue at this point in time imo.

    2) Most DX12/Vulkan paths in engines at the moment are just "ports" from their DX11 path, meaning that they are essentially trying to "emulate" what existing DX11 drivers do (this is an oversimplification, but for the moment trust my hand waving). Well, guess what? Nvidia/AMD/Intel are better at writing "drivers" than game developers. They've been in the "driver business" FAR longer and have MANY more resources to throw at the problem. They also know their hardware better than game developers. ;-)

    Many developers will admit in private that without async compute, their DX12/Vulkan paths would always be slower than DX11 (i.e. in completely GPU-bound situations, their DX11 path would be faster than their DX12 path). This is where your "(especially pre-Pascal)" comment comes into play. Async compute can only help if you have idle units to spare. If Kepler was already near "full capacity" with DX11, async compute is not going to help much. In fact, if your "async compute implementation" has a big enough overhead, it can actually hurt. For async compute to pay off, you need to utilize enough idle units to overcome the overhead that async compute can introduce. In practice, this didn't seem to happen very often for Kepler.

    AMD has gotten a lot of credit for their async compute support (and to be clear, they've done a great job with it on both the hardware and software side), but really what we are saying is that their DX11 driver (for whatever reason) left a lot of units idle compared to the competition. Thus they had the most to gain. The takeaway here is that if your current DX11 path is already utilizing the GPU well, your (naive) DX12 path will be slower even with async compute.

    The "overall takeaway" is while Vulkan and DX12 have a lot of potential, at the moment we (as a community) are not in the position to make use of them (at least in a revolutionary way). The reality is developers still need to support DX11 and at the moment it's difficult to formulate an abstraction layer that will support both "DX12 style rendering" and DX11 style rendering" efficiently. It'll be a bit before we can fire on all cylinders, but we'll get there! :grin:
     
  16. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,124
    Likes Received:
    901
    Location:
    still camping with a mauler
    But the GPU + driver is not irrelevant. It's more of a software issue but the hardware does matter to some extent (for example the DX12/Vulkan way of issuing commands would not work on G80).

    On another note, DX11 games are rarely limited by draw calls, since there are many ways to reduce them, but at the end of the day they still take a significant amount of CPU time. In DX12/Vulkan, those tricks can still be used and the amount of CPU time spent issuing commands to the GPU should be minimal, right?
     
  17. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    Well... You need to know the circumstances in which the numbers above from the 3DMark API Overhead feature test are achieved. That test is pretty much just draw calls without much changing of state in between. At the end of the day, even in DX11 most of the CPU cost comes from building and changing graphics state, not directly from issuing draw calls, and that's not something that DX12/Vulkan made significantly faster.
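    Roughly what such an overhead test's inner loop looks like in Vulkan (a hand-waved sketch, not 3DMark's actual code; render pass begin/end and synchronization omitted): one pipeline bind followed by thousands of near-free vkCmdDraw calls, with the expensive state work - pipeline creation and shader compilation - done once, outside the loop.

        #include <vulkan/vulkan.h>

        // Record a command buffer that mimics an API overhead test:
        // almost no state changes, just a long run of draw calls.
        void record_overhead_test(VkCommandBuffer cmd,
                                  VkPipeline pipeline,     // built up front (the costly part)
                                  VkBuffer vertexBuffer,
                                  uint32_t drawCount)
        {
            VkCommandBufferBeginInfo begin{};
            begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            vkBeginCommandBuffer(cmd, &begin);

            // One pipeline bind; all the render state already lives inside this object.
            vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);

            VkDeviceSize offset = 0;
            vkCmdBindVertexBuffers(cmd, 0, 1, &vertexBuffer, &offset);

            // Each draw is only a few bytes written into the command buffer;
            // this is the part the test is really measuring.
            for (uint32_t i = 0; i < drawCount; ++i)
                vkCmdDraw(cmd, /*vertexCount*/ 3, /*instanceCount*/ 1,
                          /*firstVertex*/ 0, /*firstInstance*/ i);

            vkEndCommandBuffer(cmd);
        }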
     
  18. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,124
    Likes Received:
    901
    Location:
    still camping with a mauler
    Pardon my ignorance, but is that true for GCN cards as well?
     
  19. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    Of course. I'm not sure why it would be different. You still need to compile DXIL shaders to something that's actually executable by the hardware, for example. Constructing PSOs is one of the most expensive operations in D3D12 - so much so that there is now an ID3D12PipelineLibrary object which allows developers to efficiently cache them and store them to disk.
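    The rough usage pattern (a sketch of the documented interface; error handling and loading the serialized blob from disk are omitted): try to pull the PSO out of the library by name, only pay the full compile on a miss, then store it for next time.

        #include <d3d12.h>
        #include <wrl/client.h>
        using Microsoft::WRL::ComPtr;

        // Fetch a graphics PSO from an ID3D12PipelineLibrary cache, falling back
        // to a full (slow) CreateGraphicsPipelineState on a cache miss.
        ComPtr<ID3D12PipelineState> get_or_create_pso(
            ID3D12Device*                             device,
            ID3D12PipelineLibrary*                    library,  // created earlier via CreatePipelineLibrary
            const wchar_t*                            name,     // e.g. L"gbuffer_opaque" (made-up name)
            const D3D12_GRAPHICS_PIPELINE_STATE_DESC& desc)
        {
            ComPtr<ID3D12PipelineState> pso;

            // Fast path: the runtime/driver validates the desc against the cached
            // blob and skips the expensive shader compilation.
            if (SUCCEEDED(library->LoadGraphicsPipeline(name, &desc, IID_PPV_ARGS(&pso))))
                return pso;

            // Slow path: full PSO creation (this is where DXIL -> hardware ISA happens),
            // then store it so the next run hits the fast path.
            device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso));
            library->StorePipeline(name, pso.Get());
            return pso;
        }

        // Later, ID3D12PipelineLibrary::Serialize writes the whole cache back to disk.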
     
  20. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
    For this kind of test it pretty much is [irrelevant] for DX12/Vulkan. Everything used for drawing is a monolithic precompiled object that is on the GPU already, so even if the DX12 runtime had to check dirty state (which it doesn't), it would have to check about two to three orders of magnitude less CPU-side state than DX11. Also, the [optimal] paradigm for managing dynamic data is so vastly different that you probably get an automatic factor-two or -three speedup when passing vertex/index data to the GPU on every call - it depends on what they did (which is possible to verify with RenderDoc BTW), e.g. MAP_DISCARD.
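    For reference, the two paradigms side by side (a minimal sketch; buffer creation, device/context setup and frame synchronization are assumed to exist elsewhere): D3D11-style MAP_DISCARD re-maps the buffer on every update and lets the driver rename it behind the scenes, while in Vulkan you typically keep one big host-visible buffer persistently mapped and just bump an offset.

        #include <cstddef>
        #include <cstring>
        #include <d3d11.h>
        #include <vulkan/vulkan.h>

        // D3D11 style: map-with-discard every time the dynamic data changes;
        // the driver silently allocates/renames memory behind the scenes.
        void upload_d3d11(ID3D11DeviceContext* ctx, ID3D11Buffer* buf,
                          const void* data, std::size_t size)
        {
            D3D11_MAPPED_SUBRESOURCE mapped{};
            ctx->Map(buf, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
            std::memcpy(mapped.pData, data, size);
            ctx->Unmap(buf, 0);
        }

        // Vulkan style: one host-visible buffer mapped once at startup;
        // per-frame uploads are just a memcpy at a ring-buffer offset
        // (fencing against frames still in flight is the app's responsibility).
        struct DynamicRing {
            void*        mappedBase = nullptr;  // from a single vkMapMemory at init
            VkDeviceSize capacity   = 0;
            VkDeviceSize head       = 0;

            VkDeviceSize push(const void* data, VkDeviceSize size)
            {
                if (head + size > capacity) head = 0;          // wrap (simplified)
                VkDeviceSize offset = head;
                std::memcpy(static_cast<char*>(mappedBase) + offset, data, size);
                head += size;
                return offset;                                 // bind/draw using this offset
            }
        };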

    Of course it would [be faster] - it would either approach the command-processor bottleneck, possibly the vertex-fetch limit, or the ROP fillrate limit, instead of the DX11 run-time limit.
    You might think the speedup wouldn't be that large because you are probably thinking of driving a G80 with a 4 GHz Skylake. You have to think of an appropriate CPU of that time, and then (kind of like inflation adjustment) the relative factor between DX12-style and DX11-style submission would likely be about the same as it is today with a Pascal or Vega driven by that Skylake. Remember you measured a factor of ~6; that'd be about the delta between CPU-side draw cost and GPU-side draw cost. Or maybe not - you still don't know if you hit a GPU limit, or if it's still the i7-3770K not being able to scale up further. You could try to find 970s with an even higher rate in a public database.

    Most DX11 games and engines are purely CPU-side draw-call limited; that's why they get away with so much brute-force post-processing instead, which only uses a few draw calls to saturate the GPU. You're mistaking cause and effect: you don't see many calls in the games because they are slow in the first place, and the games are distorted to hit a given speed before release - assets and source code equally. This makes porting to new paradigms very hard, because you have to scrap 50% of the render-loop workaround hacks on the low level and replace them with decent strategies on the high level.

    You can use the tricks, but it's pointless to render a 60 FPS game at 120 FPS; you'd rather run at 60 FPS with twice the content.
     
    homerdog likes this.