DX11 vs DX12

Discussion in '3D Hardware, Software & Output Devices' started by iroboto, Jan 15, 2015.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,556
    Likes Received:
    4,729
    Location:
    Well within 3d
    It would help reduce the ways that things can diverge because at least each frame would be consistent within itself and the cards wouldn't need to interact as much.
    Inter-frame dependencies might require some amount of conversion between the vendors, but at least these sync points and data transfers are already present in some form for AFR.
    More variable settings could be employed because the intermediate data within each frame that does not carry over wouldn't need to mesh between vendors. If one card sucks at high tessellation factors, one could freakishly flip between almost no amplification and Crysis 2 levels of optimization every 16-33 ms (or 16-17-66-10-1-1-100-39-22-111-2-pi ms). If one card can't handle the features needed for a good OIT implementation, we can see smoke clip through everything at random.
    But hey, isn't that still some level of "working"?

    You can say your rig is a SLIXfire machine.
     
  2. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    12,236
    Likes Received:
    7,192
    Scan-CrossLink-InterFire

    Totally worth it for the name alone.
     
  3. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    Just like now with the XB1, where we can have multiple display planes,

    Why not allow something similar ...

    GPU 1 - Display Plane 1 - Sky + Mountains
    GPU 1 - Display Plane 2 - Land + Trees + Vegetation + Water
    GPU 2 - Display Plane 1 - Cities/Towns (Interactive World) + Characters
    GPU 2 - Display Plane 2 - HUD + Menu System

    Dedicated Compositor - compositing the outputs

    I know the devil is in the details as to how this could be accomplished, BUT that's how my mind currently works when trying to develop for DX and XB1/PC at the moment ..
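    For what it's worth, a minimal sketch of how that split might be expressed in code. Everything here is hypothetical: the PlaneAssignment struct and the RenderPlane/Composite helpers just stand in for real per-GPU rendering and for whatever compositor (display-plane hardware or a final blend pass) would merge the outputs.

    ```cpp
    #include <cstdio>
    #include <string>
    #include <vector>

    // Hypothetical description of the split proposed above: each plane is owned by
    // one GPU and rendered independently, then a dedicated compositor merges them
    // back-to-front. Real code would hold a device/swap chain per GPU instead of ints.
    struct PlaneAssignment {
        int         gpu;      // which GPU renders this plane
        int         plane;    // display plane index on that GPU
        std::string content;  // what the plane contains
    };

    // Stand-ins for the real work; hypothetical on purpose.
    void RenderPlane(const PlaneAssignment& p) {
        std::printf("GPU %d renders plane %d: %s\n", p.gpu, p.plane, p.content.c_str());
    }
    void Composite(const std::vector<PlaneAssignment>& planes) {
        std::printf("Compositor blends %zu planes back-to-front\n", planes.size());
    }

    int main() {
        std::vector<PlaneAssignment> planes = {
            {1, 1, "Sky + Mountains"},
            {1, 2, "Land + Trees + Vegetation + Water"},
            {2, 1, "Cities/Towns (Interactive World) + Characters"},
            {2, 2, "HUD + Menu System"},
        };
        for (const auto& p : planes) RenderPlane(p);  // each GPU works independently
        Composite(planes);                            // dedicated compositor merges the outputs
    }
    ```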
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,556
    Likes Received:
    4,729
    Location:
    Well within 3d
    That seems like a variation of the separate physics and graphics contexts. The tasks being run don't significantly interact, almost like playing a game with characters rendered over a streamed background or video playback.
    At that level of decoupling, I don't think you'd need DX12.
    Since we're talking about the Xbox One, as of right now we know we don't need DX12 to composite things.

    Depending on how concerned the game is with keeping consistency, separate cards from separate vendors would still experience performance variance. The background-rendering GPU would probably be best served by a lower load that keeps it consistently at a high frame rate, so that it can be matched to whatever rate the more interactive foreground achieves if there isn't a capped rate.
    As long as the differences between the two rendering contexts can be explained by geography and they don't need to mix too much, the overlay effect could be minimized.
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,380
    It's heart-warming to see that game developers are going to be willing to develop games with load-balancing code that takes into account all variations of GPUs. :wink:

    Seriously: what's in it for them? What market does anyone think this will open up? You're primarily targeting those with an old GPU of brand X who buy a new one of brand Y without getting rid of the former. It's not as if you'll find an audience of GTX 980 users who are going to add an R9 290X as well.

    The comparison with an XB1 is broken: it's a fixed system. In the real world, you'd have to find a way to schedule these tasks such that both GPUs have roughly the same workload. Otherwise the speed-up will be minimal.
     
  6. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    Totally thinking out of the box right now, and really going into gaga land ....

    I equate games, at least AAA games, with building operating systems in complexity.. The engines at the very least are as complicated as the lower layers..

    As we are seeing with Windows 10, parts of the operating system itself, at least the UI layer, are now decoupled from the OS deployment and are part of the application store model.

    E.g. the start menu ... that is now a store app and it literally installs via the store..

    Imagine if a game and its engine were like Windows 10 and the store/app model .. where the HUD, like the start menu, were deployed as a store app .. It could get updates, evolve independently of the rest of the game, etc.

    You could probably even design the engine/game to be componentized too.. Not sure how far you can take this idea with the current state of the Windows Store model; I'm hoping we'll see a much more open store at \\build\ this year that will allow this ..

    Anyway, hopefully we can create apps/games like this in the near future. Composing multiple Windows app-model pieces from the store ..

    We've been used to this componentized assembly of apps for a while now in the .NET world with things like NuGet and MEF .. I'd like to see us extend this to games!
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,556
    Likes Received:
    4,729
    Location:
    Well within 3d
    I don't quite see the benefit of getting a HUD that doesn't match the game it's overlaid on. Is this different from getting a mod that changes the HUD?
    Is this something related to the API in use?
     
  8. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    Just thinking that it would be great to somehow divorce the HUD/chrome, and possibly even the menu system, from the game itself and have it evolve independently ..

    And of course the HUD would still be designed to behave as if it were part of the actual game; it just wouldn't be part of the same code base as the game code..

    There would still need to be very fast, low-latency communication between the 2 independent apps .. But I'm definitely trying to research/demo whether this is even feasible..

    I like how the start menu in Windows 10 is decoupled BUT extends the OS .. I really want games to evolve the same way, where we don't have one big monolithic dump of code, but rather componentized libraries that combine together and can evolve independently. And possibly even be executed using different HW, local or remote ..
     
  9. Osamar

    Newcomer

    Joined:
    Sep 19, 2006
    Messages:
    210
    Likes Received:
    36
    Location:
    40,00ºN - 00,00ºE
    I find it more useful to have, for example, the integrated graphics card "rendering" Windows, a second surfing monitor, etc., and the main card fully dedicated to the game.
     
  10. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    918
    Likes Received:
    1,122
    Location:
    55°38′33″ N, 37°28′37″ E
    Yes, automatic load balancing is an issue to be taken seriously.

    IMHO there can be two options, either

    1) a common pipeline where the two D3D devices are linked together, exposing the lowest common denominator of their feature sets, with automatic load balancing between the separate cards performed by the API; or

    2) two independent D3D devices, each with its own capabilities, but one shared swap chain with primary/secondary color and depth buffers residing in the local memory of one card, where the burden of load balancing is on the developer.

    I'd think option 1 would be preferable for Direct3D developers in a multi-vendor configuration, but I would like to know the details of how Mantle handles multi-GPU programming though...
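    At the device-creation level, option 2 roughly corresponds to what DXGI already lets you do: enumerate every adapter and create an independent D3D11 device on each, with load balancing (and any cross-device copies) left entirely to the application. A minimal sketch, error handling omitted:

    ```cpp
    #include <d3d11.h>
    #include <dxgi.h>
    #include <wrl/client.h>
    #include <vector>

    using Microsoft::WRL::ComPtr;

    // One independent D3D11 device per physical adapter (discrete or integrated).
    std::vector<ComPtr<ID3D11Device>> CreateDevicePerAdapter()
    {
        ComPtr<IDXGIFactory1> factory;
        CreateDXGIFactory1(IID_PPV_ARGS(&factory));

        std::vector<ComPtr<ID3D11Device>> devices;
        ComPtr<IDXGIAdapter1> adapter;
        for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
        {
            ComPtr<ID3D11Device> device;
            // Passing an explicit adapter requires D3D_DRIVER_TYPE_UNKNOWN.
            if (SUCCEEDED(D3D11CreateDevice(adapter.Get(), D3D_DRIVER_TYPE_UNKNOWN, nullptr, 0,
                                            nullptr, 0, D3D11_SDK_VERSION,
                                            &device, nullptr, nullptr)))
                devices.push_back(device);
        }
        return devices;
    }
    ```

    Each device keeps its own feature set; what isn't shown is the shared swap chain and buffer residency, which is exactly where the per-application load-balancing burden comes in.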

    Oh, I really doubt anyone will care to implement such microsecond-tight scheduling in their applications. It will take much effort "only" to move from the current one-thread model to several graphics threads in WDDM 2.0, before developers could be bothered with such low-level programming stuff, if they could be bothered at all.

    Right. Most mid-level PCs already have a multi-GPU setup, with a discrete GPU on a graphics card and an integrated GPU on the CPU die.

    And yet I'm not aware of any application that currently makes any use of this (except for a small number of Lucid Virtu-enabled motherboards from 2011-2012)....
     
  11. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    In case this interests anyone, MS implemented hybrid rendering in Windows 8.1 .. using a cross-bar architecture that allows rendering to be shared between an iGPU and a dGPU ..

    And with modern graphics cards that let you spin up multiple swap chains (display planes), like on the XB1, and the API that came with DX11.2 ("GPU overlay support") ... I personally believe programming against 2 dGPUs is not that far out of reach for most devs this year. I look forward to it, actually, assuming the rumors are true
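    For reference, the overlay support that arrived in the Windows 8.1 / DXGI 1.3 timeframe can be queried per output. A small sketch of that query (assumes an existing D3D11 device, checks the default adapter's first output, and omits error handling):

    ```cpp
    #include <d3d11.h>
    #include <dxgi1_3.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    // Ask the first output on the default adapter whether it supports hardware
    // overlays, and whether a BGRA8 overlay can be scanned out directly and/or scaled.
    bool QueryOverlaySupport(ID3D11Device* device)
    {
        ComPtr<IDXGIFactory2> factory;
        CreateDXGIFactory1(IID_PPV_ARGS(&factory));

        ComPtr<IDXGIAdapter1> adapter;
        factory->EnumAdapters1(0, &adapter);

        ComPtr<IDXGIOutput> output;
        adapter->EnumOutputs(0, &output);

        ComPtr<IDXGIOutput2> output2;   // SupportsOverlays()      (DXGI 1.3)
        ComPtr<IDXGIOutput3> output3;   // CheckOverlaySupport()   (DXGI 1.3)
        output.As(&output2);
        output.As(&output3);

        UINT flags = 0;
        output3->CheckOverlaySupport(DXGI_FORMAT_B8G8R8A8_UNORM, device, &flags);

        const bool direct  = (flags & DXGI_OVERLAY_SUPPORT_FLAG_DIRECT)  != 0;
        const bool scaling = (flags & DXGI_OVERLAY_SUPPORT_FLAG_SCALING) != 0;
        return output2->SupportsOverlays() && (direct || scaling);
    }
    ```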
     
  12. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    918
    Likes Received:
    1,122
    Location:
    55°38′33″ N, 37°28′37″ E
    It's not truly multi-GPU rendering - only one adapter performs the actual rendering, and the driver then simply copies the result to the shared surface, which is then consumed by the other adapter. So it's rather a memory copy hack.

    That's for rendering the 2D parts at full display resolution while using a smaller resolution for the game graphics and then scaling them up to full display resolution. I'd guess the HUD and game interface don't really take enough resources to require a separate GPU.
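    The application-level analogue of that shared-surface copy is the D3D11 shared-handle path: one device renders into a shareable texture, the other device opens the same surface and consumes it. Cross-adapter sharing has extra driver-dependent constraints, so treat this as a sketch of the mechanism rather than a guaranteed multi-vendor recipe:

    ```cpp
    #include <d3d11.h>
    #include <dxgi.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    // deviceA renders into a shareable surface; deviceB opens it through the shared handle.
    ComPtr<ID3D11Texture2D> ShareSurface(ID3D11Device* deviceA, ID3D11Device* deviceB,
                                         UINT width, UINT height)
    {
        D3D11_TEXTURE2D_DESC desc = {};
        desc.Width            = width;
        desc.Height           = height;
        desc.MipLevels        = 1;
        desc.ArraySize        = 1;
        desc.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage            = D3D11_USAGE_DEFAULT;
        desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
        desc.MiscFlags        = D3D11_RESOURCE_MISC_SHARED;        // make the surface shareable

        ComPtr<ID3D11Texture2D> texA;
        deviceA->CreateTexture2D(&desc, nullptr, &texA);           // rendered to by device A

        ComPtr<IDXGIResource> dxgiRes;
        texA.As(&dxgiRes);
        HANDLE shared = nullptr;
        dxgiRes->GetSharedHandle(&shared);                         // handle the copy goes through

        ComPtr<ID3D11Texture2D> texB;
        deviceB->OpenSharedResource(shared, IID_PPV_ARGS(&texB));  // same surface, seen by device B
        return texB;
    }
    ```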
     
  13. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    True that ...

    What's interesting though is the XB1 architecture and how it takes this idea further, with multiple display planes and

    1. mixing a running 'title' game + snapped application where that snapped application itself could be a game..
    2. streaming a title game to a PC whilst letting someone watch XTV on the Xbox/TV
    3. streaming a 'dx surface' to a smartglass device (via the xdk and smartglass xdk)

    There's a section in the leaked XDK titled "Presentation Queue and Display Planes on Xbox One" that explains nicely how the CPU + GPU + presentation queue all run in-frame, each contributing to that frame.. It's an interesting read, if only to see how it accomplishes rendering .. if you don't have a copy of the XDK, let me know and I'll paste that section in here for you and others to read..
     
  14. Max McMullen

    Newcomer

    Joined:
    Apr 4, 2014
    Messages:
    20
    Likes Received:
    105
    Location:
    Seattle, WA
    From the point of view of building an operating system that supports multiGPU, at the core what more do you want than a set of primitives to synchronize work across GPUs and copy processor output such as used by the "memory copy hack"? There are a number of secondary design issues such as processor topology enumeration, display scanout configuration, efficient command generation, and GPU engine capability to push and/or pull data across the bus. Nonetheless, at the core multiGPU is data exchange and synchronization with lots of architecture & scenario focused optimizations possible.

    In the case of hybrid on Windows 8.1 the desktop compositor is frequently running on the iGPU making the entire pipeline a 2 GPU operation. I'll be the first to point out it's a simple but common use case. Those are, however, frequently the best cases to dip one's toes in the water.
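    For anyone curious what such synchronization/copy primitives ended up looking like in practice, D3D12 exposes them fairly directly: a fence created with the cross-adapter sharing flag can be opened on a second device, so one GPU's queue signals it and the other GPU's queue waits on it. A sketch of the mechanism only, with device/queue creation and the actual cross-adapter copy omitted:

    ```cpp
    #include <d3d12.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    // GPU 0 produces data and signals a shared fence; GPU 1 waits on the same fence
    // value before consuming it.
    void SyncAcrossAdapters(ID3D12Device* device0, ID3D12CommandQueue* queue0,
                            ID3D12Device* device1, ID3D12CommandQueue* queue1)
    {
        // Fence that can be shared across adapters.
        ComPtr<ID3D12Fence> fence0;
        device0->CreateFence(0, D3D12_FENCE_FLAG_SHARED | D3D12_FENCE_FLAG_SHARED_CROSS_ADAPTER,
                             IID_PPV_ARGS(&fence0));

        // Export it as an NT handle and open it on the second device.
        HANDLE handle = nullptr;
        device0->CreateSharedHandle(fence0.Get(), nullptr, GENERIC_ALL, nullptr, &handle);
        ComPtr<ID3D12Fence> fence1;
        device1->OpenSharedHandle(handle, IID_PPV_ARGS(&fence1));

        // ... queue0 executes the producing work, then:
        queue0->Signal(fence0.Get(), 1);   // GPU 0: "my output is ready"
        queue1->Wait(fence1.Get(), 1);     // GPU 1: don't start consuming until then
        // ... queue1 executes the consuming work (e.g. the copy across the bus).
    }
    ```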

    Max McMullen
    Direct3D Development Lead
    Microsoft
     
    Scott_Arm, ToTTenTranz and BRiT like this.
  15. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    918
    Likes Received:
    1,122
    Location:
    55°38′33″ N, 37°28′37″ E
    Uhm... this is not about intra-frame sync.

    It's about how the runtime internally uses two swap chains - i.e. a collection of front-back-(back2) buffers in a double- or triple-buffering setup - to sync the presentation, i.e. the back-front buffer swap when the back buffer is fully rendered.

    Having separate swap chains in a double-buffering setup ensures that one back buffer should be available for rendering at all times, so there is less stalling on either the CPU or the GPU waiting while the swap chain is being presented.

    Also, the developer can specify either a fixed presentation interval (60, 30, 20, 15 Hz) or choose to present the frame immediately, i.e. no VSync. In the latter case the runtime can also blend between the two swap chains (i.e. front buffers) for the previous and currently finished frames - so showing 10% of one frame and then 90% of the new frame should be less visible.
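    On the PC/DXGI side the equivalent knob is the sync interval passed to Present. A tiny sketch of the mapping described above, assuming an existing IDXGISwapChain and a 60 Hz display:

    ```cpp
    #include <dxgi.h>

    // Fixed presentation intervals on a 60 Hz display, expressed as Present's SyncInterval:
    //   1 -> 60 Hz, 2 -> 30 Hz, 3 -> 20 Hz, 4 -> 15 Hz, 0 -> present immediately (no VSync).
    void PresentFrame(IDXGISwapChain* swapChain, bool vsync, UINT interval /* 1..4 */)
    {
        if (vsync)
            swapChain->Present(interval, 0);  // wait for 'interval' vertical blanks
        else
            swapChain->Present(0, 0);         // queue the frame immediately
    }
    ```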
     
  16. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    918
    Likes Received:
    1,122
    Location:
    55°38′33″ N, 37°28′37″ E
    I have nothing against "hacks"; my point was that capabilities to use multiple GPUs have existed for some time in Direct3D 11 and yet we've seen little use of them - so if the Direct3D 12 API makes it easier for game developers to implement simultaneous frame rendering across multiple GPUs, as Tom's Hardware asserts, that may trigger a change.

    OK, but this was probably implemented for power-efficiency reasons.
     
  17. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    918
    Likes Received:
    1,122
    Location:
    55°38′33″ N, 37°28′37″ E
    So, if you watch the GDC video (scroll to 11m 25s), it's explicit CPU/GPU synchronisation, which I presume developers can use for real-time load balancing and performance profiling.

    Other parts of the multi-GPU puzzle are multiple parallel queues (37:30-44:30), bindless resources (binding tier 3) with dynamic memory heaps (6:30-10:30), and multiple DMA engines on the GPU allowing CPU-independent virtual memory access.
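    Explicit CPU/GPU synchronisation in D3D12 boils down to the fence pattern below: the GPU signals a monotonically increasing value and the CPU either polls it or blocks on an event, which is also the natural place to hang per-frame timing for a load-balancing heuristic. A minimal sketch, assuming an existing device and command queue:

    ```cpp
    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    // Classic D3D12 fence pattern: the GPU signals a value when it reaches that point
    // in the queue, and the CPU polls or blocks on it.
    void WaitForGpu(ID3D12Device* device, ID3D12CommandQueue* queue)
    {
        ComPtr<ID3D12Fence> fence;
        device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
        HANDLE fenceEvent = CreateEventW(nullptr, FALSE, FALSE, nullptr);

        const UINT64 fenceValue = 1;
        queue->Signal(fence.Get(), fenceValue);            // GPU writes fenceValue when it gets here

        if (fence->GetCompletedValue() < fenceValue)       // CPU checks GPU progress
        {
            fence->SetEventOnCompletion(fenceValue, fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);     // block until the GPU has caught up
        }
        CloseHandle(fenceEvent);
    }
    ```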
     
  18. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    918
    Likes Received:
    1,122
    Location:
    55°38′33″ N, 37°28′37″ E
    Also see this video for details on
    • Parallel execution engines - 11:30-18:00
    • GPU efficiency (queries, predication, execute indirect) - 18:00-25:30
    • CPU overhead (resource binding, multithreading) - 26:05-36:15
     
