AMD Mantle API [updating]

Discussion in 'Rendering Technology and APIs' started by MarkoIt, Sep 26, 2013.

  1. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile
    I'm not disagreeing with you, just pointing out the irony. There was a time when GLIDE and OpenGL ruled--3dfx vs. nVidia, respectively (and D3d had not yet been born or else was too immature to matter.) Most people railed against this and heralded D3d as the API that all IHVs could participate in, compete in, and support. Would you want to go back to the former standard? I'm not saying I wouldn't...and not saying I would, either...;)

    Yes, and if D3d development had ended with DX7 then some other API would likely have usurped it years ago. But it didn't; D3d kept on progressing because of Microsoft's support and influence. There's no reason Mantle cannot continue to develop just as D3d has done for all of these years.

    Here's the thing: is Microsoft getting tired of managing D3d? I think there are signs that Microsoft is getting tired of doing a lot of things it has traditionally done (which I think is a mistake, but that's another story), and D3d might just be one of those things the company would be more than happy to hand off to AMD--I say AMD because nVidia's relationship with Microsoft has been rocky for many a year and I doubt it will ever improve. It seems like most of the D3d advances have come from ATi/AMD, anyway, over the last decade. (nVidia is still deep in its proprietary cups with things like PhysX, CUDA, etc.)

    However, even though nVidia stated publicly years ago that it did not officially approve of "the direction" Microsoft was charting for 3d gaming with D3d, once it put nV3x behind it nVidia had no trouble fully supporting D3d in the years since--inspired by AMD or not, as the case may be. I think that more than answers the question of whether nVidia could adapt to Mantle as surely as it has adapted to D3d.

    So, what if there's an under-the-table understanding between AMD and Microsoft that AMD will slowly take charge of the API side of the business over the next few years? After all, AMD is in the ideal spot to do so, supplying the silicon for both consoles as it does. Of the three positions--the PC, the XBone, and the PS4--Microsoft has two cornered, so even if the PS4 does well it does not mean Microsoft won't do better. Ah...speculation at this point is premature!

    But if you are a developer and you want to support xBone, PS4, and the PC through a single API, and at a low level nearer the hardware, what else is there except Mantle at the moment? That said, is Mantle any good? As others have asked, how are the developer tools and so on? I think that the idea of Mantle for AMD has legs--but whether the actual Mantle code does is a horse of a different color, certainly.
     
  2. Osamar

    Newcomer

    Joined:
    Sep 19, 2006
    Messages:
    231
    Likes Received:
    43
    Location:
    40,00ºN - 00,00ºE
    Speaking from my own ignorance and noobism:

    Mantle seems to me to be the base, to-the-metal driver layer. On top of it AMD has D3D and OGL, so they have decided to polish it and publish it as an API, probably because the XBone uses it or something very similar.

    If I'm not wrong, Nvidia uses something similar in its drivers: a common base with D3D/OGL on top.

    What could Mantle do that isn't possible with OGL extensions?

    Will we see DirectX 12 as a high-level library on top of to-the-metal APIs?
     
  3. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    In which case, I don't understand why you are arguing for a sw managed cache over a larger L2?
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I'm arguing for a wodge of globally accessible on-die fast RAM. It's coming anyway, I just expect it'll be a while and until then XBone will have an advantage with whatever developers can find to do with it.

    Large L2 or something else? Unfortunately the evidence (in graphics/games) is still thin on the ground, particularly as Intel ran off with Larrabee. Why did MS go the way it did?

    Was it because it's more similar than different to XB360's EDRAM? Was it AMD's preference or advice? A sort of trial by AMD to see what happens? 32MB L3 too complex/costly for this launch timeframe? etc.

    I'm not defending the choice of architecture for this wodge in XBone. It's there and unignorable. Crystalwell is a bold step, one that's theoretically always there "making everything faster" even when code isn't written to depend on a wodge of on-die memory.
     
  5. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    Where would you put the L3 on a GPU? Break up the L2 into multiples and have the L3 central, before the memory controllers in some kind of banked configuration, or as separate caches inside each memory controller?

    The last seems the easiest to me; it makes it easy to scale out the uarch, but you won't get the power savings you would if the L3 were closer to the execution units.
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    IIRC, the ESRAM can be configured as a hw managed or a sw managed cache, so those tradeoffs don't really apply.
     
  7. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    I thought the eSRAM was just a pool of addressable memory, and thus purely software managed, not a cache?
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    There are hints at some future DF article that might go into detail as to how the eSRAM is used in a compute scenario.
    This might give some answers to my questions concerning the eSRAM's latency numbers.
     
  9. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    Sure, but there aren't a lot of graphics algorithms that are going to be able to rely heavily on that. For 3D stuff, you almost need something like pixel sync to make it useful (i.e. to make the latency relevant). In compute you could argue that if it's coherent enough for atomics and such, there could be cases that benefit as well, but people typically try to avoid writing latency-sensitive GPU code, because it also tends to be code that doesn't scale well to wider architectures. I agree that there's an area in the middle with some potentially valuable algorithms, but I don't think the Xbone is going to pull any super-fancy tricks in practice that won't be implementable on other platforms.

    Are you sure it can work as a HW cache (with what coherency)? That'd be fairly big news if so, and IMHO make it a lot more useful.
     
  10. Still

    Newcomer

    Joined:
    Apr 16, 2008
    Messages:
    38
    Likes Received:
    0
    It's certainly good news that the transaction-level API has materialized. AMD had been talking about this goal for a couple of years as part of its heterogeneous computing strategy, if I'm not mistaken.

    It will be interesting to see how the iGPU resources can finally be exploited in tandem with a dGPU. But perhaps this won't really take off before the memory bandwidth gap of APUs is taken care of. The potential of interconnected DRAM ICs seems huge and tangible to me. High-cost mass production has already been demonstrated, AFAIK. I wonder if an investment collaboration between AMD and GLOBALFOUNDRIES regarding TSVs could put them in an even more advantageous position there.

    What are the chances for an open transaction-level API? But don't we all like to imagine there are no countries...
     
  11. KKRT

    Veteran

    Joined:
    Aug 10, 2009
    Messages:
    1,040
    Likes Received:
    0
    If anyone is interested those are my drawcalls performance tests from CE3 SDK:

    Video - http://youtu.be/GrSpm2AZWVU (draw calls are listed as DP 3rd row from the top)

    Results, not from video, but from a little more precise testing:
    300 draw calls - 105 fps
    2100 draw calls - 104 fps
    3000 draw calls - 103 fps
    4000 draw calls - 101 fps
    5000 draw calls - 91 fps
    6000 draw calls - 83 fps
    7000 draw calls - 75 fps
    9000 draw calls - 65 fps
    13000 draw calls - 49 fps
    17000 draw calls - 41 fps
    20000 draw calls - 37 fps
    on stock i5 2500k and GTX 560.
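
A quick way to read those numbers is to convert fps to frame time (1000/fps) and divide the extra frame time by the extra draw calls, which gives a rough marginal CPU cost per call. A minimal sketch in Python using two rows from the table above (the ~0.9 µs/call figure is my back-of-envelope estimate, not something measured in the video):

```python
# Estimate marginal CPU cost per draw call from KKRT's CE3 numbers.
samples = {300: 105, 20000: 37}  # draw calls -> fps

def frame_time_ms(fps):
    """Convert frames-per-second to milliseconds per frame."""
    return 1000.0 / fps

delta_ms = frame_time_ms(samples[20000]) - frame_time_ms(samples[300])
cost_per_call_us = delta_ms * 1000.0 / (20000 - 300)

print(f"{frame_time_ms(105):.2f} ms/frame at 300 calls")    # ~9.52 ms
print(f"{frame_time_ms(37):.2f} ms/frame at 20000 calls")   # ~27.03 ms
print(f"~{cost_per_call_us:.2f} us per additional draw call")
```

That works out to roughly 0.89 µs of CPU time per extra draw call on this setup, which is the kind of per-call overhead Mantle is claimed to attack.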
     
    #111 KKRT, Sep 30, 2013
    Last edited by a moderator: Sep 30, 2013
  12. Pressure

    Veteran

    Joined:
    Mar 30, 2004
    Messages:
    1,655
    Likes Received:
    593
    Will be interesting to see those performance figures on a GCN card before and after Mantle!
     
  13. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
  14. Still

    Newcomer

    Joined:
    Apr 16, 2008
    Messages:
    38
    Likes Received:
    0
    That's not representative of modern OpenGL/Direct3D. You can easily draw 20k unique objects with <5 ms of overhead when using indirect drawing and bindless resources.
     
  15. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    This is not about unique objects, instancing and what not. This is about the number of actual draw calls, which require "some" housekeeping in the UMD for each and every single one (which requires CPU time, which happens to be the topic of this discussion, not the complexity of the scene).
     
  16. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    948
    Likes Received:
    417
    @Still I don't see how you can have fewer DrawInstancedIndirect() calls than DrawInstanced() calls. Each indirect command handles just one draw: if you had 20k draws before, you have 20k indirect draws. Bindless resources aren't available in DirectX, even though GCN would let you use them extensively--with luck you might even pass them from fixed-function stage to fixed-function stage, or store them anywhere you want.
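
As a concrete reference for the point about one indirect command per draw: in OpenGL 4.3, glMultiDrawElementsIndirect consumes a buffer of fixed 20-byte argument records, one record per draw, so 20k draws still mean 20k records. A CPU-side sketch of that record layout (this only demonstrates the packing; no GPU work is involved):

```python
import struct

# Layout of one GL 4.3 DrawElementsIndirectCommand record:
# count, instanceCount, firstIndex, baseInstance are unsigned 32-bit;
# baseVertex is signed 32-bit. The GPU front end consumes one per draw.
CMD_FORMAT = "<IIIiI"
CMD_SIZE = struct.calcsize(CMD_FORMAT)  # 20 bytes

def pack_draw(count, instance_count=1, first_index=0,
              base_vertex=0, base_instance=0):
    """Pack a single indirect-draw record for the argument buffer."""
    return struct.pack(CMD_FORMAT, count, instance_count,
                       first_index, base_vertex, base_instance)

# 20k draws -> 20k records: indirect drawing moves the draw *parameters*
# out of the API stream; it does not reduce the number of draws.
arg_buffer = b"".join(pack_draw(36) for _ in range(20000))
print(CMD_SIZE, len(arg_buffer))  # 20 400000
```

What it does save is the per-call CPU round trip through the API and driver, since the parameters are sourced from GPU-visible memory.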
     
  17. KKRT

    Veteran

    Joined:
    Aug 10, 2009
    Messages:
    1,040
    Likes Received:
    0
  18. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    That's what I remember reading. I could be wrong.
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    If you come across the source, I'd be interested in reading it.
    I haven't seen that being offered, specifically.
    This also doesn't take into account that upgrading GPUs and nontraditional processors to a peer-like status in AMD's heterogeneous platform strains a number of assumptions made in what has traditionally been a CPU-centric discussion.
     
  20. MJP

    MJP
    Regular

    Joined:
    Feb 21, 2007
    Messages:
    566
    Likes Received:
    187
    Location:
    Irvine, CA
    I converted that to usable time units for you, since I'm allergic to FPS.
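
For anyone who wants to redo that conversion themselves, frame time in milliseconds is just 1000/fps applied to KKRT's table. A quick sketch (my arithmetic, rounded to two decimals, not MJP's original figures):

```python
# Frame time (ms) = 1000 / fps, applied to KKRT's draw-call results.
results = [(300, 105), (2100, 104), (3000, 103), (4000, 101),
           (5000, 91), (6000, 83), (7000, 75), (9000, 65),
           (13000, 49), (17000, 41), (20000, 37)]

frame_times = {calls: round(1000.0 / fps, 2) for calls, fps in results}
for calls, ms in frame_times.items():
    print(f"{calls:>6} draw calls: {ms:6.2f} ms/frame")
```

Working in ms/frame makes the cost growth linear and easy to compare, which is exactly why people allergic to FPS prefer it.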
     