No DX12 Software is Suitable for Benchmarking *spawn*

Discussion in 'Architecture and Products' started by trinibwoy, Jun 3, 2016.

  1. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    To be fair, the developers didn't even test on Intel, because even a 580 is a fair bit below the min spec. Given the lack of OpenGL and Vulkan games out there (and the general situation with both APIs and their extensions), it's unreasonable to assume anything that hasn't been tested will work. I was actually fairly shocked to find that the GL path does work fairly decently as of the latest driver update, given how far below the min spec (~2.5 TFLOP GPU @ 720p) even the 580 is. That said, their "min spec" is a ways off from consoles, so not sure what's up with that...

    Realistically, in GL and Vulkan, if you haven't tested it, it's fair to assume it's broken - and that goes for every IHV. DX is a much more solid API from that perspective, and it obviously gets more attention on Windows since the vast majority of games use DX. I don't think most folks would want us to invert those priorities :)
     
    Lightman, pharma, Razor1 and 2 others like this.
  2. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Regarding Time Spy, it's definitely a bit odd that it goes out of its way to use async compute but doesn't even support FL12 stuff like bindless, which is fairly ubiquitous at this point. That said, benchmarks are mostly useful for being pretty these days, as engine designs have diverged quite a bit from the goals of these benchmark vendors, so I'll enjoy it for what it is :)

    And yeah, A-buffer = yuck. Let's all emphasize how much it sucks that AMD doesn't have ROVs yet I guess! :S
     
    Ike Turner, pharma and OlegSH like this.
  3. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,254
    Likes Received:
    5,206
    Ah poop, I also forgot to mention that FL 12_0 doesn't require either conservative rasterization or rasterizer ordered views, so neither of those would be why Time Spy is FL 11_0.

    As mentioned, it's likely FL 11_0 because that's the highest level Fermi supports (not sure about Kepler), and they'd want to support as broad a selection of relatively modern graphics cards as they can.

    Regards,
    SB
     
  4. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    586
    Likes Received:
    291
    Speaking of feature levels, the lowest common denominators are Kepler and Maxwell 1: those are feature level 11_0 cards. GCN Gen 1 GPUs are 11_1 (they do not support min/max filtering or MSAA for sparse/tiled/reserved resources); all other GCN GPUs are actually 12_0. Intel Haswell/Broadwell are 11_1; Skylake is 12_1 (and provides more complete feature support than Pascal).
    Pixel Sync (ROVs) and Conservative Rasterization Tier 1 are required for feature level 12_1.
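    For concreteness, a minimal C++ sketch (assuming an already-created ID3D12Device* named device; error handling elided) of how an app queries the feature level, the two caps gating 12_1, and the resource binding tier relevant to bindless:

    #include <d3d12.h>

    void QueryCaps(ID3D12Device* device)
    {
        // Ask the runtime which of these levels the adapter supports.
        const D3D_FEATURE_LEVEL levels[] = {
            D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
            D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1,
        };
        D3D12_FEATURE_DATA_FEATURE_LEVELS fl = {};
        fl.NumFeatureLevels = sizeof(levels) / sizeof(levels[0]);
        fl.pFeatureLevelsRequested = levels;
        device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS, &fl, sizeof(fl));
        // fl.MaxSupportedFeatureLevel now holds e.g. D3D_FEATURE_LEVEL_12_1.

        // The individual caps behind the feature levels.
        D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
        device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &opts, sizeof(opts));
        // FL 12_1 requires opts.ROVsSupported == TRUE and
        // opts.ConservativeRasterizationTier >= D3D12_CONSERVATIVE_RASTERIZATION_TIER_1.
        // opts.ResourceBindingTier is the "bindless" tier mentioned earlier.
    }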
     
    BRiT likes this.
  5. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    611
    Likes Received:
    1,052
    Location:
    PCIe x16_1
    Kepler and Maxwell 1 are both 11_0 and are likely the reason that's the spec used (as opposed to 11_1 for GCN 1.0). Fermi's DX12 driver was never finished/released, so it doesn't really matter.
     
    Lightman and Silent_Buddha like this.
  6. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    586
    Likes Received:
    291
    I would be curious to see the benchmark running on Intel hardware only.
     
  7. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,254
    Likes Received:
    5,206
    Damn, I'd forgotten about that. Yeah, I thought I'd remembered seeing that Kepler and Maxwell 1 were both 11_0, but couldn't remember for sure so didn't want to mention them. As well, I couldn't remember if GCN 1.0 was 11_0 or 11_1. Thanks for the clarification.

    Regards,
    SB
     
  8. OlegSH

    Regular Newcomer

    Joined:
    Jan 10, 2010
    Messages:
    365
    Likes Received:
    257
    Agree, it's hard to imagine a developer who would want to spend 20% of rendering time on low-occupancy stuff, at least on PC.

    PS: I hate inter-frame async. Why would somebody want to trade a few % gain in framerate for a many-% loss in input latency?
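    As a rough, assumed-numbers illustration of that trade-off: overlapping frame N's compute with frame N+1's graphics keeps one extra frame in flight. At 60 fps, a 5% framerate gain shaves the frame time from 16.7 ms to about 15.9 ms, while the extra frame in flight adds roughly 16 ms of input latency, a 20-30% increase on a typical 50-70 ms end-to-end latency.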
     
  9. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Regarding Time Spy, anyone have any idea what they are doing with 70 million compute shader invocations per frame? It seems rather excessive.
     
    BRiT likes this.
  10. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,102
    Likes Received:
    3,167
    Location:
    Pennsylvania
    Maybe it's the AMD version of the jersey barrier? :lol:
     
    Alexko and BRiT like this.
  11. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
    Anyone know why this shows a performance gain for Pascal when async is used?

    This is the only case so far where we see gains with Nvidia hardware when "asynchronous compute" is used. I expect we'll be back to the usual no gains with actual games, but it would be interesting to understand what's happening in this particular case. Developers are still "working with Nvidia" to get gains in their games with async, so I would guess Futuremark really worked with Nvidia to get this result.

    And regarding CR and ROV in 12_1, has anyone tested an implementation of these on Maxwell/Pascal? It could well be pointless if the performance is worse for the same end result.
     
  12. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    611
    Likes Received:
    1,052
    Location:
    PCIe x16_1
    Because even NVIDIA's GPUs have execution bubbles that can be filled.
     
    Kej, Lightman, Razor1 and 1 other person like this.
  13. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
    Sure, but what's different about Time Spy such that the results differ from games? It's likely taking advantage of Pascal's load balancing and better preemption, which seem to only help with shader efficiency. Last time I saw that discussed here, I think I saw people saying it was not asynchronous compute.

    It would be nice to have an analysis of this rather than what sites seem to be doing, which is taking it at face value. "What is going on here, and will it really be this way in any significant number of games?" It seems a significant discrepancy.
     
  14. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    You can still see async benefits in games as well, but at different resolutions you are going to have bottleneck shifts, and that will affect the noticeable async benefits. We might even see this with Time Spy too; I didn't see any reviews using different resolutions.

    Also, depending on what developers have used and how they have programmed for async, it could behave differently on different IHVs' hardware, so things like that we won't really know.

    And it's not preemption on Pascal that is being used, it's dynamic load balancing; preemption defeats the purpose of async compute.
     
  15. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
    I do think it should be possible to see gains in some situations with Pascal's load balancing. But do they even have to use the compute queue for this? Compute tasks in the graphics queue could be assigned to some clusters while the rest handle graphics, or assigned to idle clusters while a graphics task is running, without touching the compute queue.

    My understanding of the load balancing is that previously there was a fixed division of shaders within the graphics queue, some doing compute and some doing graphics for each task, and this did not change while the task was running. With Pascal the clusters can be changed on the fly to do either compute or graphics, to reduce idle states where either graphics or compute finishes before the entire task is done.

    Since the previous situation was all in the graphics queue, it would be safe to assume what Pascal is doing is also in the graphics queue? This would mean it's not actually asynchronous compute that's going on with Time Spy and Pascal. It is still concurrent, but it sounds like it was concurrent before as well.
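    For reference, a minimal C++ sketch (illustrative names; device and command-list setup elided) of what the distinction looks like at the API level: "async compute" in D3D12 is simply work submitted on a separate compute queue that the GPU may overlap with the graphics queue, and whether anything actually runs concurrently is up to the hardware and driver:

    #include <d3d12.h>

    void SubmitOverlapped(ID3D12CommandQueue* gfxQueue,
                          ID3D12CommandQueue* computeQueue,
                          ID3D12CommandList* independentGfx,
                          ID3D12CommandList* asyncCompute,
                          ID3D12CommandList* dependentGfx,
                          ID3D12Fence* fence, UINT64 fenceValue)
    {
        // These two submissions have no ordering between them; the GPU
        // may (or may not) run them concurrently.
        gfxQueue->ExecuteCommandLists(1, &independentGfx);
        computeQueue->ExecuteCommandLists(1, &asyncCompute);

        // GPU-side sync: the graphics queue stalls only at the point
        // where it actually consumes the compute results.
        computeQueue->Signal(fence, fenceValue);
        gfxQueue->Wait(fence, fenceValue);
        gfxQueue->ExecuteCommandLists(1, &dependentGfx);
    }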
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    That's probably the number of work items. e.g. 10 kernel invocations, each with 7 million work items.
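    For scale, a back-of-the-envelope sketch in code comments (resolution and group size assumed, not taken from the benchmark):

    // One full-screen compute pass at Time Spy's default 2560x1440 with
    // an 8x8 thread group:
    //   groups      = (2560/8) * (1440/8) = 320 * 180 = 57,600
    //   invocations = 57,600 * 64         = 3,686,400  (~3.7M)
    // commandList->Dispatch(2560 / 8, 1440 / 8, 1);
    // So roughly 19 such passes (or fewer, larger dispatches) reach
    // ~70M invocations per frame without anything exotic going on.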
     
    Razor1 likes this.
  17. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    The queues aren't where the problem is. All DX12 hardware must expose graphics, compute, and copy queues, so that isn't something they can do without; it has to be there.

    These are two different things. At the cluster/block level, all DX12 graphics cards have no problem with load balancing.

    At the SM level prior to Pascal, once load balancing was done initially (the partitioning of the SMs), it could not change after that. So if the proportion of compute to graphics in an application changes, that is disastrous on pre-Pascal architectures on nV's side, because you end up with underutilized units; it would actually be better to do things sequentially. This will happen in a game environment, which is why we see performance deficits when Maxwell 2 is forced to do async compute. The only way for pre-Pascal architectures on nV's side to change the partitioning is to stop everything and repartition, pretty much a context switch on all kernels and queues, which of course is probably even worse than leaving the partition alone lol.
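    As a footnote to "it has to be there": a minimal C++ sketch (assuming an existing ID3D12Device* device; error handling elided) of the three queue types every D3D12 device must accept:

    #include <d3d12.h>

    ID3D12CommandQueue* MakeQueue(ID3D12Device* device, D3D12_COMMAND_LIST_TYPE type)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type  = type;   // DIRECT (graphics), COMPUTE, or COPY
        desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
        ID3D12CommandQueue* queue = nullptr;
        device->CreateCommandQueue(&desc, __uuidof(ID3D12CommandQueue),
                                   reinterpret_cast<void**>(&queue));
        return queue;
    }
    // MakeQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);   // graphics
    // MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);  // async compute lives here
    // MakeQueue(device, D3D12_COMMAND_LIST_TYPE_COPY);     // DMA/copy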
     
    #257 Razor1, Jul 16, 2016
    Last edited: Jul 16, 2016
  18. Genotypical

    Newcomer

    Joined:
    Sep 25, 2015
    Messages:
    38
    Likes Received:
    11
    I think that lines up with what I was saying. The queues exist; the question is whether things are being done on both queues (compute and graphics) at the same time. That is what I understand to be async compute (though it seems to go beyond shaders).

    Pascal's load balancing looks like better use of the graphics queue to do compute and graphics: more efficient concurrent execution.

    Some info from someone at Futuremark was posted on the AnandTech forums.

    http://forums.anandtech.com/search.php?searchid=2805545
     
  19. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,759
    From one of the Futuremark developers on the Steam forums (disclaimer: the thread is a hot mess, so click at your own risk... seriously, do not click).

    http://steamcommunity.com/app/223850/discussions/0/366298942110944664/

    Replying to claims that Time Spy doesn't support "real" Async Compute (whatever the hell that means... meh).
    Regarding the bolded part... isn't the 7970 actually getting some rather substantial gains in Doom with Vulkan? I'm guessing that it's mainly due to lower CPU overhead and Shader Intrinsics. Does anyone have Async on/off benches on a 7970?

    Same FM dev at AnandTech forums

    http://forums.anandtech.com/showpost.php?p=38362082&postcount=30

    http://forums.anandtech.com/showpost.php?p=38362194&postcount=46
     
    #259 Ike Turner, Jul 17, 2016
    Last edited: Jul 17, 2016
    Jawed and Silent_Buddha like this.
  20. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    502
    Likes Received:
    135
    So, a simple question. Why is it that when Nvidia disables 'async compute' at the driver level on anything older than Pascal (to prevent performance tanking, obviously), the Time Spy score is valid, but when you disable 'async compute' in the Time Spy benchmark itself, it invalidates your score?
     