Vulkan/OpenGL Next Generation Initiative: unified API for mobile and non-mobile devices.

Discussion in 'Rendering Technology and APIs' started by ToTTenTranz, Aug 11, 2014.

  1. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    642
    Likes Received:
    485
    Location:
    55°38′33″ N, 37°28′37″ E
    Small text here since it's mostly off-topic.

    I'd guess fog table emulation in the DDI9 part of the driver has zero effect on DDI10 functionality used for Direct3D10/11 games.

    Currently none, mostly because current mobile graphics parts only support DDI9 and feature level 9_3.

    I'm talking about a possible future update that takes advantage of modern hardware.

    Compatibility paths will remain. Microsoft got rid of the XPDM path only in Windows 8.0 - you could still use XP drivers in Vista/7 (and lose Aero/DWM), and Windows 8.x/10 still runs with WDDM 1.0 and feature level 9_3 graphics (though such systems lose Aero in the process).

    Microsoft does have an 11on12 layer in Windows 10 which runs on top of either the Direct3D 12 runtime or the DDI12 in the WDDM 2.0 driver (the docs are not clear on this yet).
    https://msdn.microsoft.com/en-us/library/windows/desktop/dn913195(v=vs.85).aspx

    They could use this layer to emulate Direct3D 11 and below on WDDM 2.x hardware at some point.

    No, doomed if you 1) take a half-assed file versioning implementation from the awful Windows ME, 2) apply it to a completely different NT kernel operating system so it can boot from FAT32 volumes, 3) build the whole Windows Update system on top of it, 4) let the user uninstall every update at any time, resulting in unpredictable system state, 5) port it to NTFS using crazy stuff like hard links, which makes file size reporting go nuts, and 6) despite the inherent instability of the store, provide no repair or maintenance tools.

    Only recently has Microsoft taken some steps in the right direction: releasing "rollup" update packages that can't be uninstalled and are required for future updates, releasing ISO images of the updated installation media to the general public, and slowly moving to technologies like WIMBoot, which make far more sense.


    The clever part of their solution was to let the bug reign if the application requested it. The not-so-clever part was building WinSxS into Windows 2000, which was unnecessary as it already had a perfectly fine file security system.


    OS X 10.3 worked like a charm on our Power Mac G4s and G5s. We never had a virus infection or any problem with the system software.


    The Vista reset happened between late 2003 and mid-2004, at the same time they were preparing SP2 for Windows XP. Whatever their focus was before that, it failed, as witnessed by multiple accounts, including one from Jim Allchin.

    And the fact that lead developers have to perform support duties has absolutely nothing to do with the quality of MSDN documentation or the design of the API...
     
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Old features are cheap to emulate on the hardware, but emulation complicates a performance-optimized API implementation a lot. Many seemingly simple DX9/DX10/DX11 state changes require small shader modifications (adding some instructions to emulate the state, as the fixed-function hardware no longer exists), meaning that the driver needs to create a new version of the shader on the fly and recompile/link it. This costs a lot of CPU cycles and complicates resource lifetime tracking.
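    To make that concrete, here's a minimal sketch (not actual driver code) of the kind of patch a driver has to splice into a D3D9 pixel shader when alpha test is enabled, since DX10+ hardware has no fixed-function alpha test anymore; alphaRef is a hypothetical driver-managed constant invented for the example:

        float alphaRef; // hypothetical constant: D3DRS_ALPHAREF / 255.0, set by the driver

        float4 psMain(float4 color : COLOR0) : SV_Target
        {
            // ... original shader body computing 'color' ...
            if (color.a <= alphaRef)   // appended: emulates ALPHAFUNC = GREATER
                discard;
            return color;
        }

    Every distinct alpha-test state then needs its own patched shader variant, which is exactly where the recompile/link cost comes from.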
    I tend to call "emulation" only those instruction sequences that used to be done by fixed-function hardware in earlier GPU generations and have clear high-level support in the API. Sampling cube maps is a good example of this. There used to be a 1:1 mapping between the HLSL cubemap sampling instruction and the hardware sampler implementation. Now the GPU microcode compiler emits lots of ALU instructions into the shader to calculate the cube face and the UV inside that face. Only the filtering is done by fixed-function hardware nowadays.
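    For reference, the face selection math those ALU instructions implement looks roughly like this (a sketch of the standard cube map addressing rules, not any particular vendor's actual microcode):

        // Pick the major axis of direction r, then derive the face index and
        // the UV within that face (faces ordered +X, -X, +Y, -Y, +Z, -Z).
        void cubeFaceUV(float3 r, out uint face, out float2 uv)
        {
            float3 a = abs(r);
            float sc, tc, ma;
            if (a.x >= a.y && a.x >= a.z)            // X major
            {
                face = (r.x >= 0) ? 0 : 1;
                sc = (r.x >= 0) ? -r.z : r.z;
                tc = -r.y;
                ma = a.x;
            }
            else if (a.y >= a.z)                     // Y major
            {
                face = (r.y >= 0) ? 2 : 3;
                sc = r.x;
                tc = (r.y >= 0) ? r.z : -r.z;
                ma = a.y;
            }
            else                                     // Z major
            {
                face = (r.z >= 0) ? 4 : 5;
                sc = (r.z >= 0) ? r.x : -r.x;
                tc = -r.y;
                ma = a.z;
            }
            uv = 0.5f * (float2(sc, tc) / ma + 1.0f); // normalized UV on the face
        }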
    The original intention was a 1:1 mapping; at least it was clearly communicated to developers that way. DirectX 8 and DirectX 9 (SM 2.0) shaders had tight instruction count limits, and those limits were given in the form of DX asm instruction counts. I have hand-optimized shaders (written in DX asm) to meet the 64-instruction limit of SM 2.0. The limit was exactly the same for all SM 2.0 GPUs. In DX8, you had separate limits for texturing instructions and ALU instructions, and also dependent texture lookup limits. Emulating features by adding extra instructions was not possible without problems (unless the GPU internally supported much higher instruction count shaders).
    As a console developer, I have been looking at shader microcode for a long time (on various architectures). I suggest you try AMD's new PC shader optimization tools. These tools include great microcode viewer functionality: you can simultaneously see the microcode (and GPR/SGPR counts, etc.) of all the currently sold DX11+ AMD GPUs as you edit your shaders.
    Virtual memory is hard to emulate efficiently without breaking any safety rules. The GPU cannot be allowed to break system security. This rules out emulating tiled resources (because tiled resources depend on page tables).

    UAVs cannot be emulated if a GPU thread is only able to export data to a single predetermined location in the ROP cache (and there's no other memory output possibility). This was fine for DX9 GPUs, because no API provided any way to write data to random memory locations: the pixel shader always wrote a predetermined number of bits to the same location in the ROP cache, and the vertex shader had no memory write support. Xbox 360 memexport provided random access output to programmable memory addresses. That GPU however didn't support any inter-thread synchronization or memory atomics, so it wouldn't have been able to "emulate" UAVs.

    We can continue the talk about emulating DirectX 12 features when the SDK is publicly available.
     
  3. elect

    Newcomer

    Joined:
    Mar 21, 2012
    Messages:
    50
    Likes Received:
    5
    Then maybe it is time to force devs to agree on one standard
     
  4. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    It's "standardized", or at least specified by DX (and OGL as well, I bet). It's just that some pieces of HW got it wrong and people working with it haven't noticed. It's not dev's job to validate hardware.
     
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    OGL is generally quite a bit more lenient, though +0.0 == -0.0 should be true there too, at least nowadays (it's the standard IEEE 754 float comparison rule). Most of the DX float rules are the same as in IEEE 754-2008. Sometimes there are ambiguities (like the result of max(+0.0, -0.0)), but those should be pretty rare. I don't know how any HW could miss basic signed zero equality, though, as I'm pretty sure the WHCK tests would catch it (but maybe you can get an exemption and still get the driver validated).
    OGL, though, for a long time did not even require any NaN support, and still today the GLSL spec says "NaNs are not required to be generated". GLSL 4.1 also had little gems like "in general, correct signedness of 0 is not required", though that is now gone. Also, D3D10 explicitly forbids denorms being generated; they must be flushed to zero (with some "maybe" exceptions involving data movement). GLSL instead just says they "can be flushed to zero".
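    A quick way to probe these edge cases is a tiny compute shader along these lines (a sketch with made-up buffer bindings; feeding the zeros in through a buffer keeps the compiler from constant-folding the comparisons away):

        StructuredBuffer<float>  gIn  : register(t0); // gIn[0] = +0.0f, gIn[1] = -0.0f
        RWStructuredBuffer<uint> gOut : register(u0);

        [numthreads(1, 1, 1)]
        void main()
        {
            float pz = gIn[0], nz = gIn[1];
            gOut[0] = (pz == nz) ? 1u : 0u;    // must be 1 per the IEEE 754 comparison rule
            gOut[1] = asuint(max(pz, nz));     // the sign here is the ambiguous case
            gOut[2] = asuint(sqrt(nz - 1.0f)); // sqrt(-1): NaN generation (optional in GLSL)
        }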
     
    elect and Simon F like this.
  6. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    Manufacturers do get WHQL waivers. Some just a few, some a huge pile of them. And I wouldn't be surprised to see some drivers detecting that the test is being performed and patching specifically for certification. There were some cheats like that detected several years back, AFAIR.
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Well, I don't know exactly how that is handled. IMHO, if your HW doesn't quite get seamless cube corner filtering right (which is hugely complex), for instance, a waiver for that would be quite OK. But something as basic as float comparisons shouldn't get a waiver, because someone might really rely on that being correct.
     
  8. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    It's business. If MS has a financial incentive to let even something egregious slide, they will.
     
    Lightman likes this.
  9. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Any word on extensibility? Is it going to be like current OpenGL, or more like DirectX with versions and feature levels?
     
  10. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    8,451
    Likes Received:
    570
    Location:
    WI, USA
    Heh, well there goes all of that talk about WHQL providing happy warm fuzzy feelings of safety and trust. Grall, where are you?
     
  11. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    I don't know how strict the compliance tests are, but I have at least noticed some differences between GPUs regarding texture sampling.

    Texel UV rounding differs between GPUs in point sampling. Example: you do floor(uv * textureDim) to calculate the unnormalized texel coordinate and compare the result to the point sampled texel. Intel and NVIDIA cards return the correct result, and so do older VLIW Radeons. New GCN Radeons, however, return a result that is 1/512 of a texel off. Moreover, this is 1/512 of a physical texel (after mip calculation), so you need to add exp2(mip) / 512.0f to the UV to make the point filtering result comparable to floor (and frac). This is annoying, as it means you have to create a separate shader for these GPUs.
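    In code, the comparison in question looks roughly like this (a sketch; textureDim, mip and the GCN_WORKAROUND toggle are names invented for the example, and the offset is expressed here in normalized UV units):

        Texture2D<float4> tex          : register(t0);
        SamplerState      pointSampler : register(s0); // point (nearest) filtering

        float4 pointSampleChecked(float2 uv, float2 textureDim, float mip)
        {
        #ifdef GCN_WORKAROUND
            // Nudge by 1/512 of a physical texel, i.e. exp2(mip) base-level
            // texels, so point sampling matches floor()/frac() again.
            uv += exp2(mip) / (512.0f * textureDim);
        #endif
            float2 texel = floor(uv * textureDim);   // unnormalized texel coordinate
            // ... compare the sample below against a Load() at 'texel' ...
            return tex.SampleLevel(pointSampler, uv, mip);
        }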

    There are also precision/rounding differences when sampling normalized integer textures and scaling the result back to integers (int value = int(sample(uv) * 65535.0f)). Some GPUs/compilers produce the correct value for every possible input, but some do not. This is one reason why I advocate adding unnormalized integer sampling to (PC) DirectX.
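    A sketch of the difference (the two texture views of the same data are an assumption for illustration, e.g. R16_UNORM and R16_UINT views of one R16_TYPELESS resource):

        Texture2D<float> unormTex     : register(t0); // R16_UNORM view
        Texture2D<uint>  uintTex      : register(t1); // R16_UINT view of the same data
        SamplerState     pointSampler : register(s0);

        uint loadViaUnorm(float2 uv)
        {
            // Fragile: N/65535 is not exactly representable in float, and the
            // cast truncates toward zero, so some GPUs/compilers come out one off.
            return uint(unormTex.Sample(pointSampler, uv) * 65535.0f);
        }

        uint loadViaUint(int2 texel)
        {
            // Unnormalized integer read: no float round trip, exact by definition.
            return uintTex.Load(int3(texel, 0));
        }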
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,142
    Likes Received:
    1,830
    Location:
    Finland
    If it goes the Mantle way, it will at least support extensions
     
  13. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
    Well, it's not as bad as NVIDIA's driver not being able to calculate the mips for an R16G16_SNORM texture. At all.
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    It is also always possible that the tests really are strict but just don't test something.

    There actually was a bug filed against the open-source radeon driver at some point, though it went the other way round - thinking that the VLIW Radeons had some 1/512 offset in the round-up direction.
    https://bugs.freedesktop.org/show_bug.cgi?id=25871

    edit: And while we're at it, there's also some precision issue with barycentric interpolation (both gcn and evergreen):
    https://bugs.freedesktop.org/show_bug.cgi?id=89012

    Though honestly, I think this stuff is way more subtle than just something bogus with float math, where it is very obvious what the results need to be.


    Interesting. You'd think this really should be accurate.
     
    #134 mczak, Apr 25, 2015
    Last edited: Apr 25, 2015
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    That's utterly ridiculous.

    The compute APIs don't have this problem though, do they? I've never used normalised texture coordinates in compute.
     
  16. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    This seems like a fairly standard floating-point issue, quite similar to classical floating-point annoyances like "why does 6 times 0.1 not equal 0.6?". Values of the form N/65535 are generally not exactly representable in floating point, which means that every time you sample such a value, you get not the exact value but something that has been rounded slightly up or down. If it has been rounded down, multiplying by 65535 will result in a value very slightly less than N, which is then rounded down to N-1 when you convert to integer (in C-like languages, including HLSL and GLSL, conversion to integer always uses round-to-zero for some reason). Possible workarounds include:
    • Add 0.5 before doing the conversion to integer: (int value = int(sample(uv) * 65535.0f + 0.5f)).
    • Multiply by a slightly larger number before conversion: (int value = int(sample(uv) * 65535.75f)).
    Without these sorts of workarounds, I would expect "correct value for each possible input" only if the GPU vendor in question has made a specific effort to achieve that kind of result (e.g. rounding away from zero at specific points in the texture sampling process).
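    Wrapped up as a helper, the first workaround might look like this (unorm16ToUint is a name made up for the sketch):

        uint unorm16ToUint(float v)
        {
            // +0.5 re-centers the truncating cast, so a decoded value that was
            // rounded slightly below N still converts to N rather than N-1.
            return uint(v * 65535.0f + 0.5f);
        }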
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Ah right, I totally forgot about the round to zero - I was just thinking there should be more than enough mantissa bits to get an accurate result.
    I guess doing a round before the conversion to int should also do the trick. The compiler should recognize at least the +0.5f add or the round before int conversion and use an appropriate float->int conversion with rounding instead, if the chip can do this (for example, Southern Islands has a float-to-int conversion instruction with rounding). Though the tie-breaker in this case is different from ordinary round (toward positive infinity), so I don't know if the compiler would actually do it (of course, for numbers coming from unorm format conversion it should not matter, but I don't know if they'd want to recompile the shader based on bound formats). If such analysis is done, the driver could also just recognize this and replace the math entirely with a plain UINT sample (which might actually be how some drivers get the correct result with the original formula in the first place, since based on pure float math it should indeed be incorrect).
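    The round-before-convert variant would be something like this (a sketch; as noted above the tie-breaking differs, though exact ties cannot arise from unorm-decoded inputs anyway):

        uint unorm16ToUintRound(float v)
        {
            // A compiler may map this to a single float-to-int conversion with
            // rounding if the hardware has one.
            return uint(round(v * 65535.0f));
        }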
     
  18. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    WHQL tests are awesome. It's waiver transparency that's missing.
     
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Exactly. Normalized integers are pretty much the worst case for precision. A 1/65536 (1/2^16) scaling would be bit-precise (for every integer value below 24 bits). Unfortunately the rules say that 65535 maps to 1.0 (not 65536), so pretty much none of the values can be represented exactly with 32-bit floats. (*)

    And that is why I find it odd that some hardware produces exact values - as if the shader compiler optimized away the scaling and conversion (replacing them with a direct unnormalized read). I have noticed similar things with conversions. We had a 32-bit packing function (using all the bits), but the programmer wrote a bug and returned the bit-packed integer as a float. That value was assigned to an integer register later, before being written out. The compiler optimized away the int->float->int conversions completely, hiding the bug.

    (*) We also had a fun numerical issue regarding terrain rendering. The LOD scheme reduced the detail to 1/3 after every iteration. Floats represent values of (1/2)^n precisely; (1/3)^n is never precise. This caused seams that were impossible to fix. Lesson learned: use powers of two. Unfortunately normalized integers do not use powers of two for scaling, and there is nothing we can do to fix that.
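    A sketch of the contrast (treating the LOD level as an integer is an assumption for the example):

        float lodScalePow2(uint lod)
        {
            return exp2(-(float)lod);            // (1/2)^n: exactly representable
        }

        float lodScaleThirds(uint lod)
        {
            // (1/3)^n is never exact, so neighboring LOD levels computed this
            // way disagree slightly - hence the seams.
            return pow(1.0f / 3.0f, (float)lod);
        }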
     
  20. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,372
    Likes Received:
    239
    Location:
    NY
    http://www.anandtech.com/show/9509/...ill-use-feature-sets-android-support-incoming

    Good (for Khronos) that they convinced Google to use Vulkan. Bad idea for Google imo.

    So they replaced mobile vs. desktop with platform vs. platform. Only Khronos could come up with a system that's worse than the status quo. Simply amazing how incompetent these guys are. Perhaps in practice things will line up closely, but "portability" has just become worse in Vulkan imo.
     