Vulkan/OpenGL Next Generation Initiative: unified API for mobile and non-mobile devices.

Discussion in 'Rendering Technology and APIs' started by ToTTenTranz, Aug 11, 2014.

  1. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    642
    Likes Received:
    485
    Location:
    55°38′33″ N, 37°28′37″ E
    Small text here since it's mostly off-topic.

    I'd guess fog table emulation in the DDI9 part of the driver has zero effect on DDI10 functionality used for Direct3D10/11 games.

    Currently none, mostly because current mobile graphics parts only support DDI9 and feature level 9_3.

    I'm talking about a possible future update that takes advantage of modern hardware.

    Compatibility paths will remain. Microsoft got rid of the XPDM path only in Windows 8.0 - you could still use XP drivers in Vista/7 (and lose Aero/DWM), and Windows 8.x/10 still runs with WDDM 1.0 and feature level 9_3 graphics (though such systems lose Aero in the process).

    Microsoft does have an 11on12 layer in Windows 10 which runs on top of either the Direct3D 12 runtime or the DDI12 in the WDDM 2.0 driver (the docs are not clear on this yet).
    https://msdn.microsoft.com/en-us/library/windows/desktop/dn913195(v=vs.85).aspx

    They could use this layer to emulate Direct3D 11 and below on WDDM 2.x hardware at some point.

    No, doomed if you 1) take a half-assed file versioning implementation from the awful Windows ME, 2) apply it to a completely different NT kernel operating system so it can boot from FAT32 volumes, 3) build the whole Windows Update system on top of it, 4) let the user uninstall every update at any time, resulting in unpredictable system state, 5) port it to NTFS using crazy stuff like hard links, which makes file size reporting go nuts, and 6) despite the inherent instability of the store, provide no repair or maintenance tools.

    Only recently has Microsoft taken some steps in the right direction: releasing "rollup" update packages that can't be uninstalled and are required for future updates, releasing ISO images of the updated installation media to the general public, and slowly moving to technologies like WIMBoot, which make far more sense.


    The clever part of their solution was to let the bug reign if the application requested it. The not-so-clever part was building WinSxS into Windows 2000, which was unnecessary as it already had a perfectly fine file security system.


    OS X 10.3 worked like a charm on our Power Mac G4s and G5s. We never had a virus infection or any problem with the system software.


    The Vista reset happened between late 2003 and mid-2004, at the same time they were preparing SP2 for Windows XP. Whatever their focus was before that, it failed, as witnessed by multiple accounts, including one from Jim Allchin.

    And the fact that lead developers have to perform support duties has absolutely nothing to do with the quality of MSDN documentation or the design of the API...
     
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Old features are cheap to emulate on the hardware, but emulation complicates a performance-optimized API implementation a lot. Many seemingly simple DX9/DX10/DX11 state changes require small shader modifications (adding some instructions to emulate the state, as the fixed-function hardware no longer exists), meaning that the driver needs to create a new version of the shader on the fly and recompile/link it. This costs a lot of CPU cycles and complicates resource lifetime tracking.
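    To make that concrete, here's a minimal sketch (not actual driver code) of the kind of patch a driver has to splice into a D3D9 pixel shader when alpha test is enabled, since DX10+ hardware has no fixed-function alpha test anymore; alphaRef is a hypothetical driver-managed constant invented for the example:

        float alphaRef; // hypothetical constant: D3DRS_ALPHAREF / 255.0, set by the driver

        float4 psMain(float4 color : COLOR0) : SV_Target
        {
            // ... original shader body computing 'color' ...
            if (color.a <= alphaRef)   // appended: emulates ALPHAFUNC = GREATER
                discard;
            return color;
        }

    Every distinct alpha-test state then needs its own patched shader variant, which is exactly where the recompile/link cost comes from.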
    I tend to call "emulation" only those instruction sequences that used to be done by fixed-function hardware in earlier GPU generations and have clear high-level support in the API. Sampling cube maps is a good example of this. There used to be a 1:1 mapping between the HLSL cubemap sampling instruction and the hardware sampler implementation. Now the GPU microcode compiler emits lots of ALU instructions into the shader to calculate the cube face and the UV inside that face. Only the filtering is done by fixed-function hardware nowadays.
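    For reference, the face selection math those ALU instructions implement looks roughly like this (a sketch of the standard cube map addressing rules, not any particular vendor's actual microcode):

        // Pick the major axis of direction r, then derive the face index and
        // the UV within that face (faces ordered +X, -X, +Y, -Y, +Z, -Z).
        void cubeFaceUV(float3 r, out uint face, out float2 uv)
        {
            float3 a = abs(r);
            float sc, tc, ma;
            if (a.x >= a.y && a.x >= a.z)            // X major
            {
                face = (r.x >= 0) ? 0 : 1;
                sc = (r.x >= 0) ? -r.z : r.z;
                tc = -r.y;
                ma = a.x;
            }
            else if (a.y >= a.z)                     // Y major
            {
                face = (r.y >= 0) ? 2 : 3;
                sc = r.x;
                tc = (r.y >= 0) ? r.z : -r.z;
                ma = a.y;
            }
            else                                     // Z major
            {
                face = (r.z >= 0) ? 4 : 5;
                sc = (r.z >= 0) ? r.x : -r.x;
                tc = -r.y;
                ma = a.z;
            }
            uv = 0.5f * (float2(sc, tc) / ma + 1.0f); // normalized UV on the face
        }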
    The original intention was a 1:1 mapping; at least it was clearly communicated to developers that way. DirectX 8 and DirectX 9 (SM 2.0) shaders had tight instruction count limits, and those limits were given in the form of DX asm instruction counts. I have hand-optimized shaders (written in DX asm) to meet the 64-instruction limit of SM 2.0. The limit was exactly the same for all SM 2.0 GPUs. In DX8, you had separate limits for texturing instructions and ALU instructions, and also dependent texture lookup limits. Emulating features by adding extra instructions was not possible without problems (unless the GPU internally supported much higher instruction count shaders).
    As a console developer, I have been looking at shader microcode for a long time (on various architectures). I suggest you try AMD's new PC shader optimization tools. These tools include great microcode viewer functionality: you can simultaneously see the microcode (and GPR/SGPR counts, etc.) of all the currently sold DX11+ AMD GPUs as you edit your shaders.
    Virtual memory is hard to emulate efficiently without breaking any safety rules. The GPU cannot be allowed to break system security. This rules out emulating tiled resources (because tiled resources depend on page tables).

    UAVs cannot be emulated if a GPU thread is only able to export data to a single predetermined location in the ROP cache (and there's no other memory output possibility). This was fine for DX9 GPUs, because no API provided any way to write data to random memory locations: the pixel shader always wrote a predetermined number of bits to the same location in the ROP cache, and the vertex shader had no memory write support. Xbox 360 memexport provided random access output to programmable memory addresses. That GPU however didn't support any inter-thread synchronization or memory atomics, so it wouldn't have been able to "emulate" UAVs.

    We can continue the talk about emulating DirectX 12 features when the SDK is publicly available.
     
  3. elect

    Newcomer

    Joined:
    Mar 21, 2012
    Messages:
    50
    Likes Received:
    5
    Then maybe it is time to force devs to agree on one standard
     
  4. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    It's "standardized", or at least specified by DX (and OGL as well, I bet). It's just that some pieces of HW got it wrong and people working with it haven't noticed. It's not dev's job to validate hardware.
     
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    OGL is generally quite a bit more lenient, though +0.0 == -0.0 should be true there too, at least nowadays (it's the standard IEEE 754 float comparison rule). Most of the DX float rules are the same as in IEEE 754-2008. Sometimes there are ambiguities (like the result of max(+0.0, -0.0)), but those should be pretty rare. I don't know how any HW could miss basic signed zero equality, though, as I'm pretty sure the WHCK tests would catch it (but maybe you can get an exemption and still get the driver validated).
    OGL, though, for a long time did not even require any NaN support, and still today the GLSL spec says "NaNs are not required to be generated". GLSL 4.1 also had little gems like "in general, correct signedness of 0 is not required", though that is now gone. Also, D3D10 explicitly forbids denorms being generated; they must be flushed to zero (with some "maybe" exceptions involving data movement). GLSL instead just says they "can be flushed to zero".
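    A quick way to probe these edge cases is a tiny compute shader along these lines (a sketch with made-up buffer bindings; feeding the zeros in through a buffer keeps the compiler from constant-folding the comparisons away):

        StructuredBuffer<float>  gIn  : register(t0); // gIn[0] = +0.0f, gIn[1] = -0.0f
        RWStructuredBuffer<uint> gOut : register(u0);

        [numthreads(1, 1, 1)]
        void main()
        {
            float pz = gIn[0], nz = gIn[1];
            gOut[0] = (pz == nz) ? 1u : 0u;    // must be 1 per the IEEE 754 comparison rule
            gOut[1] = asuint(max(pz, nz));     // the sign here is the ambiguous case
            gOut[2] = asuint(sqrt(nz - 1.0f)); // sqrt(-1): NaN generation (optional in GLSL)
        }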
     
    elect and Simon F like this.
  6. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    Manufacturers do get WHQL waivers. Some just a few, some a huge pile of them. And I wouldn't be surprised to see some drivers detecting that the test is being performed and patching specifically for certification. There were some cheats like that detected several years back, AFAIR.
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Well, I don't know exactly how that is handled. IMHO, if your HW doesn't quite get seamless cube corner filtering right (which is hugely complex), for instance, a waiver for that would be quite OK. But something as basic as float comparisons shouldn't get a waiver, because someone might really rely on that being correct.
     
  8. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    It's business. If MS has a financial incentive to let even something egregious slide, they will.
     
    Lightman likes this.
  9. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Any word on extensibility? Is it going to be like current OpenGL, or more like DirectX with versions and feature levels?
     
  10. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    8,451
    Likes Received:
    570
    Location:
    WI, USA
    Heh, well there goes all of that talk about WHQL providing happy warm fuzzy feelings of safety and trust. Grall, where are you?
     
  11. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    I don't know how strict the compliance tests are, but I have at least noticed some differences between GPUs regarding texture sampling.

    Texel UV rounding differs between GPUs in point sampling. Example: you do floor(uv * textureDim) to calculate the unnormalized texel coordinate and compare the result to the point sampled texel. Intel and NVIDIA cards return the correct result, and so do older VLIW Radeons. New GCN Radeons, however, return a result that is 1/512 of a texel off. Moreover, this is 1/512 of a physical texel (after mip calculation), so you need to add exp2(mip) / 512.0f to the UV to make the point filtering result comparable to floor (and frac). This is annoying, as it means you have to create a separate shader for these GPUs.
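    In code, the comparison in question looks roughly like this (a sketch; textureDim, mip and the GCN_WORKAROUND toggle are names invented for the example, and the offset is expressed here in normalized UV units):

        Texture2D<float4> tex          : register(t0);
        SamplerState      pointSampler : register(s0); // point (nearest) filtering

        float4 pointSampleChecked(float2 uv, float2 textureDim, float mip)
        {
        #ifdef GCN_WORKAROUND
            // Nudge by 1/512 of a physical texel, i.e. exp2(mip) base-level
            // texels, so point sampling matches floor()/frac() again.
            uv += exp2(mip) / (512.0f * textureDim);
        #endif
            float2 texel = floor(uv * textureDim);   // unnormalized texel coordinate
            // ... compare the sample below against a Load() at 'texel' ...
            return tex.SampleLevel(pointSampler, uv, mip);
        }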

    There are also precision/rounding differences when sampling normalized integer textures and scaling the result back to integers (int value = int(sample(uv) * 65535.0f)). Some GPUs/compilers produce the correct value for every possible input, but some do not. This is one reason why I advocate adding unnormalized integer sampling to (PC) DirectX.
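    A sketch of the difference (the two texture views of the same data are an assumption for illustration, e.g. R16_UNORM and R16_UINT views of one R16_TYPELESS resource):

        Texture2D<float> unormTex     : register(t0); // R16_UNORM view
        Texture2D<uint>  uintTex      : register(t1); // R16_UINT view of the same data
        SamplerState     pointSampler : register(s0);

        uint loadViaUnorm(float2 uv)
        {
            // Fragile: N/65535 is not exactly representable in float, and the
            // cast truncates toward zero, so some GPUs/compilers come out one off.
            return uint(unormTex.Sample(pointSampler, uv) * 65535.0f);
        }

        uint loadViaUint(int2 texel)
        {
            // Unnormalized integer read: no float round trip, exact by definition.
            return uintTex.Load(int3(texel, 0));
        }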
     
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,142
    Likes Received:
    1,830
    Location:
    Finland
    If it goes the Mantle way, it will at least support extensions
     
  13. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    856
    Likes Received:
    260
    Well, it's not as bad as NVIDIA's driver not being able to calculate the mips for an R16G16_SNORM texture. At all.
     
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    It is also always possible that the tests really are strict but just don't test something.

    There actually was a bug filed against the open-source radeon driver at some point, though it went the other way round - thinking that the VLIW Radeons had some 1/512 offset in the round-up direction.
    https://bugs.freedesktop.org/show_bug.cgi?id=25871

    edit: And while we're at it, there's also some precision issue with barycentric interpolation (both gcn and evergreen):
    https://bugs.freedesktop.org/show_bug.cgi?id=89012

    Though honestly, I think this stuff is way more subtle than just something bogus with float math, where it is very obvious what the results need to be.


    Interesting. You'd think this really should be accurate.
     
    #134 mczak, Apr 25, 2015
    Last edited: Apr 25, 2015
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    That's utterly ridiculous.

    The compute APIs don't have this problem though, do they? I've never used normalised texture coordinates in compute.
     
  16. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    This seems like a fairly standard floating-point issue, quite similar to classical floating-point annoyances like "why does 6 times 0.1 not equal 0.6?". Values of the form N/65535 are generally not exactly representable in floating point, which means that every time you sample such a value, you get not the exact value but something that has been rounded slightly up or down. If it has been rounded down, multiplying by 65535 will result in a value very slightly less than N, which is then rounded down to N-1 when you convert to integer (in C-like languages, including HLSL and GLSL, conversion to integer always uses round-to-zero for some reason). Possible workarounds include:
    • Add 0.5 before doing the conversion to integer: (int value = int(sample(uv) * 65535.0f + 0.5f)).
    • Multiply by a slightly larger number before conversion: (int value = int(sample(uv) * 65535.75f)).
    Without these sorts of workarounds, I would expect "correct value for each possible input" only if the GPU vendor in question has made a specific effort to achieve that kind of result (e.g. rounding away from zero at specific points in the texture sampling process).
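    Wrapped up as a helper, the first workaround might look like this (unorm16ToUint is a name made up for the sketch):

        uint unorm16ToUint(float v)
        {
            // +0.5 re-centers the truncating cast, so a decoded value that was
            // rounded slightly below N still converts to N rather than N-1.
            return uint(v * 65535.0f + 0.5f);
        }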
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Ah right, I totally forgot about the round to zero - I was just thinking there should be more than enough mantissa bits to get an accurate result.
    I guess doing a round before the conversion to int should also do the trick. The compiler should recognize at least the +0.5f add or the round before int conversion and use an appropriate float->int conversion with rounding instead, if the chip can do this (for example, Southern Islands has a float-to-int conversion instruction with rounding). Though the tie-breaker in this case is different from ordinary round (toward positive infinity), so I don't know if the compiler would actually do it (of course, for numbers coming from unorm format conversion it should not matter, but I don't know if they'd want to recompile the shader based on bound formats). If such analysis is done, the driver could also just recognize this and replace the math entirely with a plain UINT sample (which might actually be how some drivers get the correct result with the original formula in the first place, since based on pure float math it should indeed be incorrect).
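    The round-before-convert variant would be something like this (a sketch; as noted above the tie-breaking differs, though exact ties cannot arise from unorm-decoded inputs anyway):

        uint unorm16ToUintRound(float v)
        {
            // A compiler may map this to a single float-to-int conversion with
            // rounding if the hardware has one.
            return uint(round(v * 65535.0f));
        }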
     
  18. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
    WHQL tests are awesome. It's waiver transparency that's missing.
     
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Exactly. Normalized integers are pretty much the worst case for precision. A 1/65536 (1/2^16) scaling would be bit-precise (for every integer value below 24 bits). Unfortunately the rules say that 65535 maps to 1.0 (not 65536), so pretty much none of the values can be represented exactly with 32-bit floats. (*)

    And that is why I find it odd that some hardware produces exact values - as if the shader compiler optimized away the scaling and conversion (replacing them with a direct unnormalized read). I have noticed similar things with conversions. We had a 32-bit packing function (using all the bits), but the programmer wrote a bug and returned the bit-packed integer as a float. That value was assigned to an integer register later, before being written out. The compiler optimized away the int->float->int conversions completely, hiding the bug.

    (*) We also had a fun numerical issue regarding terrain rendering. The LOD scheme reduced the detail to 1/3 after every iteration. Floats represent values of (1/2)^n precisely; (1/3)^n is never precise. This caused seams that were impossible to fix. Lesson learned: use powers of two. Unfortunately normalized integers do not use powers of two for scaling, and there is nothing we can do to fix that.
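    A sketch of the contrast (treating the LOD level as an integer is an assumption for the example):

        float lodScalePow2(uint lod)
        {
            return exp2(-(float)lod);            // (1/2)^n: exactly representable
        }

        float lodScaleThirds(uint lod)
        {
            // (1/3)^n is never exact, so neighboring LOD levels computed this
            // way disagree slightly - hence the seams.
            return pow(1.0f / 3.0f, (float)lod);
        }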
     
  20. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,372
    Likes Received:
    239
    Location:
    NY
    http://www.anandtech.com/show/9509/...ill-use-feature-sets-android-support-incoming

    Good (for Khronos) that they convinced Google to use Vulkan. Bad idea for Google imo.

    So they replaced mobile vs. desktop with platform vs. platform. Only Khronos could come up with a system that's worse than the status quo. Simply amazing how incompetent these guys are. Perhaps in practice things will line up closely, but "portability" has just become worse in Vulkan imo.
     