Nvidia should have done it since G80 (the interpolation is done in the SFU; the whitepaper says "Plane equation unit generates plane equation fp32 coefficients to represent all triangle attributes"). Intel has always done interpolation in the shader (since Gen4, i965, with the help of a PLN instruction), but they only switched to barycentrics with Gen6 (Sandy Bridge). DX9 hardware (including last-gen consoles) already supported centroid interpolation. You could actually become interpolation (fixed-function hardware) bound if you had too many interpolants (or used VPOS).
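To make the plane-equation approach concrete, here is a minimal sketch (with made-up vertex data) of what triangle setup computes: per-attribute coefficients (a, b, c) such that attr(x, y) = a*x + b*y + c matches the attribute at each vertex, after which interpolation is one multiply-add per pixel.

```python
# Sketch of fixed-function plane-equation interpolation. Triangle setup
# solves for (a, b, c) so that attr(x, y) = a*x + b*y + c holds at all
# three vertices. Vertex data below is made up for illustration.

def plane_coefficients(v0, v1, v2):
    """Each v is (x, y, attr). Solve the 3x3 system via Cramer's rule."""
    (x0, y0, a0), (x1, y1, a1), (x2, y2, a2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    a = ((a1 - a0) * (y2 - y0) - (a2 - a0) * (y1 - y0)) / det
    b = ((a2 - a0) * (x1 - x0) - (a1 - a0) * (x2 - x0)) / det
    c = a0 - a * x0 - b * y0
    return a, b, c

# Evaluating the plane at a pixel is then a MAD-style step per attribute.
a, b, c = plane_coefficients((0, 0, 1.0), (4, 0, 0.0), (0, 4, 0.0))
attr = a * 2 + b * 1 + c  # attribute at pixel (2, 1)
```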
If all modern PC/mobile GPUs do the interpolation in the pixel shader using barycentrics (like GCN does), it would mean that a cross-vendor SV_Barycentric pixel shader input semantic would be possible. This would be awesome, as it would allow analytical AA among other goodies on PC (assuming DX12 and/or Vulkan expose it).
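For reference, the GCN-style shader-side scheme amounts to the rasterizer handing the pixel shader per-pixel barycentrics (i, j), with the shader doing two multiply-adds per attribute (v_interp_p1/v_interp_p2 on GCN). A toy sketch, with made-up values and perspective correction omitted, including the kind of edge-distance trick that explicit barycentrics enable for analytical AA:

```python
# Shader-side barycentric interpolation as on GCN: given per-pixel
# barycentrics (i, j), each attribute is two MADs:
#   attr = p0 + i * (p1 - p0) + j * (p2 - p0)
# Values are made up; perspective correction is omitted for brevity.

def interp(i, j, p0, p1, p2):
    return p0 + i * (p1 - p0) + j * (p2 - p0)

# With the barycentrics available as ordinary shader inputs, a pixel
# shader can e.g. measure how close it is to a triangle edge, which is
# the basis of analytical AA tricks:
def edge_distance_factor(i, j):
    k = 1.0 - i - j          # third barycentric coordinate
    return min(i, j, k)      # 0.0 on an edge, maximal at the centroid

color = interp(0.25, 0.25, 1.0, 0.0, 0.0)  # -> 0.5
```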
Mobile could be different, though. From the freedreno sources, it seems Adreno (since the 300 series) also does some kind of barycentric interpolation, but from a quick glance the barycentric coordinates may not be available as ordinary registers (the driver issues OPC_BARY_F instructions). Not sure, though; it was a very quick glance, and no idea about the others...
I don't have any idea. Intel hasn't published throughput numbers for the extended math functions ever since these stopped being a true external shared unit (the dreaded MathBox), so not since Sandy Bridge. The last numbers are thus for Ironlake: the docs say a throughput of 3 rounds per element for quotient and 4 for remainder. Note this is for a _scalar_ element, so dead slow (and at least i965 only had one MathBox for the whole GPU, IIRC a very frequent bottleneck). For comparison, sqrt was also quoted at 3 rounds per element, with things like sin/cos and pow much slower still. (The docs actually say one round has a 22-clock-cycle latency; I don't know if I should believe what I think this really means, as it would be really awful...) In any case it's probably best not to extrapolate anything from these values to more modern chips; it wouldn't be surprising imho if some operations now have more or less throughput relative to others.

Not surprising that Intel is leading the pack. Do you know how fast their integer divider is (2 bits per cycle = 16 cycles total?)? That would be 3x+ faster than emulation. Still, I wouldn't use integer divides unless there's a very good reason.
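To illustrate where the "2 bits per cycle = 16 cycles" guess comes from: a radix-4 divider retires 2 quotient bits per step, so a 32-bit unsigned divide takes 16 steps. A toy software model of that idea (purely illustrative; this is not Intel's actual circuit, and real hardware picks the quotient digit from a lookup table rather than a division):

```python
# Illustrative model of a radix-4 divider: 2 quotient bits per
# iteration, hence 16 iterations for a 32-bit unsigned divide.

def div_radix4(n, d, bits=32):
    assert d != 0
    q, r, steps = 0, 0, 0
    for shift in range(bits - 2, -2, -2):    # consume 2 bits per step
        r = (r << 2) | ((n >> shift) & 0b11)  # shift in next dividend bits
        digit = r // d        # quotient digit 0..3 (hardware: table lookup)
        r -= digit * d
        q = (q << 2) | digit
        steps += 1
    return q, r, steps

q, r, steps = div_radix4(1000003, 17)  # steps == 16
```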