We really won't know until they post their findings, will we? That's assuming there is a problem in the first place. Expect a 10% bump at most in Pascal performance once they enable and fine-tune async compute, and a 0% bump on anything older.
It would be nice if people would stop conflating async compute with instruction scheduling. How many times does it have to be explained that they are not related?! If you think they are, you still don't get what async compute is.
So I am making a technical argument and you reply with that? Please put me on your ignore list so you can save time and effort.
It's quite likely that people are just using the same terms that the developers themselves are using. In the specific case of Doom and Vulkan, the developers are stressing Async Compute and Shader Intrinsics.
If you want people to stop using the terms as they are using them, you'll need to get the developers to stop using the terms in those ways, especially when interviewed by gaming/tech sites.
Regards,
SB
Not sure why you took it personally. In your technical argument you said you're waiting for someone to start claiming "but GCN gains more" and then saying it's just exposing inefficiencies in the GCN uarch. That was a prevalent argument before Pascal was released: Maxwell being more efficient, Nvidia not needing async compute.
Now that Pascal actually shows significant gains with async compute, I wonder how much backtracking we will see here.
Well, let's imagine you have a scene containing 8.75 million vertices that draws 93 million pixels. Now let's imagine you have a GPU that contains some special processors that can only process vertices; let's call them vertex shaders. This GPU also contains some special processors that can only process pixels; let's call them pixel shaders. Say that such a GPU runs at 350MHz and has 6 vertex processors and 16 pixel processors. Said GPU will be able to render the scene at 60 frames a second. Now let's imagine another GPU which still runs at 350MHz and contains 24 general-purpose processors that can process either vertices or pixels. How much faster will such a GPU be on the above-mentioned scene? And what if the scene changes and now contains 17.5 million vertices?

I would not call the gains of NVidia significant. They just show that NV does gain a bit from the new APIs and that there is no performance penalty for NV hardware, but it is still a long way off from the jump in performance that GCN is showing.
I would agree that the gains are mostly a sign of not so good utilisation of processing power with AMD in the old APIs and not so much a sign of superiority in the new APIs.
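For what it's worth, here is a quick back-of-the-envelope version of the unified-shader thought experiment above. The throughput assumption (one vertex or pixel per processor per clock, perfect load balancing) is mine, purely for illustration:

```python
# Toy model of the fixed vs. unified shader thought experiment above.
# Assumption (not from the post): each processor finishes exactly one vertex
# or one pixel per clock, and work distributes perfectly across the pool.

CLOCK_HZ = 350e6            # 350 MHz
VERTS_PER_FRAME = 8.75e6
PIXELS_PER_FRAME = 93e6

def fps_fixed(n_vertex=6, n_pixel=16, verts=VERTS_PER_FRAME, pixels=PIXELS_PER_FRAME):
    """Frame rate when vertex and pixel processors are separate pools."""
    vertex_time = verts / (n_vertex * CLOCK_HZ)    # seconds spent on vertices
    pixel_time = pixels / (n_pixel * CLOCK_HZ)     # seconds spent on pixels
    return 1.0 / max(vertex_time, pixel_time)      # the slower pool is the bottleneck

def fps_unified(n_units=24, verts=VERTS_PER_FRAME, pixels=PIXELS_PER_FRAME):
    """Frame rate when 24 general-purpose units share all the work."""
    return (n_units * CLOCK_HZ) / (verts + pixels)

print(fps_fixed())               # ~60 fps: pixel shaders ~100% busy, vertex shaders ~25% busy
print(fps_unified())             # ~83 fps from the same total number of units
print(fps_fixed(verts=17.5e6))   # still ~60 fps: vertex load doubles, pixels remain the bottleneck
print(fps_unified(verts=17.5e6)) # ~76 fps: the shared pool absorbs the extra vertex work
```

Under those assumptions the split design leaves the vertex shaders roughly 75% idle while the pixel shaders are the bottleneck, whereas the unified pool turns that idle time into frame rate and copes with the doubled vertex load.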
I'm not too sure something memory-heavy is actually a good candidate for this. Have you tried it? As you said, latency is high, which means loads of pixels in flight to hide it, which in turn means high register pressure within a CU. So running a compute kernel, with its own register and threads-in-flight requirements, alongside might not actually be the best idea.

It's not just shadow maps; there can be a lot of other shaders which are memory bound. Storing the G-buffer and rendering lights is also memory bound in a deferred renderer. At that point the shader cores are actually doing nothing. So what do you do? Push more work in between. It's just like how new "active" warps are scheduled by hardware to hide memory latency where the ALU:TEX ratio is low. With async you're just adding more work to cover the latency cost even further, or rather, decreasing the chance that the cores sit idle. Fury X in particular has low-frequency memory (HBM), so the latencies are pretty high (I've actually tested this using an OpenCL memory latency benchmark), and the shaders were sitting idle quite a few times. I agree the gains depend on the amount of concurrent compute work you can push, but then again, with the complexity and variety of shaders typically in a game frame, there's always something to overlap.
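To make the trade-off concrete, here is a crude occupancy model. Every number in it is invented for illustration (nothing is measured): a memory-bound pass keeps the SIMDs busy only a fraction of the time, and an ALU-heavy async kernel can soak up the remaining issue slots, at the price of some of the graphics pass's resident waves, which is exactly the register-pressure concern raised above.

```python
# Toy occupancy model of why async compute helps a memory-bound pass.
# All numbers are made up for illustration; they are not measurements.

MEM_LATENCY = 400       # cycles to satisfy a memory request (e.g. high-latency HBM)
WAVES_IN_FLIGHT = 8     # waves the register file lets us keep resident per SIMD
ALU_PER_WAVE = 20       # ALU cycles each wave can issue between memory requests

def busy_fraction(alu_per_wave, waves, latency):
    """Fraction of cycles the SIMD has ALU work to issue.
    Each wave issues `alu_per_wave` cycles of work, then stalls for `latency`
    cycles; the other resident waves cover part of that stall."""
    useful = alu_per_wave * waves
    return min(1.0, useful / (alu_per_wave + latency))

gfx_only = busy_fraction(ALU_PER_WAVE, WAVES_IN_FLIGHT, MEM_LATENCY)
print(f"memory-bound pass alone: SIMDs busy {gfx_only:.0%}")   # ~38%, the rest is idle

# Async compute: a math-heavy kernel shares the CU, but it also needs registers,
# so the graphics pass can now only keep, say, 6 waves resident instead of 8.
gfx_shared = busy_fraction(ALU_PER_WAVE, 6, MEM_LATENCY)
# Assume the compute kernel is ALU-bound with enough work queued to fill every free slot.
compute_work = 1.0 - gfx_shared
print(f"graphics {gfx_shared:.0%} + compute {compute_work:.0%} = SIMDs busy 100%")
```

Whether the net result is a win depends on whether the extra work recovered in the idle slots outweighs what the graphics pass loses to the reduced wave count, which is why both posts above can be right depending on the workload.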
A PetaScale?

I personally never expected any kind of magic performance to come from Nvidia GPUs, as it is my opinion that their utilization was already at or near capacity, and that AMD's architecture is only now finally showing its capabilities. I'm just glad that AMD is now able to compete on a whole new scale.
36 CUs, does that remind you of something?

Remind, you mean? It's a coincidence. It's an APU; it has nothing to do with any discrete part aside from the iGPU architecture, which you could implement with (almost; there's obviously an upper limit due to die size) any number of CUs you want.