Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Again: My goal is to improve at least the API with criticism, not to start useless vendor flame wars.
You are not making any API better by using baseless claims. This is the first time anyone here has heard that AMD has TWICE the compute performance of NVIDIA, let alone 5 times. This is not shown in any public compute benchmark. If AMD really had twice the compute power or more, their chips would have to be a complete failure, because they just can't seem to beat NVIDIA for multiple generations.

If you are going to criticize APIs with such wild claims, I begin to wonder about the validity of the rest of your API criticism.

calculate perf per dollar.
Performance per dollar is a meaningless metric when speaking on a strictly technical and academic level, because it changes all the time, often on little more than a whim.
 
You are not making any API better by using baseless claims. This is the first time anyone here has heard that AMD has TWICE the compute performance of NVIDIA, let alone 5 times.

If this is really new to you, then I doubt it makes much sense to discuss APIs here either. There is quite a difference between looking at public benchmarks of closed-source software and measuring constantly while developing.
I have seen a factor of two quite often in public benchmarks, but also the other way around. That is probably a result of the specific algorithm, bad optimization, or whatever else. Maybe it's just a coincidence that all my work over the past five years runs faster on AMD. But anyone focused on compute rather than rasterization usually agrees with me, though not always, of course. If you have any data, please share it.

Some factors I remember exactly, and these are not per dollar but just runtime ratios:

FuryX vs. 1070: 1.66 (no async compute, poorly saturated)
5850 vs. 670: 5
rx280 vs. first Titan: >2 (measured about five years ago)

Notice that games rely much more on rasterization, so NV usually wins game benchmarks.

If you are going to criticize APIs with such wild claims, I begin to wonder about the validity of the rest of your API criticism.

If you use the APIs yourself, then you can follow my criticism and agree or disagree. Otherwise you don't know what I am talking about, and validity does not matter.
I work with Vulkan, OpenCL, and OpenGL. Those APIs show surprising differences in compute performance as well, contradicting 'political' assumptions. A factor of two appears multiple times. NV varies wildly, AMD much less.

No one has ever doubted my numbers before. It's always interesting to share them, because most people have no opportunity to test every API themselves on large projects.

Do you question and attack somebody who says a game runs at 180 FPS on NV but only 100 on AMD? That's the same kind of runtime ratio!

This is my final comment on vendors other than NV, since NV is what this thread is about.
 
Notice that games rely much more on rasterization, so NV usually wins game benchmarks.
That's a blanket statement. Running a game is a combination of various factors other than compute: texturing, geometry, fill rate, scheduling, etc. What you are suggesting is that AMD is so far behind NVIDIA in ALL of these factors that they lose in games despite being so far ahead in compute already. That is a suggestion I find VERY hard to believe.

I have seen a factor of two quite often in public benchmarks, but also the other way around. That is probably a result of the specific algorithm, bad optimization, or whatever else.
Indeed, it goes both ways.
5850 vs. 670: 5
Maybe it's just a coincidence that all my work over the past five years runs faster on AMD.
I am beginning to believe this is the reason, and it is not a coincidence. When you compare a VLIW5 architecture to a superscalar one that is TWO generations ahead and get a result like that, it means your algorithms are tailored much more tightly to ATi/AMD than to NVIDIA. The problem is your algorithm, not the hardware.
No one has ever doubted my numbers before. It's always interesting to share them, because most people have no opportunity to test every API themselves on large projects.
Please do share them; I am looking forward to learning about them.
 
Do you question and attack somebody who says a game runs at 180 FPS on NV but only 100 on AMD? That's the same kind of runtime ratio!
Sure thing. Without knowing which GPUs are being compared and at what settings, such statements hover anywhere between plausible and complete fabrication.

This is my final comment on vendors other than NV, since NV is what this thread is about.
We are not outside the scope of this thread. You said AMD has two times the compute performance, which would enable them to maintain ray tracing on compute. I am casting doubt on both statements because: 1) no evidence has been given, and 2) there are results that contradict your assumption. Here is Turing delivering 200% more performance in RadeonRays.
https://www.computerbase.de/2018-09...agramm-gpu-compute-radeon-pro-renderer-baikal
 
Sure thing. Without knowing which GPUs are being compared and at what settings, such statements hover anywhere between plausible and complete fabrication.
I did list specific GPU models. You keep insinuating that I am bullshitting, but it is you who lacks evidence and meaningful contribution.

Here is Turing delivering 200% more performance in RadeonRays.
RadeonRays is a trivial implementation of classic raytracing, not tuned to GPUs. Each thread traverses the BVH independently. There is no shared work and no effort put into optimal batching, at least I did not find anything in the source. RadeonRays is the primary reason I say AMD lacks software experience.
NV is better with random memory access like this, so I'm not surprised. In the next benchmark (LuxMark), Vega is close to the 2080 and beats its competitor, the 1080, and also the 1080 Ti.
Maybe the advantage is completely gone now? I don't know, because, as already said, I have not tested the most recent GPUs.
Sorry if my data is a bit out of date and I did not mention this in my very first, short, speculative comment.

I do not think AMD has a chance to BEAT NV with compute raytracing. But it may just perform better than expected if they manage to improve upon RadeonRays. Pure speculation.
It is also pure speculation that neither AMD nor Intel have started on RT cores yet, and that until their next or following generations they have no option other than implementing RT on compute.
The option of improving current CUs for good raytracing instead of building dedicated cores is pure speculation too, but the thought is maybe more interesting than any of your recent posts.
Comparing vendor compute performance in such detail IS out of scope here, IMO.
 
I did list specific GPU models. You keep insinuating that I am bullshitting, but it is you who lacks evidence and meaningful contribution.
The problem here is you're an unknown person, so you can't expect people to trust what you say verbatim. You may be highly experienced and have great data, but you could also be some crazy fanboy spreading FUD. From your posts, I think you're posting from a far more experienced position than most in this conversation, but you still need to provide solid, reliable data beyond personal anecdotes, or links to reliable sources.

Comparing vendor compute performance in such detail IS out of scope here, IMO.
It'd certainly be worth discussing! If you could provide scientific data on that in a new thread, I for one would be very interested to see it.
 
The problem here is you're an unknown person, so you can't expect people to trust what you say verbatim. You may be highly experienced and have great data, but you could also be some crazy fanboy spreading FUD. From your posts, I think you're posting from a far more experienced position than most in this conversation, but you still need to provide solid, reliable data beyond personal anecdotes, or links to reliable sources.

Hmmm... well, I see. Makes sense. I'm new here and this situation is new to me as well.
I can't give any proof, because I have to keep my work secret for a while longer. Personal opinion and experience are all I have to offer, and it's up to anyone to believe or doubt them. I'm not used to requests for proof, and I'm unsure what kind of people post here. Maybe it's more about business than development, and I'm a bit out of place here.

Sorry @DavidGraham for getting rude already. I apologize for that.

The problem here is you're an unknown person, so you can't expect people to trust what you say verbatim. You may be highly experienced and have great data, but you could also be some crazy fanboy spreading FUD. From your posts, I think you're posting from a far more experienced position than most in this conversation, but you still need to provide solid, reliable data beyond personal anecdotes, or links to reliable sources.

It'd certainly be worth discussing! If you could provide scientific data on that in a new thread, I for one would be very interested to see it.

I'm not sure I would dare to contribute :) Data from closed source is never guaranteed to be trustworthy anyway.

I can't say much more than I've already said. But one thing is that my code is highly optimized for the way GPUs work. This is rarely the case. For many algorithms (like classical raytracing) it's practically impossible, and there is a need for changes in hardware.
So my numbers relate to peak performance, not the average or practical performance we see most of the time. This is probably the reason my numbers appear extreme. (I don't get there because I'm so awesome; I've just spent a lot of time tailoring the algorithms to that.)
That said, AMD has, in my experience, higher peak performance. I assume this still holds with recent GPUs, at least for my project. But I admit that projecting my own numbers onto classical raytracing was not well thought out; I'm not sure they would hold there too.

However, once NV exposes their new work generation shaders, and if they work with compute in the way I hope, my numbers might change significantly and NV may take the lead. I'm not a fanboy of anything, except maybe Led Zeppelin.
 
But as soon as you hit the box, there is no way to proceed with compute shaders, because the custom intersection shader is called from and returns to the raytracing pipeline, not the compute pipeline.
The intersection shader then runs one ray per thread in isolation. This rules out any kind of parallel algorithm within the custom box, and rendering the custom data with compute shaders to the frame buffer would exclude it from reflections or raytraced shadows.

Can you provide actual proof of any of your claims? You've made a lot of bold claims that I don't think you're fully qualified to make. You've got to back them up...

RadeonRays is a trivial implementation of classic raytracing, not tuned to GPUs. Each thread traverses the BVH independently. There is no shared work and no effort put into optimal batching, at least I did not find anything in the source. RadeonRays is the primary reason I say AMD lacks software experience.

This is factually wrong. RadeonRays is considered to be one of the most solid GPU ray tracing solutions out there. Even Unity is starting to use it for their editor! Since it's open source, care to show us exactly where the "unoptimal parts" are? @DeanoC (who wrote the bulk of it) I'm sure would like to know. :p
 
To be clear: I don't have anything against raytracing or hardware acceleration. It's welcome. I only dislike the extremely restricted implementation.
It also makes me angry that rays can launch rays, but compute shaders still can't launch compute shaders.

I'm pretty much a novice and you have a much better grasp of this, but doesn't CUDA allow compute shaders to launch compute shaders? And, to a lesser degree and with more restrictions, doesn't ExecuteIndirect do this as well?
 
Can you provide actual proof of any of your claims? You've made a lot of bold claims that I don't think you're fully qualified to make. You've got to back them up...

Check the DXR API documentation yourself. I got it by downloading the SDK somewhere from MS, which required registration. Maybe it's public now.

This is factually wrong. RadeonRays is considered to be one of the most solid GPU ray tracing solutions out there. Even Unity is starting to use it for their editor! Since it's open source, care to show us exactly where the "unoptimal parts" are? @DeanoC (who wrote the bulk of it) I'm sure would like to know. :p

RadeonRays is fully functional and well suited for tasks like baking. But, AFAIK, it is not optimized for realtime purposes. It's a start.
See this traversal kernel as an example:


https://github.com/GPUOpen-Librarie...ys/src/kernels/CL/intersect_bvh2_skiplinks.cl

This is naive per-thread traversal of the BVH. There is no shared work or data; LDS is not utilized at all. For realtime you would do something more complex, e.g. sort all rays to BVH branches, cache a whole branch in LDS within a CU, and loop over all potentially intersecting rays in batches.
After some time, remove terminated rays, compact, and resort the remaining rays. There are dozens of other optimization ideas. You read 'NVIDIA' on many research papers in this field; they have much more experience here.
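To make concrete what I mean by naive per-thread traversal, here is a rough CUDA-style sketch of skip-link BVH traversal: one ray per thread, no shared memory. The node layout, structs, and helper names are purely illustrative assumptions of mine, not the actual RadeonRays code:

// Illustrative sketch only: one ray per thread walks a flattened,
// depth-first BVH using skip links. No LDS/shared memory, no ray
// sorting, no batching: every thread works in isolation.
#include <cuda_runtime.h>

struct Ray { float3 origin, dir, invDir; float tMax; };
struct Hit { float t; int prim; };

struct BvhNode {              // hypothetical flattened node layout
    float3 aabbMin, aabbMax;
    int primStart;            // >= 0 for leaves, -1 for inner nodes
    int primCount;
    int skip;                 // next node if this subtree is missed; -1 ends traversal
};

__device__ bool intersectAABB(const Ray& r, float3 lo, float3 hi, float tMax)
{
    // Standard slab test; invDir holds 1/dir per component.
    float3 t0 = make_float3((lo.x - r.origin.x) * r.invDir.x,
                            (lo.y - r.origin.y) * r.invDir.y,
                            (lo.z - r.origin.z) * r.invDir.z);
    float3 t1 = make_float3((hi.x - r.origin.x) * r.invDir.x,
                            (hi.y - r.origin.y) * r.invDir.y,
                            (hi.z - r.origin.z) * r.invDir.z);
    float tNear = fmaxf(fmaxf(fminf(t0.x, t1.x), fminf(t0.y, t1.y)), fminf(t0.z, t1.z));
    float tFar  = fminf(fminf(fmaxf(t0.x, t1.x), fmaxf(t0.y, t1.y)), fmaxf(t0.z, t1.z));
    return tNear <= tFar && tFar >= 0.0f && tNear <= tMax;
}

__device__ void intersectLeaf(const Ray& r, int start, int count, Hit* best)
{
    // Placeholder: per-primitive (e.g. triangle) tests would go here.
    (void)r; (void)start; (void)count; (void)best;
}

__global__ void traceNaive(const BvhNode* bvh, const Ray* rays, Hit* hits, int rayCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rayCount) return;

    Ray ray  = rays[i];
    Hit best = { ray.tMax, -1 };

    int node = 0;
    while (node != -1) {
        const BvhNode n = bvh[node];
        if (intersectAABB(ray, n.aabbMin, n.aabbMax, best.t)) {
            if (n.primStart >= 0) {          // leaf: test its primitives
                intersectLeaf(ray, n.primStart, n.primCount, &best);
                node = n.skip;               // continue past this leaf
            } else {
                node = node + 1;             // inner node: descend to first child
            }
        } else {
            node = n.skip;                   // missed: jump over the whole subtree
        }
    }
    hits[i] = best;   // divergent control flow and scattered node loads dominate here
}

The batched alternative described above (rays sorted to branches, a branch cached in LDS per CU, rays streamed through it in waves) is far more involved, and that is the part I could not find in the RadeonRays source.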

I'm pretty much a novice and you have a much better grasp of this, but doesn't CUDA allow compute shaders to launch compute shaders? And, to a lesser degree and with more restrictions, doesn't ExecuteIndirect do this as well?
I know it for sure only for OpenCL 2.0 (which NV does not support on consumer hardware). I've heard Mantle supported it too, and I assume CUDA does as well.
ExecuteIndirect does not help, and neither does NV's Vulkan extension for building command buffers on the GPU (it is not possible to insert barriers).
So both DX12 and VK lack support, although every piece of hardware seems to have the functionality.
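For what it's worth, CUDA does expose this through dynamic parallelism: a kernel can launch another kernel directly from device code. A minimal sketch, with made-up kernel names; it needs compute capability 3.5+ and relocatable device code:

// Minimal CUDA dynamic parallelism sketch: a compute kernel launching
// another compute kernel from the device, the capability that DX12/VK
// compute currently does not expose.
// Build with: nvcc -arch=sm_35 -rdc=true dp.cu -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

__global__ void childKernel(int parentBlock)
{
    // Follow-up work spawned on the GPU without a round trip to the CPU.
    printf("child of parent block %d, thread %d\n", parentBlock, threadIdx.x);
}

__global__ void parentKernel()
{
    // One thread per block decides to spawn more work.
    if (threadIdx.x == 0)
        childKernel<<<1, 4>>>(blockIdx.x);
    // Child grids launched here complete before the parent grid is
    // considered finished.
}

int main()
{
    parentKernel<<<2, 32>>>();
    cudaDeviceSynchronize();   // host waits for the parent and its nested children
    return 0;
}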

DXR, with its advanced scheduling options, seems to extend this. Think of the complexity of emitting rays and returning to the calling shader. Under the hood there must be some nice mechanism to queue equal materials or similar rays into common warps, process the warps, and return to the calling shader after all results have been computed, even allowing for recursion.
This is the final missing piece to fully unlock the potential of GPUs. No more brute force; you can implement good algorithms without restrictions.
I really, really want this to be exposed to compute as well.
 
I know it for sure only for OpenCL 2.0 (which NV does not support on consumer hardware). I've heard Mantle supported it too, and I assume CUDA does as well.
ExecuteIndirect does not help, and neither does NV's Vulkan extension for building command buffers on the GPU (it is not possible to insert barriers).
So both DX12 and VK lack support, although every piece of hardware seems to have the functionality.

DXR, with its advanced scheduling options, seems to extend this. Think of the complexity of emitting rays and returning to the calling shader. Under the hood there must be some nice mechanism to queue equal materials or similar rays into common warps, process the warps, and return to the calling shader after all results have been computed, even allowing for recursion.
This is the final missing piece to fully unlock the potential of GPUs. No more brute force; you can implement good algorithms without restrictions.
I really, really want this to be exposed to compute as well.
@Max McMullen, any thoughts here (if you are still roaming this forum)? Is there something in the pipeline, perhaps?
 
DXR is officially out today with the Windows 10 Fall update:

https://blogs.msdn.microsoft.com/di...acing-and-the-windows-10-october-2018-update/

DirectX Raytracing and hardware trends
Hardware has become increasingly more flexible and general-purpose over the past decade: with the same TFLOPs today’s GPU can do more and we only expect this trend to continue.

We designed DirectX Raytracing with this in mind: by representing DXR as a compute-like workload, without complex state, we believe that the API is future-proof and well-aligned with the future evolution of GPUs: DXR workloads will fit naturally into the GPU pipelines of tomorrow.
 
Also interesting were their comments on the DirectML API:
In addition to the progress we’ve made with DirectX Raytracing, we recently announced a new public API, DirectML, which will allow game developers to integrate inferencing into their games with a low-level API. To hear more about this technology, releasing in Spring 2019, check out our SIGGRAPH talk.

ML techniques such as denoising and super-resolution will allow hardware to achieve impressive raytraced effects with fewer rays per pixel. We expect DirectML to play a large role in making raytracing more mainstream.
Edit: In the linked SIGGRAPH talk video, there is a demo using Forza 3 at 15:45.
 
As it is now, it's primarily to the advantage of NV, not to ours. It opens up new possibilities, yes. But it locks out a lot of others by reducing the chip area available for more flexible GPGPU.

That’s a bold statement to make without knowing how many flops were sacrificed at the altar of RT hardware. It’s an absolute certainty though that those lost flops would be woefully inadequate for real-time RT.

How do we know that? Because we already have 16+ teraflop monsters that are woefully inadequate at real-time RT. Anyone who wants to build solutions on programmable hardware already has an abundance of options today.
 
It's not about the flops traded, but about the possibility for devs to experiment. Nvidia's first implementation is just too much of a black box.
 
What matters ultimately is DXR, not RTX.
If we are talking about consoles adopting it, then the actual implementation matters a lot, as console devs will inevitably get APIs (or versions of them, in the case of MS) that expose more of the underlying operations under the hood. If the hardware implementation is hardly programmable, that's a whole generation of missed opportunities for inventive alternative solutions and experimentation with different optimizations.
 
That’s a bold statement to make without knowing how many flops were sacrificed at the altar of RT hardware. It’s an absolute certainty though that those lost flops would be woefully inadequate for real-time RT.

How do we know that? Because we already have 16+ teraflop monsters that are woefully inadequate at real-time RT. Anyone who wants to build solutions on programmable hardware already has an abundance of options today.

I do not make 'bold' statements. I only state my personal opinion and share my experience, aiming for discussion, not to prove speculative thoughts.
To see how much GPGPU 'flops' would be lost, just take a die shot and look at the area the RT and Tensor cores take. I do not say those cores are useless; I only speculate: 'What could we do with more general-purpose cores instead?'
That's just a question. I am not sure myself which option I would choose, because we will likely never know how the alternative would have progressed (assuming other vendors follow with specialized, fixed-function cores).

Teraflop monsters, however, are NOT inadequate at real-time RT. I know this from personal experience. I don't need to prove this to myself; I only need to look at my screen here. I will not prove it to you.
However, the kind of RT that I do is very different from the classical approach we discuss here. My approach has opposite strengths and weaknesses. RTX can solve my remaining problems and fill in missing details; that's why I'm interested.
So RTX and my work should form a nice couple to push realtime photorealism further. I'm not here to argue against RTX.

From my perspective, seeing my solution already solving the GI problem with good detail, it makes sense to question the need for dedicated fixed-function hardware just to 'fill in missing details'.
It also makes sense to question the need for two implementations of the BVH.
It makes sense to request programmable hardware so that sharing work would become possible.

From my perspective it makes sense to question the acceleration of classical raytracing at all, because to me it is ultimately an outdated idea. But I'm not sure, because I did not start work on solving the missing details myself, and now, with RTX, I probably never will.
 
So RTX and my work should form a nice couple to push realtime photorealism further. I'm not here to argue against RTX.
Sorry if you already said it, but when do you think you will be able to show us your stuff?
 