"Understood. Good luck!"

Thanks, but no, I won't show it off to the public soon. Right now I'm working on the necessary preprocessing tools, then I have to finalize LOD, and only after that will I start work on the actual demo renderer. (So far I only have a debug visualization of the surface probes.)
The goal is to sell it to the games industry, and if that fails, to make some game myself using it.
"It will be interesting to see how any performance related differences hold up at 5nm, or if it will be a one-sided story."

If we relate performance to Watts, it turns out AMD does better. I did not know the power differences are currently that big and had ignored this, but Watts are not a bad measure to compare by, since TF no longer makes sense after NV doubled its FP units.
Only if you ignore facts:

View attachment 5678

Are you saying a RTX 3090 pulls double the wattage of a 6900 XT? If not, you need to redo your math about performance per Watt, as NVIDIA outdoes AMD by nearly 100% here.
After all the debating, we're still at the point where HW RT is faster than software. Which was clear already when the debate started.
Which facts to ignore?
This benchmark does not include power draw, and you don't even say which workload / game it is from.
The Crytek benchmark did include power draw, but the difference there is not 2x.
However, I don't want to ignite another useless fanboy war. Even compute benchmarks are of little use to me; I have to measure with my own project to eventually change my personal impression that AMD still has the edge.
Of course RTX is faster in a DXR game; I know this as well as anyone. It might be interesting to figure out how much the RT cores help to reduce power draw, versus AMD's approach of offloading RT work to the CUs.
But not to me. At this point, API restrictions are all I'm concerned about, not the performance or power draw of monster GPUs.
But my 'claim' was made in the context of a compute raytracing benchmark, and the perf / Watt advantage is as true as the numbers shown in that benchmark are correct.
Well, forget about it.
While we nitpick about perf, RTX, and Rolls Royce vs. Bentley, end users do not even notice a difference after accidentally using the iGPU for 4 years: hihihihi
"UE5 doesn't show anything of the sort."

The point was: software RT starts with a smaller achievement due to lower performance, but may progress faster in the long run because it has no restrictions or conventions which could turn out bad. And sadly, this point is relevant already now, as UE5 shows, only 3 years later.
I don’t think it’s helpful to the discussion to ignore tons of public data while claiming that your special workload is the only thing that is shaping your opinion on compute. It would be really great if you can share some more details about the workload and the performance you’re seeing on different hardware. You said your solution isn’t like DXR or Crytek or SDF or voxels. So what is it like?
The workload is pretty varied. Overall it's similar to the Many LoDs paper: the scene is represented by a hierarchy of surfels. Initially I generated those at lightmap texel positions from Quake levels, using the mip maps to form the hierarchy.
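Roughly, the data could look like this. It's only a simplified sketch with made-up field names, not my actual layout:

```glsl
// Simplified sketch of a surfel hierarchy in a flat buffer.
// Field names and packing are made up for illustration only.
struct Surfel
{
    vec4 posRadius;    // xyz = disc center, w = disc radius
    vec4 normal;       // xyz = disc normal, w = padding
    vec4 irradiance;   // rgb = cached irradiance of the probe
    uint firstChild;   // index of the first child on the next finer level
    uint childCount;   // 0 for leaf surfels (finest lightmap mip)
    uint parent;       // index of the parent surfel, ~0u for roots
    uint pad;
};

layout(std430, binding = 0) buffer SurfelBuffer
{
    Surfel surfels[];  // all levels, coarse to fine
};
```

Each lightmap mip level then contributes one level of the hierarchy: a texel of mip N roughly becomes the parent of the covered texels of mip N-1.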
There are dynamic objects too, and like DXR, I rebuild the top levels every frame to have one BVH for the entire scene.
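One way to picture that per-frame top-level update is a refit pass over the coarse levels, one small dispatch per level, bottom up. Again just a rough sketch of the idea, not my actual shader (which may rebuild topology too):

```glsl
#version 450
// Rough sketch: refit one coarse hierarchy level from its children,
// dispatched once per level, bottom up. Uses the Surfel layout above.
layout(local_size_x = 64) in;

struct Surfel { vec4 posRadius; vec4 normal; vec4 irradiance;
                uint firstChild; uint childCount; uint parent; uint pad; };
layout(std430, binding = 0) buffer SurfelBuffer { Surfel surfels[]; };

layout(push_constant) uniform Level
{
    uint first;   // index of the first surfel of this level
    uint count;   // number of surfels on this level
} level;

void main()
{
    if (gl_GlobalInvocationID.x >= level.count) return;
    uint i = level.first + gl_GlobalInvocationID.x;

    Surfel parent = surfels[i];
    vec3 center = vec3(0.0);
    vec3 normal = vec3(0.0);
    float maxDist = 0.0;

    for (uint c = 0; c < parent.childCount; ++c)
    {
        Surfel child = surfels[parent.firstChild + c];
        center += child.posRadius.xyz;
        normal += child.normal.xyz;
    }
    center /= float(parent.childCount);

    // Grow the parent disc radius so it still bounds all children.
    for (uint c = 0; c < parent.childCount; ++c)
    {
        Surfel child = surfels[parent.firstChild + c];
        maxDist = max(maxDist, distance(center, child.posRadius.xyz) + child.posRadius.w);
    }

    parent.posRadius = vec4(center, maxDist);
    parent.normal    = vec4(normalize(normal), 0.0);
    surfels[i] = parent;
}
```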
Each surfel is an irradiance probe. The surfels interreflect with all other surfels to solve the rendering equation, using the caching in the probes to get infinite bounces for free. Interreflection also uses a 'LOD cut' of the scene, like ManyLODs or Michael Bunnell's realtime AO/GI did before. Visibility is resolved with raytracing; I use the actual surfel representation (so discs, not detailed triangles) for the occluders. One workgroup updates one probe, so all rays share the same origin, and scene complexity is small thanks to the LOD cut. That enables some optimizations of the raytracing problem and is good enough for GI.
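To make the 'one workgroup per probe' part more concrete, here is a heavily simplified sketch of what such a gather shader can look like: the probe's LOD cut is cached in LDS, each thread handles a subset of emitter surfels, and visibility is a brute-force ray vs. disc test against the same cut. All the buffer names and the fixed cut size are made up; my real shaders differ quite a bit.

```glsl
#version 450
// Heavily simplified sketch: one workgroup updates one surfel probe.
// Buffer layout, names and the fixed cut size are illustrative only.
layout(local_size_x = 64) in;

struct Surfel { vec4 posRadius; vec4 normal; vec4 irradiance;
                uint firstChild; uint childCount; uint parent; uint pad; };
layout(std430, binding = 0) buffer SurfelBuffer { Surfel surfels[]; };

// Precomputed LOD cut per probe: a list of emitter / occluder surfel indices.
layout(std430, binding = 1) readonly buffer CutBuffer { uint cut[]; };

const uint CUT_SIZE   = 256;
const uint GROUP_SIZE = 64;
const float PI = 3.14159265;

shared vec4 sPosRadius[CUT_SIZE];   // LDS copy of the cut (positions + radii)
shared vec4 sNormal[CUT_SIZE];
shared vec3 sRadiance[CUT_SIZE];
shared vec3 sSum[GROUP_SIZE];       // per-thread partial results

// Does a ray from 'origin' along 'dir' hit the disc (c, n, r) before maxT?
bool HitsDisc(vec3 origin, vec3 dir, float maxT, vec3 c, vec3 n, float r)
{
    float denom = dot(n, dir);
    if (abs(denom) < 1e-6) return false;
    float t = dot(n, c - origin) / denom;
    if (t < 1e-3 || t > maxT - 1e-3) return false;
    vec3 p = origin + t * dir;
    return dot(p - c, p - c) < r * r;
}

void main()
{
    uint probeIdx = gl_WorkGroupID.x;          // one probe per workgroup
    uint lane     = gl_LocalInvocationID.x;
    Surfel probe  = surfels[probeIdx];
    vec3 origin   = probe.posRadius.xyz;       // all rays share this origin

    // Cooperatively load this probe's cut into LDS once,
    // instead of re-reading it from VRAM for every ray.
    for (uint i = lane; i < CUT_SIZE; i += GROUP_SIZE)
    {
        Surfel s      = surfels[cut[probeIdx * CUT_SIZE + i]];
        sPosRadius[i] = s.posRadius;
        sNormal[i]    = s.normal;
        sRadiance[i]  = s.irradiance.rgb;
    }
    barrier();

    vec3 sum = vec3(0.0);
    for (uint e = lane; e < CUT_SIZE; e += GROUP_SIZE)
    {
        vec3  toEmitter = sPosRadius[e].xyz - origin;
        float dist      = length(toEmitter);
        vec3  dir       = toEmitter / max(dist, 1e-6);

        float cosR = dot(probe.normal.xyz,  dir);   // receiver side
        float cosE = dot(sNormal[e].xyz,   -dir);   // emitter side
        if (cosR <= 0.0 || cosE <= 0.0) continue;

        // Visibility: same-origin ray against the other discs of the cut.
        bool blocked = false;
        for (uint o = 0; o < CUT_SIZE && !blocked; ++o)
        {
            if (o == e) continue;
            blocked = HitsDisc(origin, dir, dist,
                               sPosRadius[o].xyz, sNormal[o].xyz, sPosRadius[o].w);
        }
        if (blocked) continue;

        // Disc-to-point form factor; reading the cached irradiance of the
        // emitters is what gives "infinite bounces for free" over time.
        float area = PI * sPosRadius[e].w * sPosRadius[e].w;
        sum += sRadiance[e] * (cosR * cosE * area / (PI * dist * dist + area));
    }

    // Reduce the partial sums in LDS; lane 0 writes the probe back.
    sSum[lane] = sum;
    barrier();
    if (lane == 0)
    {
        vec3 total = vec3(0.0);
        for (uint i = 0; i < GROUP_SIZE; ++i) total += sSum[i];
        surfels[probeIdx].irradiance = vec4(total, 1.0);
    }
}
```

With only a few hundred surfels in the cut instead of millions of triangles, and all rays sharing one origin, the traversal problem becomes small enough that this kind of brute force, or a tiny local acceleration structure, is enough for GI.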
To support the BRDF, I also generate a small spherical cap environment map for each surfel (4x4, or maybe 8x8 now, but it's prefiltered). That's lower res than the Many LoDs paper used, so my reflections are pretty blurry and support for smooth materials is limited.
The spatial resolution of the probes was 10cm (near the camera), targeting last gen. Shadows are blurry too, as they come from just 10cm discs; fine for large area lights, but bad for small / point lights. Traditional shadow maps are thus still necessary if we need those details, and RT could take care of sharp reflections. (In the long run I think RT should replace shadow maps entirely, and personally I'd like to use it for all direct lighting.)
So that's all standard stuff, which I can talk about without revealing any 'secrets'. It's maybe something like 20 compute shaders. I started with OpenGL, then OpenCL (twice as fast on NV, surprisingly), and finally Vulkan (twice as fast again for both vendors, thanks to prerecorded command buffers and indirect dispatch).
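The indirect dispatch part means the GPU sizes the next dispatch itself, so the prerecorded command buffer never needs feedback from the CPU. The pattern is tiny; the names here are made up, but the pattern itself is standard:

```glsl
#version 450
// Minimal example of GPU-driven indirect dispatch: a previous pass has
// counted how many probes need an update, and this one-thread shader
// converts that counter into the arguments vkCmdDispatchIndirect reads.
layout(local_size_x = 1) in;

layout(std430, binding = 0) readonly buffer WorkCounter
{
    uint probesToUpdate;     // written by an earlier pass on the GPU
};

// Matches VkDispatchIndirectCommand: three tightly packed uint32 values.
layout(std430, binding = 1) writeonly buffer IndirectArgs
{
    uint groupCountX;
    uint groupCountY;
    uint groupCountZ;
};

void main()
{
    groupCountX = probesToUpdate;   // one workgroup per probe, as above
    groupCountY = 1u;
    groupCountZ = 1u;
}
```

The command buffer with vkCmdDispatchIndirect can then be recorded once and replayed every frame, which is where the big win over per-frame submission came from for me.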
My initial hardware was Fermi / Kepler. Then I bought a 280X to test AMD, which was mid-range GCN, and it was twice as fast as a Kepler Titan GPU. That's when I became very convinced about AMD compute. After some AMD-specific optimizations, mostly trading random VRAM access for caching in LDS, the difference became even larger.
The latest test on that old HW was a 5970 vs. a GTX 670 using Vulkan, and AMD was a whopping 5 times faster with those GPUs (similar price, and also similar game performance otherwise).
There is nothing special about my work, no frequent atomics to VRAM for example. And I saw this ratio across all my very different shaders, so it doesn't seem to be an outlier; the project is too large for that. But it surely does not represent typical game workloads either.
The latest new HW I have tested was a Fury X vs. a GTX 1070. AMD still had a 10% lead when normalizing for teraflops, but obviously NV has fixed its compute performance shortcomings.
And the improvements in Turing looked promising as well. I just remain sceptical about an NV lead, due to my experience. I assume both vendors do pretty similarly at the moment, which is good if so. Kepler vs. GCN was maybe just a bad start for NV, but the impression is now deeply rooted in my mind.
Of course, all this is subjective and just one data point; you have to weigh it yourself when adding it to all the others.
"The latest test on that old HW was a 5970 vs. a GTX 670 using Vulkan, and AMD was a whopping 5 times faster with those GPUs (similar price, and also similar game performance otherwise)."

5970 doesn't support Vulkan though?
"I have never seen anyone claiming the console/AMD RT solution to be 'better' in any way (maybe except flexibility), and if so, it doesn't even remotely come close to making up for the speed/performance Ampere (and Turing) has to offer."

I did not notice how my 'it's better' would be perceived, and when I realized, it was too late. Sorry for the fuss and rudeness.
"Devs will have to be creative on consoles (like usual...), see Rift Apart for example (just upscaled reflections). But you're going to be limited still, dev magic can only do so much."

Notice this holds true no matter if we talk about a Series S or a 3090. Perf limits are always there and can't be changed.
"Devs will have to be creative on consoles (like usual...)"

Devs want to be able to be creative on PC as well. Also, this argument is independent of the initial performance we start from.
"There is nothing special about my work, no frequent atomics to VRAM for example."

Did you use shared memory (LDS) atomics extensively in your shaders?
"Sounds a little like a hybrid between DDGI and Lumen with the added benefit of LODs within the surfel cache."

Yep, but the reason it's fast is something new. I should get it done, to limit the risk that it dies with me as a secret...
"You should definitely try to get your hands on some newer hardware. Fury and Pascal aren't really relevant today."

I'm waiting for reasonable prices. Though it's just compute, and nothing has changed there in ages aside from subgroup functions and FP16.
"5970 doesn't support Vulkan though?"

No, the 59 series was the 1st or maybe 2nd gen of GCN on PC, so Vulkan works. Though maybe I remember the model number wrongly; it could be a 5950, or even a 5870. But I'm sure it's a 3TF GCN model. I still use it for benchmarks.
And I wonder what it is you're doing in compute for VLIW4 to be 5x faster than scalar.
"Did you use shared memory (LDS) atomics extensively in your shaders?"

Yes, often, but not extensively. I have learned Kepler only emulated these instead of handling them natively in hardware(?), so I assume that's the major reason it was so slow for me.
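For context, this is the kind of pattern I mean. It's a made-up minimal example, not one of my shaders: a workgroup compacts items through an LDS counter so only one VRAM atomic is needed per workgroup.

```glsl
#version 450
// Made-up minimal example of the LDS atomic pattern in question:
// compaction via a shared counter, with only one VRAM atomic per workgroup.
layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly buffer Input   { float values[]; };
layout(std430, binding = 1) writeonly buffer Output { uint survivors[]; };
layout(std430, binding = 2) buffer Counter          { uint globalCount; };

shared uint sCount;       // LDS counter
shared uint sBase;        // global output offset for this workgroup
shared uint sList[64];    // LDS staging of compacted indices

void main()
{
    uint gid = gl_GlobalInvocationID.x;
    uint lid = gl_LocalInvocationID.x;

    if (lid == 0) sCount = 0u;
    barrier();

    // Each thread decides if its item survives; the LDS atomic hands out
    // a compacted slot inside the workgroup.
    if (values[gid] > 0.5)
        sList[atomicAdd(sCount, 1u)] = gid;
    barrier();

    // A single global atomic reserves output space for the whole workgroup.
    if (lid == 0) sBase = atomicAdd(globalCount, sCount);
    barrier();

    if (lid < sCount) survivors[sBase + lid] = sList[lid];
}
```

The VRAM variant would instead do one atomicAdd on globalCount per surviving thread, which is exactly the kind of traffic I try to avoid.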