Impact of nVidia Turing RayTracing enhanced GPUs on next-gen consoles *spawn

What other RT solutions could AMD entertain? Does the BVH or equivalent have to be on GPU? Seems not to me, with nVidia sticking it there as that's their only part in a PC. AMD however could put an acceleration unit in the CPU, or indeed elsewhere in a system. What would the ideal be?

BVH acceleration can make sense because it's useful for other things like physics too. But ofc the nodes have to be exposed as an accessible data structure. Also, building the tree should be left to the user; a flexible branching factor (max number of children per node) would be nice if possible.
Shooting rays against the BVH would be useful, but I'd prefer to keep control of ray batching. So I'd want to shoot a compute wavefront of 64 rays against the BVH, not single-threaded isolated rays.
Further, I would be fine with getting the results later in some form of query. I do not want the wavefront to be stalled while waiting on the results of a process that has a long, non-constant runtime.
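Something like this, to make it concrete - a CUDA-flavored sketch where the BVH nodes are just an app-visible buffer, one ray per thread, 64-thread blocks mapping to a wavefront, and results going to a buffer that a later dispatch consumes (all types and the node layout here are made up for illustration):

```cuda
// Hypothetical app-visible BVH layout; any-hit style traversal for brevity.
struct Ray  { float3 o, d, inv_d; float t_max; };
struct Node { float3 lo, hi; int left, right, first_prim, prim_count; };

__device__ bool hit_aabb(const Ray& r, const float3& lo, const float3& hi)
{
    float tmin = 0.0f, tmax = r.t_max;
    const float* o  = &r.o.x;      // treat float3 as float[3] for brevity
    const float* id = &r.inv_d.x;
    const float* l  = &lo.x;
    const float* h  = &hi.x;
    for (int a = 0; a < 3; ++a) {  // slab test per axis
        float t0 = (l[a] - o[a]) * id[a];
        float t1 = (h[a] - o[a]) * id[a];
        tmin = fmaxf(tmin, fminf(t0, t1));
        tmax = fminf(tmax, fmaxf(t0, t1));
    }
    return tmin <= tmax;
}

// Launch as trace_batch<<<(n_rays + 63) / 64, 64>>> - the app owns the
// batching, one 64-thread block per wavefront of rays.
__global__ void trace_batch(const Node* nodes, const Ray* rays,
                            int* hit_leaf, int n_rays)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_rays) return;
    Ray r = rays[i];
    int stack[32], sp = 0, found = -1;
    stack[sp++] = 0;                                      // root
    while (sp > 0) {
        int ni = stack[--sp];
        const Node& nd = nodes[ni];
        if (!hit_aabb(r, nd.lo, nd.hi)) continue;
        if (nd.prim_count > 0) { found = ni; continue; }  // leaf reached
        stack[sp++] = nd.left;
        stack[sp++] = nd.right;
    }
    // Deferred result: a later kernel reads hit_leaf and runs the actual
    // primitive tests, so this wavefront never stalls waiting on anything.
    hit_leaf[i] = found;
}
```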

I would be fine with just the BVH. Triangles, or whatever other form of geometry we use, should be implemented by users instead, again within regular compute to utilize parallel algorithms. No single-threaded custom intersection shader crap!
To help with this, some special instructions to accelerate the triangle-ray test would be worth it.
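For reference, this is the kind of inner loop such instructions would accelerate - a bog-standard Möller-Trumbore ray-triangle test, sketched in CUDA. Today this costs a dozen-odd multiplies, adds and compares per test; dedicated instructions could collapse much of it:

```cuda
// Standard Moller-Trumbore intersection; small vector helpers written out
// because CUDA's float3 has no built-in operators.
__device__ float3 sub3(float3 a, float3 b) { return make_float3(a.x-b.x, a.y-b.y, a.z-b.z); }
__device__ float3 cross3(float3 a, float3 b) {
    return make_float3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x);
}
__device__ float dot3(float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns hit distance t along the ray, or -1.0f for a miss.
__device__ float ray_tri(float3 o, float3 d, float3 v0, float3 v1, float3 v2)
{
    float3 e1 = sub3(v1, v0), e2 = sub3(v2, v0);
    float3 p  = cross3(d, e2);
    float det = dot3(e1, p);
    if (fabsf(det) < 1e-8f) return -1.0f;      // ray parallel to triangle
    float inv = 1.0f / det;
    float3 s  = sub3(o, v0);
    float u   = dot3(s, p) * inv;              // first barycentric coord
    if (u < 0.0f || u > 1.0f) return -1.0f;
    float3 q  = cross3(s, e1);
    float v   = dot3(d, q) * inv;              // second barycentric coord
    if (v < 0.0f || u + v > 1.0f) return -1.0f;
    float t   = dot3(e2, q) * inv;
    return t > 0.0f ? t : -1.0f;
}
```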

I do not want to call surface shaders per thread, execute them and return the results. All this makes it easy, but it's not how GPUs work. Instead, leave it to the users to batch materials and to limit divergence. More work for us, but more potential for best performance.

Additionally, we need options to generate work directly on the GPU (!!!!), so all the above can be done without the inflexible requirement of command buffer generation on the CPU.
Mantle already had support for conditional command buffer execution and loops. I guess consoles have this already and it might be good enough. But it needs to be extended to async compute as well.
EDIT: ofc generating commands directly from compute would be much better.

Most of all this can be done very easily by AMD. (HW BVH traversal is no requirement either.)
 
AMD's position regarding consoles has put them in the comfortable spot of being the gatekeepers of which HW features become the industry standard for consumer games, and Nvidia can't complain because they left that space open of their own choosing.
The situation in the PC ecosystem is massively different from consoles. NVIDIA holds the majority of PC GPUs, and as such gets the lion's share of engine optimizations and game support. For nearly a decade AMD enjoyed complete dominance on consoles, yet that never translated into anything meaningful on PC. It never made them compete head to head with NVIDIA or hold back its march of new features; in fact, AMD regressed further than ever compared to NVIDIA on PC, on all fronts.

If AMD continues to not offer RTRT on their PC lineup, this will ensure that RTX remains the "de facto" standard for RT on PC for a long time: all RT-capable hardware will be NVIDIA-only, as NVIDIA will notoriously hammer their RT advantage in all of their 7nm GPUs. This will drive developers further toward developing RT for RTX only (as they do now), the common RT platform on PC. Meanwhile AMD will be treated like the newcomer to the block, getting the short end of the stick and repeating the exact same situation we have now, where AMD's only hope for the adoption of their version of RT on PC comes from whatever sticks from the console implementation as it is ported to PC. This is far from an ideal solution.

Consoles are a fixed platform for a relatively long time. PC is not; architectures change and improve rapidly on PC, meaning whatever consoles have now will not be easily transformed into whatever is on PC at a specific future time. Which is why AMD is not able to fully exploit their console optimization advantage: they always have to start optimizing on PC from a relatively "from scratch" position, as their PC arch is not really the same as the 6-year-old console arch.

AMD being late with RT means AMD repeating the same mistakes that led NVIDIA to hold all the cards with CUDA, AI and VR. This is a dangerous gamble that almost never pays off. AMD relied too much on Mantle/DX12/Vulkan to get them out of the DX11 optimization sinkhole, only for that gamble to massively fail; they ended up with still-not-up-to-snuff DX11 performance and a broken, scarce DX12 implementation.
 
AMD however could put an acceleration unit in the CPU
I do not know much about HW, but if you use the BVH on the GPU, you likely want to generate it there as well if you implement FF for it.

But a modified question would be: would it make sense to generate the BVH on the CPU? I would say yes, especially if CPU and GPU memory is shared. It would make sense to generate the top-level hierarchy on the CPU, but do the refitting of the low-level branches on the GPU, for example.
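As a sketch of that split (assuming a made-up breadth-first layout where each tree level occupies a contiguous node range): the CPU rebuilds only the small top of the tree, while the GPU refits the fat lower levels with one trivial dispatch per level, deepest first.

```cuda
// Hypothetical layout: nodes stored breadth-first, child indices into the
// same array; leaf boxes are assumed to be updated already.
struct Aabb { float3 lo, hi; };
struct Node { Aabb box; int left, right; };

__global__ void refit_level(Node* nodes, int first, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    Node& n = nodes[first + i];
    const Aabb& a = nodes[n.left].box;
    const Aabb& b = nodes[n.right].box;
    // Parent box = union of the two child boxes.
    n.box.lo = make_float3(fminf(a.lo.x, b.lo.x),
                           fminf(a.lo.y, b.lo.y),
                           fminf(a.lo.z, b.lo.z));
    n.box.hi = make_float3(fmaxf(a.hi.x, b.hi.x),
                           fmaxf(a.hi.y, b.hi.y),
                           fmaxf(a.hi.z, b.hi.z));
}
// Host: for (int lvl = deepest; lvl >= cutoff; --lvl)
//           refit_level<<<blocks(lvl), 256>>>(nodes, first[lvl], count[lvl]);
// Everything above 'cutoff' is small enough to rebuild on the CPU instead.
```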
 
A comment from Ben Archard in DF's Metro interview:

In terms of the viability of RT on next generation consoles, the hardware doesn't have to be specifically RTX cores. Those cores aren't the only thing that matters when it comes to ray tracing. They are fixed function hardware that speed up the calculations specifically relating to the BVH intersection tests. Those calculations can be done in standard compute if the compute cores are numerous and fast enough (which we believe they will be on the next gen consoles).
 
BVH tests need to be performed on the GPU if they're part of raytracing, and that should be a very parallel problem (?). However, BVH construction could be performed anywhere, and that's slow. RTX builds a black-box BVH - I guess that could be done on the CPU at the moment. Do we know where RTX performs BVH builds? As you say, the BVH could (should!) be versatile in structure too, so both parts (traversal and building) should be fast and flexible.
 
One more point is the API itself - what's wrong with DXR?

DXR has the philosophy of classical CPU raytracing, which goes like this, e.g. for path tracing:
Trace a ray, trace another ray at the intersection towards the light, continue recursively with a random reflection direction. Eventually stop the recursion by a random decision (Russian roulette). Optionally continue with two rays, one for reflection, another for refraction.
DXR makes it easy to implement this in just the same fashion as we did on CPU.
This fits both companies very well. Both try to bind users to their ecosystem, their proprietary solutions. The goal here is never best performance; it is much more about making it easy to adopt but hard to replace after that. IMHO.
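That classical formulation, as a sketch (CUDA-flavored pseudocode; Surface, RNG, trace_closest, direct_light, sample_brdf and the small vector helpers are all hypothetical):

```cuda
// One ray in, recursion at the hit point - exactly the CPU mental model
// that DXR's TraceRay / closest-hit shader pairing reproduces.
__device__ float3 path_trace(float3 o, float3 d, RNG& rng, int depth)
{
    Surface s;
    if (!trace_closest(o, d, &s))               // miss: return environment
        return sky(d);
    float3 c = direct_light(s, rng);            // shadow ray toward a light
    const float survive = 0.8f;                 // Russian roulette probability
    if (depth < MAX_DEPTH && rng.next() < survive) {
        float3 bounce = path_trace(s.pos, sample_brdf(s, rng), rng, depth + 1);
        c = add3(c, mul3(s.albedo, scale3(bounce, 1.0f / survive)));
    }
    return c;    // optionally: recurse a second time for a refraction ray
}
```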

Now if we look at how GPUs work, the above is quite the opposite of what we know to be efficient. They might implement efficiency under the hood, they might not, but they miss a lot of options only the user could utilize.
On GPU you implement the above algorithm more likely like this: Start rays from all pixels, segment them by distance to select LOD, trace the segments closest to the origin first. After ALL rays have found intersections, process all intersections in parallel and remove the terminated rays. Then continue. The difference is that you do each step for ALL rays (or at least a large batch), not a single one. And very likely you do not even care for refractions, because there is more important stuff on the plate, and you do not care for paths of random length, because you use faster surface caching if you want multiple bounces, to get them for free.
So the flexibility offered by DXR is neither necessary nor welcome if you care for performance and have realistic expectations for games. It is helpful only for noobs (or offline, ofc!) IMHO.
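The batched structure described above, sketched as a host loop over hypothetical kernels (extend_rays, shade_hits, compact_rays):

```cuda
// Wavefront style: every stage runs over ALL live rays at once, and
// terminated rays are stream-compacted away between bounces.
int n_live = n_pixels;                       // one primary ray per pixel
for (int bounce = 0; bounce < MAX_BOUNCES && n_live > 0; ++bounce) {
    int blocks = (n_live + 255) / 256;
    extend_rays<<<blocks, 256>>>(rays, hits, n_live);            // all rays vs. scene
    shade_hits <<<blocks, 256>>>(rays, hits, radiance, n_live);  // all hits at once
    n_live = compact_rays(rays, n_live);     // drop terminated rays
}
// In a real renderer the compacted count would feed an indirect dispatch so
// the CPU never reads it back - which is exactly the "generate work on GPU"
// requirement from earlier in the thread.
```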
 
So my point was, RTX is not the only way to get RTRT out there.

True.

Someone might come out with a fast software version (on very fast compute hardware) and I’m sure at some point AMD will have their method of DXR acceleration. Or we might just end up having a hybrid of both. What’s exciting is that there is potential out there to do things in different ways and we’re all here for it. Waiting. Patiently.

Very fast compute HW, yes. We will see what the future brings us. AMD should have their design ready if it's going to be in a 2020/21 console.

I had a 280X and it was more than twice as fast as a GTX Titan

Ok, that's quite a bit faster. I had a 7970 in 2012 and it was rather fast, coming close to a 680 sometimes, at least equal to a 670. In non-RT/compute things at least :)

So how can you present your assumptions, drawn only from observation and listening to marketing blah, as facts?

I'm looking at facts, things that happen. Right now a 3000-dollar GPU gets beaten by a 2080 or even a 2070/60 in RT apps. I don't care for marketing or promises; results do give me impressions.
Those RT cores seem to have an advantage right now. As for what the future will do, as another said: "patiently waiting".

You want it all

Who doesn't :) I do understand, but I don't think lower than (reconstructed) 4K is going to be a thing. FPS I don't know, but going by YT/forums etc. people seem to be screaming for 60fps. A rather large number also expect RT.
But I agree.

Just because some rich kids are stupid enough to pay 3k for a GPU, this does not mean that GPU is any faster than the regular consumer top model, and you know this.

Titan V is a monster, even without RT. Also, not everyone is stupid per se just because they bought it.

A comment from Ben Archard in DF's Metro interview:

Posted a couple of times before :p
We know RT can be done on any GPU; you're giving up performance somewhere, though.
 
BVH tests need to be performed on the GPU if they're part of raytracing, and that should be a very parallel problem (?). However, BVH construction could be performed anywhere, and that's slow
Hmm... one could also say the opposite: tree construction is more parallel-friendly than traversal! I really see it this way. Ofc it always depends...
 
I'm looking at facts, things that happen. Right now a 3000-dollar GPU gets beaten by a 2080 or even a 2070/60 in RT apps.
The problem is that Volta did not have a consumer counterpart. If it had, the Ti model would have been almost equally fast as usual, and you could not use the price as an argument.
 
If tree construction is parallel, what GPU considerations would be best for accelerating that? What tweaks can be made to the compute units to improve BVH construction and intersection tests without needing FF HW?
 
If tree construction is parallel, what GPU considerations would be best for accelerating that? What tweaks can be made to the compute units to improve BVH construction and intersection tests without needing FF HW?
It's one of those countless cases where all you need is work generation on the GPU.

Here's an example algorithm, assuming each node has 8 children like an octree:

1. Compute the bounding box over the entire scene and make it the root node.
2. Parallel for each object: determine its child octant by relating the object center to the bbox center. Increment a counter for each octant. (See the sketch after this list.)
After that you know which child needs to store how many objects. Then create the child nodes and sort the objects to the children, so each child has its objects in linear memory.
3. Continue with step 2 until some criterion like max object count or max tree levels is met.
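Step 2 as a single CUDA dispatch (all buffer names made up): each thread classifies one object against its node's center and bumps a per-octant counter with an atomic.

```cuda
// One thread per object: pick the octant, remember it, count it.
__global__ void count_octants(const float3* obj_center, const int* obj_node,
                              const float3* node_center, int* octant_count,
                              unsigned char* obj_octant, int n_objects)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_objects) return;
    float3 c = obj_center[i];
    float3 m = node_center[obj_node[i]];
    int oct = (c.x > m.x) | ((c.y > m.y) << 1) | ((c.z > m.z) << 2);
    obj_octant[i] = (unsigned char)oct;
    atomicAdd(&octant_count[obj_node[i] * 8 + oct], 1);
}
// A prefix sum over octant_count then yields each child's offset, and a
// second scatter pass gives every child its objects in linear memory.
```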

The problem is that at the very start you have only one node, later 8, then 64 etc. This is not enough work to saturate a GPU, but because the results of the previous step are the input of the next, you need to insert a barrier, which stalls the whole queue.
The only option to keep the GPU saturated is async compute with other, unrelated tasks.

Also, you do not know when the process terminates - maybe after 5 steps all is done? But you need to prerecord commands for all tree levels, maybe 20, and execute all the useless barriers for nothing every frame.

If the GPU can create its own work on demand and handle sync itself, the win would be very big.
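On NVIDIA something like this already exists in compute as dynamic parallelism (device-side kernel launch; needs relocatable device code, i.e. -rdc=true). A sketch of the build loop driven entirely from the GPU, where the last block of each level launches the next one:

```cuda
constexpr int MAX_LEVELS = 20;      // vs. prerecording all 20 levels per frame

__device__ unsigned int blocks_done = 0;   // wraps back to 0 after each level

__global__ void build_level(int level, int n_nodes)
{
    // ... classify and sort objects for this level's nodes (step 2) ...

    __syncthreads();
    __threadfence();                // make this block's writes visible
    if (threadIdx.x == 0) {
        // atomicInc wraps at gridDim.x, so the counter resets itself; the
        // last block to arrive launches the next level straight from the GPU.
        if (atomicInc(&blocks_done, gridDim.x - 1) == gridDim.x - 1) {
            int next = n_nodes * 8; // hypothetical: assume every node split
            if (level + 1 < MAX_LEVELS && next > 0)
                build_level<<<(next + 255) / 256, 256>>>(level + 1, next);
        }
    }
}
// No CPU round trip, no fixed-depth command buffer, and when level 5 turns
// out to be the last one the recursion simply stops launching.
```

The graphics APIs only offer weaker forms of this (indirect dispatch with GPU-written arguments), which is exactly the gap being complained about here.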


Another option for BVH construction is conservative rasterization. So, like with VCT, you rasterize all triangles to bin them to a uniform grid, and you build a BVH from that.
But we already have FF HW for this; the remaining problem is again the same as mentioned above.


Note that a BVH is not necessary at all; you can use a uniform grid as well (also multiple levels). But you need to know your bounds and density, so a uniform grid is too restricted for a general-purpose solution / API. For games the simple grid can very likely be the fastest solution.
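For contrast, here is why the simple grid is so attractive: traversal is just a 3D DDA (Amanatides & Woo), a few compares and adds per visited cell and no stack at all. A sketch that assumes the ray origin is already inside the grid and leans on IEEE infinities for axis-parallel rays; visit_cell is hypothetical:

```cuda
__device__ void march_grid(float3 o, float3 d, float3 gmin, float cell, int3 dims)
{
    // Start cell and step direction per axis.
    int ix = (int)floorf((o.x - gmin.x) / cell);
    int iy = (int)floorf((o.y - gmin.y) / cell);
    int iz = (int)floorf((o.z - gmin.z) / cell);
    int sx = d.x > 0.0f ? 1 : -1, sy = d.y > 0.0f ? 1 : -1, sz = d.z > 0.0f ? 1 : -1;
    // Ray distance between two planes on each axis, and to the first plane.
    float dtx = fabsf(cell / d.x), dty = fabsf(cell / d.y), dtz = fabsf(cell / d.z);
    float tx = (gmin.x + (ix + (sx > 0)) * cell - o.x) / d.x;
    float ty = (gmin.y + (iy + (sy > 0)) * cell - o.y) / d.y;
    float tz = (gmin.z + (iz + (sz > 0)) * cell - o.z) / d.z;
    while (ix >= 0 && iy >= 0 && iz >= 0 &&
           ix < dims.x && iy < dims.y && iz < dims.z) {
        // visit_cell(ix, iy, iz);   // hypothetical: test this cell's primitives
        if (tx < ty && tx < tz) { ix += sx; tx += dtx; }
        else if (ty < tz)       { iy += sy; ty += dty; }
        else                    { iz += sz; tz += dtz; }
    }
}
```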


Of course you can make FF to build trees; the example above is maybe the simplest way to do it. There is endless research in RT on which tree structure is fastest. But if the scene is dynamic, it's usually better to build the tree quickly than to build a high-quality one.
So using FF you could build a better tree and still be faster. But it's very hard to say what a good tree is for any kind of scene.
(Personally I would not want BVH hardware.)
 
We're not the ones thinking AMD/consoles are "gonna change the world"
I don't think consoles will change the world. I just think the world of gaming will not change until consoles have caught up. That's how it has been for the last 20 years.
 
The situation in the PC ecosystem is massively different from consoles. NVIDIA holds the majority of PC GPUs, and as such gets the lion's share of engine optimizations and game support. For nearly a decade AMD enjoyed complete dominance on consoles, yet that never translated into anything meaningful on PC. It never made them compete head to head with NVIDIA or hold back its march of new features; in fact, AMD regressed further than ever compared to NVIDIA on PC, on all fronts.

We are arguing different things. I never claimed AMD's presence in consoles would magically make it sell more on PC. Anyone who follows the real-time graphics scene knows that hasn't been the case for them.
I do claim, though, that consoles are the baseline around which almost everything that ends up on PC is designed. If consoles don't have HW RT features, HW RT will continue to be a bolted-on afterthought implemented and financed mostly by Nvidia, just like it is now.


If AMD continues to not offer RTRT on their PC lineup, this will ensure that RTX remains the "de facto" standard for RT on PC for a long time: all RT-capable hardware will be NVIDIA-only, as NVIDIA will notoriously hammer their RT advantage in all of their 7nm GPUs. This will drive developers further toward developing RT for RTX only (as they do now), the common RT platform on PC.

No. If consoles don't have HW acceleration for RT, RTX will remain the "de facto" niche afterthought, just like AMD's HW tessellation was until consoles actually adopted it. By that time, Nvidia had come in late to the party and brute-forced a solution that outperformed AMD's, despite AMD's years more experience.
The difference being, unlike Nvidia, AMD didn't have the resources to send engineers to every major studio to build tessellation patches for their PC ports to use as a marketing tool later. Nvidia actually did do that once THEY implemented tessellation on their cards, because they had the money to do it. Still, even now that modern consoles do support tessellation, it is not a generation-defining feature. Some games use it here and there, some use it more extensively, and the majority do not use it at all. Such could be the future of bolted-on RT HW on next-gen consoles.

AMD being late with RT means AMD repeating the same mistakes that led NVIDIA to hold all the cards with CUDA, AI and VR. This is a dangerous gamble that almost never pays off. AMD relied too much on Mantle/DX12/Vulkan to get them out of the DX11 optimization sinkhole, only for that gamble to massively fail; they ended up with still-not-up-to-snuff DX11 performance and a broken, scarce DX12 implementation.

CUDA is only still a thing in professional software. It is irrelevant for consumer games, which is what I'm discussing. For games, it only ever was a thing in the same fashion RTX is now: bolted-on patches built mostly by Nvidia themselves. Games really only started using GPU compute effectively once it became a ubiquitous GPU feature, which was highly determined by its presence on those pesky consoles and their talking dog.
 
I don't think that raytracing needs consoles for adoption. Unreal Engine 4 supports "high level" effects directly in the engine.
It supports dozens of things no game ever utilizes, or if it does, it is as an inefficient ultra-settings feature that changes nothing substantial about the gaming landscape.
 
It supports raytracing as an engine feature. No extra development time is needed.
It's not like you can just click "raytrace" and be done with it; of course it requires extra development time, no matter if it's in the engine already or not.
 
It supports raytracing as an engine feature. No extra development time is needed.
What does it take to add raytracing then? Let's say someone has a UE4 game that's been in development a couple of years and is due to release later this year. What does it take to add RTX support to that game?

Edit: Seems you can just switch it on. ;)

Only shadows and reflections though?

 
I really should have gone back an extra page before posting, shouldn't I? :oops:

Hehe, everyone does it sometimes; we can't see everything. I was just pointing out people have seen it a couple of times :)

The problem is that Volta did not have a consumer counterpart. If it had, the Ti model would have been almost equally fast as usual, and you could not use the price as an argument.

The 2080 Ti is quite comparable to a Titan V in normal rasterization; it's with RT-enabled games that the difference becomes huge.
[Benchmark chart: Rise of the Tomb Raider FPS, 2560x1440, DX12, SSAA, Very High]


The Titan V has many more CUDA cores (5120), a 3072-bit bus, more ROPs and more RAM. It shouldn't be slower than a 2080 in Q2 or BFV etc. Those RT cores are helping out for sure.

I don't think consoles will change the world. I just think the world of gaming will not change until consoles have caught up.

Well, RT has finally made it to gaming before any console did.

That's how it has been for the last 20 years.

No, that's not how it has been, not tech- or hardware-wise at least. When the PS2/Xbox/GC were around, PC gaming saw titles like HL2, Doom 3 and Far Cry (their Xbox variants weren't even close), with far more advanced graphics than console games. Things were moving fast. With 7th gen, Crysis came around the corner; it was by far the most advanced game of its time. With the arrival of DX9 games sure did change, and the same for DX10. It's not exactly like the world waited for consoles.

I don't think that raytracing needs consoles for adoption. Unreal Engine 4 supports "high level" effects directly in the engine.

Exactly; consoles haven't been holding back anything, to be fair. RT has come now even though the PS4/One don't have it, and hell, we don't even know what the PS5 will be doing. We have MS; many seem to forget that? They are in both the console and PC space, and anything Xbox lands on PC. Their next Halo is supposedly being developed with PC in mind, so we might see some interesting things there. Xbox and Windows are more and more becoming a united platform; I think that's one of Microsoft's strengths.

If consoles don't have HW RT features, HW RT will continue to be a bolted-on afterthought implemented and financed mostly by Nvidia, just like it is now.

You don't know that. Besides, MS has a bigger chance of a DXR implementation than Sony, and they share with the PC platform. Perhaps MS is going to support DXR on PC as an extra feature over Xbox, or for the X and Windows? All wild guesswork, just like yours.

No. If consoles don't have HW acceleration for RT, RTX will remain the "de facto" niche afterthought

IMO, it's not niche already, now that rather many games are going to support it (if that list is correct/true). For such a new tech, I'm quite surprised how many it is: BFV, Tomb Raider, Atomic Heart, Metro Exodus etc.
https://www.forbes.com/sites/marcoc...mes-that-will-support-nvidias-rtx-technology/

Then we have things like Quake 2 RT, which could land more such games. With the launch of the 2060, the install base has a bigger chance of growing than with just the 2070 and up.

A 2060 can perform very decently with RT; it gives you the ability to enjoy RT if one wants, and quite many titles support it, as per the link I showed. Just Q2 RT alone I think is very impressive.


 