Current console API is better, not HW.
Ok, don't think anyone is arguing with you there.
Current console API is better, not HW.
More programmable than the "normal" cores? So a second fully programmable co-processor just for raytracing?
The question is what's the cost of this flexibility? Would you be okay with RT performance dropping 2-3 times universally just so some (not all) engines be able to use h/w RT in a "nice" way (as in instead of relying on hacks like all graphics do)? What would that give us? Nanite native mesh RT at performance lower than that of compute based Lumen?
I also don't understand the discussion. Are people really against offering more programmable hardware? I see the consoles as having more flexible APIs in the RT space, even if the differences are fairly small. Like if DXR1.2 comes out and allows developers to write some kind of custom traversal, are people going to argue against it? Like, asking people to prove programmability is good is kind of weird.
It makes sense to go with "fixed function" hardware at first (NVIDIA's RT performance advantage over AMD shows that current "compute units" do not have sufficient power in comparison) and then migrate it over when the "compute units" are capable enough.
But that's always been the case. Consoles allow you to spend dev cycles programming to the metal because there's only one or a handful of hardware platforms to support. With PCs you need abstraction, which is a boon for many but a bane for ninja devs. Always has been, always will be.
Ok, don't think anyone is arguing with you there.
Don't understand this. You cannot make "general compute units" more capable for raytracing. Professional ISVs have used GPUs for raytracing for years. NVIDIA has a pure software solution with OptiX. And yet nearly everyone has adopted hardware accelerated raytracing.
Obviously "new" cores or compute units are necessary for hardware accelerated raytracing.
No, in context of open BVH and Nanite that's not the question, because it does not come up:
The question is what's the cost of this flexibility? Would you be okay with RT performance dropping 2-3 times universally just so some (not all) engines be able to use h/w RT in a "nice" way (as in instead of relying on hacks like all graphics do)? What would that give us? Nanite native mesh RT at performance lower than that of compute based Lumen?
I think @JoeJ 's complaint is that DXR's abstraction level is even higher than usual. I could see that, it's wrapping a fairly complex set of hardware primitives. The API will evolve with time (as will the hardware), but it'll never reach console-level control.
Yes. But that's solvable, e.g. using abstracted shader language structures and/or functions to access nodes and set child pointers, and then running some post process for vendor compression.
The irony is that in order to give developers flexibility you have to limit the flexibility of the hardware implementation by mandating a specific acceleration structure. This would probably also mean mandating how compression of that structure works. This is no different to mandating the structure of a triangle strip or a texture compression format. The only difference is that triangle strips have an obvious “best” representation. This isn’t the case for RT acceleration structures so Microsoft decided to punt. Or the IHVs demanded control.
We only need to modify the BVH data to update partial mesh clusters to switch geometry resolution. The result is again static and regular triangle meshes.
Yes. But that's solvable, e.g. using abstracted shader language structures and/or functions to access nodes and set child pointers, and then running some post process for vendor compression.
Such a post process could even handle conversion from BVH4 to BVH8, for example, to make it really easy for the devs.
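To make the idea concrete, here is a rough C++ sketch of what such an abstracted node interface and vendor post process could look like. Everything in it is hypothetical and just for illustration (AbstractNode, SetChild, VendorNode8 and CompileForVendor are invented names, not any existing DXR or driver API): the developer edits a generic BVH4 description through small helpers, and a separate pass relays it out into a wider vendor format, where a real post process would also collapse tree levels for a true BVH4 to BVH8 conversion and apply compression.
```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical, vendor-agnostic BVH4 node a developer could edit directly.
// Leaf children index into the mesh's triangle/cluster list instead of nodes.
struct AbstractNode
{
    std::array<float, 3>   boundsMin[4];
    std::array<float, 3>   boundsMax[4];
    std::array<int32_t, 4> child{ -1, -1, -1, -1 };  // -1 = empty slot
    bool isLeaf = false;
};

struct AbstractBVH
{
    std::vector<AbstractNode> nodes;  // node 0 is the root
};

// The kind of helper the shader/API layer could expose: relink one child slot
// of a node (e.g. to point a parent at a different treelet) and set its bounds.
void SetChild(AbstractBVH& bvh, uint32_t parent, int slot, int32_t newChild,
              const std::array<float, 3>& lo, const std::array<float, 3>& hi)
{
    AbstractNode& n   = bvh.nodes[parent];
    n.child[slot]     = newChild;
    n.boundsMin[slot] = lo;
    n.boundsMax[slot] = hi;
}

// Hypothetical vendor-internal wide node (e.g. a BVH8-style machine format).
struct VendorNode8
{
    std::array<float, 3>   boundsMin[8];
    std::array<float, 3>   boundsMax[8];
    std::array<int32_t, 8> child;
    bool isLeaf = false;
};

// Sketch of the driver-side "post process": relayout the generic BVH4 into the
// vendor's wide format. A real pass would also collapse tree levels (actual
// BVH4 -> BVH8 conversion) and compress the nodes; both are omitted here.
std::vector<VendorNode8> CompileForVendor(const AbstractBVH& src)
{
    std::vector<VendorNode8> out;
    out.reserve(src.nodes.size());
    for (const AbstractNode& n : src.nodes)
    {
        VendorNode8 w{};
        w.child.fill(-1);
        w.isLeaf = n.isLeaf;
        for (int i = 0; i < 4; ++i)
        {
            w.child[i]     = n.child[i];
            w.boundsMin[i] = n.boundsMin[i];
            w.boundsMax[i] = n.boundsMax[i];
        }
        out.push_back(w);
    }
    return out;
}
```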
Though, personally i think this would end up again compromising performance or having limitations still, not sure.
Seems better to get started with vendor extensions, and make a DXR API after it turns out what the differences, practices, problems, etc. are.
We see on console, where there is only one vendor, that it's quite easy. And treating each vendor specifically seems easier than forcing conventions on all of them just yet.
Yeah, in theory. But then we have much too many BLASes, and building huge TLAS every frame takes too long.
If we treat each cluster as its own BLAS then we can accomplish LOD today with DXR. Just delete/rebuild the BLAS as needed.
The data structure we need is a node, its child pointers / triangle indices. The developer knows which node refers to which patch of triangles by linking this to his own tree he uses to select LOD.
What data structure would allow you to do this? Unless you treat each cluster as its own BLAS there is no practical way for a developer to reference a specific node or treelet within the BVH. How would they even know where to look for the geometry they want to modify?
The question is what's the cost of this flexibility? Would you be okay with RT performance dropping 2-3 times universally just so some (not all) engines be able to use h/w RT in a "nice" way (as in instead of relying on hacks like all graphics do)? What would that give us? Nanite native mesh RT at performance lower than that of compute based Lumen?
There's a reason why current APIs are limited. This reason is performance. Full flexibility gives you general compute, good luck using it for per pixel RT.
Console APIs are not better either, since if that were the case we would have seen the advantages they provide already.
Yeah, in theory. But then we have much too many BLASes, and building huge TLAS every frame takes too long.
That's bad, we are still talking about static geometry, so we should be able to keep the node count for the TLAS as small as it is.
Now we could maybe add some more levels of indirection here, like TL, BL0, BL1, BL2. Maybe this is what Karis has in mind from his Twitter posts. IDK, my own proposals here are just my personal visions and may differ from Epic's ideas.
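For reference, the brute force "cluster = BLAS" workaround being discussed would look roughly like the sketch below. BuildBLAS and BuildTLAS are hypothetical stand-ins for the real ID3D12GraphicsCommandList4::BuildRaytracingAccelerationStructure plumbing, and the other types are invented for illustration; the point is only that the instance count equals the cluster count, and the whole TLAS gets rebuilt whenever any cluster changes LOD.
```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical engine-side types, invented for illustration.
struct Cluster    { uint32_t id; /* per-LOD vertex/index buffers, etc. */ };
struct BlasHandle { uint64_t gpuVA = 0; };
struct TlasHandle { uint64_t gpuVA = 0; };
struct Instance   { BlasHandle blas; float transform[12] = {}; };

// Stand-ins for recording the real D3D12 acceleration structure builds
// (ID3D12GraphicsCommandList4::BuildRaytracingAccelerationStructure).
BlasHandle BuildBLAS(const Cluster&, int /*lod*/)  { return {}; }
TlasHandle BuildTLAS(const std::vector<Instance>&) { return {}; }

// Per-frame ray tracing scene update when every cluster is its own BLAS.
TlasHandle UpdateRaytracingScene(const std::vector<Cluster>& clusters,
                                 const std::vector<int>& selectedLod,  // from the engine's LOD logic
                                 std::unordered_map<uint64_t, BlasHandle>& blasCache)
{
    std::vector<Instance> instances;
    instances.reserve(clusters.size());  // one TLAS instance per cluster!

    for (size_t i = 0; i < clusters.size(); ++i)
    {
        // Key = (cluster id, chosen LOD); build the BLAS only if it is missing.
        const uint64_t key = (uint64_t(clusters[i].id) << 8) | uint64_t(selectedLod[i]);
        auto it = blasCache.find(key);
        if (it == blasCache.end())
            it = blasCache.emplace(key, BuildBLAS(clusters[i], selectedLod[i])).first;

        Instance inst{};
        inst.blas = it->second;
        inst.transform[0] = inst.transform[5] = inst.transform[10] = 1.0f;  // identity 3x4
        instances.push_back(inst);
    }

    // The entire top level is rebuilt every frame over (cluster count) instances,
    // which is exactly the BLAS-count / TLAS-build cost concern raised above.
    return BuildTLAS(instances);
}
```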
The data structure we need is a node, its child pointers / triangle indices. The developer knows which node refers to which patch of triangles by linking this to his own tree he uses to select LOD.
(Notice this means we still have duplicated AS data this way: ours and RT BVH. A traversal shader in a flexible future could eventually work using just one, totally custom AS. But that's far fetched, and not sure if we ever want this.)
Edit: Ofc. we also need to set bounding boxes per node, and most difficult: we need to generate / delete nodes, involving memory management and compaction problems, and possibly memory orderings expected by the HW. Depending on HW, this can be quite a big problem.
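And a sketch of the in-place alternative, again with entirely hypothetical structures (no shipping API exposes this today): the engine keeps a mapping from its own cluster/LOD tree to BVH node indices, swaps a cluster's LOD by relinking one child pointer to a pre-built treelet, and recycles the old treelet's nodes through a free list, which is exactly where the memory management / compaction (and HW memory layout) headaches come in.
```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical editable BVH node: per-child bounds plus child pointers.
// Negative child values could encode triangle-range leaves; >= 0 means node index.
struct Node
{
    float   boundsMin[4][3] = {};
    float   boundsMax[4][3] = {};
    int32_t child[4] = { -1, -1, -1, -1 };
};

struct EditableBVH
{
    std::vector<Node>     nodes;
    std::vector<uint32_t> freeList;  // recycled node slots -> fragmentation/compaction issue
    // Engine-side bookkeeping: (clusterId << 8 | lod) -> root node of that
    // cluster's treelet, so the developer knows which node to patch.
    std::unordered_map<uint64_t, uint32_t> clusterRoot;
};

// Swap one cluster to another LOD by relinking a single child pointer.
// 'parentNode'/'slot' come from the engine's own cluster tree; 'newRoot' is the
// root of a treelet that was already built for the target LOD.
void SwapClusterLod(EditableBVH& bvh, uint32_t parentNode, int slot,
                    uint32_t oldRoot, uint32_t newRoot,
                    const float newLo[3], const float newHi[3])
{
    Node& parent = bvh.nodes[parentNode];
    parent.child[slot] = int32_t(newRoot);
    for (int k = 0; k < 3; ++k)
    {
        parent.boundsMin[slot][k] = newLo[k];  // ancestor bounds may now need a
        parent.boundsMax[slot][k] = newHi[k];  // refit pass further up the tree
    }

    // Recycle the old treelet's nodes. In a real system this walk, the free
    // list, eventual compaction and whatever memory ordering the HW expects
    // are the hard part, not the pointer swap itself.
    std::vector<uint32_t> stack{ oldRoot };
    while (!stack.empty())
    {
        const uint32_t n = stack.back();
        stack.pop_back();
        for (int i = 0; i < 4; ++i)
            if (bvh.nodes[n].child[i] >= 0)  // skip triangle leaves / empty slots
                stack.push_back(uint32_t(bvh.nodes[n].child[i]));
        bvh.freeList.push_back(n);
    }
}
```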
Thus i want vendor extensions first, to be sure.
If an IHV chooses to go their own way they will have to pay the cost of constantly converting back and forth from Microsoft’s data structure. Not worth it.
You have to understand the politics of the situation: only NVIDIA seems to regard RT highly, and they think their current approach is the optimal one.
Thus i want vendor extensions first, to be sure.
Going back two decades, we had two generations of fixed function hardware TnL, before some more configurability was introduced and full programmability came even later. Maybe it will be the same with RT. A few generations with limited functionality just to get developers accustomed to and establish best practices all the while building an installed hw base. Then expanding RT into more flexible approaches while at the same time the compute part might be fast enough so we can do away with hybrid approaches.
I also don't understand the discussion. Are people really against offering more programmable hardware? I see the consoles as having more flexible APIs in the RT space, even if the differences are fairly small. Like if DXR1.2 comes out and allows developers to write some kind of custom traversal, are people going to argue against it? Like, asking people to prove programmability is good is kind of weird.
Current console API is better, not HW. Console perf. being less than high end PC is nothing new and can be scaled as usual. It's not relevant to the API topic at all.
i'm out of this for a while.
I don't think people are against customization at the lowest possible levels. I think the debate is around the perspective of what needs to arrive first: faster generic speed at the cost of customization, or slower generic speed with customization.
As many have stated earlier, hardware TnL is how we started before moving to compute. I mean admittedly, a lot of games still leverage the 3D pipeline even though compute exists. So there is some benefit (to teams) to having a faster fixed function pipeline (as an option) I suspect, even though I believe in fully compute based engines. And in the same path, I believe RT does in fact need to go in the direction of generic compute over time. I think the debate is whether it needs to move there at the starting line, or to go there over time. As @JoeJ ends here, we took our learnings already from compute so why take the step back. I definitely see the argument from both sides, and imo there is no clear winner here that I can see, perhaps it just comes down to preference.
Perhaps it's not for us to debate; this is probably something the IHVs need to chime in on. Nvidia knows precisely why they went one route, AMD another, and MS is attempting to create an API that supports all IHVs. I wish Max McCullen was still here, perhaps he could chime in.
Imagining the hypothetical scenario where you could install and run full Windows 10 on a PS5 or Series X, I doubt that the console APIs offer less performance than DXR for exactly the same hardware. Whatever additional flexibility the console APIs have probably does not come with any overall performance cost. So I don't really know what people are arguing. We have the hardware. I don't know if there are features that are not exposed in DXR. I'm not exactly sure what extra options you have for manipulating the ray tracing data in the console APIs. I just think it's kind of plainly true that over time the DXR api will expose more access to hardware features and more opportunities to manipulate the data. This seems non controversial. I'm not talking about weird hypothetical scenarios like what if DXR never existed and we could have a do-over and redesign the ray tracing hardware to work differently and be a generalized compute function.
I'm far from optimistic.
If you think they are going to waste time on creating custom extensions on PC for RDNA2 after all of that, then I would say you are unrealistically optimistic.
hehe, ok - maybe i'm more optimistic than i admit
No you're not