Next Generation Hardware Speculation with a Technical Spin [2018]

Just some questions.

Has Nvidia ever done "deals", whether shady, legal, etc., to incentivize the adoption of certain tech like PhysX?

Could they also be doing this with RT?

Is it common practice within the industry?
Of course it's common practice, but there's nothing shady about paying (either in money, engineer man-hours, or both) for the adoption of certain technologies.
 
VR requires a decent GPU + headset. The inconvenience of the headset massively limits adoption.
RT requires only a decent GPU, which means massively wider adoption as the tech claws its way through lower GPU tiers in its second iteration.

ie, VR is not the same as RT.

RT is in the same position DX11 was when it was new. Only decent GPUs were useful for DX11 games. Then lower-end GPUs started getting better and better. Now they do DX11 well. DX11 games are now the standard in the industry.

And we are not talking about RT-only games here, we are talking normal games + RT extras. That doesn't block RT from reaching a wider audience at all; in fact it massively increases its chance of being adopted.

I’m on the fence with RT on next gen, as per previous posts, but I would have to agree with this.

RT - VR/3D is not a very good analogy, as it’s just not very logical.

We are all working under the assumption that we will buy next gen consoles. And that they will work on the TV we already have, or will buy at some point or another if we upgrade.

Under that assumption, RT might or might not be a feature in the GPU. That’s that really.

With VR, that assumption breaks. Simply because we are no longer assuming that we would buy new consoles (to be played on our TVs), but ALSO a VR headset on top of that, and in my case also the bloody camera and Move controllers, cause I’m stupid like that.

It’s just not a great analogy.
 
True.
Flexible units are slower versus dedicated hardware, but if the algorithms they run are faster, it's a win. This is why GPUs have moved to compute rather than pushing a certain number of techniques even faster, and it means we can have games like Dreams using SDFs and tracing rays, which would have been impossible if shaders had remained fixed, looking only at shading vertices and pixels.
Ray tracing against triangles might not be the fastest algorithm, but triangles are used and work everywhere; SDFs are not. Ease of adoption is just as important as speed at the beginning. You'll still have compute units to do constrained experimental things; they're not going anywhere.
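
For a flavour of what "tracing rays through an SDF" looks like on plain compute, here's a minimal sphere-tracing sketch (purely illustrative, with one hard-coded sphere; nothing like Dreams' actual renderer):

```python
# Minimal sphere tracing: march a ray through a signed distance field by
# repeatedly stepping the distance the SDF guarantees is free of geometry.
# (Illustrative only; a single sphere stands in for a real scene.)
import math

def sdf_sphere(p, center=(0.0, 0.0, 5.0), radius=1.0):
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4):
    """Return the hit point, or None if the ray escapes the scene."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)
        if d < eps:
            return p      # close enough to the surface: a hit
        t += d            # safe step: nothing can be closer than d
        if t > 100.0:
            break
    return None

# Ray straight down +z hits the sphere at roughly z = 4.
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sdf_sphere))
```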
 
SDFs are just an example of a technology nobody would have predicted when designing the current-gen consoles, enabled by them being flexible. We've no way of knowing what might be developed over the coming years, which is where flexibility comes in. That's the basis for wanting a software rasteriser, and increased flexibility in shaders and compute.

Look at XB360 and its smart eDRAM. That was great for MSAA on tiled rendering, but games started to move towards deferred rendering with fat buffers, and that design was constrained to a particular philosophy. Or look at the Amiga, with its great 2D performance in custom hardware, which meant 3D graphics were slow and inefficient. Building hardware around a specific software solution generally does not work well long-term when software ideas move on. You want to give devs the best hardware 'tools' and options. That's not saying RT hardware isn't one of those tools, but it's saying it might not be. It might have a place, or it might be a mistake to commit to it.
 
SDFs are just an example of a technology nobody would have predicted when designing the current-gen consoles, enabled by them being flexible. We've no way of knowing what might be developed over the coming years, which is where flexibility comes in. That's the basis for wanting a software rasteriser, and increased flexibility in shaders and compute.

Look at XB360 and its smart eDRAM. That was great for MSAA on tiled rendering, but games started to move towards deferred rendering with fat buffers, and that design was constrained to a particular philosophy. Or look at the Amiga, with its great 2D performance in custom hardware, which meant 3D graphics were slow and inefficient. Building hardware around a specific software solution generally does not work well long-term when software ideas move on. You want to give devs the best hardware 'tools' and options. That's not saying RT hardware isn't one of those tools, but it's saying it might not be. It might have a place, or it might be a mistake to commit to it.
A technology nobody would have predicted and pretty much nobody uses.

You've said it before: we should base our predictions on things that we know exist instead of unknown hypotheticals. That's even more true for console hardware designers. Especially in that world, do you think they'd favor flexibility for a handful of games over speed for the majority? I think not.
 
I think it's increasingly clear that Ray Tracing, or DXR, was poorly marketed, and the understanding of how it works is incomplete in most circles.

In this post I will attempt to reset the baseline of understanding for our discussions here, starting with Machine Learning, discussing DXR, and finally how the solution comes together. Although I may (**will**, but not willfully ;)) post some inaccurate statements that will be corrected by more knowledgeable members, the big picture of Ray Tracing in graphics is the message I want to get across, and hopefully I’m able to change your opinion of what to expect in Ray Tracing, and be able to shed light on why graphics hardware and solutions will undergo a dramatic change in direction from our standard rasterization path.

To begin, before we talk about Ray Tracing, we really need to start at Machine Learning and Neural Networks. Without getting into a deep discussion, Neural Networks and ML are solutions that absolutely excel at solving computationally exhausting problems through accurate approximation and trained modelling. TLDR; they can solve problems that would take an enormous or impossible amount of computation with a fraction of the resources and time. To put this into perspective, the ideal problems to point AI at are the ones that are computationally impossible to brute force. Google challenged themselves to create the best possible AI for the game Go. What makes Go special is its scale: there are roughly 2.08 × 10^170 legal board positions, vastly more than the ~10^80 atoms in the observable universe, and the number of possible games is larger still. So the idea of brute forcing an AI is out. Through neural networks, Google built AlphaGo, a setup combining policy and value networks, which went on to beat top Go professional Lee Sedol 4 games to 1 (there is a documentary on this called AlphaGo on Netflix, worth watching). Its successor, AlphaZero, was then pitted against the top chess engine, Stockfish 8. After roughly 4 hours of training itself at chess, it beat Stockfish 8 over a 100-game match (28 wins, 72 draws, and zero losses).

What’s of importance in this comparison is the following:

* AlphaZero evaluates roughly 80,000 moves per second
* Stockfish 8 evaluates roughly 70 million moves per second

With each bout that AlphaZero plays against Stockfish 8, the gap between wins and draws widens. Stockfish 8 has yet to beat AlphaZero, despite its ability to evaluate orders of magnitude more moves per second.

There are all sorts of other big success stories for neural networks and AI, mainly computer vision and self-driving cars, but we'll leave those for you to research. One thing to recognize is that the bulk of these ML solutions have been run on GPUs and CPUs. Self-driving, for instance, has been done on a Tegra K1/X1 in Tesla vehicles.

The history here is important to note, because when we discuss things that are computationally impossible to calculate, neural networks naturally become a solution to the problem. When we look at things like 4K resolution, 8K resolution and Ray Tracing, all of these share the fundamental problem of being computationally intensive to the point of inefficiency, or currently out of reach. For instance, a completely path traced image at 4K is just not feasible in real time. But there are also some other items that have been out of our reach, in particular fluid dynamics, tessellation, cloth physics, destruction, etc. All problems that can be solved using neural networks to approximate the output.
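
To make "approximate the output instead of computing it exactly" concrete, here's a toy sketch (purely illustrative; the target function and network size are made up): sample an expensive function offline, train a small network on those samples, and at runtime the evaluation becomes a cheap fixed-cost forward pass.

```python
# Toy illustration: train a small neural network to approximate an
# "expensive" function, so runtime evaluation is a cheap forward pass
# instead of the full computation. (Hypothetical example.)
import torch
import torch.nn as nn

def expensive_function(x):
    # Stand-in for something costly (fluid step, light transport, ...):
    # here just a wavy function evaluated by brute-force summation.
    return sum(torch.sin(k * x) / k for k in range(1, 200))

# Training data: sample the expensive function offline.
x = torch.linspace(-3.0, 3.0, 2048).unsqueeze(1)
y = expensive_function(x)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# At runtime the approximation is a handful of small matrix multiplies,
# regardless of how expensive the original function was.
print(float(nn.functional.mse_loss(model(x), y)))
```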

So if we integrate Machine Learning into video games, then naturally we should be able to solve all those computational problems with accurate approximation. Though you should be asking yourself: if we have had the compute, why hasn't Machine Learning found its way into video games over the last 5 years or so? Well, there are problems with machine learning, namely that the libraries and APIs that support it are frankly way too slow and way too black-boxed to be used in real-time operations.

Enter DirectML.

DirectML is developed specifically for real-time ML applications, and is marketed as the DX12 of machine learning APIs. With the full release of DirectML in spring 2019, we can finally begin to see ML applications in games. But that isn't to say they haven't been working on it already. There are currently 3 main marketed uses of ML in games today: denoising, anti-aliasing and AI resolution upscaling. Two we've seen live in Nvidia presentations; the third can be found here:

http://on-demand.gputechconf.com/si...-gpu-inferencing-directml-and-directx-12.html
(FF to 15 minutes to live demo)

In this presentation of DirectML the goal is to get a very normal computer with a normal GPU to upscale Forza Horizon 3 from 1080p and low-quality AF to 4K with higher AF settings. And it does this fairly well if you watch the video. It's important to note that they were able to quadruple the resolution with a relatively normal GPU (I'm going to assume "normal" doesn't mean 1070+). This was all completed on ordinary compute hardware, which makes it impressive, but is there a way to make this go faster for less?
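
For a feel of what an "AI up-res" model looks like structurally, here's a minimal SRCNN-style sketch. This is an assumption about the general shape of such a network, not the model used in the Forza/DirectML demo:

```python
# Minimal SRCNN-style super-resolution network: upscale a low-resolution
# frame with bicubic, then let a small CNN restore detail.
# (Illustrative only; not the model from the DirectML demo.)
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySuperRes(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=5, padding=2),
        )

    def forward(self, low_res):
        # Cheap upscale first, then predict a residual correction on top.
        upscaled = F.interpolate(low_res, scale_factor=self.scale,
                                 mode="bicubic", align_corners=False)
        return upscaled + self.body(upscaled)

# Small crop used here to keep the example light; a 1080p frame works
# the same way and comes out at 4x the pixel count with scale=2.
frame = torch.rand(1, 3, 270, 480)
out = TinySuperRes(scale=2)(frame)
print(out.shape)  # torch.Size([1, 3, 540, 960])
```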

Enter Tensor Cores.

Before the invention of 3D accelerators (GPUs), CPUs had the role of generating graphics. GPUs came into the industry as simpler processors that could do large amounts of math in parallel. As GPUs continued to evolve, they became more and more capable at exactly that type of work, to the point where machine learning solutions became practical on them. Until recently, our latest professional GPUs with their high compute power were the best hardware available for neural networks; then Tensor cores were created. Tensor cores are to neural networks as GPUs are to 3D graphics. Tensor cores are even simpler cores, whose purpose is to accelerate the computation of neural networks; so while a Titan V is capable of 20+ TF of half-precision math on its shader cores, Tensor cores are hitting closer to 500 TF of half-precision power (not a good comparison due to silicon size, but w/e). Tensor cores are quickly becoming the new rage in the ML field, and we get to see them enter the consumer market with the introduction of Turing, the RTX line of Nvidia GPUs.
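
To be concrete about what a tensor core actually does: the primitive it accelerates is a small fused matrix multiply-accumulate, with half-precision inputs and (typically) single-precision accumulation. A rough sketch of just the arithmetic, not of how the hardware is programmed:

```python
# The operation a tensor core performs is essentially D = A @ B + C on
# small tiles (e.g. 4x4), with A and B in half precision and the
# accumulation done in single precision.
import numpy as np

A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.zeros((4, 4), dtype=np.float32)

# Emulate FP16 multiply with FP32 accumulate.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.dtype, D.shape)  # float32 (4, 4)
```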

Now that we have established the roles and purpose of ML in games, we can make the statement that it is Machine Learning that is being used to solve Ray Tracing, not the Ray Tracing cores. As ray tracing computation continues to increase, there is no hardware that would be able to solve that problem in real time. Thus we use denoising to enable developers to use fewer rays per pixel, and leverage machine learning to fill in the rest. And that is where RT cores/hardware come into play. Neural networks need inputs to work from before they can generate the approximations. So the RT cores are there to assist in building a bare-bones RT image for the approximation to take over.
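
A hand-wavy sketch of the "few rays per pixel plus a learned denoiser" idea: the network takes the noisy low-sample render along with cheap auxiliary buffers (albedo, normals, depth) and predicts the clean frame. This is an illustrative architecture only, not Nvidia's actual denoiser:

```python
# Sketch of a learned denoiser for low-sample-count ray traced images.
# Input: noisy radiance plus feature buffers the renderer already has.
# Output: the denoised frame. (Illustrative architecture only.)
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 (noisy RGB) + 3 (albedo) + 3 (normals) + 1 (depth) = 10 channels.
        self.net = nn.Sequential(
            nn.Conv2d(10, 48, 3, padding=1), nn.ReLU(),
            nn.Conv2d(48, 48, 3, padding=1), nn.ReLU(),
            nn.Conv2d(48, 3, 3, padding=1),
        )

    def forward(self, noisy_rgb, albedo, normals, depth):
        x = torch.cat([noisy_rgb, albedo, normals, depth], dim=1)
        # Predict a correction on top of the noisy input.
        return noisy_rgb + self.net(x)

h, w = 270, 480  # small crop for the example
denoiser = TinyDenoiser()
clean = denoiser(torch.rand(1, 3, h, w), torch.rand(1, 3, h, w),
                 torch.rand(1, 3, h, w), torch.rand(1, 1, h, w))
print(clean.shape)
```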

The RT cores alone are not powerful enough to generate the type of images the consumer wants for next-generation graphics, but with the assistance of ML, this problem is solvable. But what are RT cores? Today we know them as BVH accelerators: add-ons to the compute engine whose purpose is to traverse the bounding volume hierarchy so that each ray only needs to be tested against a small subset of the scene, rather than the whole scene being computed over each frame. That being said, my understanding is that outside of the BVH work, the GPU's compute units still process the rest of the ray tracing.
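
To illustrate what the BVH acceleration buys you, here is a minimal traversal sketch (plain Python, no hardware specifics): a ray only descends into boxes it actually hits, so most of the scene's triangles are never tested at all.

```python
# Minimal BVH traversal sketch: each node holds an axis-aligned bounding
# box; leaves hold a few triangles. A ray skips every subtree whose box it
# misses, so it tests only a small fraction of the scene's triangles.
# (Conceptual illustration; real hardware traversal is far more involved.)

def ray_hits_box(origin, inv_dir, box_min, box_max):
    """Standard slab test for a ray against an axis-aligned bounding box."""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        t1 = (box_min[axis] - origin[axis]) * inv_dir[axis]
        t2 = (box_max[axis] - origin[axis]) * inv_dir[axis]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(node, origin, inv_dir, hit_triangles):
    if not ray_hits_box(origin, inv_dir, node["box_min"], node["box_max"]):
        return                                   # whole subtree culled by one box test
    if "triangles" in node:                      # leaf node
        hit_triangles.extend(node["triangles"])  # candidates for exact intersection tests
    else:                                        # inner node
        traverse(node["left"], origin, inv_dir, hit_triangles)
        traverse(node["right"], origin, inv_dir, hit_triangles)

# Tiny example: a root with two leaves; the ray only reaches the triangles
# whose boxes it intersects, and leaf_b is skipped entirely.
leaf_a = {"box_min": (0, 0, 0), "box_max": (1, 1, 1), "triangles": ["tri_0", "tri_1"]}
leaf_b = {"box_min": (0, 5, 0), "box_max": (1, 6, 1), "triangles": ["tri_2"]}
root = {"box_min": (0, 0, 0), "box_max": (1, 6, 1), "left": leaf_a, "right": leaf_b}

direction = (1.0, 0.001, 0.001)          # mostly +x; avoid exact zeros for the slab test
inv_dir = tuple(1.0 / d for d in direction)
candidates = []
traverse(root, (-1.0, 0.5, 0.5), inv_dir, candidates)
print(candidates)                         # ['tri_0', 'tri_1']
```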

pt 2 below.
 
If we are able to combine AI up-resolution and denoising, we gain the ability to upscale a high-quality 1080p RT image up to 4K, or to apply AI anti-aliasing instead, meeting the demands of different clientele in the consumer space. Whether this is possible I do not know, but the pipeline is not straightforward for either scenario, and likely more complex when combined. But I'm almost positive someone is researching how it's done.

The last topic of discussion comes down to Turing itself. Did Nvidia shoehorn ML into gaming? Are Tensor cores and RT cores too fixed-function to work for modern gaming, and did Nvidia make the right move in spending silicon on tensor and RT?

If you've been following me up to now, then you should make the connection that what rasterization is to 3D graphics, neural networks are to processing vast amounts of data. Their inclusion in the hardware is without a doubt an important aspect of the solution; thus, I do not believe that Nvidia was shoe-horning their solution onto the industry. With the inclusion of tensor cores and what will be possible with DirectML, what can be produced will be vastly superior to a pure compute solution; things like physics, cloth and fluid simulation are all things compute can do, but that machine learning excels at. As such, tensor cores will be able to chew through an ML model significantly faster than half-precision compute. Higher resolution and RT are, once again, completed significantly faster and with less hardware than with compute cores, so the need for tensor hardware is clear. If we are able to maximize tensor hardware and machine learning, the future of graphics will continue to weigh more in favour of machine learning hardware than raw compute power as the applications for ML continue to increase.

This bodes well for the industry, as we can also use DirectML on non-tensor hardware, as we saw with the DirectML demo. It's possible that older platforms like Xbox One and PS4 could survive into the next generation of graphics by leveraging ML to AI up-res or anti-alias up to 1080p, with the mid-gen refreshes doing something similar to get up to 4K. That leaves the next generation to be about actually having the tensor cores to drive an incredible number of ML applications and provide a level of fidelity worthy of being called a next-generation experience.

And even if it didn't, giving up 30% of the silicon to support tensor hardware would still enable developers to push the graphical envelope at 1080p and use tensor anti-aliasing and up-resolution to get to 4K, among other purposes, still ultimately providing better graphical fidelity than if that silicon were purposed directly towards just more compute.
 
You've said it before: we should base our predictions on things that we know exist instead of unknown hypotheticals. That's even more true for console hardware designers. Especially in that world, do you think they'd favor flexibility for a handful of games over speed for the majority? I think not.

We don't know, since we know nothing about AMD's plans for RealTime RayTracing support.
 
We don't know, since we know nothing about AMD's plans for RealTime RayTracing support.
Software-wise, what they've been doing with Radeon-Rays is quite interesting, though.

https://pro.radeon.com/en/software/radeon-rays/
I would like to see some RTX cards running Radeon-Rays, since it's not exclusive to AMD hardware. :D

I wonder if we will know soon if AMD will eventually include some kind of HW support on their cards in the near future.
 
A technology nobody would have predicted and pretty much nobody uses.

You've said it before: we should base our predictions on things that we know exist instead of unknown hypotheticals. That's even more true for console hardware designers. Especially in that world, do you think they'd favor flexibility for a handful of games over speed for the majority? I think not.

Think of it another way. If we had no generalized shader processor hardware in GPUs, we wouldn't have all the myriad rendering techniques that are being used now.

Imagine if we were still using VLIW hardware with inflexible hardware T&L.

We wouldn't have had the advances in AA that we've seen over the past 5-10 years. We would likely still be using forward rendering.

While it's certainly nice that NV are getting things kickstarted with the RT-based portions of RTX, it's a dead end. It's something that should go away and needs to go away, just like VLIW, fixed function T&L, etc.

Just like fixed function hardware T&L got things kickstarted with that sort of thing, and eventually disappeared for the good of 3D rendering.

Locking people into one way of doing things isn't going to see graphics rendering advance in anything other than the very very short term.

It's even worse when it's a black box and developers don't even know what's going on, much less have the ability to adapt to newer or more efficient algorithms.

RT on RTX is what it is. It's a start, but it's also a dead end. The future for RT isn't fixed function hardware, it's flexible hardware that developers and researchers can use to find more efficient and more novel ways of doing things.

If Graphics rendering has taught us anything over the past 20 years, it's that brute force rendering (which is what current RT is) is not what you want to be doing in games.

RT in games and consoles of the future won't be brute forcing it via fixed function hardware, at least not the smart consoles. They'll be using more flexible approaches that allow developers to more easily adapt to new algorithms for implementing RT.

If one console adopts a faster but inflexible fixed RT solution while another console adopts a slower but more flexible RT solution, the slower RT console will likely be the one with the best RT at the end of that console generation. And it'll also likely be faster in non-RT games than the fixed function RT console.

Regards,
SB
 
Tensor cores are to neural networks as GPUs are to 3D graphics. Tensor cores are even simpler cores, whose purpose is to accelerate the computation of neural networks; so while a Titan V is capable of 20+ TF of half-precision math on its shader cores, Tensor cores are hitting closer to 500 TF of half-precision power (not a good comparison due to silicon size, but w/e).

Are we gonna go full circle with de-unified-but-unified-shaders in the future? :p
 
We don't know, since we know nothing about AMD's plans for RealTime RayTracing support.
Sure but I mean in terms of algorithms. RT we know exists. An algorithm that would be better and faster running on compute is just a hypothetical.

Think of it another way. If we had no generalized shader processor hardware in GPUs, we wouldn't have all the myriad rendering techniques that are being used now.

Imagine if we were still using VLIW hardware with inflexible hardware T&L.

We wouldn't have had the advances in AA that we've seen over the past 5-10 years. We would likely still be using forward rendering.

While it's certainly nice that NV are getting things kickstarted with the RT-based portions of RTX, it's a dead end. It's something that should go away and needs to go away, just like VLIW, fixed function T&L, etc.

Just like fixed function hardware T&L got things kickstarted with that sort of thing, and eventually disappeared for the good of 3D rendering.

Locking people into one way of doing things isn't going to see graphics rendering advance in anything other than the very very short term.

It's even worse when it's a black box and developers don't even know what's going on, much less have the ability to adapt to newer or more efficient algorithms.

RT on RTX is what it is. It's a start, but it's also a dead end. The future for RT isn't fixed function hardware, it's flexible hardware that developers and researchers can use to find more efficient and more novel ways of doing things.

If Graphics rendering has taught us anything over the past 20 years, it's that brute force rendering (which is what current RT is) is not what you want to be doing in games.

RT in games and consoles of the future won't be brute forcing it via fixed function hardware, at least not the smart consoles. They'll be using more flexible approaches that allow developers to more easily adapt to new algorithms for implementing RT.

If one console adopts a faster but inflexible fixed RT solution while another console adopts a slower but more flexible RT solution, the slower RT console will likely be the one with the best RT at the end of that console generation. And it'll also likely be faster in non-RT games than the fixed function RT console.

Regards,
SB
In the long run we'll probably have fully programmable RT. Not now. It's a progression. Also, while the acceleration hardware is fixed function, it doesn't mean it is as restrictive as something like T&L. We already know it can be used for many things other than just lighting calculations such as adaptive anti-aliasing, collision detection and audio simulation. Not only that, the API itself is not fixed function and can run on compute units. In other words, we'll see all kinds of clever hybrid algorithms with some parts running on the RT cores and others on regular compute units. Even in its current manifestation (RTX) it's quite flexible already. It will do for a while.
 
Sure but I mean in terms of algorithms. RT we know exists. An algorithm that would be better and faster running on compute is just a hypothetical.
You've repeatedly said that the first demos of RT aren't the best it can do and that it will improve as devs get used to it. That certainty is built on the same concepts by which we know new algorithms will appear, because devs always find better ways to use hardware and always find improved algorithms.

In the long run we'll probably have fully programmable RT.
In the long run, we won't have ray tracing. Ray tracing is just a means to sample a volume. A more generalised approach to solving that problem will replace it, much like vertex and hull shaders, and domain and geometry shaders, are being replaced with task and mesh shaders, which are more generalised.

Not now. It's a progression. Also, while the acceleration hardware is fixed function, it doesn't mean it is as restrictive as something like T&L. We already know it can be used for many things other than just lighting calculations such as adaptive anti-aliasing, collision detection and audio simulation. Not only that, the API itself is not fixed function and can run on compute units. In other words, we'll see all kinds of clever hybrid algorithms with some parts running on the RT cores and others on regular compute units. Even in its current manifestation (RTX) it's quite flexible already. It will do for a while.
That's not in doubt. The question is how much better it is than just compute. Looking at Turing, AFAICS the Tensor and BVH blocks are about 50% of the chip, which tallies with the die size versus core count. How much collision detection, audio simulation, and whatnot could be performed on that area as just compute? Would it be half as much? 20% less? The same amount?

That's the real question.
 
In the long run we'll probably have fully programmable RT. Not now. It's a progression. Also, while the acceleration hardware is fixed function, it doesn't mean it is as restrictive as something like T&L. We already know it can be used for many things other than just lighting calculations such as adaptive anti-aliasing, collision detection and audio simulation. Not only that, the API itself is not fixed function and can run on compute units. In other words, we'll see all kinds of clever hybrid algorithms with some parts running on the RT cores and others on regular compute units. Even in its current manifestation (RTX) it's quite flexible already. It will do for a while.

Yes, but the point I'm making is that more flexible but slower RT is likely preferable to inflexible but faster RT. Right now, RTX fits into the latter category. Purely compute based solutions lie at the far end of the first category.

What would be good for consoles isn't what RTX is offering; it's something more towards the flexible end of the spectrum, even if that's slower at RT, as long as it isn't also slower in general compute and rendering.

And when I say slower/faster than RTX, I'm talking in terms of performance per mm^2. No console maker with any bit of sanity is putting in a monolithic die even remotely as large as a RTX 2070, for example.

Right now RTX is massively slower in terms of perf/mm^2 than the previous generation GTX cards of an equivalent tier. That wouldn't cut it for consoles.
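
To put rough numbers on the perf/mm² point, using approximate public die sizes and treating rasterisation performance at that tier as roughly equal (purely illustrative arithmetic, not a measurement):

```python
# Back-of-the-envelope perf-per-area comparison with approximate public
# die sizes; performance at this tier treated as roughly equal.
gp104_mm2 = 314.0   # GTX 1080 (Pascal)
tu106_mm2 = 445.0   # RTX 2070 (Turing), roughly GTX 1080-class in
                    # traditional rasterisation

perf_per_mm2_ratio = gp104_mm2 / tu106_mm2
print(f"Turing at this tier delivers roughly {perf_per_mm2_ratio:.0%} of "
      f"Pascal's rasterisation performance per mm^2")
```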

The future may be RT, but the next console generation isn't it. The next console generation will likely still be mostly traditional rendering with some forays into RT. Going with fixed function RT that can't be used for anything else would be console suicide.

What's needed for the next generation is an architecture that can use its resources for either RT or general rendering, or both. Especially as RT algorithms are likely to change and evolve over the course of the console's lifetime.

A console that is meant to last 5+ years would be committing RT suicide by adopting fixed function RT hardware, IMO.

That's fine for PC where you can easily swap out video cards as algorithms change, but you can't do that with a console.

Regards,
SB
 
I think it's increasingly clear that Ray Tracing, or DXR, was poorly marketed, and the understanding of how it works is incomplete in most circles.
...
The history here is important to note, because when we discuss things that are computationally impossible to calculate, neural networks naturally become a solution to the problem. When we look at things like 4K resolution, 8K resolution and Ray Tracing, all of these share the fundamental problem of being computationally intensive to the point of inefficiency, or currently out of reach. For instance, a completely path traced image at 4K is just not feasible in real time. But there are also some other items that have been out of our reach, in particular fluid dynamics, tessellation, cloth physics, destruction, etc. All problems that can be solved using neural networks to approximate the output.
Good post, except it's missing a significant perspective that contrasts with this theory.

If you throw a neural net at anti-aliasing a black line on a white background, you could train and execute a model that would get great results, but if you use Xiaolin Wu's line algorithm, you can draw that line perfectly with very little processing required, because the problem can be distilled down to a mathematical solution.
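
For reference, a simplified version of Wu's algorithm is tiny (integer, non-negative endpoints assumed here; the full algorithm also handles fractional endpoints and endpoint weighting):

```python
# Simplified Xiaolin Wu anti-aliased line drawing: pixel coverage along the
# line is computed analytically, so a clean anti-aliased line needs no
# neural network, just a little arithmetic per pixel.

def fpart(x):
    return x - int(x)       # fractional part (non-negative x assumed)

def rfpart(x):
    return 1.0 - fpart(x)

def draw_line_wu(x0, y0, x1, y1, plot):
    """Call plot(x, y, coverage) with coverage in [0, 1] for each touched pixel."""
    steep = abs(y1 - y0) > abs(x1 - x0)
    if steep:                              # iterate along the longer axis
        x0, y0, x1, y1 = y0, x0, y1, x1
    if x0 > x1:                            # always draw left to right
        x0, x1, y0, y1 = x1, x0, y1, y0

    dx, dy = x1 - x0, y1 - y0
    gradient = dy / dx if dx != 0 else 0.0

    intery = float(y0)                     # exact y where the line crosses each column
    for x in range(x0, x1 + 1):
        y = int(intery)
        if steep:
            plot(y, x, rfpart(intery))     # pixel the line mostly covers
            plot(y + 1, x, fpart(intery))  # its neighbour gets the remainder
        else:
            plot(x, y, rfpart(intery))
            plot(x, y + 1, fpart(intery))
        intery += gradient

# Example: coverage values for a shallow line from (0, 0) to (8, 3).
pixels = {}
draw_line_wu(0, 0, 8, 3, lambda x, y, c: pixels.__setitem__((x, y), c))
print(sorted(pixels.items()))
```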

It's not at all given that ML is the best way to solve upscaling. No-one's performed a decent comparison of nVidia's system versus Insomniac's, and Insomniac's upscaling is running on a poky APU after rendering the entire game. There's a good case to be made that conventional, algorithm-based upscaling on compute could get the same results as (or better than) ML-based solutions at a fraction of the silicon cost. Which one of the RTX demos wasn't using nVidia's denoising but was running it on compute? The fact I can't readily remember shows it didn't look substantially worse than the ML solution.

So perhaps given two GPUs with 300 mm², on one you'd have 150 mm² of compute and 150 mm² of machine learning and BVH for noisy tracing followed by ML denoising and upscaling, and on the other you'd have 300 mm² of compute that is slower at the ray tracing but solves lighting differently and upscales with reconstructive algorithms, and the end results on screen may be difficult for people to judge.

A GPU 2x the power of PS4 (3.6 TF) could render a PS4-quality game (HZD, Spider Man, GoW) from a secondary position to create reflections, which would look a lot closer to RT'd reflections and not need RTing or denoising at all... ;) And before the pro-RT side says, "but that perspective wouldn't be right," of course it wouldn't, but the question is would it be close enough? Side by side, RT console versus non-RT console, would gamers know the difference? Enough to buy the RT console, all other things being equal?
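
For what it's worth, one concrete form of that "secondary position" idea is classic planar reflection rendering: mirror the camera across a reflective plane and render the scene again from there. A small sketch of just the mirroring math, with an assumed floor plane and camera position:

```python
# Mirror a camera position across a reflective plane (e.g. a floor) so the
# scene can be re-rendered from the mirrored viewpoint to produce planar
# reflections. (Hypothetical plane and camera values for illustration.)
import numpy as np

def reflection_matrix(plane_point, plane_normal):
    """4x4 matrix reflecting homogeneous points across the given plane."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = -np.dot(n, plane_point)            # plane: n . x + d = 0
    M = np.eye(4)
    M[:3, :3] -= 2.0 * np.outer(n, n)      # reflect directions
    M[:3, 3] = -2.0 * d * n                # shift by the plane offset
    return M

# Floor plane at y = 0 and a camera above it.
R = reflection_matrix(np.array([0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
camera_pos = np.array([1.0, 2.0, 5.0, 1.0])
mirrored_pos = R @ camera_pos
print(mirrored_pos[:3])   # [ 1. -2.  5.] -- camera flipped below the floor
```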
 
Of course it's common practice, but there's nothing shady about paying (either in money, engineer man-hours, or both) for the adoption of certain technologies.

Thanks.

Does this mean AMD is gonna have a much harder time, since Nvidia has way more money and can moneyhat most devs to focus on Nvidia's RT rather than whatever AMD's gonna make?
 
That's not in doubt. The question is how much better it is than just compute. Looking at Turing, AFAICS the Tensor and BVH blocks are about 50% of the chip, which tallies with the die size versus core count. How much collision detection, audio simulation, and whatnot could be performed on that area as just compute? Would it be half as much? 20% less? The same amount?
That's the real question.

Much better. Chaosgroup gets 60% more performance from their first implementation: https://www.chaosgroup.com/blog/profiling-the-nvidia-rtx-cards
 
No. Not in consoles. (Which is the forum this topic is in.)

Could you please expand on that? I'm not very knowledgeable on the matter.

Is it because Sony is partnering with AMD, and developers mostly march to the beat of the console drum?

Is MS sure to partner up with AMD as well?

Thanks.
 