Impact of nVidia Turing RayTracing enhanced GPUs on next-gen consoles *spawn

Well, not really. Someone on another forum worked out the die size, clock frequency and average benchmark performance of the 2080 Ti. Yes, the card is faster (nothing new here), but performance per mm² dropped by almost 30%.

Turing != Pascal with RT and Tensor Cores.
GP100 was worse than GP102 for gaming, too. The whole architecture is much more future-proof.

Also, DLSS has many flaws (btw, so does TAA). There are some quite good upscaling techniques on consoles with only minor flaws. I really don't know why NVIDIA didn't invest in those. DLSS is really a waste of resources: not that easy to implement after all (only one title so far), with really mixed results (from "it doesn't do anything at all" to "flickering" to "compression" artifacts) and a heavy performance hit (comparable to TAA at 1800p vs 1440p DLSS reconstructed to 4K).
NVIDIA is just trying to invent something new, something that isn't optimized for the use case, just to be first.

Innovation comes from trying. DLSS is brand new. It's the first time somebody is reconstructing images in real time from a trained model rather than hand-written code.
And most upscaling techniques on consoles are still worse than DLSS, with more blur and more artefacts...
 
DLSS has been shown to be "not utterly crappy" only when trained on pre-recorded content (Final Fantasy bench, Infiltrator tech demo, Porsche tech demo and now the Futuremark bench)... which kind of defeats the whole point of it... unless only "playing" benchmarks on your brand new GPU is the new hip thing to do.
Yup, that's a fair remark. If the training data and the test data are the same, we can achieve 100% accuracy; it's not even really pushing the system. A little cheap to present it that way.

Having said that, they did also showcase Atomic Heart DLSS + RT which is an actual game.

We will know real performance when we see BFV and Anthem and their DLSS implementations, and more importantly when image quality sites get their hands on it.
 
The killer feature would be procedural character animation, IMO. I work on this myself, using only physics and no AI, and at some point things become complex and fuzzy.
But that's just an example. AI might be useful and spur new innovations we can't predict yet.

Interesting, perhaps more so than upscaling. It's a good thing those tensor cores can be put to use for more than just DLSS.
 
To be clear, I am only describing upscaling in games, the state of the art of which DavidGraham seems to be unaware.
I am well aware of them; we've seen temporal reconstruction used in actual PC games already, in Rainbow Six Siege and Watch_Dogs 2. I tried them, and they were bad, with edge shimmering and very modest AA capabilities.
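
For anyone wondering what those temporal reconstruction schemes actually do, here's a minimal CUDA-style sketch of the core idea (illustrative only, not any shipped game's implementation; all buffer names are made up): reproject last frame's output with motion vectors, clamp it against the neighbourhood of the new samples, and blend. That neighbourhood clamp is exactly where the edge shimmering tends to come from.

Code:
// Minimal temporal reconstruction sketch (illustrative only, not any shipped game's code).
// Assumes: curColor holds this frame's new samples, history holds last frame's full output,
// motion holds per-pixel motion vectors in pixels. All names are hypothetical.
#include <cuda_runtime.h>

__device__ float3 clamp3(float3 v, float3 lo, float3 hi)
{
    return make_float3(fminf(fmaxf(v.x, lo.x), hi.x),
                       fminf(fmaxf(v.y, lo.y), hi.y),
                       fminf(fmaxf(v.z, lo.z), hi.z));
}

__global__ void temporalReconstruct(const float3* curColor, const float3* history,
                                    const float2* motion, float3* outColor,
                                    int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    // Reproject: fetch where this pixel was last frame.
    int px = min(max(int(x - motion[idx].x), 0), width  - 1);
    int py = min(max(int(y - motion[idx].y), 0), height - 1);
    float3 hist = history[py * width + px];

    // Neighbourhood clamp: constrain the history sample to the colour min/max of
    // the current 3x3 neighbourhood. This rejects stale history, and it is also
    // what shimmers on edges when the clamp kicks in and out from frame to frame.
    float3 lo = make_float3( 1e30f,  1e30f,  1e30f);
    float3 hi = make_float3(-1e30f, -1e30f, -1e30f);
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = min(max(x + dx, 0), width  - 1);
            int ny = min(max(y + dy, 0), height - 1);
            float3 c = curColor[ny * width + nx];
            lo = make_float3(fminf(lo.x, c.x), fminf(lo.y, c.y), fminf(lo.z, c.z));
            hi = make_float3(fmaxf(hi.x, c.x), fmaxf(hi.y, c.y), fmaxf(hi.z, c.z));
        }
    hist = clamp3(hist, lo, hi);

    // Exponential blend: mostly history, a little of the new sample.
    const float alpha = 0.1f;
    float3 cur = curColor[idx];
    outColor[idx] = make_float3(cur.x * alpha + hist.x * (1.0f - alpha),
                                cur.y * alpha + hist.y * (1.0f - alpha),
                                cur.z * alpha + hist.z * (1.0f - alpha));
}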

DavidGraham's enthusiasm is misplaced because he wasn't aware of what was already happening with clever developers and the options fully programmable compute has opened up this gen.
I am aware; I was just bringing up Quantum Break as an analogy for a game that uses advanced GI plus advanced upscaling to get good performance out of modest hardware. NVIDIA is basically doing the same with ray tracing, especially for mid-range GPUs. I was also reaffirming that this model could work on consoles if they end up using ray tracing.

Having said that, they did also showcase Atomic Heart DLSS + RT which is an actual game.
The full Final Fantasy 15 game already supports DLSS in a "beta" state, supporting only 4K at the moment. Results are not bad so far.

On another note, Radeon VII apparently supports DirectML. Though it does it in "software" mode.

https://www.dsogaming.com/news/amd-...to-nvidias-deep-learning-super-sampling-dlss/
 
The full Final Fantasy 15 game already supports DLSS in a "beta" state, supporting only 4K at the moment. Results are not bad so far.

On another note, Radeon VII apparently supports DirectML. Though it does it in "software" mode.
Great find.

Yeah, I should be clear that having tensor cores is not the only method of getting AI acceleration performance; it is one method, but it does not have to be the only one. This is a good sign, and in all fairness, I would not necessarily consider it 'software' mode. It's traditionally how we do AI on GPUs, and there are ways AMD could have changed the architecture to obtain significantly more performance without having to add fixed-function hardware like a tensor core. Look at the MI60, for instance.
 
there are ways AMD could have changed the architecture to obtain significantly more performance without having to add fixed-function hardware like a tensor core. Look at the MI60, for instance.
AI performance with the MI60 pales in comparison to something like a Volta GPU, for example. Volta is 3 times faster because of its Tensor Cores.

[Image: AMD MI60 vs Tesla V100 Tensor Core ResNet benchmarks]
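
For the curious, here's roughly what the tensor cores are doing that a regular ALU isn't: on Volta/Turing a single warp can issue whole 16x16x16 FP16 matrix multiply-accumulates through the WMMA intrinsics, where plain compute would be issuing individual FMAs. A minimal sketch, assuming matrix dimensions padded to multiples of 16 (not production code):

Code:
// Rough sketch of a tensor-core GEMM through the CUDA WMMA API (requires sm_70+).
// Each warp computes one 16x16 tile of C = A * B; each mma_sync below is a full
// 16x16x16 FP16 multiply-accumulate issued to the tensor cores.
// Assumes M, N, K are multiples of 16, A is row-major (MxK), B is column-major (KxN),
// C is row-major (MxN), and blockDim.x is a multiple of warpSize.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmmaGemm(const half* A, const half* B, float* C, int M, int N, int K)
{
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize; // tile row
    int warpN =  blockIdx.y * blockDim.y + threadIdx.y;             // tile column
    if (warpM * 16 >= M || warpN * 16 >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> accFrag;
    wmma::fill_fragment(accFrag, 0.0f);

    // March along K, 16 elements at a time.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(aFrag, A + warpM * 16 * K + k, K);
        wmma::load_matrix_sync(bFrag, B + warpN * 16 * K + k, K);
        wmma::mma_sync(accFrag, aFrag, bFrag, accFrag);
    }
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, accFrag, N,
                            wmma::mem_row_major);
}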
 
On another note, Radeon VII apparently supports DirectML. Though it does it in "software" mode.

https://www.dsogaming.com/news/amd-...to-nvidias-deep-learning-super-sampling-dlss/

That should be kind of a given, since shader/compute units are more fully programmable ALUs than the less capable tensor units. ;) It's the fixed-function nature that makes the tensor units small enough for nV to shove ten shitloads of them in to boost FP16 throughput (for particular workloads descended from the heavens).
 
AI performance with the MI60 pales in comparison to something like a Volta GPU, for example. Volta is 3 times faster because of its Tensor Cores.

[Image: AMD MI60 vs Tesla V100 Tensor Core ResNet benchmarks]
Agreed. I think I've seen this before.
This is going to be especially true for neural network setups. When I mentioned changing the architecture for more performance, I should have been precise and compared it to a standard GCN compute unit, not a tensor core ;) That's like asking a GPU to compete with an ASIC bitcoin miner... no chance in hell.

Two caveats we should specifically consider, though:
a) this graph is about DL training performance, and on console we expect to be running the models, not training them :)
b) for machine learning outside of 16-bit precision, the MI60 should be very competitive.

The other aspect I would consider is trying to figure out whether the MI60 configuration could be usable in the console space; something akin to seeing if an MI60-style compute setup is sufficient to handle both ML and graphics tasks. I would say, to some degree and without knowing the future exactly, the greatest advantage of tensor cores is their performance. The greatest disadvantage is that they are fixed for one type of model, in this case neural-network tensor models. What if there are more effective and easier algorithms, like SVMs or random forests? Do we always need to use a deep-learning neural network?

I'm not entirely sure what the future of AI is for consoles, at least from an application perspective, but if I were, say, MS, these are considerations worth discussing.
It's good to have both.
 
I suppose it's a little curious that nV went to such an extreme on die size (and then R&D for no less than 3 separate ASICs) and shoved in RT/tensor units rather than more of the traditional GPU bits, unless there were mitigating factors to scaling up the GPU that way (power density, bandwidth).
The argument there is they made the GPU to sell to lucrative ML and pro imaging markets, and then having designed the silicon for that purpose, put it in a gaming card and tried to find uses for the bits designed for completely different markets. "We need to put in lots of Tensor cores for machine learning. Now how the heck are we going to use that silicon for gaming??"
 
a) this graph is about DL training performance, and on console we expect to be running the models, not training them :)
That's the key point. Consoles aren't likely to be doing machine learning; they just have to execute models. Google's new tiny AI chip shows that can be handled cheaply and efficiently. Whether it's worth putting in a tiny bit of tensor silicon, or just supporting the ML implementations in compute, is down to the GPU designers to figure out.
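
To put "just executing models" in concrete terms: inference is mostly a pile of multiply-accumulates plus an activation, which maps onto ordinary compute with no special hardware at all. A minimal sketch of a single fully-connected layer (hypothetical buffer names, FP32 for clarity):

Code:
// Minimal inference sketch: one fully-connected layer + ReLU as plain compute.
// Nothing here needs tensor cores; it's just multiply-accumulate, which is why
// compute-based ("software") inference is perfectly workable, only slower.
// Buffer names and sizes are hypothetical.
#include <cuda_runtime.h>

__global__ void denseRelu(const float* __restrict__ weights, // [outDim x inDim], row-major
                          const float* __restrict__ bias,    // [outDim]
                          const float* __restrict__ input,   // [inDim]
                          float*       __restrict__ output,  // [outDim]
                          int inDim, int outDim)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= outDim) return;

    // Dot product of one weight row with the input vector, plus bias.
    float acc = bias[o];
    for (int i = 0; i < inDim; ++i)
        acc += weights[o * inDim + i] * input[i];

    // ReLU activation.
    output[o] = fmaxf(acc, 0.0f);
}

Run one of these per layer; a fixed-function tensor block just chews through the inner loop much faster and at FP16.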
 
That's the key point. Consoles aren't likely to be doing machine learning; they just have to execute models. Google's new tiny AI chip shows that can be handled cheaply and efficiently. Whether it's worth putting in a tiny bit of tensor silicon, or just supporting the ML implementations in compute, is down to the GPU designers to figure out.
That part I mulled over as well. Yeah, it's going to come down to the designers to decide what the future holds. A little bit like the async compute queues: I don't think we ever heard a story of all 8 being filled and used, versus the 2 on Xbox. And in this scenario, I'm not sure how much power is needed to run AI models, or what the expectation for them is. I can assume that more power will run things faster, i.e. higher frame rate or higher resolution, but it's not exactly 1:1. If you put in too much power, it may never be used.

But then again, it will also come down to the architecture of how it's set up.
 
Given the eventual ubiquity of matrix processing in mobile devices, I can foresee GPUs getting at least a small AI block to parallel the video blocks, as something that'll be universal across devices and techniques. It's very hard to say what ML could bring to computer graphics until it's a lowest common denominator that gets developed for. We'll likely have a few years (a generation) of research papers and small examples until ML is ready to hit mainstream processing and be as much a part of games as compute has become.

Maybe something like having an AI block for computer vision, VR tracking and background-removal sorts of things, and then devs repurposing it for interesting game applications. I think the hardware designers are going to want a clear, useful purpose in mind for the silicon rather than just chucking some in there and seeing where devs take it, with the risk that nothing happens and it's wasted silicon.
 
Aren't the tensor cores a key component in their RTRT solution for de-noising?

I don't think BFV uses them; DICE said they used their own solution in compute.

https://www.eurogamer.net/articles/...-battlefield-5s-stunning-rtx-ray-tracing-tech

While DICE is getting some great quality out of the rays it shoots, the unfiltered results of ray tracing are still rather noisy and imperfect. To clean that noisiness up, a custom temporal filter is used, along with a separate spatial filter after that to make sure reflections never break down and turn into their grainy unfiltered results. Interestingly, this means that DICE is not currently using the Nvidia tensor cores or AI-trained de-noising filters to clean up its ray traced reflections.
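
For context, the spatial half of that kind of filter is conceptually something like the kernel below: a small neighbourhood blur over the noisy reflection buffer, weighted down across depth discontinuities so reflections don't bleed over geometry edges. Purely illustrative, not DICE's actual shader, and the buffer names are made up:

Code:
// Illustrative edge-aware spatial filter for a noisy ray-traced reflection buffer.
// Not DICE's shader; just the general shape of a compute denoiser pass that runs
// after the temporal filter. Buffer names are hypothetical.
#include <cuda_runtime.h>

__global__ void spatialDenoise(const float3* noisy,  // reflections after temporal filtering
                               const float*  depth,  // linear depth per pixel
                               float3*       out,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    float  centerDepth = depth[idx];
    float3 sum  = make_float3(0.f, 0.f, 0.f);
    float  wSum = 0.f;

    // 5x5 neighbourhood, weighted by depth similarity so the blur does not
    // smear reflections across geometric edges.
    for (int dy = -2; dy <= 2; ++dy)
        for (int dx = -2; dx <= 2; ++dx) {
            int nx = min(max(x + dx, 0), width  - 1);
            int ny = min(max(y + dy, 0), height - 1);
            int n  = ny * width + nx;

            float w = expf(-fabsf(depth[n] - centerDepth) * 50.0f); // falloff is arbitrary
            sum.x += noisy[n].x * w;
            sum.y += noisy[n].y * w;
            sum.z += noisy[n].z * w;
            wSum  += w;
        }
    out[idx] = make_float3(sum.x / wSum, sum.y / wSum, sum.z / wSum);
}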
 
I don't think BFV uses them; DICE said they used their own solution in compute.
Yeah but this thread isn't about BFV's implementation. I'm just referring to their existence in Turing in general.
 
Aren't the tensor cores a key component in their RTRT solution for de-noising?

BFV does not use tensors here - they implemented their own denoising in compute. (Why?)
But does anyone know if Port Royal uses NV's denoisers?

Personally I would say upscaling alone does not justify dedicated hardware. It is not that much work to do and does not need to be perfect. (However, I am just not interested in high res at all - I can't see a difference between 4K and 2K+ with my blurry eyes :) )
But if it can do upscaling, RT denoising, and maybe motion tracking for cheap VR to save a lot of expensive sensors...?

And by next-next-gen, other applications may have been found as well. Synthesized speech, maybe - imagine that together with locomotion we could get rid of scripted and pre-animated characters completely... revolutionary! And it would reduce dev costs for thousands of NPCs.
It may also open new options for game design. Currently, controlling a human avatar with limited input devices adds a lot of restrictions. A third-person character that acts smarter but still feels responsive would be a huge win. Smarter enemies in any case.
AI could even bring entire new genres of games eventually. And that is what we need the most, with everything becoming quite a bit boring at the moment.
 
BFV does not use tensors here - they implemented their own denoising in compute. (Why?)
A possible theory is that there is too much overhead with ML right now, which is why we're waiting for DirectML to be released (and is also the reason DirectML was created in the first place).
All that additional overhead won't get you the results you need in the frame time required.
https://developer.nvidia.com/using-ai-slide

WinML And DirectML


For example, the inference portion of an object recognition AI would be what allows it to recognize objects that it already knows, while the training portion would be what allows it to recognize new objects. Having inference running locally means your PC is able to use an AI without having a connection to a cloud computing service. In the past, inference was too computationally intensive and had to be done on a cloud-computing farm, but with new GPUs and even SOCs integrating specialized hardware, local inference is becoming a possibility.



It is specifically because GPUs, both new and old, are very well suited for processing inference tasks that Microsoft created DirectML, an extension to the Direct3D graphics platform used by most Windows games. DirectML implements inference tasks with compute instructions that GPUs can understand, and is currently the faster choice for WinML tasks. For machines without GPUs, WinML can also send inference tasks to the CPU. To make even more efficient use of GPUs, however, Microsoft is working with GPU designers to implement inference instructions directly into their low-level programming interface.
https://www.tomshardware.co.uk/microsoft-windows-machine-learning-gaming,news-58087.html
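
"Compute instructions that GPUs can understand" in practice means things like packed FP16 math, which Vega-class parts such as Radeon VII run at double rate. A minimal sketch of the same fully-connected layer as above, written with CUDA's packed-half intrinsics as a stand-in, just to show what a compute-based (DirectML-style) inference path boils down to without any tensor hardware (hypothetical names again):

Code:
// Same fully-connected layer as the earlier sketch, but using packed-half math:
// two FP16 multiply-accumulates per instruction on hardware with double-rate FP16.
// CUDA intrinsics are used purely as a stand-in here (requires sm_53+).
#include <cuda_fp16.h>

__global__ void denseReluFp16(const __half2* __restrict__ weights, // [outDim x inDim/2], row-major pairs
                              const __half2* __restrict__ input,   // [inDim/2]
                              const float*   __restrict__ bias,    // [outDim]
                              float*         __restrict__ output,  // [outDim]
                              int inDimHalf2, int outDim)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= outDim) return;

    // Each __hfma2 does two FP16 multiply-accumulates at once.
    __half2 acc = __float2half2_rn(0.0f);
    for (int i = 0; i < inDimHalf2; ++i)
        acc = __hfma2(weights[o * inDimHalf2 + i], input[i], acc);

    // Fold the two packed partial sums together, add bias, apply ReLU.
    float sum = __low2float(acc) + __high2float(acc) + bias[o];
    output[o] = fmaxf(sum, 0.0f);
}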
 