If true, that makes DLSS as an upscaling solution even more silicon-inefficient.
To be clear, I am only talking about upscaling games, something DavidGraham seems unaware of the state of the art for, and something we know works wonderfully well in compute. So the case for DLSS on Tensor cores (the Turing implementation, as this thread is about Turing) is: can it upscale better and/or cheaper? That means weighing results against silicon costs. And if, as you say, DLSS can be performed on compute, then that needs to be factored into the comparison as well.
The major argument you're presenting is that Tensor cores are worth it anyway, and since they're in there, they can be used for upscaling. That may be a fair argument, but it doesn't prove that Tensor cores are an efficient solution for upscaling alone.
lol, from this perspective yes, I suppose you are correct. As a counterpoint I would bring in the analogy of dedicated hardware blocks, such as HEVC decoding/encoding vs. compute, or the SHAPE audio block vs. TrueAudio on compute. In both instances the choice was made to use dedicated silicon for a specific function so it runs faster or uses fewer resources than the compute-based method. I have no doubt that we could do HEVC decoding and amazing audio on compute, we know it works, but the hardware was created nonetheless, likely because the payoff of running these workloads on fixed-function hardware, given the available silicon, made a lot of sense, even for features that are not necessarily used all the time.
If I take that concept and carry it over to Tensor cores, and we assume for argument's sake that Tensor cores could support nothing but DLSS, then we're looking at them as fixed-function hardware whose main objective is to upscale and antialias, at the cost of silicon that could otherwise have gone to more compute, for instance.
So for the sake of discussion there are some assumptions I need to make. The first is that Temporal AA and Temporal Injection are similar enough in nature that their performance cost and output should be comparable. The second is that we take JHH's demo at face value and that he is not attempting to deceive the audience.
That being said, here we see TAA vs. DLSS. In summary, for those not watching the video, they present DLSS as having better upscaling quality and enough of a performance increase to be, from what I can see, at least a deviation or two better than what we're seeing with TAA.
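To put some rough numbers on what "a deviation or two" might mean, here's a back-of-the-envelope sketch. Every figure below is made up for illustration; none of them come from the demo or from measured data.

```python
# Hypothetical frame-time accounting for native rendering vs. upscaling.
# All numbers are invented placeholders, purely to show the kind of
# comparison being argued about (results vs. cost of the upscale pass).
native_4k_ms   = 33.0   # assumed cost to render a frame at native 4K
internal_ms    = 19.0   # assumed cost to render at a lower internal resolution
taa_upscale_ms = 3.0    # assumed compute-shader temporal upscale/AA pass
dlss_ms        = 1.5    # assumed Tensor-core DLSS pass

frames = {
    "native 4K":   native_4k_ms,
    "TAA upscale": internal_ms + taa_upscale_ms,
    "DLSS":        internal_ms + dlss_ms,
}
for name, ms in frames.items():
    print(f"{name:12s} {ms:5.1f} ms -> {1000.0 / ms:5.1f} fps")
```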
Now, I don't have enough data points to prove this, but there are other assumptions we need to make as well. One is that, because the Tensor cores are separate from compute, they can in theory run full-tilt without trashing anything on the compute side. With TAA, I have to assume there is some drawback on the pipeline, because you're asking compute to do everything from rendering to upscaling. There may also be implementation restrictions I'm not fully aware of, which could be one reason we are not yet seeing widespread adoption across all titles.
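If that "separate unit" assumption holds, the interesting part is what it does to the critical path. A crude way to model it, again with invented numbers: a compute-based TAA pass always adds its full cost to the frame, while a Tensor-core pass could partially overlap with other GPU work.

```python
# Crude critical-path model: serial compute TAA vs. a partially overlapped
# Tensor-core pass. The overlap fraction is a pure assumption; in reality it
# depends on scheduling, shared bandwidth/caches, and the engine integration.
render_ms = 19.0   # assumed per-frame raster/compute work
taa_ms    = 3.0    # compute TAA: fully serial with the rest of the frame
dlss_ms   = 1.5    # Tensor-core pass
overlap   = 0.7    # assumed fraction of the DLSS pass hidden behind other work

taa_frame  = render_ms + taa_ms
dlss_frame = render_ms + dlss_ms * (1.0 - overlap)
print(f"TAA  frame: {taa_frame:.1f} ms ({1000 / taa_frame:.1f} fps)")
print(f"DLSS frame: {dlss_frame:.1f} ms ({1000 / dlss_frame:.1f} fps)")
```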
That being said, if DLSS output is indeed better in quality and it can process faster, hence the increase in frame rate, the remaining question is how much silicon is being used to support the Tensor cores, which I don't know either. The question then becomes whether that space, filled with even more compute, could perform equally well here running DLSS or, say, TAA. We still need to consider things like architecture, being able to feed that compute, shared caches, etc. It's not as simple as just beefing up the compute numbers, whereas Tensor cores are a dedicated unit that doesn't necessarily need to interact with the compute environment, as far as I understand.

If we go back to the original example of dedicated HEVC and SHAPE: you can't just add another CU because you removed the dedicated blocks; Xbox can't go from 12 to 13. Perhaps you'd move to 16 compute units (2 groups of 8) and keep some redundancy to end up with 14 usable CUs. But perhaps you don't have space for that either, yet you do have enough space for say... a small dedicated block.
Quick throwback here:
But Xbox has 14 CUs and PS4 has 20 CUs. Even if you removed all the ESRAM, we're looking at fitting only 3 more pairs of CUs in there to get to 20. So I don't believe we can generalize too easily that swapping the dedicated blocks for more compute is straightforward.
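Just to make that area arithmetic explicit, using the figures above (14 CUs on Xbox, 20 on PS4, ESRAM worth roughly 3 pairs of CUs). These are ballpark numbers from the discussion, not measured die areas.

```python
# Ballpark area bookkeeping based on the figures quoted above; not measured
# die areas, just the arithmetic behind the "can't simply add CUs" point.
xbox_cus     = 14   # physical CUs on Xbox One
ps4_cus      = 20   # physical CUs on PS4
esram_as_cus = 6    # ESRAM footprint assumed worth ~3 pairs of CUs

print("Xbox CUs if ESRAM were swapped for compute:", xbox_cus + esram_as_cus)
print("PS4 CUs for comparison:                    ", ps4_cus)
```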
To round out the discussion, the Tensor cores also do more than just DLSS, and I think that's an important aspect as well. As we find more ways to leverage neural networks for games, having an AI accelerator makes a lot of sense. And, without derailing the thread too much, there is sufficient evidence that the industry is moving in this direction:
Machine Learning Graphics Intern:
https://www.indeed.com/viewjob?jk=0...g+in+Games&tk=1d18ti15d5ico803&from=web&vjs=3
Machine Learning Engineer at Embark, a new studio founded by many senior devs including Johan Andersson (repi), who posts here from time to time
Machine Learning Engineer
Machine Learning at Ubisoft MTL
https://jobs.smartrecruiters.com/Ubisoft2/105304105-machine-learning-specialist-data-scientist
Senior AI Software Engineer
https://ea.gr8people.com/index.gp?opportunityID=152736&method=cappportal.showJob&sysLayoutID=122
EA apparently has an entire AI/NN/ML division within SEED:
https://www.ibtimes.com/ea-e3-2017-ai-machine-learning-research-division-launches-2551067
This is a small list, but I expect it to continue to grow over the years. With so much being poured into this area of research, I am not opposed to dedicating silicon to this separate function for faster performance. Yes, compute can do it, but compute cannot do it faster than Tensor cores can, at least when it comes to running TensorFlow-based models.
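As a concrete (if narrow) illustration of that last point, here's a minimal sketch of the kind of comparison I mean: timing an FP32 matmul against an FP16 one, since FP16 matmuls are what get routed to the Tensor cores on Volta/Turing. It assumes TensorFlow 2.x on a CUDA-capable GPU, and the actual ratio will depend entirely on the hardware, driver, and matrix sizes.

```python
# Minimal micro-benchmark sketch: FP32 vs. FP16 matrix multiply in TensorFlow.
# On Volta/Turing GPUs, FP16 matmuls are eligible for Tensor Core execution,
# which is where the claimed speedup over plain compute would come from.
# Numbers are illustrative only; this is not a rigorous benchmark.
import time
import tensorflow as tf

def time_matmul(dtype, n=4096, iters=20):
    a = tf.random.normal((n, n), dtype=dtype)
    b = tf.random.normal((n, n), dtype=dtype)
    _ = tf.matmul(a, b).numpy()          # warm-up, and force completion
    start = time.time()
    for _ in range(iters):
        c = tf.matmul(a, b)
    _ = c.numpy()                        # block until the GPU finishes
    return (time.time() - start) / iters

fp32 = time_matmul(tf.float32)
fp16 = time_matmul(tf.float16)
print(f"FP32: {fp32 * 1000:.2f} ms   FP16: {fp16 * 1000:.2f} ms")
```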