NVidia Ada Speculation, Rumours and Discussion

Which Ampere GPU do you think is the proper comparison point for each Ada product?
Performance-wise you can, as I've said, make an argument that anything above the 3080 10GB can be compared to the 4080 12GB.

Price-wise it's pretty clear what should be compared to what, although most top-end 30 series cards can have sizeable discounts in practice right now, making these comparisons a bit moot.
 
The 4080 16GB has 16GB of VRAM and likely brings similarly sizeable generational improvements over any Ampere part.
The only card which looks "weak" is the 4080 12GB - but mostly because the entire Ampere lineup above the 3080 10GB was condensed into a ±20% performance band. Which makes 4080 12GB vs 3090 Ti performance comparisons about as relevant as 4080 12GB vs 3080 10GB, really.
Too many variables, plus the high prices (even though I understand why) make it possible to craft many narratives around the 4080s, most of which would be negative. Not so with the 4090. It’s all good, but of course it’s catering to a much smaller segment.

But I do see your point about the GA102-heavy Ampere lineup. It was unusual for sure, and was undoubtedly made possible by the cost advantage of the Samsung node. This primo TSMC silicon just stings.

We seriously need more materials scientists and solid state physicists rather than a gazillion freaking machine learning engineers. But that’s not where the glamor is.
 
It would be hardware-based, but you have to leverage it through the API. Similar to the two new hardware accelerators included in the RT cores that require particular SDKs to leverage.

Weird, why would it need NVAPI and not just DXR 1.0? Intel didn’t mention anything about proprietary extensions being required for their shader sorting. If that’s true then all of the most interesting Ada features (SER, DMM, OMM) are gated behind extensions and won’t see widespread usage.
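For context, the basic idea behind shader sorting can be illustrated without any vendor extension: bucket or sort rays by the material (or hit shader) they landed on, so that adjacent threads end up shading the same thing. The toy CUDA snippet below is only a conceptual sketch of that reordering step, not the NVAPI SER interface itself, and the material IDs are made-up placeholders.

```cuda
// Conceptual sketch of shader execution reordering: group rays by the
// material they hit so that adjacent threads run the same shading code.
// This is NOT the NVAPI/SER API, just the underlying idea.
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <cstdio>

int main() {
    const int num_rays = 8;
    // Hypothetical per-ray material IDs produced by traversal (divergent order).
    int host_material[num_rays] = {3, 0, 2, 0, 3, 1, 2, 1};

    thrust::device_vector<int> material(host_material, host_material + num_rays);
    thrust::device_vector<int> ray_index(num_rays);
    thrust::sequence(ray_index.begin(), ray_index.end());

    // Reorder ray indices by material ID so that a subsequent shading kernel
    // sees coherent warps (threads in a warp shading the same material).
    thrust::sort_by_key(material.begin(), material.end(), ray_index.begin());

    thrust::host_vector<int> sorted_rays = ray_index;
    thrust::host_vector<int> sorted_mats = material;
    for (int i = 0; i < num_rays; ++i)
        printf("slot %d -> ray %d (material %d)\n",
               i, (int)sorted_rays[i], (int)sorted_mats[i]);
    return 0;
}
```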

DLSS3 is cool tech but I don’t see myself using it especially if Reflex hardware is required. At lower frame rates the motion interpolation is probably noticeable.
 
Performance-wise you can, as I've said, make an argument that anything above the 3080 10GB can be compared to the 4080 12GB.

Price-wise it's pretty clear what should be compared to what, although most top-end 30 series cards can have sizeable discounts in practice right now, making these comparisons a bit moot.
3080 Ti to 4080 16GB (matching launch MSRP) is like a 25-30% performance increase. Not very substantial IMO. Only the 4090 provides what can be considered a generational improvement.
 
I think Ada is more like Ampere v2 according to the information disclosed so far, unlike the leak by Kopite (and he seems a bit confused now).
 
I think Ada is more like Ampere v2 according to the information disclosed so far, unlike the leak by Kopite (and he seems a bit confused now).
It's the same base architecture which Nvidia introduced with Volta really. So it's Volta v4 or at least Turing v3 from that perspective. The differences with Ampere are there but their impact on performance will only be apparent in benchmarks.
 
It's the same base architecture which Nvidia introduced with Volta really. So it's Volta v4 or at least Turing v3 from that perspective. The differences with Ampere are there but their impact on performance will only be apparent in benchmarks.
I mean, the only Hopper-derived technology inside Ada seems to be the 4th gen tensor core. We will know the truth when the white paper is released anyway.
 
Video frames do not have motion vectors in them.
Uhm, actually: Yes they do. That's one of the main ways modern video compression works.

Ada should be vastly more powerful than any TV video processor, though it remains to be seen if this game-frame interpolation tech brings any tangible benefit besides inflated PR numbers.
 
I mean, the only Hopper-derived technology inside Ada seems to be the 4th gen tensor core. We will know the truth when the white paper is released anyway.
What makes you think that Hopper is any different on the SM level? It doesn't have RT and has FP64 ALUs instead but that may be the extent of its difference with Lovelace.

Uhm, actually: Yes they do. That's one of the main ways modern video compression works.
These aren't exposed to the TV in any way and cannot be used for frame interpolation.
The big unknown with Ada frame generation is how it's done, since the current frame presumably has motion vector data accumulated over the previous several frames for DLSS. Theoretically, these could be used for frame generation predictively instead of interpolating between two rendered frames.
Again we need proper testing.

I also thought that it may be interesting to use this motion prediction to help the DLSS AI in composing the reconstructed frame instead of generating a new one. A "temporal DLAA" if you will, improving DLSS reconstruction through OF prediction but without any performance gains (if we can even call frame generation that).
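To make the "predictive" option concrete: assuming a per-pixel motion vector buffer (in pixels per frame) is available alongside the current frame, extrapolation could look roughly like the CUDA sketch below. Purely illustrative with hypothetical buffers; it ignores disocclusions, shading changes and the fact that a motion vector really belongs to its source pixel, which is presumably where the optical flow hardware and the DLSS 3 network earn their keep.

```cuda
// Minimal, illustrative kernel: extrapolate the next frame from the current
// frame using per-pixel motion vectors instead of interpolating between two
// finished frames. Hypothetical buffers; no disocclusion handling.
#include <cuda_runtime.h>

struct MotionVec { float x, y; };   // motion in pixels per frame

__global__ void extrapolate_frame(const uchar4* current, const MotionVec* mv,
                                  uchar4* predicted, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;

    // Approximate backward warp: fetch the pixel that would move here if the
    // motion recorded for this location continued for one more frame.
    int sx = min(max(int(x - mv[idx].x + 0.5f), 0), width  - 1);
    int sy = min(max(int(y - mv[idx].y + 0.5f), 0), height - 1);

    predicted[idx] = current[sy * width + sx];
}

int main()
{
    const int W = 640, H = 360;
    uchar4 *d_cur, *d_pred;
    MotionVec *d_mv;
    cudaMalloc(&d_cur,  W * H * sizeof(uchar4));
    cudaMalloc(&d_pred, W * H * sizeof(uchar4));
    cudaMalloc(&d_mv,   W * H * sizeof(MotionVec));
    cudaMemset(d_cur, 0, W * H * sizeof(uchar4));     // placeholder frame
    cudaMemset(d_mv,  0, W * H * sizeof(MotionVec));  // placeholder vectors

    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    extrapolate_frame<<<grid, block>>>(d_cur, d_mv, d_pred, W, H);
    cudaDeviceSynchronize();

    cudaFree(d_cur); cudaFree(d_pred); cudaFree(d_mv);
    return 0;
}
```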
 
Weird, why would it need NVAPI and not just DXR 1.0? Intel didn’t mention anything about proprietary extensions being required for their shader sorting. If that’s true then all of the most interesting Ada features (SER, DMM, OMM) are gated behind extensions and won’t see widespread usage.
Unused hardware bits are normal ...

Tessellation (nearly no modern benchmark suite uses it today), Mesh Shaders (some vendors implement a separate geometry pipeline for it), programmable blending/geometry shaders (performance problems), other VR accelerated rendering features (even if high-end PC VR is clearly dead), and many more ...

Ampere introduced HW accelerated ray traced motion blur too even though the vast majority of developers are going to implement the effect as some post-process pass with motion vectors and call it a day. Hardware VRS might not even matter in the future since it has less utility in a deferred renderer and it can't be used with compute shaders either ...

Who are we to judge IHVs (some of which pride themselves more than others on excess hardware), since they're their own experts who believe that these features give them a competitive advantage?
 
Tensor Memory Accelerator, Distributed Shared Memory and Thread Block Cluster (or CPC).
The fact that Nvidia highlights some set of features during their marketing unveil of gaming h/w doesn't mean that this h/w doesn't have the other set of features too - they may just be mostly irrelevant for gaming markets.

This should be more apparent in CUDA feature levels I think.
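For what it's worth, the coarsest "feature level" signal is the compute capability the CUDA runtime reports: GA100 is 8.0, consumer Ampere (GA10x) is 8.6, Ada (AD10x) is 8.9 and Hopper (GH100) is 9.0. A minimal query via cudaGetDeviceProperties looks like this; the architecture-name mapping is only for illustration.

```cuda
// Query the compute capability of each visible GPU via the CUDA runtime and
// map a few known (major, minor) pairs to architecture names.
#include <cuda_runtime.h>
#include <cstdio>

static const char* arch_name(int major, int minor)
{
    if (major == 8 && minor == 0) return "Ampere (GA100)";
    if (major == 8 && minor == 6) return "Ampere (GA10x)";
    if (major == 8 && minor == 9) return "Ada Lovelace (AD10x)";
    if (major == 9 && minor == 0) return "Hopper (GH100)";
    return "unknown/other";
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA devices found.\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d (%s)\n",
               dev, prop.name, prop.major, prop.minor,
               arch_name(prop.major, prop.minor));
    }
    return 0;
}
```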
 
Because they have the same launch MSRP. Why would you compare it to a GPU that it costs 70% more than?

Cost went up a lot in that range; however, talking hardware progression, from the 3080 launched in 2020 to the 4080 16GB in 2022 it's quite the improvement, even in raster. The 4080 12GB is, according to many, not a 4080 but a 4070.
 
These aren't exposed to the TV in any way and cannot be used for frame interpolation.
Yeah, no. That is exactly what they do. The decoder uses the motion information between blocks of the encoded frames to interpolate/dream up new frames to put in between them.
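To unpack what "motion information between blocks" means mechanically: in a typical codec, each block of a predicted frame carries a motion vector pointing into a reference frame, and the decoder rebuilds the block by copying pixels from there (then correcting them with a residual). A stripped-down CUDA sketch of just that motion compensation step, with hypothetical buffer names and none of the residual or sub-pixel machinery, might look like this.

```cuda
// Stripped-down block-based motion compensation: each 16x16 block of the
// predicted frame is fetched from the reference frame at an offset given by
// that block's motion vector. Real codecs add residuals, sub-pixel filtering
// and multiple reference frames on top of this.
#include <cuda_runtime.h>

struct BlockMV { int dx, dy; };   // integer-pel motion vector per 16x16 block

__global__ void motion_compensate(const unsigned char* reference,
                                  const BlockMV* mvs,
                                  unsigned char* predicted,
                                  int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int blocks_per_row = (width + 15) / 16;
    BlockMV mv = mvs[(y / 16) * blocks_per_row + (x / 16)];

    // Clamp so blocks near the frame edge stay inside the reference frame.
    int sx = min(max(x + mv.dx, 0), width - 1);
    int sy = min(max(y + mv.dy, 0), height - 1);

    predicted[y * width + x] = reference[sy * width + sx];
}

int main()
{
    const int W = 1280, H = 720;
    const int num_blocks = ((W + 15) / 16) * ((H + 15) / 16);
    unsigned char *d_ref, *d_pred;
    BlockMV *d_mvs;
    cudaMalloc(&d_ref,  W * H);
    cudaMalloc(&d_pred, W * H);
    cudaMalloc(&d_mvs,  num_blocks * sizeof(BlockMV));
    cudaMemset(d_ref, 0, W * H);                         // placeholder reference
    cudaMemset(d_mvs, 0, num_blocks * sizeof(BlockMV));  // zero motion for demo

    dim3 block(16, 16);
    dim3 grid((W + 15) / 16, (H + 15) / 16);
    motion_compensate<<<grid, block>>>(d_ref, d_mvs, d_pred, W, H);
    cudaDeviceSynchronize();

    cudaFree(d_ref); cudaFree(d_pred); cudaFree(d_mvs);
    return 0;
}
```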
 
For me, DLSS is a stopgap until RT performance catches up. No matter the setting, the drop in IQ when using DLSS is obvious in pictures and even more so in motion.

In the same way I like source accuracy in my video hobby, I want native image quality in my games.

So the areas I’ll pay attention to are rasterisation bump over my 3090 and then RT bump in CP2077 Psycho mode. Both without DLSS.
 
This new fake frame thing will also cause weird and unique situations.
Imagine Flight Simulator's super CPU intensive regions, such as sitting in a Boeing cockpit at JFK Airport in NYC. This combination, from what I remember, becomes CPU bound near 60-70 FPS on a 12900K. Now, in that region with ultra settings, the 3090 Ti is almost able to get 4K/60.
Now the 4080 will suddenly provide 120+ FPS in that place (fake, predicted or whatever it is).
That kind of performance will not be replicable with anything from AMD. Even if AMD pushes 5x the power, they still won't be able to budge above 70 FPS in that location due to the huge CPU limitation.
This is also apparent with their Spider-Man video. Supposedly they're getting 200 FPS or so with DLSS + ray tracing. Even the mightiest 12900K chokes around 90-100 in the super CPU intensive areas of the game. This creates the problem where, even if AMD delivers a super duper ray tracing/raster powered GPU, once they get into that very high refresh rate/frame rate territory they will appear to lag behind due to huge CPU limitations.
I gather either AMD will have to invent this gimmick for themselves, or they will have to rely and bank on it being a flop for end users. If it does not end up a flop and most people like, accept and embrace it, then it will be really troublesome for AMD, sadly.
If it ends up being really good and really usable, it will also create a weird situation for reviewers, I'd assume.
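Back-of-the-envelope, the reason frame generation sidesteps a CPU limit is simply that the displayed rate becomes roughly twice whatever the slower of the CPU and GPU can sustain, since every rendered frame gets one generated frame placed after it. A trivial sketch with made-up numbers:

```cuda
// Back-of-the-envelope for the CPU-bound case above. Illustrative numbers only.
#include <cstdio>
#include <algorithm>

int main()
{
    float cpu_fps = 65.0f;   // e.g. a heavily CPU-bound Flight Simulator scene
    float gpu_fps = 140.0f;  // hypothetical GPU-limited rate at the same settings

    float rendered  = std::min(cpu_fps, gpu_fps); // traditional pipeline
    float displayed = 2.0f * rendered;            // one generated frame per rendered frame

    printf("Rendered frames/s: %.0f\n", rendered);
    printf("Displayed frames/s with frame generation: %.0f\n", displayed);
    return 0;
}
```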
 
Yeah, no. That is exactly what they do. The decoder uses the motion information between blocks of the encoded frames to interpolate/dream up new frames to put in between them.
They do nothing of the sort. The decoder sends uncompressed video to the screen at its native fps. Otherwise how would frame interpolation even work with games and a PC signal?
 