Nvidia GeForce RTX 40x0 rumors and speculation

I'm not sure what results you're expecting? It's 10% more SMs with the same power limit. There are also going to be a lot of tests, especially at 1080p, that were already at diminishing returns even with the 4070 Ti, likely due to stalling outside of the GPU.

This was always likely going to be the situation -

4070s - largest performance gain

4070ti s - VRAM

4080s - price drop
People saw that it's the same chip as in the 4080 and assumed that it would be the same performance, even though what really matters is the configuration of the chip.
Also, the years-long whine about the "lack" of 12GB of VRAM meant that everyone was expecting something big from 16GB, while in reality 12GB is fine and enough.
 
Computerbase has R&C numbers.
8.6% diff at 1080p with ray tracing, skyrocketing to 44.5% at 4K. It looks like DLSS Quality is used at both resolutions.

So yeah, the extra 4GB can certainly make a huge difference under the right circumstances.

Great find! And yes, exactly what I would have expected. Interesting that it's the only game showing such a big difference though.
 
People saw that it's the same chip as in the 4080 and assumed that it would be the same performance, even though what really matters is the configuration of the chip.
Also, the years-long whine about the "lack" of 12GB of VRAM meant that everyone was expecting something big from 16GB, while in reality 12GB is fine and enough.
I have to agree as far as the current games are concerned. You have to really make an effort to find cases where 12GB is the problem.

But to be fair longevity or expected future disadvantage has always been a part of that argument.
 
But to be fair longevity or expected future disadvantage has always been a part of that argument.
Sure, but here you should also consider the level of performance the card nets you at the moment. It is not a given that 4070-class hardware will be able to run games which require >12GB at an acceptable performance level, and once you start to turn settings down you also lower the VRAM usage.

Anyway, 16GB is the biggest advantage the 4070 Ti Super has over the 4070 Ti. Getting this for the same price, even with a minor performance bump, will be attractive to many pre-Ada owners I'm sure.

An illustration to the point above:

[Image: rt-alan-wake-2-3840-2160.png — Alan Wake 2 ray tracing benchmark at 3840x2160]


Technically you see the card doing several times better than the 12GB models, but in practice you're getting <30 fps.
 
BIOS issues perhaps skewing some cards negatively aside, what these results indicate to me is that they basically validate Nvidia's approach, at least performance-wise, with regard to Ada's bus architecture. That is, a narrow bus plus cache is indeed an engineering optimization that can adequately compete against a wider bus with significantly more raw bandwidth.

The biggest advantage the move to a 256-bit bus brings is apparently VRAM capacity. Before these results, though, I would have expected the Ti Super to show some of the biggest improvements, simply because it's being strangled by its paltry 192-bit bus: even with the meagre shader count increase, the GPU must be thirsty for more bandwidth, and when GPU-limited we should see something closer to the ~30% increase in bandwidth reflected in actual game performance. Welp, not really, even at 4K.
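The ~30% bandwidth figure can be sanity-checked with back-of-envelope math; the bus widths and the 21 Gbps GDDR6X data rate are both cards' published specs:

```python
# Peak GDDR bandwidth: (bus width in bits / 8) bytes per transfer * data rate
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

ti = peak_bandwidth_gbs(192, 21.0)        # 4070 Ti: 192-bit GDDR6X @ 21 Gbps
ti_super = peak_bandwidth_gbs(256, 21.0)  # 4070 Ti Super: 256-bit @ 21 Gbps
print(f"{ti:.0f} GB/s -> {ti_super:.0f} GB/s (+{(ti_super / ti - 1) * 100:.1f}%)")
# 504 GB/s -> 672 GB/s (+33.3%)
```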
 
I have to agree as far as the current games are concerned. You have to really make an effort to find cases where 12GB is the problem.

But to be fair longevity or expected future disadvantage has always been a part of that argument.
I made no effort and found problems with 12GB on my 4070 😅
 
I made no effort and found problems with 12GB on my 4070 😅

I think you might have got a bit unlucky with game selection there. Personally I've only run into one noticeable issue, which was with R&C, and that was easily solved by dropping the texture setting down a (personally unnoticeable) notch. Outside of that, the only game I've seen reference running out of memory is the RE4 demo, but it didn't seem to impact the gameplay in any way I could notice.

I am only running at 3840x1600 though which gives me a fair bit more leeway vs full 4k.
 
I think you might have got a bit unlucky with game selection there. Personally I've only run into one noticeable issue, which was with R&C, and that was easily solved by dropping the texture setting down a (personally unnoticeable) notch. Outside of that, the only game I've seen reference running out of memory is the RE4 demo, but it didn't seem to impact the gameplay in any way I could notice.

I am only running at 3840x1600 though which gives me a fair bit more leeway vs full 4k.
I was playing at 1080p in Hogwarts and 1440p in Witcher 3.
 
I was playing at 1080p in Hogwarts and 1440p in Witcher 3.

Witcher 3? What issues have you seen there? I'm maxing that out at the above resolution with no issue on the GPU side that I've noticed (at DLSS Balanced if I recall). CPU is the big issue in that game in my experience.
 
Witcher 3? What issues have you seen there? I'm maxing that out at the above resolution with no issue on the GPU side that I've noticed (at DLSS Balanced if I recall). CPU is the big issue in that game in my experience.
1440p DLSS Quality in Witcher 3. I had everything at max and frame gen on. It gets slow after a while if I don't turn textures down, but it takes quite a while, so it's not a big deal.

Hogwarts was worse. 1080p max settings using DLAA and frame gen. I modded DLSS to run at 100% resolution because you can't select DLAA and frame gen at the same time in this game for some reason. After only a few minutes I'd get lower performance unless I set textures to High or lowered DLSS to 90% resolution. This was very clearly a VRAM limitation. I could deal with it, but I just don't see this getting better with newer games.

But when I said I made no effort I'd forgotten I modded DLSS to be native res. That probably qualifies as making an effort :LOL:
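For what it's worth, the internal-resolution effect reported above has a simple geometric side to it: render-target memory scales with the square of the resolution scale, so a 90% DLSS setting cuts those buffers to 81% of their native size. The target count and bytes-per-pixel below are made-up round numbers for illustration, not measured values from either game:

```python
# Rough sketch: per-frame render-target memory scales with the square of the
# internal resolution scale. targets/bytes_per_px are illustrative guesses.
def render_targets_mb(width: int, height: int, scale: float = 1.0,
                      targets: int = 8, bytes_per_px: int = 8) -> float:
    w, h = int(width * scale), int(height * scale)
    return w * h * targets * bytes_per_px / 2**20

native = render_targets_mb(1920, 1080, 1.0)  # DLAA / forced 100% scale
ninety = render_targets_mb(1920, 1080, 0.9)  # DLSS dropped to 90%
print(f"100%: {native:.0f} MB, 90%: {ninety:.0f} MB ({ninety / native:.0%})")
```

The absolute numbers here are small next to a game's texture pool, which fits the observation that dropping textures a notch helps at least as much as lowering the internal resolution.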
 
Anyone know what's wrong with the Ventus 3x 4070Ti Super? I see the problem mentioned a lot but nobody seems to have any clue why it's slow and the updates MSI has pushed out don't seem to fix it.
 
Based on reviews I've seen, the 4070 Ti Super does indeed have 64MB of L2. (y)

Also, the MSI Ventus 3X 4070 Ti Super is possibly messed up or something, according to Techspot. Up to 5% slower than other cards.
Hardware Unboxed's review has been corrected to say it has 48MB of L2, but at this point I'm not sure at all.
 
Anyone know what's wrong with the Ventus 3x 4070Ti Super? I see the problem mentioned a lot but nobody seems to have any clue why it's slow and the updates MSI has pushed out don't seem to fix it.
The BIOS was not optimized ...

BIOS Update for Enhanced Performance MSI GeForce RTX 4070 Ti SUPER Series Graphics Cards

As mentioned in our review, following an in-depth analysis by our Research and Development department, it was identified that the GeForce RTX 4070 Ti SUPER 16G VENTUS 3X graphics card was not operating at its maximum capability. To address this, MSI have developed a new BIOS version (95.03.45.40.F0) that has been rigorously fine-tuned to enhance the performance of the GeForce RTX 4070 Ti SUPER 16G VENTUS 3X graphics card.
 
Anyone know what's wrong with the Ventus 3x 4070Ti Super? I see the problem mentioned a lot but nobody seems to have any clue why it's slow and the updates MSI has pushed out don't seem to fix it.
Botched BIOS, fixed ones are already out
 
The biggest advantage the move to a 256-bit bus brings is apparently VRAM capacity.
Doesn't have to be that way though; my Cayman card (HD 6950) had 2GB of VRAM on a 256-bit bus.
PS: the question I wanted to ask: I have a 4070 Ti Super and am currently running it off a single PCIe cable with 2x 8-pin plugs and the supplied 2x 8-pin to 12VHPWR adapter (my PSU is modular and only has one PCIe cable plugged into it at the moment; I have another one in the PSU box, but because I'm moving soon everything is packed away in boxes and getting to it would be a pain). What are the downsides of doing what I'm doing?
 
Doesn't have to be that way though; my Cayman card (HD 6950) had 2GB of VRAM on a 256-bit bus.

Doesn't have to be, as the 16GB 4060 Ti also shows, but the reality of Nvidia's product segmentation indicates that widening the bus was the only way you were going to get more than 12GB in this price class currently. The only other way would be to go clamshell and bump it up to 24GB, which was obviously never in the cards as that would throw the whole stack into disarray.
 
BIOS issues perhaps skewing some cards negatively aside, what these results indicate to me is that they basically validate Nvidia's approach, at least performance-wise, with regard to Ada's bus architecture. That is, a narrow bus plus cache is indeed an engineering optimization that can adequately compete against a wider bus with significantly more raw bandwidth.

I don't think that should be the takeaway here. We have to assume in general, as a matter of common sense, that these GPUs are designed in such a way that the memory subsystem is not a significant limitation for their targeted workloads, and really this applies to all other aspects as well. It's not that the L2 cache size itself is doing anything special, only that the 4070 Ti's entire memory subsystem is ample enough to feed the rest of the hardware, at least for typical workloads (unlike, say, specialty workloads such as crypto mining that are very memory bound and would likely be much faster on the 4070 Ti Super).

In general I don't feel we can adequately judge the merits of more cache vs. bus/memory speed, since we have access to very few details on the variables. We'd need to actually compare two samples, one with more bus/memory speed and one with more cache. Even then we'd need to know details about the cost variables. Just spitballing here: hypothetically, if AD104 were 256-bit with "regular" cache, and let's say it performed the same as the current AD104 but at the expense of more power, would that be a better or worse design? Imagine an RTX 4070/4070 Ti with 16GB but requiring more power vs. the current, more efficient 12GB 4070/4070 Ti. Which would you pick?
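One way to frame the cache-vs-bus trade-off above: in a simple first-order model, every request that hits in L2 never touches DRAM, so the bandwidth the SMs observe is the DRAM bandwidth amplified by 1/(1 - hit rate). The hit rates below are purely hypothetical, just to show how a large L2 can stand in for bus width:

```python
def effective_bandwidth_gbs(dram_bw_gbs: float, l2_hit_rate: float) -> float:
    """First-order model: requests that hit in L2 never reach DRAM,
    so observed bandwidth is DRAM bandwidth divided by the miss rate."""
    assert 0.0 <= l2_hit_rate < 1.0
    return dram_bw_gbs / (1.0 - l2_hit_rate)

# The 4070 Ti's 504 GB/s of DRAM bandwidth at hypothetical L2 hit rates
for hit in (0.0, 0.25, 0.5):
    print(f"hit rate {hit:.0%}: {effective_bandwidth_gbs(504, hit):.0f} GB/s")
```

This also illustrates why the benefit is workload-dependent: a streaming workload with a near-zero hit rate (like crypto mining) sees only the raw DRAM figure, while a cache-friendly rendering workload sees much more.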
 
Something else I've been wondering about for a while: given how much larger and more complex modern GPUs are (well, for some time now), whether there are caveats to disabling large unit structures that aren't readily apparent, perhaps even hidden or accounted for by the driver in some cases.

For instance, with the 4070 Ti Super, are they always disabling at least one entire GPC? If a large amount of L2 is also lost, does any of that affect access to all eight PHYs and the other subunits making up the 256-bit memory bus, compared to how the 4080 is configured?

I know there have been some interesting findings with microbenchmarks in the past (sometimes with some disclosure from Nvidia) showing interactions between certain subunits that ended up limiting effective throughput if one was cut even when the other was not. For example, back when GPCs and ROPs were separated, I believe cutting GPCs ended up limiting effective fill rate for shaded pixels even if the ROP count was identical.
 