Nvidia GeForce RTX 40x0 rumors and speculation

So roughly halfway between a 3060 and 3060Ti in SM count, and only 8GB VRAM.

Looks like another huge disappointment from Nvidia if true.
It is 128bit, so it must be 8GB. Otherwise it ends up with more memory than the 4070Ti.

So 4060Ti has 4GB less memory than the 3060, as will the regular 4060. It's also a significant downgrade in bandwidth vs both the 3060Ti and 3060. And it will certainly be $500+. Please people don't buy this shit.
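
For a rough sense of the gap, here's a back-of-the-envelope bandwidth comparison. The 3060/3060 Ti rows use their shipping specs; the 4060 Ti row assumes the rumored 128-bit bus with 18 Gbps GDDR6, which is pure speculation at this point:

Code:
# Memory bandwidth = bus width in bytes * effective data rate (GT/s) = GB/s.
# The 4060 Ti entry assumes the rumored 128-bit bus and 18 Gbps GDDR6.
cards = {
    "RTX 3060":            (192, 15.0),  # (bus width in bits, data rate in Gbps)
    "RTX 3060 Ti":         (256, 14.0),
    "RTX 4060 Ti (rumor)": (128, 18.0),
}
for name, (bus_bits, gbps) in cards.items():
    bandwidth_gb_s = bus_bits / 8 * gbps
    print(f"{name:22} {bandwidth_gb_s:6.0f} GB/s")
# -> 360 GB/s, 448 GB/s, 288 GB/s respectively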

In some thread a few weeks ago people were trying to convince me that memory bus doesn't matter as much any more because dies are more expensive and have more cache and stuff. I started to come around, but now I'm fully back to thinking that argument is fuckin crazy. AD102 having 384bit is proof enough that it still matters just as much as it always has, and NVIDIA is giving us the shaft on everything below the 4090. Or shafting us harder on everything below the 4090. The size of the shaft increases as you move down the product stack.
 
Maybe Nvidia doesn't think it needs real performance anymore. If we can generate a bunch of extra frames between real ones and make sure everything is using DLSS3, then we only need 4050s and get 200fps at 4K!
 
Maybe Nvidia doesn't think it needs real performance anymore. If we can generate a bunch of extra frames between real ones and make sure everything is using DLSS3, then we only need 4050s and get 200fps at 4K!
It's subtler than that: NVidia can spend transistors on RT acceleration instead of math, and still get gen-on-gen gains for "professional rendering", i.e., brute force path tracing.
 
Maybe Nvidia doesn't think it needs real performance anymore. If we can generate a bunch of extra frames between real ones and make sure everything is using DLSS3, then we only need 4050s and get 200fps at 4K!
I know you're being facetious and I have no idea if that's where Nvidia is headed. But I want to entertain your thought seriously because it may not be such an absurd vision.

At the end of the day all of real time CG is a hacky simulation attempting to convey enough information for our brains to interpret the scene as some facsimile of reality. Multiple factors contribute to the fidelity of this simulation, but the two broad buckets that are relevant to this discussion are (1) fidelity of the physical model (i.e., lighting, materials, geometry) and (2) display sampling rate (i.e., resolution and fps).

Let's stick to (2) for now. What's funny is that a lot of the desire for higher sampling is due to the horrible nature of sample-and-hold LCD/OLED displays. CRTs were much less affected by this because they would sample a point for a tiny fraction of the refresh window and our brains would run the "DLSS3" frame generation to fill in the gaps. This led to a much smoother perception of motion than LCDs, which keep displaying the same old sample for the entire duration of the refresh cycle, leading to a jarring visual shock when it instantly refreshes to the next sample. Black-frame insertion in LCDs makes a weak attempt to approach CRT behavior but loses brightness (because the screen is off half the time). CRTs did not have this problem because each sample was insanely bright (which means they were probably much more strenuous on the eyes than LCDs).

(This was a purely temporal argument, but the spatial argument is somewhat analogous).
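
To put rough numbers on the temporal point: when your eye tracks a moving object on a sample-and-hold display, the perceived smear is roughly tracking speed times pixel persistence. A quick sketch, with ballpark persistence values assumed purely for illustration:

Code:
# Perceived smear on an eye-tracked object is roughly speed (px/s) * persistence (s).
# Persistence values below are ballpark assumptions, not measurements.
speed_px_per_s = 960          # e.g. an object crossing a 1920 px screen in 2 s
displays = {
    "60 Hz sample-and-hold LCD/OLED": 1 / 60,    # lit for the full 16.7 ms frame
    "120 Hz sample-and-hold":         1 / 120,
    "LCD with black-frame insertion": 0.5 / 60,  # lit ~half the frame, half the brightness
    "CRT-like impulse display":       0.001,     # ~1 ms phosphor flash (assumed)
}
for name, persistence_s in displays.items():
    smear_px = speed_px_per_s * persistence_s
    print(f"{name:32} ~{smear_px:5.1f} px of smear")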

I believe that frame-generation is effectively making up for the sample-and-hold display's faults by simulating the brain's reconstruction behavior in code. Recent AI work has shown us that deep-learning models are actually excellent at solving specific perception problems that the human brain is adept at. I think motion interpolation falls into that category of problems. How well DLSS3's current implementation works is a separate discussion. But on a fundamental level I don't think there's anything wrong with the approach. Today we are just generating 1 frame. I'm hopeful that some day we are able to reconstruct from 60Hz to 360Hz or more.

All of this is especially important because transistors are getting costlier and so simply attempting to increase raw sample rate will get commensurately costlier. That's not to say that frame-generation comes for free -- you need transistors for that too! However, it seems to scale much better than raw rendering power, especially given that it cuts down on CPU cycles too. If it works well, it can lead to more cost effective GPUs (which we sorely need). Conversely, for high-end GPUs, it frees up transistors for higher modeling fidelity (lighting, materials) instead of chasing dumb sample rates.
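
As a heavily simplified illustration of the principle, here's a toy interpolator that backward-warps the previous frame part-way along per-pixel motion vectors to synthesize an in-between frame. This is not how DLSS3 actually works (that uses dedicated optical-flow hardware plus a neural network); it's just a sketch of "generate samples instead of rendering them":

Code:
# Toy frame interpolation: backward-warp the previous frame part-way along
# per-pixel motion vectors to synthesize an in-between frame. Only a sketch
# of the general principle; real frame generation is far more involved.
import numpy as np

def interpolate_frame(prev_frame, motion, t=0.5):
    """prev_frame: (H, W, 3) image; motion: (H, W, 2) per-pixel motion in
    pixels from this frame to the next; t: fraction of the way to the next frame."""
    h, w, _ = prev_frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # For each output pixel, sample the previous frame at the position the
    # content came from (nearest-neighbour, clamped at the borders).
    src_x = np.clip(np.round(xs - t * motion[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - t * motion[..., 1]).astype(int), 0, h - 1)
    return prev_frame[src_y, src_x]

# Example: a bright column at x=1 moving 2 px/frame to the right should show
# up at x=2 in the generated midpoint frame.
frame = np.zeros((4, 4, 3)); frame[:, 1] = 1.0
flow = np.zeros((4, 4, 2)); flow[..., 0] = 2.0
print(interpolate_frame(frame, flow)[0, :, 0])   # [0. 0. 1. 0.]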
 
instead of chasing dumb sample rates.
That's how nature actually works too. Our visual sampling rate is not fixed; it's smart and it adjusts the rate when needed.

The highest sampling resolution is only at the center of the eye; the periphery of the eye is low resolution and mostly detects black and white, and motion.

The optic nerve compresses the data and applies post-processing to the image before sending it to the brain.

The brain then combines images, fills in the missing pieces (of the blind spots and the low-resolution areas), does lots of temporal reconstruction and reconstruction from visual memory, and then constructs a final high-resolution "mental" image.

Real-time CGI should strive to copy biology by doing things smartly, instead of going in the opposite direction.
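
We already do a bit of this with foveated rendering / variable rate shading: full shading rate near the gaze point, coarser toward the periphery. A toy sketch of that allocation idea; the radii and rates are arbitrary numbers made up for illustration:

Code:
# Toy foveated shading-rate assignment: full rate near the gaze point,
# coarser toward the periphery. Thresholds are arbitrary, for illustration.
import math

def shading_rate(pixel, gaze, full_radius=200, half_radius=500):
    """Return a coarseness level (pixels per shading sample along each axis)
    for a pixel given the current gaze position, both in screen pixels."""
    dist = math.dist(pixel, gaze)
    if dist < full_radius:
        return "1x1"   # full rate in the "fovea"
    elif dist < half_radius:
        return "2x2"   # a quarter of the shading work
    return "4x4"       # periphery: 1/16th of the shading work

print(shading_rate((960, 540), (1000, 500)))   # near gaze -> 1x1
print(shading_rate((100, 100), (1000, 500)))   # periphery -> 4x4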
 
The optic nerve compresses the data and applies post-processing to the image before sending it to the brain.
Is there a good resource for non-neurologists on how our eyesight works? I don't believe we can talk about images before the brain.
 
So at least the mobile 4050 is looking like 96-bit.
If the naming relation remains the same in the low end, we're probably looking at a desktop 4050 with a 128-bit bus and 8GB.
Which will be a bit weird because it seems that the desktop 4060 will have the same memory config. An L2 size cut on the 4050, maybe?
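
The capacity follows directly from the bus width if you assume one of today's common 16Gb (2GB) GDDR6/G6X modules per 32-bit channel and no clamshell doubling (both are assumptions):

Code:
# VRAM capacity from bus width, assuming one 16Gb (2GB) module per 32-bit
# channel and no clamshell doubling (assumptions).
def vram_gb(bus_width_bits, gb_per_module=2, clamshell=False):
    channels = bus_width_bits // 32
    return channels * gb_per_module * (2 if clamshell else 1)

print(vram_gb(96))                   # mobile 4050 rumor: 96-bit  -> 6 GB
print(vram_gb(128))                  # desktop 4050/4060 rumor: 128-bit -> 8 GB
print(vram_gb(128, clamshell=True))  # clamshell would double that to 16 GB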
 
If the naming relation remains the same in the low end, we're probably looking at a desktop 4050 with a 128-bit bus and 8GB.
Which will be a bit weird because it seems that the desktop 4060 will have the same memory config. An L2 size cut on the 4050, maybe?
Would that disable some ROPs as well? I don't know if it still works like that.
 
Would that disable some ROPs as well? I don't know if it still works like that.
Not necessarily. AD102 in 4090 already has its L2 cut by 25% in comparison to the full AD102 spec.
They may do something similar here but this time in h/w to make AD107 cheaper in comparison to AD106.
Another obvious option is to use slower VRAM - G6 on 4050 instead of G6X on 4060 for example.
 
Not necessarily. AD102 in 4090 already has its L2 cut by 25% in comparison to the full AD102 spec.
They may do something similar here but this time in h/w to make AD107 cheaper in comparison to AD106.
Another obvious option is to use slower VRAM - G6 on 4050 instead of G6X on 4060 for example.
But the 4090 also has some ROPs disabled right? I'm just wondering if it's now possible to disable L2 independently of ROPs. I guess it is possible to disable ROPs and L2 without ending up with a GTX970 situation or we would have heard about it.
 
But the 4090 also has some ROPs disabled right?
1 GPC is disabled on the 4090, which means 16 fewer ROPs in addition to the removed SMs, etc.

I'm just wondering if it's now possible to disable L2 independently of ROPs.
It has been for some time actually, even before ROPs were moved to GPCs - 970, anyone?
But here it is also possible to just cut down the size of each L2 partition because the size of L2 allows this in Lovelace.
 