NVidia Ada Speculation, Rumours and Discussion

AD106 looks like it's only going to be about as fast as the 3070 in rasterisation. I doubt that will cut it against Navi 33, so it makes sense to use the 3000 series to fill the gap.
How exactly will Navi 33 be faster? Navi 33 will have less of everything:
FP32 performance, rasterisation, geometry performance, tensor FLOPS, raytracing performance, etc.

/edit: Fun fact:
AD106 will deliver more FP32 performance (25 TFLOPS should be possible) and more FP16 tensor FLOPS than a CDNA2 chiplet. But I guess AMD has no problem providing the same numbers on 6nm at 200mm^2.
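As a rough sanity check on that 25 TFLOPS figure, here is the napkin math, assuming the rumoured 36 SM AD106 configuration with 128 FP32 lanes per SM and a boost clock around 2.7 GHz (none of which is confirmed):

```python
# Napkin math for the ~25 TFLOPS FP32 claim above.
# Assumed (rumoured, unconfirmed): 36 SMs, 128 FP32 lanes/SM, ~2.7 GHz boost.
sms = 36
fp32_lanes_per_sm = 128
flops_per_clock = 2          # one FMA counts as two FLOPs
boost_ghz = 2.7

tflops = sms * fp32_lanes_per_sm * flops_per_clock * boost_ghz / 1000
print(f"AD106 FP32: ~{tflops:.1f} TFLOPS")   # ~24.9 TFLOPS
```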
 
We have confirmed with NVIDIA that the 30-cycle spec for the 16-pin connector is the same as it has been for the past 20+ years. The same 30-cycle spec exists for the standard PCIe/ATX 8-pin connector (aka mini-fit Molex). The same connector is used by AMD and all other GPU vendors too so all of those cards also share a 30-cycle life. So in short, nothing has changed for the RTX 40 GPU series.
...
The next-gen NVIDIA GeForce RTX 40 series graphics cards including the RTX 4090 and RTX 4080 will be bundled with such cables; however, each package will include just one cable, so enthusiasts who like to unplug their hardware a lot and try new stuff will have to be careful, because you will need a new cable every time you exhaust the 30-cycle lifespan.
...
This needs to be verified once the cards launch, but for new owners or those who plug their GPU into the PC once and never take it back out again, this shouldn't be a huge concern. Certain PSU companies are also making their cables with higher-quality components, but there's no way to tell just how good or bad a 12VHPWR cable is without removing the sleeves.
 
Seen some guy on Twitter claim that Nvidia are moving the low/mid-tier 3000 series GPUs to 5nm and will use the shrink to increase clock speeds by ~50-60% and increase performance that way.

That would keep DLSS 3 exclusive to the bigger 4000 series cards while offering a 40-50% performance increase through clock speed bumps for the low/mid 4000 cards.

If they did that, I'm not sure how I would feel about it: on the one hand they would be offering a good performance uplift, but at the same time they would still be locking those cards out of the new tech.
I call BS because you can't "move" anything from Samsung's 8N to TSMC's N5, this would be the same as making a completely new chip in which case there are zero reasons to use the old IP.
 
I call BS because you can't "move" anything from Samsung's 8N to TSMC's N5, this would be the same as making a completely new chip in which case there are zero reasons to use the old IP.
There's one reason: if you think perf is sufficient even with the old IP and you can get it done in fewer square millimeters compared to newer, potentially more spacious IP blocks. And of course, if you think your product/brand is strong enough and you want to create customer incentives to move their purchase upwards in the stack.
 
Has Nvidia given bandwidth numbers for the 96 MB L2 cache, like AMD did with Infinity Cache?
Compared to Ampere, Ada’s Level 2 cache has been completely revamped. AD102 has been outfitted with 98304 KB of L2 cache, an improvement of 16x over the 6144 KB that shipped in GA102. All applications will benefit from having such a large pool of fast cache memory available, and complex operations such as ray tracing (particularly path tracing) will yield the greatest benefit.
That's all.
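For what it's worth, the quoted KB figures do line up with the "96 MB" in the question above; a quick unit check (capacity only, no bandwidth given):

```python
# Unit check on the quoted Ada/Ampere L2 capacity numbers.
ad102_l2_kb = 98304
ga102_l2_kb = 6144

print(ad102_l2_kb / 1024)          # 96.0 MB, the figure in the question above
print(ad102_l2_kb / ga102_l2_kb)   # 16.0x over GA102
```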

There's one reason: if you think perf is sufficient even with the old IP and you can get it done in fewer square millimeters compared to newer, potentially more spacious IP blocks. And of course, if you think your product/brand is strong enough and you want to create customer incentives to move their purchase upwards in the stack.
While true, it would be even cheaper to just continue selling the same Ampere chips made on the 8N process.
 
How exactly will Navi 33 be faster? Navi 33 will have less of everything:
FP32 performance, rasterisation, geometry performance, tensor FLOPS, raytracing performance, etc.

/edit: Fun fact:
AD106 will deliver more FP32 performance (25 TFLOPS should be possible) and more FP16 tensor FLOPS than a CDNA2 chiplet. But I guess AMD has no problem providing the same numbers on 6nm at 200mm^2.
I was talking purely about rasterisation, so the RT and tensor FLOP numbers aren't relevant. From the rumoured specs it would be 48 ROPs (3 GPCs) and 144 TMUs on AD106 against a presumed 64 ROPs and 128 TMUs on Navi 33. If the boost clocks are, say, 2.6 GHz and 2.8 GHz respectively, then AMD would be behind by ~4% on texture rate and ahead by ~44% on fill rate. Nvidia would be ahead by ~4% in compute.

But if I understand correctly, Nvidia still have to share the extra FP32 units with the INT32 units, whereas AMD is just doubling across the board. So we would expect AMD to get better scaling from their doubling. If they can increase performance at 1440p by ~31% vs the 6650 XT, then they are already at 3070-level performance. If they can increase it by ~63%, then they are at 6800 XT levels. The rumours were that they were targeting 6800 XT/6900 XT performance, at least at 1080p.
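For anyone who wants to check those ratios, here is a minimal sketch using the rumoured unit counts and the assumed 2.6/2.8 GHz boost clocks above (all of it speculation, not confirmed specs):

```python
# Back-of-envelope throughput ratios from rumoured specs and assumed clocks.
# AD106: 48 ROPs, 144 TMUs, 36 SMs x 128 FP32 lanes, 2.6 GHz (rumoured).
# Navi 33: 64 ROPs, 128 TMUs, 4096 ALUs, 2.8 GHz (presumed).

def rates(rops, tmus, fp32_lanes, ghz):
    return {
        "pixel fill (Gpix/s)": rops * ghz,
        "texture rate (Gtex/s)": tmus * ghz,
        "FP32 (TFLOPS)": fp32_lanes * 2 * ghz / 1000,   # FMA = 2 FLOPs/clock
    }

ad106  = rates(rops=48, tmus=144, fp32_lanes=36 * 128, ghz=2.6)
navi33 = rates(rops=64, tmus=128, fp32_lanes=4096, ghz=2.8)

for key in ad106:
    print(f"{key}: Navi 33 / AD106 = {navi33[key] / ad106[key]:.2f}")
# pixel fill ~1.44 (AMD ahead ~44%), texture rate ~0.96 (AMD behind ~4%),
# FP32 ~0.96 (Nvidia ahead ~4%)
```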
 
I was talking purely about rasterisation, so the RT and tensor FLOP numbers aren't relevant. From the rumoured specs it would be 48 ROPs (3 GPCs) and 144 TMUs on AD106 against a presumed 64 ROPs and 128 TMUs on Navi 33. If the boost clocks are, say, 2.6 GHz and 2.8 GHz respectively, then AMD would be behind by ~4% on texture rate and ahead by ~44% on fill rate. Nvidia would be ahead by ~4% in compute.

But if I understand correctly, Nvidia still have to share the extra FP32 units with the INT32 units, whereas AMD is just doubling across the board. So we would expect AMD to get better scaling from their doubling. If they can increase performance at 1440p by ~31% vs the 6650 XT, then they are already at 3070-level performance. If they can increase it by ~63%, then they are at 6800 XT levels. The rumours were that they were targeting 6800 XT/6900 XT performance, at least at 1080p.
Narrow GPUs have fewer problems keeping their units fed and avoiding performance bottlenecks. AD106 will be around ~1/3 of the 4090, with even higher clocks. At least on 6nm, AMD won't double everything. Their own claim is >50% better efficiency with 5nm and chiplets.
 
Narrow GPUs have fewer problems keeping their units fed and avoiding performance bottlenecks. AD106 will be around ~1/3 of the 4090, with even higher clocks. At least on 6nm, AMD won't double everything. Their own claim is >50% better efficiency with 5nm and chiplets.
If it's really 3 GPCs/36 SMs, then that's 1/4 of the 4090. And Ampere mid-range GPUs didn't see massively higher clocks. Right now kopite7kimi is claiming a ~7000 Time Spy Extreme score, which is a bit better than the 3070.

I agree AMD isn't doubling everything on every product with RDNA 3, but they do seem to be doubling the number of ALUs per WGP. That should give a performance boost somewhere in between the Turing-to-Ampere "doubling" and the RDNA 1-to-RDNA 2 "doubling".
 
You can find an RTX 2080 Ti for as low as $250 here on the used market. That's quite a lot of performance for the money, even though it's an old GPU by now. 3080s start from around $500 and go up from there (non-ETH-mining GPUs).
If you're in the market for a new GPU and not afraid of the used market, it can be worth checking these out. I'm myself on a 2080 Ti, which isn't far from 3070 performance but with 11 GB of fast VRAM. RT performance is still great on these too.
 
I call BS because you can't "move" anything from Samsung's 8N to TSMC's N5, this would be the same as making a completely new chip in which case there are zero reasons to use the old IP.
Exactly. A shrink involves a full VLSI design, place-and-route and tapeout cycle. In this case it's even worse because it's a foundry shift, so it may involve changing RAM macros (basically, different foundries use different SRAM bit widths as building blocks), which is painful as hell. If you're going through all that pain, you would just use the latest IP.
 
There's one reason: if you think perf is sufficient even with the old IP and you can get it done in fewer square millimeters compared to newer, potentially more spacious IP blocks.
Unless the fewer square millimeters on 4N cost more than the more square millimeters on 8N. We need to reset our mental calibration of Moore's Law.
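A toy illustration of that point, with completely made-up wafer prices and die sizes; only the shape of the comparison matters, not the numbers:

```python
# Hypothetical numbers only: shows how a smaller die on a newer node can still
# cost more per die if the wafer price rises faster than the area shrinks.
import math

wafer_cost_usd = {"Samsung 8N": 6000, "TSMC 4N": 16000}   # made-up wafer prices
die_area_mm2   = {"Samsung 8N": 300, "TSMC 4N": 190}      # made-up sizes, same IP

wafer_area = math.pi * (300 / 2) ** 2   # 300 mm wafer, ignoring edge loss and yield

for node in wafer_cost_usd:
    dies = wafer_area // die_area_mm2[node]
    print(f"{node}: ~{dies:.0f} dies/wafer, ~${wafer_cost_usd[node] / dies:.0f}/die")
```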
 