"New Blackwell features for RT? Will be waiting a while to see those in games if ever."
Why? Ada's RT features materialized in less than a year from its launch.
"Why? Ada's RT features materialized in less than a year from its launch."
It would of course help if these features would be pushed as a DX update instead of being an NvAPI thing.
"980Ti had a lot of headroom, AIB cards were ~20% faster out of the box, impossible to think of today."
That was just (mostly) the norm back then. Kepler GPUs generally had a fair chunk of headroom for overclocking compared to stock core/memory specs as well. And the highest-end parts usually had the most, given their stock specs tended to be more conservative for binning purposes.
Gigabyte GTX 980 Ti XtremeGaming 6GB Review (www.techpowerup.com)
Gigabyte's new GTX 980 Ti XtremeGaming is highly overclocked, yet more affordable than other GTX 980 Ti variants. It also comes with a quiet triple-slot, triple-fan cooling solution that stops the fans in idle and provides great temperatures.
Zotac GeForce GTX 980 Ti AMP! Extreme 6GB Review (www.techpowerup.com)
Zotac's GeForce GTX 980 Ti Amp! Extreme is one of the fastest custom-design GTX 980 Ti cards out there, yet comes at a relatively affordable price increase - unlike such competitors as the MSI Lightning or ASUS Matrix.
Recent rumours have suggested that Nvidia’s planned RTX 5090 will not be making full use of their flagship “Blackwell” silicon. Now, it seems clear why Nvidia isn’t using the full potential of Blackwell with the RTX 5090. Nvidia has a stronger GPU in the works, a new TITAN-class product called the TITAN AI.
Nvidia’s RTX 5090 reportedly features 28GB of GDDR7 memory over a 448-bit memory bus. This is not the full potential of Nvidia’s GB202 silicon, which reportedly features a 512-bit memory bus. If this is true, Nvidia’s planned Blackwell TITAN could feature up to 32GB of GDDR7 memory.
Below is a report from RedGamingTech that claims that Nvidia has a new TITAN GPU in the works. This GPU is reportedly around 63% faster than Nvidia's RTX 4090. If the data below is correct, this new GPU will be around 10% faster than Nvidia's RTX 5090.
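For what it's worth, the rumoured numbers are at least internally consistent. Here is a quick sanity check, assuming 2 GB of GDDR7 per 32-bit channel (an assumption for illustration, not a confirmed spec):

```cuda
// Back-of-the-envelope check of the rumoured figures (not official specs).
// Assumes 2 GB GDDR7 modules on 32-bit channels, which is what "28 GB over
// a 448-bit bus" implies.
#include <cstdio>

int main() {
    const int bus_5090_bits  = 448;  // rumoured RTX 5090 bus width
    const int bus_gb202_bits = 512;  // rumoured full GB202 bus width
    const int module_gb      = 2;    // assumed capacity per 32-bit channel

    printf("RTX 5090: %d channels -> %d GB\n",
           bus_5090_bits / 32, (bus_5090_bits / 32) * module_gb);    // 14 -> 28 GB
    printf("Full GB202: %d channels -> %d GB\n",
           bus_gb202_bits / 32, (bus_gb202_bits / 32) * module_gb);  // 16 -> 32 GB

    // If the TITAN were 1.63x a 4090 and 1.10x a 5090, the 5090 works out to
    // roughly 1.63 / 1.10 ~= 1.48x a 4090.
    printf("Implied 5090 vs 4090: %.2fx\n", 1.63 / 1.10);
    return 0;
}
```

If both rumoured uplifts held, the RTX 5090 would land at roughly 1.48x the RTX 4090.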
"1080Ti -> 2080Ti was a 33% increase in performance"
It's more like a 55% increase now (the 3070 is 55% faster than 1080Ti at both 1080p and 1440p, and 3070 is equal to 2080Ti), timestamped below.
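Taking the 33% launch gap and the 55% current gap at face value (both figures are the posters', from the linked benchmarks), the drift works out like this:

```cuda
// Rough check of how much the Turing advantage itself has grown over time.
#include <cstdio>

int main() {
    const double launch_gap  = 1.33;  // 2080 Ti vs 1080 Ti at Turing's launch
    const double current_gap = 1.55;  // 3070 (~2080 Ti) vs 1080 Ti today
    // Relative drift of the Turing advantage since launch.
    printf("Gap growth: %.1f%%\n", (current_gap / launch_gap - 1.0) * 100.0);  // ~16.5%
    return 0;
}
```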
"It's more like a 55% increase now (the 3070 is 55% faster than 1080Ti at both 1080p and 1440p, and 3070 is equal to 2080Ti), timestamped below."
Well yea, gaps over time tend to grow, especially once you get to like two generations behind with Nvidia products, where driver optimization support tends to stop. And it was even more the case with the switch to RTX/DX12U stuff, where the 1080Ti was gonna fall farther behind after...yea, eight years. lol
"Well yea, gaps over time tend to grow"
I am well aware of that, but that's not the case here, the 1080Ti is still on the level of 5700XT/Radeon VII, but the Turing advantage grew over time in the new games.
"the switch to RTX/DX12U stuff where the 1080Ti"
The benchmark suite has no ray tracing tests. Only rasterization.
"The Turing advantages grew over time in the new games"I am well aware of that, but that's not the case here, the 1080Ti is still on the level of 5700XT/Radeon VII, but the Turing advantage grew over time in the new games.
The benchmark suite has no ray tracing tests. Only rasterization.
I could believe it at $4000.
RGT dude got ripped!
"It makes zero sense to launch anything AI for less than what current AI solutions are selling."
It does make sense to sell non-HBM GPUs for AI as long as you're supply limited on CoWoS though - and TSMC's latest quarterly earnings conference call strongly implied those supply limits were improving but not fully resolving any time soon if demand remains as expected. Also NVIDIA can price the server variants using these chips for whatever the market will tolerate, just like they did for L40/AD102.
On the other hand, I hope they can tweak the design so they don't waste too much area on AI for everything below GB202...
Just a guess, I think this is the most likely structure of the 'redesigned SM'.
GPU Subwarp Interleaving | Research (research.nvidia.com)
Raytracing applications have naturally high thread divergence, low warp occupancy and are limited by memory latency. In this paper, we present an architectural enhancement called Subwarp Interleaving that exploits thread divergence to hide pipeline stalls in divergent sections of low warp...

US11934867B2 - Techniques for divergent thread group execution scheduling - Google Patents (patents.google.com)
Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp...
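To make the idea concrete, below is a minimal, hypothetical CUDA kernel showing the divergence pattern the paper and patent are aimed at; it illustrates the problem, it is not code from either source:

```cuda
// Hypothetical divergent kernel of the kind Subwarp Interleaving targets.
// Today, when a warp splits at the branch, each side runs serially and a
// long-latency load on one side stalls the whole warp. The proposed scheduler
// would instead switch to the other diverged "shard" while the load is in
// flight, hiding the stall within a single warp.
__global__ void shade(const int* __restrict__ hit,       // 0 = miss, 1 = hit
                      const float* __restrict__ material,
                      float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (hit[i]) {
        // Divergent path A: dependent, scattered load -> likely long stall.
        float m = material[(i * 37) % n];
        out[i] = m * 0.5f + 0.5f;
    } else {
        // Divergent path B: pure arithmetic that could fill the stall
        // if execution were interleaved between the two shards.
        out[i] = 0.1f * i;
    }
}
```

As the quoted abstracts describe it, the two sides of the branch currently execute one after the other, so subwarp interleaving (the patent's "warp sharding") would let the scheduler alternate between the diverged shards while the load is outstanding instead of switching only across warps.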
"Wouldn't it be easier to tape out a dedicated GDDR7 AI GPU? It's not like they're short on cash."
That's fair, at NVIDIA's new scale it makes sense - the biggest obstacle short-term is probably hiring fast enough and replacing people who retire early, so it may not be realistic in this timeframe yet.
"Sounds like maybe they're starting to make it a bit more superscalar/pipelined. i.e. a single SM can work on multiple ops at once, not OOO but the first step."
AFAIK (and I really need to double check this), NVIDIA's L1 cache is still in-order for all misses. That is, if all of a warp's load requests hit in the L1, it can reschedule earlier than an older warp request that missed - but if a single thread misses in the L1, then it might have to wait until another warp's data comes back from the far DRAM controller even though it hit in the L2 and could reschedule much sooner.
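As a purely illustrative sketch of that scenario (the kernel below is hypothetical, and the in-order miss handling is the poster's "AFAIK" description, not something CUDA exposes or guarantees):

```cuda
// Two access patterns in the same SM:
//   - nearly sequential loads: L1 miss that likely hits in L2
//   - widely strided loads: L1 and L2 miss, all the way to DRAM
// The claim above is that L1 misses are serviced in order, so a younger
// warp whose miss would be satisfied from the L2 may still be held up
// behind an older warp's DRAM round trip before it can reschedule.
__global__ void mixed_latency(const float* __restrict__ streamed,
                              const float* __restrict__ strided,
                              float* __restrict__ out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Alternate access pattern per warp so neighbouring warps generate
    // misses with very different latencies.
    int warp = tid / 32;
    float v = (warp % 2 == 0) ? strided[(tid * 4099) % n]  // likely DRAM miss
                              : streamed[tid];             // likely L2 hit
    out[tid] = v + 1.0f;
}
```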