Speculation and Rumors: Nvidia Blackwell ...

150mm² is low end. It just is. There is no process so advanced that it changes this. Also just a 128-bit bus. That's again, a low end spec that you use on low end parts.
So the GeForce4 (NV25) at 128mm² and the Radeon 8500 (R200) at 120mm² (both on 150nm) were low-end then, despite being unquestionably the fastest cards of their day? And a >$20K 2nm wafer price doesn't make any difference compared to ~$4K for 28nm in the Maxwell timeframe? (iirc, the exact numbers could be slightly off)
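For what it's worth, the wafer-price point can be put in rough numbers. A quick back-of-envelope sketch in Python, using the approximate wafer prices mentioned above and ignoring yield, edge losses and packaging, shows a ~150mm² die on a ~$20K wafer already costing more than a ~600mm² die on a ~$4K wafer:

```python
import math

# Back-of-envelope cost per die, ignoring yield, edge losses and packaging.
# Wafer prices are the rough figures quoted above; purely illustrative.
WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2  # 300 mm wafer ≈ 70,686 mm²

def naive_die_cost(wafer_price_usd, die_area_mm2):
    dies_per_wafer = WAFER_AREA_MM2 // die_area_mm2
    return wafer_price_usd / dies_per_wafer

print(round(naive_die_cost(4_000, 600), 2))   # big die on a ~$4K 28nm-era wafer: ~$34
print(round(naive_die_cost(20_000, 150), 2))  # small die on a ~$20K 2nm wafer: ~$42
```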

This is literally just arguing semantics and like Albuquerque I don't get why everyone is so pointlessly obsessed about it. It's perfectly fair to point out that NVIDIA's gross margins for consumer GPUs are significantly higher this generation than in the past (it's hard to tell how much though given their revenue is now dominated by datacenter) and that this is problematic for the long-term health of the PC gaming market. You don't need to debate what's low-end or high-end to agree on that.

Hopefully AMD & Intel get more competitive, and hopefully chiplets allow for more efficient 2.5D/3D designs despite transistor costs not scaling as fast anymore. That might take a while though, and I really don't want to keep arguing this for the next few years until then...
 
we don’t have good info on what a given die costs anyway
Exactly. There's also a whole side missing from this discussion - die cost depends on how many dies from a wafer are actually being sold. If only one 150mm² die from a wafer gets sold, its cost is the same as that of one 600mm² die from the same wafer. It doesn't have to be a result of physical defects either - a 150mm² die binned for some SKU as the one with ultra-high clocks, for example, could end up being just one per wafer. The whole idea that die size means anything about how "low end" a product is is just completely wrong.
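A minimal sketch of that point: what matters is cost per die actually sold, which depends on yield and on how aggressively a SKU is binned, not on raw area alone. The defect density and bin rates below are made up purely for illustration:

```python
import math

WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2  # 300 mm wafer

def cost_per_sold_die(wafer_price, die_area_mm2, defects_per_cm2, bin_rate):
    # Simple Poisson yield model plus a binning fraction for the SKU
    # (e.g. only the highest-clocking dies qualify). Illustrative only.
    gross_dies = WAFER_AREA_MM2 // die_area_mm2      # ignores edge losses
    yield_frac = math.exp(-defects_per_cm2 * die_area_mm2 / 100)
    sold = gross_dies * yield_frac * bin_rate
    return wafer_price / sold

# With an aggressive enough bin, a 150mm² die costs about as much per *sold*
# unit as a 600mm² die cut from the same (here $20K) wafer.
print(round(cost_per_sold_die(20_000, 150, 0.1, 0.15), 2))  # ~$329
print(round(cost_per_sold_die(20_000, 600, 0.1, 1.0), 2))   # ~$311
```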
 
Micron claims GDDR7 will deliver 50% more raytracing performance than best-in-class GDDR6X. Very vague but if they’re referring to Blackwell vs 4090 that would be a nice bump.

 
What's the GDDR6 platform then?

We can try to guess. On the slides the GDDR6X platform is 2x the performance of the GDDR6 platform at RT while being 20% faster at raster. That lines up well with the 4090 and 7900XTX if you pick something like CP2077 for the RT workload.

The GDDR6 platform is likely from AMD. Nvidia’s fastest GDDR6 card is the 4060 Ti.
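Part of that 50% claim is presumably just raw bandwidth. A quick conversion (the 4090 figures are real; the GDDR7 pin speeds are the 28-32 Gbps range that has been floated, so treat those rows as assumptions):

```python
# Bandwidth in GB/s = pin speed (Gbps) * bus width (bits) / 8.
def bandwidth_gbs(pin_speed_gbps, bus_width_bits):
    return pin_speed_gbps * bus_width_bits / 8

print(bandwidth_gbs(21, 384))  # 4090, 21 Gbps GDDR6X: 1008 GB/s
print(bandwidth_gbs(28, 384))  # hypothetical 28 Gbps GDDR7: 1344 GB/s (+33%)
print(bandwidth_gbs(32, 384))  # hypothetical 32 Gbps GDDR7: 1536 GB/s (+52%)
```

A ~50% raw bandwidth jump at the top pin speed lines up loosely with Micron's number, though marketing ratios rarely map 1:1 to delivered RT performance.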
 
So the GeForce4 (NV25) at 128mm² and the Radeon 8500 (R200) at 120mm² (both on 150nm) were low-end then, despite being unquestionably the fastest cards of their day?
They absolutely would have been low end if much larger GPUs had existed on the same architecture, like they do now.

150mm² is low end today, and has been for quite a while. I feel like trying to find some wormy argument to slither out of acknowledging such a basic concept is bizarre. We're not talking about some magical super advanced process beyond anything else.
 
Assuming those are the Blackwell die configurations, what is the prevailing thought as to the primary driver of (presumed) performance enhancements? To my layman's eyes, it would seem the main contenders are, in no particular order:
  1. Significantly increased clock speeds;
  2. Going wider, i.e., each TPC contains more than 2 SMs compared to prior architectures, so there are more compute units without having more, or significantly more, GPCs;
  3. Internal reconfigurations, such as cache improvements or enlargements, or scheduling enhancements, to improve perf/mm² or the "IPC" of the existing units;
  4. Memory bandwidth boost from GDDR7 and/or the larger bus (at least for GB202);
  5. Specialized hardware additions or improvements (e.g., BVH building fixed function hardware, expanding frame generation to additional frames, hardware to push GPU-driven work generation, etc.)
It has always been fascinating to me to see how different architectures have evolved to achieve greater performance or lower power. Like, Maxwell reconfigured the SM and kept data more local, Pascal ramped the clocks, Turing employed specialized hardware, etc.

If the GB dies aren't on a smaller node than Ada, then it'd probably be a combination of 3, 4, and 5. If there is a die shrink, it opens up 1 and 2. I don't know which way I'm leaning, and I'm just talking out of my nethers, but it's fun to speculate.
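One way to think about how those levers stack: independent gains roughly multiply, so even modest contributions from several of them add up. The percentages below are placeholders, not predictions:

```python
# Toy model: independent improvements roughly multiply.
# Every number here is a placeholder, not a leak or a prediction.
clock_gain = 1.05  # (1) slightly higher clocks
width_gain = 1.10  # (2) more units
ipc_gain   = 1.10  # (3) cache/scheduling improvements
# (4) bandwidth and (5) specialized hardware mostly determine how well the
# above are fed and which workloads benefit, so they aren't separate
# multipliers in this crude model.
print(round(clock_gain * width_gain * ipc_gain, 2))  # ~1.27x in this made-up case
```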
 
1. Is unlikely as the process should be the same.
2. Putting more SMs into a TPC isn't really "going wider" and won't be much different from just adding more TPCs.
3. Possible.
4. That's a given. Should also highlight the possible changes here - more b/w on the same shading performance makes little sense, but maybe they've cut their L2 and put logic there instead?
5. It's highly likely that RT h/w will be improved but you don't really improve overall performance by improving only RT h/w (i.e. just tracing of rays).

One possible scenario is them going back to 32-wide SIMDs and removing the 2-clock warp execution. Would make SM level scheduling harder (more complex in h/w) but could allow for another doubling of peak math throughput with reasonable complexity increase.
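To put the "another doubling" in numbers: peak FP32 throughput is just units × clock, so doubling the FP32 lanes per SM at unchanged SM count and clock doubles the peak. The 4090 figures are real; the wider-SM row is just the hypothetical described above:

```python
# Peak FP32 TFLOPS = SMs * FP32 lanes per SM * 2 (FMA = 2 flops) * clock (GHz) / 1000.
def peak_fp32_tflops(sms, lanes_per_sm, clock_ghz):
    return sms * lanes_per_sm * 2 * clock_ghz / 1000

print(round(peak_fp32_tflops(128, 128, 2.52), 1))  # 4090: ~82.6 TFLOPS
print(round(peak_fp32_tflops(128, 256, 2.52), 1))  # hypothetical doubled SM: ~165.2 TFLOPS
```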
 
Assuming those are the Blackwell die configurations, what is the prevailing thought as to the primary driver of (presumed) performance enhancements? To my layman's eyes, it would seem the main contenders are, in no particular order:

More SMs per TPC would be a significant increase in the width of each chip, which is very unlikely on the same node.

My guess is higher clocks, more transistors spent on RT and a boost from GDDR7. They can always position chips lower down the stack to compensate. E.g. GB206 for the 5060 vs AD107 for the 4060. GB207 is probably a 3050 replacement.
 

Guessing time!

5090ti - 448bit bus, 28gb of ram, 525w+, $2k, 40% faster than a 4090
5090 - 384bit bus, 24gb of ram, 450w, $1500, 20% faster than a 4090

5080ti - 256bit bus, 16gb of ram, 350w, $1k, 20% faster than a 4080
5080 - 224bit bus, 14gb of ram, 300w, $750, basically a 4080 super

5070ti - 192bit bus, 12gb of ram, $600, 5% faster than 4070ti Super
5070 - 192bit bus, 12gb of ram, $500, 4070 super

5060ti - 128bit bus, 12gb of ram, $400, 4070
5060 - 96bit bus, 9gb of ram, $329, 4060ti
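The capacities in lists like this fall straight out of bus width and module size: one 32-bit channel per GDDR7 chip. A quick check, assuming 2 GB modules by default and the announced 3 GB modules where noted (their availability for these SKUs is an assumption):

```python
# VRAM capacity = (bus width / 32) channels * GB per GDDR7 module.
def vram_gb(bus_width_bits, chip_gb=2):
    return (bus_width_bits // 32) * chip_gb

print(vram_gb(448))     # 14 chips * 2 GB = 28 GB
print(vram_gb(384))     # 24 GB
print(vram_gb(224))     # 14 GB
print(vram_gb(192))     # 12 GB
print(vram_gb(128, 3))  # 12 GB with 3 GB modules
print(vram_gb(96, 3))   # 9 GB with 3 GB modules
```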

They'll shrink the L2 SRAM cache size with TSMC's new SRAM library that debuted in Zen 4c, use the savings to increase L1, and lean on GDDR7 bandwidth increases plus higher clock speeds. Basically chips the same size with a modest performance uplift.

Then they'll update their matrix multiplication units, do a very modest RT update, present a bunch of bullshit RTX AI "benchmarks" to show "how much better they are" and claim they've "doubled in performance gen over gen", launch DLSS 4 with AI framerate tripling and hole filling, and claim 30fps upscaled to 90 is good enough now, etc.
 
Guessing time!

5090ti - 448bit bus, 28gb of ram, 525w+, $2k, 40% faster than a 4090
5090 - 384bit bus, 24gb of ram, 450w, $1500, 20% faster than a 4090

5080ti - 256bit bus, 16gb of ram, 350w, $1k, 20% faster than a 4080
5080 - 224bit bus, 14gb of ram, 300w, $750, basically a 4080 super

5070ti - 192bit bus, 12gb of ram, $750, 5% faster than 4070ti Super
5070 - 192bit bus, 12gb of ram, $600, 4070 super

5060ti - 128bit bus, 8gb of ram, $400, 4070
5060 - 128bit bus, 8gb of ram, $329, 4060ti
Unless Nvidia have pulled off some incredible architectural performance improvements (via clockspeed increases or sheer IPC gains), this is looking like a fairly disappointing generation. It would be nice if Nvidia threw us a bone in terms of pricing/value to compensate, but AMD seems half checked out of the GPU market by now and consumers have decided they are fine with being exploited in order to have the new shiny thing, so.....
 
Unless Nvidia have pulled off some incredible architectural performance improvements (via clockspeed increases or sheer IPC gains), this is looking like a fairly disappointing generation. It would be nice if Nvidia threw us a bone in terms of pricing/value to compensate, but AMD seems half checked out of the GPU market by now and consumers have decided they are fine with being exploited in order to have the new shiny thing, so.....

I don’t think there’s any scenario where the 5080 isn’t significantly faster than the 4080. So there won’t be any disappointment in raw performance if you care about model numbers. It’ll just come down to pricing.

What will that 5080 look like? No idea but the easiest option is a cut down GB202. The other option is magic.
 
They absolutely would have been low end if much larger GPUs had existed on the same architecture, like they do now.

150mm² is low end today, and has been for quite a while. I feel like trying to find some wormy argument to slither out of acknowledging such a basic concept is bizarre. We're not talking about some magical super advanced process beyond anything else.
You're arguing semantics in response to a post from a moderator pointing out that such an argument really isn't adding to the conversation?

Revisit what exactly the debate is. What is it about die size that affects whatever it is people are discussing? Notions of 'high' and 'low' end don't need to enter into it, and so don't need to be defined to a point everyone can agree on in order to push the debate forwards. ;)
 
I've been testing out Lossless Scaling's 3x frame generation and I'm quite impressed with it. A base framerate around the low 40s is good enough to play, and I'm usually picky about input lag, though with integrated FG like DLSS/FSR the input latency would be higher.
Anyway, the gist is that Nvidia can promote Blackwell as doing 3x or even 4x FG, which will add to the performance difference and be displayed prominently in the slides. With how ubiquitous upscaling has become in the past few years, FG will be adopted the same way, and with how good the experience is with it on, it'll be seen more and more as 'real' performance.
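A rough way to see why FG numbers look so good on slides while responsiveness still depends on the base framerate (the one-held-frame delay is an approximation for interpolation-style FG; the numbers are illustrative):

```python
# Frame generation multiplies *displayed* fps, but input is still sampled once per
# rendered frame, and interpolation holds roughly one rendered frame before display.
def fg_summary(base_fps, multiplier):
    displayed_fps = base_fps * multiplier
    base_frametime_ms = 1000 / base_fps   # responsiveness still tracks this
    added_delay_ms = base_frametime_ms    # ~1 held frame (approximation)
    return displayed_fps, round(base_frametime_ms, 1), round(added_delay_ms, 1)

print(fg_summary(42, 3))  # (126, 23.8, 23.8): ~126 fps shown from a low-40s base
print(fg_summary(42, 4))  # (168, 23.8, 23.8): 4x shows more frames, same input feel
```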
 