Nvidia Blackwell Architecture Speculation

Is there a reason IHVs have mostly avoided odd numbers of memory channels? Are they better in pairs or something?
I found this answer:
 
I found this answer:

I believe this was changed with Pascal.

GM204 - [attached image]

GP104 - [attached image]
Hence likely also why we saw 11-channel configurations used for Pascal (1080 Ti) and the next generation in Turing (2080 Ti) as well.
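For reference, the arithmetic behind those 11-channel parts is just bus width divided by channel width, with one module per channel. A quick sketch (the 32-bit channel width and 1 GB module size are the commonly quoted figures for those boards, assumed here rather than taken from this thread):

# Rough sketch: bus width / channel width gives the channel count,
# and one 8 Gb (1 GB) GDDR5X/GDDR6 module per channel gives the capacity.

CHANNEL_WIDTH_BITS = 32   # per-channel width assumed for Pascal/Turing
MODULE_CAPACITY_GB = 1    # one 8 Gb module per channel

def memory_config(bus_width_bits):
    channels = bus_width_bits // CHANNEL_WIDTH_BITS
    return channels, channels * MODULE_CAPACITY_GB

print(memory_config(352))  # 1080 Ti / 2080 Ti: (11, 11) -> 11 channels, 11 GB
print(memory_config(384))  # full GP102 / TU102: (12, 12) -> 12 channels, 12 GB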
 
Is there a reason IHVs have mostly avoided odd numbers of memory channels?
The only reason is that it leads to rare configurations like 10 or 11GB. Anything rare is in danger of being omitted from developers' optimization and QA processes.
I was thinking recently that with 3GB G7 modules there are now more options for such configurations than previously. You could have a 224 bit GB203 with 21GB of memory, for example, or a 160 bit GB205 with 15GB. Disabling one MC and its associated L2 partition could offset the cost of the denser memory modules. But such configurations could be a challenge for many developers who still optimize to some fixed memory amount progression like 8/12/16 GB.
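For what it's worth, those capacities fall out of the same channel arithmetic. A hedged sketch using the bus widths named above (the 32-bit channel width and one-module-per-channel layout are assumptions):

# Capacity for the hypothetical 3 GB GDDR7 configurations mentioned above,
# assuming 32-bit channels and one module per channel.

def capacity_gb(bus_width_bits, module_gb, channel_bits=32):
    channels = bus_width_bits // channel_bits
    return channels * module_gb

print(capacity_gb(224, 3))  # cut-down GB203: 7 channels -> 21 GB
print(capacity_gb(160, 3))  # cut-down GB205: 5 channels -> 15 GB
print(capacity_gb(256, 2))  # full 256-bit bus with 2 GB modules -> 16 GB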

I believe this was changed with Pascal.
I think the change is more related to the memory type being used. At some point 64 bit memory channels became 32 bit ones.
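If that premise is right, it also explains why the "odd" bus widths only became practical once channels narrowed. Toy arithmetic, taking the 64-bit vs 32-bit split above as given rather than verified:

# 352-bit (1080 Ti style) only divides cleanly into 32-bit channels.
# The 64-bit -> 32-bit change is the premise of the post above, taken as given.

for channel_bits in (64, 32):
    print(channel_bits, "-bit channels:", 352 / channel_bits)  # 5.5 vs 11.0
# Only the 32-bit granularity yields a whole number, so an 11-channel part is feasible.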
 
That’s not the reason. It would particularly complicate any logic involved in request-to-channel balancing, which is critical for performance. So unless the odd number is 1, it’s then very easy to end up with lower utilisation of the memory system. Design and validation in hardware and then any tuning in software is not juice that’s worth the squeeze (otherwise you’d see it more often).
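To illustrate the hardware-cost point with a toy sketch (not how any real memory controller is built): with a power-of-two channel count the channel select is a simple bit slice of the address, while an odd count needs a modulo or a custom hash, which is where the extra balancing logic, validation and tuning come from.

# Toy address-to-channel mapping; real controllers use more elaborate hashing,
# so treat this purely as an illustration of the selection cost.
from collections import Counter

LINE_BYTES = 256  # assumed interleave granularity

def channel_for(addr, channels):
    line = addr // LINE_BYTES
    if (channels & (channels - 1)) == 0:
        return line & (channels - 1)  # power of two: a simple bit slice of the address
    return line % channels            # odd count: a divide/modulo or lookup in hardware

def spread(channels, n_lines=11_000):
    return Counter(channel_for(i * LINE_BYTES, channels) for i in range(n_lines))

print(spread(8))   # even spread, chosen by masking three address bits
print(spread(11))  # also even for a linear sweep, but the select is no longer a bit slice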
 
That’s not the reason. It would particularly complicate any logic involved in request-to-channel balancing, which is critical for performance. So unless the odd number is 1, it’s then very easy to end up with lower utilisation of the memory system. Design and validation in hardware and then any tuning in software is not juice that’s worth the squeeze (otherwise you’d see it more often).
Doesn't look like 1080Ti had any issues with request-to-channel balancing.
 
Everything else will be a fair bit lower than that.
Let's wait for benchmarks. The selection of titles which Nvidia used for comparisons without FG remains very weird - two AMD sponsored titles, one console title w/o RT and one with RT shadows only. I will be surprised if other titles, especially those which make heavy use of RT, show the same results as Horizon, for example.
 
Let's wait for benchmarks.
You can see the specs and understand what's coming the same as I can. Nvidia isn't exactly shy about touting raw performance gains when they've got something great, either. But it'll probably be easier to bury/deflect such talk when the reviews come, with complaints that some of them weren't fair to Nvidia or didn't test frame generation and Nvidia's other advantages enough, blah blah. I can already see the HUB topic going crazy.

Would be very glad to be wrong, but the writing has been on the wall for a good while, and these recent leaks about performance are pretty much confirming what could reasonably be guessed beforehand.
 
You can see the specs and understand what's coming the same as I can.
No, I can't. The architecture is different enough that previous patterns can't simply be applied to predict how it will perform. The shading cores seem to be different, the RT units are different, the tensor h/w is different, even the memory controllers are different again.
 
It's gonna be a very disappointing generation by most standards.

It’s shaping up to be a non-event for most owners of current generation cards. Maybe people shopping in the $400-$600 range will get something enticing from AMD with FSR 4.

Lots of RDNA 2 and Ampere owners out there though so everything will still sell.
 
Doesn't look like 1080Ti had any issues with request-to-channel balancing.
"Any" is a strong word. The product did very well but I'd be surprised if there were zero issues from a product bring-up and tuning perspective. My point is more that the GP102 in that product wouldn't ever have been designed with 11 channels natively.

There’s a reason we’ve rarely (never as far as I can remember, at least in the last 20 years) seen a native odd numbered channel count in a GPU before.
 
I’ll wait for reviews but my enthusiasm is definitely fading. I’ll probably still get a 5090 for nvenc, better rt, framegen etc but it feels like I could’ve just bought a 4090 2 years ago instead. Oh well.

I think Ollie mentioned that the 4090 encoder still doesn’t hit 4K 120fps. If the 5090 also doesn’t get there it would suck.
 
I’ll wait for reviews but my enthusiasm is definitely fading. I’ll probably still get a 5090 for nvenc, better rt, framegen etc but it feels like I could’ve just bought a 4090 2 years ago instead. Oh well.

I think Ollie mentioned that the 4090 encoder still doesn’t hit 4K 120fps. If the 5090 also doesn’t get there it would suck.
Yeah, I'm just staying with my 4090 this gen, and maybe the 6090 will replace it in the future.
 
I think if you have a 60Hz display (yikes) or even a 120Hz display, then 40 series to 50 series looks like a bad upgrade. It's betting a lot on the adoption of all of the new neural shader stuff. Another two year wait is probably the best option. I have a 3080, and a 4080 would give me, at worst, maybe a 75% improved frame rate, which is a nice upgrade. Then I also get frame gen, which I currently don't have in a good form, and it makes ray tracing viable, etc. But 40 series to 50 series is maybe the weakest upgrade ever. The only good use case for that upgrade is really high refresh displays like 240, 360 and 480Hz. But that's basically one use case that relies on frame gen and on someone having one of those displays or planning on buying one. They're expensive.
 
GB203 in particular looks like a straight rehash of AD103 -- similar die size, transistor density, clock rates and a minor bump in SM count. The GDDR7 support is a bright spot for bandwidth bottleneck situations.
TSMC wafer prices for the latest nodes must be really eye-watering, even for Nvidia.
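On the GDDR7 point, peak bandwidth is just pins times per-pin rate. A sketch with illustrative data rates (the per-pin figures are assumptions for comparison, not confirmed board specs):

# Peak bandwidth in GB/s = bus width in bits * per-pin rate in Gbps / 8.
# The per-pin rates below are illustrative assumptions, not confirmed specs.

def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    return bus_width_bits * gbps_per_pin / 8

print(bandwidth_gb_s(256, 23))  # AD103-class GDDR6X at ~23 Gbps -> 736.0 GB/s
print(bandwidth_gb_s(256, 30))  # GB203-class GDDR7 at ~30 Gbps -> 960.0 GB/s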
 