Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197
Not that wild. Nvidia generally doesn't release any performance numbers or even drivers prior to the announcement.
Yeah, but the last two releases had benches leaked months in advance. Maybe they made adjustments to how they handle the rollout to partners and OEMs to keep a lid on things.
 
Perhaps new hardware blocks for "Neural rendering"?
No, that's just the tensor cores doing their usual business. If NVidia decides to make that Blackwell-only, then it's a pure software restriction. I expect only a minor reduction in the overhead of switching between tensors, which is effectively equivalent to being able to switch between multiple classic textures. So possibly some memory-management detail to permit efficient arrays of tensors, for a use case similar to array textures. The hidden (firmware-internal) details usually deal with the prefetch logic necessary to have the right amount of data in L1 at the right point in time.
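Roughly what I have in mind, as a toy sketch only: the blob size, struct and kernel names below are all made up, and nothing here reflects how the driver or firmware actually lays this out.

#include <cstdint>

// Assumed layout: one fixed-size weight blob per material, stored contiguously,
// so "binding" a different neural texture is just a different base offset,
// analogous to selecting a layer of an array texture.
constexpr int WEIGHTS_PER_MATERIAL = 256;   // made-up blob size, in floats

struct NeuralTextureArray {
    const float* blobs;          // num_materials * WEIGHTS_PER_MATERIAL floats
    int          num_materials;

    __device__ const float* layer(int material_id) const {
        // Switching tensors == changing a base offset, the same way an array
        // texture only changes a layer index. The hard (firmware) part is
        // prefetching the right blob into L1 before it is needed.
        return blobs + material_id * WEIGHTS_PER_MATERIAL;
    }
};

__global__ void shade(NeuralTextureArray tex,
                      const uint16_t* material_id_per_pixel,
                      float* out, int num_pixels)
{
    int pix = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix >= num_pixels) return;

    const float* w = tex.layer(material_id_per_pixel[pix]);
    // ... run the per-pixel decoder with w here; placeholder output ...
    out[pix] = w[0];
}

The indexing itself is trivial; the whole question is whether the right blob is resident on-chip when the shader gets there.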
All videocards are capable of "neural rendering" and of course there will be "advanced DLSS" and "enhanced RT" on future products.
It's slightly more complicated than that. "Neural rendering" with a single texture from a deferred-texturing-style compute shader is really simple and brutally efficient, since you don't even need to reload the expensive tensor that has the texture encoded, only the short input vector from the G-buffer. But doing that for a setup with many textures needs some reasonably smart deferred batching by texture. You are effectively trading mip-chains and their very small L1 footprint (much waste in VRAM, very little in L1) for something that isn't even remotely as L1-cache-friendly (but much friendlier on VRAM).

I suspect there's going to be a little trick for getting it fast enough even without aggressive batching.
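To illustrate why the single-texture case is so cheap, a minimal sketch, again with made-up sizes and names (a tiny two-layer decoder whose weights fit in shared memory; the real decoders and their layouts aren't public):

// Toy sizes, all assumed: 8 G-buffer features in, 16-wide hidden layer, 3 channels out.
#define IN_DIM  8
#define HID_DIM 16
#define OUT_DIM 3

struct DecoderWeights {                 // the "expensive tensor" for one texture
    float w0[HID_DIM][IN_DIM], b0[HID_DIM];
    float w1[OUT_DIM][HID_DIM], b1[OUT_DIM];
};

__global__ void decode_single_texture(const DecoderWeights* g_w,
                                      const float* g_buffer,    // IN_DIM floats per pixel
                                      float* out_color,         // OUT_DIM floats per pixel
                                      int num_pixels)
{
    // Stage the full weight set in shared memory once per block; after this
    // the weights never leave on-chip memory for the lifetime of the block.
    __shared__ DecoderWeights w;
    const int n_floats = sizeof(DecoderWeights) / sizeof(float);
    for (int i = threadIdx.x; i < n_floats; i += blockDim.x)
        reinterpret_cast<float*>(&w)[i] = reinterpret_cast<const float*>(g_w)[i];
    __syncthreads();

    int pix = blockIdx.x * blockDim.x + threadIdx.x;
    if (pix >= num_pixels) return;

    // Each pixel only fetches its short G-buffer vector from memory.
    float in[IN_DIM], hid[HID_DIM];
    for (int i = 0; i < IN_DIM; ++i) in[i] = g_buffer[pix * IN_DIM + i];

    for (int h = 0; h < HID_DIM; ++h) {           // layer 0 + ReLU
        float acc = w.b0[h];
        for (int i = 0; i < IN_DIM; ++i) acc += w.w0[h][i] * in[i];
        hid[h] = fmaxf(acc, 0.0f);
    }
    for (int o = 0; o < OUT_DIM; ++o) {           // layer 1
        float acc = w.b1[o];
        for (int h = 0; h < HID_DIM; ++h) acc += w.w1[o][h] * hid[h];
        out_color[pix * OUT_DIM + o] = acc;
    }
}

The many-texture setup breaks exactly this: once neighbouring pixels reference different materials, the weight block has to be re-staged whenever the material changes, which is where the deferred batching by texture would come in.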
No no, something is coming ... NVIDIA just released a new SDK for "In-Game Inference".
Red herring. Unless I'm mistaken, the motivation for that API doesn't appear to be rendering at all, but rather to enable efficient scheduling of generic inferencing tasks related to the game logic. Not so sure if that will even start to have any relevance for the next 2-3 years, especially considering that simple AI-related inference doesn't need to be offloaded to the GPU in the first place. Plus hiding the details behind an opaque, proprietary interface sounds like a stillbirth which will at most hinder integration of costlier AI features such as dynamic voice and dialogue synthesis, which is what we are probably going to see more tech demos for. Then again, those options are so VRAM-hungry they don't fit the lower half of the product range either. (And if they do, they are mostly still cheap enough to run somewhere on the CPU anyway.)
 
Do you think we will get anything new or just more generated frames/higher quality upscaling?
 
Yeah funny how it’s impossible to find a simple queue system at US retailers. It’s every man or woman for themselves. So uncivilized.

I had my heart set on a 5090 but $4000 would definitely make me think twice.
During the shortages AMD.com had a semi-functional queue system when they had their drops once a week on Thursday. That's the only one I saw; EVGA also had a system where you joined a waitlist.
 
Not that wild. Nvidia generally doesn't release any performance numbers or even drivers prior to the announcement.
I remember having very accurate information on Ampere and Ada prior to the announcement. Even more information from the AMD side. It's like crickets this time around; feels like enthusiasm is rather low.
 

Keep in mind that with Ada there was the Nvidia hack. As such there was quite a bit of leaked data, which is why you had articles like this months out: https://semianalysis.com/2022/04/16/nvidia-ada-lovelace-leaked-specifications/

It was also well known that Ada would not be a significant departure architecturally from Ampere. What was more in flux rumour-wise was the pricing, clock speeds, power and of course the specific SKU configurations.

In terms of Ampere, the notable SM change to 2x FP32 was not known and caused some confusion in the rumour mill until very near release, from what I remember.

With Blackwell it's somewhat similar. Just based on the CUDA versioning changes, it's largely assumed that it will have significant architectural changes. This means guesstimating numbers is likely going to be very tricky.
 