Speculation and Rumors: Nvidia Blackwell ...

What do you mean by mistake? Take a look at the timeline. By the time NVIDIA sold/licensed GDeflate to Microsoft, they already had the hardware decompression unit ready. Does "hardware-accelerated JPEG decompression" in Hopper ring a bell? They had Deflate support two years ago.

Well, Nvidia talked about hardware LZ and Deflate decompression at GTC 2021. DirectStorage came out in 2022. Not seeing any real opportunity to bamboozle the competition here.

Btw, are Deflate and GDeflate interchangeable? How exactly does a hardware Deflate decompressor help with DS GDeflate?
 
Btw, are Deflate and GDeflate interchangeable? How exactly does a hardware Deflate decompressor help with DS GDeflate?
Not bitstream compatible, but only a couple of bit shuffles and extra padding bits away from being so. Also more constrained.

If you have a hardware Deflate decompressor, it is trivial to extend it for GDeflate. But it does require patching the frontend.

Same as how LZ4 is mostly just Deflate with a different frontend extension that directly skips Huffman decoding.

Still the same though - even though it's 99% the same logic blocks, it's still an incompatible bitstream unless you were prepared.
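To make "patch the frontend" concrete, here's a toy sketch in Python. This is purely my illustration, not real decoder code; the 32-lane interleave matches what the public GDeflate spec describes, and everything else is simplified. The point is that the expensive Huffman/LZ77 core is shared, and only the bit-delivery frontend differs:

```python
# Toy sketch: same decode core, two different bit-delivery frontends.
# Illustrative only, not a conformant Deflate/GDeflate decoder.

class DeflateBitReader:
    """Serial frontend: one contiguous LSB-first bitstream."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit offset into the stream

    def read_bits(self, n: int) -> int:
        val = 0
        for i in range(n):
            val |= ((self.data[self.pos >> 3] >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return val

class GDeflateBitReader:
    """GDeflate-style frontend: the bitstream is shuffled into 32
    interleaved 32-bit word lanes (padded as needed) so 32 decoders
    can advance in lockstep. Same bits, different delivery order."""
    NUM_LANES = 32

    def __init__(self, data: bytes, lane: int):
        words = [int.from_bytes(data[i:i + 4], "little")
                 for i in range(0, len(data), 4)]
        self.words = words[lane::self.NUM_LANES]  # lane k owns words k, k+32, ...
        self.pos = 0  # bit offset within this lane

    def read_bits(self, n: int) -> int:
        val = 0
        for i in range(n):
            val |= ((self.words[self.pos >> 5] >> (self.pos & 31)) & 1) << i
            self.pos += 1
        return val

# The part that costs silicon (Huffman tables, LZ77 window) sits behind
# read_bits() and is identical either way:
def decode_block(reader):
    bfinal = reader.read_bits(1)  # same block header layout...
    btype = reader.read_bits(2)   # ...same Huffman machinery after this
    ...
```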
 
Not bitstream compatible, but only a couple of bit shuffles and extra padding bits away from being so. Also more constrained.

If you have a hardware Deflate decompressor, it is trivial to extend it for GDeflate. But it does require patching the frontend.

Same as how LZ4 is mostly just Deflate with a different frontend extension that directly skips Huffman decoding.

Still the same though - even though it's 99% the same logic blocks, it's still an incompatible bitstream unless you were prepared.

So are you suggesting that the hardware decompressor in Blackwell will be capable of decompressing both GDeflate and LZ4? And so devs can choose either format, both of which will be handled by dedicated hardware on Blackwell, but by CPU or GPU compute (depending on the format used) on all other GPU architectures?

And that the max decompression rate of the unit is presumably PCIe 5.0 x16, so 64GB/s?
 
So are you suggesting that the hardware decompressor in Blackwell will be capable of decompressing both GDeflate and LZ4?
That would be my expectation, yes. If there is any hardware, all 3 formats are supported.
And that the max decompression rate of the unit is presumably PCIe 5.0 x16, so 64GB/s?
Uncertain about that, at least not on consumer-grade silicon. 64GB/s would be over-engineered for several more SSD generations. But I do expect non-consumer silicon to achieve that data rate, even if only across multiple streams.
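For what it's worth, the 64GB/s figure is just the raw x16 link rate. Quick sanity check:

```python
# Back-of-envelope check of "PCIe 5.0 x16 = 64 GB/s".
gt_per_lane = 32        # PCIe 5.0 signals at 32 GT/s per lane
lanes = 16
encoding = 128 / 130    # 128b/130b line encoding overhead

gbit_per_s = gt_per_lane * lanes * encoding  # ~504 Gbit/s
gbyte_per_s = gbit_per_s / 8                 # ~63 GB/s per direction
print(f"{gbyte_per_s:.1f} GB/s")             # -> 63.0, before protocol overhead
```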
 
I have no doubt the $10b claim is an exaggeration or misleading to some decent degree. It's not actually bad PR for them to talk about the immense amount of money they can spend on something when they're the leaders. It makes them look super healthy and impossible to compete with.
 
According to Kopite7kimi, the RTX 5080 release should precede the RTX 5090.
The NVIDIA GeForce RTX 50 "Blackwell" GPU family is expected to launch in Q4 2024 and will first include two products, the GeForce RTX 5090 & the GeForce RTX 5080. With the "Ada Lovelace" RTX 40 lineup, we saw NVIDIA introduce the GeForce RTX 4090 first followed by the RTX 4080. Both top-tier cards launched a month apart from each other but this time, it looks like NVIDIA has decided to launch the "80" model first.
 
Just throwing out some possibilities here.

The RTX 2080 ended up having availability one week earlier than the RTX 2080 Ti.

If the RTX 5090 is the dual-die MCM design, it might be reserved for ProViz and FE, with AIB availability coming later (or not at all) due to various factors and considerations.

Especially if the RTX 5090 does end up having >24GB via a larger bus size, they're going to want to heavily control how they product-segment that as a feature.
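The arithmetic there is simple: one GDDR device per 32-bit channel, so capacity scales directly with bus width. A quick sketch, assuming 16Gbit (2GB) devices and no clamshell:

```python
# VRAM capacity from bus width: one GDDR device per 32-bit channel.
# Assumes 16 Gbit (2 GB) devices, non-clamshell.
def vram_gb(bus_bits: int, gbit_per_device: int = 16) -> int:
    devices = bus_bits // 32
    return devices * gbit_per_device // 8

print(vram_gb(384))  # 24 GB -- AD102-style 384-bit bus
print(vram_gb(512))  # 32 GB -- rumored 512-bit bus
```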

I've thrown this theory out before, but I have a feeling 5xxx, like Ada, will sell on features and other factors initially, and this may be the trend going forward, with $/performance being relegated to mid-gen.
 
If the RTX 5090 is the dual-die MCM design, it might be reserved for ProViz and FE, with AIB availability coming later (or not at all) due to various factors and considerations.

You think there’s a chance Nvidia will waste precious CoWoS capacity on a lowly 5090? Seems unlikely, especially if there’s no competition at the high end.
 
Well, personally I don't know exactly what form a 5090 will take, and therefore whether it will even use CoWoS.

However, strategically I do think it's likely they will want to feed the market, if not in large volumes initially, just to maintain a presence and mindshare. Also, from a development-pipeline standpoint it could be prudent to have some real-world data on how to proceed with MCM going forward. Depending on how they ultimately segment, yield, and harvest in combination with the Pro line, it may mitigate opportunity cost as well.

In either scenario, it means it's certainly very possible a 5090 does not release at launch before a 5080, or even if it does, it's heavily restricted and possibly not given to AIBs (which would restrict the rumour mill somewhat).

Also, I might be misunderstanding your initial post there, and therefore this discussion might have the wrong context: I read it as "it's 100% not true" in response to the other rumour, meaning "it's 100% not true the 5080 will be first"?
 
Also, from a development-pipeline standpoint it could be prudent to have some real-world data on how to proceed with MCM going forward. Depending on how they ultimately segment, yield, and harvest in combination with the Pro line, it may mitigate opportunity cost as well.

The margins on the ProViz stuff are nice but probably not as nice as AI. Of all the rumored reasons for canceling RDNA 4, the most reasonable is that AMD didn’t want to burn scarce CoWoS capacity on consumer chips. B200 should be a good enough test bed for MCM coupling. HPC workloads are more MCM-friendly anyway. I don’t see the need to rush into a super expensive and complicated MCM graphics flagship right now, given the absence of competition and the opportunity cost of allocating CoWoS capacity away from AI.

Also, I might be misunderstanding your initial post there, and therefore this discussion might have the wrong context: I read it as "it's 100% not true" in response to the other rumour, meaning "it's 100% not true the 5080 will be first"?

I was just poking fun at the “ture” typo in the Twitter post. Sorry :)
 
Assuming the rumor of 192 SMs is true (full chip; a 5090 would likely be cut down from that), this represents a 33% increase over AD102's 144 SMs. If there were a shrink to 3nm, I don't think many people would question whether this could be done monolithic, but still being on 4nm, there's gonna be limited scaling possible here. It's still potentially possible if they take what minor process shrink they can, push on design density, and crank the die size up to 700mm²+. But I'd also question whether they could then also do a large L2 plus a 512-bit bus. Maybe one or the other, but I kinda doubt both unless it's gonna be some near-800mm² monster.
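Rough numbers behind that (the SM counts are from the rumor above and AD102's ~608mm² is public; the density gain is a pure guess for illustration):

```python
# Back-of-envelope die-size math for the 192 SM rumor.
ad102_sms = 144
ad102_mm2 = 608      # AD102 die size, approximately
rumored_sms = 192

sm_increase = rumored_sms / ad102_sms - 1  # 0.33 -> the "33%" above

# Naive linear scaling on the same 4nm-class node, with an assumed
# minor density improvement (the 1.06 is purely a guess):
density_gain = 1.06
naive_mm2 = ad102_mm2 * (rumored_sms / ad102_sms) / density_gain
print(f"+{sm_increase:.0%} SMs -> ~{naive_mm2:.0f} mm² before extra L2/bus area")
# -> +33% SMs -> ~765 mm², i.e. squarely in "700mm²+ monster" territory
```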

It's also possible the 192 SM rumor is untrue, and the count will be lower. But then that would really require Nvidia to provide a big increase in performance per SM (a focus on clockspeeds over density), or else the 5090 won't represent that big of a leap over the 40 series. I kinda like this option better just for the potential pricing concerns of a 700mm²+ GPU...

I doubt the chiplet/MCM speculation as well, for the reasons stated above by others. But it's not impossible Nvidia had simply committed to this well ahead of time. Which would be great, and would put some quite sky-high specs within the realm of the possible.
 
The highest end seems to be mostly about mindshare; I doubt there's much demand there otherwise. Besides, that 512-bit bus says "AI" all over it. They want to make a datacenter inference part, then maybe shovel the salvage dies off to consumers after. They don't need chiplets to make that profitable yet, and there's all the trouble chiplets bring when trying to run DirectX/Vulkan/etc. on them, aka "gaming".

A 45% faster 5090, accounting for salvage and a very modest clockspeed increase, at 525W air-cooled or so, sounds like something Nvidia would have been fine with circa when they were planning this thing out.
 