Speculation and Rumors: Nvidia Blackwell ...

And why would you duplicate the display, video, PCIe stuff, etc.? It's a non-trivial amount of silicon and certainly not cheap.
I think most of this sounds like nonsense, but duplicating all of that might not be that bad given the yield-recovery benefit.

I could see something like an M1 Pro/Max physical layout strategy making sense (google it and come to your own conclusions regarding what that means), but at that point it’s literally just an engineering implementation detail and not something anyone outside the company has any reason to care about (outside of geeking out about these kinds of things, of course :))

Also, no consumer/graphics GPU has had an A100/H100-like split L2, so if that were the case, it would be noteworthy. I'm not sure why they'd want one either, though, given how superior the RTX 4090's L2 cache is to the H100's L2 in the ways that matter for consumer GPUs (i.e. the H100 likely benefits from higher maximum cache bandwidth due to massive HBM DRAM bandwidth, and AI matrix-multiplication access patterns likely fit the split slightly better, but besides that the AD102 L2 is lower latency and higher capacity while still not taking a huge percentage of the die).

My guess is it’s a very traditional brute-force monolithic 800mm²+ 512-bit GDDR7 chip on N4P with a single L2, and *maybe* GB203 is “just” a cut-down GB202 in a way that mostly saves engineering/verification effort and might even still require a separate tape-out depending on their methodology. I’d be pleasantly surprised if it was more interesting than that, but I don’t see what benefit they get from it at their volumes tbh.
 
Nvidia's L2 in graphics GPUs has never been "single"; it is partitioned, tied to the MCs, and is usually split into two parts in physical layouts. So whatever GB202 is doing to be "MCM-like", it is unlikely to be the L2 split.
That’s fair; the key distinction between an A100-style split L2 and the AD102-style L2 is that on the latter, a given physical memory address always maps to the same partition and set of the L2, and the data is only ever present in a single set (see the toy sketch at the end of this post).

Having an A100-style L2 would probably be relatively simple for NVIDIA to implement in GB202 but once again I’m not sure why they’d need/want to do so unless they are aiming for a massively larger L2 than AD102 and that seems wasteful combined with the rumoured 512-bit GDDR7.
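To make that distinction concrete, here is a rough toy model in Python (my own illustration: the two-partition count, line size, and hash are made-up assumptions, since NVIDIA hasn't published the actual mapping). The point is just that an AD102-style L2 gives every address exactly one home, while an A100-style split L2 can end up with the same line resident in both halves.

```python
# Toy illustration only: NUM_PARTITIONS, LINE_BYTES and the hash are
# assumptions for the sketch, not NVIDIA's actual L2 organisation.
NUM_PARTITIONS = 2
LINE_BYTES = 128

def ad102_style_partition(addr: int) -> int:
    """AD102-style: a physical address hashes to exactly one home
    partition/set, so a given line is only ever resident in one place."""
    return (addr // LINE_BYTES) % NUM_PARTITIONS

def a100_style_fill(addr: int, requester_side: int, residency: dict) -> None:
    """A100-style split L2: a line can be filled into whichever half is
    closer to the requester, so the same address may end up resident in
    both halves at once (which is what makes the split noteworthy)."""
    residency.setdefault(addr // LINE_BYTES, set()).add(requester_side)

residency = {}
a100_style_fill(0x1000, requester_side=0, residency=residency)
a100_style_fill(0x1000, requester_side=1, residency=residency)
print(ad102_style_partition(0x1000))    # always the same single home partition
print(residency[0x1000 // LINE_BYTES])  # {0, 1}: present in both halves
```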
 
If Nvidia’s RT implementation is similar to their published patents then tracing may be latency sensitive. The patent describes a scheduling mechanism that selects rays from a relatively small pool, with minimal latency-hiding opportunities. A large L2 may be helpful in ensuring BVH nodes are readily available and in avoiding stalls on those active rays.
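As a toy illustration of why that would make a big L2 attractive (my own back-of-the-envelope model with made-up numbers, not anything taken from the patent): if the scheduler can only switch between a small pool of in-flight rays, the fraction of cycles the traversal unit actually does useful work collapses as node-fetch latency grows.

```python
# Toy latency-hiding model, made-up numbers: each ray step consumes one BVH
# node, a fetch that misses parks the ray for `miss_latency` cycles, and the
# unit can only switch to another ray from a pool of `pool_size` rays.
def traversal_utilisation(pool_size: int, miss_rate: float, miss_latency: int) -> float:
    # A ray averages 1 working cycle per (1 + miss_rate * miss_latency) cycles
    # elapsed, so roughly this many rays are ready at any instant:
    ready = pool_size / (1 + miss_rate * miss_latency)
    return min(1.0, ready)  # the unit issues at most one ray step per cycle

for latency in (20, 200):   # e.g. cache hit vs. VRAM round trip (illustrative)
    print(latency, traversal_utilisation(pool_size=8, miss_rate=0.2, miss_latency=latency))
# With an 8-ray pool this prints ~1.0 at 20 cycles but only ~0.2 at 200 cycles,
# which is the sense in which keeping BVH nodes in a large L2 keeps the few
# active rays fed.
```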
 
Supposed "2 slot cooler for 5090 FE"


But a 2-slot cooler doesn't seem right for the 5090: there's no major node jump, Blackwell compute doesn't demonstrate any major efficiency gains, and there's no way Nvidia would allow performance regressions.

I could see an AI-dedicated version having a 2-slot cooler: binned for super low compute clocks, a 384-bit bus with high memory speed; a bin specifically for a high-memory/low-clockspeed/slim server SKU. But a two-slot cooler for a 5090? Like with "it's logically MCM", it doesn't sound like Kopite knows shit.
 
I could believe < 3 slot, as in, 2.x slot. But an exact 2 slot design does seem unlikely. It is possible his source referred to a 2.x slot design but did not word it specifically.
 
Can you please share links to those published patents? Thx in advance

There are a bunch but these links are a good place to start. Check out the cited patents on these pages for links to other elements of the design.

 
I could believe < 3 slot, as in, 2.x slot. But an exact 2 slot design does seem unlikely. It is possible his source referred to a 2.x slot design but did not word it specifically.

Also, 2 slots doesn't make sense with the 3-PCB rumor. Surely a 2-slot card is too narrow to host a PCB parallel to the motherboard that's large enough to accommodate all of the components.
 
Worst case, the 5080 is bumped to 5090 with a return of Titan branding and the concomitant price tag, and a full(er)-chip Titan Super releasing a year later.
 
I hope they do better. The 4090 not being a full chip helps, and the 5080 could be around 20-30% faster, which might be enough to be labeled a 5090.

Also, the cut-down 28GB does not bode well for my 8k gaming wish.
 
Worst case, the 5080 is bumped to 5090 with a return of Titan branding and the concomitant price tag, and a full(er)-chip Titan Super releasing a year later.
GB203 is looking too cut down from GB202 to be able to do this. They can still sell a 5080 for over $1000 either way, so they can have their cake and eat it too. GB202 is looking like it might be another 2080 Ti-esque monstrosity of a GPU, so they'll just jack up the price of the 5090 to like $1800-2000. People will pay it and defend it because of how big the performance difference is from the 5080 (which would be done on purpose specifically for this reaction).
 
Why do you need more than 28GB for 8K?

8k games get close to 24GB usage, so with 8k assets I'd think 28GB would be cutting it too close.
Also, FG VRAM usage increases with resolution. When I tested it at 8k output, it was using around 6GB compared to 2-3GB at 4k. We might also see 2 generated frames now.
 
8k games get close to 24GB usage, so with 8k assets I'd think 28GB would be cutting it too close.
Also, FG VRAM usage increases with resolution. When I tested it at 8k output, it was using around 6GB compared to 2-3GB at 4k. We might also see 2 generated frames now.

Buffers don't take that much, and the only titles I'm aware of with 8K textures are the Crysis Remasters so far, which don't take 24GB. The only title that seems to want above 16GB at all so far is Frontiers of Pandora, on the hidden Unobtanium settings, and it's really unclear how much of the memory it reserves is actually needed (someone would need to go through with a debugging tool running to figure it out).
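For a rough sense of scale (the bytes per pixel and the list of targets below are my assumptions, not measurements from any particular game): even a fairly heavy set of full-resolution render targets at 8K adds up to about a gigabyte, so the bulk of any 20GB+ figure has to be textures, geometry, and whatever the engine simply chooses to keep resident.

```python
# Back-of-the-envelope only: the formats and target list are assumptions,
# not pulled from a real engine.
W, H = 7680, 4320                        # 8K output resolution
targets_bpp = {
    "HDR colour (FP16 RGBA)":  8,        # bytes per pixel
    "depth/stencil":           4,
    "4x G-buffer (RGBA8)":     4 * 4,
    "motion vectors (RG16F)":  4,
}
total_bytes = sum(W * H * bpp for bpp in targets_bpp.values())
print(f"{total_bytes / 2**20:.0f} MiB")  # roughly 1 GiB for everything above
```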

This rumor doesn't make much practical sense, but I'm not going to claim what Nvidia does for its uppermost tier makes practical sense. The Titan RTX launched at a then-unprecedented 280W and a still-unprecedented $2,499 back in 2018; the entire point is to grab headlines first, with how much profit margin they can squeeze out of the highest-paying customers a close second. "Practical sense" is for anything below that.
 