Speculation and Rumors: Nvidia Blackwell ...

And why would you duplicate the display, video, PCIe stuff, etc.? It's a non-trivial amount of silicon and certainly not cheap.
I think most of this sounds like nonsense, but duplicating all of that might not be that bad given the yield-recovery benefit.

I could see something like an M1 Pro/Max physical layout strategy making sense (google it and come to your own conclusions regarding what that means), but at that point it’s literally just an engineering implementation detail and not something anyone outside the company has any reason to care about (outside of geeking out about these kinds of things, of course :))

Also, no consumer/graphics GPU has had an A100/H100-like split L2, so if that were the case, it would be noteworthy. I'm not sure why they'd want one either, though, given how superior the RTX 4090's L2 cache is to the H100's L2 in the ways that matter for consumer GPUs (i.e. the H100 likely benefits from higher maximum cache bandwidth due to massive HBM DRAM bandwidth, and AI matrix-multiplication access patterns likely fit the split slightly better, but besides that the AD102 L2 is lower latency and higher capacity while still not taking a huge percentage of the die).

My guess is it’s a very traditional brute-force monolithic 800mm²+ 512-bit GDDR7 chip on N4P with a single L2, and *maybe* GB203 is “just” a cut-down GB202 in a way that mostly saves engineering/verification effort and might even still require a separate tape-out depending on their methodology. I’d be pleasantly surprised if it was more interesting than that, but I don’t see what benefit they get from it at their volumes tbh.
 
Nvidia's L2 in graphics GPUs has never been "single"; it is partitioned, tied to the MCs, and is usually split into two parts in physical layouts. So whatever GB202 is doing to be "MCM-like", it is unlikely to be the L2 split.
That’s fair; the key distinction between an A100-style split L2 and the AD102-style L2 is that on the latter, a given physical memory address always maps to the same partition and set of the L2, and the data is only ever present in a single set (see the toy sketch at the end of this post).

Having an A100-style L2 would probably be relatively simple for NVIDIA to implement in GB202 but once again I’m not sure why they’d need/want to do so unless they are aiming for a massively larger L2 than AD102 and that seems wasteful combined with the rumoured 512-bit GDDR7.
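To make that distinction concrete, here is a rough toy model in Python (my own illustration: the two-partition count, line size, and hash are made-up assumptions, since NVIDIA hasn't published the actual mapping). The point is just that an AD102-style L2 gives every address exactly one home, while an A100-style split L2 can end up with the same line resident in both halves.

```python
# Toy illustration only: NUM_PARTITIONS, LINE_BYTES and the hash are
# assumptions for the sketch, not NVIDIA's actual L2 organisation.
NUM_PARTITIONS = 2
LINE_BYTES = 128

def ad102_style_partition(addr: int) -> int:
    """AD102-style: a physical address hashes to exactly one home
    partition/set, so a given line is only ever resident in one place."""
    return (addr // LINE_BYTES) % NUM_PARTITIONS

def a100_style_fill(addr: int, requester_side: int, residency: dict) -> None:
    """A100-style split L2: a line can be filled into whichever half is
    closer to the requester, so the same address may end up resident in
    both halves at once (which is what makes the split noteworthy)."""
    residency.setdefault(addr // LINE_BYTES, set()).add(requester_side)

residency = {}
a100_style_fill(0x1000, requester_side=0, residency=residency)
a100_style_fill(0x1000, requester_side=1, residency=residency)
print(ad102_style_partition(0x1000))    # always the same single home partition
print(residency[0x1000 // LINE_BYTES])  # {0, 1}: present in both halves
```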
 
If Nvidia’s RT implementation is similar to their published patents then tracing may be latency sensitive. The patent describes a scheduling mechanism that selects rays from a relatively small pool, with minimal latency-hiding opportunities. A large L2 may be helpful in ensuring BVH nodes are readily available and in avoiding stalls on those active rays.
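As a toy illustration of why that would make a big L2 attractive (my own back-of-the-envelope model with made-up numbers, not anything taken from the patent): if the scheduler can only switch between a small pool of in-flight rays, the fraction of cycles the traversal unit actually does useful work collapses as node-fetch latency grows.

```python
# Toy latency-hiding model, made-up numbers: each ray step consumes one BVH
# node, a fetch that misses parks the ray for `miss_latency` cycles, and the
# unit can only switch to another ray from a pool of `pool_size` rays.
def traversal_utilisation(pool_size: int, miss_rate: float, miss_latency: int) -> float:
    # A ray averages 1 working cycle per (1 + miss_rate * miss_latency) cycles
    # elapsed, so roughly this many rays are ready at any instant:
    ready = pool_size / (1 + miss_rate * miss_latency)
    return min(1.0, ready)  # the unit issues at most one ray step per cycle

for latency in (20, 200):   # e.g. cache hit vs. VRAM round trip (illustrative)
    print(latency, traversal_utilisation(pool_size=8, miss_rate=0.2, miss_latency=latency))
# With an 8-ray pool this prints ~1.0 at 20 cycles but only ~0.2 at 200 cycles,
# which is the sense in which keeping BVH nodes in a large L2 keeps the few
# active rays fed.
```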
 
Supposed "2 slot cooler for 5090 FE"


But a 2-slot cooler doesn't seem right for the 5090: there's no major node jump, Blackwell compute doesn't demonstrate any major efficiency gains, and there's no way Nvidia would allow performance regressions.

I could see an AI-dedicated version having a 2-slot cooler: binned for super low compute clocks, a 384-bit bus with high memory speed; a bin specifically for a high-memory/low-clockspeed/slim server SKU. But a two-slot cooler for a 5090? Like with "it's logically MCM", it doesn't sound like Kopite knows shit.
 
I could believe < 3 slot, as in, 2.x slot. But an exact 2 slot design does seem unlikely. It is possible his source referred to a 2.x slot design but did not word it specifically.
 
Can you please share links to those published patents? Thx in advance

There are a bunch but these links are a good place to start. Check out the cited patents on these pages for links to other elements of the design.

 
I could believe < 3 slot, as in, 2.x slot. But an exact 2 slot design does seem unlikely. It is possible his source referred to a 2.x slot design but did not word it specifically.

Also, 2 slots doesn't make sense with the 3-PCB rumor. Surely a 2-slot card is too narrow to host a PCB parallel to the motherboard that's large enough to accommodate all of the components.
 
Worst case, the 5080 is bumped to 5090 with a return of Titan branding and the concomitant price tag, and a full(er)-chip Titan Super releasing a year later.
 
I hope they do better. The 4090 not being a full chip helps, and the 5080 could be around 20-30% faster, which might be enough to be labeled a 5090.

Also, the cut-down 28GB does not bode well for my 8k gaming wish.
 
Worst case, the 5080 is bumped to 5090 with a return of Titan branding and the concomitant price tag, and a full(er)-chip Titan Super releasing a year later.
GB203 is looking too cut down from GB202 to be able to do this. They can still sell a 5080 for over $1000 either way, so they can have their cake and eat it too. GB202 is looking like it might be another 2080 Ti-esque monstrosity of a GPU, so they'll just jack up the price of the 5090 to like $1800-2000. People will pay it and defend it because of how big the performance difference is from the 5080 (which would be done on purpose specifically for this reaction).
 
Why do you need more than 28GB for 8K?

8k games get close to 24GB usage, so with 8k assets I'd think 28GB would be cutting it too close.
Also, FG VRAM usage increases with resolution. When I tested it at 8k output, it was using around 6GB compared to 2-3GB at 4k. We might also see 2 generated frames now.
 
8k games get close to 24GB usage, so with 8k assets I'd think 28GB would be cutting it too close.
Also, FG VRAM usage increases with resolution. When I tested it at 8k output, it was using around 6GB compared to 2-3GB at 4k. We might also see 2 generated frames now.

Buffers don't take that much, and the only titles I'm aware of with 8K textures are the Crysis Remasters so far, which don't take 24GB. The only title that seems to want above 16GB at all so far is Frontiers of Pandora, on the hidden Unobtanium settings, and it's really unclear how much of the memory it reserves is actually needed (someone would need to go through with a debugging tool running to figure it out).
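For a rough sense of scale (the bytes per pixel and the list of targets below are my assumptions, not measurements from any particular game): even a fairly heavy set of full-resolution render targets at 8K adds up to about a gigabyte, so the bulk of any 20GB+ figure has to be textures, geometry, and whatever the engine simply chooses to keep resident.

```python
# Back-of-the-envelope only: the formats and target list are assumptions,
# not pulled from a real engine.
W, H = 7680, 4320                        # 8K output resolution
targets_bpp = {
    "HDR colour (FP16 RGBA)":  8,        # bytes per pixel
    "depth/stencil":           4,
    "4x G-buffer (RGBA8)":     4 * 4,
    "motion vectors (RG16F)":  4,
}
total_bytes = sum(W * H * bpp for bpp in targets_bpp.values())
print(f"{total_bytes / 2**20:.0f} MiB")  # roughly 1 GiB for everything above
```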

This rumor doesn't make much practical sense, but I'm not going to claim what Nvidia does for its uppermost tier makes practical sense. The Titan RTX launched at a then-unprecedented 280W and a still-unprecedented $2,499 back in 2018; the entire point is to grab headlines first, with how much profit margin they can squeeze out of the highest-paying customers a close second. "Practical sense" is for anything below that.
 