Nvidia Blackwell Architecture Speculation

  • Thread starter Deleted member 2197
I've a hard time believing that given the relatively tiny jump from there to the 5080 but significant price increase. If true though I would very likely get one - mostly for the 16GB VRAM but the performance boost would be nice too.
You could *try* to get one :LOL:

I got a feeling these are gonna fly off the shelves like nothing we've seen yet.
 
I've a hard time believing that given the relatively tiny jump from there to the 5080 but significant price increase. If true though I would very likely get one - mostly for the 16GB VRAM but the performance boost would be nice too.
$750 to $1000 is a +33% price increase, but it's the same chip with the same VRAM and so on, so the performance difference should be fairly small, likely smaller than the difference in price.
One thing to ponder upon is that 5070 and 5070Ti prices are likely set with knowledge of where Navi 48 will be and they are the most likely reason why AMD has scrapped their RDNA4 launch at CES.
So a smaller perf gap can be a result of these cards actually having some competition.
 
I've a hard time believing that given the relatively tiny jump from there to the 5080 but significant price increase. If true though I would very likely get one - mostly for the 16GB VRAM but the performance boost would be nice too.

I'm guessing it's from the VideoCardz article - https://videocardz.com/newz/geforce...ender-benchmark-7-6-faster-than-4070-ti-super

But the "source" they use shows a 4070 ti super scoring higher.


With Geekbench in general, I'm not sure how people are actually judging which scores are representative when they cite from the database.

As for the Blender results cited, they show the 5080 as 20% faster than the 5070 Ti. But the 4080 Super is also 20% faster than the 4070 Ti Super. This time around the 5070 Ti is $50 cheaper (at least at official MSRP), so the value differential is higher, but not really that dramatically so.

For context if it does follow the same product stack targets it's worth keeping in mind the 4080 Super was only just under 20% faster on aggregate than the 4070 Ti Super according to the newest TPU numbers at 4k - https://www.techpowerup.com/review/gpu-test-system-update-for-2025/2.html

including RT numbers.
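
If it helps, here's that value comparison spelled out (plain C++; the MSRPs are the $749/$999 and $799/$999 figures being discussed, and the 20% deltas are the Blender/TPU numbers above, so treat it as rough numbers only):

```cpp
#include <cstdio>

// Prints how much extra performance and extra money the faster card in each pair
// asks for, plus its perf-per-dollar relative to the cheaper card.
void compare(const char* pair, double perf_ratio, double price_lo, double price_hi) {
    double rel_value = perf_ratio * (price_lo / price_hi);  // perf/$ of faster card vs cheaper card
    printf("%s: +%.0f%% perf for +%.0f%% money -> relative perf/$ %.2f\n",
           pair, (perf_ratio - 1.0) * 100.0, (price_hi / price_lo - 1.0) * 100.0, rel_value);
}

int main() {
    compare("5070 Ti  -> 5080      ", 1.20, 749.0, 999.0);  // ~0.90
    compare("4070 TiS -> 4080 Super", 1.20, 799.0, 999.0);  // ~0.96
    return 0;
}
```

(Which works out to the faster card having roughly 10% worse perf/$ this time versus roughly 4% last time.)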
 
NVIDIA definitely runs tensor and fp32 ops concurrently, especially now with their tensor cores busy almost 100% of the time (doing upscaling, frame generation, denoising, HDR post processing, and in the future neural rendering).

Recent NVIDIA generations have become considerably better at mixing all three workloads (tensor + ray + fp32) concurrently. I read somewhere (I can't find the source now) that ray tracing + tensor is the most common concurrent pairing, followed by ray tracing + fp32 and tensor + fp32.
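
To put a finger on what "concurrently" can mean from the software side, here's a minimal CUDA sketch (my own illustration, nothing from NVIDIA's drivers): it puts a WMMA tensor-core kernel and a plain FP32 FMA kernel on separate streams so the scheduler is at least allowed to overlap them. Whether they genuinely overlap on a given SM depends on occupancy and the scheduler, so treat it as a harness for experimenting, not proof of per-SM dual issue.

```cpp
// Build with: nvcc -arch=sm_70 concurrent_sketch.cu
#include <cuda_fp16.h>
#include <mma.h>
#include <cstdio>
using namespace nvcuda;

// One warp computes a 16x16x16 half-precision matrix multiply on the tensor cores.
__global__ void tensor_mma_kernel(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}

// Plain FP32 work to keep the FMA pipes busy at the same time.
__global__ void fp32_fma_kernel(float* out, int iters) {
    float x = threadIdx.x * 1e-3f;
    for (int i = 0; i < iters; ++i) x = fmaf(x, 1.0001f, 0.5f);
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main() {
    half *A, *B; float *C, *out;
    cudaMalloc(&A, 16 * 16 * sizeof(half));
    cudaMalloc(&B, 16 * 16 * sizeof(half));
    cudaMalloc(&C, 16 * 16 * sizeof(float));
    cudaMalloc(&out, 256 * 1024 * sizeof(float));

    cudaStream_t s_tensor, s_fp32;
    cudaStreamCreate(&s_tensor);
    cudaStreamCreate(&s_fp32);

    // Different streams = no false dependency, so both kinds of work are in flight together.
    tensor_mma_kernel<<<1, 32, 0, s_tensor>>>(A, B, C);
    fp32_fma_kernel<<<256, 1024, 0, s_fp32>>>(out, 1 << 16);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```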


Informative threads. Confirms my understanding of how instruction issue works on Nvidia.

It's interesting that Nvidia never talks about the uniform math pipeline while AMD marketing makes a big deal about the scalar pipe. Presumably they perform similar functions.

[Attached images: turing-udp1.png, turing-udp2.png]
 
Will we see a future where high end gaming PCs have to be connected to 240V outlets like you use for your dryer? A PC with a 5090 and 14900K could already use half the continuous capacity (80% of rated load) of a 120V 20A breaker.
Nah, running those lines is actually a lot more expensive than regular cables, and presumably you'd need to have a bunch so people could plug things in anywhere they wanted. Also, a 20A outlet can run ~1920W continuously (80% of its 2400W max load).
 
And if somehow ~1900W (20A at the 80% code derate is 16A, which at 120V nominal is ~1920W) isn't enough, then the same 20A wiring should, under US code, also support a 6-20 receptacle, which is two 20A hots and a ground rather than a single 20A hot, a neutral, and a ground. It would take up one additional space in your breaker box, but would swap the 5-20 receptacle you have today for a 6-20, which provides double the voltage and thus double the available wattage. If/when we get to the point where 3800W continuous isn't enough for your favorite PC gaming session, well, then we have a larger problem! Although a ~1900W ceiling isn't as far off as some might think.
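
For anyone who wants to sanity-check those numbers, the receptacle math comes out like this (plain C++; any compiler or nvcc will take it; assumes the NEC 80% continuous-load derate and nominal 120V/240V):

```cpp
#include <cstdio>

// Continuous capacity = volts * amps * 0.8 (NEC continuous-load derate).
double continuous_watts(double volts, double amps) {
    return volts * amps * 0.8;
}

int main() {
    printf("5-15 (120V, 15A): %.0f W continuous\n", continuous_watts(120, 15)); // 1440 W
    printf("5-20 (120V, 20A): %.0f W continuous\n", continuous_watts(120, 20)); // 1920 W
    printf("6-20 (240V, 20A): %.0f W continuous\n", continuous_watts(240, 20)); // 3840 W
    return 0;
}
```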

The real challenge is that most outlets used for PC gaming aren't a single receptacle on a single breaker. As an example, all the outlets in my office connect to a single 20A breaker, and my office includes a Bambu X1C 3D printer, another bigass Fedora Linux PC with its own 27" Dell Ultrasharp monitor, a small drinks fridge, and two LED lamps. Today, the office breaker regularly sees ~1100W continuous load with "spikes" to 1300W with just my gaming rig and my Linux rig both folding 24/7. When I crank up the 3D printer, the outlet will absolutely sit at 1500W sustained for an hour or two or more. And when the beer fridge compressor kicks on? Another 50-60W, depending. And while that Dell Ultrasharp is awake, it's another ~100W all by itself (it's an old model).

Yes, the room stays nice and warm :D And I'm within a few hundred watts of looking at dropping a 240v outlet into my own office without even talking about a new video card.
 
Informative threads. Confirms my understanding of how instruction issue works on Nvidia.

It's interesting that Nvidia never talks about the uniform math pipeline while AMD marketing makes a big deal about the scalar pipe. Presumably they perform similar functions.

According to an open source driver developer who writes drivers for Nvidia HW, their UGPRs have really complex rules and stringent restrictions for usage ...

I assume it explains why their hardware is unable to optimize the divergent indexing of constant buffers (cbuffer{float4} load linear/random) in this perftest, because divergent resource access breaks the "uniform control flow" condition required to make use of UGPRs. They speculate that it's mainly designed for address calculations at the start of a program ...
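
For anyone who hasn't looked at that perftest, here's roughly what the two access patterns look like in plain CUDA (my own sketch, not the perftest's code). The uniform case is the one the constant cache can serve with a single broadcast per warp, and the kind of access a scalar/uniform datapath is presumably aimed at per the discussion above; the divergent case forces per-lane handling:

```cpp
// Build with: nvcc cbuffer_indexing.cu
#include <cstdio>

__constant__ float4 cbuf[256];   // stand-in for a cbuffer{float4}

// Warp-uniform index: every lane reads the same element, so one constant-cache
// broadcast covers the whole warp and the index itself is uniform data.
__global__ void uniform_index(float* out, int idx) {
    float4 v = cbuf[idx];
    out[blockIdx.x * blockDim.x + threadIdx.x] = v.x + v.y + v.z + v.w;
}

// Divergent index: every lane computes its own element, so the reads get
// serialized per unique address and nothing about the access is uniform.
__global__ void divergent_index(float* out) {
    int idx = (threadIdx.x * 37) & 255;
    float4 v = cbuf[idx];
    out[blockIdx.x * blockDim.x + threadIdx.x] = v.x + v.y + v.z + v.w;
}

int main() {
    float4 host[256] = {};
    cudaMemcpyToSymbol(cbuf, host, sizeof(host));
    float* out;
    cudaMalloc(&out, 256 * 256 * sizeof(float));
    uniform_index<<<256, 256>>>(out, 7);
    divergent_index<<<256, 256>>>(out);
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```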
 
Yeah I think @Kaotik has it right -- I can't imagine a GeForce-branded (ie consumer) card makes ANY sense with 96GB of VRAM; this has to be a datacenter product.
 
Good point. I know Micron talked about creating 32 Gbit GDDR6 modules, but I can't find anywhere suggesting they ever did or they have any for sale. The spec does allow for such a capacity...
 
Oh yeah 100% it's pure aftermarket. They have to use a different PCB because the "stock" 4090 card doesn't have a spot for clamshell-mounted DRAM chips. The rumored 4090-96GB would be the same design but ostensibly with 32Gbit chips (instead of the 16Gbit ones available today). The trick is, 32 Gbit chips don't seem to exist...
 
The discussion presupposes that it's possible to fit 96 GB on an AD102, which it just isn't. 12 memory channels, two DRAMs per channel, 2 GB per DRAM = 48 GB.

A GB202 with 96 GB makes sense and is almost certainly gonna be the RTX 6000 Blackwell
Yeah pretty obvious, 3GB Modules + 512-bit bus = 48GB/96GB. That said I expect RTX 60 GeForce to use 3GB modules across the board (maybe 6090 has 42GB and a 6090 Ti with 48GB? And hopefully RDNA 5/UDNA Gen 1 has a top end option as well).
Good point. I know Micron talked about creating 32 Gbit GDDR6 modules, but I can't find anywhere suggesting they ever did or they have any for sale. The spec does allow for such a capacity...
Following on from that, I don't expect 32 Gbit/4GB modules to appear until something like RTX 70, UDNA Gen 2/RDNA 6, and maybe Xe4 dGPUs if that's 2028.
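
Spelling out the capacity math from the last few posts (plain C++; the bus widths and chip densities are the ones being thrown around in the thread, not confirmed specs):

```cpp
#include <cstdio>

// GDDR capacity = (bus width / 32-bit channels) * devices per channel * GB per device.
int capacity_gb(int bus_bits, int gbit_per_chip, bool clamshell) {
    int channels = bus_bits / 32;                  // one GDDR device per 32-bit channel
    int devices  = channels * (clamshell ? 2 : 1); // clamshell doubles devices per channel
    return devices * (gbit_per_chip / 8);          // 8 Gbit = 1 GB
}

int main() {
    printf("AD102, 384-bit, 16Gbit (2GB), clamshell: %d GB\n", capacity_gb(384, 16, true));  // 48
    printf("GB202, 512-bit, 24Gbit (3GB), single:    %d GB\n", capacity_gb(512, 24, false)); // 48
    printf("GB202, 512-bit, 24Gbit (3GB), clamshell: %d GB\n", capacity_gb(512, 24, true));  // 96
    printf("Hypothetical 32Gbit (4GB), 512-bit:      %d GB\n", capacity_gb(512, 32, false)); // 64
    return 0;
}
```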
 