AMD RDNA4 Architecture Speculation

You're very naive to think that there isn't a substantial price difference. Why should the suppliers even develop faster RAM if they can't charge a premium?
Especially at the beginning, with Samsung as the only supplier, Nvidia will pay a significantly higher amount. This might change at the end of the year, when Micron and Hynix have their GDDR7 on the market too.

I wouldn't be surprised if there were no big margin difference between the 5070Ti and the 9070XT at launch price levels. But Nvidia has the 5080 to increase its margin on GB203, while the 9070 decreases AMD's total margin on the chip.
20-21 Gbps G6 isn't exactly slow, or available from just anyone, either.
 
According to TPU, FSR4 requires 779 AI TOPS, which pretty much confirms it has very little to do with PSSR (which runs on the PS5 Pro's 300 TOPS) and will hopefully be a much superior solution. Also, the 9070 (non-XT) offers almost 1200 TOPS, or around 4x the PS5 Pro's AI capability, at raster levels which are presumably more like 50% higher, so clearly there is little to no architectural relation from an AI perspective either.
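For anyone who wants to check the "around 4x" figure, here is a quick sketch of the arithmetic. It uses only the TOPS numbers quoted in the post, with "almost 1200" rounded to 1200 as an assumption. The dictionary and function names are made up for illustration, and the figures may not share the same precision or sparsity setting, so treat the ratio as rough.

```python
# TOPS figures exactly as quoted in the post above. They are not guaranteed
# to share the same precision or sparsity setting, so the ratio is rough.
QUOTED_TOPS = {
    "PS5 Pro": 300,
    "RX 9070": 1200,  # "almost 1200" in the post, rounded here (assumption)
}


def tops_ratio(card: str, baseline: str) -> float:
    """Return how many times more quoted TOPS `card` has than `baseline`."""
    return QUOTED_TOPS[card] / QUOTED_TOPS[baseline]


print(f"RX 9070 vs PS5 Pro: {tops_ratio('RX 9070', 'PS5 Pro'):.1f}x")
```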

As a product, the 9070XT seems pretty exciting: ~4070Ti Super level performance for 75% of the price, with what will hopefully be an upscaler comparable to DLSS 3, along with comparable frame gen capabilities. They even apparently have their own AI-based denoiser in response to Ray Reconstruction. Hopefully it's competitive.

Where do these numbers come from? By comparison, how much does DLSS4 require?
TPU has somehow misread "Up to 779 TOPS AI-Acceleration via AMD RDNA 4 Architecture" on the FSR 4 slide as a requirement, while in reality that's what the RX 9070 XT delivers at the FP8 precision used by FSR 4.
 
Interesting, but one can assume that FSR4 would not run well on the previous generation. The TOPS value of the 9000 series is several times that of the 7000 series. That is a significant improvement.
 
IIRC AMD said they'll investigate whether it can be brought to at least part of the RX 7000 gen, but it will also be twice as heavy if they do (since they'd need to run it in FP16).
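The "twice as heavy" claim can be read as a simple width argument. A minimal sketch, assuming the cost of running the network scales with operand width and that nothing else changes when falling back from FP8 to FP16. That scaling model and the names below are assumptions for illustration, not anything AMD has published.

```python
# Assumption: inference cost scales linearly with operand width, so running
# the same network in a wider format costs proportionally more.
BITS = {"FP8": 8, "FP16": 16}


def relative_cost(fallback: str, native: str = "FP8") -> float:
    """Cost of running in `fallback` precision relative to `native`."""
    return BITS[fallback] / BITS[native]


print(f"FP16 fallback vs FP8: {relative_cost('FP16'):.1f}x the cost")
```

Real hardware may deviate from this if memory bandwidth or other stages, rather than matrix throughput, turn out to be the bottleneck.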
 
If they can't get it running on the RDNA3/3.5 APUs, then there isn't much of a business case for that. Those APUs will continue to be manufactured and sold for years to come, unlike the RX 7000 series, which has now ended and has a minuscule market share.
 
All "4" TSMC processes are in fact derived from N5.

This seems like a semantics argument.

TSMC N4 is (at least as claimed by TSMC) an iterative node enhancement with density, efficiency and performance gains.

TSMC 4N, despite the naming, is by all reporting just a customization of TSMC N5.
 
Seems like they've adopted Apple's "Dynamic Caching" idea from the M3. Registers are dynamically allocated at runtime instead of for the worst case. Apple's solution also dynamically allocates threadgroup memory and stack memory.
According to a former Apple graphics engineer, a closer description of their "dynamic caching" technology would be dynamic 'deallocation', which allows them to release unified/flexible on-chip memory at runtime depending on whichever path of execution or branch is taken within a shader. This does not help them increase occupancy, since their hardware is unable to issue more waves opportunistically ...

Based on AMD's slides about their dynamic register allocation technology, their hardware can have variable occupancy mid-shader, but there's no mention or hint of a 'unified/flexible' on-chip memory pool where variable allocation can be done between each type of memory (register/tile/buffer/stack), as with Apple's dynamic caching ...
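To make the occupancy point concrete, here is a rough sketch of why allocating registers for the worst case limits waves in flight. Every number below is hypothetical and chosen only to keep the arithmetic clean. None of them are AMD or Apple figures, and the function is an illustration rather than a model of either vendor's scheduler.

```python
def waves_in_flight(register_file_size: int, regs_per_wave: int) -> int:
    """How many waves fit if each one reserves `regs_per_wave` registers."""
    return register_file_size // regs_per_wave


# All values are hypothetical, for illustration only.
REGISTER_FILE = 1536   # registers available per SIMD
WORST_CASE_REGS = 192  # peak register demand of the shader's heaviest branch
TYPICAL_REGS = 96      # demand during the rest of the shader

# Static allocation reserves the worst case for the wave's whole lifetime.
static_occupancy = waves_in_flight(REGISTER_FILE, WORST_CASE_REGS)

# Dynamic allocation lets waves hold only what they currently need, so this
# is an upper bound reached when every wave is in a low-pressure phase.
dynamic_peak_occupancy = waves_in_flight(REGISTER_FILE, TYPICAL_REGS)

print(static_occupancy, dynamic_peak_occupancy)
```

The gain only materialises if the hardware can actually launch extra waves into the freed space, which is the distinction the post draws between the two approaches.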
 