Recent content by andermans

AMD: RDNA 3 Speculation, Rumours and Discussion

I'm aware it was labeled for premium FF, with stuff like NN acceleration specifically for these use cases. However, AFAICT Rembrandt is pretty much a superset of VanGogh with the same features (edit: besides the changed CPU setup, obviously). So I'm not sure what would make these chips...
- andermans
- Post #428
- Jun 3, 2021
- Forum: Architecture and Products
AMD: RDNA 3 Speculation, Rumours and Discussion

I think price isn't just die size. AMD seems to mostly price their older gens significantly discounted but still supported. Think Lucienne and Barcelo. Just my speculation here but I wouldn't be surprised if VanGogh could be significantly cheaper here.
- andermans
- Post #426
- Jun 3, 2021
- Forum: Architecture and Products
AMD: RDNA 3 Speculation, Rumours and Discussion

I don't think it will eat its breakfast. Rembrandt has no Infinity Cache. Furthermore, purely going from 8->12 CUs will not matter that much at low TDP (maybe 10-20% diff at 15W?); it will probably really matter at higher TDP, where the extra 50% CUs result in a significantly higher...
- andermans
- Post #422
- Jun 3, 2021
- Forum: Architecture and Products
AMD: RDNA 3 Speculation, Rumours and Discussion

I wonder why people seem to couple a DLSS answer to a GPU generation again and again. RDNA2 has the same machine learning instructions as RDNA1 (including 8/4-bit variants) and unless RDNA3 goes full tensor-cores/mAI or dedicated upscaling HW there shouldn't be a significant HW barrier to...
- andermans
- Post #84
- Jan 24, 2021
- Forum: Architecture and Products
RDNA 2 Ray Tracing

A problem is that the nodes have to be 64-byte aligned, so if you add data after the fp16-box node you essentially get a 128-byte structure. Might as well use the fp32-box node then (assuming no performance difference in processing of fp16/fp32 in the RT hardware). Of course you could put them...
- andermans
- Post #50
- Jan 8, 2021
- Forum: Architecture and Products
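
The alignment argument in the post above can be sketched with a small round-up helper. This is only an illustration: the 64-byte fp16-box node size and the 4 bytes of appended data are assumptions chosen for the example, and `align_up` is a hypothetical helper name, not anything from a driver.

```python
NODE_ALIGNMENT = 64        # nodes must be 64-byte aligned (from the post)
FP16_BOX_NODE_BYTES = 64   # assumed size of an fp16-box node


def align_up(size: int, alignment: int = NODE_ALIGNMENT) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


# Hypothetical: append 4 bytes of extra data after an fp16-box node.
extra_bytes = 4
padded = align_up(FP16_BOX_NODE_BYTES + extra_bytes)
print(padded)
```

Under these assumptions any non-zero amount of appended data (up to 64 bytes) pushes the structure to the next 64-byte boundary, which is why the padded node ends up as large as a 128-byte one.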
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

I think that is the input to the BVH building process though? The triangle nodes definitely contain the raw position data after the BVH is built. The factor 2 was only meant as a minimal number of children to get an upper bound, but I'd indeed hope AMD on average gets closer to 4 children...
- andermans
- Post #2,084
- Dec 31, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Why exponential? Pretty much the majority of the BVH is just going to be the raw triangle data, which would be a lower bound anyway (40 bytes for 9 floats + the triangle id). For a packing with N triangles (and assuming each box node has at least 2 children) you need N triangle nodes + N/2 box...
- andermans
- Post #2,078
- Dec 29, 2020
- Forum: Architecture and Products
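
A rough sketch of why the size grows linearly rather than exponentially in the triangle count. It uses the 40-byte raw payload from the post as the lower bound and the general fact that a tree whose interior nodes each have at least k children has at most (N-1)/(k-1) interior nodes for N leaves. The 64-byte triangle node and 128-byte box node sizes are assumptions for illustration, and the function names are hypothetical.

```python
TRIANGLE_PAYLOAD_BYTES = 9 * 4 + 4  # 9 fp32 coordinates + 4-byte triangle id = 40


def max_box_nodes(num_triangles: int, min_children: int = 2) -> int:
    """Upper bound on box nodes if every box node has >= min_children children."""
    if num_triangles <= 1:
        return 0
    return (num_triangles - 1) // (min_children - 1)


def bvh_size_bounds(num_triangles: int,
                    triangle_node_bytes: int = 64,   # assumed node size
                    box_node_bytes: int = 128,       # assumed node size
                    min_children: int = 2) -> tuple[int, int]:
    """Return (lower, upper) byte bounds; both are linear in num_triangles."""
    lower = num_triangles * TRIANGLE_PAYLOAD_BYTES
    upper = (num_triangles * triangle_node_bytes
             + max_box_nodes(num_triangles, min_children) * box_node_bytes)
    return lower, upper


print(bvh_size_bounds(1_000_000))
print(bvh_size_bounds(1_000_000, min_children=4))
```

Raising `min_children` only shrinks the box-node term, so the two-children case serves as the pessimistic bound.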
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Why do you think one triangle per leaf node is mistaken? I can think of some disadvantages but nothing particularly huge as far as I can tell.
- andermans
- Post #2,074
- Dec 28, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

FWIW primitive shaders versus legacy pipelines is a driver decision; they work perfectly fine on Navi 10. With the smaller GPU you were often just not geometry limited enough to get any benefit out of it. However, on Linux the drivers like to use primitive shaders but just didn't enable the...
- andermans
- Post #1,565
- Dec 2, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Primitive shaders have been working fine in RDNA1 as well; it is just not as beneficial to do culling in primitive shaders because the overall expected rasterization performance (and hence the triangle throughput you need to achieve) on the RDNA1 GPUs is lower. Remember culling in primitive...
- andermans
- Post #1,102
- Nov 20, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

I think inline raytracing is just fine when you only have a ray depth of 1, or if you don't branch out and there is good convergence of the rays. Once you do more bounces and branch out, the non-inline path is likely better as it allows rebalancing shader work to achieve better convergence.
- andermans
- Post #617
- Nov 8, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

No, the caching stuff was already available long before this (pre-GCN at least). The SAM change is that the CPU can access 100% of the GPU memory directly by resizing the BAR. As an example that can already be enabled on many X399 (Threadripper) boards, though not under the Smart...
- andermans
- Post #375
- Oct 30, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Pinned memory is actually totally unrelated to Smart Access Memory. Pinned memory is system memory that is accessible from the GPU but not allocated by the driver. That is useful if there is something else allocating the memory that is not easily changed to let the driver allocate the memory...
- andermans
- Post #354
- Oct 30, 2020
- Forum: Architecture and Products
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

The 256 MiB BAR has been exposed for a while already in Vulkan: http://vulkan.gpuinfo.org/displayreport.php?id=9781#memorytypes
- andermans
- Post #349
- Oct 29, 2020
- Forum: Architecture and Products
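
As a sketch of how such a BAR window shows up in a Vulkan memory report: it is a memory type that is both device-local and host-visible. The flag values below are the standard `VkMemoryPropertyFlagBits` values; the heap sizes and memory-type table are made-up illustrative data, not taken from the linked report, and `cpu_visible_vram_types` is a hypothetical helper.

```python
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x1
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x2
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x4

MIB = 1024 * 1024


def cpu_visible_vram_types(memory_types, heap_sizes):
    """Return (type_index, heap_size) for types that are device-local AND host-visible."""
    wanted = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
    return [
        (index, heap_sizes[heap_index])
        for index, (flags, heap_index) in enumerate(memory_types)
        if flags & wanted == wanted
    ]


# Illustrative (made-up) layout: VRAM heap, system-memory heap, 256 MiB BAR heap.
heap_sizes = [7936 * MIB, 8192 * MIB, 256 * MIB]
memory_types = [
    (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, 0),
    (VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, 1),
    (VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
     | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
     | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, 2),
]

for index, size in cpu_visible_vram_types(memory_types, heap_sizes):
    print(f"memory type {index}: {size // MIB} MiB CPU-visible VRAM")
```

With a resized BAR one would expect the heap behind such a type to cover all of VRAM instead of a 256 MiB window, though the exact heap layout is up to the driver.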
AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

So AFAIU DirectStorage promises less overhead and RTX-IO promises decompression on the GPU (resulting in less processing on the CPU), but I've seen neither really talk about P2P DMA explicitly? (though some RTX-IO diagrams certainly make it look like it is skipping the CPU completely). I'm kind...
- andermans
- Post #338
- Oct 29, 2020
- Forum: Architecture and Products