AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Ok, it’s probably true that the shader doesn’t need to inspect the contents of the node in order to schedule it. But that doesn’t seem to be a notable benefit of shader-based scheduling given it’s also the case for Nvidia’s fixed-function approach.
The original comparison was with a hypothetical RT block that only gave intersection results without performing traversal, which would leave the SIMD in a position where determining the next node addresses requires explicit vector memory reads of data the RT unit had already fetched and parsed. AMD's method is at least less redundant than that.
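To make that contrast concrete, here's a minimal sketch (C++, not AMD's actual implementation; Ray, RtUnitResult and rt_unit_intersect() are all made up for illustration) of a traversal loop where the fixed-function unit does the box/triangle tests and hands already-parsed child pointers back to the shader, so the SIMD never has to re-read and decode the node itself:

Code:
#include <cstdint>
#include <vector>

struct Ray { float org[3], dir[3], t_max; };

// What a hybrid unit of the kind being described might hand back to the SIMD:
// intersection outcomes plus already-parsed child pointers.
struct RtUnitResult {
    uint32_t child_nodes[4];
    uint32_t child_count;
    bool     leaf_hit;
};

// Stand-in for the fixed-function intersection engine (hypothetical).
RtUnitResult rt_unit_intersect(uint32_t /*node_addr*/, const Ray& /*ray*/) {
    return RtUnitResult{{0, 0, 0, 0}, 0, false};  // dummy result
}

void traverse(uint32_t root, const Ray& ray) {
    std::vector<uint32_t> stack;  // traversal stack managed by the shader
    stack.push_back(root);
    while (!stack.empty()) {
        uint32_t node = stack.back();
        stack.pop_back();
        RtUnitResult r = rt_unit_intersect(node, ray);  // unit does the tests
        if (r.leaf_hit)
            continue;  // record/shorten the hit here in a real kernel
        // The shader only schedules what the unit already parsed; with an
        // intersection-only block it would first have to load and decode the
        // node itself to recover these child addresses.
        for (uint32_t i = 0; i < r.child_count; ++i)
            stack.push_back(r.child_nodes[i]);
    }
}

With the intersection-only block from the comparison above, the loop body would instead need its own vector loads of the node to recover those child addresses.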

AMD’s patent calls for storing traversal state in registers and the texture cache. It would seem the shader is responsible for managing the traversal stack for each ray and that stack presumably lives in L0. I don’t see how you would avoid thrashing the cache if you try to do anything else alongside RT. Unless of course you have an “infinite” amount of cache :eek:
AMD's patent doesn't clearly outline where the intermediate state between node evaluations resides. It highlights that the SIMD and CU have substantial storage available at no additional cost, versus the likely hardware footprint of giving an independent unit sufficient storage of its own.
AMD's comparison is between its hybrid method and a dedicated unit capable of traversing a BVH to arbitrary depths without redoing traversal after losing the full context of what had already been traversed.
Nvidia's scheme appears to have a traversal stack of finite depth that can lead to redundant node traversal, which makes it less expensive than what AMD was using as its baseline.

Whether AMD's method leverages registers, LDS, or possibly spills to memory isn't spelled out. Even if there were spills to memory, writing the pointers and metadata from completed RT node evaluations out to something like a stack seems like it could be less disruptive than the SIMD re-gathering node data on its own.
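For the storage question specifically, here's a sketch of the kind of split being discussed, with a small fixed-size array standing in for the register/LDS budget and a memory-backed buffer for the spill path. TraversalStack, the slot count and the spill policy are pure assumptions; the point is only that what gets written out is pointers the RT unit already produced, not node contents the SIMD would have to gather again:

Code:
#include <cstdint>
#include <vector>

constexpr int kLocalSlots = 8;  // "register/LDS" budget per ray (assumed)

struct TraversalStack {
    uint32_t local[kLocalSlots];   // fast on-chip portion
    int      local_top = 0;
    std::vector<uint32_t> spill;   // memory-backed overflow

    void push(uint32_t node) {
        if (local_top == kLocalSlots) {
            // Spill the oldest entry: the data written out is just a node
            // pointer, not the node's contents.
            spill.push_back(local[0]);
            for (int i = 1; i < kLocalSlots; ++i) local[i - 1] = local[i];
            --local_top;
        }
        local[local_top++] = node;
    }

    bool pop(uint32_t& node) {
        if (local_top > 0)  { node = local[--local_top]; return true; }
        if (!spill.empty()) { node = spill.back(); spill.pop_back(); return true; }
        return false;  // traversal done
    }
};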


Curious, aren't all ROPs typically tied to caches in past and current generations?
IIRC the difference with RDNA is that compute is now tied in with the L2 cache, whereas with GCN it went directly to the memory controller. But I think ROPs are unchanged.
ROPs were linked to memory channels until Vega, which made them a client of the L2.
RDNA makes them a client of the new intermediate L1 for at least some of their traffic.
GCN had a read-write L2, and compute's use or non-use of the cache depends more on what settings were used for the memory accesses. The choice would be based on the level of coherence needed for the data.

ergo this older post by sebbbi:
https://forum.beyond3d.com/posts/1934106/


with respect to RDNA

It does look like they changed how the RBs access data, however.
Render back ends have had relatively small per-RBE caches throughout the generations. There's evidence that the RBEs still have caches with RDNA, though I haven't seen specific capacities given.
 
You may be right that it’s more balanced. In terms of absolute performance though it’ll be really interesting to see where the chips fall.

And not being power limited, it might also punch a bit above its weight since it would be able to clock very high. If so, the gap between the hypothetical 64 CU salvage part and the top 40 CU part may not be as large as the raw specifications suggest. It would also explain why there is no 60 CU die.
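Back-of-envelope arithmetic shows why higher clocks on the smaller die would shrink the on-paper gap. Both CU counts are the speculated configs, and the clocks here are entirely made up for illustration:

Code:
#include <cstdio>

int main() {
    const int   cus_small = 40, cus_salvage = 64;      // speculated configs
    const float clk_small = 2.4f, clk_salvage = 1.9f;  // GHz, assumed numbers
    // 64 FP32 lanes per RDNA-style CU, 2 ops per FMA per clock
    auto tflops = [](int cus, float ghz) { return cus * 64 * 2 * ghz / 1000.0f; };
    std::printf("40 CU @ %.1f GHz: ~%.1f TFLOPS\n", clk_small,   tflops(cus_small, clk_small));
    std::printf("64 CU @ %.1f GHz: ~%.1f TFLOPS\n", clk_salvage, tflops(cus_salvage, clk_salvage));
}

Under those assumed clocks a 1.6x CU advantage turns into only about a 1.27x throughput advantage.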

Meaningless comparison. How much does 3080 score?
Just above 2080 Ti performance for $500. Wouldn't it be something if the 6800 XT were also $500.

Would definitely shake up the upper mid-range market. The market has been waiting for this kind of competition ever since the GTX 1070.
 
3080 is 10,667 on Fire Strike Ultra per Guru3D.

6800 XT is clearly gunning for that performance tier (at least in terms of rasterization). The cache-based design (if true) is going to cause some interesting game-to-game variability in relative performance. Given that the hardware can get there, it stands to reason that AMD would tune the product to be on average at par with (or even slightly better than) the 3080, again in terms of traditional rasterization. DXR and ML are completely unknown.
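As a rough illustration of why a big on-die cache would make results swing game to game: effective bandwidth ends up as a blend of cache and DRAM bandwidth weighted by hit rate, and the hit rate is workload dependent. All the numbers below are assumptions, not leaked specs:

Code:
#include <cstdio>

int main() {
    const float dram_bw  = 512.0f;   // GB/s from GDDR6 (assumed)
    const float cache_bw = 1600.0f;  // GB/s from the on-die cache (assumed)
    const float hits[]   = {0.4f, 0.6f, 0.8f};  // per-game hit rates (assumed)
    for (float hit : hits) {
        float effective = hit * cache_bw + (1.0f - hit) * dram_bw;
        std::printf("hit rate %.0f%% -> ~%.0f GB/s effective\n", hit * 100.0f, effective);
    }
}

A game that fits the cache well looks far better relative to the 3080 than one that streams mostly from DRAM.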

The big question is how AMD decides to price it. From the customer's perspective 16GB gives it an edge, but they are going up against NVIDIA's brand recognition, DXR, and DLSS. I think $599 would make this an interesting proposition and $499 would make it an excellent value.

I wonder what the equation looks like on the cost/supply/margin side for AMD. They’ve got 16GB and TSMC 7nm working against them here, but perhaps they don’t need as sophisticated a cooler as the 3080 FE. Zen is bringing in the big bucks, so they don’t need a ton of margin on Radeons, but I doubt they want to sell these close to cost, not when they’re competing with Zen for wafers.
 
If the benchmark samples they gave are from a 6800 XT with 72 CUs, then it looks like it'll be close to the 3080 at 4K, but it has a lot of potential to beat it handily at 1080p and 1440p. Real-time ray tracing will be the most interesting part. I'm curious how competitive it'll be at native resolutions.
 
Well, tweets have been removed, but I had a backup :)

It seems like some AIB OC model (beefed-up VRM, taller PCB, etc.).

Photo of reference card for reference (weird sentence:confused:)


... and some recent tweet
 
Why is there Kapton tape where the GDDR6 memory modules should sit?
I mean, why would you populate a complex PCB if you cannot use it afterwards? (I assume one would reflow the GDDR6 onto the circuit board first, or is that a false assumption?)
 
One has to wonder if it's a 12+4 setup with a 384-bit bus, though that would likely not go over well. The 970's slower memory pool got a ton of shit when things slowed down.
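For what a 970-style split would mean here, a quick bit of arithmetic under assumed numbers (16 Gbps GDDR6, and the extra 4 GB hanging off 4 of the 12 channels; none of this is confirmed):

Code:
#include <cstdio>

int main() {
    const int   bus_bits   = 384;
    const float gbps_pin   = 16.0f;                       // assumed data rate
    const float total_bw   = bus_bits * gbps_pin / 8.0f;  // 768 GB/s aggregate
    const float slow_share = 4.0f / 12.0f;                // 4 of 12 channels (assumed split)
    std::printf("12 GB pool: up to ~%.0f GB/s\n", total_bw);
    std::printf(" 4 GB pool: up to ~%.0f GB/s\n", total_bw * slow_share);
}

A last pool peaking at a third of the aggregate bandwidth is exactly the kind of cliff the 970 got grief for.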
 