AMD: Navi Speculation, Rumours and Discussion [2019-2020]

RDNA2 doesn't have tensor cores, but the shaders are improved to support 4- and 8-bit integers; Series X quotes 97 TOPS (INT4) - useful for DirectML (DLSS-like). (9:27 in the video.)

A CDNA slide had that - "Accelerate ML/HPC with Compute/Tensor OPS" - but I don't think it's been confirmed for RDNA2? Slide at the top of the page. Of course, that means using the shaders themselves to upscale, taking them away from rendering, instead of having dedicated hardware do it and leaving all the shaders free for other work.

https://www.anandtech.com/show/1559...a-dedicated-gpu-architecture-for-data-centers

The GPU also supports mesh shading! (13:04 in the video.) VRS, mesh shaders, RTRT, DirectML - exciting stuff next gen.

 
RDNA2 doesn't have tensor cores, but the shaders are improved to support 4- and 8-bit integers; Series X quotes 97 TOPS (INT4) - useful for DirectML (DLSS-like). (9:27 in the video.)
This has been supported by Pascal since 2016, btw. I dunno, I just don't feel this feature will get a lot of use in gaming, at least in graphics.

The GPU also supports mesh shading
That's another great new Turing feature getting traction. The only one left is sampler feedback; hopefully it will be supported too.
 
RDNA2 doesn't have tensor cores, but the shaders are improved to support 4- and 8-bit integers; Series X quotes 97 TOPS (INT4) - useful for DirectML (DLSS-like). (9:27 in the video.)

A CDNA slide had that - "Accelerate ML/HPC with Compute/Tensor OPS" - but I don't think it's been confirmed for RDNA2? Slide at the top of the page. Of course, that means using the shaders themselves to upscale, taking them away from rendering, instead of having dedicated hardware do it and leaving all the shaders free for other work.
Don't know about tensors, but INT4/8 should be supported on specific model(s) of RDNA2, just as there's an RDNA1 variant w/ DL ops (unreleased at this time).
 
How? It is multiple times higher than previous consoles.
The PS4 had a 250W PSU, the PS4 Pro a 310W one. The One X's was 245W.

The One X's max draw was around 175W, the PS4 Pro's around 155W - 71% and 50% of rated capacity respectively - so my 80% estimate is probably way off. At a similar 71% PSU load, that's 217W, which again seems excellent given the performance delivered here.
 
The PS4 had a 250W PSU, the PS4 Pro a 310W one. The One X's was 245W.

The One X's max draw was around 175W, the PS4 Pro's around 155W - 71% and 50% of rated capacity respectively - so my 80% estimate is probably way off. At a similar 71% PSU load, that's 217W, which again seems excellent given the performance delivered here.
Some XBX models top 200W (Hovis-method lottery); the latest model of the Pro draws up to 170W.
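(Sanity-checking the quoted arithmetic: 175/245 ≈ 71% and 155/310 = 50%, as stated; with the updated numbers above, 200/245 ≈ 82% and 170/310 ≈ 55%. The 217W figure works back to an assumed ~305W supply at the same ~71% load, since 0.71 × 305 ≈ 217 - that rating isn't stated in the post, so treat the back-calculation as my reconstruction.)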
 
Really disappointed that MS didn't go the whole hog - 420mm² and 69 CUs would have really spiced things up. :D

Desktop frequencies would be amazing if this thing runs at 1825MHz locked in that power envelope.
 
Ok, weird-assed super custom design, hats off to you Kaotik. The asymmetrical memory design and the weird CU split and shit are straight-up bizarre. But MS really was quite angry over that initial 1080p PS4 / 900p Xbox One thing; I'm pretty sure they've never gotten over it. So shelling out a couple hundred million extra to try and beat that this time does make some sense. Still, we can glean some info on RDNA2.

Lower clocks for compute, by several percent, make sense. This is a baseline, single-bin chip, so there's no good-versus-bad binning. But combined with the bandwidth increase of about 20%, that puts it right in line with the 5700 XT's bandwidth-to-compute ratio. So I think we can assume RDNA2 doesn't offer any revolutions in the way of bandwidth compression.
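(As a rough check on that ratio, using the commonly quoted figures: the 5700 XT pairs 448GB/s with ~9.75 TFLOPS, about 46GB/s per TFLOP, while Series X pairs 560GB/s on its fast 10GB pool with ~12.15 TFLOPS, also about 46GB/s per TFLOP. Both bandwidth and compute grew by roughly 25% on those numbers, so the ratio really is essentially unchanged.)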

We're also looking at INT8 and INT4 instructions for RDNA2. Hardly surprising, with ML Vega having them. But that does mean Nvidia's tensor cores should get a workout in some titles beyond DLSS in due time.
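For context on what those packed-integer rates mean, here's a minimal C++ sketch of a DP4A-style operation - the 4-wide INT8 dot-product-with-accumulate that Pascal introduced and that figures like "97 TOPS" count. (Assuming the usual 4×/8× packed rates over FP32, Series X's ~12.15 TFLOPS works out to ~48.6 INT8 and ~97.2 INT4 TOPS, which lines up with the quoted number.) The function below is just an illustrative scalar model, not any actual ISA or API:

```cpp
#include <cstdint>
#include <cstdio>

// Scalar model of a packed INT8 dot-product-accumulate (DP4A-style):
// four signed 8-bit lanes packed into each 32-bit word are multiplied
// pairwise, summed, and added to a 32-bit accumulator.
int32_t dot4_i8_accum(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t av = static_cast<int8_t>((a >> (8 * lane)) & 0xFF);
        int8_t bv = static_cast<int8_t>((b >> (8 * lane)) & 0xFF);
        acc += static_cast<int32_t>(av) * static_cast<int32_t>(bv);
    }
    return acc; // one such op counts as 8 integer ops (4 muls + 4 adds)
}

int main() {
    uint32_t a = 0x04030201; // lanes {1, 2, 3, 4}
    uint32_t b = 0x281E140A; // lanes {10, 20, 30, 40}
    printf("%d\n", dot4_i8_accum(a, b, 0)); // 1*10 + 2*20 + 3*30 + 4*40 = 300
    return 0;
}
```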

And of course mesh shaders! Whee. Let's hope the PS5 is full RDNA2 and not some custom RDNA1 thingy that doesn't include them - without them as a baseline they're not nearly as useful. But easy fine-grained culling, without the hair-pulling you have to go through now, and far less CPU sync, is a straight-up win.

And in-GPU thread spawning is also coooool. Again, moving work off the CPU - and off the programmer as well.

The raytracing section offers... nothing. That's all I got out of it for actual RDNA2 arch info.

Edit - Wait, I'm dumb! The 50% increase in performance per watt seems very likely. That's a lot of performance for low power (assuming here), and we can assume the low GPU clocks and the quite low CPU clocks help with that, but it still looks like a good assumption. We can also see little improvement in performance per mm² in terms of good ol' "flops": the CUs-times-clockspeed to flops relationship is right in line with RDNA1.
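(For concreteness, +50% performance per watt - if it holds - means the same performance at roughly two thirds of the power, or about 1.5× the performance in the same power envelope; e.g. a hypothetical 225W RDNA1-class board's output delivered in ~150W. That's just arithmetic on the marketing figure, not a measurement.)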
 
Ok, weird-assed super custom design, hats off to you Kaotik. The asymmetrical memory design and the weird CU split and shit are straight-up bizarre. But MS really was quite angry over that initial 1080p PS4 / 900p Xbox One thing; I'm pretty sure they've never gotten over it. So shelling out a couple hundred million extra to try and beat that this time does make some sense. Still, we can glean some info on RDNA2.
The asymmetric memory design is unusual for products, but something the individual controllers and their memory addressing functions can handle readily. What do you mean by weird CU split?
Does this look like the XBox Series X has two SEs with two shader arrays each? This might have a similar inactivation pattern to the RX 5700, which inactivates one WGP per SE, leading to a slightly asymmetric pair of shader arrays per SE.

We're also looking at 8int and 4int instructions for RDNA2. Hardly surprising with ML Vega having them. But that does mean Nvidia's tensor cores should get a workout in some titles beyond DLSS in due time.
There are GFX10 ISA revisions that bring them in too, so this seems to be making its way into recent Navi implementations as well.

And of course Mesh Shaders!
I thought I saw somewhere that there was some kind of emulation with compute and dispatch for task shaders, but now I can't seem to find the reference.
 
I thought I saw somewhere that there was some kind of emulation with compute and dispatch for task shaders, but now I can't seem to find the reference.
It was mentioned here:
The new consoles don't natively support 'task' shaders. Task shaders can be emulated with indirect dispatch and a compute shader. A possible reason for omission in hardware support is that console hardware designers aren't convinced if there's any significant gains to be had in performance with a native implementation.
https://forum.beyond3d.com/posts/2109742/
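For what that emulation path looks like, here's a rough CPU-side C++ model of the pattern described in the quoted post: a "task"-style compute pass culls meshlets, appends the survivors, and writes an indirect dispatch argument, and the "mesh" pass is then launched with exactly that many groups. On an actual GPU the second launch would consume the argument buffer via an indirect dispatch (e.g. vkCmdDispatchIndirect or D3D12's ExecuteIndirect); all the struct and function names below are made up for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical meshlet record: a bounding sphere for coarse culling.
struct Meshlet { float cx, cy, cz, radius; };

// Mirrors a GPU indirect-dispatch argument buffer (x, y, z group counts).
struct DispatchArgs { uint32_t x = 0, y = 1, z = 1; };

// Pass 1 - the "task shader" emulated as a compute pass: test each
// meshlet against a plane (stand-in for frustum culling), append
// survivors to a buffer, and bump the indirect group count.
void taskPass(const std::vector<Meshlet>& meshlets,
              std::vector<uint32_t>& survivors, DispatchArgs& args) {
    for (uint32_t i = 0; i < meshlets.size(); ++i) {
        bool visible = meshlets[i].cz + meshlets[i].radius > 0.0f; // near plane at z = 0
        if (visible) {
            survivors.push_back(i);  // on GPU: atomic append to a buffer
            args.x++;                // on GPU: atomic add to the arg buffer
        }
    }
}

// Pass 2 - the "mesh shader" pass, launched indirectly with args.x
// groups; each group fetches its surviving meshlet and emits geometry.
void meshPass(const DispatchArgs& args, const std::vector<uint32_t>& survivors) {
    for (uint32_t g = 0; g < args.x; ++g)
        printf("group %u processes meshlet %u\n", g, survivors[g]);
}

int main() {
    std::vector<Meshlet> meshlets = {
        {0, 0, 5, 1}, {0, 0, -3, 1}, {0, 0, 1, 2}  // the middle one gets culled
    };
    std::vector<uint32_t> survivors;
    DispatchArgs args;
    taskPass(meshlets, survivors, args);
    meshPass(args, survivors);  // stands in for DispatchIndirect(argBuffer)
    return 0;
}
```

The point of the pattern is that the group count never round-trips to the CPU - the GPU feeds its own second launch, which is what task shaders give you natively.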
 
[Image: TASCrGo.jpg]

The CCX dimensions suggest a hefty cut down of the L3 cache size, compared to the chiplet version of Zen2.
 
[Image: TASCrGo.jpg]

The CCX dimensions suggest a hefty cut down of the L3 cache size, compared to the chiplet version of Zen2.
But not necessarily compared to the APU version of Zen 2 - those only have 4MB per CCX, and performance seems to be fine, at least in AMD's own tests.
 
I wouldn't draw any conclusions based on the (photoshopped?) render. The colored part doesn't even fit the silicon surface - there are margins (left and right in this orientation). The colored parts seem to be made up, so even their dimensions could be off.
 