AMD: Navi Speculation, Rumours and Discussion [2019-2020]

RDNA2 doesn't have tensor cores, but the shaders are improved to support 4- and 8-bit integers; Series X quotes 97 TOPS (INT4) - useful for DirectML (DLSS-like). (9:27 in the video.)

A CDNA slide had that - "Accelerate ML/HPC with Compute/Tensor OPS" - but I don't think it's been confirmed for RDNA2? Slide at the top of the page. Of course, that means using the shaders themselves to upscale, taking them away from rendering, instead of having dedicated hardware do it and leaving all the shaders free for other work.

https://www.anandtech.com/show/1559...a-dedicated-gpu-architecture-for-data-centers

The GPU also supports mesh shading! (13:04 in the video.) VRS, mesh shaders, RTRT, DirectML - exciting stuff next gen.

 
RDNA2 doesn't have tensor cores, but the shaders are improved to support 4- and 8-bit integers; Series X quotes 97 TOPS (INT4) - useful for DirectML (DLSS-like). (9:27 in the video.)
This has been supported by Pascal since 2016, btw. I dunno, I just don't feel this feature will get a lot of use in gaming, at least in graphics.

The GPU also supports mesh shading
That's another great new Turing feature getting traction. The only one left is sampler feedback; hopefully it will be supported too.
 
RDNA2 doesn't have tensor cores, but the shaders are improved to support 4- and 8-bit integers; Series X quotes 97 TOPS (INT4) - useful for DirectML (DLSS-like). (9:27 in the video.)

A CDNA slide had that - "Accelerate ML/HPC with Compute/Tensor OPS" - but I don't think it's been confirmed for RDNA2? Slide at the top of the page. Of course, that means using the shaders themselves to upscale, taking them away from rendering, instead of having dedicated hardware do it and leaving all the shaders free for other work.
Don't know about tensors, but INT4/8 should be supported on specific model(s) of RDNA2, just as there's an RDNA1 variant w/ DL ops (unreleased at this time).
 
How? It is multiple times higher than previous consoles.
The PS4 had a 250W PSU, the PS4 Pro a 310W one. The One X's was 245W.

The One X's max draw was around 175W, the PS4 Pro's around 155W - 71% and 50% of rated capacity respectively - so my 80% estimate is probably way off. At a similar 71% PSU load, that's 217W, which again seems excellent given the performance delivered here.
 
The PS4 had a 250W PSU, the PS4 Pro a 310W one. The One X's was 245W.

The One X's max draw was around 175W, the PS4 Pro's around 155W - 71% and 50% of rated capacity respectively - so my 80% estimate is probably way off. At a similar 71% PSU load, that's 217W, which again seems excellent given the performance delivered here.
Some XBX models top 200W (Hovis-method lottery); the latest model of the Pro draws up to 170W.
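(Sanity-checking the quoted arithmetic: 175/245 ≈ 71% and 155/310 = 50%, as stated; with the updated numbers above, 200/245 ≈ 82% and 170/310 ≈ 55%. The 217W figure works back to an assumed ~305W supply at the same ~71% load, since 0.71 × 305 ≈ 217 - that rating isn't stated in the post, so treat the back-calculation as my reconstruction.)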
 
Really disappointed that MS didn't go the whole hog - 420mm² and 69 CUs would have really spiced things up. :D

Desktop frequencies would be amazing if this thing runs at 1825MHz locked in that power envelope.
 
Ok, weird-assed super custom design, hats off to you Kaotik. The asymmetrical memory design and the weird CU split and shit are straight-up bizarre. But MS really was quite angry over that initial 1080p PS4 / 900p Xbox One thing; I'm pretty sure they've never gotten over it. So shelling out a couple hundred million extra to try and beat that this time does make some sense. Still, we can glean some info on RDNA2.

Lower clocks for compute, by several percent, make sense. This is a baseline, single-bin chip, so there's no good-versus-bad binning. But combined with the bandwidth increase of about 20%, that puts it right in line with the 5700 XT's bandwidth-to-compute ratio. So I think we can assume RDNA2 doesn't offer any revolutions in the way of bandwidth compression.
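(As a rough check on that ratio, using the commonly quoted figures: the 5700 XT pairs 448GB/s with ~9.75 TFLOPS, about 46GB/s per TFLOP, while Series X pairs 560GB/s on its fast 10GB pool with ~12.15 TFLOPS, also about 46GB/s per TFLOP. Both bandwidth and compute grew by roughly 25% on those numbers, so the ratio really is essentially unchanged.)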

We're also looking at INT8 and INT4 instructions for RDNA2. Hardly surprising, with ML Vega having them. But that does mean Nvidia's tensor cores should get a workout in some titles beyond DLSS in due time.
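For context on what those packed-integer rates mean, here's a minimal C++ sketch of a DP4A-style operation - the 4-wide INT8 dot-product-with-accumulate that Pascal introduced and that figures like "97 TOPS" count. (Assuming the usual 4×/8× packed rates over FP32, Series X's ~12.15 TFLOPS works out to ~48.6 INT8 and ~97.2 INT4 TOPS, which lines up with the quoted number.) The function below is just an illustrative scalar model, not any actual ISA or API:

```cpp
#include <cstdint>
#include <cstdio>

// Scalar model of a packed INT8 dot-product-accumulate (DP4A-style):
// four signed 8-bit lanes packed into each 32-bit word are multiplied
// pairwise, summed, and added to a 32-bit accumulator.
int32_t dot4_i8_accum(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t av = static_cast<int8_t>((a >> (8 * lane)) & 0xFF);
        int8_t bv = static_cast<int8_t>((b >> (8 * lane)) & 0xFF);
        acc += static_cast<int32_t>(av) * static_cast<int32_t>(bv);
    }
    return acc; // one such op counts as 8 integer ops (4 muls + 4 adds)
}

int main() {
    uint32_t a = 0x04030201; // lanes {1, 2, 3, 4}
    uint32_t b = 0x281E140A; // lanes {10, 20, 30, 40}
    printf("%d\n", dot4_i8_accum(a, b, 0)); // 1*10 + 2*20 + 3*30 + 4*40 = 300
    return 0;
}
```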

And of course mesh shaders! Whee. Let's hope the PS5 is full RDNA2 and not some custom RDNA1 thingy that doesn't include them - without them as a baseline they're not nearly as useful. But easy fine-grained culling, without the hair-pulling you have to go through now, and far less CPU sync, is a straight-up win.

And in-GPU thread spawning is also coooool. Again, moving work off the CPU - and off the programmer as well.

The raytracing section offers... nothing. That's all I got out of it for actual RDNA2 arch info.

Edit - Wait, I'm dumb! The 50% increase in performance per watt seems very likely. That's a lot of performance for low power (assuming here), and we can assume the low GPU clocks and the quite low CPU clocks help with that, but it still looks like a good assumption. We can also see little improvement in performance per mm² in terms of good ol' "flops": the CUs-times-clockspeed to flops relationship is right in line with RDNA1.
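(For concreteness, +50% performance per watt - if it holds - means the same performance at roughly two thirds of the power, or about 1.5× the performance in the same power envelope; e.g. a hypothetical 225W RDNA1-class board's output delivered in ~150W. That's just arithmetic on the marketing figure, not a measurement.)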
 
Ok, weird-assed super custom design, hats off to you Kaotik. The asymmetrical memory design and the weird CU split and shit are straight-up bizarre. But MS really was quite angry over that initial 1080p PS4 / 900p Xbox One thing; I'm pretty sure they've never gotten over it. So shelling out a couple hundred million extra to try and beat that this time does make some sense. Still, we can glean some info on RDNA2.
The asymmetric memory design is unusual for products, but something the individual controllers and their memory addressing functions can handle readily. What do you mean by weird CU split?
Does this look like the XBox Series X has two SEs with two shader arrays each? This might have a similar inactivation pattern to the RX 5700, which inactivates one WGP per SE, leading to a slightly asymmetric pair of shader arrays per SE.

We're also looking at 8int and 4int instructions for RDNA2. Hardly surprising with ML Vega having them. But that does mean Nvidia's tensor cores should get a workout in some titles beyond DLSS in due time.
There are GFX10 ISA revisions that bring them in too, so this seems to be making its way into recent Navi implementations as well.

And of course Mesh Shaders!
I thought I saw somewhere that there was some kind of emulation with compute and dispatch for task shaders, but now I can't seem to find the reference.
 
I thought I saw somewhere that there was some kind of emulation with compute and dispatch for task shaders, but now I can't seem to find the reference.
It was mentioned here:
The new consoles don't natively support 'task' shaders. Task shaders can be emulated with indirect dispatch and a compute shader. A possible reason for omission in hardware support is that console hardware designers aren't convinced if there's any significant gains to be had in performance with a native implementation.
https://forum.beyond3d.com/posts/2109742/
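For what that emulation path looks like, here's a rough CPU-side C++ model of the pattern described in the quoted post: a "task"-style compute pass culls meshlets, appends the survivors, and writes an indirect dispatch argument, and the "mesh" pass is then launched with exactly that many groups. On an actual GPU the second launch would consume the argument buffer via an indirect dispatch (e.g. vkCmdDispatchIndirect or D3D12's ExecuteIndirect); all the struct and function names below are made up for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical meshlet record: a bounding sphere for coarse culling.
struct Meshlet { float cx, cy, cz, radius; };

// Mirrors a GPU indirect-dispatch argument buffer (x, y, z group counts).
struct DispatchArgs { uint32_t x = 0, y = 1, z = 1; };

// Pass 1 - the "task shader" emulated as a compute pass: test each
// meshlet against a plane (stand-in for frustum culling), append
// survivors to a buffer, and bump the indirect group count.
void taskPass(const std::vector<Meshlet>& meshlets,
              std::vector<uint32_t>& survivors, DispatchArgs& args) {
    for (uint32_t i = 0; i < meshlets.size(); ++i) {
        bool visible = meshlets[i].cz + meshlets[i].radius > 0.0f; // near plane at z = 0
        if (visible) {
            survivors.push_back(i);  // on GPU: atomic append to a buffer
            args.x++;                // on GPU: atomic add to the arg buffer
        }
    }
}

// Pass 2 - the "mesh shader" pass, launched indirectly with args.x
// groups; each group fetches its surviving meshlet and emits geometry.
void meshPass(const DispatchArgs& args, const std::vector<uint32_t>& survivors) {
    for (uint32_t g = 0; g < args.x; ++g)
        printf("group %u processes meshlet %u\n", g, survivors[g]);
}

int main() {
    std::vector<Meshlet> meshlets = {
        {0, 0, 5, 1}, {0, 0, -3, 1}, {0, 0, 1, 2}  // the middle one gets culled
    };
    std::vector<uint32_t> survivors;
    DispatchArgs args;
    taskPass(meshlets, survivors, args);
    meshPass(args, survivors);  // stands in for DispatchIndirect(argBuffer)
    return 0;
}
```

The point of the pattern is that the group count never round-trips to the CPU - the GPU feeds its own second launch, which is what task shaders give you natively.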
 
[Image: TASCrGo.jpg]

The CCX dimensions suggest a hefty cut down of the L3 cache size, compared to the chiplet version of Zen2.
 
[Image: TASCrGo.jpg]

The CCX dimensions suggest a hefty cut down of the L3 cache size, compared to the chiplet version of Zen2.
But not necessarily compared to the APU version of Zen 2 - those only have 4MB per CCX, and performance seems to be fine, at least in AMD's own tests.
 
I wouldn't draw any conclusions based on the (photoshopped?) render. The colored part doesn't even fit the silicon surface - there are margins (left and right in this orientation). The colored parts seem to be made up, so even their dimensions could be off.
 