AMD: Navi Speculation, Rumours and Discussion [2019-2020]

DDH · Oct 20, 2020

BRiT said:
I think it's SSGI (Screen Space Global Illumination) and not real-time ray tracing...

He says something like "software based real time ray traced global illumination" around the 10:10 mark

Scott_Arm · Oct 20, 2020

DDH said:
We only have actual consumption figures for the Xbox series X. Let's also not forget the ps5 has a 256bit bus, 16gb gddr6. So doubling the CUs alone isn't going to linearly increase the power consumption

I'm not doubling the power consumption of the memory. You're right about memory channels, but I'm also not claiming a linear doubling is exact; it's just an estimate. We don't even know how much power the GPU in the PS5 can draw. My guess is it's over 100W for just the GPU portion of the APU, not including external memory chips, the PS5 PCB, or any other components.

Scott_Arm · Oct 20, 2020

NightAntilli said:
XSX power consumption is already known... Despite it having a 310W PSU, its power usage is about 210W while running Gears 5. That is for everything, meaning RAM, CPU/GPU, SSD and so on. Other games use a lot less power.

If you shove off 50W for all the additional components (as an educated guess), you're left with ~160W for a 52CU RDNA2 GPU at 1.8GHz.

So add in 20 CUs (38% increase) and 300 MHz (17% increase), then add that 40-50W for the PCB and you have a 72CU 2.1GHz RDNA2 gpu (mind you with a bigger memory bus). Just naively you're going to be well over 260W-270W because of the clock increase.
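A rough sketch of that back-of-the-envelope scaling, assuming GPU power scales linearly with both CU count and clock. That assumption is almost certainly too kind, since higher clocks usually need more voltage, so treat the result as a floor rather than a prediction. The function name is made up for illustration; the input figures are the ones quoted above.

```python
def naive_gpu_power(base_watts, base_cus, base_ghz, target_cus, target_ghz):
    """Scale GPU power linearly with CU count and clock (a deliberate simplification)."""
    return base_watts * (target_cus / base_cus) * (target_ghz / base_ghz)

# Figures from the posts above: ~160 W for a 52 CU GPU at 1.8 GHz.
gpu_watts = naive_gpu_power(160.0, 52, 1.8, 72, 2.1)

# 40-50 W for the rest of the board is the same educated guess used above.
board_low = gpu_watts + 40.0
board_high = gpu_watts + 50.0

print(f"GPU only: {gpu_watts:.0f} W")
print(f"Whole card: {board_low:.0f}-{board_high:.0f} W")
```

With these inputs the GPU alone comes out at roughly 258W, and about 298-308W once the board guess is added, before accounting for any extra voltage needed for the clock increase.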

Rootax · Oct 20, 2020

DegustatoR said:
It's possible that they haven't expected the doubling of FP32 units in Ampere.

Even then, Nvidia wouldn't have released a new generation slower than (or only as fast as) the old one (even if it was the top of the line)

trinibwoy · Oct 20, 2020

DegustatoR said:
It's possible that they haven't expected the doubling of FP32 units in Ampere.

Even if that’s true they certainly would’ve aimed much higher than TU102’s 4608 units for any chance at the crown.

DegustatoR · Oct 20, 2020

Rootax said:
Even then, Nvidia wouldn't have released a new generation slower than (or only as fast as) the old one (even if it was the top of the line)

It's not that they would, it's that you'd get a lot less of a performance increase by making just a wider Turing, even with higher clocks. So it is possible that AMD expected that instead of what Ampere turned out to be. Still I wouldn't count on this. They both tend to know quite a lot about each other well in advance of each generation release.

Rootax · Oct 20, 2020

DegustatoR said:
It's not that they would, it's that you'd get a lot less of a performance increase by making just a wider Turing, even with higher clocks. So it is possible that AMD expected that instead of what Ampere turned out to be. Still I wouldn't count on this. They both tend to know quite a lot about each other well in advance of each generation release.

Yeah but don't forget the node jump too, even if it's not 7nm tsmc. I don't know, seems like a real dumb thing if true...

Erinyes · Oct 20, 2020

ToTTenTranz said:
What is the original performance/power point?

I just don't see why the clocks that are coming in the cards aren't the ones originally intended.
We do have a PS5 clocked at 2.23GHz on what seems to be a ~150-200W power budget for the APU.

BRiT said:
It's a 350 watt PSU for PS5 and a 340 watt PSU for PS5 All Digital. I think it has more than 200 watts of power budget for the APU.

But the XB1X consumed ~170W at the wall in Gears, and XSX consumes ~210W. Unless you're suggesting the CPUs consume next to zero power, the APU budget shouldn't even be close to 200W.

A1xLLcqAgt0qc2RyMz0y said:
They were originally targeting the RTX 2080 Ti. With the release of the RTX 3080, and the performance and higher power that it has, AMD probably needed to do the same to compete. Thus the higher clocks and power.

And I'm sure Lisa Su told you that personally?

NightAntilli said:
XSX power consumption is already known... Despite it having a 310W PSU, its power usage is about 210W while running Gears 5. That is for everything, meaning RAM, CPU/GPU, SSD and so on. Other games use a lot less power.

If you shove off 50W for all the additional components (as an educated guess), you're left with ~160W for a 52CU RDNA2 GPU at 1.8GHz.

You mean you're actually taking data from consoles using an almost identical architecture, on an almost identical process, likely on a slightly worse silicon bin, and suggesting we extrapolate that to PC GPUs?? You can't do that! Because....reasons.

BRiT said:
That's still without portions of the GPU being utilized, like the RTRT hardware. It's entirely unknown how that may have an impact.

Well since the RDNA2 HW seemingly cannot do RT concurrently, I'd say it shouldn't have much of an impact at all, if any.

chris1515 · Oct 20, 2020

Erinyes said:
But the XB1X consumed ~170W at the wall in Gears, and XSX consumes ~210W. Unless you're suggesting the CPUs consume next to zero power, the APU budget shouldn't even be close to 200W.

And I'm sure Lisa Su told you that personally?

You mean you're actually extrapolating data from an almost identical architecture, on an almost identical process, and likely on a slightly worse silicon bin, and suggesting we compare that to PC GPUs?? You can't do that! Because....reasons.

Well since the RDNA2 HW seemingly cannot do RT concurrently, I'd say it shouldn't have much of an impact at all, if any.

RDNA2 GPUs can do RT concurrently. They can do ray intersection and shading in parallel.

Leoneazzurro5 · Oct 20, 2020

RDNA2 cannot do texturing concurrently AFAIK. It depends how much power RT units are taking. Which is not known.

SimBy · Oct 20, 2020

A1xLLcqAgt0qc2RyMz0y said:
They were originally targeting the RTX 2080 Ti. With the release of the RTX 3080, and the performance and higher power that it has, AMD probably needed to do the same to compete. Thus the higher clocks and power.

If you paid attention to the hints coming out of AMD (Wang) and Cerny (PS5) you would know that they designed RDNA2 as a 'multi gigahertz clock frequency' architecture. Increasing the clock speeds was literally one of the design goals. This whole narrative that Sony and now AMD pushed clock speeds up at the last second is honestly insulting.

DegustatoR · Oct 20, 2020

2.0 and 2.4 are both "multi gigahertz clock frequency" but the resulting power consumption of a chip on these can be drastically different.
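As a rough illustration of why, here is a sketch using the textbook dynamic power relation P ~ C * V^2 * f. How voltage tracks frequency is an assumption here (real voltage/frequency curves are chip-specific and not known for these parts), so the `voltage_exponent` parameter and the function name are hypothetical.

```python
def relative_dynamic_power(f_old_ghz, f_new_ghz, voltage_exponent=1.0):
    """Ratio of dynamic power (P ~ C * V^2 * f) between two clock speeds.

    Assumes voltage scales as frequency ** voltage_exponent, which is a
    hypothetical stand-in for a real, chip-specific V/f curve.
    """
    f_ratio = f_new_ghz / f_old_ghz
    v_ratio = f_ratio ** voltage_exponent
    return v_ratio ** 2 * f_ratio

# Best case: the higher clock is reachable without raising voltage.
same_voltage = relative_dynamic_power(2.0, 2.4, voltage_exponent=0.0)

# Pessimistic simplification: voltage has to rise in proportion to clock.
scaled_voltage = relative_dynamic_power(2.0, 2.4, voltage_exponent=1.0)

print(f"{same_voltage:.2f}x at fixed voltage, {scaled_voltage:.2f}x if voltage tracks clock")
```

Under those two assumptions the same 20% clock bump costs anywhere from 1.2x to about 1.73x the dynamic power, and the real figure depends on where the chip sits on its voltage curve.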

CarstenS · Oct 20, 2020

Jawed said:
In the videos I linked, Furmark framerates varied massively. Any ideas why?

I don't know really. Maybe they really just hit their power limits or some current limit at those points in time? I am no Furmark expert by any means.

3dilettante · Oct 20, 2020

Leoneazzurro5 said:
RDNA2 cannot do texturing concurrently AFAIK. It depends how much power RT units are taking. Which is not known.

As a texture-type instruction, we know that a BVH instruction cannot issue in parallel with a texture/vmem operation, but is that the same as them not working concurrently? Outside of those initial few cycles, it could be hundreds of cycles before the buses used by the BVH instruction are needed by it. Are we sure a wavefront cannot issue a memory operation or maybe another BVH? Texturing and memory ops can be issued freely until a waitcnt instruction is encountered and not enough have resolved.
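A toy model of that issue-then-wait pattern, purely to illustrate the point and not a description of the real hardware. The class, the one-cycle issue cost, the in-order return and the latencies are all made-up assumptions.

```python
from collections import deque

class ToyWavefront:
    """Toy model of one wavefront's outstanding vector-memory counter."""

    def __init__(self):
        self.cycle = 0
        self.in_flight = deque()  # completion cycles of outstanding ops, in issue order

    def issue(self, latency):
        """Issue a texture/memory/BVH-style op: costs one cycle and does not block."""
        self.cycle += 1
        done = self.cycle + latency
        # Assumption: results return in issue order, so an op cannot
        # complete before the op that was issued ahead of it.
        if self.in_flight:
            done = max(done, self.in_flight[-1])
        self.in_flight.append(done)

    def _retire(self):
        """Drop every op whose completion cycle has already passed."""
        while self.in_flight and self.in_flight[0] <= self.cycle:
            self.in_flight.popleft()

    def waitcnt(self, max_outstanding):
        """Stall until at most max_outstanding ops are still in flight."""
        self._retire()
        while len(self.in_flight) > max_outstanding:
            self.cycle = self.in_flight[0]  # jump ahead to the next completion
            self._retire()

wave = ToyWavefront()
for latency in (300, 100, 100):  # hypothetical latencies, e.g. one BVH op then two loads
    wave.issue(latency)
issued_by = wave.cycle      # all three ops were issued back-to-back
wave.waitcnt(0)
stalled_until = wave.cycle  # the wavefront only stalls here, at the wait
print(issued_by, stalled_until)
```

In this model all three ops are in flight by cycle 3 and the wavefront only stalls at the wait, until cycle 301. Not being able to issue in the same cycle says little about whether the operations overlap in time.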

SimBy · Oct 20, 2020

DegustatoR said:
2.0 and 2.4 are both "multi gigahertz clock frequency" but the resulting power consumption of a chip on these can be drastically different.

That's not what I was arguing with. I said they didn't bump the clocks last second to compete with the 'mighty' Ampere. Clocks are this high by design and have absolutely nothing to do with Ampere. Same goes for PS5 and XSX comparisons.

Leoneazzurro5 · Oct 20, 2020

3dilettante said:
As a texture-type instruction, we know that a BVH instruction cannot issue in parallel with a texture/vmem operation, but is that the same as them not working concurrently? Outside of those initial few cycles, it could be hundreds of cycles before the buses used by the BVH instruction are needed by it. Are we sure a wavefront cannot issue a memory operation or maybe another BVH? Texturing and memory ops can be issued freely until a waitcnt instruction is encountered and not enough have resolved.

Possibly that is the case. But in any case, no one at the moment knows how much power these RT units draw or what their average utilization is, so declaring that they will dramatically increase power usage is as wrong as saying they will not affect power usage at all.

Erinyes · Oct 20, 2020

DegustatoR said:
2.0 and 2.4 are both "multi gigahertz clock frequency" but the resulting power consumption of a chip on these can be drastically different.

I think we all know that. However claiming that AMD did not intend to clock x.xx ghz (rumoured) speed and only did so as a reaction to Nvidia, is speculation at best.

3dilettante said:
As a texture-type instruction, we know that a BVH instruction cannot issue in parallel with a texture/vmem operation, but is that the same as them not working concurrently? Outside of those initial few cycles, it could be hundreds of cycles before the buses used by the BVH instruction are needed by it. Are we sure a wavefront cannot issue a memory operation or maybe another BVH? Texturing and memory ops can be issued freely until a waitcnt instruction is encountered and not enough have resolved.

To clarify, I wasn't trying to suggest that some RT cannot be done concurrently at all, and perhaps could have worded it better. I was more trying to point towards it probably not having a significant impact on power consumption.

DegustatoR · Oct 20, 2020

Erinyes said:
I think we all know that. However claiming that AMD did not intend to clock x.xx ghz (rumoured) speed and only did so as a reaction to Nvidia, is speculation at best.

You doubt that AMD will adjust their clocks according to what NV has launched already?
They can of course lower them just as well as increasing them but both possibilities are essentially a given at this point. They would be stupid not to.

Erinyes · Oct 20, 2020

DegustatoR said:
You doubt that AMD will adjust their clocks according to what NV has launched already?
They can of course lower them just as well as increasing them but both possibilities are essentially a given at this point. They would be stupid not to.

Of course they probably would adjust the clocks based on what they've seen from NV, but by how much? You can't magically change silicon, PCB and cooler design overnight to dissipate 50-100W more. Testing, validation, production and distribution take a lot longer than the approx 4-6 weeks since Ampere has been out.

pTmdfx · Oct 20, 2020

3dilettante said:
As a texture-type instruction, we know that a BVH instruction cannot issue in parallel with a texture/vmem operation, but is that the same as them not working concurrently? Outside of those initial few cycles, it could be hundreds of cycles before the buses used by the BVH instruction are needed by it. Are we sure a wavefront cannot issue a memory operation or maybe another BVH? Texturing and memory ops can be issued freely until a waitcnt instruction is encountered and not enough have resolved.

Even normal load-stores are quite liberal: operations can be freely reordered, and only RF writeback is in program order. The texture load-store path has supported varying latency and a huge swarm of capabilities since GCN anyway. For example, address coalescing (or the lack thereof) can cause a load instruction to take a varying number of cycles to complete, even though multiple load instructions can be issued back-to-back. RDNA enhanced it further by adding a low-latency path bypassing the samplers, and RDNA 2 BVH intersection seems to be merely a (new) cherry on the "filtering/pre-processing" pie.

---

Regarding the ongoing discussion about power usage though, I don't see anything contentious... As the patent describes, the intersection engine is basically an alternative path to texture filtering, operating on packed BVH node data. Ray-box and ray-tri testing seem to be quite straightforward logic, so likely no "power drainage" is to be expected... At worst the CU can issue a bunch of intersections, issue a vmcnt wait and eventually clock-gate the ALU datapaths, if no other kernels are running in parallel.

Nvidia presumably does the whole traversal process in fixed-function hardware, so one might argue that they could have an edge in power usage by potentially keeping the CU/SM off. But it is uncertain whether that matters given the prevalent use of async compute to fill gaps, and whether the actual saving makes a dent in overall power consumption.
