AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. DDH

    DDH Newcomer

    He says something like "software based real time ray traced global illumination" around the 10:10 mark
     
  2. Scott_Arm

    Scott_Arm Legend

    I'm not doubling the power consumption of the memory. You're right about memory channels, but I'm also not claiming a linear doubling is exact; it's just an estimate. We don't even know how much power the GPU in the PS5 can draw. My guess is it's over 100W for just the GPU portion of the APU, not including external memory chips, the PS5 PCB, or any other components.
     
  3. Scott_Arm

    Scott_Arm Legend

    So add in 20 CUs (38% increase) and 300 MHz (17% increase), then add that 40-50W for the PCB, and you have a 72CU 2.1GHz RDNA2 GPU (mind you, with a bigger memory bus). Just naively you're going to be well over 260W-270W because of the clock increase.
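For anyone who wants to plug in their own numbers, the naive scaling above can be sketched roughly as follows. This is only an illustration: the 1.38 and 1.17 scale factors and the 40-50W board figure come from the post, while `BASE_GPU_W` is a hypothetical placeholder (the baseline GPU power is not known), and the `clock_exponent` knob is an assumption standing in for the usual "dynamic power goes roughly as C·V²·f, and voltage tends to rise with clock" rule of thumb.

```python
def naive_board_power(base_gpu_w, cu_scale, clock_scale, board_w, clock_exponent=1.0):
    """Naive estimate: GPU power scales linearly with CU count and as
    clock_scale ** clock_exponent with frequency; board power is added on top.

    clock_exponent=1.0 is pure linear scaling; values above 1 crudely model
    the extra voltage usually needed to reach higher clocks (an assumption).
    """
    gpu_w = base_gpu_w * cu_scale * clock_scale ** clock_exponent
    return gpu_w + board_w


if __name__ == "__main__":
    BASE_GPU_W = 100.0  # hypothetical placeholder, NOT a measured figure
    CU_SCALE = 1.38     # +20 CUs, from the post
    CLOCK_SCALE = 1.17  # +300 MHz, from the post
    BOARD_W = 45.0      # midpoint of the 40-50W quoted in the post

    for exponent in (1.0, 2.0, 3.0):
        total = naive_board_power(BASE_GPU_W, CU_SCALE, CLOCK_SCALE, BOARD_W, exponent)
        print(f"clock exponent {exponent:.0f}: ~{total:.0f} W")
```

The spread between exponents is the point: the answer is far more sensitive to how power rises with clock than to the CU count, which scales close to linearly.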
     
    BRiT likes this.
  4. Rootax

    Rootax Veteran

    Even then, nVidia wouldn't have released a new generation slower than (or only as fast as) the old one (even if it was the top of the line).
     
    pjbliverpool likes this.
  5. trinibwoy

    trinibwoy Meh Legend

    Even if that’s true they certainly would’ve aimed much higher than TU102’s 4608 units for any chance at the crown.
     
  6. DegustatoR

    DegustatoR Veteran

    It's not that they would, it's that you'd get a lot less of a performance increase by making just a wider Turing, even with higher clocks. So it is possible that AMD expected that instead of what Ampere turned out to be. Still I wouldn't count on this. They both tend to know quite a lot about each other well in advance of each generation release.
     
  7. Rootax

    Rootax Veteran

    Yeah, but don't forget the node jump too, even if it's not TSMC 7nm. I don't know, seems like a real dumb thing if true...
     
    Lightman likes this.
  8. Erinyes

    Erinyes Regular

    But the XB1X consumed ~170W at the wall in Gears, and XSX consumes ~210W. Unless you're suggesting the CPUs consume next to zero power, the APU budget shouldn't even be close to 200W.
    And I'm sure Lisa Su told you that personally?
    You mean you're actually taking data from consoles using an almost identical architecture, on an almost identical process, likely on a slightly worse silicon bin, and suggesting we extrapolate that to PC GPUs?? You can't do that! Because....reasons.
    Well since the RDNA2 HW seemingly cannot do RT concurrently, I'd say it shouldn't have much of an impact at all, if any.
     
    Last edited: Oct 20, 2020
    Lightman likes this.
  9. chris1515

    chris1515 Legend

    RDNA2 GPUs can do RT concurrently. They can do ray intersection and shading in parallel.
     
    DegustatoR likes this.
  10. Leoneazzurro5

    Leoneazzurro5 Regular

    RDNA2 cannot do texturing concurrently AFAIK. It depends on how much power the RT units draw, which is not known.
     
    Lightman likes this.
  11. SimBy

    SimBy Regular

    If you paid attention to the hints coming out of AMD (Wang) and Cerny (PS5) you would know that they designed RDNA2 as a 'multi gigahertz clock frequency' architecture. Increasing the clock speeds was literally one of the design goals. This whole narrative that Sony and now AMD pushed clock speeds up at the last second is honestly insulting.

     
  12. DegustatoR

    DegustatoR Veteran

    2.0 and 2.4 are both "multi gigahertz clock frequency" but the resulting power consumption of a chip on these can be drastically different.
     
  13. CarstenS

    CarstenS Legend Subscriber

    I don't know really. Maybe they really just hit their power limits or some current limit at those points in time? I am no Furmark expert by any means.
     
  14. 3dilettante

    3dilettante Legend Alpha

    Since a BVH instruction is a texture-type instruction, we know it cannot issue in parallel with a texture/vmem operation, but is that the same as them not working concurrently? Outside of those initial few cycles, it could be hundreds of cycles before the buses used by the BVH instruction are needed by it. Are we sure a wavefront cannot issue a memory operation or maybe another BVH? Texturing and memory ops can be issued freely until a waitcnt instruction is encountered and not enough have resolved.
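To make the waitcnt point concrete, here is a toy model of a single wavefront. It is a sketch under loud assumptions, not a description of the real hardware: one instruction issues per cycle, results return in issue order, and the op names and latencies (`bvh0`, `tex0`, 200 and 100 cycles) are made up for illustration. The only behaviour it is meant to capture is that the wave keeps issuing until a wait asks for at most N operations to be outstanding.

```python
def simulate_wave(program):
    """Toy single-wavefront model.

    program is a list of ("issue", name, latency) or ("wait", max_outstanding).
    Returns (final_cycle, trace), where trace records the cycle at which each
    op issued or each wait was satisfied.
    """
    cycle = 0
    in_flight = []  # completion cycles of outstanding ops, in issue order
    trace = []
    for op in program:
        # Drop anything that has already returned by the current cycle.
        in_flight = [done for done in in_flight if done > cycle]
        if op[0] == "issue":
            _, name, latency = op
            done = cycle + latency
            if in_flight:
                done = max(done, in_flight[-1])  # assumed in-order return
            in_flight.append(done)
            trace.append((cycle, name))
            cycle += 1  # issuing costs one cycle; the wave does not stall
        elif op[0] == "wait":
            _, max_outstanding = op
            excess = len(in_flight) - max_outstanding
            if excess > 0:
                # Stall until enough of the oldest ops have returned.
                cycle = max(cycle, in_flight[excess - 1])
                in_flight = in_flight[excess:]
            trace.append((cycle, "wait satisfied"))
    return cycle, trace


if __name__ == "__main__":
    program = [
        ("issue", "bvh0", 200),  # hypothetical latencies
        ("issue", "tex0", 100),
        ("issue", "bvh1", 200),
        ("wait", 0),             # like waiting for the counter to reach 0
    ]
    final_cycle, trace = simulate_wave(program)
    for when, what in trace:
        print(when, what)
```

In this model the three operations issue back-to-back on cycles 0, 1 and 2, and the wave only stalls at the wait, which is the distinction between "cannot issue in the same cycle" and "cannot be in flight concurrently".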
     
    NightAntilli, Krteq and BRiT like this.
  15. SimBy

    SimBy Regular

    That's not what I was arguing against. I said they didn't bump the clocks at the last second to compete with the 'mighty' Ampere. Clocks are this high by design and have absolutely nothing to do with Ampere. Same goes for PS5 and XSX comparisons.
     
  16. Leoneazzurro5

    Leoneazzurro5 Regular

    Possibly that is the case. But in any case, no one at the moment knows how much power these RT units draw or their average utilization, so declaring that they will dramatically increase power usage is as wrong as saying they will not affect power usage at all.
     
    Erinyes likes this.
  17. Erinyes

    Erinyes Regular

    I think we all know that. However, claiming that AMD did not intend to clock at the (rumoured) x.xx GHz speed and only did so as a reaction to Nvidia is speculation at best.
    To clarify, I wasn't trying to suggest that some RT cannot be done concurrently at all, and perhaps could have worded it better. I was more trying to point towards it probably not having a significant impact on power consumption.
     
    Cuthalu likes this.
  18. DegustatoR

    DegustatoR Veteran

    You doubt that AMD will adjust their clocks according to what NV has launched already?
    They can of course lower them just as well as increase them, but both possibilities are essentially a given at this point. They would be stupid not to.
     
    Cuthalu likes this.
  19. Erinyes

    Erinyes Regular

    Of course they probably would adjust the clocks based on what they've seen from NV, but by how much? You can't magically change silicon, PCB and cooler design overnight to dissipate 50-100W more. Testing, validation, production and distribution take a lot longer than the approx 4-6 weeks since Ampere has been out.
     
  20. pTmdfx

    pTmdfx Regular

    Even normal load-stores are quite liberal: operations can be freely reordered, and only RF writeback is in program order. The texture load-store path has supported varying latency and a huge swarm of capabilities since GCN anyway. For example, address coalescing (or the lack thereof) can cause a load instruction to take a varying number of cycles to complete, even though multiple load instructions can be issued back-to-back. RDNA enhanced it further by adding a low-latency path bypassing the samplers, and RDNA 2 BVH intersection seems to be merely a (new) cherry on the "filtering/pre-processing" pie.

    ---

    Regarding the ongoing discussion about power usage though, I don't see anything contentious... As the patent describes, the intersection engine is basically an alternative path to texture filtering, operating on packed BVH node data. Ray-box and ray-tri testing seem to be quite straightforward logic, so likely no "power drainage" is to be expected... At worst the CU can issue a bunch of intersections, issue a vmcnt wait and eventually clock-gate the ALU datapaths, if no other kernels are running in parallel.

    Nvidia presumably does the whole traversal process in fixed-function hardware, so one might argue that they could have an edge in power usage by potentially keeping the CU/SM off. But it is uncertain whether that matters given the prevalent use of async compute to fill gaps, and whether the actual saving makes a dent in overall power consumption.
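As a rough idea of why the ray-box test mentioned above counts as straightforward logic, here is the textbook "slab" test in a few lines. To be clear, this is the generic algorithm and not AMD's hardware implementation or the patent's data layout; the function names are hypothetical, and the degenerate case of a ray lying exactly on a box face is not handled carefully.

```python
import math


def inverse_direction(direction):
    """Per-axis reciprocal of the ray direction; zero components map to infinity."""
    return tuple(math.copysign(math.inf, d) if d == 0 else 1.0 / d for d in direction)


def ray_box_hit(origin, inv_dir, box_min, box_max, t_max=math.inf):
    """Slab test: intersect the ray's parametric interval with each axis slab.

    Returns True if some t in [0, t_max] lies inside all three slabs.
    """
    t_near, t_far = 0.0, t_max
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0 = (lo - o) * inv
        t1 = (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0  # order the entry/exit distances for this axis
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near <= t_far


if __name__ == "__main__":
    box_min, box_max = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
    inv = inverse_direction((0.0, 0.0, 1.0))
    print(ray_box_hit((0.0, 0.0, -5.0), inv, box_min, box_max))  # ray aimed at the box
    print(ray_box_hit((5.0, 0.0, -5.0), inv, box_min, box_max))  # ray passing beside it
```

Per axis that is two subtracts, two multiplies and a handful of compares, which is consistent with the view that the intersection logic itself is unlikely to be a big power draw.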
     
    w0lfram, dskneo and Jawed like this.