Recent content by dr_ribit

  1.

    Hardware details about Adreno 740

    Specifications, architecture details, execution model, ISA information, that kind of stuff. I was under the impression that this was a GPU enthusiast community? Didn't think that I had to specify this in more detail. If you'd follow your own Google recommendation you'd quickly see that the only...
  2.

    Hardware details about Adreno 740

    So roughly 2 TFLOPS, assuming the usual scalar ALUs. This definitely sounds more realistic than other claims, and combined with the faster RAM it also explains why it does better than the A15/A16 in some synthetic GPU benchmarks. Still wondering about the poor Geekbench compute scores. Maybe indeed...
  3.

    Hardware details about Adreno 740

    Thanks! Do you happen to know more about the architecture of Adreno SPs? I’ve found various claims around (there is a paper that depicts Adreno 630 using 32-wide SIMD like AMD or Apple). As I said, what surprises me is that compute benchmarks of the Adreno 740 are not very good...
  4.

    Hardware details about Adreno 740

    Thanks! Yeah, I’ve seen those figures too and they are just as mysterious, as the math simply doesn’t check out (2048 SIMD ALUs times 980 MHz is not any multiple of 2138). Assuming these are FP16 ALUs, or that FP32 is rate-throttled, would at least put the numbers in the right ballpark, but it’s still...
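For concreteness, the arithmetic behind that mismatch can be sketched like this (my own sanity check, assuming one FMA per ALU per cycle, i.e. 2 flops; the 2048-ALU and 980 MHz figures are the quoted numbers, not confirmed specs):

```python
# Peak throughput = ALUs x clock x 2 (one fused multiply-add = 2 flops).
# ALU count and clock are the rumored figures discussed above.
def peak_gflops(alus, clock_ghz, flops_per_alu_per_cycle=2):
    return alus * clock_ghz * flops_per_alu_per_cycle

fp32 = peak_gflops(2048, 0.980)  # ~4014 GFLOPS if these were full-rate FP32 ALUs
print(fp32, fp32 / 2)            # halving the rate gives ~2007 GFLOPS
```

Neither the full nor the halved figure lands exactly on 2138, which is the mismatch being pointed out.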
  5.

    Hardware details about Adreno 740

    My timing is, as so often, impeccable :) Thanks for the heads up! Makes sense.
  6.

    Hardware details about Adreno 740

    Cute. I hope the time spent writing this very useful post at least made you feel like you are contributing.
  7.

    Hardware details about Adreno 740

    Does anyone have some information about the Adreno 740? I couldn't find any details on Qualcomm's website, and the only source that I've seen mentioning any (Wikipedia) lists quite ridiculous specs (like 2560 ALUs and 3.5 TFLOPS FP32) which don't make much sense to me.
  8.

    Hardware implementation of threading models in contemporary GPUs

    But they will all be executing different warps, so it’s taken care of by other schedulers, right? What I want to understand is how some GPUs achieve within-warp ILP without the expensive out-of-order (OoO) machinery used in CPUs. Anyway, these are really cool tricks!
  9.

    Hardware implementation of threading models in contemporary GPUs

    I have some difficulty wrapping my head around it. Just to see if I got it correctly: when you say that 32-wide warps execute over 2 cycles, this still means that an instruction has to be issued only once, right? So if I have two instructions without dependencies the timeline will look a bit like...
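To make the question concrete, here is a toy model (my own sketch, not any vendor's documented behavior) of a 32-wide warp on 16-lane SIMD hardware, where each instruction is issued exactly once and then occupies the unit for two cycles:

```python
# Toy issue model: each instruction gets one issue slot, then occupies the
# SIMD for warp_width/simd_lanes cycles. Two independent instructions still
# serialize on the same unit; overlap would need a second unit or pipelining.
def issue_timeline(n_instructions, warp_width=32, simd_lanes=16):
    drain = warp_width // simd_lanes  # cycles to push one whole warp through
    return [(i * drain, tuple(range(i * drain, i * drain + drain)))
            for i in range(n_instructions)]

for i, (issued, busy) in enumerate(issue_timeline(2)):
    print(f"inst {i}: issued once at cycle {issued}, SIMD busy on cycles {busy}")
```

With the defaults, instruction 0 is issued at cycle 0 (lanes busy cycles 0 and 1) and instruction 1 at cycle 2, which is one way the "issued once, executed over 2 cycles" timeline could look.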
  10.

    Hardware implementation of threading models in contemporary GPUs

    In case anyone is interested in this, at least AMD’s dual-issue on RDNA3 is not a mystery anymore. According to their ISA reference they use a limited form of VLIW which packs two operations into one. There are some limitations, however, on which operands can be used. So it’s much less exciting than...
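As an illustration of how restrictive the packing is, here is a simplified checker based on my paraphrase of two of the VOPD pairing rules in AMD's RDNA3 ISA reference (the real rules are more involved, and the register/bank numbers here are purely illustrative):

```python
# Simplified sketch, not a complete or exact encoding check: the two packed
# ops must write VGPRs of opposite parity, and their src0 VGPRs must come
# from different register banks (bank = vgpr % 4 in this toy model).
def can_dual_issue(dst_x, src0_x, dst_y, src0_y):
    opposite_parity_dsts = (dst_x % 2) != (dst_y % 2)
    different_src0_banks = (src0_x % 4) != (src0_y % 4)
    return opposite_parity_dsts and different_src0_banks

print(can_dual_issue(dst_x=0, src0_x=1, dst_y=1, src0_y=2))  # pairable
print(can_dual_issue(dst_x=0, src0_x=1, dst_y=2, src0_y=5))  # not pairable
```

The point is that the compiler (or hand-written assembly) has to satisfy static constraints like these up front, instead of the hardware discovering the parallelism dynamically.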
  11.

    DX12, DX12U, DXR, API bias, and API evolution ..

    They added inline tracing in 2021, it's just that, confusingly, they call this the "intersection query API", while the older method where the intersector calls an intersection function is called the "intersector API". You can do inline RT with Metal in any kind of shader program as far as I remember, but please...
  12.

    DX12, DX12U, DXR, API bias, and API evolution ..

    API design is hard. Besides, there are a lot of different types of hardware, each with their own quirks. Both Vulkan and DX12 aim to deliver a low-level interface which at the same time is abstract enough to work well for different hardware — truly a Sisyphean task. It is unfortunate that we...
  13.

    Hardware implementation of threading models in contemporary GPUs

    Or maybe I misunderstood the slides. I just looked again and the basic SIMT algorithm described there seems to use only one PC after all (with a PC/execution mask stack that stores PCs of inactive threads). But then I don't understand why distinguish between SIMT and SIMD. Everyone seems...
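A toy version of that single-PC scheme, as I understood it from the slides (my own illustration, not any specific GPU's hardware): one PC for the whole warp plus a stack of (pc, mask) entries for deferred paths and the reconvergence point:

```python
# One program counter for the whole warp; on a divergent branch, push the
# deferred path and the reconvergence point as (block, mask) stack entries.
def run_warp(cond):
    """Trace `if cond: A else: B; C` and return (block, active_mask) pairs."""
    full = [True] * len(cond)
    taken = list(cond)
    not_taken = [not c for c in cond]
    stack = [("C", full)]                  # both sides rejoin here
    if any(not_taken):
        stack.append(("B", not_taken))     # else-path runs later
    pc, mask = ("A", taken) if any(taken) else stack.pop()
    trace = [(pc, mask)]
    while stack:
        trace.append(stack.pop())
    return trace

print(run_warp([True, False, True, False]))
# [('A', [True, False, True, False]),
#  ('B', [False, True, False, True]),
#  ('C', [True, True, True, True])]
```

In this picture the difference from plain SIMD is that the mask/stack bookkeeping is done by the hardware per branch, rather than by the programmer with explicit vector masks.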
  14.

    Hardware implementation of threading models in contemporary GPUs

    That’s the basic knowledge, yes, but I was hoping to hear some more detail. For example, it doesn’t explain how some GPUs can issue multiple instructions for the same thread simultaneously. Also, isn’t Intel using a different type of architecture? Your statement seems to contradict the slides...
  15.

    DX12, DX12U, DXR, API bias, and API evolution ..

    Vulkan might borrow the basic API design concepts of Mantle, but it was significantly dumbed down to fit the lowest common denominator. If I remember correctly, Mantle offered arbitrary levels of indirection (nested table pointers) for resource descriptors, something that Vulkan still lacks (I might be...