Recent content by rikrak

  1. R

    Variable Rate Shading vs. Variable Rate Rasterization

    Thanks, I have learned something today!
  2. R

    Variable Rate Shading vs. Variable Rate Rasterization

    Thanks, this really clears things up. So in retrospect, VRS's main purpose is to improve the utilization of shader resources, as it allows you to adaptively and independently set the shading rate for individual portions of the render target. It does not affect rasterization or the layout of the...
  3. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    Ah, now I understand what you mean. Yes, Metal argument buffers are typed aggregate objects and require more precise type declaration of their components. Vulkan/DX12 use weaker type bindings, possibly to support a wider hardware range and more flexible descriptor table juggling. There is indeed...
  4. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    Forgive my confusion, but how can you create a texture with an "unknown" type? What does it even mean? Edit: I looked up DXGI_FORMAT_UNKNOWN. It appears to be just a badly chosen name for something like "choose a default format for me". It still chooses a concrete format according to a...
  5. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    Well, duh, if you want to bind (encode) a resource, you have to create the resource first. And to create a resource, you need to know its size and type. I don't understand why you would call it a limitation; it sounds to me like a logical way to design a binding API. Is there any API out there...
  6. R

    Variable Rate Shading vs. Variable Rate Rasterization

    Could any of you knowledgeable people offer a more detailed explanation of the difference between Variable Rate Shading (as used in DX12 and Vulkan) and Variable Rate Rasterization (as used by Apple in Metal)? If I understand it correctly, VRS simply allows the fragment/pixel shader to run at a...
  7. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    That's news to me. Where did you see that? Multisampled textures do have a different data type from regular textures, I would speculate because their layout is hardware-dependent. Same for depth textures. But that's about it?
  8. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    A complete amateur opinion here, but maybe they refer to the fact that transformed triangle data must be collected prior to rasterization, so a large number of very small triangles will quickly fill up the buffers, causing premature flushes and additional memory operations? TBDR only really...
  9. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    I have the feeling that we already had this very conversation a few pages back? Tiling certainly adds per-primitive cost to the process, but this cost should be proportional to the number of tiles a primitive intersects. Small primitives should actually be the cheapest. Not that this...
  10. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    This has been confirmed by the Apple GPU driver team leader on Twitter. I have also run some benchmarks (look in the posts above) that show FMA throughput on M1 and A14 GPUs. I would guess mainly because it's a very intricate engineering puzzle. Getting the deferred rendering behavior in...
  11. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    Now I am very confused. If the GPU cores are basically identical, how has Apple managed to double the FP32 rate on M1 relative to A14? Is this an artificial limitation on A14?
  12. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    Depends on what you understand by "microarchitecture". They have an identical feature set, yes, but there should be little doubt that their ALUs are physically different (different FP32 compute throughput, different size on die). What's even more interesting is that Metal RT works across all...
  13. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    For full disclosure, I have no idea how these things can be implemented in hardware. It was pointed out (https://www.realworldtech.com/forum/?threadid=197759&curpostid=197993) that fusing two FP16 ALUs to perform a FP32 operation or splitting a single FP32 ALU to perform two FP16 operations per...
  14. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    Update on this: new benchmark results are up, including A14 results (iPhone 12): https://www.realworldtech.com/forum/?threadid=197759&curpostid=197985 To summarise: normalised per GPU core, A14 has exactly half the FP32 throughput of the M1, while their FP16 throughput is identical. My...
  15. R

    Apple (PowerVR) TBDR GPU-architecture speculation thread

    I completely agree with you that there are sizable benefits of having fast FP16 operations even on desktop GPUs. This is definitely not something I am debating. I am just pointing out that M1 appears to have identical throughput for both FP32 and FP16 operations, which I hypothesize is due to...