Apple (PowerVR) TBDR GPU-architecture speculation thread

Discussion in 'Architecture and Products' started by Kaotik, Jul 7, 2020.

  1. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,334
    Likes Received:
    1,348
    We are fortunate on these forums that some people with deep knowledge and insight sometimes post here. It makes sense to listen to what they say.
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,469
    Likes Received:
    187
    Location:
    Chania
    We shouldn't forget that all mobile ULP SoC GPUs cut corners one way or another in order to be as power efficient as possible. NVIDIA itself was cutting several corners with its initial Tegra GPUs until they addressed the high end automotive market with a Kepler derivative.
     
    #102 Ailuros, Nov 5, 2020
    Last edited: Nov 5, 2020
  3. rikrak

    Newcomer

    Joined:
    Sep 16, 2020
    Messages:
    30
    Likes Received:
    16
    Very true, this also makes educated comparisons tricky.

    For Apple specifically, the corners cut are the lack of hardware double precision (something I personally find reasonable) and the fact that they seem to aggressively lower precision to FP16 in the shaders.
     
  4. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,469
    Likes Received:
    187
    Location:
    Chania
    Apple's GPU ALUs don't lack FP32, regardless of what they optimize for. No idea if FP64 is possible, but if so it should be very slow (there was optional support for FP64 in Series7XT, but I doubt Apple ever opted for it). Apple (like anyone else in their shoes) has its own way of doing things, and they were never really interested in DX11, tessellation or anything of the kind. For smartphones/tablets they rather invested the hardware overhead that DX11 would have needed in higher performance.

    The reasons why they skipped double precision and/or improved rounding support, among other things, are covered up to a point in this blog post from Kristof: https://www.imgtec.com/blog/powervr-gpu-the-mobile-architecture-for-compute/

    Now that they seem to be moving to their own GPUs in their Macs, they shouldn't IMHO cut corners there as well. Higher-end designs are obviously completely different animals.
     
    #104 Ailuros, Nov 5, 2020
    Last edited: Nov 6, 2020
  5. rikrak

    Newcomer

    Joined:
    Sep 16, 2020
    Messages:
    30
    Likes Received:
    16
    I don't expect them to support FP64 any time soon — if ever — and personally, I think that the decision to skip it makes sense. Software emulation of double precision for whoever needs it is much faster than the gimped FP64 hardware units we have on modern consumer hardware, and it makes sense to invest silicon space to where it matters more.

    As to all other things, I completely agree. A14 in particular now implements some features of desktop GPUs that Apple was lacking (e.g. SIMD reduction operations and hardware barycentric coordinates) while improving the efficiency of async compute and GPU-driven rendering.
     
  6. Pressure

    Veteran Regular

    Joined:
    Mar 30, 2004
    Messages:
    1,507
    Likes Received:
    437
    I’m definitely watching their “One more thing” event tomorrow. That should give us some answers.
     
  7. Leovinus

    Newcomer

    Joined:
    May 31, 2019
    Messages:
    142
    Likes Received:
    73
    Location:
    Sweden
    Short recap of the important bullet points, verbatim as they were presented:
    • Unified memory architecture
      • High bandwidth, low latency
      • Apple-designed package
      • Accessible to entire SoC
    • CPU
      • 4 high-performance cores
        • Ultra-wide execution architecture
        • 192KB instruction cache
        • 128KB data cache
        • Shared 12MB L2 cache
      • 4 high-efficiency cores
        • Wide execution architecture
        • 128KB instruction cache
        • 64KB data cache
        • Shared 4MB L2 cache
    • GPU
      • Up to 8 Cores
        • 128 execution units
        • Up to 24,576 concurrent threads
        • 2.6 teraflops
        • 82 gigatexels/second
        • 41 gigapixels/second
      • Claimed 2x performance at 10W vs. "latest PC laptop chip"
      • Claimed 1/3 power draw at the indicated max performance of "latest PC laptop chip"
    • M1 claims
      • High efficiency CPU cores
      • High-Performance CPU cores
      • Secure Enclave
      • Low power video playback
      • Neural Engine
      • Advanced display engine
      • High-performance GPU
      • HDR video processor
      • HDR imaging
      • Gen 4 PCI Express
      • High-performance video editing
      • Always-on processor
      • Performance controller
      • Thunderbolt/USB 4 controller
      • High quality image signal processor
      • Low-power design
      • High-performance NVMe storage
      • High-efficiency audio processor
      • Advanced silicon packaging
     
    Pete, Lightman and BRiT like this.
  8. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,708
    Likes Received:
    664
    Location:
    Germany
    So, it's some kind of Apple take on IMG Series A? 128 ECUs and 8* TMUs per core (cluster) and up to 1.3 GHz clock?

    * or 4 TMUs with doubled clock?
     
  9. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,960
    Likes Received:
    4,144
    Location:
    Finland
    Apparently AnandTech has been able to confirm it's really LPDDR4X memory behind a 128-bit bus, not HBM as the Finnish Apple pages said.
     
    BRiT likes this.
  10. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,469
    Likes Received:
    187
    Location:
    Chania
    I'm happy to stand corrected, but TMUs running at double the frequency would be (by today's standards at least) suicidal for power consumption. It sounds like Albiorix, as you say, but we don't have enough information yet.

    If my layman's backwards math doesn't contain any serious brainfart, the frequency should be somewhere in the >=1.275 GHz region.
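
A rough sketch of that kind of backwards math, using only the headline figures from the recap earlier in the thread. The 2 FLOP per ALU per clock (one fused multiply-add) and the 8 TMUs per core are assumptions taken from the guesses in this thread, not confirmed specifications, and the function name is purely illustrative:

```python
# Back out the GPU clock implied by Apple's headline M1 figures.
CORES = 8
ALUS_PER_CORE = 128   # reading "128 execution units" as 128 ALUs per core (a guess)
FLOP_PER_ALU = 2      # assumption: one fused multiply-add counts as 2 FLOP
TMUS_PER_CORE = 8     # assumption: 8 texels per clock per core


def implied_clock_ghz(rate_per_second, units_per_clock):
    """Clock (GHz) needed to hit a peak rate with a given per-clock width."""
    return rate_per_second / units_per_clock / 1e9


# 2.6 teraflops spread over all ALUs, and 82 gigatexels/s over all TMUs.
clock_from_flops = implied_clock_ghz(2.6e12, CORES * ALUS_PER_CORE * FLOP_PER_ALU)
clock_from_texels = implied_clock_ghz(82e9, CORES * TMUS_PER_CORE)

print(f"{clock_from_flops:.3f} GHz from FLOPS, {clock_from_texels:.3f} GHz from texel rate")
```

Under these assumptions both routes land between roughly 1.27 and 1.28 GHz, which is in line with the estimate above. The small gap between them is most likely just rounding in the marketing numbers.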
     
    #110 Ailuros, Nov 11, 2020
    Last edited: Nov 11, 2020
  11. P_EQUALS_NP

    Newcomer

    Joined:
    Jun 17, 2020
    Messages:
    14
    Likes Received:
    3
    Even if Apple's claimed 2.6 teraflops uses half floats, 1.3 teraflops of single floats at 10 watts is pretty amazing!
     
  12. Leovinus

    Newcomer

    Joined:
    May 31, 2019
    Messages:
    142
    Likes Received:
    73
    Location:
    Sweden
    Possibly a bit off topic, but I feel it's tangential to the technical and performance discussion:

    Over at eGPU.io there is plenty of discussion regarding whether or not Apple will discontinue the use of AMD GPUs altogether to focus on their own silicon. That is one of the possible reasons why AMD drivers haven't yet been seen in the universal binaries of the Big Sur RC for ARM. Of course, a potential Pro 16" and iMac ARM lineup might change this for all we know.

    Either way, it seems Apple could go the way of Intel and expand and refine their iGPU solutions into very capable products that entirely replace AMD, if developer and customer interest in their offerings takes hold (an entirely parallel, non-static hardware and software ecosystem seems a burden for all involved, frankly). Still, I think it raises an enormous number of questions. Will we see Apple-branded discrete GPUs for modular systems like the Mac Pro? How will different SKUs be handled in other products? Fixed-performance SKUs with novel SoCs (and presumably binned versions for lower tiers)? Multi-chip modules for bigger SoCs to increase yields? Discrete chips for high-end systems? And I'm not even mentioning the developer aspects. Trying to fathom how Apple will move forward here, my brain feels about as blocked as when you think about the endlessness of space.
     
  13. ^M^

    ^M^
    Newcomer

    Joined:
    Jul 6, 2006
    Messages:
    61
    Likes Received:
    17
    Apple speaks of a unified memory architecture between CPU and iGPU.
    Is it new for them? I remember AMD trying something like this a while back. Did it go anywhere?
     
  14. P_EQUALS_NP

    Newcomer

    Joined:
    Jun 17, 2020
    Messages:
    14
    Likes Received:
    3
    It's new for them, and interestingly enough this may be the first implementation of unified memory in the consumer space, since Intel and AMD APUs segment their GPU/CPU memory, with all the access restrictions imposed by the Windows APIs that *mostly* forbid sharing data between CPU and GPU.
     
  15. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    390
    Likes Received:
    355
    It isn’t new to Apple, only new to the Mac.

    Metal has long documented that iOS and tvOS operate in the unified memory model (in Metal’s definition), while Mac (aka Intel) IGPs are made to present a discrete memory model.

    https://developer.apple.com/library.../doc/uid/TP40016642-CH17-DontLinkElementID_19
     
    #115 pTmdfx, Nov 11, 2020
    Last edited: Nov 11, 2020
    BRiT likes this.
  16. mfaisalkemal

    Newcomer

    Joined:
    May 26, 2017
    Messages:
    61
    Likes Received:
    33
    No, that's the peak performance of the M1 at 14.3 W. The MacBook Air at 10 W is around 1.13 TF FP32. With a shader code mix of 70% FP16 and 30% FP32, the MacBook Air would come close to the original PS4's shader performance (1.742 TF vs 1.843 TF).
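
One way to arrive at a number close to the 1.742 TF quoted here is a time-weighted blend of the two precisions. This is a guess at the arithmetic, not necessarily how the figure was actually derived. It assumes FP16 runs at exactly twice the FP32 rate, which is this sketch's assumption and not a confirmed property of the hardware, and the function name is illustrative:

```python
def effective_tflops(fp32_tflops, fp16_fraction, fp16_speedup=2.0):
    """Blended throughput when a fraction of the work runs in FP16.

    Assumes (hypothetically) that FP16 executes `fp16_speedup` times faster
    than FP32, and weights each precision by the time it takes.
    """
    fp32_fraction = 1.0 - fp16_fraction
    # Time to execute one unit of work, split between the two precisions.
    time_per_unit = (fp16_fraction / (fp32_tflops * fp16_speedup)
                     + fp32_fraction / fp32_tflops)
    return 1.0 / time_per_unit


# 1.13 TF FP32 at 10 W, with a 70% FP16 / 30% FP32 shader mix.
blended = effective_tflops(1.13, 0.70)
print(f"{blended:.3f} TF effective")
```

With the rounded 1.13 TF input this gives roughly 1.74 TF. The small difference from 1.742 TF would be explained by a slightly less rounded FP32 figure.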
     
    Lightman and BRiT like this.
  17. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,708
    Likes Received:
    664
    Location:
    Germany
    How can you then explain 82 Gtexels/s?

    The M1 is not a mobile SoC like A14, so FP16 performance is less relevant compared to FP32.
     
  18. Nebuchadnezzar

    Legend Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,042
    Likes Received:
    291
    Location:
    Luxembourg
    Pete, Lightman, Entropy and 2 others like this.
  19. Leovinus

    Newcomer

    Joined:
    May 31, 2019
    Messages:
    142
    Likes Received:
    73
    Location:
    Sweden
    Apple has now posted more in-depth information on what they compared performance with, scroll to the bottom of the page for footnotes. Though I'm not entirely sure it's detailed enough to give more than a slightly less murky view of things.

    The ones I interpret as making mention of the GPU are the following:
    • Testing conducted by Apple in October 2020 using preproduction MacBook Air systems with Apple M1 chip and 8-core GPU, as well as production 1.2GHz quad-core Intel Core i7-based MacBook Air systems, all configured with 16GB RAM and 2TB SSD. Tested with prerelease Final Cut Pro 10.5 using a 55-second clip with 4K Apple ProRes RAW media, at 4096x2160 resolution and 59.94 frames per second, transcoded to Apple ProRes 422. Performance tests are conducted using specific computer systems and reflect the approximate performance of MacBook Air.
    • Testing conducted by Apple in October 2020 using preproduction 13‑inch MacBook Pro systems with Apple M1 chip and 16GB of RAM using select industry-standard benchmarks. Comparison made against the highest-performing integrated GPUs for notebooks and desktops commercially available at the time of testing. Integrated GPU is defined as a GPU located on a monolithic silicon die along with a CPU and memory controller, behind a unified memory subsystem. Performance tests are conducted using specific computer systems and reflect the approximate performance of MacBook Pro.
    • Testing conducted by Apple in October 2020 using preproduction Mac mini systems with Apple M1 chip, and production 3.6GHz quad-core Intel Core i3-based Mac mini systems with Intel Iris UHD Graphics 630, all configured with 16GB of RAM and 2TB SSD. Tested with prerelease Final Cut Pro 10.5 using a complex 2-minute project with a variety of media up to 4K resolution. Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac mini.
     
    Lightman likes this.
  20. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,708
    Likes Received:
    664
    Location:
    Germany
    Those who can read have a clear advantage:
    https://www.anandtech.com/show/15156/imagination-announces-a-series-gpu-architecture/3

    One M1 "GPU core" should be 8-256-4: 8 texels per clock, 256 FP-FLOP per clock and 4 pixels per clock. For the whole 8-core GPU that's something like 64-2048-32. Does it make sense?
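
A quick sanity check of that configuration, scaled to 8 cores and multiplied by a clock. The 1.278 GHz value is an assumption (roughly the estimate floated earlier in the thread), and the helper names are illustrative only:

```python
# Per-core, per-clock widths for the proposed "8-256-4" configuration.
PER_CORE = {"texels": 8, "flop": 256, "pixels": 4}
CORES = 8
CLOCK_GHZ = 1.278  # assumption: approximate clock, not an official figure


def scale_to_chip(per_core, cores):
    """Per-clock widths for the whole GPU."""
    return {name: width * cores for name, width in per_core.items()}


def peak_rates(per_clock, clock_ghz):
    """Peak rates in giga-units per second (Gtexels/s, GFLOPS, Gpixels/s)."""
    return {name: width * clock_ghz for name, width in per_clock.items()}


chip = scale_to_chip(PER_CORE, CORES)
rates = peak_rates(chip, CLOCK_GHZ)

for name, rate in rates.items():
    print(f"{name}: {chip[name]} per clock, {rate:.1f} G/s")
```

At that assumed clock, the 64-2048-32 layout comes out near Apple's headline 82 gigatexels/second and 2.6 teraflops, so the proposed configuration is at least consistent with the published numbers.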

    plus:
    https://www.anandtech.com/show/13661/the-2018-apple-ipad-pro-11-inch-review/6

     
    Pete likes this.