Analysis Concludes Apple Now Using Custom GPU Design

Discussion in 'Mobile Graphics Architectures and IP' started by ltcommander.data, Oct 25, 2016.

  1. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,550
    Likes Received:
    4,214
    Well, this may be going over my head, but in my limited experience, while promoting lower precision to higher precision is okay (just add a bunch of zeros to the left), the opposite opens a can of worms.
    If they're ignoring precision tags, what will happen if a wild FP64 appears and they're just defaulting everything to FP32?
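
    For what it's worth, the widening direction really is lossless for floats too (exponent re-biased, mantissa zero-padded), while narrowing rounds and can underflow. A quick numpy sketch of both directions, with numpy's scalar types only standing in for GPU behaviour:

    Code:
    import numpy as np

    x = np.float16(0.1)        # 0.1 isn't exactly representable; stored as ~0.0999755859375
    assert np.float32(x) == x  # widening FP16 -> FP32 is exact, the value is unchanged

    y = np.float32(0.1)        # ~0.100000001490116
    d = np.float16(y)          # narrowing rounds bits away
    print(np.float32(d) == y)  # False: precision was lost on the way down

    z = np.float64(1e-50)      # fine in FP64...
    print(np.float32(z))       # ...but underflows to 0.0 when forced into FP32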
     
  2. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,406
    Likes Received:
    149
    Location:
    0x5FF6BC
    I see this thread is now an Nvidia thread. You'd assume two guys with a combined Beyond3D post count approaching 9000 would know better, but I guess there mustn't be any Nvidia-related threads.
     
    TomK likes this.
  3. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,550
    Likes Received:
    4,214
    tangey likes this.
  4. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    You don't need 2xFP16 SIMD instructions to achieve 2x FP16 throughput in a GPU. They could just have a wavefront length large enough that FP32 ops take twice as many cycles as FP16 ops.
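
    A toy calculation of that mechanism (all numbers hypothetical, picked only to make the arithmetic obvious):

    Code:
    WAVEFRONT = 64    # work items per wavefront (hypothetical)
    FP32_LANES = 16   # physical FP32 ALUs (hypothetical)
    FP16_LANES = 32   # physical FP16 ALUs, the 2:1 ratio of Series 6XT/7XT

    fp32_cycles = WAVEFRONT // FP32_LANES  # 4 cycles to issue one FP32 op wavefront-wide
    fp16_cycles = WAVEFRONT // FP16_LANES  # 2 cycles for the same op at FP16

    print(fp32_cycles / fp16_cycles)       # 2.0: 2x FP16 rate, each work item still scalar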

    It would be very strange for IMG to simultaneously repeatedly refer to Series 6+ as scalar (in explicit contrast to Series 5/5XT) while also repeatedly mentioning the benefits of FP16, if their FP16 implementation were not scalar.

    Of course, by scalar it's understood to mean no more than one lane per "thread"/work item; SIMD is naturally still being used to process many "threads" in parallel.

    I think the G6x00 Series 6 cores don't even have FP16 ALUs.

    G6x30 Series 6 adds the FP16 ALUs at a 1:1 ratio to the FP32 ALUs. But the FP16 ALUs can somehow perform 3 FLOPs, vs. the standard 2-FLOP FMA in the FP32 ALUs, so that's where the 1.5:1 ratio comes from. The actual ops haven't been disclosed as far as I know, but may be (A*B +/- C*D). They wouldn't be the first GPU to support operations like this either.
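
    The FLOP counting behind that ratio, with (A*B +/- C*D) taken as the speculation it is:

    Code:
    fma_flops = 2           # A*B + C: one multiply + one add
    dual_mul_add_flops = 3  # A*B +/- C*D (speculated): two multiplies + one add

    # 1:1 FP16:FP32 unit count, but 3 vs. 2 FLOPs per issued op:
    print(dual_mul_add_flops / fma_flops)  # 1.5, matching the 1.5:1 FLOP ratio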

    Series 6XT and 7XT move to an actual 2:1 FP16 to FP32 ALU ratio, and away from 3 FLOP FP16 ALUs.

    Apple's A7 and A8 were, at least until now, alleged to be using the G6430 and GX6450, i.e. Series 6 w/FP16 and Series 6XT respectively.

    The document was last updated in 2016, so while it doesn't explicitly address Series 6XT, it could easily incorporate the FP16 ALUs from G6x30.
     
  5. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    That's what I meant; it was just illustrated/phrased wrong. Note that, as I've repeatedly said in the past, there's no public benchmark where I could ever find any efficiency difference between a G6200 and a G6230 core; obviously, in order to show any difference the underlying code needs to be optimized for FP16. Sebbi stating, among other things, that they'll start optimising for it is a good sign for wherever those cores get used.

    As noted above, the ratios were for throughput, not physical units.

    The material is dated in any case, and even if they changed anything within the year, they've left a lot of crucial changes out of the document.
     
  6. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    HLSL minimum precision types (10- and 16-bit) are specifically designed to let the compiler decide whether to apply them or not. It is 100% legal to treat all these types as 32-bit types.

    See the documentation here:
    https://msdn.microsoft.com/en-us/library/windows/desktop/hh968108(v=vs.85).aspx
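
    To make the rule concrete, here is a small numpy simulation of min-precision semantics (not actual HLSL; the two dtypes just stand in for the two code paths a driver may legally pick):

    Code:
    import numpy as np

    def shade(x, dtype):
        # Evaluate the same expression at a driver-chosen precision. Under HLSL
        # min-precision rules, min16float guarantees *at least* 16 bits, so both
        # results below are legal outcomes of the same shader.
        x = dtype(x)
        return dtype(x * dtype(0.3)) + dtype(0.1)

    print(shade(0.5, np.float16))  # a true 16-bit path
    print(shade(0.5, np.float32))  # the compiler legally promoted everything to 32-bit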

    64-bit float is a completely different case. It is not a minimum precision type. A shader containing 64-bit types can only run on a GPU that supports them.
     
  7. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,406
    Likes Received:
    149
    Location:
    0x5FF6BC
    Linley have put up a piece stating their belief that A10 is throttling a lot more than previous Ax chips, citing increased GPU frequency as the fundamental reason.

    Apple Turbocharges PowerVR GPU
    http://www.linleygroup.com/newsletters/newsletter_detail.php?num=5619

    To me it looks poorly researched.

    They assume that the A10 is fundamentally using similar but customised graphics IP compared to the A9, and that most of the GPU performance gain comes from an increase in clock. Apple haven't historically kept the same GPU IP across new Ax generations.

    They cite the poor increase in Futuremark's physics test, and also selective GFXBench tests, as evidence of poor GPU performance/thermal throttling.

    But physics is a CPU test, and Ax chips have always struggled with it. Futuremark put out a PR several years ago explaining why the iPhone 5s didn't improve much over the iPhone 5. It isn't a GPU issue.

    https://www.futuremark.com/pressrel...results-from-the-apple-iphone-5s-and-ipad-air
    "In the Physics test, which measures CPU performance, there is little difference between the iPhone 5s and the iPhone 5"

    Finally, although they do cite some GFXBench tests, they don't mention the tests in that suite that might actually expose throttling, i.e. the sustained-FPS and battery tests. According to the data in the AnandTech review, the A10 in the iPhone 7 does drop from 60fps to 50fps after 5 minutes, but sustains around 50fps until the battery dies. Its terminal fps is 50% higher than the iPhone 6s's.

    The slightly bigger battery also lasts slightly longer. Assuming those last two things roughly cancel out, the overall package appears to be getting significantly more performance in that test from the same input power. Hardly indicative of the A10 having thermal issues relative to previous Ax generations.

    I guess that ultimately, if the chip has higher CPU performance and higher GPU performance, it has the potential to generate more heat, and in fundamentally the same package, throttling of that higher performance has to happen. But Linley's argument is that much of the theoretical improvement isn't being seen, and they blame it on the GPU frequency increase. And throwing Futuremark's physics test into a GPU discussion doesn't seem relevant.

    Thoughts?
     
    #47 tangey, Nov 10, 2016
    Last edited: Nov 10, 2016
    roninja likes this.
  8. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    A frequency of 900MHz? Is there even any sound indication to back up their gut feeling?

    Here are the average results for the iPhone 7 Plus, where it's easy to see how much the GPU throttles on average in Manhattan 3.1 (onscreen): https://gfxbench.com/device.jsp?D=A...al&testgroup=graphics&benchmark=gfx40&var=avg

    As for the frequency, it has been pointed out over and over again that the fillrate tests in the latest Kishonti benchmark suite are an awful indicator of possible GPU frequencies. In the offscreen fillrate test the A10 GPU gets 9752 MTexels/s vs. 6074 MTexels/s for the A9: https://gfxbench.com/compare.jsp?be...S&api2=metal&hwtype2=GPU&hwname2=Apple+A9+GPU

    For the same number of TMUs (12 in both GPUs), the difference is 60%. So where does the author expect the A9 to clock, exactly? And yes, I doubt he has even the slightest clue.
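
    A quick sanity check on what the 900MHz figure would imply, assuming fillrate scaled linearly with clock at a fixed TMU count (the very assumption the fillrate-as-frequency argument rests on):

    Code:
    a10_fill = 9752                 # MTexels/s offscreen (GFXBench, A10)
    a9_fill = 6074                  # MTexels/s (A9), both GPUs with 12 TMUs

    ratio = a10_fill / a9_fill      # ~1.61, the ~60% gap from above
    claimed_a10_mhz = 900           # Linley's figure
    print(claimed_a10_mhz / ratio)  # ~561 MHz implied for the A9's clock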
     