Can iPad Pro out-game an XB360? *spawn

Discussion in 'Mobile Devices and SoCs' started by wco81, Mar 21, 2016.

  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Uhmm you mean the Shield Android TV which isn't exactly a mobile battery powered device? In one GPU synthetic only (3dmark)? I expected to see a list of mobile games where the reviewer would have compared the devices against each other....

    But for the more important part when you really have a device that's battery powered from the link above:

     
  2. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,104
    Likes Received:
    16,896
    Location:
    Under my bridge
    Are we still going to use RGB or are people trying YUV type buffers more? We were talking YUV at the beginning of PS3 - I'm sure we all remember nAo16, or whatever the format discussed for Heavenly Sword was called. YUV/HSV would better support different resolutions, seems to me.
     
  3. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Crytek used this on last gen consoles. Definitely worth trying also on mobiles, as bandwidth is the biggest limitation.

    http://graphics.cs.aueb.gr/graphics/docs/papers/YcoCgFrameBuffer.pdf
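For readers unfamiliar with the colour space behind the linked paper, here is a minimal sketch of the standard RGB to YCoCg transform and its inverse. The function names are made up for illustration, and nothing here is taken from Crytek's or the paper's actual implementation. The bandwidth saving presumably comes from storing the chroma terms (Co, Cg) at lower density than luma (Y), a detail the thread does not spell out.

```python
# Illustrative only: the standard RGB <-> YCoCg transform.
# Function names are hypothetical, not from the paper or any engine.

def rgb_to_ycocg(r, g, b):
    """Convert one RGB triple (floats in [0, 1]) to (Y, Co, Cg)."""
    y = 0.25 * r + 0.5 * g + 0.25 * b      # luma
    co = 0.5 * r - 0.5 * b                 # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b    # green chroma
    return y, co, cg


def ycocg_to_rgb(y, co, cg):
    """Inverse transform: recover (R, G, B) from (Y, Co, Cg)."""
    tmp = y - cg
    return tmp + co, y + cg, tmp - co


if __name__ == "__main__":
    pixel = (0.2, 0.5, 0.9)
    encoded = rgb_to_ycocg(*pixel)
    print("YCoCg:", encoded)
    print("back to RGB:", ycocg_to_rgb(*encoded))
```

A grey or white pixel has zero chroma in this space, which is why the chroma channels tolerate reduced precision or resolution better than RGB channels do.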
     
  4. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    NAO32 :wink2: - but that was for HDR.

    Much luv for log.
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,104
    Likes Received:
    16,896
    Location:
    Under my bridge
    Who's not using HDR these days? LOGLUV colour space seems a smart change. IIRC we even discussed GPUs supporting this in hardware. I'm guessing shaders now make that unnecessary, unless the ROPs are still hard coded RGB and can't work differently.
     
  6. The SHIELD tablet is slower, but it also uses a 2 year-old SoC made on 28nm.
    Regardless, Anandtech's iPad Pro review shows a completely different scenario in GFXBench, where the A9X outmatches a 15W Core i5.
    However as stated earlier, the GFXbench is using OpenGL and it's using lower precision shaders.
    It's possible that Intel's OpenGL drivers just suck and the lower precision shaders actually make a substantial difference, so GFXBench's results aren't representative of how the A9X would behave compared to Intel's HD515/520 if it had to deal with actual "PC grade" games. For example, the fact that it's a non-threaded dual core could be quite the issue in modern games.
     
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I'm still waiting for a plausible answer why it doesn't make an inch of a difference in performance in Gfxbench graphics tests on Rogue GPUs where FP16 SPs are completely absent. The article above aims to check if the PRO is as fast as a laptop and it obviously isn't. Not that it makes any particular sense either since a laptop and a tablet have a completely different power portfolio.
     
  8. Laurent06

    Veteran

    Joined:
    Dec 14, 2007
    Messages:
    1,091
    Likes Received:
    489
    Hmm I see the Shield Tablet (Tegra K1) behind the iPad. The Shield TV (Tegra X1) though is faster.

    Ice Storm physics is a different beast, since it's mostly a CPU test, and one where the Apple cores have been doing somewhat badly for a few generations.

    EDIT - Missed the third page, sorry :embarrased:
     
  9. What are you talking about? What SoCs are you comparing?

    Rogue GPUs without FP16 SPs? Is there such a thing? AFAIK the only difference between 6 and 6XT is that 6 used 3-way FP16 units at the same amount of units as FP32, whereas 6XT used 2-way FP16 units at twice the amount of units as FP32. The theoretical FP16 output is just 33% more.

    [​IMG]
    [​IMG]
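The "33% more" figure in this post follows from the unit counts alone. A small sketch of that arithmetic, where the baseline of 16 FP32 units is an arbitrary placeholder (only the ratio matters) and the function name is invented for illustration:

```python
# Sketch of the FP16 throughput ratio described in the post.
# FP32_UNITS is an arbitrary placeholder; only the ratio is meaningful.

FP32_UNITS = 16


def fp16_ops_per_clock(fp32_units, fp16_units_per_fp32_unit, ways):
    """FP16 operations per clock for a given unit count and SIMD width."""
    return fp32_units * fp16_units_per_fp32_unit * ways


# Series6: as many FP16 units as FP32 units, each 3-way.
series6 = fp16_ops_per_clock(FP32_UNITS, 1, 3)
# Series6XT: twice as many FP16 units as FP32 units, each 2-way.
series6xt = fp16_ops_per_clock(FP32_UNITS, 2, 2)

fp16_gain = series6xt / series6 - 1.0

if __name__ == "__main__":
    print(f"Series6XT theoretical FP16 gain over Series6: {fp16_gain:.0%}")
```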
     
    #49 Deleted member 13524, Mar 30, 2016
    Last edited by a moderator: Mar 30, 2016
  10. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Thank you for the funky diagrams as if I wouldn't know what each consists of LOL. I have both devices here with me. One has a 6200 which DOESN'T have any FP16 SPs and a 6230 which has. With one exception I've verified the supplied results here in real time before I included them above.

    Again for reference:

    The first Rogue batch (Series6) came either without FP16 SPs (6200, 6400 ) or with FP16 SPs (6230, 6430). Mediatek has been using the G6200 for two SoC generations now. The latest is in the HelioX10 clocked at 700MHz.

    https://imgtec.com/powervr/graphics/series6/

    Table on the bottom of the page clarifies what each variant exactly contains.

    Series6, 6230, 6430, 6630 => FP32: 1.5x times FP16 SPs
    Series6XT & Series7XT => FP32: 2x times FP16 SPs

    FP16 output is workload dependent; if you'd go for instance for something like deep learning you're most certainly not stuck at 33% more output. As I said SIMDs as you can see them in the former marketing diagrams can either be fed with FP32 or FP16 instructions yet not a mix of both at the same time. I would think that they share datapaths as otherwise at least some of them could be used in parallel.

    Those Rogues that have dedicated FP16 units save IMHO mostly power compared to channelling everything through FP32 SPs in a 3D game. No one would use FP16 in a game or benchmark instead of wherever FP32 is recommended, since the difference would show.

    In order to come back to the above results: a 6230@533MHz with 102 GFLOPs FP16 vs. 90 GFLOPs FP32 of the 6200@700MHz should be at least close in performance in benchmarks that supposedly use excessively FP16. Contrary to that their performance difference matches too much the respective FP32 FLOP difference between the two to suggest anything that would escape the norm you'd find in any real mobile game out there.
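The comparison in the paragraph above can be reproduced with a few lines of arithmetic. This is a sketch under one assumption not stated in the thread: that both two-cluster parts issue 128 FP32 FLOPs per clock, a figure inferred from the quoted 90 GFLOPs at 700MHz. The 1.5x FP16 rate is the one quoted for Series6 parts with FP16 SPs.

```python
# Sketch of the G6200 vs G6230 peak-rate comparison.
# ASSUMPTION: 128 FP32 FLOPs/clock for both parts, inferred from
# "90 GFLOPs FP32 of the 6200@700MHz"; not stated explicitly in the thread.

FP32_FLOPS_PER_CLOCK = 128
SERIES6_FP16_RATE = 1.5  # FP16 throughput relative to FP32, per the post


def gflops(flops_per_clock, mhz):
    """Peak GFLOPs for a given per-clock rate and clock in MHz."""
    return flops_per_clock * mhz / 1000.0


g6200_fp32 = gflops(FP32_FLOPS_PER_CLOCK, 700)
g6230_fp32 = gflops(FP32_FLOPS_PER_CLOCK, 533)
g6230_fp16 = gflops(FP32_FLOPS_PER_CLOCK * SERIES6_FP16_RATE, 533)

# If a benchmark leaned heavily on FP16, the 6230 should land near the
# 6200 (about 102 vs 90 GFLOPs). If it is FP32 bound, the expected
# score ratio is simply the clock ratio.
fp32_bound_ratio = g6230_fp32 / g6200_fp32

if __name__ == "__main__":
    print(f"G6200 FP32: {g6200_fp32:.1f} GFLOPs")
    print(f"G6230 FP32: {g6230_fp32:.1f} GFLOPs")
    print(f"G6230 FP16: {g6230_fp16:.1f} GFLOPs")
    print(f"Expected 6230/6200 score ratio if FP32 bound: {fp32_bound_ratio:.2f}")
```

The post's argument is that the measured scores sit close to the FP32-bound ratio rather than near parity.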
     
    #50 Ailuros, Mar 31, 2016
    Last edited: Mar 31, 2016
  11. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,537
    Likes Received:
    282
    Location:
    0x5FF6BC
    So we can summarise from the above that:

    The mediatek chip X10T has rogue 6200
    The rockchip Allwinner A80 rogue 6230

    There are no FP16s available in the 6200

    Comparing the relative TREX scores would seem to suggest that the presence of FP16s does not majorly influence the scores.

    If that suggestion is factually correct, then the presence of FP16 in the iPad Pro is not given a false benefit when comparing TREX scores between the Pro and the HD7770
     
    #51 tangey, Apr 3, 2016
    Last edited: Apr 4, 2016
  12. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    For hairsplitting's sake the A80 is from Allwinner and not Rockchip.

    For the conclusion: the data at hand is too sparse for my taste to jump to any conclusions, since there could be other factors at play I'm not aware of. I'm just noting that all 3 benchmark scores (T-Rex, Manhattan 3.0 & Manhattan 3.1) are too close to the frequency difference of the two GPUs.

    Other than that why would AMD or any other IHV really bother to optimize a desktop GPU for a ULP mobile benchmark exactly? (assuming there's no other culprit for it). It should go without saying that IHVs like Apple, QCOM, Samsung and others heavily optimize for Gfxbench amongst other synthetic benchmarks for their ULP SoC GPUs.
     
    #52 Ailuros, Apr 4, 2016
    Last edited: Apr 4, 2016
  13. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    The A9X GPU has 384 SPs and its frequency is in the 0.7 GHz - 1 GHz range, so both GPUs have an almost equal number of FLOPs.
    Considering how bandwidth bound deferred shading is, the A9X GPU with 2x the bandwidth should run circles around the X1 GPU. Instead the perf difference shrinks with the more ALU bound tiled deferred rendering in Manhattan 3.1, and it's possible to see a reverse situation with the more modern Car Chase test. I wonder how the A9X GPU would deal with tessellation.
     
  14. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I severely doubt that Apple would all of a sudden not only increase unit amount as usual, but at the same time increase frequency in their GPUs by as much. Is there at least a single indication for such an absurd frequency or is it again another gut feeling? Please tell me it's not extrapolated from the fillrate results in Gfxbench....

    For correctness' sake yes it has 384SPs; I by mistake used FP32 FLOPs/clock.

    https://forum.beyond3d.com/posts/1901055/

    Same unit count obviously in the 9.7" tablet; over half a TFLOP is rather a FP16 quote than FP32. At estimated 400MHz it gives 307GFLOPs FP32 or 614GFLOPs FP16.
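The 307 and 614 GFLOPs figures follow directly from the unit count and the estimated clock. A minimal sketch, assuming each SP retires one fused multiply-add (2 FLOPs) per clock and that FP16 runs at twice the FP32 rate as quoted earlier for Series7XT; the function name is hypothetical:

```python
# Sketch of the A9X peak-throughput estimate at the assumed 400MHz clock.
# ASSUMPTION: 1 FMA (2 FLOPs) per SP per clock; FP16 at 2x the FP32 rate.

A9X_SPS = 384
FLOPS_PER_SP_PER_CLOCK = 2


def peak_gflops(sps, ghz, fp16_rate=1):
    """Peak GFLOPs; fp16_rate is the FP16:FP32 throughput multiplier."""
    return sps * FLOPS_PER_SP_PER_CLOCK * fp16_rate * ghz


a9x_fp32 = peak_gflops(A9X_SPS, 0.4)
a9x_fp16 = peak_gflops(A9X_SPS, 0.4, fp16_rate=2)

if __name__ == "__main__":
    print(f"A9X @ 400MHz: {a9x_fp32:.0f} GFLOPs FP32, {a9x_fp16:.0f} GFLOPs FP16")
```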

    How it'll fare in Car Chase is subject to Apple delivering 3.2 drivers and it won't give any considerable results with tessellation since it's most likely all channelled through the ALUs itself as ARM does. That however has nothing to do with the above.

    For the record's sake and in case you haven't noticed the A8X fares several times worse in Manhattan 3.1 and it's not the fault of the architecture itself, but strangled resources in Series6XT vs. 7XT. The latter cores fare quite a bit better in 3.1, however whatever they've increased in 7XT obviously wasn't as generous as it could have been.

    Here's the Salvator X from Renesas (R-Car H3) with a GX6650 (6XT) that actually gives you a first tessellation result from a 6 cluster config; heck even the 12 cluster T880 in the S7 reaches over 43fps in that one all channelled through compute and no I don't expect the A9X to even reach as high:

    https://gfxbench.com/device.jsp?benchmark=gfx40&os=Android&api=gl&cpu-arch=ARM&hwtype=GPU&hwname=Imagination Technologies PowerVR Rogue GX6650&did=30930332&D=Renesas Salvator-X

    But since you're bound to entertain us with self invented frequencies that one gives an offscreen fillrate of 10739 MTexels/s. With 12 TMUs the frequency is obviously not over 900MHz. More like 600MHz; now tell me what the heck could be "wrong" with that fillrate test..... :D

    With 24 TMUs of the A9X the fillrate should be at 21478 and that at 600MHz. However the A9X GPU gets "only" 15862 MTexels or else 26% less. Do the math....
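The point of the two paragraphs above is that dividing the benchmark fillrate by the TMU count does not return the real clock. A short sketch using only the numbers quoted in the post (variable and function names are invented for illustration):

```python
# Sketch of the fillrate sanity check, using the figures quoted in the post.

GX6650_MTEXELS = 10739   # offscreen fillrate result, MTexels/s
GX6650_TMUS = 12
GX6650_REAL_MHZ = 600    # clock per the Renesas document cited in the post
A9X_MTEXELS = 15862
A9X_TMUS = 24


def naive_clock_mhz(mtexels_per_s, tmus):
    """Clock implied by assuming one texel per TMU per clock."""
    return mtexels_per_s / tmus


gx6650_implied_mhz = naive_clock_mhz(GX6650_MTEXELS, GX6650_TMUS)

# Scale the GX6650 result to 24 TMUs at the same 600MHz clock.
a9x_expected_at_600 = GX6650_MTEXELS * A9X_TMUS / GX6650_TMUS
a9x_shortfall = 1.0 - A9X_MTEXELS / a9x_expected_at_600

if __name__ == "__main__":
    print(f"GX6650 naive clock: {gx6650_implied_mhz:.0f}MHz "
          f"(actual: {GX6650_REAL_MHZ}MHz)")
    print(f"A9X expected at 600MHz: {a9x_expected_at_600:.0f} MTexels/s")
    print(f"A9X measured shortfall: {a9x_shortfall:.0%}")
```

Under the same scaling, a 26% shortfall against a 600MHz part would put the A9X well below 600MHz, which is the conclusion the post is driving at.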

    http://documentation.renesas.com/doc/DocumentServer/R70PF0027ED1000.pdf
    (page 6 for R-Car H3 frequencies)
     
    #54 Ailuros, Apr 5, 2016
    Last edited: Apr 5, 2016
  15. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    Ok. I will do it for you: 24 filtered texels per clock * 0.6 GHz = 14.4 GTexels/sec. Unsurprisingly, that is a close match to what the iPad Pro actually does (14.085), but we don't know the efficiency (though it should obviously be close to peak theoretical values for a TBDR).

    The iPad Air 2 does 7.56 GTexels/sec, so the 12 cluster iPad Pro is 1.86x faster with 1.5x the number of TMUs. 1.86/1.5 = 1.24, so the A9X's GPU should have a 1.24x higher frequency to achieve its fillrate. 16nm FF+ allows frequencies up to 1.35x higher at the same power compared with 20nm, due to vastly reduced dynamic power consumption, while the density gains are minimal. It would be utterly stupid not to use the strong points of the process and rely instead on the weak density gains only. Obviously this is perfectly known to the engineers at Apple, hence the higher frequency of the A9X GPU.

    I was wrong with my initial frequency estimate at a glance; the frequency should be somewhere in the 650-750MHz range depending on the efficiency. Still, it doesn't change any of my conclusions: the number of FLOPs is the same for both chips, 500-538 GFLOPs for the A9X @ 0.65 - 0.7GHz vs 512 GFLOPs for the TX1 @ 1 GHz, and bandwidth is a lot higher for the A9X.
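The arithmetic in this post can be checked step by step. The sketch below uses the post's own figures; the 256 shader cores for the TX1 and the 2 FLOPs per SP per clock are assumptions added here to reproduce the quoted 512 GFLOPs and 500-538 GFLOPs numbers.

```python
# Sketch reproducing the estimates in this post from its quoted figures.
# ASSUMPTIONS: 2 FLOPs per SP per clock; 256 shader cores for the TX1.

TMUS_A9X = 24
theoretical_gtexels = TMUS_A9X * 0.6        # at an assumed 0.6 GHz

ipad_pro_gtexels = 14.085
ipad_air2_gtexels = 7.56
fill_speedup = ipad_pro_gtexels / ipad_air2_gtexels
tmu_ratio = 1.5                             # 12 clusters vs 8 clusters
implied_clock_ratio = fill_speedup / tmu_ratio


def peak_gflops(sps, ghz):
    """Peak FP32 GFLOPs assuming one FMA per SP per clock."""
    return sps * 2 * ghz


a9x_low = peak_gflops(384, 0.65)
a9x_high = peak_gflops(384, 0.70)
tx1 = peak_gflops(256, 1.0)

if __name__ == "__main__":
    print(f"Theoretical fillrate at 0.6 GHz: {theoretical_gtexels:.1f} GTexels/s")
    print(f"Fillrate speedup: {fill_speedup:.2f}x, "
          f"implied clock ratio: {implied_clock_ratio:.2f}x")
    print(f"A9X: {a9x_low:.0f}-{a9x_high:.0f} GFLOPs, TX1: {tx1:.0f} GFLOPs")
```

Note that this only reproduces the post's reasoning; whether the fillrate test can be used to infer clocks at all is exactly what the following replies dispute.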
     
  16. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I don't care what you consider wise or unwise and you may very well believe what suits your imagery better. The fillrate test uses alpha blending for the record. Other than that I've provided a wee bit more documentation than your usual gut feeling. If in doubt ask around; it shouldn't be too hard to find out. Frequency is not over 500MHz either way you want to twist it.

    Apple ITSELF claimed in a marketing blurb that the A9X GPU has 360x times the GPU power compared to the original iPad. The SGX535 in that one does 2 GFLOPs so figure it out.
     
  17. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    A9X as well as A8X have the same number of TMUs and ROPs, so all numbers are still perfectly valid

    So what?

    That's great, you can do the math by yourself now, just pick up the 1.6 Gflops number http://www.anandtech.com/show/4225/the-ipad-2-review/5 and multiply it by 360, hopefully you will get something like 576 Gflops for A9X :smile:
     
  18. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    There are always 2 TMUs per cluster, but I don't know how they scale the back end with increasing cluster amounts. Above reality works obviously just in cases you seem to want to select since I still haven't received an inch of a viable answer why on God's green earth the 6 cluster GX6650 with 12 TMUs yields over 10GTexels while clocked at a mere 600MHz. Did you even bother to compare those results to the A8X results to see if they make sense?

    It won't give you TMUs * frequency = fillrate just because you think it will. Again 10739 MTexels/s / 12 TMUs = ~895MHz. See above the official product link it clocks at 600MHz.
    https://gfxbench.com/compare.jsp?be...logies+PowerVR+Rogue+GX6650&D2=Google+Nexus+9

    The fillrate results still make sense yes? It has been noted here on the boards many times that the latest Gfxbench fillrate test is highly misleading in regards of results to extrapolate frequencies out of those. Has the GX6650 above 2.4x times the fillrate of the GK20A in K1 or rather a <15% difference in peak texel fillrate due to frequency differences?

    What could make sense is compare same architecture GPUs preferably from the same generation.

    The original iPad GPU clocks at 250MHz; 2Vec2 FP16 * 0.25GHz = 2.0 GFLOPs FP16. I actually remember helping Anand himself back when he was writing that article for that page, because there was some confusion with MADDs.

    Apple marketing back then also claimed a 9x times increase for the GPU from iPad to the iPad2.

    SGX543MP2
    2 cores * [ (4 Vec4) + 1 SFU MUL ] * 0.25GHz = 18 GFLOPs / 2 GFLOPs = 9x times increase and yes that's just as much marketing as the 360x times claim which goes for FP16 FLOPs on the 535 of the iPad since it was capable of 2 Vec2 FP32 only under conditionals. 2 GFLOPs * 360x = 720GFLOPs FP16. Counting that single 9th OP from the SFU is just another of those dubious stories; yes it can be used but under conditionals again.
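Both readings of the 360x marketing claim can be worked through numerically. The sketch below is one interpretation, not a confirmed breakdown: it reads the SGX543 formula as 4 pipes per core with 9 FLOPs each (a Vec4 MAD plus the "9th OP" from the SFU), counts a MAD as 2 FLOPs, and assumes 384 SPs for the A9X with FP16 at twice the FP32 rate.

```python
# Sketch of the two readings of Apple's "360x" claim discussed in the thread.
# ASSUMPTIONS: MAD = 2 FLOPs; SGX543 = 4 pipes/core at 9 FLOPs each;
# A9X = 384 SPs at 2 FLOPs per SP per clock, FP16 at 2x the FP32 rate.

CLOCK_GHZ = 0.25

# Original iPad (SGX535): 2 Vec2 = 4 lanes, one MAD each.
sgx535_gflops = 2 * 2 * 2 * CLOCK_GHZ

# iPad 2 (SGX543MP2): 2 cores * 4 pipes * 9 FLOPs.
sgx543mp2_gflops = 2 * 4 * 9 * CLOCK_GHZ
ipad2_gain = sgx543mp2_gflops / sgx535_gflops

# Reading 1 (FP16 baseline of 2 GFLOPs): 360x gives an FP16 figure.
a9x_fp16_reading = 2.0 * 360
# Reading 2 (FP32 baseline of 1.6 GFLOPs): 360x gives an FP32 figure.
a9x_fp32_reading = 1.6 * 360

A9X_FP32_FLOPS_PER_CLOCK = 384 * 2
clock_if_fp16 = a9x_fp16_reading / (A9X_FP32_FLOPS_PER_CLOCK * 2)
clock_if_fp32 = a9x_fp32_reading / A9X_FP32_FLOPS_PER_CLOCK

if __name__ == "__main__":
    print(f"SGX535: {sgx535_gflops} GFLOPs, SGX543MP2: {sgx543mp2_gflops} GFLOPs "
          f"({ipad2_gain:.0f}x)")
    print(f"FP16 reading: {a9x_fp16_reading:.0f} GFLOPs -> {clock_if_fp16:.3f} GHz")
    print(f"FP32 reading: {a9x_fp32_reading:.0f} GFLOPs -> {clock_if_fp32:.2f} GHz")
```

Under these assumptions the FP16 reading implies a clock under 500MHz and the FP32 reading implies about 750MHz, which is where the two posters disagree.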

    As one can see above Apple is rather consistent with GPU frequencies through each of their respective generation. For Series5/XT it was always in the 250-325MHz range and for anything Rogue since the A7/iPad Air frequencies are in the 400-533MHz ballpark for Apple.

    I know that Intel uses a frequency of somewhere 460-470MHz for the G6430 they had integrated for their smartphone SoCs and had a burst frequency of 533MHz, but I doubt Apple used something like that. The unfortunate thing is that the new Manhattan 3.1 long term performance isn't available yet. It would be interesting to see if and how much either the iPad Pro or iPad 9.7" Pro are throttling. A small tolerable percentage for GPU throttling would rather favour the low clock theory, and is actually the reason IMO why Apple prefers to go wide with relatively low frequencies for its GPUs.
     
    #58 Ailuros, Apr 5, 2016
    Last edited: Apr 5, 2016
  19. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
    Ask the one who did the test. It could have been done at any possible frequency, which is not guaranteed to be limited to 600 MHz, nor are the results guaranteed to be correct for some development board or whatever the thing is.

    I don't bother to compare some random results of some random board because the board could be overclocked, it could be cooled with an air solution, it might not be limited by the same thermal and power constraints as the A8X, and there are simply not enough data samples to draw any worthwhile conclusions at all.

    They don't make any sense, but for totally different reasons. It's not the test issue if some results are random garbage

    We can compare texture filtering results, but this won't change anything - https://gfxbench.com/compare.jsp?be...GPU&hwname1=Apple+A9X+GPU&D2=Apple+iPad+Air+2

    These are FP32 flops; since USSE is unified it has to support FP32 for vertex processing - USSE enables up to IEEE 754 single precision floating point data processing essential for the best possible image quality, + https://imagination-technologies-cl...m/documentation/PowerVR_graphics_brochure.pdf (page 13)

    I don't think so, 360x goes for FP32 flops, this is the only possible way to be on par with the Shield ATV in ALU test - https://gfxbench.com/compare.jsp?be...me1=Apple+A9X+GPU&D2=NVIDIA+Shield+Android+TV
     
  20. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Ironically it roughly yields the results you'd expect from a 600MHz 6 cluster 6XT compared to the A8X GPU which also clocks lower.

    Preferably same generation....I wouldn't suggest that the front and back ends for a GX6450 and a GT7600 (duplicated or not for both) are identical:

    https://gfxbench.com/compare.jsp?be...U&hwname1=Apple+A9+GPU&D2=Apple+iPhone+6+Plus

    Without conditionals you don't get 2 Vec2 FP32 out of a 535, but rather 1 Vec2 FP32 or 2 Vec2 FP16; peak FP32 is obviously 2Vec2.

    Who says it has to match the X1 GPU in one ALU test, while it surpasses the former in the ALU2 test? Different architectures, different strengths and weaknesses. As you already noted the difference shrinks for the A9X in Manhattan3.1 and I'd expect another shrink in Car chase until Apple delivers a DX11.x GPU which doesn't sound like all that soon.

    I don't even recall what the ES2.0 ALU test does, but the ES3.0 "ALU2" test:

    [sarcasm start]Other than that: sure of course they've clocked a ULP SoC GPU that exceeds the 1b transistor mark at 930MHz because it's the only way it can make you feel better. And while you run any mobile game on it it throttles after a couple of minutes to half its frequency because it is really "that" common for Apple to follow such a strategy..... [/end of sarcasm]
     
    #60 Ailuros, Apr 5, 2016
    Last edited: Apr 5, 2016