PowerVR Furian Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by Ailuros, Mar 8, 2017.

  1. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    So now we should count floating-point output per ALU per cycle as the ALU count x1.5?
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Location:
    Chania
    Instead of 2 MADDs (4 FLOPs) per SIMD lane, you now have 1 MADD + 1 MUL (3 FLOPs) per SIMD lane (if I haven't misunderstood their slides).

    Rogue, 4 clusters @ 500 MHz:
    4 * SIMD16 * 4 FLOPs * 0.5 GHz = 128 GFLOPs FP32
    Furian, 4 clusters @ 500 MHz:
    4 * SIMD32 * 3 FLOPs * 0.5 GHz = 192 GFLOPs FP32
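A minimal Python sketch of the peak-throughput arithmetic above, assuming (as the post does) that every lane issues its full MADD/MUL work on every clock. The function name `peak_gflops` is hypothetical; it simply multiplies the four factors used in the post.

```python
def peak_gflops(clusters: int, lanes_per_cluster: int,
                flops_per_lane: int, clock_ghz: float) -> float:
    """Theoretical peak GFLOPs = clusters x SIMD lanes x FLOPs per lane per clock x clock (GHz)."""
    return clusters * lanes_per_cluster * flops_per_lane * clock_ghz


# Rogue: SIMD16 per cluster, 2 MADDs per lane -> 4 FLOPs per lane per clock
rogue = peak_gflops(clusters=4, lanes_per_cluster=16, flops_per_lane=4, clock_ghz=0.5)
# Furian (as read from the slides): SIMD32 per cluster, 1 MADD + 1 MUL per lane -> 3 FLOPs
furian = peak_gflops(clusters=4, lanes_per_cluster=32, flops_per_lane=3, clock_ghz=0.5)

print(f"Rogue:  {rogue:.0f} GFLOPs FP32")    # Rogue:  128 GFLOPs FP32
print(f"Furian: {furian:.0f} GFLOPs FP32")   # Furian: 192 GFLOPs FP32
print(f"Ratio:  {furian / rogue:.2f}x")      # Ratio:  1.50x
```

The 1.5x ratio holds per cluster at equal cluster count and clock; it says nothing about how often real shader code keeps both pipelines busy.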
     
  3. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    That's... exactly the same as what I wrote, no?

    Each cluster has 32 ALUs, 16 of which are MADD and the other 16 MUL. 2*16 + 16 = 48.
    1 cluster does 32*1.5 = 48 floating point operations per clock.

    Clusters now have twice the ALUs, but half of them only do the multiply operation.
     
  4. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Location:
    0x5FF6BC
    The author indicates that cores won't be in end-user products until the end of 2018 at the earliest. Which raises the question: what will Apple be using in the next-gen iPhone? The same as last year with higher clocks? A rework of last year's? Has Apple already gotten the raw IP, and is it now designing its own graphics blocks based on Furian?
     
    #5 tangey, Mar 8, 2017
    Last edited: Mar 8, 2017
  5. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Location:
    Chania
    It's 96 FLOPs/cluster:
    32 lanes * 2 FLOPs (MADD) = 64
    32 lanes * 1 FLOP (MUL) = 32
    ---------------------------------------------
    64 + 32 = 96 FLOPs/cluster
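The disagreement comes down to how the 32 Furian lanes are organized. A small Python sketch can put both readings side by side; the function names are hypothetical labels for the two interpretations in this thread, not anything taken from Imagination's documentation.

```python
MADD_FLOPS = 2  # a multiply-add counts as two floating-point operations
MUL_FLOPS = 1   # a plain multiply counts as one


def split_lane_reading(lanes: int) -> int:
    """Hypothetical reading A: half the lanes are MADD-capable, the other half MUL-only."""
    half = lanes // 2
    return half * MADD_FLOPS + half * MUL_FLOPS


def dual_pipe_reading(lanes: int) -> int:
    """Hypothetical reading B: every lane has both a MADD and a MUL pipeline."""
    return lanes * (MADD_FLOPS + MUL_FLOPS)


rogue_cluster = 16 * 2 * MADD_FLOPS  # Rogue: SIMD16, 2 MADDs per lane = 64 FLOPs/clock

print(split_lane_reading(32))                  # 48
print(dual_pipe_reading(32))                   # 96
print(dual_pipe_reading(32) / rogue_cluster)   # 1.5
```

Only reading B reproduces the 192 GFLOPs figure for 4 clusters at 500 MHz (4 x 96 x 0.5), and under it the x1.5 factor applies per cluster relative to Rogue rather than per ALU.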

    Now I need something for fillrates :p

    Hairsplitting: twice the lanes, or stream processors. In my weird book an ALU = a SIMD. I wonder if they kept the vec2 FP16 stuff; there's no reason to get rid of it.

    9 or 10 (7XT-derived) clusters at roughly the same frequency? I doubt anyone has ever said when the RTL for Furian was delivered. If partners got it a year ago, the answer is obvious; otherwise, Ryan is right.
     
    #6 Ailuros, Mar 9, 2017
    Last edited: Mar 9, 2017
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,407
    Likes Received:
    4,057
    One thing I've noticed is that they're not saying how high the cluster count can go with Furian.
    They claim Rogue could go up to 16 clusters but would lose scalability above 12, whereas Furian has a higher limit. They don't say how high it is, though.
    Could this scale up to 32+ clusters and match notebook/desktop-level APUs?
     
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    The exact phrasing for Rogue is: "successful up to 12 clusters, with theoretical limit of 16". Apple integrated 12 clusters into the A9X SoC, and although there is a 16-cluster GT7900, it never materialized in a product. I seriously doubt that going even beyond 16 clusters wouldn't make sense for Rogue, and in theory a GT7900 is already at low-end notebook/desktop level even today: https://www.imgtec.com/blog/powervr-gt7900-redefining-performance-efficiency/

    All the slide in question (http://www.anandtech.com/Gallery/Album/5508#12) claims, IMHO, is that they could theoretically scale even higher in performance with Furian, but as long as there's no licensee interested in it, it's just theoretical marketing wash.
     
  8. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,406
    Likes Received:
    149
    Location:
    0x5FF6BC
    I haven't checked, but my gut instinct is that 8XT reaching products in late 2018 at the earliest would mean a bigger gap between announcement and market than previously seen for IMG's high-end IP.

    If true, that would suggest it's a reaction to a lack of desire from their licensees for ever-increasing graphics/GPU compute performance.

    Also, assuming Apple is a customer that has had very early deliverables, one wonders who the other customer is that is also in at an extremely early stage (the announcement cites "multiple partners").
     
  9. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,406
    Likes Received:
    149
    Location:
    0x5FF6BC
    The EETimes article states up to 64 clusters, although they don't expect to see that initially.

    http://www.eetimes.com/document.asp?doc_id=1331445&page_number=2
    So, working from the marketing numbers, Furian could extend to around 10x the gaming performance of an A9X at similar frequencies, assuming bandwidth wasn't a bottleneck.
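To see what that rough 10x estimate implies, a back-of-envelope Python sketch can work backwards from it to the per-cluster gain it requires. The cluster counts come from this thread (12 in the A9X, up to 64 for Furian); the linear-scaling model is an assumption, and like the post it ignores bandwidth limits.

```python
A9X_CLUSTERS = 12          # A9X: 12 Rogue clusters
FURIAN_MAX_CLUSTERS = 64   # EETimes: "up to 64 clusters"
ESTIMATED_SPEEDUP = 10.0   # the rough 10x estimate in the post

cluster_ratio = FURIAN_MAX_CLUSTERS / A9X_CLUSTERS
# Assumption: performance scales linearly with cluster count at equal clocks and
# bandwidth is not a bottleneck, so the remainder must come from per-cluster gains.
implied_per_cluster_gain = ESTIMATED_SPEEDUP / cluster_ratio

print(f"Cluster ratio: {cluster_ratio:.2f}x")                       # Cluster ratio: 5.33x
print(f"Implied per-cluster gain: {implied_per_cluster_gain:.3f}x")  # Implied per-cluster gain: 1.875x
```

In other words, cluster count alone gives roughly 5.3x; the rest of the estimate rests on each Furian cluster being a bit under 2x a Rogue cluster in gaming workloads.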

    Apple might look at a high-ish cluster count Furian for future MacBooks.
     
    #10 tangey, Mar 9, 2017
    Last edited: Mar 9, 2017
  10. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Could even be some super-expensive future TV-set SoC with something like a dual-cluster Furian...

    I'm too bored right now to look it up, but I'm pretty certain that when Rogue was announced, company officials claimed 32 clusters and beyond for it. EETimes also makes the mistake of interpreting it as if the maximum sensible Rogue design were just 16 clusters.

    With what kind of CPU, exactly? In theory the bandwidth problem could be solved with HBM if needed, but it all sounds too complicated for my taste. I'd sooner believe Kyle Bennett's wild theory of an SoC with an Intel CPU and an AMD GPU than that.
     
  11. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Location:
    Beyond3D HQ
    Without making any product or customer claims, there are significant changes in Furian that make it (a lot) easier for the architecture to scale up to high SPU counts. That said, really the up-to numbers are mostly marketing; if a customer comes and asks for something big and commits, we will build it, announced or not.

    In the early stages of a new architecture or revision, before the RTL is final and customers have committed to licensing something, we pick a set of target cores we think are likely to be popular and start to build those. That set can and does change as the first customers get on board, and from then the roadmap is customer driven. The base architecture is configurable enough that it's practically impossible for us to build and verify every possible scaling configuration, so the customer-driven approach is necessarily the correct one.
     
  12. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    That's more or less what my gut feeling was about.
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    #14 Ailuros, Mar 13, 2017
    Last edited: Mar 13, 2017
  14. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Location:
    0x5FF6BC
  15. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Location:
    Beyond3D HQ
    Yep, last bit of work before I disappear. If anyone has questions, please ask.
     
  16. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    * I assume the baseline feature set for Furian cores is still DX10?
    * Rogue could optionally be DX11.2 if the customer wanted it; does Furian scale up to DX12.x if needed?
    * I still can't figure out how to calculate texel fillrate for Furian vs. Rogue with the same number of TMUs for each (yes, I know it sounds dumb...)
    * Front-end triangle rate is 0.5 tris/clock if I'm reading your table accurately; assume I have 4 FEs in something like a 12-cluster 7XT GPU. Is the geometry throughput still the same?

    Thank you in advance for whatever you can answer.
     
  17. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,138
    Likes Received:
    1,337
    Location:
    Beyond3D HQ
    Baseline API support is DX10 if you take the Direct3D view of things. It's better to take the Vulkan view of things though for the base cores. There's a tessellator, for example. Peak feature set compliance is DX12.

    For the same TPU count, Furian sample rate is 2x Rogue.

    Geometry throughput is the same per front end, but the number of front ends can potentially be substantially different, especially as the GPU gets bigger.
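Those answers reduce to two scaling rules: geometry rate scales with the number of front ends, and texture sample rate per TPU doubles on Furian. A Python sketch, assuming the 0.5 tris/clock per front end from Ailuros's reading of the table and an arbitrary, hypothetical per-TPU baseline for Rogue (the thread gives no absolute figure). The function names and the example configuration are illustrative only.

```python
TRIS_PER_CLOCK_PER_FE = 0.5    # from Ailuros's reading of the table; unchanged per front end
FURIAN_SAMPLE_RATE_FACTOR = 2  # "For the same TPU count, Furian sample rate is 2x Rogue"


def geometry_rate(front_ends: int, clock_ghz: float) -> float:
    """Peak triangles per second, in billions: scales with the number of front ends."""
    return front_ends * TRIS_PER_CLOCK_PER_FE * clock_ghz


def texel_rate(tpus: int, rogue_samples_per_tpu: float, clock_ghz: float, furian: bool) -> float:
    """Peak texture samples per second, in billions.

    rogue_samples_per_tpu is a hypothetical per-clock baseline for one Rogue TPU;
    Furian doubles it at the same TPU count.
    """
    per_tpu = rogue_samples_per_tpu * (FURIAN_SAMPLE_RATE_FACTOR if furian else 1)
    return tpus * per_tpu * clock_ghz


# Hypothetical configuration: 4 front ends, 6 TPUs, 4 samples/clock per Rogue TPU, 0.5 GHz
print(geometry_rate(front_ends=4, clock_ghz=0.5))   # 1.0
print(texel_rate(6, 4, 0.5, furian=False))          # 12.0
print(texel_rate(6, 4, 0.5, furian=True))           # 24.0
```

Whatever baseline is plugged in, the Furian/Rogue texel ratio at equal TPU count and clock stays 2x, while geometry only grows if a design adds front ends.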
     
  18. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Boy that was quick...thanks :cool2:
     
  19. Mat3

    Newcomer

    Joined:
    Nov 15, 2005
    A little late, but I'll try anyway: in Rogue, what was the utilization rate of the second ALU, and how often was it a MUL operation?
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.