PowerVR Series 6 now official

Discussion in 'Mobile Graphics Architectures and IP' started by roninja, Sep 10, 2010.

  1. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,129
    Likes Received:
    1,301
    Location:
    Beyond3D HQ

    Up to the customer but sensibly it's 28nm and lower.
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    What happens inside a pipe? Minions having a party? :lol:
     
  3. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    What happens inside a pipe? Minions having a party? :lol:

    Thanks :)
     
  4. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    Is it possible to co-issue FP16 and FP32 instructions in Series 6 or 6XT?
     
  5. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,129
    Likes Received:
    1,301
    Location:
    Beyond3D HQ
    The F32 and F16 minions party separately in a given cycle.
     
  6. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    Thanks, I thought so. Tiny mistake in Ryan's article, then.
     
  7. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,129
    Likes Received:
    1,301
    Location:
    Beyond3D HQ
    If I remember rightly, Ryan asked me the same question the other day and I was vague in my response, so he was necessarily vague in the article as a result. My fault; in hindsight I could have been clearer so that he could be too.
     
  8. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    2,987
    Likes Received:
    100
    Hmm, I wonder what these 4 float16 ops can really do. If Series 5 is any indication, my guess is that they can't be just any ops (so more like the EFOs, where yes, you can technically get double the flops, but with quite severe limits on register choices and without really independent instructions).
    Maybe that's why there's confusion over whether there are 4 fp16 units with 2 ops each or 2 fp16 units with 4 ops each. In any case, I'd welcome some more insight into what these minions can do too :).
     
  9. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,403
    Likes Received:
    148
    Location:
    0x5FF6BC
    I'm glad I wasn't the only one confused. The diagram and narrative didn't match for me either. The Series 6 USC diagram clearly shows each FP16 having 3 flops, whilst the Series 6XT USC clearly shows 2 (but the narrative says 4).

    I did ask this in the comments section of the blog.

    It's OK not to want to say things, but having a narrative and an associated diagram completely at odds with one another frankly just seems like poor proofreading.
     
  10. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,129
    Likes Received:
    1,301
    Location:
    Beyond3D HQ
    I wanted the diagram to match the max "ALU core" count we want to put across for marketing. The text is closer to what actually happens in the pipe. Both add up to the same ops throughput.

    One is for marketing, the other is for those that actually care about how the hardware works.
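The "same ops throughput" point can be checked with a line of arithmetic. This is only a sketch: it uses the two FP16 groupings quoted earlier in the thread as its inputs, not confirmed hardware figures, and the function name is made up for illustration.

```python
def ops_per_cycle(units: int, ops_per_unit: int) -> int:
    """Peak ops per cycle for one way of grouping the FP16 ALUs (illustrative only)."""
    return units * ops_per_unit

# The two readings mentioned in the thread; neither is a confirmed spec.
four_by_two = ops_per_cycle(units=4, ops_per_unit=2)  # "4 fp16 units with 2 ops each"
two_by_four = ops_per_cycle(units=2, ops_per_unit=4)  # "2 fp16 units with 4 ops each"

print(four_by_two, two_by_four)  # 8 8
```

Either grouping gives the same peak figure, so the choice only affects how many "cores" get counted, not the throughput.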
     
  11. Turbotab

    Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    214
    Likes Received:
    3
    I see a lot of veiled comments about feature bloat on Imagination Tech's, so I take it you believe that perf/watt is superior to Kepler?
     
    #491 Turbotab, Feb 25, 2014
    Last edited by a moderator: Feb 25, 2014
  12. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,129
    Likes Received:
    1,301
    Location:
    Beyond3D HQ
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    2,987
    Likes Received:
    100
    Oh, and could someone explain the difference between 6200/6230 and 6400/6430? All the announcements essentially just said that the x30 parts are "optimized for performance", but on paper they all look the same...
    I once thought this meant you can reach higher clocks with the x30 parts, but Intel is saying the G6400 in Merrifield reaches the same clock as the G6430 in Moorefield, yet the latter is a good deal faster, so it must be something else. More visibility tests or what?
     
  14. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,390
    Likes Received:
    802
    I can't blame IMG for playing that game given their competition, but it's a little sad that the "SIMD Lane == Core" terminology is now pretty much recognized as the standard.
     
  15. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,894
    Likes Received:
    745
    I love that you guys can actually, at least in low profile places like this, tell it like it is.
     
  16. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,403
    Likes Received:
    148
    Location:
    0x5FF6BC
    What implications does the inclusion of significant numbers of ALU16s have for GPU compute capability? Are ALU16s usable as ALU32s, or half as usable? Does GPGPU / OpenCL only see ALU32s? I vaguely understand that the fact they are 16-bit is going to limit their application for maths calculations.

    What I am basically asking is whether describing a Rogue core's GPU compute in terms of only its ALU32 count is fair, given that it has far more (albeit less useful) ALU16 cores.
     
  17. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,129
    Likes Received:
    1,301
    Location:
    Beyond3D HQ
    We can issue instructions to the F16 pipe via compute APIs just as well as we can with graphics APIs. CL supports half precision floats.
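To give a feel for what a half-precision float can and cannot hold, here is a minimal sketch. It is not OpenCL code and says nothing about how the F16 pipe itself behaves; it only uses Python's standard `struct` module, whose `e` format is IEEE 754 binary16, to round values the way a 16-bit float would store them. The helper names are made up for illustration.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (half precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (single precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Half precision keeps roughly 3 decimal digits, single roughly 7.
print(to_half(0.1))     # 0.0999755859375
print(to_single(0.1))

# Above 2048 the spacing between half values is 2, so 2049 cannot be stored exactly.
print(to_half(2049.0))

# The largest finite half value is 65504.
print(to_half(65504.0))  # 65504.0
```

Whether that is enough depends on the data and the algorithm, which is the question the thread goes on to discuss.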
     
  18. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,894
    Likes Received:
    745
    The question touches on a subject which is difficult but significant - just how much precision do you need?
    Well, it depends on your problem and the algorithms you choose to attack it with. Personally, I gnash my teeth in frustration every time I see the "64-bit FP for scientific computation" trope.

    Generalizing broadly based on the cumulative error propagation behaviour, you can group algorithms as convergent, neutral (stochastically accumulated error) and divergent.
    If your algorithm is convergent, you don't need more precision than is required to represent your data.
    If your algorithm features stochastically accumulated error, then the precision you need depends on the number of iterations you run and the desired numerical precision of your answer. (In my field, chemistry, this typically means 32-bit FP is perfectly OK, although 64-bit FP is often used by tradition anyway.)
    If your algorithm is divergent, you're in trouble, and you will have to keep a close watch on your code behaviour under all circumstances. Having more precision helps, obviously, but is only a band-aid, and there is nothing really saying that 64, 128 or any other number is going to be enough - ideally you should go back and try to reformulate your problem in order to be able to avoid the problematic algorithm.
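The first two categories above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the choice of Newton's square-root iteration as the convergent case, the random running sum as the accumulating case, the seed, and the iteration counts. Rounding to 16 or 32 bits is emulated with Python's `struct` module rather than run on any GPU, and the divergent case is not shown.

```python
import random
import struct

def rnd(x: float, fmt: str) -> float:
    """Round x to binary16 (fmt='e') or binary32 (fmt='f')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def newton_sqrt(a: float, x0: float, iters: int, fmt: str) -> float:
    """Convergent case: each step pulls the estimate back toward sqrt(a),
    so earlier rounding errors are corrected instead of piling up."""
    x = rnd(x0, fmt)
    for _ in range(iters):
        x = rnd(0.5 * rnd(x + rnd(a / x, fmt), fmt), fmt)
    return x

def running_sum_error(n: int, fmt: str, seed: int = 0) -> float:
    """Accumulating case: every addition contributes a small rounding error,
    so the total error tends to grow with the number of iterations."""
    rng = random.Random(seed)
    exact = 0.0   # double precision reference
    total = 0.0   # value kept in the reduced format
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        exact += x
        total = rnd(total + rnd(x, fmt), fmt)
    return abs(total - exact)

# Convergent: the 16-bit answer is as good as 16 bits can represent,
# no matter how many iterations are run.
print(newton_sqrt(2.0, 1.0, 10, "e"), newton_sqrt(2.0, 1.0, 1000, "e"))

# Accumulating: compare the error after the same number of steps.
err16 = running_sum_error(4000, "e")
err32 = running_sum_error(4000, "f")
print(err16, err32)
```

In the convergent case extra iterations cost nothing in accuracy, while in the accumulating case the acceptable iteration count is what decides whether 16 or 32 bits will do.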

    AnandTech's decision to only count 32-bit FP in their BogoFLOP chart seems strange to me. For the most part the GPU will process graphics, and the bulk of graphics operations seem as if they could be done in 16-bit (limited precision needed, minimal iteration), so why focus on 32-bit performance alone?
     
  19. Ryan Smith

    Regular Subscriber

    Joined:
    Mar 26, 2010
    Messages:
    590
    Likes Received:
    911
    Location:
    PCIe x16_1
    Two reasons:

    1. We've traditionally only focused on FP32 performance in both mobile and desktop.
    2. I honestly didn't have a ton of time to work on this article. I don't have verified FP16 perf data handy for most other architectures, and while I have a pretty good idea of what it should be I didn't want to publish anything I wasn't sure of. And there wasn't enough time to get that data verified on a weekend.
     
  20. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,953
    Likes Received:
    811
    Location:
    Planet Earth.
    It would be nice to analyse the precision mix of current workloads and pro-rate the ALUs accordingly, to get an idea of FLOPS in "usual" cases...
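One way to sketch such a pro-rata figure: since, per the earlier reply, F16 and F32 work is not issued together in a given cycle, the time for a mixed workload is the sum of the time spent in each pipe. The rates and mix fractions below are placeholders, not Series 6 figures or measured workload data, and the function name is made up for illustration.

```python
def effective_flops_per_clock(fp16_fraction: float,
                              fp16_rate: float,
                              fp32_rate: float) -> float:
    """Blended FLOPs per clock for a workload in which `fp16_fraction` of the
    FLOPs can run at FP16 and the rest need FP32, assuming no co-issue."""
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be between 0 and 1")
    # Cycles needed per FLOP of the mixed workload (time-weighted, not a plain average).
    cycles_per_flop = fp16_fraction / fp16_rate + (1.0 - fp16_fraction) / fp32_rate
    return 1.0 / cycles_per_flop

# Hypothetical rates: FP16 at twice the FP32 rate.
FP32_RATE = 2.0
FP16_RATE = 4.0

for mix in (0.0, 0.5, 0.8, 1.0):
    print(mix, round(effective_flops_per_clock(mix, FP16_RATE, FP32_RATE), 2))
```

With these placeholder rates a 50/50 mix comes out at about 2.67 FLOPs per clock, below the simple average of 3, because the slower pipe takes a larger share of the time.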
     