PowerVR Rogue Architecture

Discussion in 'Mobile Graphics Architectures and IP' started by Rys, Oct 17, 2014.

  1. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    19
    Those extra mobile CPU cores may be underutilized in general device usage and currently in graphics usage under GL ES, but Vulkan's (and similarly Metal's) multi-threaded capabilities help fix that for graphics and GPU compute, at least.

    Great demo for clearly showing the benefits of multi-processing the regenerated command buffers and keeping the GPU well fed.
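
    To make the idea of spreading command buffer generation across cores concrete, here is a minimal sketch. It is not Vulkan code: `record_commands`, `build_frame` and the `("draw", obj)` tuples are hypothetical stand-ins for recording a secondary command buffer per thread and submitting them from one place.

```python
from concurrent.futures import ThreadPoolExecutor


def record_commands(chunk):
    """Record one command list for a slice of the scene (hypothetical format)."""
    return [("draw", obj) for obj in chunk]


def build_frame(objects, workers=4):
    """Record command lists on several threads, then submit them in a fixed order."""
    # Give each worker an interleaved slice of the scene.
    chunks = [objects[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so the final submission
        # order does not depend on which thread finishes first.
        command_lists = list(pool.map(record_commands, chunks))
    # Single submission point: concatenate the per-thread lists.
    return [cmd for command_list in command_lists for cmd in command_list]


frame = build_frame(list(range(10)), workers=4)
```

    In CPython the threads mostly illustrate the structure rather than a real speed-up; the point is that recording is independent per chunk and only the final submission is serialized.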
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    True. However, the purpose of big.LITTLE isn't to have all 8 cores (2 quad-core clusters, for example) always running at full tilt. If you did that, you'd actually burn far more power than it was ever intended for. As with all things, what we need is a fine balance between performance and power consumption, or better, battery life. It won't do me any good to have all my "8 cores" maxed out as much as possible if I have to run for a power plug about every hour.
     
    Grall likes this.
  3. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    The thing to remember is there are CPU benefits to be had whether you have one CPU or many. Even outside of multi-threaded use, there's just less CPU work happening in a Vulkan app and client driver in order to have the GPU do work.
     
  4. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Don't you mean potentially less? Since the responsibilities of the driver have been handed off to the application, doesn't it depend on exactly how the application deals with what the driver was responsible for before?
     
  5. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    Drivers for APIs like GLES do far far more than a Vulkan application now has to do, even though some of the responsibilities have shifted.
     
  6. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Can you give me some insight into this? I'm at a loss... I thought all the responsibility was shifted, but since the application knows things about what it's doing, it can handle said responsibilities more efficiently depending on implementation.
     
  7. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    I don't want to do a disservice to the story of how a GPU driver goes about the business of commanding a GPU via a client API, but things like workarounds for badly behaved apps (you'd cry if you could see how much of that crap happens), online shader compilation, support for inherently branchy host work like render state validation, code to figure out what to do because the spec is so loose and ill-defined: they are all things that are either completely gone from either side or significantly reduced in either Vulkan driver or Vulkan-using app.

    That's not to say Vulkan cruft can't accrue on either side of the app-driver contract over time, but the clean slate is fundamentally liberating and removes swathes of code from the overall interaction.
     
    entity279 and Simon F like this.
  8. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,527
    Likes Received:
    278
    Location:
    0x5FF6BC
    Forgive my ignorance, but do mobile OSes such as Android and iOS use standard APIs (e.g. GLES 3.0) within the OS to drive the user interface, composition etc., or is that all done at a lower level? Basically I'm asking whether existing OSes (as well as apps) are also suffering due to the use of GLES x.x.
    Is Metal perhaps Apple just exposing/formalising as an API what they have been using internally within the OS for quite some time?
     
  9. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    Can't tell you what happens in iOS for obvious reasons, but in Android hardware accelerated drawing and composition is via standard APIs.
     
  10. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    19
    As part of the performance enhancements in iOS 9, Apple mentioned that Metal will now be used in place of GL ES (on applicable devices) for the OS's Core Graphics and Core Animation APIs.

    Internally, I imagine they've been tapping the GPU in a relatively direct manner since iOS 1 for at least some aspects of OS graphics operations as well as some limited compute (browser acceleration, camera/photo/video acceleration, etc.)

    What I've been wondering since Metal's introduction is how it compares to the proprietary PowerVR SGL API.
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,468
    Likes Received:
    187
    Location:
    Chania
    Is SGL still alive? (honest question)

    For Metal: is it me or is Apple, even without Metal, already getting a high rate of efficiency out of their drivers for Rogue GPUs? I'm asking because there might be a benefit with Metal in GFXBench results, but the percentages are relatively small (<=9%). Unless of course GFXBench isn't a good indication of what Metal can do in general for GPUs.
     
    #131 Ailuros, Aug 21, 2015
    Last edited: Aug 21, 2015
  12. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,061
    Likes Received:
    958
    Location:
    Planet Earth.
    PowerSGL? No.
     
  13. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,174
    Likes Received:
    1,545
    Location:
    Beyond3D HQ
    There's no real general indication when it comes to this kind of thing; it's all case-by-case. GFXBench was already heavily GPU limited, so the gains to come from switching to Metal were (and are) slim. You can't make a call about overall driver efficiency just by taking a look at that case (unfortunately).

    SGL is still alive, but there's no point comparing it to anything since it'll never see the light of day in a way that helps a non-IMG person understand its utility.
     
  14. I.S.T.

    Veteran

    Joined:
    Feb 21, 2004
    Messages:
    3,174
    Likes Received:
    389
    It's internally developed for various, well, internal uses?
     
  15. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,826
    Likes Received:
    4,129
    So Rys can still play Ultim@te Race Pro ;)
     
    sebbbi, Grall and Simon F like this.
  16. rapso

    Newcomer

    Joined:
    May 6, 2008
    Messages:
    215
    Likes Received:
    27
    that's a back and forth in history.

    in GL1.0 the process was quite simple
    1. you set your drawing settings (e.g. flat shading, smooth,...)
    2. you set your texture
    3. you draw by telling what type (e.g. GL_QUADS) and then push vertex by vertex to the rasterizing device.

    problems:
    the settings and data you set are usually not in the format the GPU wants (e.g. rasterizers used fixed-point vertices, not float; texels could be R8G8B8A8 or A8R8G8B8 or R4G4B4A4 or...), and every time you set them, the driver had to convert them. that's why in the old days everyone was sorting draw calls to minimize texture switches (even nowadays some have that mindset without knowing the historical reason)

    GL1.1 solution : Displaylists
    in display lists you can
    init:
    1. start display list recording
    2.1. you set your drawing settings (e.g. flat shading, smooth,...)
    2.2. you set your texture
    2.3. you draw by telling what type (e.g. GL_QUADS) and then push vertex by vertex to the rasterizing device.
    3. stop recording
    drawing:
    4. replay that recording as many times as you want, the driver will not do any conversion

    problems:
    1. how do you change something? you need to record the display list again.
    2. hardware appeared that support an awesome new feature: multitexturing, but opengl was overwriting the previous "thing" you've set, thus you always had just one texture.
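
    The record-once, replay-many idea can be sketched roughly like this. It is a toy model, not the GL API: the `DisplayList` class, the 16.16 fixed-point conversion and the list standing in for the device are all assumptions chosen for illustration.

```python
class DisplayList:
    """Toy display list: the costly conversion happens once, at record time."""

    def __init__(self):
        self.commands = []
        self.conversions = 0  # how often the "driver" had to convert data

    def record(self, state, texture, vertices):
        # Convert float vertices to 16.16 fixed point (an assumed device format).
        converted = [tuple(int(c * 65536) for c in v) for v in vertices]
        self.conversions += 1
        self.commands.append((state, texture, converted))

    def replay(self, device):
        # Replaying only copies already-converted commands to the device.
        for command in self.commands:
            device.append(command)


display_list = DisplayList()
display_list.record("smooth", "brick", [(0.5, 1.0)])

device = []  # stands in for the rasterizer's command queue
for _ in range(3):
    display_list.replay(device)
```

    Three replays reach the device, but the conversion ran only once. Changing anything means calling `record` again on a fresh list, which is exactly problem 1 above.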

    GL1.2 solution texture objects, vertex arrays (I think those were actually earlier there, but slower)
    init:
    1. create texture objects
    drawing:
    1. you set your drawing settings (e.g. flat shading, smooth,...)
    2. you set your texture objects (barely any driver work)
    3. you draw by pointing at a vertex array in memory and telling GL how many primitives to draw

    problems:
    1. we are at Riva128 and Voodoo Graphics times now. those were actually way faster than CPUs thanks to very smart pipelining and dedicated memory. dedicated memory is fast, but moving data to it was super slow (I think that was still ISA or EISA time?), thus you really became limited by the vertices you could copy to the rasterizer chips. with GeForce256, TnL hit consumers and the situation was even more unbalanced.

    from now on mostly extensions took over
    quick solution: VAS (vertex array storage) don't kill me if I'm calling it wrong, that's like ~1999 I think
    you can specify to GL that the array you point at will not be altered until you say so; that way the rasterizer can keep it in memory and just redraw. TnL was taking care of transforms, thus the CPU was not involved at all.
    problem: but you still had to copy.

    now we got all the memory handles: for vertices (VBO), render targets (RBO), uniforms/constants (UBO, I think that was in GL 3.2)

    at this time nobody maintained display lists anymore, because they became overly complicated for the driver to track. display lists allowed some data to be static (e.g. textures) but some data to be dynamic (whatever you set outside that was not recorded, thus overwritten inside the display list). with VBO, RBO... it went beyond the specs. I think the last attempt was by nvidia, which for a short time supported PBuffers (kind of a predecessor of framebuffer objects). but the driver guys said this became insanity.


    from now on the API was pushing all commands good old opengl 1.0 way, the driver recorded the commands into some buffer and pushed it to the driver thread.

    but why a driver thread if all data is on the GPU and we just push commands? well, the GPU guys figured that everything you add to a GPU which isn't used all the time is a waste. hence let's remove everything static and make it dynamic (aka emulate it in shaders).
    I think PowerVR is the pioneer of this (I don't know exactly to what extent they've gone, but if you look at their GL extensions, you'll get quite some hints). As an example: transparency. That's not needed for most objects and if you need it, the shader could do it just as well, right? ok, but OpenGL has dozens of settings, how do we know which combination to create? we cannot compile all 100 different permutations.... well, let's do that in the driver when it's needed.
    problem: there are tons of settings that can change every draw call: texture formats, framebuffer formats, blend settings, vertex layouts, shader, sampler...... and all of those trigger a new permutation of those super flexible units.
    well, the set of permutations you really need in a game is small, because there are 100 trees that render the exact same way, but every game has a different way to render its trees, thus the driver needs to evaluate all settings on the first draw call, and the consecutive draw calls need to at least check all settings for a possible change... insane work nowadays... that's why it takes a lot of CPU time.
    and it's not just the average cost, but the unpredictable cost that makes this solution bad. if in one frame some more objects/draw calls appear, the CPU will spend way more time preparing those draw calls than the GPU needs to execute them.
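
    A rough sketch of that per-draw-call bookkeeping, assuming a driver that caches one shader variant per state combination. The `Driver` class, the state keys and the variant names are hypothetical; real drivers are far more involved.

```python
class Driver:
    """Toy driver that builds a shader variant per unique render state."""

    def __init__(self):
        self.cache = {}     # state key -> compiled variant
        self.compiles = 0   # expensive: happens on first use of a combination
        self.lookups = 0    # cheaper, but paid on every single draw call

    def draw(self, state):
        # Every draw call has to look at all settings to detect a change.
        key = tuple(sorted(state.items()))
        self.lookups += 1
        if key not in self.cache:
            # Unpredictable spike: a new combination showed up mid-frame.
            self.compiles += 1
            self.cache[key] = f"shader_variant_{self.compiles}"
        return self.cache[key]


driver = Driver()
tree = {"blend": "opaque", "tex_format": "RGBA8", "vertex_layout": "pos_uv"}

for _ in range(100):           # 100 trees rendered the exact same way
    driver.draw(tree)
driver.draw({**tree, "blend": "alpha"})  # one transparent object appears
```

    Only two variants are ever compiled, yet all 101 draw calls pay for the state check, and the second compile lands in whichever frame the new object first appears.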


    what was the GL 1.1 solution for "the driver does it every draw call, but the data doesn't change between frames"? ah, yes: display lists... or let's call those command lists or command buffers now :)

    as you can see in the presentation http://blog.imgtec.com/powervr/gnomes-per-second-in-vulkan-and-opengl-es , the world is divided into those display lists like back then, once you see a new one or an existing one needs to be modified, a new display list is recorded. for all the other frames the CPU just tells the API to replay the list...

    problems with Vulkan and DX12 will be obviously the same as back then in GL1.0
    "1. how do you change something? you need to record the display list again"
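
    One way to picture that trade-off is a scene split into chunks where only modified chunks get re-recorded. This is a minimal sketch under that assumption; the `Scene` class and its command format are made up for illustration and do not mirror any real Vulkan or DX12 call.

```python
class Scene:
    """Toy scene: one recorded command list per chunk, re-recorded only if dirty."""

    def __init__(self, chunks):
        self.chunks = {name: list(objs) for name, objs in chunks.items()}
        self.recorded = {}              # chunk name -> recorded command list
        self.dirty = set(self.chunks)   # everything must be recorded once
        self.records = 0                # how many recordings the CPU did

    def modify(self, name, objects):
        self.chunks[name] = list(objects)
        self.dirty.add(name)            # this chunk's list is now stale

    def render_frame(self):
        for name in sorted(self.dirty):
            self.recorded[name] = [("draw", obj) for obj in self.chunks[name]]
            self.records += 1
        self.dirty.clear()
        # Unchanged chunks are simply replayed.
        return [cmd for name in sorted(self.recorded) for cmd in self.recorded[name]]


scene = Scene({"a": [1, 2], "b": [3]})
scene.render_frame()                    # records both chunks
scene.render_frame()                    # pure replay, no recording
records_after_two_frames = scene.records
scene.modify("b", [3, 4])
frame3 = scene.render_frame()           # re-records only chunk "b"
```

    The CPU cost then scales with how much of the scene changes per frame rather than with the number of draw calls.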

    my prediction for DX13 and emm... (Mantle...Vulkan...) Magma is a programmable command processor (which is just like moving the GPU back to the CPU and do it GL1.0 style).
    The PCP will allow you to evaluate a scene on the GPU and push data in a flexible way to the GPU backend....

    I hope everybody is sleeping well by now
     
    Ailuros, Lazy8s, liquidboy and 3 others like this.
  17. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,826
    Likes Received:
    4,129
    PCI
     
  18. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,175
    Location:
    La-la land
    @Davros Damn, I remember Ultimate Race Pro... I owned that, once upon a time, via a PowerVR PCX2 card I bought 2nd hand off of a guy on FidoNET back in the day if any of you guys remember that old thing (a card which I subsequently burnt while hardware overclocking it slightly too enthusiastically by the way... :( Not that it really matters anymore though now that PCI is legacy junk.)
     
  19. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,826
    Likes Received:
    4,129
    You can still play it today with either a Glide wrapper or a PowerVR wrapper
     
  20. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    779
    Likes Received:
    146
    Location:
    USA
    Thank you for the trip through time. Didn't know that about sort by texture, I always assumed it had to do with textures being uploaded to video memory. Any other insight you wanna get off your chest, you have my attention. Thanks once again.
     