Software/CPU-based 3D Rendering

Discussion in 'Rendering Technology and APIs' started by 3D_world, Oct 28, 2012.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    A processor moves data and it operates on it. The more interesting aspect to the PPU was the data movement part that it used to send state updates to and from the processing units.

    I'm not sure the elements on a GPU or CPU are a perfect match to that.
     
  2. Dominik D

    Regular

    Joined:
    Mar 23, 2007
    Messages:
    782
    Likes Received:
    22
    Location:
    Wroclaw, Poland
     I think you're slightly mad. ;) Today even your mouse has a CPU. And you definitely need one to translate whatever data you want to push to the GPU into the GPU-specific format. There's no way around it, really, unless you end up with a HW monopoly.
     
  3. Rakehell

    Newcomer

    Joined:
    Jul 25, 2013
    Messages:
    10
    Likes Received:
    0
    Of course I meant high-performance CPUs in personal computers. Your mouse can't run OpenGL.
     
  4. nutball

    Veteran Subscriber

    Joined:
    Jan 10, 2003
    Messages:
    2,492
    Likes Received:
    979
    Location:
    en.gb.uk
    The high-performance CPU I had in my personal computer 20 years ago was utterly incapable of doing what a simple mouse does today in real-time. Neither can/could run OpenGL, and of course that didn't/doesn't stop them being good at what they did/do.

    Personally I think you lack depth of perspective. Perhaps I'm wrong.
     
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    What does a mouse do in real time that is so demanding?
     
  6. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    Not that I agree with the no-CPUs in x years theory, but the amount of processing done in optical mice of today must be quite substantial: AFAIK it's all processing of CCD images at very high refresh rates. (Though a major part is probably fixed function image processing...)

     Still, 20 years ago we were talking a 486 at 33MHz? It shouldn't be too hard to find tiny micro-controllers with more integer performance than that. :wink:
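     The kind of processing silent_guy alludes to can be sketched in a few lines: an optical mouse sensor estimates motion by comparing two consecutive low-resolution grayscale frames and picking the shift that minimizes the sum of absolute differences. This is a minimal illustrative sketch, not taken from any real sensor's datasheet; the frame size, search radius, and function name are all hypothetical.

```python
# Illustrative model of optical-mouse motion estimation: find the
# (dx, dy) shift that minimizes the sum of absolute differences (SAD)
# between the previous and current frame. Radius and layout are
# assumptions for the sketch, not real sensor parameters.

def estimate_shift(prev, curr, radius=2):
    """Return the (dx, dy) shift minimizing SAD between frames."""
    h, w = len(prev), len(prev[0])
    best, best_sad = None, float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sad = 0
            # Only compare the overlapping region of the two frames.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    sad += abs(prev[y][x] - curr[y + dy][x + dx])
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best
```

     Running this brute-force search over every frame pair at a kilohertz refresh rate gives a feel for why the integer throughput inside a modern mouse is nothing to sneeze at.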
     
  7. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
     What kind of res and refresh is that, 160x120 at 125Hz? OK, the refresh can be higher, but maybe it's a lower res?

     20 years ago we had that 486 at 33MHz, or even a DX2-66, which is perhaps the first common CPU we can call powerful. Those beasts were used for real-time 3D rendering, sometimes with textures, or for high-end 2.5D games like Doom and Duke3D.
     
  8. Rakehell

    Newcomer

    Joined:
    Jul 25, 2013
    Messages:
    10
    Likes Received:
    0
    And here you create an imaginary problem of a generational gap that completely misses my point.
     
  9. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
     You may remember the discrete FPU; then again, maybe you're too young to have known them.
     You could say they have disappeared, but in fact they are still there...
     
  10. Novum

    Regular

    Joined:
    Jun 28, 2006
    Messages:
    335
    Likes Received:
    8
    Location:
    Germany
     Not sure what you mean. At least for Intel, the "FPU" shares scheduler logic with the integer units, so it's not even remotely a discrete part.
     
  11. nutball

    Veteran Subscriber

    Joined:
    Jan 10, 2003
    Messages:
    2,492
    Likes Received:
    979
    Location:
    en.gb.uk
     That's the point. At one point in history it was a separate chip (the 8087).
     
  12. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    A GeForce 4 wasn't good at physics at all. That's when Ageia was founded. GPUs became good at physics purely because graphics itself evolved toward generic computing. That evolution hasn't stopped, and at the same time CPUs are becoming good at graphics.

     AVX-512 offers eight times more FLOPS per core than what most software renderers are currently using. Replacing the integrated GPU with more cores would practically double that, while TSX reduces synchronization overhead. AVX-512's gather support, its 32 registers, and its optional exponential instructions should also make a significant difference. And if the vectors were extended to 1024-bit, executing them in two cycles could help hide latency and remove front-end bottlenecks while saving some power.

    So CPUs can become really, really good at graphics. And the possibilities for new APIs and algorithms are endless.
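     The 8x figure above is plausible arithmetic: a 512-bit FMA processes 16 single-precision lanes at 2 FLOPs each, versus the 4-wide SSE multiply-add code many software renderers still target (16 x 2 / 4 = 8). The gather support mentioned can be thought of as a per-lane indexed load under a mask. Below is a scalar Python model of those masked-gather semantics, purely to illustrate what the hardware instruction does per lane; the names are hypothetical, not real intrinsics.

```python
# Scalar model of a masked SIMD gather (in the spirit of AVX-512's
# vgatherdps): each active lane loads memory[index[i]]; inactive
# lanes receive a fallback value. Names here are illustrative.

def masked_gather(memory, indices, mask, fallback=0.0):
    """Per-lane gather: memory[idx] where the mask bit is set."""
    return [memory[idx] if m else fallback
            for idx, m in zip(indices, mask)]
```

     In a software renderer, a texture fetch for a batch of pixels boils down to exactly such a gather over the computed texel addresses, which is why hardware gather matters so much for software graphics.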
     
  13. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    I was five years old when the 80387 was launched. Of course I remember it. Vividly.
    Sure, but there is no area on any chip to date which can under any definition be designated as the Physics Processing Unit. It has completely vanished. That doesn't mean the functionality itself is lost. It just moved to software that is executed on programmable cores. Likewise I claim that in the distant future we will no longer be able to distinguish the GPU, and all functionality will move to software.

    The FPU's history shows us that functional units and instructions do survive unification. We can observe that GPU-like SIMD units and their associated instructions are also finding their way into the CPU cores, and AVX-512 appears to be the next big evolutionary step. Everything else you might desire for graphics can be added as relatively generic instructions as well.
     
  14. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
     Some dedicated hardware is still there. For some years there's been that craze for H.264 decoders, first in graphics cards, then in cell phones, and we're almost done with it because nearly every piece of hardware integrates one by now. It's still not fully sorted out, though: Linux is only seeing nascent support for them in the open-source drivers for AMD/ATI and NVIDIA cards/chipsets/APUs.
     And then we'll need H.265 and/or VP9 decoding (unless everyone sticks to H.264).

    That's an example, not the whole grand future of things but look at what we find inside a phone SoC or even a PC CPU : CPUs, GPU, image processor (tons of phone/embedded chips but also Intel QuickSync), audio codec and other audio-related DSP, video decoders, video encoders, hardware blocks dedicated to software radio, crypto accelerators, TCP/IP off-loading (in e.g. Gb and 10Gb ethernet interfaces).

     It's even increasing: Moore's law gives more transistor benefits than power benefits, so in critically power/battery-limited chips you have a lot of "dead silicon", i.e. units that are turned on occasionally and then power-gated for several milliseconds or more (down to turning off CPU cores, or an entire CPU in the 4+1 or 4+4 arrangements).

    The HSA foundation basically exists to promote stuff working better together.
    It's not necessarily incompatible with stuff also getting more generic, i.e. graphics cards gained GPGPU abilities at the same time their video decoding abilities were increasing (and soon shaders were doing scaler/filter things)


     Regarding the external FPU example: we ended up with a FPU inside the CPU, rather than the CPU becoming exceptionally strong at emulating it with integer code. [edit: well, you're making that same point, Nick]
     So now we have the GPU getting inside the CPU (which even brings the h264 decoder and such along), but not quite getting games to run on a software renderer yet.

     Maybe they will fuse a bit more, so we end up with those 512-bit-wide CPUs doing graphics duties. But who knows, there may be texture filtering units built right into the CPU pipeline, S3TC-and-friends decompressors, or whatever critical stuff is needed.
     
    #294 Blazkowicz, Jul 31, 2013
    Last edited by a moderator: Jul 31, 2013
  15. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    832
    Likes Received:
    505
     If you look at the evolution of SoCs, you see a very different story.
     Look at the floorplan of the dies: a lot of silicon area is dedicated to specialized non-CPU units.
     
  16. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Those are all excellent examples of I/O computing. There's no strong need to unify them, because there's no data locality between the components. The data flows in one direction, in or out. In effect, it's not collaborative heterogeneous computing.

     This is also why I think discrete graphics will survive for a fairly long time to come. Things like conditional rendering (based on occlusion query results) and the ability to spawn new tasks on the GPU contribute to making graphics unidirectional. That said, there are many more motivations for unification than to benefit from data locality. The GPU's vertex and pixel processing will never split up again, despite the unidirectional dependency. Making the GPU more independent requires making it more CPU-like, which attracts more non-graphics workloads, which calls for closer integration...

    So it's important not to draw the wrong parallels. Granted, my PPU example was about highly interactive collaborative generic computing so unification was clearly inevitable, but I'm just trying to illustrate that there's a (sliding) scale to these things and multiple arguments are at play. Low-end graphics is slowly but surely approaching unification.
     
  17. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Don't mistake that for an increase in heterogeneity. They're just bringing what used to be external chips, onto the same die. So you should look at the entire system. This is convergence, not divergence!

     Depending on the type of processing being done, this may lead to unification or not. Pure I/O functionality won't unify, but for all intents and purposes the FPU did unify. Graphics is somewhere in the middle of this scale. It evolves slowly, but the forces of convergence are stronger than those keeping things separate, and they're getting stronger with every generation.
     
  18. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Even though there's definitely a trend towards it, graphics is not just compute. Rasterization and texture filtering are two examples that come to mind, where dedicated units seem to compare favourably versus generic computation core usage.

     You might of course say that both techniques are themselves crutches that will be done away with in the near future, but that's been proposed for quite a while as well and has yet to catch on in commercial production.
     
  19. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Not having dedicated units doesn't mean there shouldn't be sufficiently specialized hardware for common graphics operations!

    They should just be generic SIMD operations. Texturing is little more than a generic mipmap LOD calculation, a generic texel address calculation, a generic gather operation, and a generic filter operation. All of this can and has been done in shaders already. Likewise programmable rasterization is currently a hot topic in graphics research.
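     That decomposition can be sketched in scalar code. The following is a minimal illustration of the listed steps, assuming a square texture stored as a flat row-major list with clamp-to-edge addressing; all function names and parameters here are hypothetical, not any real API.

```python
# Sketch of texturing as generic operations: mipmap LOD selection from
# UV derivatives, texel address calculation, a 2x2 gather, and a
# bilinear filter. Texture layout is an assumption for the sketch.
import math

def mip_level(dudx, dvdx, dudy, dvdy, tex_size):
    # LOD = log2 of the longest screen-space texel footprint.
    rho = max(math.hypot(dudx * tex_size, dvdx * tex_size),
              math.hypot(dudy * tex_size, dvdy * tex_size))
    return max(0.0, math.log2(max(rho, 1e-12)))

def sample_bilinear(texture, size, u, v):
    # Map UV in [0,1) to texel space, gather the 2x2 footprint, blend.
    x = u * size - 0.5
    y = v * size - 0.5
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    def texel(tx, ty):  # address calculation with clamp-to-edge
        tx = min(max(tx, 0), size - 1)
        ty = min(max(ty, 0), size - 1)
        return texture[ty * size + tx]  # the "gather" step
    t00, t10 = texel(x0, y0), texel(x0 + 1, y0)
    t01, t11 = texel(x0, y0 + 1), texel(x0 + 1, y0 + 1)
    top = t00 * (1 - fx) + t10 * fx
    bot = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bot * fy
```

     On a wide SIMD machine each of these steps maps to a handful of generic vector instructions across many pixels at once, which is the point: nothing in the sequence demands a dedicated texture unit, only fast gathers and multiply-adds.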

    Now why would someone replace a fixed-function pipeline that can do all of these operations in parallel and which produces one filtered texel per cycle, with programmable units that take multiple instructions? For the same reasons why this happened to the fixed-function vertex and pixel pipelines! Shaders can have hundreds of instructions, which are even broken up into scalar operations these days. But this is acceptable because instead of a few deep pipelines we now have a massive number of shallow ones. This has enabled programmability and all the goodness it has brought to the world of graphics, at a small cost that was well worth the added flexibility.

    Programmability also eventually improves performance, especially for complex algorithms. It would be unthinkably slow or even practically impossible to try to implement some of today's graphics algorithms with fixed-function pipelines. Also, some of the classic graphics pipeline stages are more often than not disabled, and can be replaced with programmable operations which overall is a more efficient use of the hardware. In other cases fixed-function hardware is a bottleneck, which would disappear if all cores were capable of the necessary basic operations.
    I am not denying that this will take a while. The bandwidth wall cometh but several viable solutions exist to keep scaling it a little while longer. Also keep in mind that this is a binary thing, like vertex and pixel processing unification: there's nothing in between. So just because you haven't seen anything in commercial products yet doesn't mean it's not getting nearer.
     
  20. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
    A pity you didn't attend SIGGRAPH / HPG where there were a few talks showing just how many times bigger a performance-equivalent programmable unit was compared to the fixed-function implementation.
     