Software/CPU-based 3D Rendering

Discussion in 'Rendering Technology and APIs' started by 3D_world, Oct 28, 2012.

  1. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,796
    Likes Received:
    2,054
    Location:
    Germany
    From what I've gathered, not so much from application research as from common sense, processor designs are much more power-focused than area-focused. Take, for example, Nvidia dropping their hot clock: an area-saving feature given up in favour of power savings.

    With ever smaller process geometries you get an increasing number of transistors per area (in effect, per dollar), but less improvement in calculations per watt. So unless you're in a business that does not care about power (yet), you might already be designing to meet specific power targets rather than specific area targets. IOW, you deliberately spend more area because you know your processor cannot switch all its gates (in the cores) at once within your power target anyway.
     
  2. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    That's comparing apples and oranges. Of course a unit which can do anything is much bigger. A fixed-function unit is a waste of hardware except for the times you can have it do the one single thing it was designed for, and it can be a bottleneck at other times.

    So what was your point anyway? Everything should be fixed-function and GPUs are heading in the completely wrong direction?

    Did those talks take SIMD into account? It makes programmable units many times more powerful, at a relatively low hardware cost. It is what has made it perfectly feasible for GPUs to become highly programmable thus far. Also, a lot of things can be optimized with a handful of new instructions. The area required for these new instructions is often negligible when you already have the rest of the programmable cores as a framework.
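
    As a rough illustration of that SIMD point (a minimal sketch of my own, assuming a plain AVX target; the function names are just for this example), the vector loop below performs eight multiply-adds per iteration while reusing the same fetch/decode/control logic as the scalar loop, which is where "more powerful at a relatively low hardware cost" comes from:

    Code:
        #include <immintrin.h>  // AVX intrinsics; build with -mavx

        // Scalar version: one multiply-add per iteration.
        void saxpy_scalar(float a, const float* x, float* y, int n) {
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }

        // AVX version: eight multiply-adds per iteration on the same core.
        // (n is assumed to be a multiple of 8 to keep the sketch short.)
        void saxpy_avx(float a, const float* x, float* y, int n) {
            __m256 va = _mm256_set1_ps(a);
            for (int i = 0; i < n; i += 8) {
                __m256 vx = _mm256_loadu_ps(x + i);
                __m256 vy = _mm256_loadu_ps(y + i);
                _mm256_storeu_ps(y + i, _mm256_add_ps(_mm256_mul_ps(va, vx), vy));
            }
        }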

    There's no point in having smaller units anyway. Today's GPUs have massive amounts of fully programmable computing power, but they're starting to run into the bandwidth wall. Future architectures can have lots more computing power, and will be completely bandwidth limited. So you can choose between being programmable and bandwidth limited, or fixed-function and bandwidth limited. Not a hard pick.
     
  3. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    I didn't say area is an issue.
    None of this implies that fixed-function is the solution.

    You're right that calculations per watt don't increase at the same rate as transistor count, but they still increase at a very substantial rate. In particular, they increase faster than bandwidth, at all levels. And this brings us back to power: moving data around costs more power than performing operations on it. So no matter how power-efficient a fixed-function unit is, sending data to it from a programmable core and back eventually costs more than performing the same operations with a few instructions.
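
    As a rough back-of-the-envelope (the energy figures below are order-of-magnitude assumptions for illustration only, not measurements of any particular chip): if an arithmetic operation costs on the order of tens of picojoules while fetching its operands from off-chip memory costs on the order of nanojoules, the data transport dominates by roughly two orders of magnitude.

    Code:
        #include <cstdio>

        // Order-of-magnitude illustration only; both constants are assumptions,
        // not measurements of any particular process or product.
        int main() {
            const double pj_per_op          = 20.0;    // ~tens of pJ per arithmetic op
            const double pj_per_dram_access = 2000.0;  // ~a few nJ to fetch a word off-chip

            std::printf("transport vs. compute energy: ~%.0fx\n",
                        pj_per_dram_access / pj_per_op);  // ~100x
        }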
     
  4. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,796
    Likes Received:
    2,054
    Location:
    Germany
    I was under the (layman's) impression that fixed-function hardware is more efficient in almost every way except area than fully programmable units.

    Moving data around in a programmable core doesn't seem unproblematic either, since you'll be moving data from ultra-high-speed registers to cache and vice versa (at those ultra high speeds). With texture data, whose latency can be hidden, only about a quarter of the data has to move into the programmable cores (i.e. the bilinear-filtered sample instead of the raw texels), at that very energy-intensive speed.
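
    For reference, here is a minimal sketch of the bilinear filtering step a texture unit performs (an illustration of the principle, not any particular implementation): four neighbouring texels and two fractional weights go in, a single filtered sample comes out, which is the roughly 4:1 data reduction mentioned above.

    Code:
        struct Texel { float r, g, b, a; };

        // Linear interpolation between two texels.
        static Texel lerp(const Texel& a, const Texel& b, float t) {
            return { a.r + (b.r - a.r) * t,
                     a.g + (b.g - a.g) * t,
                     a.b + (b.b - a.b) * t,
                     a.a + (b.a - a.a) * t };
        }

        // Bilinear filter: two horizontal lerps followed by one vertical lerp.
        // Four raw texels in, one filtered sample out.
        Texel bilinear(const Texel& t00, const Texel& t10,
                       const Texel& t01, const Texel& t11,
                       float fx, float fy) {
            return lerp(lerp(t00, t10, fx), lerp(t01, t11, fx), fy);
        }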

    But surely there are people more knowledgeable than me to judge this. I'll just lean back and look at the trends in current power-efficient hardware and upcoming generations.
     
  5. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,432
    Likes Received:
    261
    Fixed-function hardware should be more area efficient. Assuming you can power the programmable hardware, the primary tradeoff becomes how often the function is needed.
     
  6. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,486
    Likes Received:
    396
    Location:
    Varna, Bulgaria
    GPUs have steadily ditched many dedicated hardware blocks for various functions and effects, keeping only the most critical stages of the pipeline that we still see today: primitive setup and rasterization, texturing, pixel writes/blends, and so on. Functional units are cheap now; with the advancement of manufacturing processes and the billions of transistors that can be crammed into a midrange chip, maintaining custom hardware logic is not a problem. Data movement and availability is what makes or breaks a design now. Anyone, big or small, can do fine-grained power and clock gating of different parts of a chip depending on the load type, but few can overcome the bandwidth limitations and the power drain of moving too much data in and out.
     
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,796
    Likes Received:
    2,054
    Location:
    Germany
    I meant from a global point of view: FF units add area to the whole chip, most of the time more so than enabling their functionality inside the programmable units would.
     
  8. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    GPUs haven't "ditched" any significant FF HW in ages.
    Several works at HPG/SIGGRAPH this year actually show renewed interest from a few hardware/IP vendors in adding FF HW, not to mention the ever-increasing amount of FF HW present on mobile SoCs.
     
  9. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    The texture units are also fed from a cache. So you can't reduce the overall data movement by shifting the problem to a fixed-function unit.

    A few years from now, for any given TDP, you'll have more programmable computing power than you can feed with data. So there's no point in having fixed-function units. It may still sound reasonable now, just to increase battery life a bit more or reduce cost, but in the future it would be as insane as suggesting going back to a Direct3D 7 architecture for the same reasons. Fully programmable hardware enables new possibilities that will eventually make current hardware look very restrictive.
     
  10. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    I wouldn't call it ditching, but the fast evolution from fixed-function OpenGL ES 1.x to highly programmable OpenGL ES 3.0 does mean that proportionally ever less hardware is spent on fixed-function units. Project Logan, aka Mobile Kepler, also shows that desktop-level functionality isn't a major issue for power restricted designs.

    And like I said before, a lot of this is binary in nature: you either have the fixed-function unit or you don't. So any evolution toward "ditching" a specific fixed-function unit will appear sudden. For instance, the utilization of MSAA hardware has gone down steadily in recent years due to the many image-based anti-aliasing techniques. Those use the programmable shader cores instead (and they often want unfiltered, non-mipmapped, non-perspective texture samples). So sooner or later that dedicated hardware will 'suddenly' disappear and the programmable cores will be made more suitable to take over this functionality.
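
    A minimal sketch of what such an image-based pass boils down to (illustration only, not any particular published technique): read raw, unfiltered colour samples from the rendered image, estimate edge contrast from luma, and blend along detected edges in the programmable cores instead of in the MSAA resolve hardware.

    Code:
        #include <algorithm>
        #include <vector>

        struct Color { float r, g, b; };

        // Perceptual brightness, used to detect edges.
        static float luma(const Color& c) {
            return 0.299f * c.r + 0.587f * c.g + 0.114f * c.b;
        }

        // Blend a pixel halfway toward the average of its four neighbours.
        static Color soften(const Color& c, const Color& n, const Color& s,
                            const Color& e, const Color& w) {
            return { 0.5f * c.r + 0.125f * (n.r + s.r + e.r + w.r),
                     0.5f * c.g + 0.125f * (n.g + s.g + e.g + w.g),
                     0.5f * c.b + 0.125f * (n.b + s.b + e.b + w.b) };
        }

        // One full-screen pass over the interior pixels of an already rendered
        // image: where the local luma contrast exceeds a threshold, soften the
        // edge. All reads are plain, unfiltered colour samples.
        void postprocess_aa(std::vector<Color>& img, int w, int h,
                            float threshold = 0.1f) {
            const std::vector<Color> src = img;   // read from an untouched copy
            for (int y = 1; y < h - 1; ++y) {
                for (int x = 1; x < w - 1; ++x) {
                    const Color& c = src[y * w + x];
                    const Color& n = src[(y - 1) * w + x];
                    const Color& s = src[(y + 1) * w + x];
                    const Color& e = src[y * w + (x + 1)];
                    const Color& o = src[y * w + (x - 1)];
                    float l[5] = { luma(c), luma(n), luma(s), luma(e), luma(o) };
                    float lmin = *std::min_element(l, l + 5);
                    float lmax = *std::max_element(l, l + 5);
                    if (lmax - lmin > threshold)
                        img[y * w + x] = soften(c, n, s, e, o);
                }
            }
        }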
    That's no different from any other year. Researchers love to think that their technique is more important than anyone else's and deserves dedicated hardware. But you can't cater to all of them. So eventually they have to settle for an efficient shader-based implementation. If they're lucky they get a few new instructions.

    What also has to be taken into account is that hardware manufacturers are scrambling to find the next 'killer app' that makes people want to buy a big piece of dedicated silicon. Integrated graphics is gaining market share, and with the CPU's throughput computing power skyrocketing it's only a matter of time before the integrated GPU gets unified. Unless GPU manufacturers can find something that even a million programmable units with specialized instructions can't handle and which consumers will value dearly...
    That's not an increase.
     
  11. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,815
    Likes Received:
    2,231
    Note to self :
    Must harass Nick to add DirectX 5/6/7 support to SwiftShader (especially 16-bit dithering)
     
  12. Rakehell

    Newcomer

    Joined:
    Jul 25, 2013
    Messages:
    10
    Likes Received:
    0
    And GPUs are becoming even better at graphics than that. In addition, they're becoming better at things that were once CPU tasks, like physics and video encoding.
     
  13. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Sure, but there will be a point where the difference between good and better no longer matters. We could have had an Ageia PhysX card that became even better at physics than the GPU has become, but it still went the way of the dodo because any minor advantage in efficiency didn't justify the cost.

    The same thing is happening to graphics, albeit much more slowly. In fact the GPU is converging towards a more CPU-like architecture, so becoming "even" better at graphics than the CPU still means the gap is closing.
    That's just a consequence of graphics becoming more like generic computing. I mean, when two programmable architectures converge, then for a while there is bound to be some overlap in the applications they run. It doesn't mean one has become superior to the other. Note that not all physics and video encoding is now happening on the GPU; the average consumer GPU is pretty low-end and doesn't outperform the CPU at these tasks.

    So once again this is a case where becoming "better" at something just indicates convergence. It will eventually lead to unification.
     
  14. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,796
    Likes Received:
    2,054
    Location:
    Germany
    I am not arguing with your last couple of sentences, but with some before them.
    While I also think it is true that you can have more math or programmable computing power than you can feed with data, I would tend toward a shift in paradigms as well. And I won't type the same area-vs-power sermon for the nth time, because of...

    (my bold) Isn't that stuff in CPUs used for video encoding called QuickSync and mainly composed of FF hardware?
     
  15. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    The argument I was responding to is "[GPUs are] becoming better at things that were once CPUs tasks like physics and video encoding". QuickSync doesn't support that. Nor does it support the opposite for that matter...

    Allow me to quote my previous answer to video decoding and such using fixed-function hardware: "Those are all excellent examples of I/O computing. There's no strong need to unify them, because there's no data locality between the components. The data flows in one direction, in or out. In effect, it's not collaborative heterogeneous computing."

    So I'm not opposed to fixed-function hardware. All I'm saying is that programmable computing is converging and will eventually unify and enable new possibilities. The GPU has some remaining fixed-function hardware for graphics that makes this non-trivial, but it is becoming underutilized, its use cases are diversifying, and it can be efficiently replaced by specialized instructions. So that's not going to stop this convergence of programmable computing.

    That said, QuickSync might also be a transitional thing. Note that MP3 decoding/encoding used to benefit from fixed-function hardware, but now the CPU can easily handle any (consumer) audio processing. Also, many new codecs have emerged that make dedicated hardware a waste and call for programmability instead. The same thing is happening with video. It used to take a lot of CPU power and there were only a few codecs, so it made sense to add dedicated hardware to offload the CPU for the most important use cases. Eventually it will also only require a fraction of CPU power, so the dedicated hardware can be dropped.
     
  16. HAL-10K

    Newcomer

    Joined:
    Jul 28, 2002
    Messages:
    32
    Likes Received:
    0
    Interactive graphics will always be limited by computing resources.

    I think that within this decade we will already know which rendering principle is most efficient and practical for infinitely scalable graphics rendering on classical computer architectures.
    Additionally, we will have to face the definite and abrupt end of the historic pace of falling prices for computing performance in microelectronics within two to six years.

    Thus, the future undoubtedly belongs to fixed function hardware.
     
  17. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    ...and the biggest one, for the foreseeable future, is likely to be power.
     
  18. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Which itself is limited by bandwidth.
    Yes, the CPU and GPU won't unify if process technology stops scaling or if the moon crashes into the earth. Neither of those things is likely to happen in the foreseeable future though. A massive industry depends on it, with far-reaching consequences if it were to grind to a halt. Some might prefer the moon crashing into the earth instead. OK, I'm obviously being overly dramatic there, but I'm just trying to point out how terrible your argument is, in every way. A lot of things can be claimed to never happen if, figuratively, something 'fell out of the sky'.

    Of course the laws of physics won't change just because a trillion dollar industry wants them to. But the end of semiconductor scaling has been claimed many times before and has never come true. It's like doomsday predictions; you might expect something to happen, but the day after ends up being just like the day before. People don't seem to learn from that though and keep coming up with new fatalistic events that will happen within their lifetime. Although the earth will certainly eventually stop turning, in all likelihood it won't happen in an event that is 'visible' to us. To get back on topic, Pat Gelsinger said something very similar about silicon scaling: "We see no end in sight and we've had 10 years of visibility for the last 30 years."

    As long as you have a decade of visibility, it is highly likely that after this decade has passed, you'll have many more years of continued scaling. If all the research labs in the world came to a conference empty-handed, that's when you'd have to start worrying. We're nowhere near that. Worst case, we're seeing that progress has slowed down a little. But it's worth noting that we're ahead of the curve on several things, so it's to be expected that things slow down until we're back to a more natural pace, especially in today's global economic climate. And even if the slowdown is an early sign of a gradual tapering off, we'll still have several decades of very significant progress ahead of us.

    Which brings me to how much time might be required for the CPU and GPU to unify. If, hypothetically, the integrated GPU was dropped, an 8-core CPU with AVX-512 could be pretty mainstream at 14 nm, and would be capable of 2 TFLOPS. That would really push the limits of what DDR4 bandwidth will be able to help sustain. That's plenty for doing all graphics on the CPU, for the low-end market. And while 14 nm is certainly too soon for something this disruptive to happen (because there's much work left to be done), the basic building blocks are theoretically available, and we'll have 10, 7 or 5 nm by the time it's really the next evolutionary step.
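
    One plausible way to arrive at that 2 TFLOPS figure (the per-core configuration and clock below are assumptions for illustration, not a product spec): two 512-bit FMA units per core, 16 single-precision lanes each, 2 flops per FMA, 8 cores, at roughly 4 GHz.

    Code:
        #include <cstdio>

        // Back-of-the-envelope for the ~2 TFLOPS figure quoted above.
        // The FMA-unit count and clock are assumptions, not a product spec.
        int main() {
            const double cores     = 8;
            const double fma_units = 2;         // assumed 512-bit FMA pipes per core
            const double lanes     = 512 / 32;  // single-precision lanes per FMA
            const double flops_fma = 2;         // multiply + add
            const double clock_ghz = 4.0;       // assumed clock

            double gflops = cores * fma_units * lanes * flops_fma * clock_ghz;
            std::printf("~%.0f GFLOPS\n", gflops);  // ~2048 GFLOPS, i.e. ~2 TFLOPS
        }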
    So we're going back to DirectX 7 graphics?

    Undoubtedly not. Programmability is a must, since more computing power is worthless unless you can do something the previous generation couldn't, while still retaining legacy capabilities. And you can't cater to just one technique, because chances are that developers want something different and your fixed-function block becomes a waste of money. But we can have the best of both worlds with 'specialized' instructions. This means unification can happen while still getting most of the benefits that fixed-function hardware offers. Determining instructions that can do a maximum of useful work while at the same time being generic enough to be useful for a diverse range of applications is the real challenge going forward.
     
  19. ninelven

    Veteran

    Joined:
    Dec 27, 2002
    Messages:
    1,699
    Likes Received:
    117
    Even if you are not limited by bandwidth, you are going to be limited by die size and cost.
     
  20. HAL-10K

    Newcomer

    Joined:
    Jul 28, 2002
    Messages:
    32
    Likes Received:
    0
    Bandwidth will always be a problem, but I think the latest FPGAs demonstrate that internal and external data bandwidth can be increased dramatically with modern IC manufacturing technologies.

    The slowed pace of performance/price scaling in IC manufacturing is already an established fact of reality right now, and the slowdown will naturally increase with time. What Intel says are just sweet, unspecific words to keep investors and their personal pride calm.

    Why would you want to stall a gigantic serial processing pipeline with massive parallel streaming/processing? Right! It makes no sense at all.

    There is already barely any cost advantage per transistor from the 32/22 nm step.

    You should stop thinking about the classical OGL/D3D rendering pipeline. Computer graphics rendering is no magic that needs endless iterations of new approaches.
    The list of all rendering problems can be put on a single page, and the list of unique concepts to solve them can be condensed to a couple of lines.


    Btw.: Tim Sweeney should STFU and realize that he has never published/introduced a novel solution for interactive graphics. He says so many obviously wrong things that my eyes are bleeding.
     
    #320 HAL-10K, Aug 31, 2013
    Last edited by a moderator: Aug 31, 2013