GPU vs Multi-Core CPU

Discussion in 'Architecture and Products' started by epicstruggle, Jul 15, 2006.

  1. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    How much supersampling do you think is feasible at the mid-high-end?

    I doubt it. A lot of triangles submitted for rendering don't need to be stored at all. Show me one wireframe shot of a current game where the triangles are really that small.
     
  2. SPM

    SPM
    Regular

    Joined:
    Dec 18, 2005
    Messages:
    639
    Likes Received:
    16
    Well if DSP type CPU cores functioning as GPUs takes off, I can tell you where it will happen first - in mobile phones, mobile phones with movie clip cameras, and HDTVs. These devices will have pretty awesome DSPs to get every bit of compression/decompression possible out of movies, and when you want to play games on them (not the most cutting edge of course), you won't be using the DSPs for decoding/encoding, which means you might as well use them as a GPU rather than add hardware for that.
     
  3. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    I'm not talking about supersampling, just multisampling, and multisampling has the same effect on z-buffer accesses as supersampling (which is the most stark difference between TBDRs and IMRs).


    Well, two things:
    1. You can only effectively remove triangles that are submitted that are either backface culled, or are entirely clipped outside of the view area.
    2. Since a triangle requires significantly more storage than a pixel, triangles can be quite a bit larger than pixel-sized. Exactly how much would obviously depend upon how many attributes are used.
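
    A back-of-envelope sketch of point 2. Every size here is an illustrative assumption (float4 attributes, 32-bit colour plus 32-bit depth per pixel), and the function names are made up for the example; none of it comes from a real chip.

```python
# Rough estimate: how many pixels must a stored triangle cover before its
# storage cost matches the per-pixel colour + Z it stands in for?
# All sizes are assumptions chosen for illustration only.

BYTES_PER_FLOAT = 4
COMPONENTS_PER_ATTRIBUTE = 4   # assume every attribute is a float4
BYTES_PER_PIXEL = 4 + 4        # assume 32-bit colour + 32-bit depth


def triangle_bytes(num_attributes, shared_vertices=False):
    """Bytes to store one triangle with `num_attributes` per-vertex
    attributes (position included)."""
    # In a long strip each new triangle adds roughly one new vertex.
    vertices = 1 if shared_vertices else 3
    return vertices * num_attributes * COMPONENTS_PER_ATTRIBUTE * BYTES_PER_FLOAT


def break_even_pixels(num_attributes, shared_vertices=False):
    """Triangle area in pixels at which triangle storage equals pixel storage."""
    return triangle_bytes(num_attributes, shared_vertices) / BYTES_PER_PIXEL


if __name__ == "__main__":
    for attrs in (2, 4, 8):
        print(attrs, break_even_pixels(attrs), break_even_pixels(attrs, True))
```

    Under these assumptions the break-even area scales linearly with the attribute count, which is the dependence described above.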

    Anyway, don't expect any game tests from me for at least another week. I'm off in Santa Fe for a conference until then :)
     
  4. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    No possible way. Mobile phones are exceptionally sensitive to power requirements, and thus you'd definitely want to go for the power advantages of having specialized hardware.
     
  5. SPM

    SPM
    Regular

    Joined:
    Dec 18, 2005
    Messages:
    639
    Likes Received:
    16
    Are you suggesting that having a GPU and a DSP on a mobile phone is going to consume less power than the DSP on its own?

    Are you suggesting that DSPs can't be power efficient, but GPUs can?

    Mobile phones have exceptionally high bandwidth cost and low bandwidth requirements for video, so this is the real limiting factor, not battery life. This is also the reason why Toshiba and Sony have been talking about use of Cell in mobile phones (probably one or two special low power SPEs teamed up with a separate embedded low power CPU like an ARM processor).
     
  6. JF_Aidan_Pryde

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    601
    Likes Received:
    3
    Location:
    New York
    A few games are starting to use a fully deferred rendering engine (eg. Stalker). How does this relate/affect deferred rendering hardware?
     
  7. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    But multisampling can be an "always-on" feature in the low-end as well, especially on a deferred renderer where it is even cheaper than on an IMR. So that does not increase the "resolution span".

    1. There's more. If you compare the list of submitted triangles to the list of those triangles contributing to the final rendering, you will find triangles missing from the latter for several reasons: outside the view area, backfacing, not hitting a single pixel/sample, hidden by other triangles, rejected by stencil test, and maybe others.

    While it is extremely hard to get the ideal, minimal list of contributing triangles under all circumstances, all of the above reasons can in some way be used to reduce the number of triangles that have to be stored.
    For example, if a game doesn't use a geometry LOD system and a 10,000 triangle object viewed at a distance happens to cover only 20 pixels/samples, at least 9,980 triangles can be safely culled. Of course such a case usually means really bad aliasing and should be avoided.
    There are also methods where an object may be stored but never read, like predicated rendering.

    2. Obviously. And on how much bandwidth deferred rendering can save in other areas. But don't forget that, as triangles get smaller, IMRs lose efficiency as well.
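
    The LOD example in point 1 comes down to a simple bound. A minimal sketch, assuming opaque geometry so that each pixel/sample ends up showing at most one triangle (the function name is hypothetical):

```python
def min_culled(submitted_triangles, covered_samples):
    """Lower bound on the triangles that cannot contribute to the image.

    Assumes opaque geometry: each covered sample shows at most one triangle,
    so at most `covered_samples` triangles can be visible.
    """
    max_visible = min(submitted_triangles, covered_samples)
    return submitted_triangles - max_visible


if __name__ == "__main__":
    print(min_culled(10_000, 20))
```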
     
  8. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    It depends. If the API allows the application to tell the driver that MRTs used to store surface properties are to be read 1:1 later, a TBDR can keep that data entirely on-chip, thus saving a huge amount of bandwidth and memory space. It could even be combined with multisampling.

    A Z-first pass hurts a TBDR because it means doing the same work twice. However it should be very easy for any application to skip such a pass.
     
  9. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Absolutely not. What I'm suggesting is that you just can't make generalized hardware as power-efficient as specialized hardware. If you want to make the DSP do the graphics as well, you're either going to have very subpar graphics (which sort of defeats the entire point), or have to have a much more powerful DSP. And not only that, but dedicated hardware can do much better at managing things like memory bandwidth than generalized hardware.

    The primary concern with dedicated hardware, of course, is that parts that are not in use may still draw power. For this you need to have good power savings circuit design, but the technology is already there.
     
  10. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    I don't think you can possibly do the stencil/depth tests while binning the triangles, except perhaps in a very gross sense (Hierarchical-Z, for instance). To do so would require you to have an external z-buffer. Your comment about sub-pixel triangles makes some sense, but due to the aliasing inherent in that, I don't think that's something that IHVs should seek to optimize for.

    Yes, but the things that cause a loss of efficiency in IMRs with small triangles will cause a similar loss in efficiency with TBDRs (e.g. you need to have a quad to calculate the partial derivatives for texture coordinates).
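
    The quad requirement can be sketched as follows. This assumes the common scheme of shading pixels in 2x2 blocks and taking finite differences between neighbours; the function names are hypothetical and no particular GPU's implementation is implied.

```python
def quad_derivatives(u):
    """Finite-difference derivatives of one texture coordinate over a 2x2 quad.

    `u[row][col]` holds the coordinate at each pixel (row = y, col = x).
    Returns (ddx, ddy): ddx has one value per row, ddy one value per column.
    """
    ddx = [u[0][1] - u[0][0], u[1][1] - u[1][0]]
    ddy = [u[1][0] - u[0][0], u[1][1] - u[0][1]]
    return ddx, ddy


def quad_efficiency(covered_pixels):
    """Fraction of the quad's shading work that lands on covered pixels.

    All four pixels are shaded so the differences exist, even when the
    triangle covers fewer of them.
    """
    return covered_pixels / 4


if __name__ == "__main__":
    print(quad_derivatives([[0.0, 0.5], [0.25, 0.75]]))
    print(quad_efficiency(1))
```

    A triangle covering a single pixel still occupies a whole quad, so only a quarter of the shading work is useful, whichever kind of renderer issued it.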
     
  11. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    You need to do a Z-first pass for stencil shadows, though.
     
  12. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    No more gross than the hierarchical Z used on any modern IMR.

    Only a low-resolution version; this may even fit on chip easily, depending on tile size.
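
    A minimal sketch of a coarse, low-resolution depth test of the kind discussed here. The class and method names are hypothetical, and it assumes a "less than" depth test where smaller z is nearer; real hardware differs in detail.

```python
class CoarseZ:
    """One farthest-depth value per screen tile (smaller z = nearer)."""

    def __init__(self, tiles_x, tiles_y, far=1.0):
        self.max_z = [[far] * tiles_x for _ in range(tiles_y)]

    def is_hidden(self, tiles, tri_min_z):
        """True if the triangle's nearest depth lies behind the farthest
        stored depth in every tile it overlaps."""
        return all(tri_min_z > self.max_z[ty][tx] for tx, ty in tiles)

    def update_full_cover(self, tile, tri_max_z):
        """An opaque triangle that fully covers a tile bounds every depth in
        that tile by `tri_max_z`, so the stored farthest depth can shrink."""
        tx, ty = tile
        self.max_z[ty][tx] = min(self.max_z[ty][tx], tri_max_z)


if __name__ == "__main__":
    cz = CoarseZ(2, 2)
    cz.update_full_cover((0, 0), 0.3)
    print(cz.is_hidden([(0, 0)], 0.5))
```

    The table holds one value per tile rather than one per pixel, which is why a low-resolution version can plausibly stay on chip.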

    It's actually the culling of triangles that don't cross sample points; this does not cause aliasing, as by definition they are never rasterised.

    That isn't actually a given; there is some dependency on the arrangement of your pipeline and the presence of certain other provisions.

    Cheers,
    John.
     
  13. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Hierarchical Z doesn't require an external Z-buffer on IMRs.

    But if you have a well working geometry LOD system you shouldn't have much trouble with very small triangles requiring too much bandwidth.
    Sub-pixel triangles do happen, unfortunately, and "optimizing" for them can be cheaper than keeping them. Degenerate triangles need to be detected as well (since many people use them to generate long triangle strips).
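
    Detecting the degenerate triangles used to stitch strips together can be sketched like this. It assumes an indexed strip and only catches triangles that repeat an index; zero-area triangles with distinct indices would need a separate position test. The function name is hypothetical.

```python
def strip_triangles(indices):
    """Expand an indexed triangle strip, dropping degenerate triangles.

    A triangle is treated as degenerate when any two of its indices match,
    which is how strips are commonly joined into one long strip.
    """
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        if a == b or b == c or a == c:
            continue  # stitching triangle: zero area, never rasterised
        # Every second triangle in a strip has flipped winding; swap to fix.
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


if __name__ == "__main__":
    # Strip 0-1-2-3 joined to strip 4-5-6 by repeating indices 3 and 4.
    print(strip_triangles([0, 1, 2, 3, 3, 4, 4, 5, 6]))
```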

    For the rest, see JohnH's answers.


    It's not the culling itself that causes aliasing, but the fact that only one (almost randomly selected) of many small triangles contributes to the final image. By ignoring part of the information you get "sampling holes". The only way around this is massive supersampling or a geometry LOD system.
     
  14. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    If a triangle doesn't cross a sample point it's not rasterised anyway, so optimising it out doesn't cause holes to appear that wouldn't have been present anyway. This behaviour comes from clearly defined rasterisation rules; any resulting holes are actually the result of an incorrectly defined mesh (generally as a result of vertices on common edges that are not shared and not exactly equal).

    Regards,
    John.
     
  15. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    I'm not referring to holes in the rendered geometry, but "sampling holes" as in the distance between two samples being too large in relation to the frequency of the sampled signal, i.e. geometry. A highly complex mesh covering only a few pixels (sampled only at a few points) often results in aliasing. As I said, it's not the culling that causes aliasing.
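
    A one-dimensional sketch of such "sampling holes". A stripe pattern stands in for dense geometry; the pattern, the sample positions and the function names are all assumptions made for illustration.

```python
def stripe(x, period):
    """1 in the first half of each period, 0 in the second half."""
    return 1 if (x % period) < period / 2 else 0


def sample_row(period, num_pixels, samples_per_pixel=1):
    """Average evenly spaced point samples inside each pixel."""
    row = []
    for px in range(num_pixels):
        total = 0
        for s in range(samples_per_pixel):
            x = px + (s + 0.5) / samples_per_pixel
            total += stripe(x, period)
        row.append(total / samples_per_pixel)
    return row


if __name__ == "__main__":
    # Detail with a period of one pixel: the true average is 0.5 everywhere.
    print(sample_row(1.0, 4))      # one sample per pixel
    print(sample_row(1.0, 4, 4))   # four samples per pixel
```

    With one sample per pixel every sample lands on the same half of the pattern, so the result is a solid colour that ignores half of the signal. Four samples per pixel recover the average, which is the supersampling route; a geometry LOD system would instead remove the high-frequency detail before it is sampled.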
     
  16. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    The only way of fixing that is to increase your sample rate, i.e. apply AA. Maybe I've missed Chalnoth's original point, but it still remains valid to cull polys that don't cross a sample point.

    Later,
    John.
     
  17. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Hence the word, "except."

    Degenerate tris are easy. But omitting triangles that are within the view area but don't cross any sample points would require a significant amount of per-triangle computation, for information that would most likely have to be thrown away.

    Of course there is. But it's not like you don't have these problems in a TBDR as well. It's all about optimizing for small triangles, which is going to have to happen either way.
     
  18. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    595
    Likes Received:
    18
    Location:
    UK
    Not true, it's actually very cheap to discard the bulk of small triangles that don't cross a sample point.
    There are techniques that require HW that already exists in one form in certain tilers. That's not to say they can't be done on an IMR; the relative area cost is just higher.
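
    The post does not name the techniques, so the following is only one possible approach, sketched under assumptions: sample points sit at pixel centres (integer + 0.5), and a triangle is discarded when its bounding box contains no centre. Function names are hypothetical.

```python
import math


def spans_centre(lo, hi):
    """True if some pixel centre k + 0.5 lies in the closed range [lo, hi]."""
    first_centre = math.ceil(lo - 0.5) + 0.5  # smallest centre >= lo
    return first_centre <= hi


def can_cull(tri):
    """Conservative test on a triangle given as three (x, y) screen positions.

    If the bounding box misses every pixel centre on either axis, the
    triangle cannot cover a sample. A False result only means "maybe
    visible"; the rasteriser still decides those cases.
    """
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return not (spans_centre(min(xs), max(xs)) and spans_centre(min(ys), max(ys)))


if __name__ == "__main__":
    print(can_cull([(0.6, 0.6), (0.9, 0.6), (0.6, 0.9)]))   # tiny, between centres
    print(can_cull([(0.1, 0.6), (5.0, 0.7), (2.0, 0.9)]))   # long thin sliver
    print(can_cull([(0.2, 0.2), (0.8, 0.2), (0.2, 0.8)]))   # box contains a centre
```

    The test costs a few min/max operations and comparisons per triangle, and because it never rejects a triangle that could hit a sample it does not change the rendered image.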

    John.
     
  19. epicstruggle

    epicstruggle Passenger on Serenity
    Veteran

    Joined:
    Jul 24, 2002
    Messages:
    1,903
    Likes Received:
    45
    Location:
    Object in Space
    Is it safe to assume that AMD noted this thread and, seeing the threat, decided that ATI needed to be bought? I think so. :cool:
     
  20. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Of course. You didn't know that we here at B3D control the entire industry?
     