IMR "Wall" Limits V PVR

Discussion in 'General 3D Technology' started by PVR_Extremist, May 21, 2002.

  1. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    Hence the word "trying" in MFA's post.... :)
     
  2. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    DreamCast

    The DreamCast was a platform where every single developer knew they were coding for a deferred architecture and yet the games developed did not blow away games on IMR systems. You can't blame it on the developers.

    The IMR systems to date have been bottlenecked in other areas such as fillrate and geometry performance. It's all well and good that they don't need 500 MHz DDR to work, but they skimped on fillrate and T&L.


    Developers can't very well push 20x overdraw/multipass on a massive scale if the CPU/T&L/fillrate of the unit can't handle it.

    But of course, many people have been harping on this for years while all the naysayers bashed IMR vendors for boosting fillrate and wasting effort on T&L.
     
  3. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    Re: DreamCast

    Don't you mean "the TBR/deferred rendering systems to date..."?
    And if so, then I mostly agree with what you have to say.
     
  4. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,062
    Likes Received:
    1,021
    To address the original question: there don't seem to be any hard limits. Performance development has been quite predictable, with the larger jumps coming when the memory subsystem gets a factor-of-two architectural boost.

    Generally speaking, graphics is well suited to parallel processing, which would seem to point us in the general direction of tilers, though not necessarily deferred renderers.

    Looking at the trends of game graphics, we see
    1. Increased polygon count
    2. More complex environment = more overdraw
    3. More work per pixel

    1 would seem to favour IMRs, 2 would seem to favour DMRs, and 3 could go either way, with a theoretical edge for DMRs but with problems too.

    (As usual, programmers will adapt to the limitations of the platforms available, so for DMRs to take over the market they first have to outperform IMRs on their own turf, so to speak, as IMRs set the standard. But that is market dynamics, not technology.)

    As has been pointed out, memory bandwidth would seem to be the factor that places the upper bound on IMR performance. (And for that matter DMRs, but at a slightly different point and for slightly different reasons. Data flow is _always_ limited by bandwidth. Doh.) This will be the year when 256-bit DDR takes off, we can expect the usual clock ramps, and we have 4-bits-per-pulse tech waiting in the wings if necessary. Extrapolating, this should take us to a nominal 100 GB/s within five years or so. Not too shabby, but not too exciting either, as it is only a factor of five after all, and the estimate is not pessimistic. However, that is time enough for GPUs to be able to carry sizeable amounts of memory on-chip, which is one way to reduce the dependence on off-chip memory bandwidth.
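
    A back-of-the-envelope sketch of that extrapolation (all numbers are illustrative assumptions, not vendor figures): roughly 20 GB/s from a 256-bit DDR interface today, compounded at ~40% a year for five years, lands at about 100 GB/s.

    ```cpp
    #include <cstdio>

    int main() {
        const double busBits   = 256.0;   // assumed 256-bit interface
        const double memClkMHz = 310.0;   // assumed ~310 MHz memory clock
        const double pumps     = 2.0;     // DDR: 2 transfers per clock

        double todayGBs = (busBits / 8.0) * memClkMHz * 1e6 * pumps / 1e9;
        std::printf("today   : ~%.0f GB/s\n", todayGBs);   // ~20 GB/s

        // Assume bandwidth compounds ~40% per year (clock ramps plus an
        // eventual move to 4 transfers per clock).
        double future = todayGBs;
        for (int year = 1; year <= 5; ++year)
            future *= 1.4;
        std::printf("in 5 yrs: ~%.0f GB/s\n", future);     // ~107 GB/s
    }
    ```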

    The problem for IMRs is that bandwidth development is still pretty slow compared to the overall performance increases we could envision in that time frame. So we need to get smarter about how we use it, and indeed we are, using different techniques both to reduce unnecessary rendering (exemplified for instance by HyperZ) and to reduce unnecessary polygon load (with Matrox's depth-adaptive tessellation as the latest, but certainly not last, example). Peering deeply into the crystal ball to predict the farthest front of technology (five years or so), we should be able to expect doubled rendering performance every year during that period.

    (So, extrapolating from the latest benchmarks/demos, Comanche 4 and CodeCreatures, in five years we will be able to marvel at large numbers of nicely modelled static trees rather than either/or. Oh joy.)

    Deferred rendering is attractive due to the fundamental reasonableness of only rendering what is actually seen. But it doesn't remove all bottlenecks, and it introduces some extra work of its own, and the real question is whether the bottleneck it removes is so much more limiting than the next bottleneck down the line + DMR overhead.... If not, extending and improving IMRs may be more practical in an application environment where IMR limitations are taken into account in graphics engine development and applications.

    Entropy
     
  5. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    Probably the major memory bandwidth advantage for tiling comes from the locality of reference of the depth and frame buffer information, more than from deferred rendering as such. Since the depth and frame buffer information for a tile fits entirely on the chip, the z's and frame colors stay on chip for all the depth and color computations. This allows very high on-chip memory bandwidth to be used, much like eDRAM solutions, only without the large on-chip memory requirements.
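
    A minimal sketch of that locality argument in C++ (Fragment, renderTile, etc. are purely illustrative): the per-tile z and color values live in small local arrays standing in for the on-chip buffers, and only the finished tile is written out to external memory.

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <iterator>
    #include <vector>

    constexpr int TILE = 32;                       // 32x32 pixel tile

    struct Fragment { int x, y; float z; uint32_t color; };

    void renderTile(const std::vector<Fragment>& frags,
                    uint32_t* framebuffer, int fbWidth,
                    int tileX, int tileY)
    {
        float    zbuf[TILE * TILE];                // stands in for on-chip depth
        uint32_t cbuf[TILE * TILE];                // stands in for on-chip color
        std::fill(std::begin(zbuf), std::end(zbuf), 1.0f);
        std::memset(cbuf, 0, sizeof(cbuf));

        // Every depth test and color write hits the local buffers only.
        for (const Fragment& f : frags) {
            int idx = (f.y - tileY) * TILE + (f.x - tileX);
            if (f.z < zbuf[idx]) { zbuf[idx] = f.z; cbuf[idx] = f.color; }
        }

        // One write of the resolved tile to external memory at the end.
        for (int row = 0; row < TILE; ++row)
            std::memcpy(&framebuffer[(tileY + row) * fbWidth + tileX],
                        &cbuf[row * TILE], TILE * sizeof(uint32_t));
    }
    ```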

    Deferred rendering is an added bonus for memory bandwidth since it primarily reduces texture bandwidth which is generally less intensive at the moment. In the future, it will eliminate wasted pixel shader computations which will become critically important.

    However, by using a combination of compressed z's, hierarchical z buffering, and multiple z checks per pixel, combined with application driven deferred rendering (an unshaded pass followed immediately by a shaded pass), IMRs get almost all of the memory bandwidth savings of a deferred rendering tiler without any of its problems (API incompatibilities, etc.).
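
    For what it's worth, a minimal OpenGL sketch of that application-driven arrangement (an unshaded depth-only pass followed immediately by a shaded pass); the GL state calls are real, but drawScene() is a hypothetical app callback that issues the scene's draw calls.

    ```cpp
    #include <GL/gl.h>

    void drawScene(bool shaded);   // hypothetical: issues the scene's draw calls

    void renderFrameWithDepthPrepass()
    {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        // Pass 1: lay down depth only; color writes off, no expensive shading.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_TRUE);
        glDepthFunc(GL_LESS);
        drawScene(false);

        // Pass 2: full shading; GL_EQUAL rejects occluded fragments before any
        // expensive texture fetches or per-pixel math are spent on them.
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_FALSE);
        glDepthFunc(GL_EQUAL);
        drawScene(true);
    }
    ```

    With hardware that does early/hierarchical Z rejection, the second pass then only spends shading work on visible pixels, which is where the savings described above come from.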

    In the future, z queries will help reduce memory bandwidth even more (though they work equally well for both TBRs and IMRs).

    The memory bandwidth "wall" is a bit illusory. There are many memory bandwidth technologies still untapped. 256-bit buses are currently popular. Embedded RAM of one type or another is still a bit off but holds a lot of promise. MCMs open up many possibilities. Frequencies continue to climb. Better caching mechanisms, especially for geometry, are on the horizon. On-chip tessellation and displacement maps will also help in the geometry bandwidth department.

    Chip designers forecast as best they can what the technology and cost structure of memories will be like when their design is built a couple of years out. Different 3d vendors take different memory approaches, but they all create a design that provides the memory bandwidth to meet their goals, using whatever they think is going to be the least expensive and best approach at the time of product launch. That's their job of course.

    If anyone remembers my posts from the past, they know that I like tilers. However, it is no coincidence that all of the major 3D hardware vendors at the high end, including Nvidia, ATI, 3dlabs, and Matrox, use immediate mode renderers. Their engineers are all very aware of the tradeoffs between tiling architectures and IMRs, and they have chosen IMRs for a reason. So it may seem that I prefer IMRs. I do not. I simply prefer whatever works best; other than that I have no preferences either way. If TBRs really are the fastest solution and can produce the highest quality, high-end 3D graphics, then they must demonstrate it with purchasable products the way IMRs have been doing for some time.

    I think if TBR were so clearly the way of the future, the way high precision color, programmability, and high quality AA are, then vendors would have pushed for the API changes to really support it long ago and would all be using it by now. The fact that all the largest players in the market have not done this means their engineers feel there are better alternatives, and until there are purchasable products to demonstrate otherwise, they have not been proven wrong.

    I for one would really like to see a fully maxed out TBR with all the pixel pipelines, external memory bandwidth (plenty of this is still needed of course), programmability, vertex shader performance, high quality AA, etc. needed to fully show what the approach is capable of. A real contender on the TBR side would be interesting to say the least.
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,807
    Likes Received:
    473
    High quality AA is the future? More like the past repeating itself :) (Warp5)
     
  7. Jerry Cornelius

    Newcomer

    Joined:
    May 5, 2002
    Messages:
    116
    Likes Received:
    0
    This argument only holds so much water. Look at EAX and A3D, Beta and VHS, four-stroke engines with pushrod valvetrains, etc...

    I think sometimes the first thing with its foot in the door gets all the marbles. Once it becomes accepted and understood, it's a high risk to depart from the norm and do something else, especially when you have to sell it.

    I don't know what "the future of 3D rendering" is, but it's a safe bet it's going to involve realtime shadows and lighting. Once ray tracing gets into the "picture", scene capturing will be unavoidable. Once you have that, you may as well have a tile-based deferred renderer.
     
  8. gking

    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    130
    Likes Received:
    0
    I haven't given this much (any) thought, but I wouldn't be surprised if homogeneous recursive descent rasterization introduces some added difficulties with deferred rendering.

    I'm sure there is a deferred implementation that could work with unprojected, unclipped geometry; however, it probably wouldn't be very fun to implement in hardware (like I said -- I haven't given this any thought, so if you know of an algorithm to do this, I'd love to see it).

    Most future engines will probably use a multipass technique like Doom III in order to avoid running costly shaders on occluded pixels -- render just Z in one pass, and then do all lighting and shading in subsequent passes. The big loss with this technique is geometry throughput; however, we're rapidly approaching a point where burning thousands/millions of (untextured, unlit) triangles to save fillrate is a given, since vertex throughput is so high (and cheaper to add than pixel throughput).

    I like having occlusion query capability; however, games use their visibility information for *so much* more than just rendering (e.g., AI, collision detection, physics, sound, etc.) that adding occlusion query capability to graphics cards isn't going to revolutionize game engines. If you're careful about pipeline stalls and flushes, it does improve performance.
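
    A minimal sketch of how such a query gets used (GL 1.5 / ARB_occlusion_query style; drawBoundingBox() and drawObject() are hypothetical app helpers), including the readback stall mentioned above.

    ```cpp
    #include <GL/gl.h>

    void drawBoundingBox(int obj);   // hypothetical: cheap proxy geometry
    void drawObject(int obj);        // hypothetical: the real, expensive draw

    void drawIfVisible(GLuint query, int obj)
    {
        // Count how many samples of the bounding box pass the depth test,
        // without writing color or depth.
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        glBeginQuery(GL_SAMPLES_PASSED, query);
        drawBoundingBox(obj);
        glEndQuery(GL_SAMPLES_PASSED);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);

        // Reading the result back immediately is the potential pipeline
        // stall; in practice you would batch many queries and collect the
        // results later in the frame.
        GLuint samples = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
        if (samples > 0)
            drawObject(obj);         // only pay for what might be visible
    }
    ```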

    Yes, but that "technological wall" is also shared by deferred renderers. Even with fancy Z-rejection circuitry, for any given scene, you will need to be able to fill log2(n)*resolution pixels (n is the average overdraw for the scene) every frame. In comparison, ray-tracing doesn't have this requirement. The common argument is that as depth complexity and resolution continue to increase, the added per-fragment cost of doing ray tracing is more than made up for by the fact that you only need to trace 1 ray/fragment. Multi-pass visibility algorithms like Doom III's help skirt this wall; however, many people have argued that ray-tracing hardware will be a necessity, since all Z-buffer hardware is subject to the same theoretical shortcomings.
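
    To put rough numbers on that log2(n)*resolution claim (figures are only an example, not measurements):

    ```cpp
    #include <cmath>
    #include <cstdio>

    int main() {
        const double width = 1024.0, height = 768.0;
        const double overdraws[] = {2.0, 4.0, 8.0, 16.0};   // average depth complexity n
        for (double n : overdraws) {
            double pixels = std::log2(n) * width * height;  // fill work per frame
            std::printf("overdraw %2.0fx -> ~%.1f Mpixels filled per frame\n",
                        n, pixels / 1e6);
        }
    }
    ```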
     
  9. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    Ok so assume 1 ray per fragment. IMG already achieves this with PowerVR (re: your comment on deferred renderers needing to fill log2(n)*res fragments per frame).

    If you mean ray-tracing as in shadows, reflection, and refraction, then this means multiple rays per fragment AFAICS.
     
  10. gking

    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    130
    Likes Received:
    0
    Deferred renderers still need to perform all the depth tests -- there isn't much that a deferred renderer offers over a multipass technique such as Doom III's.

    And WRT shadows, reflections, etc -- each of those effects is an additional 1 ray/fragment (per layer of reflection/refraction), as opposed to an expected log2(n) using a Z-buffer renderer. There is a logarithmic advantage to using ray tracing (over both deferred and immediate mode renderers); however, the constant cost is so high that Z-buffering is still advantageous.
     
  11. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,420
    Likes Received:
    179
    Location:
    Chania
    Dumb layman's question: Both the IMR and TBR approaches seem to have advantages as well as disadvantages. Why not attempt, within the realm of possibility, to combine both approaches' advantages into one architecture in the future (eDRAM included when it becomes mainstream), trying to overcome as many of either side's disadvantages as possible?

    From my rather simplistic viewpoint, vendors so far do seem to be taking the advantages of deferred rendering into account, making small steps in the above direction. Someone please correct me if I'm wrong.
     
  12. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    922
    Likes Received:
    1
    Location:
    Germany
    You mean something like Fluid-Studio's REVi 3D engine? I don't know if it is still in development, because the info is gone, but IMHO they had this sort of 3D engine in development.
    Link: http://www.flipcode.com/cgi-bin/iotd.cgi?ShowImage=05-27-2000
     
  13. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,807
    Likes Received:
    473
    We are rapidly reaching a point where we will start to want to fill a couple of shadow buffers for every frame, for which we will need all the transform power we can lay our hands on.

    Remember, reality is 80 million polygons :) Only a minority of those go directly on screen.

    Fluid-studio's method is an unknown as far as performance is concerned, but its creator does not present it as a fully general way of occlusion culling ... it does need preprocessing.

    I've said this before, but I'll repeat it anyway ... raytracing does not make sense for first hits and shadow rays. Raytracing might only have to shade a pixel once, but it has to test rays against all the surfaces which are potentially visible for a pixel ... compared to a renderer a la Greene's hierarchical Z-buffer paper, this will result in almost the same number of tests per pixel. With deferred shading thrown into the mix, all raytracing has over the Z-buffer (for first hits and shadow rays) is a slight storage advantage (because it can shade a pixel immediately) and its ability to subsample a scene ... which is only really useful if you are reusing samples from previous frames (otherwise subsampling == aliasing).
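
    As a minimal illustration of the kind of test Greene's hierarchical Z-buffer does (names and layout here are purely illustrative): each 8x8 block keeps the farthest depth of its pixels, so whole primitives can be rejected with a handful of compares instead of per-pixel z reads.

    ```cpp
    #include <vector>

    struct HiZLevel {
        int blocksX = 0, blocksY = 0;
        std::vector<float> maxDepth;   // farthest z in each 8x8 pixel block
    };

    // Returns true if a primitive covering [x0,x1) x [y0,y1) whose nearest
    // depth is zNear is hidden everywhere, so it can be culled without
    // touching any per-pixel z values (smaller z = nearer here).
    bool coarselyOccluded(const HiZLevel& hiz,
                          int x0, int y0, int x1, int y1, float zNear)
    {
        for (int by = y0 / 8; by <= (y1 - 1) / 8; ++by)
            for (int bx = x0 / 8; bx <= (x1 - 1) / 8; ++bx)
                if (zNear < hiz.maxDepth[by * hiz.blocksX + bx])
                    return false;      // might be visible in this block
        return true;                   // behind the farthest z everywhere
    }
    ```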
     
  14. PVR_Extremist

    Newcomer

    Joined:
    Feb 7, 2002
    Messages:
    194
    Likes Received:
    1
    80 Million Polygons? I thought reality was death and taxes :roll:
     
  15. pascal

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,830
    Likes Received:
    49
    Location:
    Brasil
    The same price range with much better performance is the key to establishing the deferred rendering idea.
    The technology has already been proven by PowerVR. It works, and it works very well.
     
  16. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    gking,

    Unless I'm missing something, you are comparing multiple depth tests per fragment against multiple ray-triangle intersections per fragment.

    Basically, I don't see how ray-tracing a fragment is a constant-time operation...

    Regards,
    Serge
     
  17. Ty

    Ty Roberta E. Lee
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,448
    Likes Received:
    52
    Well, not much has changed in this regard since the original 3Dfx came out. That has always been one of the advantages over IMRs, yet still to this day (trying to get this topic back on track), DRs have not overtaken IMRs. This was one of the original questions that started this thread, "Is there still some kind of "technological wall" which will hamper IMR performance in the future?", which implies that DRs would surpass IMRs (because IMRs would have to rely on expensive memory, etc.). To this day, it still hasn't happened, nor does it appear to be happening anytime soon imo.
     
  18. pascal

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,830
    Likes Received:
    49
    Location:
    Brasil
    The wall is in front of you right now.
    How do you play Doom3 at 1024x768x32 at 60 fps with a $100 card today?

    The technology wall is simply avoided by game developers when they develop a new game. Sometimes a developer pushes a little and the wall appears.
     
  19. Ty

    Ty Roberta E. Lee
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,448
    Likes Received:
    52
    If the "wall" is here now, then you are saying that from this point on IMRs are going to be surpassed by DRs. No one mentioned playing Doom3 at that performance level with a $100 card as proof of the demise of IMRs, though. Or are you implying that a $100 DR card will be able to play Doom3 at that performance level? I'm not sure I understand the reference to it.

    Recapped, the point of the thread was that a long time ago there supposedly was this memory bandwidth "wall" that would cause IMRs to go away because they couldn't keep up with DRs. It turns out that is no more true today than it was back then, which is why Tino asked his questions. In other words, memory and other bandwidth-saving techniques have evolved to keep pace with the bandwidth requirements of IMRs and games. I'm not saying the wall doesn't exist, I'm just saying that, imo, IMRs haven't hit it yet. Maybe soon, maybe not, I don't know.
     
  20. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    Do you think at some point we will move away from texturing polys?
    To a world created entirely out of flat, one-color polygons (if you have enough of them, this is possible!!)?
    And would this (if it is the eventual end product) work very poorly on a TBR? Because in that situation geometry, not fillrate, would be king.
     