IMR "Wall" Limits V PVR

Discussion in 'General 3D Technology' started by PVR_Extremist, May 21, 2002.

  1. Jerry Cornelius

    Newcomer

    Joined:
    May 5, 2002
    Messages:
    116
    Likes Received:
    0
    Of course it is. If you're storing the geometry onboard already, then the extra space required compared to an IMR is going to be a percentage of the video memory, not an order of magnitude more.

    I don't see HOS presenting a problem either. Since the bounds of the final geometry and the destination output buffer will be known beforehand, there shouldn't be any need to generate an entire scene's worth of parametric geometry up front.
     
  2. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    No, that's not true, because, as I said earlier, data in does not equal data out. Quick example: Unreal Tournament 2003. UT2k3 uses prefabs to reuse geometry data. There's an option to disable this "compression" and transform all geometry into world-space at level load time. Vogel said that this takes up to about 16MB in some levels.

    Additionally, don't forget that not all data in the vertex buffers on the card may be visible, and there may be other data that will come straight from the CPU (though hopefully less now with more advanced vertex shaders). In other words, there are too many factors to simply state that scene data will be "a percentage of video memory."
     
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    Sorting and Rasterisation occur on the same frame of data on a tile-by-tile basis. Once an entire frame is captured, a tile is sorted and sent to the rasteriser whilst the next tile is being sorted. So, in reality, you only need to capture one frame of data - once the tiles in a frame have been sorted, that memory space can be reused for the incoming data from the next frame.
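
    To put that in very rough pseudo-code (purely illustrative - the structures and names are made up here, not taken from any real chip or driver), the point is that a tile's memory is handed back as soon as that tile has been rasterised, so the capture of the next frame can reuse it:

    Code:
    // Rough sketch of the tile-by-tile overlap described above; all types and
    // names are invented for the example.
    #include <cstdint>
    #include <vector>

    struct BinnedTile { std::vector<uint32_t> triangleIds; }; // scene-buffer data for one tile
    struct SortedTile { std::vector<uint32_t> frontToBack; }; // the same tile after sorting

    SortedTile sortTile(const BinnedTile& t)
    {
        return SortedTile{ t.triangleIds };        // placeholder for the per-tile sort
    }

    void rasteriseTile(const SortedTile& /*t*/) {} // placeholder for shading the tile

    // Process one fully captured frame. On real hardware the sort of tile i+1 runs
    // while tile i is being rasterised; the memory released each iteration is
    // immediately reusable by the front end that is capturing the next frame.
    void processCapturedFrame(std::vector<BinnedTile>& tiles)
    {
        for (BinnedTile& tile : tiles)
        {
            SortedTile sorted = sortTile(tile);
            rasteriseTile(sorted);
            tile.triangleIds.clear();              // "free" this tile's scene-buffer space
            tile.triangleIds.shrink_to_fit();      // ...so the next frame's data can take it
        }
    }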
     
  4. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    That doesn't make any sense to me. The incoming data from the next frame arrives in an immediate-mode fashion. If the scene buffer data is stored on a per-tile basis, the memory freed will not necessarily be available for incoming data. If the scene buffer data is stored sequentially in the order it arrives, then it would be freed in a chaotic manner, and therefore be pretty much unusable for incoming data.
     
  5. Jerry Cornelius

    Newcomer

    Joined:
    May 5, 2002
    Messages:
    116
    Likes Received:
    0
    IMO we aren't likely to see levels that make extensive use of complex primitives. It's been tried before and only goes so far.

    I suppose there are inherently some geometry compression techniques that are not going to work well with scene capturing. In the real world, I doubt this will have much impact anytime soon, if at all.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    That much is evident.

    Look, you are not rendering one frame and sorting another.

    The incoming data is captured and arranged in memory via pointer lists of tiles to triangles. Once you have a full scene, sorting begins on a tile-by-tile basis, and once one tile has been sorted, rasterisation of that tile begins; so rasterisation is operating one tile behind the sorting. Now, once rasterisation has occurred, the scene data for that tile is no longer needed, so its memory can be freed and reused for the incoming data of the next frame. So, in theory, you don't need the full whack of memory for two whole scenes' worth, just one (and a bit, so you maintain efficiency across the swap from one frame to the next).

    AFAIK they don't reuse the memory on a per-tile basis, but in groups of tiles. IIRC there is a patent available describing the scheme.
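
    For what it's worth, the "pointer lists of tiles to triangles" part is really just binning. Something roughly like this sketch (tile size, resolution and names are all invented for the example, and the caller is assumed to have sized the bins to one list per screen tile):

    Code:
    // Illustrative binning sketch: append each incoming triangle's index to the
    // list of every tile its screen-space bounding box overlaps.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    constexpr int kTileSize  = 32;                 // assumed tile dimensions
    constexpr int kTilesWide = 1024 / kTileSize;   // assumed 1024x768 screen
    constexpr int kTilesHigh = 768  / kTileSize;

    struct ScreenTriangle { float minX, minY, maxX, maxY; }; // bounding box after transform

    // One list of triangle indices per tile: the "pointer list" the sorter later walks.
    using TileBins = std::vector<std::vector<uint32_t>>;     // size = kTilesWide * kTilesHigh

    void binTriangle(TileBins& bins, const std::vector<ScreenTriangle>& tris, uint32_t id)
    {
        const ScreenTriangle& t = tris[id];
        int tx0 = std::max(0, static_cast<int>(t.minX) / kTileSize);
        int ty0 = std::max(0, static_cast<int>(t.minY) / kTileSize);
        int tx1 = std::min(kTilesWide - 1, static_cast<int>(t.maxX) / kTileSize);
        int ty1 = std::min(kTilesHigh - 1, static_cast<int>(t.maxY) / kTileSize);

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * kTilesWide + tx].push_back(id);
    }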
     
  7. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    All that means to me is that you still need to double-buffer the post-transform data, as well as hold a little bit of extra data for the tile being rendered and the next tile (or group of tiles...).

    That is, I'm reading the procedure as:

    Transform->Sort->Rasterize

    The way you're describing it, it seems some caching is needed between each stage, and double buffering for optimal performance. I still do not see how Transform->Sort can double-buffer anything less than the entire scene, as the scene data arrives in an immediate-mode fashion.

    The double buffering from Sort->Rasterize is, more or less, trivial and should be handled by on-chip caches, as it's only for a single tile or a group of tiles.

    I hope you didn't think I meant that two double-buffers of the whole scene were required...
     
  8. Tessier

    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    10
    Likes Received:
    0
    Location:
    Hungary
    IMHO for a tile based deferred renderer the rendering pipeline is: transform->index->sort->rasterize.

    And I think you need two buffers, both for the transformed geometry data and for the index data.
    The T&L unit writes transformed "frame2" data into the first buffer.
    The indexing unit uses the outputs of the T&L unit and works in parallel with it on the same frame ("frame2").
    At the same time, the sorting unit works on another frame ("frame1"), using the already transformed and indexed data.
    Sorting and rasterization work in parallel with each other, but not on the same tile (sorting is one tile ahead of rasterization).

    In a tile-based system, transformation and sorting cannot work on the same frame, as you have to know which triangles fall into the tile you are currently sorting - and you only have that information once all triangles in the frame have been transformed.
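
    Just to make the double buffering concrete, here is a very rough sketch (nothing here corresponds to real hardware, the names are made up): transform+index fill one scene buffer while sort+rasterise consume the other, and the two are swapped each frame.

    Code:
    // Minimal ping-pong sketch of the double buffering described above.
    #include <utility>
    #include <vector>

    struct SceneBuffer
    {
        std::vector<float>    transformedVerts; // post-T&L geometry
        std::vector<unsigned> tileIndexLists;   // per-tile triangle index data
    };

    void transformAndIndexFrame(SceneBuffer& dst)      { (void)dst; } // stub: T&L + indexing
    void sortAndRasteriseFrame(const SceneBuffer& src) { (void)src; } // stub: per-tile sort + rasterise

    void renderLoop(bool (*moreFrames)())
    {
        SceneBuffer a, b;
        SceneBuffer* building  = &a;  // frame being captured ("frame2")
        SceneBuffer* rendering = &b;  // complete frame being sorted/rasterised ("frame1")

        transformAndIndexFrame(*rendering);    // prime the pipe with the first frame
        while (moreFrames())
        {
            // Conceptually these run in parallel on different frames; sorting can only
            // start on a buffer once every triangle of that frame has been transformed.
            transformAndIndexFrame(*building);
            sortAndRasteriseFrame(*rendering);
            std::swap(building, rendering);
        }
    }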
     
  9. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    I don't think the reason to go to a deferred rendering architecture is to increase bandwidth efficiency. In the future, I think the primary limit on IMRs will be shader execution speed.

    If you have a 100 to 1000 instruction shader, an architecture which can avoid executing long shaders for invisible pixels will be much faster.

    Filling depth/stencil will also be a limit for unified lighting algorithms.

    There is a sort of 2-pass deferred rendering technique you can do with IMRs. In the first pass, you draw the depth buffer and store, in another render target, the parameters you need to feed into a pixel shader. In the second pass, you read the parameters from the previous pass and use them to select which pixel shader to run and with what parameters. Early-Z culling will take care of not executing the long shaders for hidden pixels.
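
    As a toy CPU-side illustration of the idea (all the structures are invented for the example; a real implementation would of course use render targets and the early-Z hardware rather than these loops):

    Code:
    // Pass 1 keeps, per pixel, the nearest depth and the shader inputs for that
    // surface. Pass 2 then runs the expensive shader exactly once per visible
    // pixel and never for hidden ones - which is what early-Z buys you on an IMR.
    #include <cstddef>
    #include <cstdint>
    #include <limits>
    #include <vector>

    struct PixelParams { uint16_t materialId; float u, v, nx, ny, nz; };

    struct ParamBuffer
    {
        int width = 0, height = 0;
        std::vector<float>       depth;
        std::vector<PixelParams> params;

        ParamBuffer(int w, int h)
            : width(w), height(h),
              depth(static_cast<std::size_t>(w) * h, std::numeric_limits<float>::infinity()),
              params(static_cast<std::size_t>(w) * h) {}
    };

    // Pass 1: for each rasterised fragment, keep only the nearest one and its parameters.
    void writeFragment(ParamBuffer& g, int x, int y, float z, const PixelParams& p)
    {
        std::size_t i = static_cast<std::size_t>(y) * g.width + x;
        if (z < g.depth[i]) { g.depth[i] = z; g.params[i] = p; }
    }

    // Stand-in for the 100-1000 instruction shader, selected/parameterised per pixel.
    uint32_t expensiveShader(const PixelParams& p) { (void)p; return 0xFFFFFFFFu; }

    // Pass 2: shade only pixels that survived the depth test in pass 1.
    void shadeVisiblePixels(const ParamBuffer& g, std::vector<uint32_t>& frame)
    {
        for (std::size_t i = 0; i < g.depth.size(); ++i)
            if (g.depth[i] != std::numeric_limits<float>::infinity())
                frame[i] = expensiveShader(g.params[i]);
    }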
     
  10. no_way

    Regular

    Joined:
    Jul 2, 2002
    Messages:
    301
    Likes Received:
    0
    Location:
    estonia
    Wouldn't it be cheaper to do transform->binary search & insert instead, i.e. no separate sorting procedure? The insert procedure itself is then always a sorted insert?
     
  11. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    erm, what's that sorting for in the first place?
     
  12. Tessier

    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    10
    Likes Received:
    0
    Location:
    Hungary
    Well, ok, not sorting, but z comparing.
     
  13. Kristof

    Regular Alpha

    Joined:
    Jan 30, 2002
    Messages:
    733
    Likes Received:
    1
    Location:
    Abbots Langley
    I am not really paying attention, but this sort has to be per-pixel correct. Not sure how you want to do that - do you want to store things at the per-pixel level?
     
  14. JohnH

    Regular

    Joined:
    Mar 18, 2002
    Messages:
    586
    Likes Received:
    2
    Location:
    UK
    Well, the actual sequence is: V Shade, Clip, VP Trans, Cull, Tile/Bin (this bit is the memory consumer; it isn't a sort) and, once the whole scene has been gathered, Rasterise (the memory freer). There is no specific "Sorting" step - the per-pixel depth test is effectively applied in the same way as in a conventional rasteriser, but a tile at a time (memory being freed by groups of tiles).

    The parameter store does not need to be double buffered as it's treated more like a FIFO, i.e. the tiler consumes memory blocks and the rasteriser frees them; both processes happen at the same time. The only time the front end stalls is when there is zero memory left at the point it requests it. However, in reality, if a scene can fit in the available parameter memory (free and in use), stalls are kept to a minimum. At worst these stalls are a bit like the stall you get in an IMR when the rasterisation of a modest-size triangle stalls the whole geometry pipeline (this is why IMR peak rates are normally much higher than their real-world sustained figures).
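
    If it helps, you can think of the parameter store as a pool of blocks handled a bit like this (sizes and names made up purely for illustration):

    Code:
    // Sketch of the FIFO-like block recycling: the tiler acquires blocks as it bins
    // geometry, the rasteriser releases them as groups of tiles finish.
    #include <cstddef>
    #include <deque>
    #include <optional>

    struct ParamBlock { std::size_t offset; std::size_t size; };

    class ParamStore
    {
    public:
        ParamStore(std::size_t blockCount, std::size_t blockSize)
        {
            for (std::size_t i = 0; i < blockCount; ++i)
                free_.push_back({ i * blockSize, blockSize });
        }

        // Called by the tiler/binner. An empty result is the "front end stall" case:
        // nothing can proceed until the rasteriser releases a block.
        std::optional<ParamBlock> acquire()
        {
            if (free_.empty()) return std::nullopt;
            ParamBlock b = free_.front();
            free_.pop_front();
            return b;
        }

        // Called once rasterisation of the tiles referencing this block has finished.
        void release(const ParamBlock& b) { free_.push_back(b); }

    private:
        std::deque<ParamBlock> free_;
    };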

    John
     