Z3 re-visited

Discussion in 'General 3D Technology' started by Reverend, Mar 14, 2003.

  1. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    Okay, so I was wrong about expecting this to be implemented in the NV30.

    Will we likely see this in the next-gen hardware AA algorithms from various IHVs? How expensive can this be in terms of gates?
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,420
    Likes Received:
    179
    Location:
    Chania
    Rev,

    Dumb question: why Z3 (or a similar algorithm) and not a Fragment AA algorithm instead (or even a combination of both with modifications - like SA had mentioned once)?
     
  3. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Don't worry about being wrong on it being implemented on the NV30.

    I supposed the NV30 would support Wu antialiasing, combined with traditional antialiasing, to determine, based on analytical coverage, whether a nearby subpixel should be filled or not, even if the traditional algorithm says the opposite.
    So, there's no Z problem, nothing. It probably has few disadvantages - besides the high transistor count required (I suppose - maybe there are miraculous solutions to make it cheap, but it's unlikely).

    It could, if implemented properly, give quality as high as the optimal sampling pattern for each pixel. So, it's like if you had dynamic sampling patterns, even though they're really ordered.
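    Something like this is the effect I have in mind - purely a sketch of the idea with made-up sample counts, not anything the NV30 actually does:

    Code:
    #include <cmath>
    #include <cstdint>

    // Sketch: take the traditional per-subsample coverage mask, then use the
    // analytically computed coverage fraction to decide how many subsamples
    // should really be lit, setting or clearing bits until they agree.
    // (Which bits to touch - ideally the ones nearest the edge - is glossed over.)
    constexpr int kSubsamples = 16;

    uint16_t adjustMaskToAnalyticCoverage(uint16_t mask, float analyticCoverage) {
        int want = static_cast<int>(std::lround(analyticCoverage * kSubsamples));
        int have = 0;
        for (int i = 0; i < kSubsamples; ++i) have += (mask >> i) & 1;

        // Fill extra bits if the sampled mask undershoots the true coverage...
        for (int i = 0; i < kSubsamples && have < want; ++i)
            if (!((mask >> i) & 1)) { mask |= (1u << i); ++have; }
        // ...or clear bits if it overshoots.
        for (int i = 0; i < kSubsamples && have > want; ++i)
            if ((mask >> i) & 1) { mask &= ~(1u << i); --have; }
        return mask;
    }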


    Back on topic...
    Z3 is IMO a very nice algorithm, but in worst-case scenarios it ain't that great. The problem may be that companies like nVidia & ATI like to derive workstation products from their standard products, and Z3's disadvantages are "out of the question" for a film studio.
    And removing Z3 from an architecture would be a fairly substantial modification.

    nVidia is very fond of their workstation strategy: they think that since it's developed on fundamentally the same architecture, it also means it's optimized for that architecture. Which gives them a performance advantage...

    So, you'd have to implement both a Z3 path and a traditional path. That's wasted silicon, so I fear GPUs truly supporting Z3 might be quite rare, even in the future.


    Uttar
     
  4. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    I don't expect we'll ever see Z3 exactly in hardware, but something similar is possible. As Ailuros mentioned, we could see a combination of FAA and Z3. Because both are based on coverage masks, the fundamental pipelines are similar. I don't think Matrox has the stomach to risk it anymore, and I think the disadvantages will keep ATI away. I think ATI is generally happy with the combination of MSAA and compression. Nvidia, I'm not so sure of. Maybe 3dlabs could modify SuperScene for a gaming chip, where true 16x quality isn't necessary for micro-polygons.

    Frankly, I think Z3 has some flaws that haven't been found yet because they didn't run enough tests, but I don't have any proof of that. Fewer flaws than FAA had, however.

    As far as the gate cost goes, it's not too bad. There is some extra logic above the requirements for MSAA; manpower/design time is the main issue there. The main gate cost will come from the data structure, i.e. whether a separate cache is needed.
     
  5. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    There are better algorithms than Z3 for hardware AA that solve the same problems. I prefer to think of Z3 as an approach to AA that involves using sorted fragment AA with bit masks and an upper limit on the mask depth. As to when these will actually be available in hardware, well that's another issue.
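    In sketch form, a per-pixel record in that spirit might look something like this - the field widths and fragment limit are illustrative assumptions, not the actual Z3 layout:

    Code:
    #include <cstdint>

    // Illustrative Z3-style storage: a small fixed number of fragments per
    // pixel, each carrying a subsample coverage bit mask and enough depth
    // information (value plus slopes) to resolve visibility per subsample.
    constexpr int kMaxFragments = 3;   // the "upper limit on the mask depth"

    struct Z3Fragment {
        uint16_t coverageMask;  // one bit per subsample position (e.g. a 4x4 grid)
        float    z;             // depth at the pixel center
        float    dzdx, dzdy;    // depth slopes, so Z can be evaluated per subsample
        uint32_t color;         // RGBA8
        uint8_t  alpha;         // transparency, blended in sorted (Z) order
    };

    struct Z3Pixel {
        Z3Fragment fragments[kMaxFragments];  // kept sorted front-to-back
        uint8_t    count;
        // When a new fragment arrives with count already at kMaxFragments,
        // two existing fragments have to be merged - the lossy step.
    };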
     
  6. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    SA: care to provide a link or two for these better algorithms (if they are public that is)?

    Regards,
    Serge
     
  7. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    What's everyone's opinion on the need for sort-independent transparency? Obviously this is a Z3 feature, but the same quality of antialiasing can be done without supporting it. I couldn't find the paper, but Nvidia has developed a method called depth peeling that does sort-independent transparency with current cards.
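    From what I remember of it, the idea is a multi-pass one: each pass keeps only the nearest fragments lying strictly behind the depths captured on the previous pass, so layer N of transparency gets "peeled" on pass N and the layers can then be blended in order. A rough single-pixel, CPU-side sketch of that logic (not NVIDIA's actual code):

    Code:
    #include <cstdio>
    #include <limits>
    #include <vector>

    // CPU illustration of depth peeling for one pixel: each "pass" extracts the
    // nearest fragment strictly behind the depth peeled on the previous pass, so
    // transparent layers come out in depth order without ever sorting the input.
    struct Fragment { float z; float color; float alpha; };  // grayscale for brevity

    float compositeWithDepthPeeling(const std::vector<Fragment>& frags, float background) {
        std::vector<Fragment> layers;                       // peeled, front-to-back
        float lastPeeledZ = -std::numeric_limits<float>::infinity();

        for (size_t pass = 0; pass < frags.size(); ++pass) {   // one pass per layer
            const Fragment* nearest = nullptr;
            for (const Fragment& f : frags)                    // a full "geometry pass"
                if (f.z > lastPeeledZ && (!nearest || f.z < nearest->z))
                    nearest = &f;
            if (!nearest) break;                               // nothing left to peel
            layers.push_back(*nearest);
            lastPeeledZ = nearest->z;
        }

        // Composite the peeled layers back-to-front over the background.
        float result = background;
        for (auto it = layers.rbegin(); it != layers.rend(); ++it)
            result = it->alpha * it->color + (1.0f - it->alpha) * result;
        return result;
    }

    int main() {
        // Three transparent fragments submitted out of depth order.
        std::vector<Fragment> frags = {{0.7f, 0.2f, 0.5f}, {0.3f, 0.9f, 0.5f}, {0.5f, 0.5f, 0.5f}};
        std::printf("composited: %f\n", compositeWithDepthPeeling(frags, 0.0f));
    }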
     
  8. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    What about an AA FIFO buffer...except storing sample state instead of shader state? Recalling some of my speculations from the R400 guessing thread regarding occlusion culling, it seems to me now that proportional area weighting could be stored as part of the data in this buffer. The sample count would not be based on position, but on primitive association, and would vary with overdraw...I think as few as 4 discrete values could serve if exposure-area sorting allowed samples to be displaced, though the trade-off of overdraw error versus index (z-buffer) checking for this rejection might call for as few as 2. Hmm...actually, I think some of the things I was thinking of in that thread about the implications of the occlusion culling calculations and a unified shader model facilitate this when this type of buffer is considered. It seems a natural fit for an architecture with something like the F-Buffer already being considered.

    In any case, by tracking things like:

    x0 intersection : x value of intersection with "top"
    y0 intersection : y value of intersection with "left"
    x1 intersection : x value of intersection with "bottom"
    y1 intersection : y value of intersection with "right"
    bias : which way the polygon extends from this edge line
    xc : x value of corner
    yc : y value of corner
    w_trans : transparency weighting for the color data, to determine how much weighting is given to the portion of "behind" color data that is occluded

    for each buffer color, couldn't an "infinite resolution" blend occur at the end, due to the coherency of the color data for the pixel pipeline that the FIFO buffer allows? With 4-bit accuracy for the x/y values, the equivalent of 256-sample OGMS would occur, wouldn't it? And that should be less expensive than a 4xMSAA/2xSSAA method in both memory and bandwidth usage, since the total bit count for that to occur would just have to be <= 64 (assuming a cap of 4 discrete values), and opportunities for compression would exist. Another question is whether this would be feasible as an addition to pre-existing AA methods rather than a replacement...whether that would make sense depends on the flexibility of the execution of the more traditional methods, I think.

    The trick would then be to optimize the evaluation of coverage interactions...and it seems to me that the latency of the evaluations for the data to be stored for final sampling could be hidden in waits for texture fetches and pixel shading calculations. Also, I think some opportunities exist for "threshold blocking" factors, based on which of the x/y 0/1/c values above are 0, to speed this up...all in all, I think it might even be hidden by the basic pipelining latency.
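    For concreteness, here is one speculative way the per-edge record above could be packed with 4-bit positions (the layout is purely illustrative; as packed here each record is 32 bits, so hitting the <= 64-bit figure with four of them would mean trimming the corner or weighting fields):

    Code:
    #include <cstdint>

    // Speculative packing of the per-edge record listed above, with 4-bit
    // subpixel coordinates (16x16 = 256 positions, hence the "256 sample OGMS"
    // equivalence). Field names follow the list in the post.
    struct EdgeSample {
        uint32_t x0 : 4;      // x of the intersection with the pixel's top edge
        uint32_t y0 : 4;      // y of the intersection with the pixel's left edge
        uint32_t x1 : 4;      // x of the intersection with the pixel's bottom edge
        uint32_t y1 : 4;      // y of the intersection with the pixel's right edge
        uint32_t xc : 4;      // x of the corner, if the polygon's corner falls inside
        uint32_t yc : 4;      // y of the corner
        uint32_t bias : 1;    // which side of the edge line the polygon covers
        uint32_t w_trans : 7; // transparency weighting applied to the occluded color
    };
    static_assert(sizeof(EdgeSample) == 4, "one 32-bit record per stored edge");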

    As is often the case, I get the distinct feeling we've discussed these details, or very similar ones, before. I'm sure if I could search for "Z3" I'd find something addressing at least parts of this idea, but in the absence of that I apologize for any thoughts, and errors, I repeat from the past. Also, sorry in advance for any "Monday Math" type errors in my assumptions.
     
  9. micron

    micron Diamond Viper 550
    Veteran

    Joined:
    Feb 23, 2003
    Messages:
    1,189
    Likes Received:
    12
    Location:
    U.S.
    <looks at above post>"I am simply not smart enough to hang out here"
     
  10. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    Concerning transparency sorting, I much prefer a method like Z3, where it comes for free, as opposed to a method like depth peeling, which is expensive and must be explicitly coded for.

    However, one important rendering aspect that Z3 does not address is providing the programmer precise control over what the rendering order of transparent surfaces should be, since it always renders them in z order.
     
  11. PurplePigeon

    Newcomer

    Joined:
    Mar 7, 2003
    Messages:
    46
    Likes Received:
    3
  12. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    That was most definitely not the simplest way to describe my thoughts, just a way that provides a lot of details and speculations for errors to be pointed out by others.

    Just because you can't bridge the gap from your understanding to what was said in it right now doesn't mean you aren't capable of doing so. Just hang around and keep an open mind and learn what you can. My $0.02
     
  13. micron

    micron Diamond Viper 550
    Veteran

    Joined:
    Feb 23, 2003
    Messages:
    1,189
    Likes Received:
    12
    Location:
    U.S.
    Thank you Demalion.
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    Gosh I'd forgotten how complicated that was. The Dreamcast method was so much easier to use.
     
  15. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,805
    Likes Received:
    473
    Complicated and slow, useless.
     
  16. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    SA, what kind of drawing orders (other than Z) are necessary?

    Would the addition of a primary sort key to transparent fragments address this? (i.e. each transparent object generates fragments with some specifiable, non-interpolated "layer id"). Sort transparent fragments by id, then sort fragments with identical ids in z...
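    Something like this comparison is what I'm picturing - "layerId" is just a hypothetical per-object key, not an existing API field:

    Code:
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Sketch of the two-level ordering: a per-object, non-interpolated "layer id"
    // as the programmer-controlled primary key, with depth as the secondary key.
    // Only fragments sharing a layer id fall back to plain Z ordering.
    struct TransparentFragment {
        uint16_t layerId;   // hypothetical id assigned per transparent object
        float    z;
        uint32_t color;
    };

    void sortForBlending(std::vector<TransparentFragment>& frags) {
        std::sort(frags.begin(), frags.end(),
                  [](const TransparentFragment& a, const TransparentFragment& b) {
                      if (a.layerId != b.layerId) return a.layerId < b.layerId;
                      return a.z > b.z;   // back-to-front within a layer
                  });
    }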
     
  17. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    The issue of render order is more about programmatic control of the rendering: letting the programmer decide what special effects they want to generate, whether physically plausible or not. It is also a question of compatibility, since older applications may have rendered transparency in a particular order for a special effect, and this would be lost. It is important, therefore, that when the hardware provides a means of ordering the rendering of transparency, it be done under programmatic control, with the default being (input) sequential ordering.

    A more important consideration is being able to defer calculations, since the calculations for a transparent surface may need to be deferred until all the transparent fragments are present. Since the calculations may vary from surface to surface for different surface types, you may need to defer several sets of calculations and associate each with its corresponding surface. This is a bit of a problem for Z3: since it must merge fragments on the fly, it must perform the calculations for the merged fragments before all the fragments are present. However, the final results are generally good enough.
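    In sketch form, the deferral amounts to something like this - the surface-id indirection and the signatures are only illustrative:

    Code:
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Sketch: instead of shading and merging transparent fragments as they arrive
    // (as Z3 must), keep each fragment with a surface id and run the surface's own
    // calculation only once every fragment for the pixel is present.
    struct DeferredFrag {
        uint16_t surfaceId;   // selects which deferred calculation to run later
        float    z;
        float    alpha;
        float    u, v;        // whatever inputs that calculation needs
    };

    using SurfaceShader = float (*)(const DeferredFrag&);   // illustrative signature

    float resolvePixel(std::vector<DeferredFrag> frags,
                       const std::vector<SurfaceShader>& shaders,
                       float background) {
        // All fragments are now present, so they can be ordered however we like...
        std::sort(frags.begin(), frags.end(),
                  [](const DeferredFrag& a, const DeferredFrag& b) { return a.z > b.z; });
        // ...and each surface's own calculation runs before the blend.
        float result = background;
        for (const DeferredFrag& f : frags)
            result = f.alpha * shaders[f.surfaceId](f) + (1.0f - f.alpha) * result;
        return result;
    }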
     
  18. Fred

    Newcomer

    Joined:
    Feb 18, 2002
    Messages:
    210
    Likes Received:
    15
    If quality is a problem for Z3, I wonder how useful it would be to take it to second order by storing additional 2nd derivatives.
     
  19. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    Given that 2nd derivatives of Z tend to give very small values, I wouldn't expect that including 2nd derivatives makes much of a difference quality-wise. AFAIK, the problem with Z3 is that if you get too many fragments affecting the color of a pixel (the case with very small polygons and/or very many transparent layers), you get a buffer overflow and need to combine or remove fragments. While this can be made to work satisfactorily most (>99%?) of the time, working with a renderer that is known to glitch occasionally, even if only in unusual corner cases, just doesn't feel quite right for professional/cinematic use.
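    For illustration, the overflow handling amounts to something like the following - the fixed budget and the "merge the two fragments closest in Z" heuristic are assumptions for the sketch, not necessarily what the paper does:

    Code:
    #include <cmath>
    #include <cstdint>

    constexpr int kMaxFragments = 4;   // illustrative fixed per-pixel budget

    struct Frag {
        uint16_t mask;       // subsample coverage bits
        float    z;          // representative depth
        float    r, g, b, a;
    };

    struct PixelFragments {
        Frag frags[kMaxFragments];
        int  count = 0;

        void insert(const Frag& f) {
            if (count < kMaxFragments) { frags[count++] = f; return; }
            // Overflow: find the two fragments closest in Z and merge them.
            // This is the lossy step that can produce the occasional glitch.
            int a = 0, b = 1;
            float best = std::fabs(frags[0].z - frags[1].z);
            for (int i = 0; i < count; ++i)
                for (int j = i + 1; j < count; ++j) {
                    float d = std::fabs(frags[i].z - frags[j].z);
                    if (d < best) { best = d; a = i; b = j; }
                }
            merge(a, b);
            frags[count++] = f;          // the freed slot takes the new fragment
        }

        void merge(int i, int j) {
            // Combine coverage and average the colors; real hardware would weight
            // by coverage and alpha, this just shows that information is lost.
            frags[i].mask |= frags[j].mask;
            frags[i].r = 0.5f * (frags[i].r + frags[j].r);
            frags[i].g = 0.5f * (frags[i].g + frags[j].g);
            frags[i].b = 0.5f * (frags[i].b + frags[j].b);
            frags[i].a = 0.5f * (frags[i].a + frags[j].a);
            frags[j] = frags[--count];   // compact the array
        }
    };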
     
  20. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    Well, thinking about it, it seems like the framebuffer should be optimized such that it has fixed storage for, say, 1-3(?) fragments per pixel, with a fallback to an arbitrarily sized fragment list for those pixels with very high transparent overdraw or which contain lots of opaque fragments. I.e., basically, buffer overflow spills to an A-buffer-like structure instead of triggering a fragment merge operation.

    Even if overflow is pretty rare, being able to operate on a complete fragment list for a pixel (for programmatic control of fragment ordering, or blending in the PS) seems really expensive. This (once again) sounds like something a tiler would be able to handle rather well compared to an IMR.
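    Roughly what I mean, as a sketch (the sizes are arbitrary):

    Code:
    #include <cstdint>
    #include <vector>

    // Sketch of the hybrid layout: a small fixed in-framebuffer fragment array per
    // pixel, spilling to a variable-length A-buffer-style list only for the rare
    // pixels that overflow, instead of merging fragments and losing information.
    struct Frag { float z; uint32_t color; uint16_t mask; };

    constexpr int kInlineFrags = 3;    // fast-path storage, arbitrary choice

    struct PixelStore {
        Frag    inlineFrags[kInlineFrags];
        uint8_t count = 0;
        int32_t spillIndex = -1;       // -1: no spill list allocated yet

        void add(const Frag& f, std::vector<std::vector<Frag>>& spillPool) {
            if (count < kInlineFrags) { inlineFrags[count++] = f; return; }
            if (spillIndex < 0) {                       // first overflow: grab a list
                spillIndex = static_cast<int32_t>(spillPool.size());
                spillPool.emplace_back();
            }
            spillPool[static_cast<size_t>(spillIndex)].push_back(f);   // no merge, no loss
        }
    };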
     