New "Are You Ready" video

Discussion in 'Architecture and Products' started by Ascended Saiyan, Oct 29, 2002.

  1. Mephisto

    Newcomer

    Joined:
    Feb 7, 2002
    Messages:
    200
    Likes Received:
    0
    No, there isn't. The R300 does 16 textures per pass, it does depth and color buffer compression, and it has an efficient hierarchical Z-buffer with a per-pixel depth test as the final pre-Z test where necessary. In addition to this, there is a crossbar memory controller. Where is there room left?
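
    As a rough illustration of the two-level Z testing described above, here is a toy Python sketch: a coarse per-tile maximum depth rejects whole blocks of fragments early, and survivors fall through to the exact per-pixel test. The class and method names and the tile size are illustrative only, not ATI's actual design.

```python
TILE = 8  # coarse-Z granularity (illustrative; real hardware tile sizes vary)

class HierZ:
    """Toy two-level Z-buffer: per-tile max depth plus per-pixel depths."""
    def __init__(self, w, h):
        self.w, self.h = w, h
        self.zbuf = [[1.0] * w for _ in range(h)]            # per-pixel depth
        tw, th = (w + TILE - 1) // TILE, (h + TILE - 1) // TILE
        self.tile_max = [[1.0] * tw for _ in range(th)]      # coarse max depth per tile

    def test_block(self, tx, ty, zmin):
        """Early reject: if the nearest fragment of the block is still behind
        everything already in the tile, the whole block can be skipped."""
        return zmin < self.tile_max[ty][tx]

    def write_pixel(self, x, y, z):
        if z < self.zbuf[y][x]:                              # final per-pixel test
            self.zbuf[y][x] = z
            tx, ty = x // TILE, y // TILE
            # conservative update: recompute this tile's max depth
            ys = range(ty * TILE, min((ty + 1) * TILE, self.h))
            xs = range(tx * TILE, min((tx + 1) * TILE, self.w))
            self.tile_max[ty][tx] = max(self.zbuf[yy][xx] for yy in ys for xx in xs)
            return True
        return False
```

    After a tile has been filled at depth 0.5, a later block whose nearest fragment is at 0.7 is rejected without touching any per-pixel depths, which is exactly the bandwidth win being discussed.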

    The only thing left to improve efficiency requires either developer support (sappy idea IMO) or is based on tile-based approaches (either true deferred rendering, or just a tile-based IMR without the full overdraw removal through geometry raycasting, but with the benefits of on-chip blending).
     
  2. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Yeah..sure..fine..whatever...
     
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    That's not a particularly constructive statement. While I agree that we've hardly seen the end of the road for improving IMR efficiency / overdraw reduction, something a little more useful would be good... :)
     
  4. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Ah, that reminds me of something ...

    "Everything that can be invented has been invented."
    - Charles H. Duell, U.S. Commissioner of Patents, in 1899.

    ;)
     
  5. Gollum

    Veteran

    Joined:
    May 14, 2002
    Messages:
    1,217
    Likes Received:
    8
    Location:
    germany
    Great display of open-mindedness, Mephisto! There are always a million things in a chip's design that can be improved upon. One of them is inventing new technologies or ways of doing things differently; others require tweaking and changing existing parts for greater efficiency. You just have to look at the history of x86 processors to see how much can be done with an architecture given enough time.

    SA, one of this board's most respected contributors, only recently made a post about just this topic:
     
  6. Mephisto

    Newcomer

    Joined:
    Feb 7, 2002
    Messages:
    200
    Likes Received:
    0
    I know, but a lot of his suggestions require either developer support (don't you think developers already have enough to care about? Or do you think spending CPU cycles on boring things like sorting triangles is a good idea for today's CPU-limited games?), are tile-based approaches (like I mentioned), or are slight improvements over current implementations (hierarchical Z).

    My question was meant seriously. The big steps are over; all the cool features we discussed over the last one or two years are implemented in some way in today's hardware, except maybe the fancy Z3. My response was targeted at Chalnoth's claim that "a lot more" can be done. Might someone tell me what? I'm not talking about small percentages, but the 20%++ the NV30 needs to match the R300's performance in bandwidth-limited situations.
     
  7. T2k

    T2k
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,004
    Likes Received:
    0
    Location:
    The Slope & TriBeCa (NYC)
    ...just as you or Chalnoth don't have ANY idea about NV30 - so all these kinds of assumptions are based on stupid speculation, aren't they?

    :roll:
     
  8. T2k

    T2k
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,004
    Likes Received:
    0
    Location:
    The Slope & TriBeCa (NYC)
    :D
     
  9. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    Got me thinking - where in the basic IMR architecture is there any potential for any improvement over R300? (that is, other than adding brute force: bandwidth, pipelines, texture and vertex units etc)
    • Z-buffering/Early Z-test/Z-compression/hierarchical Z: R300 is ATI's third pass at hierarchical Z - I doubt there is much left to gain here other than in conjunction with bounding volumes.
    • Bounding volumes rejection - may be useful combined with Hierarchical Z - requires developer support.
    • Anisotropic mapping - very little left to gain. Given the kind of performance hit R300 takes when doing aniso, it looks like ATI has actually superseded the Feline algorithm (!).
    • Texture compression - Some room for improvement over S3TC - VQTC looks like a better method in general. Requires some developer support.
    • Immediate mode tiling - requires extensive developer support, as long as OpenGL/Direct3D don't get scene graph support. You can do this on an R300 today, using OpenGL's scissor test to define a 'tile', if you feel so inclined :-?
    • Geometry data compression - R300 supports N-patches and displacement mapping, which are, after all, just compact ways to represent complex geometry - other than that, there may be a little room for compressing vertex arrays.
    • Antialiasing - with any given number of samples per pixel, R300's compressed multisampling should be about comparable to Z3 wrt bandwidth usage - Z3 may offer slightly better quality. There are faster AA methods as well, but they tend to require substantial developer effort in order not to break down all the time.
    • Stencil buffer compression - here, there seems to be room for substantial improvements (I guess; Nvidia and ATI have been silent on this issue so far)
    • Framebuffer compression (other than collapsing same-color samples for multisampling) - potential for moderate improvements for skies, featureless walls and other surfaces with sufficiently gradual color changes. Possibly difficult to do efficiently enough to be useful.
    • In vertex and pixel shaders, conditional jumps may be used to skip useless calculations, in particular lighting calculations for vertices/surfaces facing away from a light source. Can easily be used to speed up static T&L (I suspect NV30 is doing this); otherwise, requires developer support.
    • Any other ideas, anyone?
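
    The scissor-based "immediate mode tiling" mentioned in the list above can be sketched as follows. This is an illustrative Python simulation, not real OpenGL: `gl_scissor` and `draw` are stand-in callables for what would be `glScissor`/`glEnable(GL_SCISSOR_TEST)` and a draw call.

```python
def tiles(width, height, tile_w, tile_h):
    """Yield (x, y, w, h) scissor rectangles covering the screen."""
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield (x, y, min(tile_w, width - x), min(tile_h, height - y))

def render_tiled(draw_list, width, height, tile_w, tile_h, gl_scissor, draw):
    """Replay the frame's entire draw list once per scissored tile."""
    for rect in tiles(width, height, tile_w, tile_h):
        gl_scissor(*rect)          # clamp rasterization to this tile
        for cmd in draw_list:      # every draw call is resubmitted per tile
            draw(cmd)
```

    Note that the whole draw list is resubmitted for every tile: without scene-graph knowledge, the application (or driver) cannot bin draws to only the tiles they actually touch, which is exactly why this scheme needs the "extensive developer support" mentioned above.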
     
  10. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Ned Greene's hierarchical-Z occlusion culling. Wavelet compression of texture data. Geometry compression (not amplification) via stuff like topological surgery, etc. Scene graph acceleration, using bounding volumes, etc.
     
  11. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    Well, one thing:

    I don't feel immediate-mode tiling necessarily needs to have developer support.

    So that you understand what I'm trying to say, what I mean by immediate-mode tiling is simply an architecture that forward-caches geometry in order to do occlusion tests not only on geometry that has already gone through the pixel pipelines, but also on geometry that has yet to go through them (by a reasonable amount...depending on how much geometry is cached).
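
    The forward-caching idea described above might be sketched like this in Python (the names `Prim` and `flush` are hypothetical, purely for illustration): buffer a window of primitives, depth-resolve the whole window first, and only then shade the fragments that actually won, so a primitive can be occluded by geometry submitted *after* it.

```python
from collections import namedtuple

# pixels: set of (x, y) the primitive covers; z: constant depth (toy model)
Prim = namedtuple("Prim", "pixels z")

def flush(cache):
    """Depth-resolve the whole cached window, then return, per primitive,
    the set of pixels it still owns and would actually need to shade."""
    zbuf = {}                                # (x, y) -> (depth, winning prim index)
    for i, p in enumerate(cache):
        for px in p.pixels:
            if px not in zbuf or p.z < zbuf[px][0]:
                zbuf[px] = (p.z, i)
    shaded = [set() for _ in cache]
    for px, (_, i) in zbuf.items():
        shaded[i].add(px)
    return shaded
```

    With a far primitive submitted first and a near one covering the same pixels submitted second, the first primitive ends up shading nothing, even though a plain IMR would have shaded and then overwritten it.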

    On the programming side, from what we've seen, it may take fewer passes and/or CPU power to do some algorithms used in the near future on an NV30. The primary example seen in the NV30's white papers is matrix blending for skeletal animation. With the NV30, you could potentially use a single program for the entire model, whereas with the R300 you'd need to split up your model, doing more CPU work overall. Not a huge improvement in speed, but I'd be surprised if this was an optimal situation for describing the NV30's programming strengths.

    Other than that, there are certainly better ways to compress the frame and z-buffers than what ATI is currently doing (Not based on any special knowledge of ATI's design...more based on the fact that it is an impossibility for the best possible algorithm to have been discovered yet).

    The NV30 will also likely use an 8-way crossbar memory controller, based on the doubling of pipelines over the GeForce4. nVidia's experience with this sort of controller will also likely lead to a more efficient design than ATI's.

    And regardless of which way you slice it, it's never "as good as it's going to get." There's just no such thing.
     
  12. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    Pfft.
    *crackle* Pot to kettle! Pot to Kettle! Come in, Kettle! *crackle*
     
  13. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    OK ... Wavelet compression offers an obvious and compact way to store a mipmap pyramid, but to decompress even one texel, you need to read about 5x5 or so texels from every mipmap level above it, making hardware decompression at texture fetch time surprisingly slow and difficult.
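
    The cost argument above can be made concrete with a minimal 1-D Haar example (a toy sketch, far simpler than any real texture codec): wavelet-coded data is stored as one coarse average plus per-level detail coefficients, so reconstructing even a single fine-level sample means reading a coefficient at every level of the pyramid.

```python
def haar_forward(samples):
    """Haar-transform a power-of-two list into (coarse average, details per level)."""
    levels, cur = [], list(samples)
    while len(cur) > 1:
        avg = [(cur[2*i] + cur[2*i+1]) / 2 for i in range(len(cur) // 2)]
        det = [(cur[2*i] - cur[2*i+1]) / 2 for i in range(len(cur) // 2)]
        levels.append(det)
        cur = avg
    return cur[0], levels[::-1]              # details ordered coarse -> fine

def haar_fetch(coarse, levels, index):
    """Reconstruct samples[index]; returns (value, coefficients read)."""
    value, reads = coarse, 1
    for lvl, det_row in enumerate(levels):
        bit = (index >> (len(levels) - 1 - lvl)) & 1
        d = det_row[index >> (len(levels) - lvl)]
        value = value + d if bit == 0 else value - d   # left child: +d, right: -d
        reads += 1
    return value, reads
```

    For 8 samples, fetching any one of them touches 4 coefficients (one per level plus the coarse value); a 2-D mipmap pyramid with filtering footprints makes the per-texel cost correspondingly worse, which is the point being made.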

    Geometry compression is a subject I probably need to read more about.

    I believe I more or less mentioned the other points?
    I'm afraid that I don't understand :cry: - I don't see how this can become 'immediate-mode tiling' - sounds more like a partially-deferred scheme to me, and I don't see the connection to tiling. Care to explain further?

    Which doesn't preclude the current ATI algorithm from being within, say, 0.1% of the "best possible" algorithm (although I do believe there is a bit more room than that left)....

    For frame/Z compression, you can always improve the compression ratio by making each block larger, like 16x16 pixels instead of the 8x8 that ATI is currently using. Doing so can increase bandwidth usage substantially, though, because every time you touch even one pixel in a block, you need to decompress and recompress the entire block. You can also get better compression ratios by using algorithms like Huffman or arithmetic coding on each block, but such algorithms result in very slow reads because the data must be unpacked serially.

    So in all, there is a tradeoff between block size, algorithm complexity & parallelism, and bandwidth usage - if you have better suggestions than the ATI method, bring them on.
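
    The block-size tradeoff described above can be shown with a back-of-the-envelope sketch (toy run-length coding, not ATI's actual scheme): a larger block compresses a flat region no better here, yet a single-pixel write must decompress and recompress the whole block, so the bytes touched per isolated update grow with the square of the block edge.

```python
def rle(values):
    """Toy run-length encoder: list of (value, run length) pairs."""
    runs, prev, count = [], values[0], 1
    for v in values[1:]:
        if v == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = v, 1
    runs.append((prev, count))
    return runs

def block_stats(n, flat_color=0):
    """For an n x n flat block: compressed size in runs, and the bandwidth
    cost in pixels of touching one pixel (read + rewrite the whole block)."""
    block = [flat_color] * (n * n)
    compressed = rle(block)
    touch_cost = 2 * n * n        # decompress everything, recompress everything
    return len(compressed), touch_cost
```

    For a flat region, an 8x8 block and a 16x16 block both compress to a single run, but the 16x16 block costs 512 pixels of traffic per isolated touch versus 128, which is the bandwidth penalty being weighed against the better ratio on non-trivial content.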
     
  14. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    It is possible to render to an IMR with no sorting, no triangle binning, no tiling and yet have no overdraw and use very little z buffer bandwidth even for large depth complexities.
     
  15. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    :eek:

    How? What kind of preprocessing is needed on polygon data to do this kind of magic?
     
  16. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    None! Just make them all backfacing and cull them ;)

    P.S. What's the correct answer, SA?
     
  17. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    Can anyone give a clue as to how this differs from what ATI has already?

    I have my doubts over that, actually, whether it's a 256-bit bus or not. If it's 128-bit then I'd say almost definitely not. Plus, we still don't know the configuration of the pipes - is it 8 pixels per clock only in FP16 mode?
     
  18. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
  19. no_way

    Regular

    Joined:
    Jul 2, 2002
    Messages:
    301
    Likes Received:
    0
    Location:
    estonia
    Hmm... batched primitive processing can be improved, I guess. I.e., if you draw a batch of 1000 triangles, generally you don't care in what order they are drawn, so the chip can take care of sorting and culling within the batch. Maybe that's what SA is talking about?

    Otherwise, I just don't understand. Let's say you have just begun to draw a scene, and you draw a single large tri across half the screen. The chip doesn't know whether you'll do an EndScene next or draw something else; it has no info about the next primitives you are going to draw. So how can it decide by itself whether the triangle needs to be drawn or not? It _has_ to draw it, or defer the rendering until more info becomes available (more primitives are sent, or the scene is finished).
    So where's the catch here?
     
  20. Prometheus

    Newcomer

    Joined:
    Jul 9, 2002
    Messages:
    97
    Likes Received:
    2
    Location:
    Greece
    I think nvidia should stop with this stupid "are you ready" game and just release the NV30. :wink: We have been ready for a few months now and are getting impatient with the continued delays. Screensavers and flash games - how lame!!! :evil:
     