X1800/7800gt AA comparisons

Discussion in 'Architecture and Products' started by Nite_Hawk, Oct 5, 2005.

Thread Status:
Not open for further replies.
  1. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    That's a good question and I would like to know the answer too. The technique called 'recursive rasterization' that I'm using (and which I doubt anyone else is using, even though Akeley, in those 2001 Stanford course slides, seemed to suggest that NVidia, and maybe others, used it) allows traversing triangles in parallel and can generate, for the same tile, all the fragments from multiple triangles in a single recursive traversal step. That feature could certainly help to fill tile-based batches if you are rendering closely connected triangles that don't overlap (the usual triangle mesh).
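    As a rough illustration of the idea (a toy sketch, not any IHV's actual implementation; the edge-function test, winding convention, and tile sizes are assumptions), recursive rasterization repeatedly subdivides a tile into four sub-tiles, rejecting the ones a triangle cannot touch:

```python
# Toy sketch of recursive (hierarchical) rasterization: a tile is tested
# against a triangle's edge functions and, if it may overlap, subdivided
# into four sub-tiles down to pixel level. Illustrative only.

def edge(p, a, b):
    """Signed area term: positive if p is on the inside of directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def tile_may_overlap(tri, x, y, size):
    """Conservative test: reject the tile only if all four corners
    lie outside the same edge."""
    corners = [(x, y), (x + size, y), (x, y + size), (x + size, y + size)]
    a, b, c = tri
    for e0, e1 in ((a, b), (b, c), (c, a)):
        if all(edge(p, e0, e1) < 0 for p in corners):
            return False
    return True

def rasterize(tri, x, y, size, out):
    if not tile_may_overlap(tri, x, y, size):
        return
    if size == 1:  # pixel level: exact inside test at the pixel centre
        p = (x + 0.5, y + 0.5)
        a, b, c = tri
        if edge(p, a, b) >= 0 and edge(p, b, c) >= 0 and edge(p, c, a) >= 0:
            out.append((x, y))
        return
    half = size // 2
    for dx in (0, half):
        for dy in (0, half):
            rasterize(tri, x + dx, y + dy, half, out)

pixels = []
rasterize(((0, 0), (8, 0), (0, 8)), 0, 0, 8, pixels)  # triangle in an 8x8 tile
```

    Because the rejection test is conservative, the traversal only descends into tiles the triangle may actually cover, and nothing stops several triangles from being tested against the same tile in one traversal step.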
     
  2. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    In fragments? It varies a lot, even more if you are removing fragments before shading with HZ and early Z. I have never tried to get the average fragments per triangle batch from a game trace. Imagine particle rendering: lots of triangles and very few fragments.

    In triangles I have seen all kinds of batches, from fewer than 10 triangles to tens of thousands, in the game traces that I have briefly skimmed over. But in the end the average must be in the hundreds to thousands if you want good performance from a GPU, as GPUs have a big overhead when starting a new batch (the whole, very long GPU pipeline must be filled again) and when changing render states.
     
  3. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Well, it seems to me that it would be stupid to build an architecture that can't fill the batch with a set of triangles that use the same texture and pixel shader.
     
  4. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    The new GDDR3 ones have a configurable burst length of either 4 or 8, while the older, 144-ball ones only allow 4.

    NVidia fills batches with quads from multiple triangles. I'm not sure about R300/R420, but I think the 4x4 pixel threads of R520 certainly belong to one triangle each.
     
  5. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    And I keep saying (unless some IHV engineer sounds off and proves me stupid :lol: ) that I doubt current GPUs pipeline triangles with different render states (or, put another way, OpenGL primitive batches, which could also be called draw commands). The graphics program changes some state and sends a draw command for X triangles; the GPU renders those triangles until the pipes are empty; the graphics program changes the render state again and sends another draw command; the GPU renders those triangles. I think some pipelining of state changes, and of the end of one draw command with the start of the next, is possible. But having triangles or fragments with different associated render states in the same stage? No way. Why else would state changes and small batches be so costly? And that's ignoring the CPU cycles they spend ...
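    The cost argument can be sketched with a toy model (the cycle counts here are made up for illustration, not real hardware figures):

```python
# Toy cost model of why small batches are expensive if the GPU drains
# its pipeline on every render-state change. Numbers are invented.

PIPELINE_DEPTH = 200   # cycles to refill the pipeline after a drain (assumed)
CYCLES_PER_TRI = 1     # steady-state throughput: one triangle per cycle (assumed)

def frame_cycles(batches):
    """batches: list of triangle counts, one per draw command.
    Each state change forces a full drain and refill."""
    total = 0
    for tris in batches:
        total += PIPELINE_DEPTH + tris * CYCLES_PER_TRI
    return total

# Same 10,000 triangles, split into 10 big batches vs 1,000 tiny ones:
big = frame_cycles([1000] * 10)     # 10 * (200 + 1000) = 12,000 cycles
small = frame_cycles([10] * 1000)   # 1000 * (200 + 10) = 210,000 cycles
```

    With the overhead dominating, the 1,000 tiny batches cost more than seventeen times as many cycles for the same work.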
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I've always assumed in R3xx...R4xx that the batch size is the screen-tile size, i.e. 64 quads (256 pixels). Dunno for sure.

    In R520 we know the batch size is 4 quads, so the relationship between screen-tiles and batches is relatively soft.

    In NV40 it's in the region of 256 quads per fragment-quad with all four quads in 6800U/GT working together (i.e. 4096 pixels in total). In G70 it's 256 quads per fragment-quad, but each fragment-quad is independent - so 1024 pixels per batch.
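    As a quick check of the arithmetic (a quad being a 2x2 block, i.e. 4 pixels):

```python
# Batch sizes from the figures quoted above (a quad = 2x2 = 4 pixels).
PIXELS_PER_QUAD = 4

# NV40 (6800U/GT): 256 quads per fragment-quad, all 4 fragment-quads together
nv40_batch = 256 * PIXELS_PER_QUAD * 4   # 4096 pixels

# G70: 256 quads per fragment-quad, each fragment-quad independent
g70_batch = 256 * PIXELS_PER_QUAD        # 1024 pixels
```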

    But there's a wishy-washy factor at play in NVidia architectures that somehow brings the effective batch sizes way down (to about 80% of the sizes I've stated). A real mystery what's going on there...

    In G70 I think there are, effectively, 6 concurrent batches working on a large triangle. i.e. if a triangle consists of at least 6 quads (24 pixels), then each of the fragment-quad pipelines will share the workload of rendering the triangle. As far as I can tell NVidia GPUs rasterise each triangle in a round-robin fashion across the available fragment-quads.

    Jawed
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    If each triangle in a mesh has different gradients and texture coordinates from its neighbours (actually I don't really understand this stuff) then that means that the "batch state" in the GPU has to be able to hold the triangle data for multiple triangles.

    If that's the case, then that would seem to create a limitation on a GPU's ability to fill a batch with triangles. Sure the limit might be, say, 16 triangles instead of 1, but it still creates problems when rendering distant, fairly high-poly objects, where each triangle is 10 or 20 pixels.

    So, are GPUs capable of holding multiple-triangles' data like this?

    Jawed
     
  8. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Chalnoth, your incessant rants and refusal to accept even the most obvious facts about NV's few glaring shortcomings in the past piss everybody off here.

    I showed you many times that GF4 took an enormous performance hit with anisotropic filtering. It had very little to do with extra samples for off-angle surfaces. NV30 also had angle-independent AF (or very nearly so), and it had a performance hit very similar to ATI's, all else being equal. 90+% of rendered pixels in a Quake3 demo are vertical or horizontal. Yet we see this from GF4:
    http://graphics.tomshardware.com/graphic/20020206/geforce4-17.html#anisotropic_performance
    117 fps at 1024x768 w/ 8xAF; 132 fps at 1600x1200 w/o AF
    Fillrate drops to almost 1/3! My 9700P shows a 10-20% hit max in Q3 with 16x Quality AF. This is an extreme case, but usually the GF4 had 3x the performance drop with AF.
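    The 1/3 figure follows directly from those two data points, since effective pixel throughput scales as fps times resolution:

```python
# Effective pixel throughput from the benchmark numbers quoted above:
# throughput ~ fps * resolution.

with_af = 117 * 1024 * 768      # 8xAF at 1024x768:  ~92 Mpixels/s
without_af = 132 * 1600 * 1200  # no AF at 1600x1200: ~253 Mpixels/s

ratio = with_af / without_af    # ~0.36, i.e. close to 1/3
```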

    No one cares if the GF4 quality is a bit better when its performance hit is way higher. It's always been performance first, quality second (up to a certain point, obviously).

    Today, graphics cards from ATI and NVidia are often within 20% of each other, which is hard to notice without looking at a graph. Games also use more off-angle surfaces now than back then, since gamers are demanding more varied environments. Hence the focus on AF quality. You being pissed about this says more about your bias than the media's. In fact, even today only HardOCP has stated that it makes a noticeable difference. Some other sites are even writing off ATI's higher-quality AF as insignificant.

    MSAA is a speed optimization, and GF4 barely outpaced theoretical SSAA (given the same RAMDAC downsampling): 4xAA reduced fillrate by 70% instead of 75%. Colour compression was the real innovation. The shader hardware in the original Radeon was unbelievably close to DX8 PS1.0. Both had 8 math ops, both had fixed-mode dependent texturing, but the Radeon had a 2x2 matrix multiplication first instead of 3x3. It had three-texture multitexturing instead of four. I worked at ATI, and I am (rather, was) very familiar with the R100/R200 architecture. The vertex shaders were barely changed: R100 just didn't quite meet the spec, which rumour says was changed too late for ATI, so they couldn't call it a programmable vertex shader according to Microsoft. Saying who invented what in any field is often a wash, and realtime graphics is no different.

    I'm not agreeing with the statement that ATI is way more innovative or forward-looking than NVidia, but rather saying that these innovations are very evolutionary. Both companies are driving each other similarly, especially when you consider how early design decisions are made. If that's what you're saying too, then maybe you shouldn't come off like you're saying ATI is just a follower.

    Either way, lay the AF thing to rest. GF4's AF speed was pathetic.
     
  9. ERK

    ERK
    Regular

    Joined:
    Mar 31, 2004
    Messages:
    287
    Likes Received:
    10
    Location:
    SoCal
    I was just thinking in terms of the following:
    1. Cache hits reduce latency penalties (over misses).
    2. The X1K architecture masks latency with thread swapouts.
    3. Therefore, cache duplication, even though storage inefficient, would not cause any penalty because any threads needing different cache data would just be scheduled around the latency.
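    That reasoning can be sketched with a toy latency model (the cycle counts are invented for illustration, not X1K hardware figures):

```python
# Toy model of latency hiding via thread swapping, as described above.
# Numbers are illustrative assumptions, not real hardware figures.

MISS_LATENCY = 100   # cycles for a cache miss (assumed)
ALU_WORK = 10        # cycles of shader math per thread between fetches (assumed)

def total_cycles(num_threads, misses_per_thread):
    """With enough resident threads, miss latency overlaps with the
    other threads' ALU work instead of stalling the pipeline."""
    alu_total = num_threads * misses_per_thread * ALU_WORK
    # Latency still visible after the other threads' ALU work covers part of it:
    exposed = max(0, MISS_LATENCY - (num_threads - 1) * ALU_WORK)
    return alu_total + misses_per_thread * exposed

few = total_cycles(num_threads=2, misses_per_thread=1)    # latency mostly exposed
many = total_cycles(num_threads=16, misses_per_thread=1)  # latency fully hidden
```

    In this sketch the 16-thread case finishes each thread in far fewer cycles on average, because every miss is covered by other threads' work.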

    If this does not apply to this situation, my bad.
     
  10. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Tens of thousands seems a mite large, considering that 10,000 pixels is a 100x100 pixel block. This, of course, may be incentive for looser constraints in what constitutes a batch.
     
  11. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    At least some are.



    Mintmaster, seems like both ATI and 3dfx weren't happy about PS1.0 ...
     
  12. Fred da Roza

    Newcomer

    Joined:
    May 6, 2003
    Messages:
    178
    Likes Received:
    2
    Unless they are claims made by nVidia.


    http://www.beyond3d.com/forum/showthread.php?p=62174&highlight=driver#post62174

     
    AlphaWolf likes this.
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Sorry, yeah, you're right that's exactly how R520 is able to skirt the issue more effectively.

    Jawed
     
  14. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Er, you're taking that statement rather out of context.

    Edit: give me a moment, I'm trying to figure out the proper context myself.
     
  15. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Okay, I think I've got the proper context now.

    I was arguing that future driver improvements should improve the performance of shaders on the NV30. I was stating this in part because I had heard that the NV30 was a VLIW design, and VLIW designs are notoriously hard to write compilers for, and in part because this was what nVidia themselves were claiming would help.

    Now, I believe I've been very consistent over the years in separating how I think performance will change and what people should consider when buying something. I've always stated that you should buy a product for what it can do for you now.

    But that doesn't mean I can't speculate as to how performance will change with newer drivers. If I remember correctly, the NV3x did indeed increase its shader performance more than the R3xx over the next year, but it took an entirely new architecture to become truly competitive (the NV4x). This is a prime example of what I was talking about: if you expect driver improvements to save an architecture, you're setting yourself up for failure.

    Expectations on how future drivers will improve a video card should only be factored in if you consider the comparison between the cards to otherwise be a wash. An example of this perspective can be seen in how I chose to purchase an SLI motherboard. I never really expected to use SLI, but as I was browsing through the nForce4 motherboards available at the time, attempting to select a layout that I liked, I threw out motherboards until I was within a few dollars of the excellent Asus A8N-SLI motherboard. So, the marginal probability that I would ever make use of SLI made me decide to purchase the SLI motherboard, in a situation where it might have otherwise been a wash.

    I think that's the only situation where you'd want to buy a video card based on the promise of improved drivers. If, for example, the X1800 XL was available for the same price as the GeForce 7800 GT, and the XL performed a few percent better (which it may in a month or two) so that the performance was essentially even, it might be a good idea to select the X1800 XL because it's much more likely to improve performance via drivers than the 7800 GT (although I still wouldn't, because linux and OpenGL support are very important to me).
     
  16. Fred da Roza

    Newcomer

    Joined:
    May 6, 2003
    Messages:
    178
    Likes Received:
    2
    The context is right there for everyone to see. And no optimizations had been made for Doom at the time, but of course it "makes lots of sense". That's the second time you have outright contradicted yourself to put nVidia in a good light and ATI in a bad one.

    Reminds me of the last lame response.
    http://www.beyond3d.com/forum/showpost.php?p=223030&postcount=438

     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Do you have any concept of the limit on the number of triangles in a batch in NVidia GPUs? Do the triangles have to have the same normal?

    I kinda suspect that ATI GPUs are strictly one-triangle.

    I seem to remember that the 16x16 size was described as a trade-off between small triangles and cache.

    http://www.beyond3d.com/reviews/ati/r420_x800/index.php?p=5

    Reducing the tile size allows for higher efficiency with smaller triangles, while larger triangles favour texturing efficiency.

    There was a forum post by one of the ATI guys describing this - but I can't find it.

    Jawed
     
  18. Fred da Roza

    Newcomer

    Joined:
    May 6, 2003
    Messages:
    178
    Likes Received:
    2
    Do we really need to go through what you have said about improving nVidia performance with driver improvements? You realize there is a search tool on the forum.

    http://www.beyond3d.com/forum/showthread.php?p=97373&highlight=driver#post97373

    http://www.beyond3d.com/forum/showthread.php?p=239451&highlight=driver#post239451

    http://www.beyond3d.com/forum/showthread.php?p=239590&highlight=driver#post239590

    For someone that doesn't believe you should consider driver improvements a reason to buy a card, you sure talk about it a lot when it involves nVidia.
     
    #338 Fred da Roza, Oct 14, 2005
    Last edited by a moderator: Oct 14, 2005
  19. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    No, this part of the pipeline has no concept of a surface normal (actually, no part has). The Z-gradients don't have to be identical, and the face register depends on the winding of the vertices.

    I don't know how many different triangles can be in a batch, but I would be surprised if it's more than 16.
     
    Jawed likes this.
  20. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile

    I think it would be really, really nice if you'd bother quoting me in context, as you might try at least reading, if not quoting, the remarks I specifically responded to. Instead, you've quoted only my remarks separately, made the mistake of characterizing them in inflammatory terms, while you completely ignore the set of inflammatory, incorrect, utterly inaccurate remarks I responded to in the first place. Remarks not made by me.

    Amazing, really, that you'd single out my remarks as if I brought up the topic, uh, which I did not do. Your comments are very sad and utterly out of place. Try reading what I write in context, please.
     