Carmack's comments on NV30 vs R300, DOOM developments

Discussion in 'Architecture and Products' started by boobs, Jan 30, 2003.

  1. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    When a discussion has momentum, it's hard to make it change direction.

    I do agree that it is a little surprising that the NV30-specific codepath on the NV30 is only beating the ARB2 codepath on the R300 by a little bit. Everyone thought that Doom3 with an NV30-specific codepath would be kind of a best-case scenario for the NV30. It looks like instead the relative performance is going to be more typical of the results seen for other games.
     
  2. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    I agree; I was confusing the vertex and fragment shaders as to whether it was definitely gone or not.

    Regardless, distilling a person's motivations and beliefs from several disconnected actions amounts to reading tea leaves. Which is why I said you were stretching to come to your conclusion (that he kept the NV30 fragment path because he didn't believe the ARB path would be up to snuff).
     
  3. Thelacky

    Newcomer

    Joined:
    Jan 27, 2003
    Messages:
    36
    Likes Received:
    1
    The ATI R300 is just one brilliant piece of engineering. ATI truly surpassed themselves. :D
     
  4. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    All speculation is reading tea leaves. You keep on trying to make the point that basically "I'm speculating." And I keep saying..."Yes I am. Obviously."

    No need to make the point again. It has been implicitly conceded from the start.

    All I ask (yet again), is that if my speculation is so "stretched", then it should be very easy to provide a separate scenario / reason for his "maintaining" (generic term) the NV30 path, that has nothing to do with anticipated ARB2 drivers getting up to snuff.

    Or is it that you just term "all speculation" as "stretched?"
     
  5. Sxotty

    Legend

    Joined:
    Dec 11, 2002
    Messages:
    5,496
    Likes Received:
    866
    Location:
    PA USA
    Do you fellas remember analyzing Shakespeare and other literary works? Well, that is kinda what this seems like to me: everyone is reading all kinds of crazy things into a statement, and every ambiguity is twisted to support the idea that an individual personally subscribes to. Usually in this case people are divided along company lines, which personally I see no reason to really care about.

    Do you guys have a lot of stock in ATI or Nvidia? I mean, what is the point of getting huffy? The truth is both companies make good stuff, and hopefully neither will get as far ahead as Nvidia once was, and ATI was before. As long as competition remains strong, the products we end up with, whoever designs them, will just get better. Everyone is entitled to their opinion; maybe they like red and ATI, or green and Nvidia, but whatever the reason, that does not mean you should get angry. It is too much like a football game where the fans start to fight instead of just enjoying the game.

    (I personally still respect Carmack and do not think that he is paid off by Nvidia to say what they want, as some have suggested. He seems to say what he thinks, and whether it is correct you guys can argue about all you want.)
     
  6. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    Speculating about motivations and beliefs when there's nothing that addresses motivations and beliefs in anything he wrote is where I call it stretching.

    Speculating that I'm a vegetarian because you've never seen me eat meat is fine and dandy. Saying I'm doing it for ethical reasons based on those same facts is simply stretching it.

    Along those lines, he says it (the NV30 ARB2 path) performs worse than the NV30 path. He also says NVIDIA has assured him it will be resolved shortly. Nowhere is there any shred of anything that suggests he doesn't believe them or that he doubts the NV30 performance will be similar between the two paths, so why do you attribute those beliefs to him as the "only reasonable explanation"?

    Why the code path is still there (assuming it is truly there and being actively developed) is something only he can answer with any certainty.

    And Joe, I already agreed with you that it's a possibility that what you say is the reasoning behind it. I just take particular exception to "it's the only reasonable explanation". (Mostly because I don't really find it compelling at all.)
     
  7. Saem

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,532
    Likes Received:
    6
    I believe that was in context of him trying to experiment, this is not connected to what's going on in Doom III, in the realm of supported features.

    As for complaints about the NV30 path, he had this done before. The performance of the NV30 with the ARB2 path is obviously lacking, roughly half that of the R300. The optimization is necessary, otherwise it'll run like crap. In the case of the R300, it has high-precision rendering which is just shy of the NV30's, so optimizing by creating an R300 code path wouldn't get you much for the time invested.

    Simply put, the R300 can run everything on and fast with the ARB2 path. The NV30 cannot run everything on and fast enough with the ARB2 path. Which would likely be the best ONE to optimize for?

    Furthermore, the NV30 path might lend itself to Carmack's research for his next creation/project(s).
     
  8. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Actually, it won't. If you are losing important data just loading the values as 16-bit floats, no amount of internal precision can bring that back.
    Thanks for conceding a point.
    Then a better example would be 9.6E-04. If the second digit were in error, then you *might* notice a difference.
    DAC precision is a factor, but, if you do enough computations, DAC precision becomes a moot point.
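
The point about 16-bit loads can be illustrated with a small round-trip test. This is a minimal sketch that assumes the 16-bit format behaves like IEEE 754 binary16 (Python's `struct` format code `e`); the helper names `to_half` and `relative_error` are made up for the illustration and the sample values are arbitrary.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def relative_error(x: float) -> float:
    """Relative error introduced by storing x as a 16-bit float."""
    return abs(to_half(x) - x) / abs(x)


if __name__ == "__main__":
    # Widening the stored value afterwards does not bring the lost bits back:
    # to_half(value) is already the widened result, and it still differs from value.
    for value in (9.6e-04, 0.1, 1000.3):
        stored = to_half(value)
        print(f"{value!r} -> {stored!r} (relative error {relative_error(value):.2e})")
```

Under that assumption the relative error of a normal value is bounded by about 2^-11, so roughly three significant decimal digits survive the load, regardless of how much precision the later arithmetic uses.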
     
  9. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    "He cannot be certain" was supposed to be read as "he may have confidence, but no guarantees are given". He cannot drop it until fast drivers for ARB2 are delivered, but he may still have confidence that this will happen eventually.

    Yes, 60fps vs. 30fps during prototyping matters. You have to navigate your little world, and in the case of a full-blown game the navigating time can be much higher than for simple techdemos. If you're stuttering your way through the world it'll take more time, especially if you're going to study details, as your spatial precision goes down. It also has the same psychological effect as when you play games at a low framerate. More annoyance and sometimes even some kind of odd distrust of your code.
     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I just don't see how a driver and optimizations can have such huge repercussions in pixel shader speed. I suppose some out of order scheduling may increase shader speed by reducing stalls, but would it really make that much of a difference?

    Also, how is NVidia performing so poorly in things like the PS 1.4 benchmarks, both in 3DMark and ShaderMark? Surely PS 1.4 is a subset of this so called "NV30 path".

    Nvidia has had SO much time to develop the NV30. I always thought of them as kings of optimization. More so on the software side, but their GF3/GF4 hardware was very efficient (well, after recycling so much old tech, you'd expect it to be, I guess). I think I'm going to just ignore NV30's pixel shader performance for a month.

    I suppose you could be right, but it still seems very unlikely to me. Would NVidia really be that cocky? They're talking about 4 times GF4's pixel shader performance, a brand new architecture that blows the competition away, etc, etc.

    However, your argument matches well with that interview we heard a while ago about keeping the register combiners.
     
  11. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Don't worry Grall, I was rather engaged by your post. I was conjuring up a reply, and then your next paragraph had exactly what I was thinking!

    Well discussed. I thought the same thing, and figured that was true after seeing NVidia's PR graphs.

    However, when you think about it, the texturing passes are very important. The Z only and stencil passes only need 1 clock per pixel, and this will be very optimal since there are no bandwidth limitations (or there shouldn't be in a well designed GPU). However, he's using 7 textures per pass when lighting everything, so that's at least 7 clocks per pixel. Throw in trilinear filtering, and it could potentially be 10+ wherever minification is happening (although NV30 is supposed to do single cycle trilinear, I think).
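
The clock counts in the paragraph above can be written out as a rough estimate. This is only a sketch under stated assumptions: one bilinear texture fetch per clock per pipe, and trilinear costing two clocks unless the hardware does it in a single cycle. The function name and its parameters are hypothetical, not taken from any vendor documentation.

```python
def texturing_clocks(num_textures: int, trilinear_textures: int = 0,
                     clocks_bilinear: int = 1, clocks_trilinear: int = 2) -> int:
    """Estimate clocks per pixel for one lighting pass.

    Assumes each texture fetch is issued back to back on a single pipe,
    which is a simplification made for this illustration.
    """
    if not 0 <= trilinear_textures <= num_textures:
        raise ValueError("trilinear_textures must be between 0 and num_textures")
    bilinear_textures = num_textures - trilinear_textures
    return (bilinear_textures * clocks_bilinear
            + trilinear_textures * clocks_trilinear)


if __name__ == "__main__":
    print(texturing_clocks(7))                         # all bilinear
    print(texturing_clocks(7, 3))                      # three minified, two-clock trilinear
    print(texturing_clocks(7, 7, clocks_trilinear=1))  # single-cycle trilinear
```

With three of the seven textures minified and two-clock trilinear, the estimate reaches 10 clocks per pixel, against 1 clock for a Z-only or stencil pass.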

    I still think the deficiency goes beyond this, but I'm doubtful we'll ever know the real truth. R300 is also one fp op per pipe per clock, but is clocked way lower. :?
     
  12. Crusher

    Crusher Aptitudinal Constituent
    Regular

    Joined:
    Mar 16, 2002
    Messages:
    869
    Likes Received:
    19
    No, neither normal volumetric shadowing nor Carmack's Reverse method touches hidden pixels. Both methods make comparisons on the Z-Buffer, which contains no hidden pixels at all. The way Carmack's Reverse differs is that he renders the back faces of the shadow volume, incrementing the stencil buffer when there's a z-fail, and then renders the front faces of the volume, decrementing the stencil buffer when there's a z-fail. The normal method does the front faces first, incrementing on z-pass, then the back faces, decrementing on z-pass. Culling is disabled for both methods, which is part of the reason the performance is eaten up, and is probably where you got confused about hidden pixels.
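
The two stencil counting schemes can be compared on a single pixel. The sketch below uses the standard formulation, in which the depth-pass method counts only faces that pass the depth test and the depth-fail method (Carmack's Reverse) counts only faces that fail it. The one-dimensional model, the `NEAR_PLANE` constant and the function names are assumptions made for the illustration; this is not code from Doom 3.

```python
from typing import Iterable, Tuple

# (front_face_depth, back_face_depth) of one shadow volume along a pixel's view ray
Volume = Tuple[float, float]

NEAR_PLANE = 0.0  # hypothetical: faces at or in front of this depth are clipped away


def _rasterized(face_depth: float) -> bool:
    """A face only produces a fragment if it lies beyond the near plane."""
    return face_depth > NEAR_PLANE


def depth_pass_count(surface_depth: float, volumes: Iterable[Volume]) -> int:
    """Front faces increment on z-pass, back faces decrement on z-pass."""
    count = 0
    for front, back in volumes:
        if _rasterized(front) and front < surface_depth:
            count += 1
        if _rasterized(back) and back < surface_depth:
            count -= 1
    return count


def depth_fail_count(surface_depth: float, volumes: Iterable[Volume]) -> int:
    """Back faces increment on z-fail, front faces decrement on z-fail."""
    count = 0
    for front, back in volumes:
        if _rasterized(back) and back >= surface_depth:
            count += 1
        if _rasterized(front) and front >= surface_depth:
            count -= 1
    return count


if __name__ == "__main__":
    # Eye outside the volume: both methods agree (non-zero means shadowed).
    print(depth_pass_count(4.0, [(2.0, 6.0)]), depth_fail_count(4.0, [(2.0, 6.0)]))
    # Eye inside the volume: the front face is clipped and only depth-fail is right.
    print(depth_pass_count(3.0, [(-1.0, 5.0)]), depth_fail_count(3.0, [(-1.0, 5.0)]))
```

In this model both methods give the same count whenever the eye is outside every volume. When the front face is clipped by the near plane, the depth-pass count misses the volume, while the depth-fail count still marks the surface as shadowed.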

    You don't render stencil shadows on the back side of models. It's also difficult to "burn through" them, since the calculations necessary to build the shadow volumes are time consuming, and the workload increases along with the polygon count of the occluders (might be one reason the Doom 3 models seem slightly lower in polygon count than people expected them to be).

    As for the NV30 codepath, the two reasons I could think of are:

    a) once he tried the ARB2 path on the NV30 and found it wasn't performing very well, he added an NV30 path to see if the proprietary extensions were any faster, and when he discovered they were, he kept it

    b) there was an NV30 emulator for Cg developers long before the NV30 was available to work on, perhaps he added the NV30 path to play around on the emulator with.

    Either way I'm sure it doesn't take a lot of time and effort for Carmack to convert ARB OpenGL paths to proprietary extensions, so I guess I don't see why anyone feels the need to worry about their existence.
     
  13. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Crusher, what Basic means by "hidden pixels" is "z-fail". If they fail, they are in effect hidden. He's just talking about how ATI's Hierarchical Z doesn't work when drawing the stencil volumes because of this. Trust me, Basic is a smart, knowledgeable guy who doesn't easily get "confused".

    And I don't know what you mean by "backside of models", but you do render the backside of the stencil volumes, which is what Grall is talking about. He said "stencils", not models.

    Finally, we are obviously talking about situations that aren't CPU limited. What good does a fast GPU do you then? While the graphics card is handling the intense texturing for one frame, the CPU is doing the stencil volumes for the next frame. NV30 should very well be able to burn through them.
     
  14. Crusher

    Crusher Aptitudinal Constituent
    Regular

    Joined:
    Mar 16, 2002
    Messages:
    869
    Likes Received:
    19
    That's not what it sounds like he is talking about to me. Nor does your description of what Hierarchical Z does sound correct. You say HZ doesn't work when drawing stencil volumes because it throws away hidden pixels, and you claim the hidden pixels are the parts of the volume face that fail the z-test (i.e. is this pixel in front of or behind the pixel stored in the same location in the z-buffer). Throwing away pixels that fail the z-test is PERFECTLY FINE. It should never be keeping those pixels anyway, since you explicitly disable z-buffer writes before you render the volume. All you care about is the result of the test--the fact that it did fail--so that you can alter the entry in the stencil buffer accordingly. As long as the driver accurately reports the result of the depth test, everything should be peachy. And I don't see any possible reason why HZ would affect the depth tests themselves.


    He said models, but perhaps he was talking about the stencil volumes instead of models, in which case I misunderstood him.

    Nowhere in Carmack's .plan update did he even suggest that Doom 3 was not being CPU limited, and since the comments I'm responding to are referring to the FX's performance in Doom 3, I don't see how it could be obvious that we're talking about situations that aren't CPU limited.

    NV30 might be able to burn through the z tests, and the rendering pass to add the shadow from the stencil mask might not take too long; that I could agree with. My point is, while the NV30 might be able to handle its share of the workload for the stencil volumes, the end performance probably isn't going to be "blazing fast" like it sounded he was expecting it to be, since there are still lots of things that have to be done to calculate them. And your comment implies that the NV30 has other things to do while the CPU is computing the volumes, which isn't normally the case.

    The rendering process is usually that you build the volume for one occluder, do the z tests and update the stencil mask, then build the volume for the next occluder. In this situation, if the GPU can do the transform and z tests faster than the CPU can compute the volume for the next occluder, the GPU will be sitting idle waiting for that information. And if you have this all being done in the same function (or even the same thread), the transformation and z testing won't be done concurrently with the volume production anyway, so they'll both be waiting for the other. You could generate all the shadow volumes before you begin transforming and doing z tests, but I don't think that would be any faster, and you'd have to store a lot more vertices in each frame.
     
  15. Nagorak

    Regular

    Joined:
    Jun 20, 2002
    Messages:
    854
    Likes Received:
    0
    Let's see he 1) creates a specific code path for the NV30, while 2) not creating one for the R300.

    Then 3) he pats Nvidia on the back for making such great drivers, when the fact is he is tailoring his code to run on their HW to begin with! How could things not work exactly how he wants, when he goes out of his way to program for Nvidia's hardware?

    4) Nvidia's graphics cards were too slow to demo Doom3, so he ran with an R9700. But then at the first available opportunity he throws the R300 out of his comp and goes back to Nvidia. It's just a load of bullshit, seeing as he already has an NV30 optimized path, he should keep the R9700 and work on making one for it. Frankly he should just keep his mouth shut about driver quality, because he obviously doesn't have an ATi card in his computer enough to judge.

    And saying that the R200 path is fine to use with the R300 is just ridiculous...why not just run the NV30 with the NV10 codepath too?

    Maybe I'm overreacting, but it just annoys me that everyone thinks Carmack is some sort of god, when the truth is he hasn't made a single good game yet! When Doom3 comes out everyone here is going to be wetting themselves over how great it looks, when the truth is you'd probably find more atmosphere in the now-anemic graphics of System Shock 2.
     
  16. gokickrocks

    Regular

    Joined:
    Dec 19, 2002
    Messages:
    465
    Likes Received:
    1
    Well, IMO Doom3 (alpha) doesn't look all that great; I think Resident Evil on my GameCube looks better.
     
  17. Mulciber

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    413
    Likes Received:
    0
    Location:
    Houston
    The way I read it the ARB2 path is already optimized for ATI, so that point is moot.

    And Carmack doesn't make games, he makes engines...so moot I say.
     
  18. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I don't want to turn this into a big back-and-forth argument, so I'm going to try and clarify things.

    HZ on R300 keeps the max of all Z values in each tile (assuming the convention where higher Z is further away). If a current polygon's closest z (i.e. lowest z) is bigger than the value in the corresponding HZ tile, the polygon is discarded, as it entirely fails the test. This is ordinary depth pass rendering.

    If you now want to do depth fail rendering, you have no HZ acceleration. HZ holds the max, so you can't tell when a polygon entirely passes and thus should not be rendered in depth fail mode. In other words, pixels from the stencil volumes that pass the depth test will not change the stencil buffer, but cannot be discarded rapidly by HZ. Polygons that fail can be discarded, but you don't want to discard them since they need to update the stencil buffer one pixel at a time.

    NOTE: when I say polygons wrt HZ, I mean tiles or blocks of pixels within polygons.

    This is what was written in ATI's HZ performance guidelines: If you change the Z function from depth pass to depth fail, HZ can't work. I just explained to you why, and this is what Basic was talking about. ATI then falls back to ordinary Z-buffering.
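
The HZ argument above can be sketched with a toy tile. The code assumes the convention used in the post (higher Z is further away, the tile stores only its maximum depth, and the depth test is less-than). The function names and the four-pixel tiles are made up for the illustration and say nothing about how R300 actually lays out its tiles.

```python
from typing import List, Sequence


def hz_rejects_block(block_min_z: float, tile_max_z: float) -> bool:
    """Depth-pass rendering: the block can be dropped without per-pixel work
    when its nearest point is still behind the farthest depth stored in the tile."""
    return block_min_z > tile_max_z


def per_pixel_passes(block: Sequence[float], tile: Sequence[float]) -> List[bool]:
    """Ordinary Z-buffer test (less-than) for every pixel of the block."""
    return [new_z < old_z for new_z, old_z in zip(block, tile)]


if __name__ == "__main__":
    tile_a = [0.9, 0.8, 0.85, 0.9]
    tile_b = [0.9, 0.2, 0.85, 0.9]  # same maximum as tile_a, but one pixel is much closer
    block = [0.5, 0.5, 0.5, 0.5]

    # The stored maximum is identical, yet the block passes everywhere in tile_a
    # and fails at one pixel in tile_b, so the maximum alone cannot prove
    # "entirely passes". That is the information depth-fail rendering would need.
    print(max(tile_a) == max(tile_b))
    print(all(per_pixel_passes(block, tile_a)), all(per_pixel_passes(block, tile_b)))

    # A block behind the whole tile is provably "entirely fails".
    print(hz_rejects_block(min([0.95] * 4), max(tile_a)))
```

The only blocks a max-only tile can classify are the ones that fail everywhere, and in depth-fail mode those are exactly the blocks that still have to touch the stencil buffer pixel by pixel.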

    Grall said stencils, then said models in a statement enclosed in parentheses immediately thereafter. There is no need for you to nitpick.

    If we are CPU limited, why the hell would he be talking about video card performance? All you have to do to test the video card is raise the resolution so that framerates are significantly slower than when disabling rendering altogether.

    Whenever you talk about video card performance, you mean not CPU limited. Otherwise you are either talking about driver overhead or have no idea what you're talking about, neither of which apply to John Carmack's statements.

    The CPU does not wait for the GPU to finish the stencil drawing, nor the other way around. Things get queued up, with the GPU finishing rendering one frame while the driver caches the draw commands for the next. The drawing calls in the function do not wait for the GPU to finish before returning. This is probably the most fundamental of driver enhancements to reduce CPU usage.

    The only time it fails is when you change rendering resources like textures or vertex buffers in the middle of a frame, or if you need to get a result back, like doing a framebuffer read or using occlusion query, in which case you empty the queue. Even the latter has mechanisms for issuing the query and retrieving results later. If working with dynamic vertex buffers, then the driver can make a copy of the vertex data, via CPU (or AGP, I think), and queue it.

    Carmack knows very well how to optimize a program. He will not let both GPU and CPU have any significant idle time in the same frame. If the driver doesn't do what I said, then he will ping-pong between vertex buffers from frame to frame.

    So even if you generate your shadow volumes and send them to the GPU one at a time, the driver will effectively wind up drawing them all together some time later.
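
The difference between the lock-step model and a queued driver can be shown with a small timing simulation. This is a sketch under simplifying assumptions: fixed CPU and GPU cost per frame, an unbounded command queue, and no mid-frame readbacks that would drain it. The function names and the millisecond figures are hypothetical.

```python
def serialized_total(cpu_ms: float, gpu_ms: float, frames: int) -> float:
    """CPU and GPU wait on each other, so every frame costs cpu + gpu."""
    return frames * (cpu_ms + gpu_ms)


def queued_total(cpu_ms: float, gpu_ms: float, frames: int) -> float:
    """Draw calls return immediately; the GPU works through the queue
    while the CPU prepares the next frame."""
    cpu_done = 0.0  # time at which the CPU has queued the current frame
    gpu_done = 0.0  # time at which the GPU has finished the current frame
    for _ in range(frames):
        cpu_done += cpu_ms
        # The GPU starts a frame once its commands exist and it is free.
        gpu_done = max(gpu_done, cpu_done) + gpu_ms
    return gpu_done


if __name__ == "__main__":
    print(serialized_total(10.0, 15.0, 100))
    print(queued_total(10.0, 15.0, 100))
```

In this model the queued case settles at the slower of the two costs per frame, so only one side is ever the bottleneck, whereas the lock-step case pays for both on every frame.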
     
  19. elchuppa

    Newcomer

    Joined:
    Jan 20, 2003
    Messages:
    56
    Likes Received:
    0
    You're right about analyzing Shakespeare... I was actually thinking more of the Bible, hehe. Like a bunch of priests arguing over the meanings in some obscure passage of the Old Testament.

    It's interesting how some people seem to hate Carmack, to me he has never been particularly abrasive nor judgemental in style or manner. I suppose that he hasn't ever really said anything I didn't want to hear. Still, to be so angry at the guy seems peculiar. He seems to know more about what he's doing than most...

    I always like reading his .plan updates that's for sure.
     
  20. no_way

    Regular

    Joined:
    Jul 2, 2002
    Messages:
    301
    Likes Received:
    0
    Location:
    estonia
    Plus, he's a great rocket scientist. Er, no, make that engineer.
    Still, I wonder what code path the original Radeon will use?
     