GeForce FX: 8x1 or 4x2?

Discussion in 'General 3D Technology' started by Dave Baumann, Feb 10, 2003.

  1. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    I'm getting a number of reports from people saying that they have not managed to get more than 4 pixels per clock out of GeForce FX. Normally, if running in 32bit, the 3DMark fillrate tests will not show more than four pixels per clock on the GFFX because of bandwidth limitations - however, even if the colour is reduced to 16bit with 16bit textures, the multitexturing performance is still twice the single texturing performance and the single texturing is still less than half the theoretical performance of an 8x1 card (1.4Gp/s single tex, 3.4Gt/s multitex).

    When Radeon 9500 PRO is run with these settings it does achieve a rate that is greater than 4 pixels per clock (as I pointed out to XBit labs in their pulled review).

    Obviously I'd like to verify these claims myself, but I don't have a board at the moment.

    Brent - fancy doing a little more testing? If you can, could you clock down the core but keep the memory high, then run a fillrate test in 16bit?
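
    As a rough sanity check, the reported figures can be turned into per-clock rates. This is only a sketch: the 500 MHz core clock is an assumption taken from the figure quoted for the FX later in this thread, and the variable names are made up for illustration.

```python
# Convert the reported 3DMark fillrate figures into per-clock rates.
# ASSUMPTION: 500 MHz core clock (the GeForce FX figure quoted later in the thread).
CORE_CLOCK_HZ = 500e6

MEASURED_SINGLE_TEX_PIXELS_PER_S = 1.4e9  # 1.4 Gp/s, single texturing
MEASURED_MULTI_TEX_TEXELS_PER_S = 3.4e9   # 3.4 Gt/s, multitexturing

# Rate per second divided by clocks per second gives units per clock.
pixels_per_clock = MEASURED_SINGLE_TEX_PIXELS_PER_S / CORE_CLOCK_HZ
texels_per_clock = MEASURED_MULTI_TEX_TEXELS_PER_S / CORE_CLOCK_HZ

# Theoretical single-texture rate of an 8x1 part at the same clock.
theoretical_8x1_pixels_per_s = 8 * CORE_CLOCK_HZ
fraction_of_8x1 = MEASURED_SINGLE_TEX_PIXELS_PER_S / theoretical_8x1_pixels_per_s

print(f"single tex: {pixels_per_clock:.1f} pixels/clock")
print(f"multi tex:  {texels_per_clock:.1f} texels/clock")
print(f"single tex is {fraction_of_8x1:.0%} of the 8x1 theoretical rate")
```

    Under that clock assumption the single texturing result sits below 4 pixels per clock, while the multitexturing result sits between 4 and 8 texels per clock.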
     
  2. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Oh man......if 4x2 turns out to be the actual case, someone (*cough* nVidia, *cough*) is going to have a lot of explaining to do...

    Yes, would be great if we could see 16 bit fill rate scores with the core at 300, and the memory at 500....
     
  3. Livecoma

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    67
    Likes Received:
    0
    Location:
    Limbo

    Like how they compete against R300 with less memory bandwidth and now half the pixel pipelines?

    I am not trying to question you or your sources Dave, but I would be surprised if this turned out to be the case...
     
  4. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Actually, it would explain how they "merely compete" with the R300 in High resolution, no AA, and particularly with older "designed optimally for dual texturing" games like Quake3.

    As you know, based on theoretics (?) we were all expecting NV30 to soundly beat R300 at high resolution, non AA benchmarks....assuming that the pixel rate advantage was a factor of 1.5

    If the FX actually has a slight pixel rate disadvantage factor of 1.3, but still holds a TEXEL rate advantage of 1.5, I believe that would actually explain pretty nicely the performance numbers we've seen...
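
    For anyone who wants to check where the 1.5 and 1.3 factors come from, here is a minimal sketch of the arithmetic. It assumes core clocks of 500 MHz for NV30 and 325 MHz for R300 (the figures quoted elsewhere in this thread) and treats R300 as 8x1; the function and variable names are only illustrative.

```python
# Theoretical pixel and texel rates for "pipes x TMUs per pipe" layouts.
# ASSUMPTION: NV30 at 500 MHz, R300 at 325 MHz, R300 being an 8x1 design.
NV30_CLOCK_MHZ = 500
R300_CLOCK_MHZ = 325


def rates(pipes, tmus_per_pipe, clock_mhz):
    """Return (Mpixels/s, Mtexels/s) for a given pipeline layout."""
    pixel_rate = pipes * clock_mhz
    texel_rate = pipes * tmus_per_pipe * clock_mhz
    return pixel_rate, texel_rate


r300_pixels, r300_texels = rates(8, 1, R300_CLOCK_MHZ)
nv30_8x1_pixels, nv30_8x1_texels = rates(8, 1, NV30_CLOCK_MHZ)
nv30_4x2_pixels, nv30_4x2_texels = rates(4, 2, NV30_CLOCK_MHZ)

# What was expected if NV30 is 8x1.
pixel_advantage_if_8x1 = nv30_8x1_pixels / r300_pixels
# What follows if NV30 is 4x2.
pixel_disadvantage_if_4x2 = r300_pixels / nv30_4x2_pixels
texel_advantage_if_4x2 = nv30_4x2_texels / r300_texels

print(f"8x1 NV30 pixel advantage:    {pixel_advantage_if_8x1:.2f}x")
print(f"4x2 NV30 pixel disadvantage: {pixel_disadvantage_if_4x2:.2f}x")
print(f"4x2 NV30 texel advantage:    {texel_advantage_if_4x2:.2f}x")
```

    These are peak figures only; they say nothing about bandwidth limits or how close either chip gets to its peak in a real game.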
     
  5. Livecoma

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    67
    Likes Received:
    0
    Location:
    Limbo
    Wow you just said the same thing I said with an ATI bias.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    It would certainly explain some of the shader numbers we've seen so far - beyond that of simple driver inefficiencies.

    However, at this point I want to see some testing before reaching any conclusions. Even if testing corroborates these results, there may be other explanations as well...
     
  7. Nebuchadnezzar

    Legend

    Joined:
    Feb 10, 2002
    Messages:
    974
    Likes Received:
    141
    Location:
    Luxembourg
    If this is true, then say goodbye to nvidia! :shock:
     
  8. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Alternate theory 1: What if all backbuffer handling is done at 32-bit? Seems like a driver possibility depending on how the color compression/AA works.

    I'm assuming AA is off for these tests, but maybe the drivers aren't changing behavior for that.

    One way to maybe verify this (but not necessarily disprove it) would be to check for how fog or maybe translucency effects look in 16-bit.
     
  9. Livecoma

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    67
    Likes Received:
    0
    Location:
    Limbo

    If this is the case NVIDIA must have fooled John Carmack because he seems confident driver optimizations will remedy that performance disparity. Considering his experience working with NV30, he should have noticed it. I am sure he would have pointed it out in his plan file.

    However, I completely agree. More testing needs to be performed.



    Please, no conspiracy theories...
     
  10. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    You didn't say anything....

    And what's your attitude problem?
     
  11. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    To me, Carmack didn't seem confident. He said NVidia seemed confident and was trying to interpret facts in that light.

    I think if Carmack really were confident that drivers would fix things (i.e., he had figured out what was wrong) his tone would have been different.
     
  12. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    We've been sitting around scratching our heads wondering if it can be doing 1 or 2 FP16 instructions per pipe, and based on the evidence so far thinking that it must be 1 FP16 instruction.

    Perhaps the reality is that it's 1 FP32 instruction and 2 FP16 instructions, but over 4 pipes. This would corroborate JC's findings, being that the ARB path (using 128bit / FP32) is half as fast as ATI's ARB path, but with the NV30 path (which is using FP16) it's on par/faster. With a little more compiler optimisation, at 2 FP16 instructions per clock the Ultra could end up faster than R300 - i.e. 2 FP16 instructions x 4 pipes x 500MHz vs 1 FP24 instruction x 8 pipes x 325MHz. It could also explain the 'twitchy'ness in the compiler that John talks of - it could be tough to get two FP16 instructions per clock to actually run efficiently.

    Again, I'm not drawing conclusions but I'm suggesting that it can still fit based on JC's findings so far.
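
    To put numbers on that comparison, here is a small sketch of the peak shader instruction rates. The per-pipe instruction counts are the hypothesis above, not confirmed hardware details, and the helper name is just for illustration.

```python
# Peak shader instruction throughput = instructions/pipe/clock x pipes x clock.
# ASSUMPTION: the per-pipe figures below are the speculation in this post,
# not confirmed specifications.
def peak_minstr_per_s(instr_per_pipe_per_clock, pipes, clock_mhz):
    """Return peak throughput in millions of instructions per second."""
    return instr_per_pipe_per_clock * pipes * clock_mhz


nv30_fp16 = peak_minstr_per_s(2, 4, 500)  # 2 FP16 instructions x 4 pipes x 500MHz
nv30_fp32 = peak_minstr_per_s(1, 4, 500)  # 1 FP32 instruction x 4 pipes x 500MHz
r300_fp24 = peak_minstr_per_s(1, 8, 325)  # 1 FP24 instruction x 8 pipes x 325MHz

print(f"NV30 FP16: {nv30_fp16} M instr/s ({nv30_fp16 / r300_fp24:.2f}x R300)")
print(f"NV30 FP32: {nv30_fp32} M instr/s ({nv30_fp32 / r300_fp24:.2f}x R300)")
print(f"R300 FP24: {r300_fp24} M instr/s")
```

    On peak figures the FP16 case comes out ahead of R300 and the FP32 case behind it; how much of that peak the compiler can actually reach is a separate question.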
     
  13. Nappe1

    Nappe1 lp0 On Fire!
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,529
    Likes Received:
    3
    Location:
    South east finland
    oh godi godi godi...
    another flame war coming up...

    ahh I can already sense it. ;)
     
  14. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    As opposed to your nVidia bias? :roll:

    "What you said" made it seem like it was some amazing feat that NV30 could even "compete" with R300 if the NV30 was 4x2. I merely explained how it's not amazing at all. A 4x2 clocked at 500 MHz SHOULD "compete" with an 8x1 architecture at 325 MHz, in certain situations.

    In any case, to be perfectly clear, the EXPLAINING that nVidia would have to do doesn't really have much to do with performance at all. Performance is performance. It's just to do with what they said their architecture is. Which I believe they claim as 8 pixel pipelines...

    From the B3D "interview":

    Heh...now that I re-read that quote, nVidia never actually said 8 "PIXEL" pipelines. A 4x2 architecture would satisfy their response to the B3D question: "8 pipelines....8 textures per clock."

    Though I'm betting every "review" of the FX states 8 pixel pipelines "just like the R300".

    And nVidia's web-site proclaims "8 Pixels per clock Rendering Pipeline"

    The bottom line is, we expect certain performance characteristics from an 8x1 architecture, and they are different from those of a 4x2 architecture. And I'd have to say, the performance characteristics of a 4x2 architecture seem to more readily explain the current performance profile of the NV30.

    So as far as I'm concerned, this is a matter of whether or not nVidia is lying to us about their architecture. A few possibilities:

    1) They are lying. It's 4x2.
    2) It acts as 8x1 only in some very specific situations?
    3) The hardware is 8x1, but current drivers limit the actual operation to be 4x2. Future drivers may "enable" 8x1 operation.

    Please be clear that I'm certainly with Dave on this....this is all speculation, and I'm not assuming that FX is actually 4x2. More data is needed. I'm just responding to "what would need to be explained" if it is determined to be a 4x2 pipeline.
     
  15. Livecoma

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    67
    Likes Received:
    0
    Location:
    Limbo
    So the miscommunication wars begin...
     
  16. gkar1

    Regular

    Joined:
    Jul 20, 2002
    Messages:
    614
    Likes Received:
    7
    So that's what they meant by "New forms of marketing"
     
  17. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    If this turns out to be the case, then it would be very nice if we could get a benchmark of stencil fillrate...
     
  18. martrox

    martrox Old Fart
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,065
    Likes Received:
    16
    Location:
    Jacksonville, Florida USA
    If this is true, can you imagine how many pairs of undies got messed up at nVidia when the R300 was introduced? :shock:
     
  19. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    There was that NDA'd .pdf that the review sites had well before the R300 was released that had the planned NV30 specs.

    I know people here have seen it (I don't know if it is still confidential)--did it refer to 8x1 or 4x2? From what Dave had said in the past, it sounds like it referred to 8x1. So their claiming 8x1 is unlikely to be a reaction to the R300 release, right?
     
  20. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    I have a hard time believing the 4x2 pipeline since nVidia promised us 8 pixels per clock, but I can follow what you're suggesting here. It would in theory give us 8 pixels per clock with FP16 - but where does that leave us with the integer path, Dave?
     