GeForce FX: 8x1 or 4x2?

Discussion in 'General 3D Technology' started by Dave Baumann, Feb 10, 2003.

  1. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,262
    Likes Received:
    22
    Location:
    Land of the 25% VAT
    Thanks a lot for the clarification, MDolenc!

    I think I was thrown off because I understood the claim to mean that the register combiners could work in parallel with the fragment processor rather than in series (e.g. working on different shaders at the same time instead of on different instructions simultaneously).

    Sorry for being so dense, but I guess it comes down to semantics regarding the double-throughput claim.
     
  2. jasal

    Newcomer

    Joined:
    Mar 4, 2003
    Messages:
    4
    Likes Received:
    0
    Location:
    Italy
    What we know so far...

    OK, now it's my turn to use a fanciful formula to describe the NV30 architecture.
    As was already said, we lack terminology here, so perhaps we should use an extended notation:
    a simple traditional pipeline could be described with an AxBxC expression, where

    A=number of pixels processed
    B=number of textures applied
    C=number of Z/Stencil calculations performed

    while a "modern" pipeline, where FP calculations are needed to describe pixels, could be represented with AxBxCxD, where

    D=number of FP calculations performed.

    A conventional pipeline is 1x1x1 or 1x2x1, while a modern one could be 1x1x1x1, etc.
    According to what we've learned so far, NV30 is a kind of 4*(1x2x2x1), while R300 should be 8*(1x1x1x1); simplified:

    NV30= 4x2x2x1

    R300= 8x1x1x1
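    The notation above can be made concrete with a small Python sketch. It assumes every term is a per-pipeline, per-clock count and that a chip is simply N identical pipelines (the 4*(1x2x2x1) form used above). The names `Pipe`, `parse`, `chip_totals` and `simplified` are hypothetical helpers for illustration only, not anything from NVIDIA or ATI documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipe:
    """One pipeline in the AxBxCxD notation (assumed per-clock counts)."""
    pixels: int     # A: pixels processed
    textures: int   # B: textures applied
    z_stencil: int  # C: Z/stencil calculations performed
    fp_ops: int     # D: FP calculations performed


def parse(notation: str) -> Pipe:
    """Parse a four-term string such as '1x2x2x1' into a Pipe."""
    parts = [int(p) for p in notation.split("x")]
    if len(parts) != 4:
        raise ValueError(f"expected AxBxCxD, got {notation!r}")
    a, b, c, d = parts
    return Pipe(a, b, c, d)


def chip_totals(pipes: int, pipe: Pipe) -> dict[str, int]:
    """Per-clock totals for `pipes` identical pipelines, e.g. 4*(1x2x2x1)."""
    return {
        "pixels": pipes * pipe.pixels,
        "textures": pipes * pipe.textures,
        "z_stencil": pipes * pipe.z_stencil,
        "fp_ops": pipes * pipe.fp_ops,
    }


def simplified(pipes: int, pipe: Pipe) -> str:
    """Fold the pipe count into A, as the post does: 4*(1x2x2x1) -> '4x2x2x1'."""
    return f"{pipes * pipe.pixels}x{pipe.textures}x{pipe.z_stencil}x{pipe.fp_ops}"


if __name__ == "__main__":
    chips = {"NV30": (4, parse("1x2x2x1")), "R300": (8, parse("1x1x1x1"))}
    for name, (pipes, pipe) in chips.items():
        print(name, simplified(pipes, pipe), chip_totals(pipes, pipe))
```

    Under these assumptions both chips come out at 8 textures and 8 Z/stencil ops per clock, while NV30 has 4 FP ops per clock against R300's 8.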

    So as regards fillrate, multitexturing and stencil operations, NV30 can be considered a very balanced part (taking bandwidth limitations and clock speed into account). The main problem appears to be FP throughput, which, clock for clock, should be only half of what R300 offers; the data seen so far seem to confirm that. I personally thought that NV30 could perform 8 FP16 or 4 FP32 ops per clock (acting like a 4x2x2x2 when the highest precision is not used, and like a 4x2x2x1 otherwise), while R300, if I understand correctly, can perform 8 FP24 ops all the time (8x1x1x1). But if the functional diagrams shown in this thread (thanks, UncleSam) are correct, that's not the case: only 4 FP ops can be performed per clock, whether FP32 precision is used or not. Again, the data shown in this thread seem to confirm that. Some have tried to find a complex explanation involving scheduling and reallocation of resources (the NV30 'flexible architecture') that could still accommodate the original hypothesis, but the reality is probably much simpler: the computing power just isn't there (as in the case of the 'eight' pipelines).
    What do you say (corrections and additions welcome)? And what about the performance of the two architectures when FSAA is enabled?
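    To see how the clock-for-clock gap interacts with clock speed, here is a rough Python sketch of peak FP throughput (ops per clock times clock rate) under each hypothesis discussed above. The clock values are assumptions (the retail GeForce FX 5800 Ultra and Radeon 9700 Pro core clocks), and peak figures like these ignore instruction mix, register usage and bandwidth, so they bound rather than predict real shader performance.

```python
# Assumed core clocks in MHz (retail GeForce FX 5800 Ultra and Radeon 9700 Pro);
# substitute other values for other boards.
NV30_MHZ = 500
R300_MHZ = 325


def fp_ops_per_second(ops_per_clock: int, mhz: int) -> int:
    """Peak FP ops per second: ops issued per clock times clock rate."""
    return ops_per_clock * mhz * 1_000_000


# Ops-per-clock figures come from the hypotheses in the post, not from vendor specs.
SCENARIOS = {
    "NV30, original guess, FP16 (8/clk)": (8, NV30_MHZ),
    "NV30, original guess, FP32 (4/clk)": (4, NV30_MHZ),
    "NV30, per the diagrams, any precision (4/clk)": (4, NV30_MHZ),
    "R300, FP24 (8/clk)": (8, R300_MHZ),
}

if __name__ == "__main__":
    for label, (ops, mhz) in SCENARIOS.items():
        print(f"{label}: {fp_ops_per_second(ops, mhz) / 1e9:.1f} G FP ops/s")
```

    With these numbers, the diagrammed NV30 issues half as many FP ops per clock as R300; its higher assumed clock narrows the peak gap to about 2.0 vs 2.6 billion ops per second but does not close it.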
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.