GeForce FX: 8x1 or 4x2?

Discussion in 'General 3D Technology' started by Dave Baumann, Feb 10, 2003.

  1. UncleSam DL iXBT

    Newcomer

    Joined:
    Feb 27, 2003
    Messages:
    36
    Likes Received:
    0
    If we look at current games, the NV30 Ultra is equal or faster in most cases. From that standpoint, architecture or per-clock balance doesn't matter - the final in-game number is the only end result. But this doesn't matter, because you simply cannot buy an NV30 Ultra :)

    I look at the NV30 as:

    1. a card for developers
    2. a card for analysts like me ;)

    More products based on this technology and its building blocks will arrive soon. I have seen NV31 and NV34, and NV35 is also near final. Those will be produced in real quantities. 100,000 NV30s, most of them NV30GL, is not enough volume to change the picture of today's market.

    But NV31 surely can.

    The most interesting battle for me: NV31 vs. the 9600 Pro, or whatever it ends up being called.

    It's not the first time NVIDIA has debugged on its customers ;)
     
  2. UncleSam DL iXBT

    Newcomer

    Joined:
    Feb 27, 2003
    Messages:
    36
    Likes Received:
    0
    I understand that you mean the combiners, but the question is whether they really sit AFTER the looped FP ALU, or are looped together WITH the FPU.

    Maybe you're right. I'll ask John for more details, maybe on the 22nd when he comes to Moscow and we meet again, or maybe by phone in the coming days...

    About one big FPU - one wide unit for all 4 pixels, as I guessed before???
    What led you to this conclusion?

    They say that they achieve almost "no latency" texture fetches even when they are dependent. I was excited, but they say they cannot reveal the details of HOW - only that it is surely so :D

    So it seems that this FPU is REALLY deeply pipelined.
     
  3. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    923
    Likes Received:
    3
    Location:
    Germany
    Yes and no. In the short run the poor per-MHz performance of the NV30 will not hurt Nvidia, but in the long run it will hurt them badly, simply because they have already used up all of their technical headroom to compete with the R9700 Pro.

    NV30: 500 MHz, 0.13 µm
    R9700 Pro: 325 MHz, 0.15 µm

    So, with a low-k process out of reach for the NV35, and an already overclocked and overheating chip, what can they do to improve the yield, speed and per-MHz performance of the NV35? The headroom for improvements is IMHO rather slim (except for a 256-bit bus).

    ATi, on the other hand, can improve their chips a lot if they want. Higher clock speeds and a new process (0.13 µm) will make it possible to improve the next chips (R9900; see nvnews.net) without being caught in a technological trap like Nvidia is now.
    IMHO this trap is not entirely TSMC's(?) fault, but also Nvidia's, because they (as it seems) are not able to design high-performance chips with comparably low power usage.
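    As a back-of-envelope sketch, the clocks above imply the following theoretical peak colour fill rates. Note that the 4-pipe figure for NV30 is precisely the hypothesis this thread is debating, so treat it as an assumption, not a spec:

```python
# A back-of-envelope sketch of the theoretical peak colour fill rates
# implied by the clocks above. The 4-pipe figure for NV30 is the very
# hypothesis this thread is debating, so treat it as an assumption.

def peak_fill_mpix(pipes: int, clock_mhz: int) -> int:
    """Theoretical colour fill rate in megapixels per second."""
    return pipes * clock_mhz

nv30_fill = peak_fill_mpix(4, 500)  # 2000 Mpix/s (if NV30 is 4x2)
r300_fill = peak_fill_mpix(8, 325)  # 2600 Mpix/s (R9700 Pro, 8x1)

# Despite a ~54% clock advantage, a 4-pipe NV30 would trail an 8-pipe
# R300 in raw colour pixel throughput.
```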
     
  4. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    This will be going off the thread's excellent investigative course, but what the hell...

    I'm not sure if you're joking or not with your first sentence (with the winky there), but I find your second sentence to be immaturely ridiculous if it's based solely on what Dean/Croteam said on their website.

    Not a single developer in their right mind would camp with any one IHV. They are in it for the long term, and they want to make a game that sells well, or at least stays in the minds of folks so that their next game will as well. There is more money to be made from selling lots of copies than from being on the continuous "payroll" of IHVs for "choosing sides".

    My "point", and it appears to be the same as Dean's, is from a programmer's perspective. How many people (programmers, casual gamers or otherwise) would make GF FX purchasing decisions based specifically on whether a card is actually 4x2 or 8x1 or 16x2, where such differences affect performance? None, really. The only things that matter are the games/apps made and the final performance.

    The pertinent posts here discuss the possibilities of such differences and prove that the differences exist and affect performance in the given apps/games. Please tell me if such apps are coded in such a way that the programmer/developer knows beforehand: "I created this app because I know or didn't know whether a card is a 4x2 or an 8x1."

    The short of it is that, IMO, if a developer encounters performance problems with a certain card, one of the last things they'll think is "Hmmm... is this because this card is a 4x2 or an 8x1?" They don't. They have a set of priorities about what they want to achieve graphically with a game, according to the code they know best and according to API specs.

    If you were to tell me that what the pertinent posts here talk about is proving which architecture (4x2 or 8x1) results in the best performance for the masses when other considerations are taken into account (MSAA, for example), I'd follow this up with you, subject to the clock speeds of a video card.

    But if you were to tell me that the discussion here is about PR tactics and deceptions that matter to nobody but those with the most inquisitive and investigative 3D-tech minds, then I'd suggest we take this to the "3D Graphics Companies" forum. There's more to talk about there (see below).

    We're here (in this particular forum) to talk about investigating the differences between 4x2 and 8x1 in a purely synthetic sense. If Dave, for instance, can reveal, in his forthcoming GF FX preview, that the differences between such architectures affect performance in current and future games and will affect the purchasing decisions of the masses (i.e. "buy this card because it has an 8x1 and not a 4x2... you can see that the difference in performance is because of the advantage an 8x1 has over a 4x2 and nothing else... always ask if a card is an 8x1 or a 4x2 before you make a decision to buy a card or not"), then I have nothing left to say.

    Lastly - can you tell me for certain that every officially announced spec of a video chip is correct? Who left out what? Did they, or didn't they? Do you know? Did we know whether ATI explained to developers that their R300 pixel pipeline isn't actually IEEE 32-bit SPFP, but that if a developer "expects" 32-bit it's actually converted to 24-bit FP internally on the R300? Would ATI, unprompted, have revealed this? How much of a difference in importance is this compared to NVIDIA's ambiguous NV30 information - public, or to developers under NDA - about the NV30's pipeline config? Or, for another example, did NVIDIA reveal publicly that their GF3/NV20 fails to filter between different cube map faces, and that this will be a problem if I use low-res cubemaps with bilinear filtering? Who picked up on this? Would it have mattered to the degree this GFFX-4x2-or-8x1 issue apparently has if some website had actually reported on it?

    The fact that Dave's investigative and curious nature, applied to something he either encountered by accident or specifically investigated, revealed something about the NV30 doesn't mean that it is the only thing we can "doubt"... regardless of the IHV or what the IHVs revealed publicly. The only real benefit derived from this thread is this: that we cannot take anything fed to the press or the public for granted, whether it is from NVIDIA or ATI, with NVIDIA being the unfortunate example here. If only we had the time, and the resources, to tackle everything. Yeah, sure, maybe, just maybe, the findings here will actually result in IHVs paying attention to how they design hardware :)

    You said that "a lot of people made assumptions about the card based on the information that nvidia made available". When it comes to this 4x2-or-8x1 controversy, what assumptions are those, and how do those assumptions matter in the general scheme of things? Perhaps you want to substitute "card" with "company"... that way, I have no argument with you. But then, it's the product that finally, and financially, matters, right - not the name of the company?

    Perhaps the best thing would be for IHVs to reveal specs only to developers and none to websites/press. That should make for all-around more interesting reviews from websites, no? :)

    PS. In case folks are thinking I'm truly a "nvidiot" because of what I say above, I'd like to say that the GF FX (Ultra) really either lacks true optimizations or has some really serious hardware design flaws, based on the various feedback here and from developers to me (I don't have a card, so I can't test). It is incomprehensible to me that performance (the main issue, not features) is unexpectedly low given its clock speeds (core and memory), and I'm not even talking about DX9 stuff. Never forget that whether a card - or a company - implements a 4x2 or an 8x1 or whatever depends on what they think they can achieve when clock speeds are considered. This isn't something they base their design decisions on from developer feedback.

    PPS. Enough with this (much more to say but, honestly, I don't really care to) - on with playing Splinter Cell, doing more interviews, and finding out more about the 3D tech behind the latest games!
     
  5. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    What will become critical for the developer is not how the pipeline is organised, but whether their application runs as they would expect it to - if it doesn't, then that may be a by-product of the pipeline organisation.

    What I'm getting at is that NVIDIA have told people, including developers, "this has 8 shader pipes", so the immediate expectation is that, clock for clock, the performance will be in the same ballpark as R300. However, the shader performance we've seen thus far has been less than stellar, though I know it can be one hell of a lot better than we've seen so far.

    I would suggest you pay particular heed to Carmack's comments about NV30's driver shader compiler being "twitchy" (but that NVIDIA will be able to optimise that in the driver), whereas R300's seems to take whatever is thrown at it. My suspicion is that this is due to the configuration of NV30's shader pipeline needing instructions in an order that suits it, which may not be the case with hand-written shader code (see the importance of Cg now?) - whereas R300 may perform as expected, NV30 may initially perform much worse, leaving the developer wondering what's up. I would suspect that if they had adopted a more linear 8-pipe design, this issue would not have arisen.
     
  6. K.I.L.E.R

    K.I.L.E.R Retarded moron
    Veteran

    Joined:
    Jun 17, 2002
    Messages:
    2,952
    Likes Received:
    50
    Location:
    Australia, Melbourne
    How many pages has it been? Has anyone yet come to a solid conclusion? :|
     
  7. martrox

    martrox Old Fart
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,065
    Likes Received:
    16
    Location:
    Jacksonville, Florida USA
    This may be the most important/intelligent post in this whole thread.... I really do think that the FX is coming into view. I think maybe nVidia bit off more than they can chew ATM - sounds like 3dfx a bit? While the FX may not be as bad as many here think - and not nearly as good as others think - it does seem like it's different enough to be a major pain both to classify AND to program for! I still think nVidia's big mistake was/is its half-truthing (outright lying?) PR..... I think we would all have been happier, and nVidia would have been much better served, IF they had taken the same amount of resources they spent condemning FM and instead explained the FX.......

    Thanks once again, Dave......
     
  8. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    Scheduling fewer than 100 instructions (more or less the maximum practical size for a PS shader, without taking into account NV30's 1024-instruction limit) shouldn't be that hard. Maybe if they have to schedule instructions from different pixels (something like 'unrolling') in a weird way, it could be that hard. So I still don't see what their problem really is. That is why I think that knowing the real architecture of the pixel shaders is interesting.

    In the thread about tests to perform with the GeForceFX there were some about pure math shaders; maybe that will shed some light (if the problem was texture loads, for example). More synthetic tests I can think of: small and large shaders. Shaders with mostly texture instructions. Shaders with no dependent instructions. Shaders with dependent texture loads. Shaders that only use one kind of instruction (mov, add, dp3) or group of instructions (complex math, simple math). Shaders using swizzles or not. Shaders using more or fewer temporary registers (I think someone posted a link to a DXDev list post about how that could be implemented in current GPUs). Shaders using or not using constants (remember that NV30 seems to use an inlined or memory-read approach rather than a large register bank/memory on die like R300). Shaders with scalar instructions only or vector instructions only. There is also the fact that NV30's PS implementation is quite different from PS2.0, so comparing NV_fragment_program shaders with equivalent PS2.0 shaders could also be interesting.

    In fewer words: the kind of synthetic tests that wouldn't do any real work, but that could help discover how the shader compiler in the drivers or the hardware works. The fact that everything goes through another compiler layer doesn't help in figuring out what the real hardware is.
    Of course, if NVidia were so kind as to tell us what their problems really are, all that stuff would be a bit of a waste of time :lol:
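    As a sketch (not from the post above), one way to drive that test matrix would be to generate the shader variants programmatically: enumerate every combination of shader length, texture count, and dependent vs. independent reads, and emit a toy shader for each. The mnemonics and register names below are illustrative assumptions in a PS2.0-like style, not validated shader code you could feed to a real driver as-is:

```python
# Hedged sketch: enumerate the synthetic shader test matrix described
# above. make_shader() emits toy PS2.0-style assembly for a given
# instruction mix; the exact mnemonics and register names are
# assumptions for illustration only.

from itertools import product

def make_shader(n_math: int, n_tex: int, dependent: bool) -> str:
    """Build a toy pixel shader with n_tex texture loads and n_math
    arithmetic instructions; if `dependent`, each texture load after
    the first uses the previous result as its coordinate."""
    lines = ["ps_2_0"]
    for i in range(n_tex):
        if dependent and i > 0:
            lines.append(f"texld r{i}, r{i - 1}, s{i}")  # dependent read
        else:
            lines.append(f"texld r{i}, t{i}, s{i}")      # independent read
    for _ in range(n_math):
        lines.append("mad r0, r0, c0, c1")               # pure math op
    lines.append("mov oC0, r0")
    return "\n".join(lines)

# The matrix sketched in the post: short vs. long shaders, few vs. many
# textures, independent vs. dependent reads (8 variants in total).
matrix = list(product([8, 64], [1, 4], [False, True]))
shaders = {cfg: make_shader(n_math=cfg[0], n_tex=cfg[1], dependent=cfg[2])
           for cfg in matrix}
```

    Timing each variant against its siblings (same math count, different texture dependency, and so on) would isolate which axis of the matrix the hardware or driver compiler is sensitive to.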
     
  9. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Well, I'd like to see one of the shader tests we have so far replicated via Cg - check the performance of the two and see what the difference between the hand-written assembly and the Cg-generated assembly is (if there is a performance difference), and that may tell you something. Dunno if it's possible though.
     
  10. RoOoBo

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    308
    Likes Received:
    31
    And comparing the output for that same shader program for NV_fragment_program and DX9 PS2.0. Does anyone use or know Cg? On the Cg site there may already be some examples that could be used.
     
  11. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    Yeah...all we need is Vince to start throwing insults at YOU to make the thread complete. ;)

    Right...and we're talking more from a consumer's perspective: that is, we have to buy the hardware from a selection of competing parts. The "INFORMED" consumer (like those here) will consider:

    1) How it performs in actual, current games
    2) Its architectural structure and theoretical "rates", so that we can make judgements about how it might perform in games that have different characteristics from current games (games that are fill-rate or shader limited, but not bandwidth limited, for example).
    3) Synthetic tests that bear out number 2.

    Add to that the "grossly uninformed" consumer, who reads product spec boxes and box art to decide which product to buy...

    I disagree completely.

    Read above.

    Again, if we have no REASONABLE picture of the architecture, we have little hope of trying to predict performance in OTHER games that stress other aspects of the GPU.

    I'd like to see a "show of hands" from everyone who THOUGHT that the FX would really stomp the Radeon 9700 in "fill-rate limited but not bandwidth limited" situations and in pixel shading synthetic tests, because "both had 8 pixel pipes, and the FX has a large MHz advantage". I'll bet that's nearly EVERYONE.

    That's not a fair assessment. If a developer believes a card is 8x1, then of course they don't consider that it might be 4x2. Why should they? They point the finger at drivers in most cases.

    I think we all know the differences between 8x1 and 4x2 in a "purely synthetic sense." The only real question (which seems to be settled) is whether the FX is 4x2 or 8x1 when writing actual pixels.

    You're looking at this too black and white. No one thinks that a card is "crap" simply because it's 4x2, or that a card is "great" simply because it's 8x1. The point is, knowing the difference between the two should be a FACTOR for the consumer in a buying decision, because they have different performance characteristics.

    See my above question asking for a "show of hands."
     
  12. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    923
    Likes Received:
    3
    Location:
    Germany
    So in your opinion we should benchmark or test the NV30 with shaders made with Cg too, to give a better overview of the capabilities of the NV30.

    Maybe some of the guys who made the small benchmark programs could build the same programs with Cg too, so we could compare the NV30 with Cg shaders and hand-written shaders.
     
  13. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    923
    Likes Received:
    3
    Location:
    Germany
    I partially disagree. IMHO it's not important whether the NV30 is 4x2 or 8x1, because most or all current games are optimised for 4x2.

    IMHO the important question is: does the NV30 have 4 PS2.0 pipelines or 8 PS2.0 pipelines?
    This information would go a long way towards explaining the performance in new games and in benchmarks.
     
  14. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    It certainly looks like the tools for that are there. Anyone going to give it a go?

    Maybe now that Wavey has asked for it, someone will.
     
  15. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    That goes hand-in-hand with the 8x1 vs. 4x2 question though.

    What if the card has 8 PS 2.0 pipelines, but can only write 4 pixels per clock?

    In any case, the "number" of PS pipelines isn't as important as the overall shader instruction throughput. (Assuming shaders are decoupled from the pixel writing engine). I don't think tests will bear out how many "shader pipelines" there are anyway...just how fast instructions can be executed and also "written."
     
  16. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    BTW, Rev, I'm curious how you'd respond to my post on the subject earlier. It might also address a lot of what Joe said if you did.
     
  17. BenSkywalker

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    823
    Likes Received:
    5
    I think I've read all of the posts in this thread, although it is moving too fast, and I have yet to see anyone ask this, so I'll give it a go.

    Is it possible the NV30 is 4x1+4x0? I keep looking over the benches, and neither 4x2 nor 8x1 fits what is being shown, but 4x1+4x0 does...?
     
  18. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    It's really none of those. ;)

    It can apply two textures per pixel pipeline (assuming 4 pipelines), which is why the 4x2 nomenclature is used for pixel writing.

    I'll cut and paste a post I made on a different board:

    ****

    The "issue" that nVidia faces with their "pixel pipes" is the same one 3dfx faced with their multitexturing pipes: they can't accurately describe, and do justice to, the performance/architecture using the conventional terms of the day.

    3dfx basically wanted to convey that their "dual-textured" pixel fill rate was just as fast as someone else's "single-texture" pixel fill rate. Rather than just saying that, they came up with the "texel rate", so they could slap on a number that was twice as high as someone else's "texel rate" number.

    At least 3dfx used a DIFFERENT term ("Texel Fill Rate"), so as to distinguish it from "Pixel Fill Rate".

    Now, here we have the GeForceFX. The main problem is that we already have an industry-accepted understanding of what a "pixel pipeline" is. That is, something that can generate a PIXEL... something that contains COLOR values and a z-component.

    Unlike past architectures, the FX can do something NEW. It can write "Z-only" or "stencil-only" values 8 times per clock... twice as fast as it can write color values. Traditionally, all architectures write these values at the same rate: a color value, a Z/stencil value, or a color+Z value.

    What would have been ACCEPTABLE for nVidia marketing to do is come up with a new "marketing term" for this capability. Just as 3dfx came up with the "texel rate" to describe its ability to apply more than one texture per pixel, nVidia should have come up with a new term to describe that they can write twice as many Z/stencil values as color values.

    nVidia claims you can't really use "pixel pipes" and "TMUs" to describe its architecture... so it SHOULDN'T.

    Call it 4 "Hyper-Stenciling" pipelines. Come up with a new term for fill rate, like the "zixel rate". It doesn't matter. Just SOMETHING that preserves the TRADITIONAL MEANING of a pixel pipeline... which is something that can output a COLORED PIXEL.

    The FX pipeline is different, and therefore it would not be fair to nVidia to JUST call it a 4x2 pixel pipeline. However, it's also misleading to say it has 8 pixel pipelines, because it cannot write 8 colored pixels, which everyone expects an 8-pixel-pipeline architecture to be able to do.
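    To make the arithmetic concrete, here is a small Python sketch of the "zixel" idea, using the per-clock figures discussed in this thread (these are the hypothesised rates under debate, not confirmed specs):

```python
# Sketch of the "zixel" arithmetic (hypothesised per-clock figures from
# this thread, not confirmed specs): colour pixels vs. z/stencil-only
# writes per second on a 500 MHz part.

CLOCK_MHZ = 500          # GeForce FX Ultra core clock
COLOR_PER_CLOCK = 4      # colour (or colour+Z) pixels per clock
ZSTENCIL_PER_CLOCK = 8   # z/stencil-only writes per clock

def rate_mps(per_clock: int, clock_mhz: int = CLOCK_MHZ) -> int:
    """Throughput in millions of writes per second."""
    return per_clock * clock_mhz

pixel_rate = rate_mps(COLOR_PER_CLOCK)      # 2000 Mpix/s
zixel_rate = rate_mps(ZSTENCIL_PER_CLOCK)   # 4000 "Mzix"/s

# A z/stencil-only pass (e.g. stencil shadow volumes) could in theory
# run at twice the rate of an ordinary colour pass on this design.
assert zixel_rate == 2 * pixel_rate
```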
     
  19. BenSkywalker

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    823
    Likes Received:
    5
    Yet it can draw more than four pixels per clock under certain circumstances.

    That describes a 1x1 pixel pipe, but not a 1x0.

    This isn't new, though; there have been ~50 million 3D rasterizers capable of doubling their output of stencil vs. textured pixels sold to consumers to date. Is the problem that this type of approach hasn't been used in consumer-targeted PC-based hardware?

    The GS has sixteen pixel pipes; it simply lacks TMUs for those pipes. Based on everything we have seen in terms of benchmarks for the NV30, it appears that the board is running 4x1+4x0. Its performance characteristics lie somewhere between the GS and more traditional PC hardware when comparing per-clock pixel/stencil ops. Having a 4x1+4x0 configuration would allow them to combine for an effective 4x2 configuration in older titles, and allow for an eight-pixel-pipe configuration in upcoming titles that will rely more heavily on stencil, while reducing the number of transistors used.

    I understand your line of discussion on this; I just don't see it as refuting the possibility.
     
  20. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    71
    XBox and/or PS2? Although I thought the PS2 could write 16 pixels WITH COLOR, just not with a texture applied. According to the benchmarks, the FX writes only 4 pixels with color... regardless of whether it's just a flat-shaded color or a texture.

    I disagree, because 4x1 + 4x0 to me indicates that it can't apply more than 4 textures per clock. Clearly, the FX can do this.

    I would say it's more accurate to label it "4x2 + 4x0". EDIT: though even that's still not quite right, because it implies (to me, anyway) that it could write 8 pixels in parallel as long as no textures are applied. Again, this is not the case with the FX.

    It all comes down to the definition of a "pixel". If we agree that a pixel must contain some COLOR, then 8 pipelines doesn't make sense to me and is not correct. If we agree that an acceptable definition of "pixel" includes z- or stencil-only values, then 8 pipelines is technically OK.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.