GeForce FX: 8x1 or 4x2?

Discussion in 'General 3D Technology' started by Dave Baumann, Feb 10, 2003.

  1. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    LOL!

    So basically, it's ATI's fault that nVidia hasn't released agpgart drivers.
    I think, Chalnoth, that you are grasping at straws. Why don't you GIVE UP your bias, instead of hiding behind the "everyone is biased" shield?
    Take off the rose-colored glasses, and apply some of the logic you can use when your favorite IHV is out of the picture!
     
  2. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    Actually, there are certainly some reasons to prefer the NV30. The Quadro FX spanks the FireGL in workstation-type applications.
     
  3. kyleb

    Veteran

    Joined:
    Nov 21, 2002
    Messages:
    4,165
    Likes Received:
    52
    not much more than the nv28 quadro does in most situations, does it antlers4?
     
  4. Dave H

    Regular

    Joined:
    Jan 21, 2003
    Messages:
    564
    Likes Received:
    0
    While it was certainly exciting to read Dave's startling claim, see the proof roll in, and finally trap Nvidia in their own web of deceit and all that...4x2 vs. 8x1 is really pretty darn close to a non-issue on a card with a theoretical bandwidth of 256 bits/clock. It should tell us something that in order to even notice the effect we needed to run a fillrate tester (extraordinarily non-representative of normal gaming conditions) with 16-bit color, and even then underclock the core just to be absolutely sure. AFAICT there are no real-world single-textured fragments where 4x2 vs. 8x1 will make a lick of difference on such a bandwidth limited card. And all the zero-textured fragments (z/stencil passes) execute at 8 zixels/clock.
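    To put rough numbers on that claim, here is a minimal back-of-the-envelope sketch. The per-pixel byte costs (32-bit colour, a Z read and write, one texel fetch, no compression) are assumptions for illustration, not NV30 specifics; only the 256 bits/clock figure comes from the post above.

```python
# Back-of-the-envelope check: is a single-textured fill pass on a
# 128-bit DDR card limited by the pipelines or by memory bandwidth?
# All per-pixel byte costs below are illustrative assumptions.

BANDWIDTH_BYTES_PER_CLOCK = 256 / 8   # 128-bit DDR -> 256 bits per core clock

COLOR_WRITE = 4       # 32-bit colour write
Z_READ      = 4       # 32-bit Z read
Z_WRITE     = 4       # 32-bit Z write
TEXTURE     = 4       # one 32-bit texel fetch (ignores cache hits)

bytes_per_pixel = COLOR_WRITE + Z_READ + Z_WRITE + TEXTURE
bandwidth_limited_rate = BANDWIDTH_BYTES_PER_CLOCK / bytes_per_pixel

for pipes in (4, 8):
    sustained = min(pipes, bandwidth_limited_rate)
    print(f"{pipes} pipes: core can issue {pipes}/clk, "
          f"memory sustains ~{sustained:.1f} pixels/clk")

# Under these assumptions both configurations end up pinned at ~2
# pixels/clock by bandwidth, which is why 4x2 vs 8x1 barely shows up
# outside synthetic fill tests.
```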

    Yes there may be a measurable difference in performance under real-world conditions when applying an odd number of bilinear-filtered textures (i.e. trilinear textures count as two). Which is probably a pretty common occurrence: i.e. an object with, say, two trilinear textures plus a (bilinear) lightmap. And the lightmap in particular is probably a low bandwidth-utilizing item, so yes, you may see a performance difference between 4x2 and 8x1 in that situation.

    But so what? The crucial point is this: the overall performance impact of 4x2/8z vs. 8x1 on a 128-bit DDR card is likely to be less significant than the impact of any number of hardware decisions which are not publicized. Such as:
    • texture cache size
    • triangle switch latency
    • pass/shader switch latency
    • memory latency
    • capacity of various internal buffers
    • latency of particular shader ops
    • particular AF algorithm used
    • particular z/color compression algorithm used

    And so on, and so on, and so on. I don't know nearly enough about 3d hardware to make anything close to an exhaustive list, but I know enough to know it would be pretty long. Why bother complaining because the fixed-function pipeline organization isn't disclosed? Nothing about GPU internals is disclosed. Compared to CPUs, the field is an absolute joke. Why is it that the only part of 3d graphics treated as a science (i.e. with papers published/presented) is the software side???

    But...but...but claiming "8 pixel pipelines" on the spec sheet was misleading! Again, so what? Is this like the first time a spec sheet for a 3d card has exaggerated? Are we really supposed to be outraged at this point? "8 pixel pipelines" is a hell of a lot closer to the truth than "48 GB/sec effective bandwidth".

    Of course the most important unsolved question (although we've seen either some interesting guesses or helpful leaks in this thread) is the functional unit organization and limitations of the shader pipelines. This is the only truly important question about GFfx 5800 Ultra performance (so far as the core goes), as any time you're core-limited in fixed-function rendering you're either at such high framerates that performance doesn't really matter much, or you're at least at such high resolution/AF settings that you can really just take it down a notch without suffering undue hardship. (Of course this is not going to be true for GFfx 5600 or 5200, but then their shader pipelines will likely be different as well.)

    But as far as official Nvidia information goes, we don't know anything about the shader organization, except that it can do "8 [pixel] shader ops per clock," plus some vague PR drivel. Can you imagine a CPU architecture where the number of functional units and the pipeline organization were a secret even six months prior to launch, much less after launch? It's absurd.

    And don't give me that bunk about how details of the shader pipeline aren't important because they're not programmer-visible, or how disclosure of cache and buffer sizes isn't necessary because data usage patterns for 3d rendering aren't amenable to targeting particular cache sizes. For one thing, nobody programs in assembly anymore, and there's probably plenty a 3d programmer could do to better exploit the GPU's memory hierarchy and various latencies if it wasn't just a black box.

    But that's not even the point. CPU engineers present papers with nearly complete details on things like their new transistor design scheme which cuts leakage current, or their new SRAM design technique which saves die space or increases scalability, or their new clock distribution scheme which cuts skew. Often the first CPUs to utilize these techniques won't even be released for three years or more. And quite obviously they are completely transparent to the end-user/programmer except in terms of cost, clockability, power consumption, etc.

    Meanwhile GPU IHVs refuse to disclose the slightest details about their chips, even details which significantly affect performance and could in all probability allow better software optimization. Even after their release. Even though the design process is so heavily pipelined that even if their competitor wanted to steal these design "secrets", they likely wouldn't get a chance to until 3+ cores down the line, at which point the techniques would likely be obsolete.

    CPU design retains many of the hallmarks of good, open science. GPU design is just sleazy industry. If you're looking for ethics, you best move along. If you're just noticing the trickery and misrepresentation, get used to it. A 4x2 NV30 is the least of our worries.

    (IMHO 8) )
     
  5. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    No, it's ATI's fault their drivers depend on working drivers elsewhere. This is a particularly bad idea with Linux, where hardware support isn't always assured. And regardless of where the blame lies, the fact still remains that I cannot use an ATI card in my main machine. Where you choose to place the blame doesn't change that.

    And I still can complain about ATI's video drivers under Linux because they have no support for PCI-66.

    As for nVidia, yes, I have complained about the quality of their nForce Linux support in the past. It is very, very poor compared to their excellent video support.
     
  6. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    There are still problems with only supporting 4x2, but they become less and less significant as pixel complexity increases. The primary problem will be 3-textured scenarios, though I doubt those will be very common.
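    A small sketch of why odd texture counts are the awkward case, assuming each texture unit can do one bilinear fetch per clock and nothing else (bandwidth, shading) limits the pipelines:

```python
# Rough throughput comparison for an N-textured surface on a 4x2 vs an
# 8x1 layout. Idealised: one bilinear fetch per TMU per clock, no other
# limits. The quantisation to whole clocks is what penalises 4x2.
from math import ceil

def pixels_per_clock(pipes, tmus_per_pipe, textures):
    clocks_per_pixel = ceil(textures / tmus_per_pipe)
    return pipes / clocks_per_pixel

for textures in (1, 2, 3, 4):
    r4x2 = pixels_per_clock(4, 2, textures)
    r8x1 = pixels_per_clock(8, 1, textures)
    print(f"{textures} texture(s): 4x2 -> {r4x2:.2f} px/clk, 8x1 -> {r8x1:.2f} px/clk")

# The two layouts tie on even texture counts; on odd counts (1 or 3
# textures) the 4x2 layout wastes one TMU per pipe on the last clock
# and falls behind (e.g. 2.00 vs 2.67 px/clk for three textures).
```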
     
  7. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    I've restricted quoting to what I disagree with. If I dropped anything important, please point it out.

    Well, as you go on to recognize, any time the output can take more than one clock cycle to produce, differences can occur. Which is why, for example, 8x1 is different than 4x2 and 4x2 is different than 4x1 in some situations and not others...

    Borrowing from someone's car analogy, that's like saying the number of cylinders in an engine doesn't matter because not every detail about the engine is revealed. The number of cylinders can be an important clue to the characteristics of the car, and to the meaning of other reported aspects like top speed, acceleration, and fuel mileage, even if other details that have an influence are not mentioned.

    I still completely fail to grasp the rationale behind such "so what?" reasoning. Of course 4x2 can perform like 8x1 in many circumstances, but it is exactly those circumstances in which it does not that the difference in nomenclature signifies. Or should nvidia call their 8x aniso 16x aniso, or ATI call their 4x AA 8x AA? Where does this rationale lead us? Should I go on a campaign lambasting the nv30 for being 4x1? I have as much justification as you propose nvidia has for 8x1...

    Eh? So because they've done worse, the lesser distortion shouldn't be criticized? I'm missing the central component of your rationale today...maybe I dropped something crucial in the text I didn't quote, but I didn't notice anything in it that makes sense of the gist of your comments here.

    We agree that the primary significance is not the fixed-function pipeline, it is the flexible processing pipeline, and I think the discussion has been focused on this for a while now, not on 4x2 compared to 8x1 texture handling capability. In the R300, each "pixel pipeline" has a complete set of calculation functionality (what I labelled a "proxel pipeline"). If the nv30 had 8 such sets, your comments would make more sense to me. This is why I don't criticize the discussion of "pixel pipelines"...because for the nv30's problems it doesn't seem to distort the issue at all. That's why, in regards to "zixel" performance, terminology like "4x2 plus something" is being used.

    With the type of reasoning nvidia uses (or, rather, the reasoning used to defend their naming) the number of "pipelines" the R300 has could be exaggerated in an unreasonable fashion as well. Complaining about the nv30 being advertised with "8 pixel pipelines" is valid for exactly this reason, IMO.

    Please note the following: As I expect the nv35 to have "8 proxel pipelines" of calculation functionality, I'd expect to agree with many of your points in regards to it. Namely, as I've said before, I don't expect to have significant issue with the nv35 being called an 8 pipeline part (though I'm partial to the term "proxel pipeline" since I made it up ;) ), though with nvidia's mindset they could conceivably end up justifying "16 pixel pipelines". In any case, trying to gloss over disadvantages in your hardware should be restricted to phrasing that reasonably represents the truth (which 8x1 does not appear to do for the nv30).

    Of course, it is also possible that it is only 8x1 with the processing functionality still inadequate (I personally hope nvidia is not that foolish, but with their dedication to an integer processing legacy and the high transistor count of the nv30, it might be possible unless the nv30 design is broken in some critical way), in which case I'd still end up defending their 8x1 naming if they used it, and my criticism of the nv35 would be isolated to the chip instead of the company's labelling of it (i.e., I'd still end up agreeing with many of your points).
     
  8. Colourless

    Colourless Monochrome wench
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,274
    Likes Received:
    30
    Location:
    Somewhere in outback South Australia
    Call me stupid, but I read through this entire thread (I think), and I'm not going to read it 'again' just to find out... WTF is a Proxel, exactly?

    At least Zixel sort of indicates what it's actually supposed to be... Proxel, though, I'm just lost.
     
  9. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    Why 8x1 vs 4x2 is (was) interesting and important:

    Regardless of the implications of 8x1 vs. 4x2 for performance in simple apps (not much), the really interesting part of 4x2 is that it implies there are only 4 PS2.0 shader pipes, which explains the otherwise unexplainable low scores for PS1.4/PS2.0 shaders, and also implies that the extent to which these low scores can be fixed by drivers is limited.
     
  10. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    I believe it's just a process representation. Not sure either, to be honest.

    MuFu.
     
  11. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    ROFL, :lol:. I'll explain the concept again since the explanation is scattered over a massive thread.

    My concept of a "proxel pipeline" is the ability to independently conduct processing on a picture element in a clock cycle (would you have preferred "procel"? sounds even geekier to me :p ), simultaneously with processing other picture elements, and a "proxel" is that output, with an emphasis on "complete" output (a zixel could be considered the z/stencil part of that output, at least while they are linked directly in architectures, but a proxel pipeline is dictated by the most limiting circumstance among all factors of color/z/stencil output, which used to be known as a "pixel" before the nv30).

    The basic idea behind it is to have a "proxel pipeline" relate to "proxel fillrate" as "pixel pipelines" used to relate to pixel fillrate. It is in the "fine" tradition of "texels" and, more recently, "zixels", to offer more opportunities to accurately portray strengths and weaknesses of a design.
    This could allow 4x? notations that actually made sense for shading (see below), but until people get over their laziness and perform the multiplication themselves, it would most likely be best to simply count the pipelines or focus on proxel fillrate. The nv30 would still have problems with its architecture expressed in proxel pipelines (they couldn't play as fast and loose with the definition of a proxel), but that's where proxel fillrate (similar to texel fillrate) comes in. Also, a valid case could be made for calling the nv30 "8x0.5", but simply calling it 8 pipelines would be inaccurate.

    The proposed measurements were: minimum proxel fillrate, which illustrates worst-case behavior (for the nv30 this would be fp32/texturing, I think); maximum proxel fillrate (for the nv30, intermixed integer and fp16 with no texture access), which indicates the best case (where actual calculations occur...again, see my mention of completeness and think of the z-only shader performance figures); and a standardized measurement (which would likely resemble pocketmoon's benchmark testing examples), which would let NV30 (and hopefully, more significantly, NV35) optimizations and R300 >1 op per clock circumstances be represented in a way related to real-world usage.
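    A minimal sketch of how those three figures might be tabulated, as I read the proposal; the clocks-per-proxel values and the 4-pipeline/500 MHz figures below are illustrative placeholders, not measured NV30 data:

```python
# Sketch of the proposed "proxel fillrate": pipelines times core clock,
# divided by the clocks needed to finish one complete output (proxel).
# All scenario numbers are placeholders chosen only for illustration.

def proxel_fillrate(pipelines, core_mhz, clocks_per_proxel):
    return pipelines * core_mhz / clocks_per_proxel  # Mproxels/sec

scenarios = {
    "minimum (worst case, e.g. fp32 + texturing)": 4.0,   # clocks per proxel
    "standardised (benchmark-style shader mix)":   2.0,
    "maximum (best case, no texturing)":           1.0,
}

for name, clocks in scenarios.items():
    rate = proxel_fillrate(pipelines=4, core_mhz=500, clocks_per_proxel=clocks)
    print(f"{name}: {rate:.0f} Mproxels/s")
```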

    Oh, and MIPS is one valid unit for proxel fillrate; it is just the specification of which MIPS measurements are useful that differentiates proxel fillrate from some of the MIPS figures reported.

    This would serve both the needs of the consumer, in clearer information (the details of the proxel are perhaps complicated, but the basic idea of it and how it can be used for comparison is very, very simple), and the needs of marketing, who could tout the maximum proxel fillrate with a great deal of leeway yet with more accountability to actual fact.
    Maximum proxel fillrate should actually be more useful than texel fillrate going forward (assuming shader length is the way things will progress)...I don't expect manufacturers to mention the fillrates that don't show them to advantage, but reviewers, for example, could balance things more easily for comparison if the concept were adopted (sort of like pixel fillrate compared to texel fillrate...again, before nvidia's approach to defining pixel fillrate).

    Not that I expect something as "non-sexy" sounding as "proxel" to catch on, :p , but maybe "zixel" might, which would be a start in the right direction.
     
  12. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    I personally don't see any need to manufacture new (and rather confusing) terms.
    Surely it can be explained simply enough by saying that a system might perform an operation on N pixels in parallel in each clock (or equivalently N operations on one pixel), but might only be able to write results to the frame buffer at a sustained rate of M pixels/clock, where N > M.

    As for performance arguments, as long as the number of operations performed on each pixel is >= N/M (which is increasingly likely) then is it really such a big issue?
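    Simon's rule of thumb in a tiny sketch; N, M, and the op counts are made-up numbers, not any particular chip:

```python
# A chip works on N pixels in parallel but can only write M pixels per
# clock to the frame buffer. Once each pixel needs at least N/M
# operations, the write-out limit M stops mattering.

def effective_pixels_per_clock(N, M, ops_per_pixel):
    compute_rate = N / ops_per_pixel   # pixels/clock the shaders can finish
    return min(compute_rate, M)        # write-out caps the result at M

N, M = 8, 4
for ops in (1, 2, 4, 8):
    rate = effective_pixels_per_clock(N, M, ops)
    print(f"{ops} op(s)/pixel -> {rate:.1f} pixels/clk")

# With N=8 and M=4, anything at or above N/M = 2 ops per pixel is
# limited by the shading work itself, not by the write-out rate.
```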
     
  13. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Surely, it can! Now you just need to go and convince PR Department X-Y-Z, that's how they should promote the tech, and we'll all be happy. Unfortunately, they'll likely give you that "deer caught in the headlights" look, and say:

    "But that's not sexy!" Hey....but what's this I heard about a Zixel? There's both a "Z" AND an "X" in that term!
     
  14. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    I agree that's increasingly likely - and I would add that it's probably increasingly necessary to include more shader op power per pipeline in the future - but I just don't like it when a company promotes a GPU as an "8 Pixels/Clock Rendering Pipeline" when that definition goes against everything used up to now.

    But again then I'm a part of the Old Fart Club around here, so I might be a bit too old-fashioned. :wink:
     
  15. horvendile

    Regular

    Joined:
    Jun 26, 2002
    Messages:
    418
    Likes Received:
    2
    Location:
    Sweden
    Wouldn't that be zexy?
    :?
    I'll just shut up.
     
  16. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    But the problem isn't just that it can only output half the pixels per unit time that we expected (as pointed out again and again, not likely to be a bottleneck); the problem is that it has only half the "advanced" shader execution units that the advertised 8x1 led us to believe. 4 FP shader units vs. 8 is in fact an important distinction.
     
  17. Colourless

    Colourless Monochrome wench
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,274
    Likes Received:
    30
    Location:
    Somewhere in outback South Australia
    All you need now is add a "Q" and you'll be set to... take over the world... or some other nonsense.
     
  18. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    That's basically my position.

    On a related note, I don't think anyone really had any issue with 3dfx coming up with a new term for "multitextured fill rate." The problem with what 3dfx did is the actual term they came up with: "Texel" rate. That was indeed a problem, because to the industry a Texel Rate was already defined as something completely different, and had nothing to do with writing pixels.
     
  19. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    "proxel" sounds like a medicine.

    Jokes aside, I agree with Simon.
     
  20. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    Isn't it significant when N is 4 instead of 8 (as it appears to be when the operations in question are floating-point)?
     