R400/R500 guessing game

Discussion in 'Pre-release GPU Speculation' started by T2k, Jan 28, 2003.

  1. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    I am not in a better position to know - I'll try and get myself into one, but a few people here could just as easily do that with a little effort. My guess would be 500MHz, but I added 100MHz based on the fact that I "guessed" 250MHz for R300. ;) :lol:

    Can twist the words "adaptive" and "hybrid" all you want but they are just something I heard off-hand and probably don't amount to much. At the time I was fishing for the dirt on R300, lol.

    A PS/VS unit that dynamically "partitions" a combined PS/VS pipe based on demand - is that even possible?! Surely it would have to be coded for. Hmm... I initially thought "hybrid" meant 16x1/8x2 - just a hunch, but since this was *ages* ago it probably refers to mixed-mode rendering of the sort that we see in current parts (i.e. extensive occlusion in the pipe but still essentially IMR).

    Stop the hybrid/adaptive talk now! I still think the shading pipeline is where all the big advances are...

    MuFu.
     
  2. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    P.S. All those in favour of clockless graphics rendering architectures say aye!
     
  3. Fuz

    Fuz
    Regular

    Joined:
    Apr 17, 2002
    Messages:
    373
    Likes Received:
    1
    Location:
    Sydney, Australia
    Yes, cockless for me please!

    Aye!
     
  4. CMKRNL

    Newcomer

    Joined:
    Jul 12, 2002
    Messages:
    91
    Likes Received:
    0
    Actually, hybrid and adaptive are probably good descriptions of both R400 and NV40, so I wouldn't dismiss it so quickly MuFu :wink:

    As for the F117A comment, I have no idea what that means or what it's referring to.
     
  5. Ascended Saiyan

    Newcomer

    Joined:
    Feb 20, 2002
    Messages:
    100
    Likes Received:
    0
    Is that even possible? :?


    Anyway, I thought that was what the adaptive/hybrid talk pertained to in the first place: if the R400 has an integrated shader, resources can basically be 'intelligently' allocated where they're most needed, whether that be triangle setup or pixel shading.
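
To make the "resources allocated where they're most needed" idea concrete, here is a toy sketch that splits a pool of identical shader units in proportion to the pending vertex-side and pixel-side work. It is purely illustrative: `partition_units`, its parameters, and the proportional policy are all assumptions, and nothing here is known about how R400 actually schedules work.

```python
def partition_units(total_units, vertex_work, pixel_work):
    """Split a pool of identical shader units in proportion to pending work.

    Hypothetical policy: any side with work queued keeps at least one unit.
    Returns (vertex_units, pixel_units).
    """
    if total_units < 2:
        raise ValueError("need at least two units to split")
    total_work = vertex_work + pixel_work
    if total_work == 0:
        # Nothing queued: fall back to an even split.
        half = total_units // 2
        return half, total_units - half
    vertex_units = round(total_units * vertex_work / total_work)
    if vertex_work > 0:
        vertex_units = max(vertex_units, 1)
    if pixel_work > 0:
        vertex_units = min(vertex_units, total_units - 1)
    return vertex_units, total_units - vertex_units


if __name__ == "__main__":
    # A pixel-heavy frame pushes most of the pool towards pixel shading.
    print(partition_units(12, vertex_work=100, pixel_work=500))
```

A fixed design would correspond to always returning the same pair regardless of the workload, which is the contrast the "adaptive" label seems to be drawing.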
     
  6. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    LOL. I love you, man. :)

    MuFu.
     
  7. Maverick

    Newcomer

    Joined:
    Aug 7, 2002
    Messages:
    68
    Likes Received:
    0
    Actually, I'd say that R300 was most like the F117A; It sure as hell didn't show up on nVidia's radar... :wink:
     
  8. elroy

    Regular

    Joined:
    Jan 29, 2003
    Messages:
    269
    Likes Received:
    1
    Hmm, I remember hybrid being a word used to describe Fusion (the 3dfx part after Rampage). Gigapixel tech in NV40 maybe?
     
  9. Gunhead

    Regular

    Joined:
    Mar 13, 2002
    Messages:
    355
    Likes Received:
    0
    Location:
    a vertex
    F117A? Why not F22 instead? Apples to apples...

    F22 has the ability to supercruise naturally so unlike competitive designs it doesn't come with an afterburner :lol:

    Okay, all analogies break down at some point...

    ***

    Then, any ideas on R500? What should we expect for transistor count and MHz at 0.09 micron? And moreover, if R400 is DX10 [and BTW what else is new there but VS/PS 3.0?], is R500 DX11, and what features/functionality is that likely to bring? And "Universal Shader 4.0" won't really reveal it to me :lol:

    Rampant, completely untamed speculation welcomed.
     
  10. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    = Female?
     
  11. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    F117A? Is that those stealth-planes that look like a failed origami-experiment?

    Does that mean that we should look at French sites for any leaks? I remember that the USA was kind of miffed at the French sometime around the Gulf War, since the French had radars that could pick up the stealth planes.


    demalion:
    What uttar said is different and rather orthogonal to a Z-first pass.
    It won't remove any hidden pixels (beyond what R300 already does). But it will reduce VP calcs. Color/lighting calculations in the VP will only be done for polys that are visible (at the time they are rendered).
    It should be possible to combine it with a Z-first pass.

    I'm not sure the idea is worth the extra hardware complexity though.
     
  12. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    It's different from what S3 claims.

    The S3 method's goal, AFAIK, is to minimize color/Z writes and use the least fillrate possible. It's also quite useful for developers because front-to-back ordering is not as important anymore, if you develop with that GPU in mind (which is unlikely...)

    The goal of my method is to reduce VS usage. And you absolutely need to order front-to-back for it to be of any use. It's significantly more complex to implement, too.

    In which cases can it give significant advantages? Look at the 3DMark 8 Lights test. The performance drop, compared to 1 Light, is huge. Using my idea, combined with front-to-back ordering, the 8 Lights test should barely be any slower than the 1 Light test (10% hit maximum) - with the GFFX, it's currently a 70% hit...

    Basic: IMO, this idea isn't worth the extra complexity in current hardware. However, in a case where everything is done with the same units and where a cache holds the last few programs that were used, such a system would not be as complex as it seems. The problem would be getting developer support; it would be a lot more efficient if they separated everything with that in mind. The drivers could do it automatically, but it might never be as efficient.
    So, if that whole "all units are the same" thing is true for the R400, my idea could be in it... But then it isn't mine anymore, now, is it? :D


    Uttar
     
  13. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    There's quite a lot of theory around about clockless / asynchronous architectures, mostly in the low-power field, although not too many practical examples.

    The AMULET group at Manchester CS have some interesting things on this. They've been building asynchronous ARM cores for years.

    http://www.cs.man.ac.uk/amulet/
     
  14. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Vertex processing is cheap though.
     
  15. Hellbinder

    Banned

    Joined:
    Feb 8, 2002
    Messages:
    1,444
    Likes Received:
    12
    ok, how bout these.....

    R300 = Drag racer
    R400 = Formula One

    R300 = Brute Force
    R400 = Intelligent Design

    R300 = Culmination of generations since R100
    R400 = Instigation of new Generational Relationships for future products

    R300 = Pack the crap in there till we kick their ass
    R400 = Step back, re-evaluate, get Mr. Peabody involved

    R300 = More parts = more fun for U
    R400 = Do we still need all these parts???
     
  16. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    It's cheap for two reasons:
    1. So many transistors are going in it. Saving transistors by implementing a smarter architecture is always good.
    2. Current games don't do too complex vertex shading.

    Take the following shader:
    http://www.cgshaders.org/shaders/show.php?id=43

    With the current methods, *everything* in the shader has to be done even if none of the triangles that use the vertex is drawn. With the method I described, you would only need to execute the following line in such a case:
    OUT.HPosition = mul(WorldViewProj, IN.Position);

    Nice save, eh? :)
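
As a rough illustration of the method described in this post, the sketch below runs the cheap position-only part of a vertex program for every vertex, then runs the expensive remainder only for vertices used by at least one visible triangle. Everything in it is an assumption made for illustration: `shade_deferred` and its `transform`, `light` and `is_visible` callbacks are hypothetical names, and real hardware would decide visibility with its own Z tests rather than a Python callback.

```python
def shade_deferred(vertices, triangles, is_visible, transform, light):
    """Two-phase vertex shading: positions first, attributes only if needed.

    vertices   : list of input vertices (any type)
    triangles  : list of (i, j, k) index triples into vertices
    is_visible : callable(pos_a, pos_b, pos_c) -> bool (stand-in for a Z test)
    transform  : cheap position-only part of the vertex program
    light      : expensive remainder (colour/lighting) of the vertex program
    Returns (positions, attributes, light_calls); attributes[i] stays None
    for vertices that no visible triangle references.
    """
    # Phase 1: the equivalent of the single HPosition line, for every vertex.
    positions = [transform(v) for v in vertices]

    # Phase 2: the rest of the program, only where a visible triangle needs it.
    attributes = [None] * len(vertices)
    shaded = set()  # plays the role of the extra vertex cache
    for tri in triangles:
        a, b, c = (positions[i] for i in tri)
        if not is_visible(a, b, c):
            continue
        for i in tri:
            if i not in shaded:
                attributes[i] = light(vertices[i])
                shaded.add(i)
    return positions, attributes, len(shaded)
```

The `shaded` set and the two result lists are where the extra storage cost shows up: both the transformed positions and the later attributes have to be kept around per vertex.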

    One of the biggest disadvantages, even in an architecture which has a pool of calculators for both VS/PS, is that you need even more cache.
    You've got to cache the vertex X/Y/Z at first. But then, once other variables have been calculated, you've also got to cache those.
    So I'd guesstimate that a vertex cache twice as big might be required to get optimal performance out of this method.


    Uttar
     
  17. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Yes, but I thought lighting wasn't done for the first Z pass? I thought transformation alone was, just like his methodology. If you aren't skipping transformation vertex processor calcs, what are you skipping?

    I thought triangle rejection was also already achieved by Hierarchical Z?

    Feel free to educate me on what I'm missing.
     
  18. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Hmm, I think you're right on that. S3 First Z pass probably doesn't do lighting.

    But I don't believe it can decide not to do Lighting in the second pass based on the results of the first...


    Uttar
     
  19. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Actually, that's what I'm getting at. Vertex processing is cheap because relatively few transistors go into producing powerful vertex shaders, as opposed to producing powerful fragment shaders. It's far cheaper to scale up the number of vertex shading processors - why do you think all 4 are still in the 9500? Because the majority of the die is the fragment shader pipes; the 4 vertex shaders are small by comparison.
     
  20. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    You sure of that?

    I know fragment is more costly than vertex. But there is really more than one reason for that.
    First of all, in most cases, there are fewer vertex pipes than fragment pipes.
    Secondly, you often include things such as Early Z / Hierarchical Z in the fragment pipe count. Those things take a fairly significant number of transistors, and their main use is to save fillrate. So it makes sense to include them there.
    Thirdly, fragment pipes include TMUs.

    IMO, ATI also didn't remove the VS because it's easier to reduce texture quality than vertex quality, so it simplifies programmers' jobs: all they have to do is design around the "R300", not around all the derivatives (but that could be completely wrong; feel free to say so and I'll simply stop saying that).

    And anyway, let's suppose the VS take 20% of the die space on the R300. That would be 20M transistors. Imagine if you could reduce that number by 50%, but you'd have to add 5M transistors to implement my method. You're at 15M, which is 25% better. Saving 5M transistors and getting as good performance is really not bad IMHO. Of course, those figures are just guesses.
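
The arithmetic in that paragraph can be checked with a few lines. This is only a sketch of the post's own guessed figures: the 100M die size is what "20% = 20M" implies, and `vs_budget` and its parameter names are made up for illustration.

```python
def vs_budget(die_transistors_m, vs_share, vs_reduction, overhead_m):
    """Return (baseline, new_total, relative_saving) for the vertex shader block.

    die_transistors_m : total die size in millions of transistors (guess)
    vs_share          : fraction of the die spent on vertex shading (guess)
    vs_reduction      : fraction of VS transistors the method lets you drop
    overhead_m        : millions of transistors added to implement the method
    """
    baseline = die_transistors_m * vs_share
    new_total = baseline * (1 - vs_reduction) + overhead_m
    saving = (baseline - new_total) / baseline
    return baseline, new_total, saving


if __name__ == "__main__":
    baseline, new_total, saving = vs_budget(100, 0.20, 0.50, 5)
    print(f"{baseline:.0f}M -> {new_total:.0f}M ({saving:.0%} fewer VS transistors)")
```

Note that the saving is relative to the vertex shader block only; against the whole die in this example it is 5M out of 100M.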


    Uttar
     