R400/R500 guessing game

Discussion in 'Pre-release GPU Speculation' started by T2k, Jan 28, 2003.

  1. T2k

    T2k
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,004
    Likes Received:
    0
    Location:
    The Slope & TriBeCa (NYC)
    IIRC, somewhere around the 9700-launch there was a notice like "Orton was more excited about the next chip R400 than the R300" - so, do we know anything about it? At least some rumours?
    :?:

    EDIT: Renamed a bit... ;)
     
  2. Mulciber

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    413
    Likes Received:
    0
    Location:
    Houston
    I think a lot of people are expecting it to be able to render up to 16 pixels per clock, but I am very doubtful of this; just my opinion, though. I am guessing they will move to a full 128-bit pipe through the fb, though, and probably support fp16 and fp32 just like nVidia.

    As for other guesses...who knows? Maybe the holy grail we all seek, such as deferred rendering, with the use of embedded memory? hehehe

    I think the NV35 will be a lot easier to guess, since it's just a refresh. My guess is they'll move to a 256-bit bus, a .13u low-k dielectric process, and up the core to about 700 MHz. Maybe double the vertex units as well.
     
  3. megadrive0088

    Regular

    Joined:
    Jul 23, 2002
    Messages:
    700
    Likes Received:
    0
    my guesses:

    NV35
    * 8 pipes
    * 2 TMUs each (à la GF256 to GF2)
    * 256-bit bus
    * a larger "sea of vertex math engines" than NV30
    * VS/PS 3.0+
    * 700-750 MHz core

    R400
    * 16 pipes
    * 1 TMU/TEV unit per pipe that can do 4 textures per clock!
    * 8 vertex shader units or the equivalent
    * 256-bit bus
    * GDDR3
    * a small amount of embedded memory
    * 500-700 MHz core
     
  4. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    284
    Likes Received:
    6
    Location:
    Herwood, Tampere, Finland
    not so likely.
    and GF256 could do trilinear / ( cycle * TMU ),
    GF2 needs to use both of its TMUs to get a trilinearly filtered texel,
    so the improvement was not so big.

    IMHO this is the most likely.
    This alone could make quite big performance impact,
    we may not see other major things.

    not likely, NV35 is just refresh of NV30,
    and AFAIK VS3.0's texture lookup seems "too radical stuff"

    GF FX is already clocked very high,
    I don't expect clock speeds over 600 MHz.

    4 different textures? 4 texture samples?

    16 pipes with 4 different textures/(pipe*cycle), even at 4 samples (bilinear), would IMHO be too much.
    And we are moving to a situation where pixel shader FLOPS matter more,
    not texturing speed, so I don't see this kind of "texturing monster" reasonable.

    16* 1 * 8(trilinear) might be possible/reasonable.
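
    To put rough numbers on the two configurations being compared, here is a small sketch of the per-clock texel arithmetic. The helper name and the tap counts (4 texel taps for bilinear, 8 for trilinear) are illustrative assumptions, not anything known about R400.

```python
# Rough texel-tap arithmetic for the configurations discussed above.
# Assumptions (not from any vendor spec): bilinear = 4 texel taps,
# trilinear = 8 texel taps (two bilinear lookups on adjacent mip levels).

BILINEAR_TAPS = 4
TRILINEAR_TAPS = 8


def texel_samples_per_clock(pipes: int, textures_per_pipe: int, taps: int) -> int:
    """Texel taps the chip would have to fetch and filter every clock."""
    return pipes * textures_per_pipe * taps


# 16 pipes, 4 bilinear textures per pipe per clock (the "texturing monster")
monster = texel_samples_per_clock(16, 4, BILINEAR_TAPS)

# 16 pipes, 1 trilinear texture per pipe per clock
modest = texel_samples_per_clock(16, 1, TRILINEAR_TAPS)

print(monster, modest)  # 256 128
```

    Under these assumptions the 16 x 4 bilinear layout needs twice the texel taps per clock of the 16 x 1 trilinear one, which is the gap the post is pointing at.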

    sounds reasonable.

    I expect them to have one answer to the bandwidth problem,
    not that many.

    A small amount of integrated framebuffer memory with a traditional IMR would not help at all. A large amount would help.
    A small amount of integrated framebuffer memory with a tile-based IMR would help.
    But then the buffer could maybe be made small enough to be made from SRAM, not eDRAM.
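
    A back-of-envelope sketch of why the tile-based case could fit in SRAM while a whole on-chip framebuffer could not. Every figure here is an illustrative assumption: 32-bit colour plus 32-bit Z/stencil per sample, 4x multisampling, a 1600x1200 target and a hypothetical 32x32-pixel tile.

```python
# Back-of-envelope on-chip buffer sizes. All figures are illustrative
# assumptions: 32-bit colour + 32-bit Z/stencil per sample, 4x multisampling,
# a 1600x1200 render target and a hypothetical 32x32-pixel tile.

BYTES_PER_SAMPLE = 4 + 4  # colour + depth/stencil


def buffer_bytes(width: int, height: int, samples: int = 4) -> int:
    """Bytes needed to hold colour and Z for every sample of a region."""
    return width * height * samples * BYTES_PER_SAMPLE


full_frame = buffer_bytes(1600, 1200)  # whole framebuffer kept on chip
one_tile = buffer_bytes(32, 32)        # a single tile kept on chip

print(f"full frame: {full_frame / 2**20:.1f} MiB")
print(f"one tile:   {one_tile / 2**10:.1f} KiB")
```

    With these made-up parameters the full frame comes to tens of megabytes, while one tile is 32 KiB, which is the scale at which SRAM rather than eDRAM becomes plausible.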

    Anyway, if they were using internal framebuffers either way, external memory bandwidth requirements would be greatly reduced, so they would not need
    the >40 GB/s memory system you are suggesting.
    (and if they would not need it, they would not use it as 256-bit GDDR-3 will be expensive, but if it's needed, they will use it, even at higher cost)
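
    The >40 GB/s figure follows from simple bus arithmetic, sketched below. The function names are made up for illustration, and the data rates are hypothetical round numbers rather than announced GDDR-3 speeds.

```python
# Peak memory bandwidth from bus width and effective data rate.
# Names and data rates are illustrative, not announced specifications.


def peak_bandwidth_gb_s(bus_width_bits: int, effective_mt_s: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mt_s * 1e6 / 1e9


def required_mt_s(bus_width_bits: int, target_gb_s: float) -> float:
    """Effective transfer rate (MT/s) needed to hit a target bandwidth."""
    bytes_per_transfer = bus_width_bits / 8
    return target_gb_s * 1e9 / bytes_per_transfer / 1e6


# What a 256-bit bus needs to reach 40 GB/s
rate_256 = required_mt_s(256, 40.0)

# The same memory on a 128-bit bus delivers half of that
bw_128 = peak_bandwidth_gb_s(128, rate_256)

print(rate_256, bw_128)
```

    In this sketch a 256-bit bus needs about 1250 MT/s effective to reach 40 GB/s, and the same chips on a 128-bit bus would give about 20 GB/s.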

    Their push towards GDDR-3 suggests they will not use eDRAM,
    unless they run their GDDR-3 on only a 128-bit bus.
     
  5. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,081
    Likes Received:
    651
    Location:
    O Canada!
    The likes of R300, and possibly NV30, already have a small amount of on-board memory to hold the Hierarchical Z-buffer levels. This could be increased so that more levels are added.
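
    For readers unfamiliar with the idea, here is a generic, textbook-style sketch of a hierarchical Z pyramid. It is not a description of how R300 or NV30 implement it; the function names, the 2x2 reduction, the square power-of-two buffer, and the "smaller z is nearer" convention are all assumptions made for the example.

```python
# Generic hierarchical-Z sketch (NOT the actual R300/NV30 implementation).
# Assumptions: square power-of-two depth buffer, smaller z = nearer,
# depth test is "less than", each coarser level covers 2x2 of the finer one.


def build_hiz(depth):
    """Return pyramid levels; level 0 is the full-resolution depth buffer,
    each further level stores the farthest (max) depth of each 2x2 block."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return levels


def block_occluded(levels, level, bx, by, nearest_incoming_z):
    """True if everything drawn into this block would fail the depth test:
    even the nearest incoming depth is no nearer than the farthest stored."""
    return nearest_incoming_z >= levels[level][by][bx]


depth = [
    [0.2, 0.3, 0.9, 1.0],
    [0.2, 0.4, 1.0, 1.0],
    [0.5, 0.5, 0.6, 0.7],
    [0.5, 0.5, 0.6, 0.6],
]
levels = build_hiz(depth)

print(levels[1])                                # [[0.4, 1.0], [0.5, 0.7]]
print(block_occluded(levels, 1, 0, 0, 0.45))    # True: whole 2x2 block rejected
print(block_occluded(levels, 1, 1, 0, 0.45))    # False: must test per pixel
```

    Each extra level held on chip lets larger screen regions be rejected with a single comparison, which is the sense in which adding levels could help.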
     
  6. T2k

    T2k
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,004
    Likes Received:
    0
    Location:
    The Slope & TriBeCa (NYC)
    Hmmm... a large embedded memory... sounds reasonable - Flipper... ;)
     
  7. Tagrineth

    Tagrineth murr
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    2,522
    Likes Received:
    15
    Location:
    Sunny (boring) Florida
    Yes, as in embedded memory not for the frame buffer, but as maybe an intelligent texture cache to assist one-cycle Trilinear? That would be awesome, ne? :)

    And R300 already uses a tile-based (albeit not deferred) frame buffer, and it already uses on-chip cache for that...
     
  8. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I know most of R400's specs as of last summer, but who knows what changed. I think R300 was originally planned to be 8x2, but they changed that, seemingly for the better. I'm back studying now instead of at ATI.

    Anyways, a lot of you guys are quite a bit off. I seem to remember Hellbinder saying a couple of things that were fairly close.

    Well, can't say anything, or I'll get in trouble. :wink:

    I'll tell you that I like the architecture a lot, and can see why Orton said what he did.
     
  9. T2k

    T2k
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,004
    Likes Received:
    0
    Location:
    The Slope & TriBeCa (NYC)
    So, is it 8x2?

    OK, just tell us what he said and how close that is! ;)
    I don't remember his prophecies...

    :?:
     
  10. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    Last I heard it was this crazy hybrid, adaptive architecture - essentially 16x1 though. I always had my doubts about the source, so I'm not sure on this one at all. :?

    MuFu.
     
  11. T2k

    T2k
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,004
    Likes Received:
    0
    Location:
    The Slope & TriBeCa (NYC)
    Wow...

    How do you mean "hybrid" and "adaptive"?
     
  12. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    Absolutely no idea, sorry - I am pretty much quoting there. Sounds far-fetched, doesn't it? Like I said - not sure about the source at all (was almost a year ago as well). Could well have been some f*cker spotting the opportunity to get a gullible ATi fanboi worked up over nothing. :lol:

    Hmm... I really have crap-all idea about architectures so it's tricky to make head or tail of the comments. I asked in a PM whether it was an 8x2 or 16x1 architecture - "16 rendering pipelines" apparently. Throws the idea of TMUs out of the window a little I think...

    MuFu.
     
  13. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,081
    Likes Received:
    651
    Location:
    O Canada!
    Well, as the onus moves away from texturing to shading pipelines, I've been wondering about the possibility of architectures having fewer texture units than actual pixel (fragment) units....
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I don't really expect this to happen, but rather the number of ops per pipe per clock would increase. Handling more pixels at a time isn't really that useful, and can complicate the rasterizing method.

    Well, there is the Doom3 scenario of Z-only and stencil-only operations, but that's not because of the textures->shaders transition.
     
  15. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    What about localised "double-pumping" of the architecture? i.e. instead of a loopback which is essentially a "stall" relative to the global clock, the texture units operate at twice the frequency. One for the analogue guys for sure - but perhaps it's a way of alleviating situations where PS/VS units are waiting on textured output in order to fully implement a shader routine. I remember reading a while back about how the move to *x1 rendering arrays made sense in terms of PS/VS being the way forward instead of multitexturing, but paradoxically left the shading units cycling redundantly in some situations.

    Again, I must apologise for my mediocre understanding of the way these things work - way too caught up in circuit theory/maths right now. Urgh! Hate maths... :?

    MuFu.
     
  16. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    Screw that, how about R400 being totally async? :lol:

    MuFu
     
  17. Nagorak

    Regular

    Joined:
    Jun 20, 2002
    Messages:
    854
    Likes Received:
    0
    I thought Nvidia swore off GDDR-3?
     
  18. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    Yeah Mufu, it would be really cool if the individual vertex and fragment pipes could be partitioned and organized in different ways for different jobs, with the developer being able to code to the metal/have the flexibility exposed.
     
  19. Hellbinder

    Banned

    Joined:
    Feb 8, 2002
    Messages:
    1,444
    Likes Received:
    12
    *cough* I... May... Have ... Some idea.... *cough*

    8)

    After all, you guys all seem to have forgotten that I was dead on with the existence and release timeframe of the R350 back before November.... I *may* also have some sort of *pipedream* about the R350 as well, of which I *may* have dropped some subtle hints at a certain website over the past couple of months... ;)

    Thus far in the last year, my only big strikeout was the 4x4 NV30 fiasco. But in that case that was truly *speculation* :p
     
  20. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    I don't think so. I think it's been posted more than a few times that multiple TMUs per pipe are relatively useless today. I think that the fragment processor/texture filtering units won't change significantly. Mostly all that will happen are performance enhancements.
    More probable, but I still wouldn't count on it.
    I very seriously doubt this. Its fixed-function power is already incredible (Which, with either driver optimization, or once a high-poly optimized VP benchmark is released, should translate well to high VP performance), and games today just are not vertex processing limited.
    Would be very nice, but again, I wouldn't really count on it. I'd put this around the same probability as the 256-bit bus.
    I consider this somewhat unlikely. A core speed in that range with low-K dielectrics would probably be similar in heat to the current FX Ultra. I have a feeling that nVidia will not release another similar cooling solution anytime soon.

    I seriously doubt it. The die processes just aren't there yet.
    Um, definitely not.
    Again, same reason: die process just not there.
    Almost a given, but it would be nice if it can be shown between now and then that a 256-bit bus is not necessary.
    Highly-likely, according to current rumors, but what does this translate to in terms of performance? Some nice memory bandwidth-bound benches would be helpful here.
    Seriously doubt it, assuming you're talking about embedded DRAM. Embedded DRAM will still cause a major hit in core speed. Not until external memory bandwidth becomes a serious limitation will embedded DRAM happen.
    Probably closer to 500MHz. Die process.

    I think the main thing about the R400 is that it is almost assured to have around PS/VS 3.0+. I doubt it will have much more. And yes, I really do feel that the die processes will have finally caught up with ATI with this next processor.

    But, at the same time, nVidia should not be counting on this right now. They should be looking at the best possible video chip they feel ATI can produce by this fall. Then they should increase that number by 50% and target that performance (ASAP...may be impossible to target that performance this fall).
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.