AMD: R8xx Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 19, 2008.

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 GPU lineup?

Poll closed Oct 14, 2009.
  1. Within 1 or 2 weeks: 1 vote (0.6%)
  2. Within a month: 5 votes (3.2%)
  3. Within couple months: 28 votes (18.1%)
  4. Very late this year: 52 votes (33.5%)
  5. Not until next year: 69 votes (44.5%)

  1. nicolasb

    Regular

    Joined:
    Oct 21, 2006
    Messages:
    421
    Likes Received:
    4
    I suspect it was a hell of a lot better than that, but things were simpler in those days....
     
    #41 nicolasb, Jul 23, 2008
    Last edited by a moderator: Jul 23, 2008
  2. NocturnDragon

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    393
    Likes Received:
    17
    Geometry processing was still done on the CPU, so the GPUs didn't each have to do it...
     
    #42 NocturnDragon, Jul 23, 2008
    Last edited by a moderator: Jul 23, 2008
  3. nicolasb

    Regular

    Joined:
    Oct 21, 2006
    Messages:
    421
    Likes Received:
    4
    (nod)

    The GeForce 256 and GeForce 2 were doing geometry acceleration by then, but Voodoo 5 didn't - one of the reasons it didn't sell very well....
     
  4. Freak'n Big Panda

    Regular

    Joined:
    Sep 28, 2002
    Messages:
    898
    Likes Received:
    4
    Location:
    Waterloo Ontario
    Oh that explains a lot, thanks!

    I checked some benchmarks here and here, and scaling is around 40-50%. The cards are also horribly CPU-limited compared to GF2 hardware, which makes sense given the CPU-based geometry processing.
     
  5. nicolasb

    Regular

    Joined:
    Oct 21, 2006
    Messages:
    421
    Likes Received:
    4
    Voodoo 5 came out very late. If it had come out when it was originally intended to it would have been competing against GeForce 256 (which failed to hit its target clock speeds and had only nominal geometry acceleration) and would have done very well. But it had no chance against GF2.

    Getting off-topic, sorry! :oops:
     
  6. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,166
    Likes Received:
    1,836
    Location:
    Finland
    It was? The two chips that got to silicon, Pyramid3D and Axe, were both single chips, at least.
     
  7. Mat3

    Newcomer

    Joined:
    Nov 15, 2005
    Messages:
    163
    Likes Received:
    8
    With two chips working on different frames, if you wanted them to share memory, what besides textures would need to be shared? Assuming that's all you wanted to send from one to the other, how much bandwidth total (bi-directional) would be needed for that?
     
  8. Pressure

    Veteran Regular

    Joined:
    Mar 30, 2004
    Messages:
    1,336
    Likes Received:
    268
    I vaguely remember someone saying something about hot spots on the core.
    Could be wrong though.
     
  9. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,418
    Likes Received:
    178
    Location:
    Chania
    Nice glimpse into the past with all the antiquities here in this thread.

    I haven't the slightest idea what RV8x0 could look like, but I for one wasn't laughing at all when the 800SP rumours appeared, which were quite rare compared to the 480 nonsense that circulated in tons. We had seen internal notes saying they wanted to improve texturing and Z fillrates, amongst other things. Increasing the number of TMUs in a design like R6x0/RV6x0 sounded like a one-way street unless one was willing to invest ungodly amounts of R&D resources, given the simple fact that ALUs and TMUs are tied together in the SIMD logic.

    There's also a note somewhere that they supposedly tried to increase the ALU frequency in RV670 and failed. I'm not saying they did or didn't, yet if you sit back and consider that 1000 SPs at twice the frequency can do as much work as 2000 SPs at half that frequency, I personally consider the 1000 SPs not too little but quite an interesting question mark and quite a scary prospect for the foreseeable future.

    Bottom line: there are many scenarios that could eventually make sense; without knowing the exact parameters, there's nothing in my mind that sounds "too little" or "too much".
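
    The SP-count versus clock trade-off in the post above comes down to simple peak-throughput arithmetic. The sketch below assumes each SP retires one multiply-add (2 FLOPs) per clock, and the clock values are purely hypothetical, chosen only to show the ratio; they are not rumoured RV8x0 figures.

```python
def peak_gflops(sp_count, clock_mhz, flops_per_sp_per_clock=2):
    """Theoretical peak in GFLOPS: SPs x clock x FLOPs per SP per clock."""
    return sp_count * clock_mhz * flops_per_sp_per_clock / 1000.0

# Hypothetical clocks, used only to illustrate the ratio.
narrow_fast = peak_gflops(1000, 1500)  # 1000 SPs at twice the clock
wide_slow = peak_gflops(2000, 750)     # 2000 SPs at half that clock
print(narrow_fast, wide_slow)
```

    Both configurations land on the same theoretical peak; real throughput would also depend on utilisation, bandwidth and how the TMUs scale, none of which this models.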
     
  10. kyetech

    Regular

    Joined:
    Sep 10, 2004
    Messages:
    532
    Likes Received:
    0
    Ailuros,

    What exactly is your point? That you think r8xx could double the FLOP performance over r7xx?
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I, for one, went down a blind alley with a misinterpretation along those lines.

    The conclusion was that it was purely about routing congestion - too many wires trying to fit into a small area - a routing hotspot.

    Jawed
     
  12. Pressure

    Veteran Regular

    Joined:
    Mar 30, 2004
    Messages:
    1,336
    Likes Received:
    268
    Ah ok, thanks for clearing that up :)

    It made sense at the time though.
     
  13. nicolasb

    Regular

    Joined:
    Oct 21, 2006
    Messages:
    421
    Likes Received:
    4
    Glaze3D designs were a long time after Pyramid 3D: this was the "Extreme Bandwidth Architecture" part, with embedded-DRAM. I definitely recall discussions about how the multi-chip versions would divide the screen up into tiles, and that this choice was made because it would thrash the texture caches less than (say) the Voodoo 2 SLI approach of rendering alternate horizontal scan-lines.

    My memory is a little hazy but I think they may have talked about a 4-chip version of this, as well as 2-chip.
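
    As a rough illustration of the texture-cache argument above, the sketch below compares which texel blocks a chip would touch under scan-line interleaving versus a tiled split. Everything here is an assumption for illustration: a 64x64 screen, a screen-aligned texture mapped 1:1, 8x8 texel blocks, 16-pixel tiles and the function names. It does not describe the actual Glaze3D or Voodoo 2 hardware.

```python
def scanline_owner(x, y, chips=2):
    """Scan-line interleave: alternate horizontal lines go to alternate chips."""
    return y % chips

def tile_owner(x, y, chips=2, tile=16):
    """Tiled split: square screen tiles dealt out in a checkerboard-like pattern."""
    return (x // tile + y // tile) % chips

def texture_blocks_touched(owner, chip, size=64, block=8):
    """Count distinct block x block texel blocks a chip reads, assuming a
    screen-aligned texture mapped 1:1 onto a size x size screen."""
    return len({(x // block, y // block)
                for y in range(size) for x in range(size)
                if owner(x, y) == chip})

print(texture_blocks_touched(scanline_owner, 0))
print(texture_blocks_touched(tile_owner, 0))
```

    In this toy setup a chip rendering alternate scan-lines still touches every texel block, while a chip rendering tiles touches only half of them, so each texture cache has less distinct data to hold.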
     
  14. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,166
    Likes Received:
    1,836
    Location:
    Finland
    I know it was after Pyramid3D, but it was also earlier than Axe.
    Anyway, I checked, and indeed there were apparently plans for multi-chip, with Glaze3D and the Thor chip, where Thor would be both the TnL unit and the bridge for multi-chip solutions.
     
  15. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Anything that one GPU writes to and the other GPU reads from needs to be shared. Render targets would be the most common thing, but they don't need to be shared if they are cleared and rendered to each frame, which should be true for most render targets. In DX10 it could be StreamOut buffers as well.

    I would say the main problem with AFR is not the actual copying that may be necessary, or the bandwidth needed for it, but the synchronization. Take a simple exposure implementation, for instance. GPU0 renders its frame, then averages the pixels to compute the overall exposure, which ends up in a 1x1 render target. In the next frame, the frame brightness is adjusted using this render target as input, so GPU1 now needs to wait until GPU0 has finished rendering to the render target. Although the data copied amounts to just one pixel, each GPU ends up idle for most of the frame because it doesn't have all its data ready from the other GPU. Even a shared memory pool wouldn't help; you'd still see scaling of, say, less than 10%.
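
    The synchronization cost described above can be sketched with a toy timeline model. It is only an illustration under stated assumptions: frames alternate between GPUs, every frame takes the same time, and a hypothetical `dep_fraction` parameter stands for the share of each frame's work that cannot start until the previous frame has finished. Real drivers and workloads are far less tidy.

```python
def afr_speedup(frames, frame_time, dep_fraction, gpus=2):
    """Speedup of AFR over one GPU when part of each frame must wait
    for the previous frame (rendered on the other GPU) to finish."""
    gpu_free = [0.0] * gpus   # time at which each GPU becomes idle
    prev_finish = 0.0         # finish time of the previous frame
    for n in range(frames):
        g = n % gpus
        # Work that does not depend on the previous frame starts at once.
        indep_done = gpu_free[g] + (1.0 - dep_fraction) * frame_time
        # Dependent work also has to wait for the previous frame's result.
        dep_start = max(indep_done, prev_finish)
        finish = dep_start + dep_fraction * frame_time
        gpu_free[g] = finish
        prev_finish = finish
    return frames * frame_time / prev_finish

for d in (0.0, 0.95, 1.0):
    print(d, afr_speedup(100, 1.0, d))
```

    In this model, no dependency gives the ideal 2x, a fully dependent frame gives no scaling at all, and a dependency covering 95% of the frame gives roughly 5% scaling, which is the kind of figure the post describes.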
     
  16. Lukfi

    Regular

    Joined:
    Apr 27, 2008
    Messages:
    423
    Likes Received:
    0
    Location:
    Prague, Czech Republic
    A lame question: what is that exposure good for?
     
  17. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    To compress the dynamic range to something the monitor can show. Otherwise HDR would not look any different from traditional rendering since the highlights would just clip.
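
    A minimal sketch of why the exposure value matters, under assumptions that are not from the post: luminances are plain floats, the "key" of 0.18 is a conventional mid-grey target, and the `L / (1 + L)` curve is a simple Reinhard-style operator swapped in for illustration. The post does not say which tone-mapping operator is used.

```python
def average_luminance(pixels):
    """Stand-in for the down-sampling pass that ends in a 1x1 render target."""
    return sum(pixels) / len(pixels)

def clip_only(pixels):
    """No exposure control: anything above 1.0 clips to white."""
    return [min(p, 1.0) for p in pixels]

def expose_and_compress(pixels, key=0.18):
    """Scale by key / average luminance, then compress with L / (1 + L)."""
    scale = key / average_luminance(pixels)
    return [(p * scale) / (1.0 + p * scale) for p in pixels]

hdr = [0.05, 0.5, 4.0, 16.0]  # hypothetical scene luminances
print(clip_only(hdr))
print(expose_and_compress(hdr))
```

    With clipping alone the two brightest values both become 1.0 and can no longer be told apart; with exposure scaling and compression they stay distinct and below 1.0.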
     
  18. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Good example, but this can be fixed quite easily, as you don't really need the GPU to read that value back.
    Let the CPU do it (in the following frame(s)) and send it back to the GPU(s) as a pixel shader constant. No sync points between GPU(s) and no need to sample exposure on a per pixel basis anymore while tone mapping. Double win :)
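
    The idea above can be sketched as a small delay queue on the CPU side. This is an illustration only: the class name, the `latency_frames` parameter and the notion of "ending a frame" are hypothetical, and no real graphics API call is shown. The measured average is consumed a few frames late and handed back as a plain constant, so neither GPU has to wait on the other.

```python
from collections import deque

class DelayedExposure:
    """CPU-side holder for an exposure constant fed by late readbacks."""

    def __init__(self, latency_frames=2, initial=1.0):
        self.latency_frames = latency_frames
        self.pending = deque()   # measurements still "in flight"
        self.constant = initial  # value the shaders currently use

    def end_frame(self, measured_average):
        """Queue this frame's measurement; adopt one only once it is
        at least latency_frames old. Returns the constant to upload."""
        self.pending.append(measured_average)
        if len(self.pending) > self.latency_frames:
            self.constant = self.pending.popleft()
        return self.constant

exposure = DelayedExposure(latency_frames=2, initial=0.0)
used = [exposure.end_frame(m) for m in [1.0, 2.0, 3.0, 4.0, 5.0]]
print(used)
```

    The constant in use always lags the measurements by the chosen number of frames, which is the price paid for removing the sync point.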
     
  19. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Humus, I see you advocate doing HDR the same way I do. FP10 and similar formats give you plenty of range this way, as the scale factor from that 1x1 lets you span as many orders of magnitude as you want for brightness.

    For this particular application, though, it won't make much difference if you use the 1x1 texture from two frames ago. This is especially true when you consider the time constant for exposure adjustment, as two GPUs will render twice as fast. I suppose there are some minor drawbacks, as you could get some funny stuff happening with, for example, muzzle flash that goes off every other frame.
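
    One possible reading of the muzzle-flash caveat above can be sketched as follows. The assumptions are mine, not the post's: exposure adapts exponentially with a hypothetical `rate`, each frame adapts from the value and measurement of two frames earlier (the same GPU's previous frame), and the flash lights up every odd frame.

```python
def adapt_two_frames_back(luminance, start=5.0, rate=0.5):
    """Adapted luminance per frame when frame n only sees frame n-2."""
    adapted = []
    for n in range(len(luminance)):
        if n < 2:
            adapted.append(start)  # no history yet, use a common start value
        else:
            prev = adapted[n - 2]
            # Move part of the way towards the two-frames-old measurement.
            adapted.append(prev + (luminance[n - 2] - prev) * rate)
    return adapted

# Hypothetical scene: dark on even frames, muzzle flash on odd frames.
scene = [1.0, 10.0] * 20
result = adapt_two_frames_back(scene)
print(result[-2], result[-1])
```

    Because each chain only ever sees every other frame, the even frames settle on the dark level and the odd frames on the flash level, instead of both converging to a shared average.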
     
  20. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I've considered precisely the same thing before, but what kind of latency is there for GPU readback? Can it be done asynchronously like HDD access or will it stall the CPU during this time?
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.