AMD: R7xx Speculation

Discussion in 'Architecture and Products' started by Unknown Soldier, May 18, 2007.

Thread Status:
Not open for further replies.
  1. Nite_Hawk

    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    1,202
    Likes Received:
    35
    Location:
    Minneapolis, MN
    Hi Guys,

    I just got back from Best Buy; I got in on the VisionTek 25% off sale and picked up a 4850 for $150. :) I'm going to install it now, so if there are any tests people want run, let me know. I don't have any recent games, so only tests you can provide for free, please. ;)

    Nite_Hawk
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The killer is that T can only get register operands from the other ALUs. Literals, constants and previous results are also possible of course.

    At a cost of 6% of the ALUs or 2.4% of the die, it seems doubtful. If the dark bits are logic, there's a hell of a lot of logic in each SIMD to go wrong.
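As a rough cross-check of those two percentages: if the 6% and the 2.4% describe the same piece of logic, together they imply what share of the die the ALUs occupy. A minimal sketch, assuming both figures are exact (they are estimates from the post, and the variable names are only illustrative):

```python
from fractions import Fraction

# Figures quoted in the post; treated as exact for the sake of the arithmetic.
cost_as_share_of_alus = Fraction(6, 100)    # 6% of the ALUs
cost_as_share_of_die = Fraction(24, 1000)   # 2.4% of the die

# If the same logic is 6% of the ALUs and 2.4% of the die,
# the ALUs as a whole must take up (2.4% / 6%) of the die.
alu_share_of_die = cost_as_share_of_die / cost_as_share_of_alus

print(f"Implied ALU share of die: {float(alu_share_of_die):.0%}")
```

Under those assumptions the ALUs would account for two fifths of the die.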

    I certainly won't discount this, as it was my original guess when the "finger-and-thumb" die shot appeared - 10 regular looking things that are TUs are desperately needed :razz: There's L1, vertex data cache and GDS to account for too. L1 could be 32KB, vertex data cache the same.

    Each SIMD should have 256KB of register file (unless they've changed that too), i.e. 4 lots of 64KB, implying that each block in each corner is 16KB. But there's still a big question mark over multi-porting - are there really 4 physical banks per logical bank? If so the L1 caches and other SIMD memory should look piddly...
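The register file arithmetic above can be written out explicitly. This is only a sketch of the sizes quoted in the post (256KB per SIMD, 4 lots, 16KB blocks); the block counts are derived from those numbers, not read off the die shot:

```python
KB = 1024

# Sizes quoted in the post (guesses about RV770, not confirmed figures).
simd_register_file = 256 * KB   # register file per SIMD
lots_per_simd = 4               # "4 lots of 64KB"
block_size = 16 * KB            # size guessed for each visible block

lot_size = simd_register_file // lots_per_simd
blocks_per_lot = lot_size // block_size
total_blocks = simd_register_file // block_size

print(f"lot size:       {lot_size // KB}KB")
print(f"blocks per lot: {blocks_per_lot}")
print(f"blocks per SIMD: {total_blocks}")
```

If the blocks really are 16KB each, every 64KB lot would be built from 4 of them, which is where the question about 4 physical banks per logical bank comes from.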

    I have to say, right now I'm leaning towards your interpretation. EDIT: [strike]There's nothing to say that 20 MADs can't have 1 ALU lane of redundancy :razz: [/strike] :oops:

    Jawed
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    With 16xAF and 4xMSAA I'm hoping for way more than 30%. The Extreme preset in Vantage is rumoured to be 60% faster. Can't believe we don't have any more benchmarks.

    :lol: they've changed so much, we're still guessing.

    Anyway, no point in rushing...

    Jawed
     
  4. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    3,264
    Likes Received:
    813
    I think more like this:
    [​IMG]
    Blue = 5SP block (1 spare?)
    Red = LDS/L1?
    Pink = TMU quad
    Green = Scheduler/Dispatch
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    Yes, that was a great marketing story but it's really incorrect. If you can pull it off wrt layout, for similar performance, a crossbar is much better in terms of performance per area than a ring bus, assuming an evenly balanced load. With a ring bus, you have to go out of your way architecturally to avoid severe stalling or even deadlock conditions, typically by overdesigning it. Crossbars have no such problems.

    With the area required for a crossbar increasing in a linear fashion while chip area is on a quadratic path, the layout area required to create channels on a chip to place the wires is really a non-issue these days.

    As for its impact on overall GPU performance: an interconnect should never be something that determines performance. Once you're able to feed the agents at their maximum capacity, it doesn't matter whether you do so with a crossbar or anything else. It's reasonable to assume that neither ATI nor Nvidia has ever been so stupid as to underdesign the bandwidth of their interconnects. That's why that whole hoopla about the existence of the ring bus has always been baffling to me: at the end of the day, it has zero impact on how well your GPU will work. It's not going to make FP calculations any faster...

    FWIW, at my job, we'd use 'crossbar' and 'switch fabric' very often interchangeably to describe the same thing. I'd say a switch fabric is at the core of a crossbar. :wink:
     
    #4365 silent_guy, Jun 21, 2008
    Last edited by a moderator: Jun 21, 2008
  6. kyetech

    Regular

    Joined:
    Sep 10, 2004
    Messages:
    532
    Likes Received:
    0
    I happen to think it was good in the context of consoles... Not just in terms of performance, but also in terms of functionality.

    Look at Gears of War 2 and remind yourself this chip is very good for its size, power budget and timeframe.
     
    #4366 kyetech, Jun 21, 2008
    Last edited by a moderator: Jun 21, 2008
  7. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Judging performance of Xenos from any PC part is a waste of time. There were even more changes from Xenos to R600 than there were from R600 to RV770.
     
  8. leoneazzurro

    Regular

    Joined:
    Nov 3, 2005
    Messages:
    518
    Likes Received:
    25
    Location:
    Rome, Italy
    AFAIK a crossbar does not scale linearly with number of units, while a bus can (not necessarily a "ring" one). For all other considerations, I agree.
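The scaling point can be illustrated with a first-order count of switching points. This is a textbook-style sketch, not a model of any actual ATI or Nvidia interconnect: it assumes a full crossbar needs one crosspoint per client/target pair, while a ring needs one stop per attached unit, and it ignores wire width, layout and bandwidth. The function names are made up for the example.

```python
def crossbar_crosspoints(clients: int, targets: int) -> int:
    """Crosspoints in a full crossbar: every client can reach every target."""
    return clients * targets


def ring_stops(clients: int, targets: int) -> int:
    """Stops on a ring: one per attached unit, so growth is linear."""
    return clients + targets


for units in (4, 8, 16, 32):
    xbar = crossbar_crosspoints(units, units)
    ring = ring_stops(units, units)
    print(f"{units:2d} clients x {units:2d} targets: "
          f"crossbar={xbar:4d} crosspoints, ring={ring:3d} stops")
```

Doubling the number of units doubles the ring stops but quadruples the crosspoints, which is the non-linear growth being referred to. Whether that matters in practice is exactly what the posts above disagree on.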
     
  9. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,022
    Likes Received:
    122
    I must have missed this, why does T not have its own register file? And if that's the case, wouldn't that mean it has to be very close to the other units?
    Maybe what you labeled T-Mad could be texture filter (16 fp16 "bilerp units"), which would fit the mostly-logic look of this area (with the area to the right of it being texture address and texture fetch including L1 cache, and at the left the sequencer - not much storage there though and a huge amount of logic...).
    Or not...
     
  10. Mariner

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,288
    Likes Received:
    1,055
    Erm, unless I've missed this somewhere, has anybody with one of the 4850s actually tested CFAA performance after this obvious hint?

    Comparison with standard AA and versus 3870 performance would be interesting. :smile:
     
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Yeah, that'd be very interesting. Unfortunately, most reviewers were seemingly caught by surprise to see the Performance-NDA lift on the third day (about five days earlier than was communicated initially) after being briefed about the new products...
     
  12. igg

    igg
    Newcomer

    Joined:
    May 16, 2008
    Messages:
    63
    Likes Received:
    0
    The card is scheduled for a July launch: Source.

    It would be awesome if that's true :)
     
  13. Tchock

    Regular

    Joined:
    Mar 4, 2008
    Messages:
    849
    Likes Received:
    2
    Location:
    PVG
    One post in Lowyat.net (think of it as an even cruder VR-Zone. Yes, sites like these actually exist :grin: )


    He later said that these were the ED scores, box 4xAA scores over 30 FPS.
     
  14. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
    It's a great chip compared to RSX, that's for sure (at least in terms of its design and functionality), but that's more because RSX was pretty poor for its timeframe.

    What I mean is, Xenos is clearly a great design on paper, and it also comes packed with great functionality but the same can be said of R600. We mark R600's "greatness" down because it didn't perform as well as we expected. I'm just not seeing why we should assume Xenos is a superior implementation of the architecture when R600 came second and had time to learn from and refine the Xenos design.

    E.g. in terms of overall efficiency of the implementation it looks like:

    R600 -> R670 -> R770

    That also matches the timing of their releases, which is to be expected as each evolved from the one before. Xenos performance is an unknown, but timing-wise it does slot into the above picture before R600, so if we're going to make assumptions about its efficiency it seems more sensible that those assumptions fit into the above picture. Assuming Xenos is as efficient an implementation of that basic architecture as R770 seems a bit baseless to me. More likely it's as efficient an implementation as R600, or less so.
     
  15. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    IMO: Greatness only derives from comparison with the alternatives.
     
  16. Skinner

    Regular

    Joined:
    Sep 13, 2003
    Messages:
    878
    Likes Received:
    12
    Location:
    Zwijndrecht/Rotterdam, Netherlands and Phobos
    I wonder if it only has a 512MB framebuffer?
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    That's clever!

    I like that, particularly the solution to redundancy.

    Jawed
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The register file is vec4, I guess. As far as I can tell the register file is banked into 1KB sections, each section being 1 vec4 register * 64 elements (64 * 16 bytes).
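The 1KB figure follows directly from the numbers in that sentence. A small sketch of the arithmetic, assuming 32-bit components (so a vec4 is 16 bytes) and, for the last line, the 256KB-per-SIMD register file guessed at earlier in the thread:

```python
COMPONENTS_PER_VEC4 = 4
BYTES_PER_COMPONENT = 4      # assumption: 32-bit components
ELEMENTS_PER_SECTION = 64    # one vec4 register for each of 64 elements

bytes_per_vec4 = COMPONENTS_PER_VEC4 * BYTES_PER_COMPONENT
section_bytes = bytes_per_vec4 * ELEMENTS_PER_SECTION

# Assumption carried over from the earlier post: 256KB of registers per SIMD.
simd_register_file = 256 * 1024
sections_per_simd = simd_register_file // section_bytes

print(f"bytes per vec4 register: {bytes_per_vec4}")
print(f"bytes per section:       {section_bytes}")
print(f"sections per SIMD:       {sections_per_simd}")
```

So one section is 64 * 16 = 1024 bytes, and a 256KB register file would hold 256 such sections.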

    If you download the CAL SDK you can see a hell of a lot of detail about R600 from the ISA document.

    Anyway, I'm abandoning my theory, I like Hoom's theory very much.

    Jawed
     
  19. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    If they ditched AFR much more of the aggregate memory would be available than on the old X2 cards, so a per chip 512 MB buffer wouldn't be so bad.
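To put rough numbers on that: with AFR each chip keeps its own copy of the data, so a 2 x 512 MB card behaves like a 512 MB card. The sketch below uses a deliberately simple, hypothetical model in which some fraction of each chip's memory holds duplicated data and the rest holds unique data; the function and its `duplicated_fraction` parameter are illustrative, not a description of how any driver actually allocates memory.

```python
def unique_capacity_mb(per_chip_mb: float, chips: int,
                       duplicated_fraction: float) -> float:
    """Distinct data the card can hold, in MB.

    duplicated_fraction is the share of each chip's memory that mirrors
    the other chips (1.0 = everything duplicated, as with plain AFR).
    """
    duplicated = duplicated_fraction * per_chip_mb
    unique = (1.0 - duplicated_fraction) * per_chip_mb * chips
    return duplicated + unique


per_chip_mb, chips = 512, 2
for fraction in (1.0, 0.5, 0.0):
    capacity = unique_capacity_mb(per_chip_mb, chips, fraction)
    print(f"duplicated fraction {fraction:.1f}: {capacity:.0f} MB usable")
```

In this model, anything less than full duplication makes more than 512 MB usable, up to the full 1024 MB if nothing had to be mirrored.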

    If AMD against all odds has ditched AFR they should dust off the FASN8 motherboard ... nothing could stand against them in the benchmarks then, nothing could even get close.
     
  20. Tchock

    Regular

    Joined:
    Mar 4, 2008
    Messages:
    849
    Likes Received:
    2
    Location:
    PVG
    Wait... what about 2x GT200 and SmackOver? The latter would propel the combo to something better than FASN8 I suppose.


    That was a conclusion made too soon. :lol:
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.