AMD: R7xx Speculation

Discussion in 'Architecture and Products' started by Unknown Soldier, May 18, 2007.

Thread Status:
Not open for further replies.
  1. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    On the other hand, ATI could focus on improving what was already there, since they had already done DP and scatter before ... let's just wait for some benchmarks.
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The patent document seems to imply that the output from these disparate units can be returned to the "cluster" (register file, presumably) - i.e. the staging is "real".

    Haven't a clue who he is.

    Even G70 doesn't use the same hardware for vertex data fetches as for texture filtering, does it?

    Kinda disappointed you guys haven't wheedled this stuff out...

    Until it's benchmarked we won't know the practicalities - clearly total throughput is up for both point and filtered sampling.

    Yeah, sadly at least some of the 9800GTX 16xAF/4xAA benchmarks are blighted by the card apparently running out of memory, so the comparisons with HD4850 are unreliable.

    Double-precision was already there in RV670, so that particular aspect shouldn't have changed...

    I am intrigued by the idea of 10 SIMDs though - that's quite a lot of control overhead in comparison with 4 SIMDs.

    Jawed
     
  3. leoneazzurro

    Regular

    Joined:
    Nov 3, 2005
    Messages:
    518
    Likes Received:
    25
    Location:
    Rome, Italy
    It seems to me that the "global data share" could be a shared link through which every SIMD can quickly access data in the other SIMDs (and reach the TMU blocks attached to other SIMDs). That is, it's the "vertical" link between the SIMDs, whereas the "horizontal" link is among the shaders in a SIMD and its dedicated TMU block.
    Is it possible?
     
  4. Sunday

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    194
    Likes Received:
    6
    Location:
    GMT+1
    It was reported on this site that TSMC is skipping 45nm and heading directly to 40nm (presumably in H2 2009).
     
  5. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I wasn't implying anything so elaborate. Just talking about the strategy of making units much smaller at the cost of some efficiency/functionality.
     
  6. ZerazaX

    Regular

    Joined:
    Oct 29, 2007
    Messages:
    280
    Likes Received:
    0
    Well with GDDR5 supporting:

    That crossbar might just mean that shared memory is indeed happening.
     
  7. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    http://www.qimonda-news.com/download/Qimonda_GDDR5_whitepaper.pdf

    Page 7.

    That does make a lot of sense though. Each GPU would only have half the bandwidth to each memory chip, but with twice as many chips all things are equal. I see routing on the PCB being an utter nightmare, though. Getting traces around the second GPU and attached to its memory should be interesting.
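
The "half the bandwidth per chip, twice as many chips" arithmetic above can be sketched in a few lines. This is only an illustration: the 4 Gbit/s per-pin rate, the 32-bit and 16-bit per-chip widths, and the chip count are assumed values, not figures for any actual board.

```python
def gpu_bandwidth_gbytes(chips: int, pins_per_chip: int, gbit_per_pin: float) -> float:
    """Peak bandwidth one GPU sees, in GB/s (bits divided by 8)."""
    return chips * pins_per_chip * gbit_per_pin / 8


RATE = 4.0   # Gbit/s per data pin (assumed)
CHIPS = 8    # chips per GPU in a conventional layout (assumed)

# Conventional layout: each GPU owns CHIPS chips at full 32-bit width.
dedicated = gpu_bandwidth_gbytes(CHIPS, 32, RATE)

# Speculated layout: each GPU reaches twice as many chips,
# but only through half of each chip's data pins.
shared = gpu_bandwidth_gbytes(2 * CHIPS, 16, RATE)

print(dedicated, shared)
```

Under these assumptions both layouts give each GPU the same peak figure; the sketch says nothing about whether the traces could be routed or would run at speed.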

    You'd have to imagine one of the IHVs requesting a spec like that for it to get included in GDDR5. Since Nvidia likes GDDR3 and is making GPUs roughly 1 Kardashian in size I'd think it's obvious who would have asked for it.

    The only interconnect would be the control bus. That and some link or partitioning between the schedulers would make a little sense.
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I don't get what you guys are describing. Clamshell mode is designed to let one memory channel interface with 1 or 2 DRAM chips. The command bus is common for both configurations, while the data bus is split in two for clamshell configuration.
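
As a rough model of that description: one channel keeps the same total data width, and clamshell mode splits it across two chips that share the command bus. The 32-pin channel width and all the names below are assumptions made for illustration only.

```python
from dataclasses import dataclass

CHANNEL_DATA_PINS = 32  # data width of one memory channel (assumed)


@dataclass(frozen=True)
class ChannelConfig:
    chips: int                # DRAM chips hanging off this channel
    data_pins_per_chip: int   # share of the channel's data bus per chip
    command_bus_loads: int    # chips listening on the common command bus


def channel_config(clamshell: bool) -> ChannelConfig:
    """Return the hypothetical wiring of one channel in either mode."""
    chips = 2 if clamshell else 1
    return ChannelConfig(
        chips=chips,
        data_pins_per_chip=CHANNEL_DATA_PINS // chips,
        command_bus_loads=chips,
    )


normal = channel_config(clamshell=False)
clam = channel_config(clamshell=True)
print(normal)
print(clam)
```

In this model the channel's total data width is identical in both modes; clamshell only changes how many chips sit behind it.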

    Jawed
     
  9. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    After further reading it's not quite what I thought.

    Clamshell would cut the data bus in half, allowing the possibility of two GPUs connected to each chip. What I'm still digging for is whether there is any control logic that determines which half of the data bus gets used. Then they could alternate between controllers every other clock, etc. If that were the case they'd just have to work out timing and sharing of the control bus.
     
  10. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    The data bus is point to point; you can't just bodge an extra couple of tens of cm of trace to another GPU onto those DRAMs' data pins and hope it will still run at 4 GHz.

    Anyway, there is no real point to doing it like that even if you could. You are still losing bandwidth that way ... if you are going to lose bandwidth anyway you could just directly connect one partition of the memory interface on both GPUs, without a DRAM in between.
     
    #3630 MfA, Jun 17, 2008
    Last edited by a moderator: Jun 17, 2008
  11. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    After a little reading, it seems the buses can be tri-stated, so point-to-point might not be necessary. The question is how much it affects performance. I'd agree this is pushing things a bit, but it's a possible option.

    My original idea was that you'd have twice the effective bandwidth, but available only half the time as it alternated between controllers. There is also a mirror option, but that appears to flip all the pins, not just the data pins.
     
  12. Arty

    Arty KEPLER
    Veteran

    Joined:
    Jun 16, 2005
    Messages:
    1,906
    Likes Received:
    55
    Thought this was interesting:

    http://www.extremetech.com/article2/0,2845,2320134,00.asp
     
  13. ZerazaX

    Regular

    Joined:
    Oct 29, 2007
    Messages:
    280
    Likes Received:
    0
    So what's the "Global Data Share" that the supposed crossbar leads to actually for? I just realized it wasn't in the R600 drawing...
     
  14. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    My guess would be some sort of global cache for inter-thread communication.

    I don't suppose anyone has done a pin count on RV770? For a unified memory architecture I'd assume there has to be some form of high speed interconnect and an abundance, or lack thereof, might give an idea on just what they're doing.
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    If we knew what the Local Data Share is, maybe we could make a decent guess.

    I'm thinking that LDS might be the name for the per SIMD register file.

    But GDS is placed alongside texture caches, which implies it's texture-data related. Hell it might be nothing more than memory used to hold addresses or filtering coefficients or something, stuff that's been computed but needs to be kept around for later usage.

    If that's the case then LDS might just be data that's specific to a batch for the purposes of texturing or vertex data fetching.

    Maybe related to texture arrays and cubemap arrays?

    Jawed
     
  16. v_rr

    Newcomer

    Joined:
    Apr 30, 2007
    Messages:
    147
    Likes Received:
    0
    Does that picture come from AMD?
    It lists 40 TMUs, but going by tests so far the HD 4850 looks to have 32 TMUs.

    So True/Fake?
     
  17. satein

    Regular

    Joined:
    Aug 17, 2005
    Messages:
    483
    Likes Received:
    21
    Location:
    Sheffield, UK.
    Which test?

    If it is GPU-Z, the result relies on its database. So at this point, I think we need to wait until the launch date for all the RV770 info to be confirmed. It won't be a long wait :evil:

    This round, AMD/ATI are playing a good game at keeping information under wraps.
     
  18. Wirmish

    Newcomer

    Joined:
    May 4, 2007
    Messages:
    160
    Likes Received:
    0
    Perlin Noise
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    I think no-x posted some 3dmark fillrate numbers a while back that pointed to 32.
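
For anyone wondering how a fillrate number points to a unit count: peak texel rate is roughly texture units times core clock, so dividing a measured rate by the clock gives an estimate. The figures below are made-up placeholders, not no-x's results, and a measured rate that falls short of peak would understate the real count.

```python
def estimate_tmus(measured_mtexels_per_s: float, core_clock_mhz: float) -> int:
    """Estimate texture units, assuming one texel per unit per clock."""
    return round(measured_mtexels_per_s / core_clock_mhz)


# Hypothetical inputs for illustration only.
example_fillrate = 16000.0  # MTexels/s (placeholder)
example_clock = 500.0       # MHz (placeholder)

tmus = estimate_tmus(example_fillrate, example_clock)
print(tmus)
```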
     
  20. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    Ummm... Isn't Perlin Noise a shader-intensive bench?
     