AMD: R7xx Speculation

Discussion in 'Architecture and Products' started by Unknown Soldier, May 18, 2007.

Thread Status:
Not open for further replies.
  1. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    I was referring to Cho's numbers here: http://forum.beyond3d.com/showthread.php?t=48630&page=2
    Though these numbers don't make sense for the HD3870 that was used in the comparison (I think it could be a 3870X2 instead; then they would make more sense...)
     
  2. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    There will always be niche markets for any product, e.g. Parhelia, but this is NVidia we're talking about. It doesn't have to be literally DOA for it to be effectively so. ~10% more system idle power isn't a big deal, and will take several years of 24/7 usage to even reach half the difference in cost.
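    As a sanity check on that claim, here is the back-of-the-envelope electricity arithmetic; the wattage delta, electricity price, and purchase-price gap are all illustrative assumptions, not figures from the review:

```python
# Rough cost of an idle-power difference, running 24/7.
# All inputs are illustrative assumptions.
extra_watts = 15            # assumed idle-power delta between the two cards
price_per_kwh = 0.10        # assumed electricity price in $/kWh
card_price_gap = 100        # assumed retail price difference in $

kwh_per_year = extra_watts * 24 * 365 / 1000          # annual extra energy
cost_per_year = kwh_per_year * price_per_kwh          # annual extra cost
years_to_half_gap = (card_price_gap / 2) / cost_per_year
print(f"{cost_per_year:.2f} $/year, {years_to_half_gap:.1f} years to half the gap")
# prints: 13.14 $/year, 3.8 years to half the gap
```

    With these assumptions the electricity takes nearly four years of 24/7 idling to reach even half the price difference, which is the point being made.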

    In fact, I would argue the opposite in that the GTX 280 has more usefulness than the 260 in that it is clearly the best and there will always be people who want that. I think SLI tends to scale a bit better than CF, too (not counting X2 and GX2, which aren't here yet).

    Only two games tested on that site used 8xAA, and I ignored them in my assessment, especially because framerates were so high that it didn't matter who won.
     
  3. jimmyjames123

    Regular

    Joined:
    Apr 14, 2004
    Messages:
    810
    Likes Received:
    3
    I'm just confused why people would consider the GTX 260 "DOA" based on a single review. If one can get higher playable settings some of the time with the GTX 260, plus PhysX and CUDA application support down the road, then some will pay more to get it (the $100 difference between the 4870 and GTX 260 is not that bad compared to the cost differential between the GTX 260 and GTX 280).

    The real problem is that NVIDIA has to cut the price to keep a competitive price/performance ratio. This really hurts their margins and ROI. Even though GT200 is an expensive chip, they may have to suck it up and bring the price of the GTX 260 down another $50 to make it a more competitive value until GT200b shows up.

    Anyway, all things considered, I'm really impressed with the ATI cards this go-around. They are definitely back in the game!
     
    #4703 jimmyjames123, Jun 24, 2008
    Last edited by a moderator: Jun 24, 2008
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Yeah, talk about a deep clean.

    I think it's fair to say that, apart from doubled-Z in the RBEs, there was a general expectation that RV770 would increase the counts of certain items and that would be it. More ALUs were guaranteed, more TUs were the popular choice and apart from slightly increased clocks, that was pretty much the end of it.

    Ha, I led the pessimists' charge, asserting continuation of 16 TUs + 16 ROPs - though not without question.

    For a change that isn't architectural, RV770 is a pretty thorough refresh. Actually trying to define what makes for architectural changes would be tough in light of this :lol:

    I think it was Dave Orton who said that they didn't have the tools they needed to make R600. Maybe that was just post-justification, or maybe it indicates they knew what they couldn't achieve in the R600 timeframe. Much like the uncertainty over which features intended for D3D10 got pushed back into 10.1, I don't think we'll ever know how much of RV770's changes were pushed out of R600.

    If R600 had been released on time then we'd be looking at an 18-20-month refresh period between it and RV770. That length of time could be taken to indicate that RV770 is much as planned, and that it does not consist of stuff that couldn't make it into R600 due to "tools problems".

    Assuming that RV770 uses screen-space tiling for fragments, then this means that each quad TU now "owns" a region of the screen, since each SIMD owns a tile and there's a 1:1 relationship between the two.

    I think R6xx uses the ring bus to allow the disjoint L2s (and therefore L1s) to share texels, after any one TU has fetched the texel from memory. But I don't remember anyone saying that this is the case. If true, this is extra work for the ring bus. I presume the ring bus also supports the SIMDs in fetching from "foreign" TUs, attached to other SIMDs, since all SIMDs have to use all TUs to get texture/vertex data.

    As far as I can tell R6xx's TUs (L1, L2) each have a local ring stop. This ring stop serves the TU, an RBE and an MC. Connecting them is a crossbar. So not all memory operations by TUs and RBEs travel around the ring, as the local MC is "directly connected".

    RV770 has a dedicated crossbar twixt L2s and L1s to enable texel distribution. But due to screen-space tiling, the volume of texels that need to land in multiple L1s should be much less than in R6xx. This is because texels at the borders of screen-space tiles are candidates for multi-L1 sharing, whereas in R6xx texels in every quad of screen space could be candidates for multiple L1s.
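    Jawed's tile-ownership idea can be sketched in a few lines; the tile size, the round-robin tile-to-SIMD mapping, and the use of 10 SIMDs are illustrative assumptions for the sketch, not known RV770 parameters:

```python
# Toy model of screen-space tiling: each SIMD "owns" the screen tiles
# assigned to it, so its attached quad TU mostly fetches texels for its
# own tiles. Tile size (8x8) and the round-robin mapping are assumptions.
TILE = 8
NUM_SIMDS = 10

def owner_simd(x, y, screen_tiles_wide):
    """Return which SIMD owns the tile containing pixel (x, y)."""
    tile_id = (y // TILE) * screen_tiles_wide + (x // TILE)
    return tile_id % NUM_SIMDS   # tiles distributed round-robin

tiles_wide = 1920 // TILE
# Pixels inside one tile share an owner...
assert owner_simd(0, 0, tiles_wide) == owner_simd(7, 7, tiles_wide)
# ...but pixels across a tile border may not, which is exactly where
# multi-L1 texel sharing still occurs.
assert owner_simd(7, 0, tiles_wide) != owner_simd(8, 0, tiles_wide)
```

    The border case in the last assertion is the point: only texels straddling tile edges are candidates for landing in multiple L1s, versus potentially every quad in R6xx.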

    Vertex data normally consists of one or more streams (1D) that are consumed at roughly equal "element frequency". So it would seem to make sense for there to be a single vertex data cache as in RV770. It's not clear if R6xx had multiple instances of vertex data cache (one per SIMD) though.

    I'm still wondering how a single vertex data cache is going to support 10 TUs though. Perhaps the SIMDs take it in turns, strictly round-robin?
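    The strict round-robin turn-taking speculated here might look like this toy arbiter; it is entirely hypothetical, nothing below is confirmed RV770 behaviour:

```python
from itertools import cycle

# Toy round-robin arbiter: 10 SIMDs take strict turns at a single
# vertex-cache fetch port. Purely illustrative.
NUM_SIMDS = 10

def serve(requests_per_simd, max_cycles):
    """Grant at most one fetch per cycle, cycling through SIMDs in order."""
    grants = []
    turn = cycle(range(NUM_SIMDS))
    for _ in range(max_cycles):
        simd = next(turn)
        if requests_per_simd[simd] > 0:
            requests_per_simd[simd] -= 1
            grants.append(simd)
    return grants

# Each SIMD with two pending fetches gets served exactly twice in 20 cycles.
print(serve([2] * NUM_SIMDS, 20))
```

    The obvious cost of such a scheme is that a SIMD with nothing to fetch still burns its slot, which is why a shared cache port seems surprising for 10 consumers.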

    I have to say I'm confused by the fp16 situation. The amount of design effort that went into making R600 single-cycle fp16, indeed the conversion of int8 texels into fp16 texels, makes me wary of accepting that they've reverted to an int8 setup. Waiting to find out more.
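    For what it's worth, the int8-to-fp16 promotion described for R600 is lossless in the numerical sense sketched below; the conversion routine is just an illustration of the numerics, not hardware behaviour:

```python
import struct

# A unorm8 texel is the real number v/255. fp16 has an 11-bit significand,
# which is enough to keep all 256 unorm8 codes distinct, so filtering in
# fp16 gives up nothing versus fixed-point int8 filtering.
def unorm8_to_half(v):
    """Promote an 8-bit unorm texel to the nearest fp16 value."""
    f = v / 255.0
    # round-trip through Python's half-precision ('e') format
    return struct.unpack('<e', struct.pack('<e', f))[0]

halves = [unorm8_to_half(v) for v in range(256)]
assert len(set(halves)) == 256   # all 256 codes stay distinct in fp16
assert halves[0] == 0.0 and halves[255] == 1.0
```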

    Like the uncertainty over L2 texture cache, I'm unsure whether R6xx has a single colour buffer cache or multiple instances each dedicated to an RBE. I suspect the latter, since screen-space tiling makes colour (and Z and stencil) essentially private to an RBE.

    I'm pretty sure that RV770's RBEs only use their local MC, whereas R6xx appeared to allow all RBEs to access all MCs. I guess this means a revised way of tiling Colour, Z and stencil data in memory. Though we've never really had much idea how earlier GPUs tiled render targets...

    The hub appears to be for low-bandwidth (or low duty-cycle) data. This makes me wonder if we'll see the "unification" of two GPUs' memory as has been long discussed, for the X2 board.

    I just don't see how there'll be enough bandwidth through the two hubs (one per GPU) to allow anything other than the transmission of completed render targets, i.e. AFR mode.

    LDS is a big deal. I have a suspicion that AMD has configured this as a read-only share between elements in a hardware thread. I wrote my theory here:

    http://forum.beyond3d.com/showpost.php?p=1179619&postcount=4340

    Making it read-only means it's "collision free" and latency-tolerant. I reckon this means that thread synchronisation (in order to be able to share data across elements safely) becomes very cheap and a normal part of the Sequencer's task of issuing threads and load-balancing them.
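    A toy model of that read-only sharing theory, assuming an owner-writes phase followed by a single barrier (the wavefront size and access pattern are illustrative):

```python
# Toy illustration of the read-only-share theory: each element in a
# hardware thread writes only its own LDS slot, then after one barrier
# every element may read any slot. No two writers ever touch the same
# slot, so there are no collisions to arbitrate. Sizes are illustrative.
WAVEFRONT = 64

def lds_share(values):
    """Phase 1: each lane publishes its value. Phase 2: all lanes read."""
    lds = [None] * WAVEFRONT
    for lane, v in enumerate(values):        # owner-writes phase
        lds[lane] = v
    # -- implicit barrier here --
    # read-only phase: e.g. every lane reads its neighbour's value
    return [lds[(lane + 1) % WAVEFRONT] for lane in range(WAVEFRONT)]

out = lds_share(list(range(WAVEFRONT)))
assert out[0] == 1 and out[63] == 0   # values rotated by one lane
```

    The single barrier between the two phases is the only synchronisation needed, which is why the scheme would be cheap for the Sequencer to manage.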

    I dare say it's notable that, like GT200, RV770 is lower-clocked. I reckon this reflects the process/yield/die-size scaling issues that led AMD to a multi-chip GPU strategy.

    Global data share seems fiddlesome, now that's asking SIMDs to cooperate, I presume. Though if SIMDs are cooperating in their use of vertex data cache (taking it in turns) perhaps there's a higher level thread synchronisation mechanism in RV770. Something more interesting than the mundane creation and termination of hardware threads by the command processor.

    Jawed
     
  5. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,475
    Likes Received:
    325
    Location:
    Treading Water
    Probably because the review doesn't really fall outside of expectations. The 260 wasn't exactly kicking the crap out of the 4850. PhysX and CUDA are just unknowns to most consumers; the small number of people who will pay extra for them amounts to a niche.

    I don't think it'll stay priced higher than the 4870 for long, although they'll most likely target the 1GB model. Or maybe they'll start stripping them to 448MB, but I'm not sure how that will affect performance.
     
  6. Lukfi

    Regular

    Joined:
    Apr 27, 2008
    Messages:
    423
    Likes Received:
    0
    Location:
    Prague, Czech Republic
    I dunno why, but memory size is a limiting factor for nVidia where it isn't for ATi. IMO 448 MB would essentially kill the GTX 260 just as 256 MB kills the 8800 GT.
     
  7. toTOW

    Newcomer

    Joined:
    Jun 24, 2008
    Messages:
    2
    Likes Received:
    0
    Location:
    Bordeaux, France
    Thank you mike for bringing this thread to my attention ... I'm surprised to see that my adventures are already spreading all over the world :cool:

    I'm editing the BIOS with a hexadecimal editor (with the help of RBE from TechPowerUp to locate the correct offsets). The GPU and memory clock mods work from the BIOS, but I had to physically mod the board for the voltage.

    1.3V for this bench, but none of these values have been fine-tuned ... I'm trying to get the maximum from the chip.

    I've just tested 825 MHz @ 1.3~1.32V, but I'm reaching the limits of a component (the GPU or the VRMs ... I don't know yet): strange lightning flashes in the middle of the screen and a gradient halo ...
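    For anyone curious, the offset-patching toTOW describes amounts to something like the sketch below; the filename and offset are hypothetical, and a real ATI BIOS also carries a checksum byte that tools like RBE fix up after any edit:

```python
# A sketch (not toTOW's actual procedure) of patching one byte at a
# known offset in a BIOS dump. 'rv770.bin' and 0x1234 are hypothetical.
def patch_byte(path, offset, value):
    """Overwrite the byte at `offset` with `value`; return the old byte."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        old = f.read(1)[0]     # remember what was there
        f.seek(offset)
        f.write(bytes([value]))
    return old

# e.g.: old = patch_byte('rv770.bin', 0x1234, 0x55)  # hypothetical offset
```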

    Here are some additional pictures: two shots of the modded board, and the board in the case with the two extra fans.
     
  8. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England
    Always makes me wonder what ATI does differently: they can match Nvidia cards that carry much more memory. Is Nvidia more wasteful of RAM than ATI?
     
  9. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,264
    Likes Received:
    1,781
    Location:
    Winfield, IN USA
    Dumb question toTOW, but did you happen to find a way to control fan speeds on the stock cooler whilst playing with that hex editor?

    BTW- Nice job, I bow to your voodoo! :D
     
  10. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    There's no indication the hub is really low-bandwidth (PCIe 2.0 is already 8 GB/s in each direction). Not suited as a route-all-traffic-around-the-chip catch-all, yes, but it might be sufficient for texture fetch / vertex fetch across the CrossFireX port.
    Also, maybe it would be possible to operate in "mixed mode" - so for instance vertex buffers, compressed textures and render targets used as textures won't be duplicated but reused fp16 textures will be (though of course unfortunately those also take the most space...).
    It certainly shouldn't prevent the use of SuperTiling (assuming vertex work is just all done on both chips).
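    For reference, the 8 GB/s figure falls straight out of the PCIe 2.0 link parameters:

```python
# PCIe 2.0 x16 bandwidth: 5 GT/s per lane with 8b/10b line encoding.
lanes = 16
gt_per_s = 5.0              # PCIe 2.0 signalling rate per lane
encoding = 8 / 10           # 8b/10b encoding: 8 payload bits per 10 sent
gbytes_per_s = lanes * gt_per_s * encoding / 8   # gigabits -> gigabytes
print(gbytes_per_s)  # → 8.0
```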
     
  11. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,059
    Likes Received:
    1,021
    The two tests do confirm each other: in one the HD4850 draws 38W more than a standard HD3870, in the other it draws 22W more than an overclocked sample. This is normal.
    Regarding the power draw of the HD4870, we already have examples here showing that achieving HD4870 frequencies requires hiking the voltage, indicating that AMD is at a point on the curve where increasing the frequency of the part increases the power draw drastically.

    These power draw numbers are at best internally consistent, since different test vectors give different results for different boards, and there is no standardized procedure. Still, there have been several tests that show that the HD4850 behaves as expected in this review, so there is little reason to doubt that the HD4870 data is in the right ballpark. It's disappointing, but not entirely unexpected.
     
  12. A.L.M.

    Newcomer

    Joined:
    Jun 2, 2008
    Messages:
    144
    Likes Received:
    0
    Location:
    Looking for a place to call home

    I think that 50W of TDP for 125 MHz more on the core plus the GDDR5 is more than enough, don't you? :wink:
    I know that you can't compare numbers between the two tests, that's obvious, but it's not my point. My point is: how the hell does a GX2 draw only 18W more than a 9800GTX at full load, or an HD3870X2 only 21W more than an HD4850? It's impossible, unless you do a very bad job of defining what should be considered "full load", imho.
    Look at the latest Anandtech review:


    This one, for example, looks completely plausible and consistent with the declared TDPs.
     
  13. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,059
    Likes Received:
    1,021
    Don't forget the 12W(!) fan on the HD4870. :)
    I haven't looked into nVidia's typical power draws across different reviews, I only checked for consistency over the RV770 data. Anandtech reports by far the smallest increase going from the HD3870 to the HD4850, so I wonder a little at how they measured it.
    GDDR5 shouldn't add much, or anything, over GDDR3 according to the spec sheets. I really would have appreciated it if the rumoured HD4850-clocked GPU with GDDR5 had materialized, even though it probably wouldn't have provided enough performance advantage to justify the additional cost.
     
  14. ChronoReverse

    Newcomer

    Joined:
    Apr 14, 2004
    Messages:
    245
    Likes Received:
    1
    Interesting how Anandtech shows 8800GT load consumption lower than 3870 load while HardOCP shows about the same.
     
  15. toTOW

    Newcomer

    Joined:
    Jun 24, 2008
    Messages:
    2
    Likes Received:
    0
    Location:
    Bordeaux, France
    No, I didn't have a look at those parameters ... when the automatic regulation started to be the limit, I plugged the fan directly into 12V ... and then I replaced it with the Zalman cooler.
     
  16. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,264
    Likes Received:
    1,781
    Location:
    Winfield, IN USA
  17. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I really doubt any other review will show anything different. When looking at 4850 vs. the rest, this review is very much in line with every other one.

    I don't even know if that's enough, and I doubt NVidia has any desire to sell chips with low/negative margin. From the rumours, it's already priced lower than they wanted. Board cost is probably similar to the 3870X2, which would have similar RAM cost, board complexity, cooling, power, etc. The 3870X2 is generally slower than the 4870 but can't be priced below $299 due to cost (except for clearance purposes, of course, as it will soon be EOL).

    IMO NVidia would rather bleed a little market share and live with low sales of the 260. There's still the 9800 GX2 and 9800 SLI for their loyal users to get a more competitive product near that price point.

    Indeed, and it was really needed. Just look at how willing NVidia was to drop $100 off the price. They must have halved the price they were charging AIB partners for G92 chips, because all the other components remained the same cost and retailers/AIBs still want their share of the profit.

    It just shows you how NVidia was feeling zero pressure from ATI and was acting almost like a monopoly. Can't say I can blame them, though.
     
  18. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    8,475
    Likes Received:
    325
    Location:
    Treading Water
    So Newegg had the Sapphire 4870 512MB up for a bit at $309.99. Doesn't seem like we'll be waiting until July.


    As for the memory performance making a difference on the 4870,

    If you look at this chart, you'll see that the 4870 outperforms its 20% clockspeed advantage in 33 out of 36 comparisons. Therefore I think it's safe to say that memory speed is helping its performance, unless there's something hidden we don't know about.
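    The comparison being made here boils down to a simple ratio test; the 750 MHz and 625 MHz figures are the cards' stock core clocks, while the frame rates below are hypothetical:

```python
# If the HD 4870 (750 MHz core) beats the HD 4850 (625 MHz core) by more
# than the 20% core-clock ratio, something besides core clock -- i.e.
# the GDDR5 bandwidth -- must be helping.
clock_ratio = 750 / 625          # = 1.2, the 20% core-clock advantage

def scaling_verdict(fps_4850, fps_4870):
    if fps_4870 / fps_4850 > clock_ratio:
        return "super-linear (memory helping)"
    return "within core-clock scaling"

# Hypothetical frame rates, purely for illustration: a 1.30x gain
# exceeds the 1.20x clock ratio.
print(scaling_verdict(40.0, 52.0))  # → super-linear (memory helping)
```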
     
    #4718 AlphaWolf, Jun 25, 2008
    Last edited by a moderator: Jun 25, 2008
  19. Karoshi

    Newcomer

    Joined:
    Aug 31, 2005
    Messages:
    181
    Likes Received:
    0
    Location:
    Mars
  20. tacopaco

    Newcomer

    Joined:
    May 15, 2008
    Messages:
    98
    Likes Received:
    0
    Location:
    indiana
    Hmmm when was that up? I didn't catch it....
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.