GF100 evaluation thread

Discussion in 'Architecture and Products' started by rpg.314, Mar 27, 2010.


Whatddya think?

Poll closed Apr 6, 2010.
  1. Yay! for both

    13 vote(s)
    6.5%
  2. 480 roxxx, 470 is ok-ok

    10 vote(s)
    5.0%
  3. Meh for both

    98 vote(s)
    49.2%
  4. 480's ok, 470 suxx

    20 vote(s)
    10.1%
  5. WTF for both

    58 vote(s)
    29.1%
  1. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Ahh so 250W TDP is Nvidia's version of AMD's ACP! Unless your design has technology to prevent going over the TDP for any thermally significant periods, you better not exceed your TDP.

    BTW 110C is well into electromigration range. If the 95C readings people are reporting are really Tj temps, these chips will likely have issues in short order.
     
  2. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    They just have to hold up for 1 year, then it's neither Nvidia's nor their partners' problem!
     
  3. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83

    Nvidia continues to burn bridges (if not houses down with the high temps). At least with customers, because I can't see many OEMs being interested in these things beyond niche special order products.

    Bumpgate and drivers that melt cards can be ascribed to mistakes/incompetence; do it again and Nvidia will be cementing a reputation for faulty products by design.
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    If you want to normalise to scalar MAD ALU operations per cycle of texture fetching, then GF100 is nominally 16:1, while Cypress is 20:1.
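
A rough worked version of that normalisation. The unit counts and the clock relationship below are assumptions taken from commonly quoted specs, not from this thread: a GTX 480 with 480 scalar ALUs running at roughly twice the clock of its 60 texture units, and Cypress (HD 5870) with 1600 ALU lanes and 80 texture units on a single clock. The function name is made up for illustration.

```python
def alu_ops_per_tex_cycle(alus, tex_units, alu_clock_mult=1.0):
    """Scalar MAD ALU ops per texture unit, per texture-unit clock cycle."""
    return alus * alu_clock_mult / tex_units

# Assumed figures (not stated in the thread):
#   GTX 480: 480 scalar ALUs at ~2x the texture clock, 60 texture units
#   HD 5870: 1600 ALU lanes and 80 texture units on one clock
gf100 = alu_ops_per_tex_cycle(480, 60, alu_clock_mult=2.0)
cypress = alu_ops_per_tex_cycle(1600, 80)

print(f"GF100 {gf100:.0f}:1, Cypress {cypress:.0f}:1")
```

Under those assumptions the numbers come out at the 16:1 and 20:1 quoted above; both are nominal peaks that say nothing about how well the ALUs are actually kept busy.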

    Evergreen doesn't have R/W L2. It has some small R/W caches dotted around (most attached to the ROPs).

    Why bother? HD5970 is faster than GTX480 at tessellated workloads.

    Or AMD could just change the driver so that it lays out the data to avoid bank conflicts.

    Rasterisation is distributed. The problem is that AMD didn't distribute it enough. If Cypress had 4 banks of SIMDs each with 8 fragments per clock rasterisation, that might have been a start. But I suspect there are more fundamental problems in ATI's architecture to do with the way triangles of fragments are packed per hardware thread (are they? I have my doubts) and the way that hardware threads are globally launched and scheduled, rather than doing so locally.

    See my signature :wink:

    Though it also seems like the era of TSMC's tick-tocking node/half-node has ended, which looks like a really serious problem.

    Jawed
     
  5. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I wonder if tessellation, especially the adaptive stuff, would lead to an increase in the necessary inter-GPU communication?
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    It seems adaptive factors are generated intra-frame in current techniques. Though you hint at something that sounds like a good idea, as I suppose it's possible to update these factors progressively across all geometry, i.e. in phases, over successive frames, with factors changing in clumps. Varying the frequency of factor update for a clump of geometry depending on distance to camera?

    I don't have any idea of the cost of storing these factors or transmitting them...
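
The distance-dependent update frequency floated above could look something like the sketch below. Everything in it is hypothetical: the clump representation, the `near` threshold, the doubling rule and the cap are illustrative choices, not a description of any shipping technique.

```python
def update_interval(distance, near=10.0, max_interval=8):
    """Frames between tessellation-factor refreshes for one clump of geometry.

    Hypothetical policy: clumps closer than `near` refresh every frame; the
    interval doubles each time the distance doubles, up to `max_interval`.
    """
    interval = 1
    threshold = near
    while distance > threshold and interval < max_interval:
        interval *= 2
        threshold *= 2
    return interval


def clumps_to_update(frame, clumps):
    """Return ids of clumps whose factors are due for a refresh this frame.

    `clumps` is a list of (clump_id, distance_to_camera) pairs.
    """
    due = []
    for clump_id, distance in clumps:
        interval = update_interval(distance)
        # Offset by clump id so clumps sharing an interval are spread across
        # frames instead of all refreshing on the same one.
        if (frame + clump_id) % interval == 0:
            due.append(clump_id)
    return due


scene = [(0, 5.0), (1, 15.0), (2, 100.0)]
for frame in range(4):
    print(frame, clumps_to_update(frame, scene))
```

Staggering by clump id keeps the per-frame cost roughly level, which is the point of updating "in phases"; it does nothing about the storage or transfer cost of the factors themselves.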

    Jawed
     
  7. chiadog

    Newcomer

    Joined:
    May 21, 2008
    Messages:
    21
    Likes Received:
    0
    I am pretty indifferent about the GF100. Of course, I feel exactly the same way about the HD58xx series*. Both architectures' performance gains over the previous generation are laughable compared to how many SP/CC/(whatever they're calling them these days) they've added. Ah well, I'm looking forward to the next round, as this match-up is pretty underwhelming. I hope that next round they put stricter limits on power consumption, as it is getting to obscene levels. It's like we're back in the Prescott days.

    *I own a 5850.
     
  8. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    So nv's almost there in alu:tex apart from the t unit.

    It has write-combining caches attached to each MC. Cache lines are read in, and entire cache lines are evicted, leading to wider write transactions. If it has an LRU eviction policy, then how are they not L2s?

    To stagger the changes over many chips.

    Yes, but it seems more capacity induced to me. I think it would have been reduced by now if it were so easily resolvable. Anyway, we'll know for sure when B3D does its hecaton/10.3 analysis. :wink:

    Fair enough. But there's no way in hell they won't pack fragments from multiple triangles into one hw thread. 64 wide threads are too wasteful otherwise.

    Yeah, goodbye to yearly GPU launches. :sad:
     
  9. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    When the number comes direct from the lead architect, you can forgive me for having confidence in it :wink:
     
  10. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    I just came back from holiday and I haven't read most reviews yet, but from what I've seen, this launch wins a sympathy vote, so yay/yay
     
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Jawed,
    Yes and no. Some kind of grouping depending on z-distance could be as useful as not updating tessellated geometry every frame - but that would inevitably lead to a coarser level-of-detail system with geo-detail popping up.

    But here is why I was wondering: is every GPU in current systems only given the base mesh, expanding it again every frame on its own? Since each GPU only renders every other frame, the differences in drawn geometry would be noticeably larger than on a single GPU which has to do every frame by itself. I could imagine this isn't too helpful for caching performance, don't you think? And the expansion also has to be done on many more triangles, as many more change from one LoD to another.

    If instead every GPU transmitted its frame's geometry state to the other to avoid that, you'd end up with a lot of geometry data having to pass back and forth over the buses every frame.
     
  12. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    which is also there in GF100, but sans FMA.
     
  13. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Indeed. Is this where the 1700MHz figure came from as well?
     
  14. PeterAce

    Regular

    Joined:
    Sep 15, 2003
    Messages:
    490
    Likes Received:
    10
    Location:
    UK, Bedfordshire
    Early Fermi thoughts...

    I have not read and digested all the reviews out there, and there are many datapoints missing at this early stage of the analysis (I'm looking at you, B3D and Tech Report ;)). So far, what I am impressed/pleasantly surprised with:

    - Minimum frame rates seem great.

    - Low res frame rates seem great.

    What I'm disappointed with:

    - Heat/Noise

    As an enthusiast I replaced an 8800 GTX SLI setup with a 280 GTX SLI one (using the same case). I'm unsure if 480 GTX SLI will allow the same without extra side case fans or maybe a new case!

    Anyway, as availability is still 'pre order', I've got a little time to think about cooling changes :)
     
  15. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83
    Isn't it the sort of thing you would expect the lead architect to keep to himself as company confidential? If so, you've got to treat any such information that is offered as suspect.

    It just goes to show that everyone in the company will be under orders to either keep quiet or spread misinformation.
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    If you assume ATI has a typical best case of 80% ALU utilisation...
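
One reading of the arithmetic left implied there, taking the nominal 20:1 Cypress figure quoted earlier in the thread and applying the assumed 80% utilisation. This is only a sketch of that reading; the function name is made up.

```python
def effective_alu_tex_ratio(nominal_ratio, alu_utilisation):
    """Scale a nominal ALU:TEX ratio by the fraction of ALU lanes kept busy."""
    return nominal_ratio * alu_utilisation

# 20:1 nominal for Cypress (from the thread) at an assumed 80% utilisation.
cypress_effective = effective_alu_tex_ratio(20, 0.8)
print(f"Cypress effective ALU:TEX is about {cypress_effective:.0f}:1")
```

On that reading the two chips land at roughly the same effective ratio, provided GF100's scalar ALUs are taken to be close to fully utilised.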

    L2 implies there's an L1. But it's not a two-level hierarchy.

    We need evidence that TS throughput is a bottleneck in games, first...

    That's not an argument though, I want evidence one way or the other.

    AMD says 8 fragments per triangle is the reasonable lower limit. There's ~ an order of magnitude between 2 quads occupying a 64-capacity hardware thread and 1 fragment occupying the same thread.
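
The order-of-magnitude gap can be put in numbers with a small sketch. It assumes the pessimistic case where each hardware thread holds fragments from a single triangle, which is exactly the point under dispute here, and it counts only useful fragments rather than the full quads the rasteriser emits. The names are illustrative.

```python
WAVEFRONT_SIZE = 64  # fragments per hardware thread, as quoted in the post

def useful_fraction(fragments_per_triangle, triangles_per_thread=1):
    """Fraction of a 64-wide hardware thread doing useful fragment work."""
    useful = fragments_per_triangle * triangles_per_thread
    return min(useful, WAVEFRONT_SIZE) / WAVEFRONT_SIZE

two_quads = useful_fraction(8)     # 8 fragments (2 quads), one triangle per thread
one_fragment = useful_fraction(1)  # a single covered fragment, one triangle per thread
packed = useful_fraction(8, triangles_per_thread=8)  # hypothetical full packing

print(f"2 quads: {two_quads:.1%}, 1 fragment: {one_fragment:.1%}, "
      f"ratio {two_quads / one_fragment:.0f}x, packed: {packed:.0%}")
```

Without packing, even the "reasonable lower limit" of 8 fragments leaves seven eighths of the thread idle, which is why whether triangles are packed per thread matters so much to the argument.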

    In games with no tessellation I imagine there's quite a bit of that 64-capacity going unused - even with fairly high fragment counts per triangle on average. But, I am just guessing at the workings here...

    I think it basically means that AMD's GPUs will creep up in size - sweet spot isn't about making the smallest die possible, anyway. Cypress is the second biggest ATI chip, after all (though there's some argument about the size of R580 :???:). Indeed Cypress could have ended up the biggest ATI chip ever, depending on how you interpret the comments on the shrinkage it suffered.

    Jawed
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Well, you're looking at a card that's burning up even at idle clocks of 50MHz core, 100MHz memory. It's barely running and still using a lot of juice. And there's nowhere to go but up.
     
  18. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    From what we've measured, R580 was about 10 sqmm larger than Cypress (with equal parameters, thus the error should be more or less equal too).
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    You can trade LOD for the reduced effort in computing LOD per frame, therefore avoiding popping.

    That's the basic concept. No different from texturing each triangle each frame, etc.

    Time-coherent shading is possible, e.g.

    http://developer.amd.com/media/gpu_...ith_Reverse_Reprojection_Caching(GH07).ppt#35

    and things like ambient occlusion approximation can be generated progressively over multiple frames. So it seems reasonable to do tessellation factors the same way.

    AFR spoils lots of things, which is why I don't like it. I dislike the "X2 to compete with NVidia's top-end" strategy.

    Yes. Early days yet as we don't know how expensive the data involved in adaptive tessellation factors is.

    Jawed
     
  20. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Well, this sort of analysis is always modulo these inconveniences...

    It's L1 in the sense that there is no level of cache above it. It's L2 in the sense that there are caches above it which are not unified with it. At any rate, it does seem to have r/w caches.

    TS might not be a bottleneck, but in tessellated scenes setup/raster quite likely are.

    More reasons to go deferred. Yeah, I am a sucker for deferred shading. :smile: With tessellation, however, it might be more beneficial to rasterize everything and then shade, instead of just buffering up all the geometry.

    Or maybe they can go to GF and ask 'em to do half nodes.
     