NVIDIA GF100 & Friends speculation

Discussion in 'Architecture and Products' started by Arty, Oct 1, 2009.

  1. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Sorry, was going by memory. The 8500 was released 7 months after the GeForce3, and 6 months before the GeForce4. This makes it, I suppose, more similar in terms of execution to the current generation than any of the others. The only major difference, as near as I can tell, is that there wasn't, at the time, any strong reason to believe that the 8500 had been delayed by 6 months.
     
  2. Sontin

    Banned

    Joined:
    Dec 9, 2009
    Messages:
    399
    Likes Received:
    0
    A shame that everybody is ignoring Tessellation. :sad:
     
  3. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    Well, it's not really tessellation you mean, is it? Because that was "created" by ATI, AFAIK.
    What I think you mean is the reorganisation of the GPU structure to better cope (in theory) with geometry tasks.
     
  4. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Yeah they would. In the past, features made old architectures obsolete. But that doesn't really apply anymore, as the shader model just undergoes tweaks with new DX revisions.

    Regarding scaling, the only things that haven't scaled 2x with Evergreen are portions of the scene that are limited by geometry, bandwidth, or the CPU/PCIe. Anything that needs flops, ROPs, or texture sampling has scaled as expected.

    We may see ATI move to a scalar architecture, as I've often advocated in the past, but it's not very difficult to do so while keeping the branching and issue rate the same per SIMD and thus most of the architecture intact.
     
  5. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Linux is fairly ISA agnostic, but if I may appeal to Top500, x86-64 has it locked down hard. All around me, I see x86 dominating other ISAs by orders of magnitude in number of cores deployed. And no matter what, performance-porting apps across ISAs is hard. And please don't forget that they'll need all the drivers ported to the new ISA for all the hw they'll ever use. And sure as hell, that ain't an easy task.

    Porting away from x86 is like expecting the entropy of an open system to decrease. Theoretically, yes (with a small probability); practically, well, you know... Just ask the designers of PA-RISC/Alpha/PowerPC/MIPS/68000/yadda-yadda...

    Yup, Linux has a monopoly in HPC about as big as Windows has on desktop. Don't know why but it won't go away anytime soon.
     
  6. Sontin

    Banned

    Joined:
    Dec 9, 2009
    Messages:
    399
    Likes Received:
    0
    I could call it "DX 11 tessellation", but yes. :wink:
    It's ironic that Cypress has this huge amount of flops but is unable to use them with tessellation because of the implementation. And GF100 is so fast with tessellation that it is limited by the pixel calculations...
     
  7. air_ii

    Newcomer

    Joined:
    May 2, 2007
    Messages:
    134
    Likes Received:
    0
    I think you should look at architectural efficiency as a whole, not some randomly picked numbers (perf/$, perf/sq.mm). Even if nVidia's flops are more "efficient", if AMD can pack more in the same sized chip, which one is better (for the record, I'm not implying either is, just don't get what he's on about)?
     
  8. air_ii

    Newcomer

    Joined:
    May 2, 2007
    Messages:
    134
    Likes Received:
    0
    I think the entire conversation is pointless... It has been rumbled on several times now...
     
  9. thatdude90210

    Regular

    Joined:
    Aug 9, 2003
    Messages:
    937
    Likes Received:
    6
    We don't know for sure that the GF100 is some sort of tessellation monster compared to Cypress. If the info in this post is correct, then for all we know, this 1.1 is simply optimized for their hardware on a different (other than tessellation) level. It could be that the GF framerates also take big hits in DX11. Wait for some real benchmarks first.
     
  10. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,196
    I stand corrected in my extreme affirmation :smile:
    But you understood what I meant :razz:

    So, my question about whether the HD5870 has more FLOPS than it really needs may be right after all. Thanks.

    And working on geometry, which is what nVIDIA (theoretically) did, seems the right move. Shame they had to spoil it with something else...
     
    #4790 Picao84, Mar 24, 2010
    Last edited by a moderator: Mar 24, 2010
  11. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    As I said, x86 is quite popular right now. But x86 isn't nearly as "locked-in" in the HPC space as it is in the consumer space. Yes, performance-porting is challenging, but usually that's managed by porting a relatively small number of libraries, such as BLAS and LAPACK, where most of the compute time is spent. Such libraries are typically the only parts of HPC applications that are architecture-optimized anyway.

    The #1 reason why x86 is popular in the HPC space right now is because AMD and Intel are leveraging the huge (by HPC standards) volume of consumer-space products to basically out-R&D their competitors.

    Of course, if nVidia is going to get into the HPC space with a hybrid CPU-GPU part after Fermi, they're going to have the same uphill battle that all non-x86 manufacturers have. Their best bet would probably be to focus on producing effective CPU-GPU parts for markets like cell phones, and, if they can become successful there, use that to leverage larger HPC CPU-GPU parts.

    Because Linux is vastly superior for this space. Some of the biggest benefits off the top of my head are the ssh interface (which is essential for working remotely), the more-or-less standardized compiler and library setup (which is really important for porting between different machines), and the extremely powerful commandline (which is essential for saving time in executing large numbers of jobs).
     
  12. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Initially, I used to think like that as well, but now I am definitely on the fence on this one. vec4 totally borks the register allocation, but that could probably be worked around in the compiler. The overall efficiency of VLIW and the cheap (almost free) DP cost make me feel that they should stay the course. Even if you factor out the 5x ILP, AMD's ALUs appear to be more efficient than NV's.
     
  13. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Tessellation doesn't need lots of flops. Even on GF100, I'm sure under 20% of its ALUs will be working on tessellation at any time. Four triangles set up per clock means it can't generate more than one vertex per hot clock. The ALUs have 1024 flops to offer in that time, and probably more because even NVidia said something like 3.2 tri/clk is the fastest they can achieve.
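
    A rough sketch of that arithmetic, for anyone who wants to check it. The inputs are assumptions on my part rather than figures stated above: 512 ALUs, one FMA (counted as 2 flops) per ALU per hot clock, a hot clock running at twice the setup clock, and roughly one new vertex per two triangles in a tessellated mesh.

```python
# Back-of-envelope flops budget per tessellated vertex on GF100.
# All four constants are assumptions for illustration, not measured values.
ALUS = 512                       # assumed ALU count
FLOPS_PER_ALU_PER_HOT_CLOCK = 2  # assumed: one fused multiply-add per hot clock
HOT_CLOCKS_PER_CORE_CLOCK = 2    # assumed: shader clock is 2x the setup clock
VERTICES_PER_TRIANGLE = 0.5      # assumed: dense meshes have ~2 triangles per vertex


def flops_per_vertex(triangles_per_core_clock):
    """Return the ALU flops available for each vertex the setup engines can consume."""
    vertices_per_core_clock = triangles_per_core_clock * VERTICES_PER_TRIANGLE
    vertices_per_hot_clock = vertices_per_core_clock / HOT_CLOCKS_PER_CORE_CLOCK
    flops_per_hot_clock = ALUS * FLOPS_PER_ALU_PER_HOT_CLOCK
    return flops_per_hot_clock / vertices_per_hot_clock


peak_budget = flops_per_vertex(4.0)      # theoretical setup rate
realistic_budget = flops_per_vertex(3.2) # rate quoted as the best achievable

if __name__ == "__main__":
    print(f"4.0 tri/clk: about {peak_budget:.0f} flops per vertex")
    print(f"3.2 tri/clk: about {realistic_budget:.0f} flops per vertex")
```

    Under those assumptions the budget per vertex goes up as the achieved triangle rate drops, which is the "probably more" part.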
     
  14. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    What about all the drivers for all the hw? They are heavily ISA dependent by design.

    For that, they'll have to sell SoCs with 2 ARM A9s and ~30 Fermi SMs to the mobile phone industry. :runaway:
     
  15. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Fortunately in the HPC space you can build your machine for the task at hand, which means you can stick to a relatively small variety of hardware that has the drivers you want.
     
  16. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Basically, think of AMD retaining the same register structure and modifying the data flow so that it allows dependent data flow in each xyzw instruction group. Instead of a four-instruction group being executed on 16 pixels each clock, a single instruction is executed on 64 pixels each clock. Instead of flipping between two wavefronts to fill the ALU pipeline, it goes round-robin on eight wavefronts. All the issue rates stay the same, register granularity actually decreases, and most of the ALU stays intact.
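
    To put numbers on the two issue schemes, here is a minimal sketch. It assumes 64-pixel wavefronts and counts only the four xyzw lanes per pixel; the function and field names are made up for illustration and the model ignores everything except issue timing.

```python
# Compare the current grouped issue scheme with the proposed scalar one.
# WAVEFRONT_SIZE and the simplified timing model are assumptions for illustration.
WAVEFRONT_SIZE = 64  # pixels per wavefront (assumed)


def describe(pixels_per_clock, instructions_per_pixel, wavefronts_in_flight):
    """Summarise one issue scheme for a single SIMD."""
    # Clocks needed to push one issue slot through a whole wavefront.
    clocks_per_issue = WAVEFRONT_SIZE // pixels_per_clock
    return {
        # ALU operations started per clock across the SIMD.
        "lane_ops_per_clock": pixels_per_clock * instructions_per_pixel,
        # Clocks before the same wavefront issues again under round-robin.
        "reissue_interval": clocks_per_issue * wavefronts_in_flight,
        # Pixels that must be resident just to keep the pipeline full.
        "min_threads": wavefronts_in_flight * WAVEFRONT_SIZE,
    }


current = describe(pixels_per_clock=16, instructions_per_pixel=4, wavefronts_in_flight=2)
proposed = describe(pixels_per_clock=64, instructions_per_pixel=1, wavefronts_in_flight=8)

if __name__ == "__main__":
    for name, scheme in (("current", current), ("proposed", proposed)):
        print(name, scheme)
```

    In this model both schemes start the same number of ALU operations per clock and give each wavefront the same gap between issues; the difference is how many pixels have to be in flight to fill the pipeline.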
     
  17. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    What? I don't think you understand what you are asking...

    G80 = Nov '06
    RV770 = June '08

    Since RV770 "caught up" how do you get to three years? It is less than two years.
     
  18. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    IOW, quadruple the per-clock throughput so that each wavefront issues in one clock instead of 4, just as per-warp throughput was doubled in Fermi to one every 2 clocks. OTOH, eight wavefronts = 512 threads for minimal coverage of the SIMD. That is quite high...

    What you just described is your scheme. I want to know the rationale.
     
  19. Archaeolept

    Newcomer

    Joined:
    Feb 26, 2003
    Messages:
    52
    Likes Received:
    0
    Just a minor rectification of this error - the GeForce 3 predated the 8500 by over 6 months. The 8500 was meant to be a GeForce 3 killer, but early driver issues meant it took a while to achieve its true potential. In the end, at very high AF, it was able to challenge the GF4 4200.

    edit: meh, didn't notice chalnoth's post, and can't delete :)
     
  20. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    Do you think that repeating that "real tessellation" thingy somehow makes it true?

    It's incredible that at this stage, with architecture specs and all, some people are still hanging on to some FUD-spreading "articles"...
     