NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. Eolirin

    Regular

    Joined:
    Apr 28, 2003
    Messages:
    256
    Likes Received:
    178
    I'm sure they can, but ATI has parts in the mainstream segment as well, and the relative yields/price/performance ratios are still going to be in their favor, regardless of how NVIDIA decides to scale its architecture. Basically, at any given performance point, the GF100 part will probably be more expensive to *make*, just like the G200 vs RV770 situation.

    You can do many more things with GF100, and its use in HPC seems pretty awesome, but for mainstream gaming and consumer use, the market's needs can't be met at the same price points without accepting a lower margin. That's generally not a good thing to do if you want to be competitive in that market.
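The cost argument above rests on cost per good die growing faster than die area. A minimal sketch, assuming the textbook dies-per-wafer approximation and a simple Poisson defect model; the wafer cost, defect density and die areas are hypothetical placeholders, not real GF100, G200 or RV770 figures:

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Gross candidate dies per wafer: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = (math.pi * radius ** 2) / die_area_mm2
    edge_loss = (math.pi * wafer_diameter_mm) / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-defects_per_cm2 * area_cm2)

def cost_per_good_die(die_area_mm2: float, wafer_cost: float, defects_per_cm2: float) -> float:
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies

# Hypothetical inputs: a $5000 wafer and 0.4 defects/cm^2 (made-up, not real 40 nm data).
small = cost_per_good_die(300.0, 5000.0, 0.4)
large = cost_per_good_die(500.0, 5000.0, 0.4)
print(f"300 mm^2: ${small:.0f} per good die, 500 mm^2: ${large:.0f} per good die")
```

With these made-up numbers, a die with about 1.67x the area costs roughly four times as much per good die, because fewer candidates fit on the wafer and each one is less likely to be defect-free.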
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I can't think of a single transistor in G80 that's CUDA specific. GT200 manages to squeeze in some double precision. Maybe I'm just tired and forgetful :???:

    Jawed
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    That's simple, AMD put in really cheap DP. Insanely cheap. It isn't very good, either. Just adequate if you're willing to work around it.

    Jawed
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    That's a good point, but GF100 DP is prolly "real man's" DP, not the half-cocked ATI variant.

    Subnormals and exception processing on GF100 would appear to be much better than on Larrabee too. Do all these things make GF100 compelling?

    Does the lack of SSE compatibility in Larrabee make it a dead duck in HPC?
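Why subnormal support matters can be shown on the CPU side. With gradual underflow, `x - y == 0` holds only when `x == y`; a flush-to-zero (FTZ) mode breaks that guarantee. A minimal sketch, assuming Python floats are IEEE 754 binary64 (true on mainstream platforms); `flush_to_zero` is a hypothetical emulation, not any particular GPU's behaviour:

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022 for IEEE 754 binary64

def is_subnormal(x: float) -> bool:
    """Nonzero and smaller in magnitude than the smallest normal double."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL

def flush_to_zero(x: float) -> float:
    """Emulates an FTZ mode: a subnormal result becomes a zero of the same sign."""
    return math.copysign(0.0, x) if is_subnormal(x) else x

a = 1.5 * SMALLEST_NORMAL
b = SMALLEST_NORMAL
diff = a - b  # exactly 2**-1023, representable only as a subnormal

# With gradual underflow, two distinct numbers never subtract to zero.
print(a != b, diff != 0.0)
# Under flush-to-zero, the difference of two distinct numbers collapses to 0.
print(flush_to_zero(diff) == 0.0)
```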

    Jawed
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    A 32-bit MUL generally wants to produce a 64-bit result, and that's still pretty expensive. If it weren't, we'd have had more INT MULs in earlier GPUs.

    I'm not willing to believe that. How would DP function if the bandwidth wasn't there, since it uses both units concurrently?

    We have no details on the ATI 24-bit INT MUL: does it use just the lowest 24 bits? I suspect it's for addressing-type calculations.
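The operand and result widths being discussed can be modelled with plain integers. A minimal sketch: `mul24` is modelled on CUDA's documented `__umul24` (multiply the low 24 bits of each operand, keep the low 32 bits of the product); whether ATI's 24-bit INT MUL behaves the same way is an assumption, as the post says no details are available. All function names here are hypothetical:

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1

def mul32_wide(a: int, b: int) -> int:
    """Full product of two unsigned 32-bit operands: needs up to 64 bits."""
    return (a & MASK32) * (b & MASK32)

def mul32_lo(a: int, b: int) -> int:
    """Low 32 bits of the product (what a plain 32-bit MUL returns)."""
    return mul32_wide(a, b) & MASK32

def mul32_hi(a: int, b: int) -> int:
    """High 32 bits of the product (what a 'mulhi'-style instruction returns)."""
    return mul32_wide(a, b) >> 32

def mul24(a: int, b: int) -> int:
    """Modelled on CUDA's __umul24: multiply the low 24 bits of each operand
    and keep the low 32 bits of the (up to 48-bit) product."""
    return ((a & MASK24) * (b & MASK24)) & MASK32

# Typical addressing math (row * pitch + column) fits comfortably in 24-bit operands.
row, pitch, col = 1000, 4096, 17
address = mul24(row, pitch) + col
```

The 24-bit form is cheaper because the multiplier array only has to cover 24x24 bits, which is enough for most texture and buffer addressing.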

    Jawed
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    What's so inefficient about putting circular append buffers between producer/consumer kernels and branching to consumers when they have full warps? It takes storage, but running strands on Larrabee with only a few active fibers won't be efficient either ... the storage is a necessity.
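The producer/consumer scheme described above can be modelled sequentially: producers append into a bounded circular buffer, and a consumer warp is only launched once a full warp's worth of items is queued. A minimal sketch, assuming a 32-wide warp; `RingBuffer` and `dispatch` are hypothetical names, and a real GPU version would use atomics rather than a single-threaded loop:

```python
WARP_SIZE = 32  # assumption: NVIDIA-style 32-wide warps

class RingBuffer:
    """Hypothetical bounded circular buffer between a producer and a consumer kernel."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.head = 0   # index of the oldest queued item
        self.count = 0  # number of queued items

    def push(self, item) -> bool:
        if self.count == len(self.slots):
            return False  # full: the producer would have to stall
        self.slots[(self.head + self.count) % len(self.slots)] = item
        self.count += 1
        return True

    def pop_batch(self, n: int):
        """Removes and returns n items in FIFO order, or None if fewer are queued."""
        if self.count < n:
            return None
        out = [self.slots[(self.head + i) % len(self.slots)] for i in range(n)]
        self.head = (self.head + n) % len(self.slots)
        self.count -= n
        return out

def dispatch(work_items, capacity: int = 64):
    """Branches to the consumer only when a full warp of work is ready."""
    assert capacity >= WARP_SIZE
    ring = RingBuffer(capacity)
    warps = []
    for item in work_items:
        ring.push(item)  # cannot fail: we drain as soon as a full warp is queued
        batch = ring.pop_batch(WARP_SIZE)
        if batch is not None:
            warps.append(batch)  # stands in for launching one consumer warp
    leftover = ring.pop_batch(ring.count)  # partial warp left at the end
    return warps, leftover

warps, leftover = dispatch(range(100), capacity=40)
```

The buffer is the storage cost MfA mentions; in exchange, every consumer launch runs with all lanes active.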
     
  7. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,629
    Likes Received:
    1,227
    Location:
    British Columbia, Canada
    Huh? I can *maybe* see them dropping ROPs but TMUs?? Where do you get that impression?
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    24-30th August, apparently.

    Jawed
     
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    So you think adding full IEEE compliance for DP will hurt AMD (a lot) when it takes that step?
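One concrete piece of "full" IEEE DP is the fused multiply-add added in IEEE 754-2008: `a*b + c` is computed exactly and rounded once, instead of rounding after the multiply and again after the add. A minimal sketch that emulates the single rounding with exact rational arithmetic (Python's `Fraction` to `float` conversion rounds to nearest); the function names are hypothetical:

```python
from fractions import Fraction

def fma_exact(a: float, b: float, c: float) -> float:
    """a*b + c computed exactly, then rounded once to the nearest double
    (the result an IEEE 754-2008 fused multiply-add gives for finite inputs)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def mul_then_add(a: float, b: float, c: float) -> float:
    """Two roundings: one after the multiply, one after the add."""
    return a * b + c

a = 1.0 + 2.0 ** -52
b = 1.0 - 2.0 ** -52
c = -1.0
# Exactly, a*b = 1 - 2**-104, which rounds to 1.0 before the add in the unfused path.
unfused = mul_then_add(a, b, c)
fused = fma_exact(a, b, c)
```

The unfused path loses the tiny residual entirely, while the fused path keeps it; this kind of difference is what separates "real" IEEE DP from a cheaper approximation.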
     
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    To summarize, (in no particular order)

    1) Real DP, no really, full IEEE DP without apologies

    2) function pointers, recursion

    3) C++-style new/delete, exception handling

    4) a C++ debugger for Visual Studio; dunno if there will be a CUDA Fortran compiler

    5) CPU-style caches for better performance in irregular workloads

    6) more int32 performance than anybody needs. Why? Why?

    7) More shared memory.

    8) concurrent kernel execution; compute, CPU->GPU memcpy and GPU->CPU memcpy can all run in parallel

    9) full ECC, all the way from the register file to off-chip RAM

    10) unified memory space, but the memory space must be known at compile time.

    11) supports both DDR3 and GDDR5. Why bother with DDR3 if you have spent transistors and effort to hack in ECC on GDDR5?

    Have I left out anything?
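Item 9 refers to ECC. ECC memory commonly uses a SECDED code (single-error correct, double-error detect); standard ECC DIMMs protect 64 data bits with 8 check bits. The thread doesn't say which code GF100 uses, so this is only a toy: an extended Hamming(8,4) code over 4 data bits, with hypothetical function names:

```python
def hamming84_encode(nibble: int) -> list:
    """Encodes 4 data bits into an 8-bit extended Hamming (SECDED) codeword.
    Index 0 holds overall parity; indices 1, 2 and 4 hold Hamming check bits."""
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2  # overall parity over bits 1..7
    return c

def hamming84_decode(codeword):
    """Returns (nibble, status); nibble is None for an uncorrectable double error."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    parity = sum(c) % 2  # 0 for a valid codeword
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1:
        c[syndrome] ^= 1  # single error at `syndrome` (index 0 = the parity bit itself)
        status = "corrected"
    else:
        return None, "double error detected"
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, status

word = hamming84_encode(0b1011)
word[5] ^= 1  # simulate a single bit flip in storage
print(hamming84_decode(word))
```

Every protected bit costs extra storage and logic, which is why ECC "all the way from the register file" is a notable transistor investment.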
     
  11. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    It has a read/write cache on the memory bus, very nice ... but that's not really a CPU-style architecture. CPUs tend to be read/write and coherent across the entire cache hierarchy.
    Because they have huge multipliers for DP around anyway, 50% of which will be idle when not doing DP ... no point in sweating the small stuff.
    Have they even said they would do ECC on GDDR5?
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It's inferior to Larrabee, which has integer vector operations (and an additional scalar one as well).

    FP and INT share a data path (and possibly more hardware for all we know) and cannot coissue. Having something like that thrown back on the shader core would effectively end the decoupled texturing that led to such efficiencies in earlier GPUs. I'd worry that shader work couldn't progress until texturing was done.


    How did you derive this rate for Larrabee, particularly the Z rate?

    So one or more threads on the core will wait around for the producer to complete, then pick up, or is it multiple working threads, then a context switch to pull in a consumer?
     
  13. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    Shared memory? :)
     
  14. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Sorry, but what makes you think that Fermi is any more or less IEEE than Cypress?
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I'm sure Cypress would take exception to the claim it isn't fully compliant, if it can.
     
  16. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    If G92 vs RV670/G200 vs RV770 has taught us anything, it's not how expensive it is to make, it's how well it performs.
     
  17. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Probably not exception handling outright, i.e. with the hardware fully handling the branching/states/etc. with no overhead when the exceptions don't occur ... but as long as the flags are there they should be able to support exceptions with a performance hit (not a huge one either AFAICS).
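The "flags now, exceptions later" idea can be sketched as follows: the fast path never branches per operation, it only ORs status flags into a sticky word, and the slow path that pinpoints and raises the exception runs only when a flag fired. A minimal sketch, assuming overflow and invalid-operation flags only; the flag constants, `FloatingPointTrap` and the re-execution strategy are hypothetical, not a description of GF100 hardware:

```python
import math

OVERFLOW = 1  # hypothetical flag bits
INVALID = 2

class FloatingPointTrap(Exception):
    pass

def lane_op(x: float, y: float):
    """One multiply on one lane; returns the result and any flags it raised."""
    r = x * y  # Python float multiply returns inf/nan instead of raising
    flags = 0
    if math.isinf(r) and math.isfinite(x) and math.isfinite(y):
        flags |= OVERFLOW
    if math.isnan(r) and not (math.isnan(x) or math.isnan(y)):
        flags |= INVALID
    return r, flags

def run_kernel(xs, ys):
    """Fast path: no per-operation branching, flags are ORed into one sticky word."""
    results, sticky = [], 0
    for x, y in zip(xs, ys):
        r, f = lane_op(x, y)
        results.append(r)
        sticky |= f
    return results, sticky

def run_with_exceptions(xs, ys):
    results, sticky = run_kernel(xs, ys)
    if sticky:  # slow path, taken only when some lane raised a flag
        for i, (x, y) in enumerate(zip(xs, ys)):
            _, f = lane_op(x, y)
            if f:
                raise FloatingPointTrap(f"lane {i}: flags={f}")
    return results
```

When no exception occurs, the only overhead is the flag accumulation and one final check, which matches the modest performance hit described above.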
     
  18. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    edit: why do I even bother, believe whatever FUD you want to believe
     
  19. SiliconAbyss

    Newcomer

    Joined:
    Mar 28, 2004
    Messages:
    75
    Likes Received:
    0
    Location:
    Canada
    What hardware?
     
  20. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    Which demo was that? And why was the fluid demo reportedly running on G2xx-hardware if there was real hardware to be used?
     