NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I couldn't be bothered to read that closely, but I suspect what's going on there is that the compression is used to cater for horizontal or vertical register file addressing. i.e. registers can be allocated in either direction, depending on access pattern in instructions. One or the other layout then plays ball with compression.
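     The two allocation directions can be pictured with a toy address calculation. This is a hypothetical sketch, not Fermi's actual layout: the lane and register counts are invented, and the point is only that the same logical register lands at different physical offsets depending on allocation direction.

```python
# Hypothetical sketch of the two register-file allocation directions
# discussed above. The register file is viewed as a small 2D array;
# LANES and REGS are invented numbers for illustration, not Fermi's.

LANES = 8   # assumed SIMD lanes
REGS = 4    # assumed registers per lane

def horizontal_addr(lane, reg):
    # "horizontal": one lane's registers are contiguous
    return lane * REGS + reg

def vertical_addr(lane, reg):
    # "vertical": one register across all lanes is contiguous
    return reg * LANES + lane

# The same logical register ends up at different physical offsets,
# so a given access pattern is contiguous in only one of the layouts.
print(horizontal_addr(3, 2))  # 14
print(vertical_addr(3, 2))    # 19
```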

    Oh and the main instruction issue patent document, since I stumbled upon it:

    http://v3.espacenet.com/publication...7214343A1&KC=A1&FT=D&date=20070913&DB=&locale=

    The title uses "out of order" to refer to hardware threads, rather than intra-thread instruction ordering.

    Compilation can be used to re-order instructions for minimal hazards per issue clock. But, regardless, because instruction issue is keyed upon register dependency, intra-thread instructions can issue out of order, with the proviso that the document isn't a 100% guarantee of what's inside a GPU...

    Actually it's possible to read that as merely stating that the hardware-thread ordering isn't necessarily maintained by issuer 506.

    Anyway we agree, whichever way we take it, the fine-grained register-dependency and operand-readiness scoreboarding is relatively costly.
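     The dependency-keyed issue being discussed can be sketched in a toy model. This is purely illustrative, not the patent's mechanism: the instruction format, latencies and registers are all invented, and the only point is that an instruction can leave as soon as its source registers are ready, so issue order need not match program order.

```python
# Toy model of register-dependency-keyed issue: an instruction issues
# as soon as its source operands are scoreboarded ready, regardless of
# program order. All encodings and latencies here are invented.

def issue_order(program, reg_ready):
    # program: list of (dst, srcs); reg_ready: cycle each register arrives
    ready = dict(reg_ready)
    pending = list(enumerate(program))
    order, cycle = [], 0
    while pending:
        for idx, (dst, srcs) in pending:
            if all(ready.get(s, float("inf")) <= cycle for s in srcs):
                order.append(idx)
                ready[dst] = cycle + 1      # assume single-cycle latency
                pending.remove((idx, (dst, srcs)))
                break
        else:
            cycle += 1                      # nothing issuable yet: wait
    return order

# Instruction 0 waits on a slow operand (say, a load into r1);
# instruction 1 is independent and issues first, out of program order.
prog = [("r2", ["r1"]),
        ("r3", ["r0"]),
        ("r4", ["r2"])]
print(issue_order(prog, {"r0": 0, "r1": 5}))  # [1, 0, 2]
```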

    Jawed
     
  2. PSU-failure

    Newcomer

    Joined:
    May 3, 2007
    Messages:
    249
    Likes Received:
    0
    In fact, it's not totally true.

    Cypress shows poor scaling compared to Juniper CF (i.e. AFR), but at the same time playability is more in line with Cypress than with CF, pointing to a system limitation.

    Try recording frame times on some scene, and you'll probably see huge variations on the CF setup while the single board only shows a large one. From my numbers on Heaven, Juniper XT rendered each frame in between 5 and 200 ms; that's what I call big. Imagine what it will give with AFR, since each GPU has to wait for the other one to complete to "finalize" its frame: that will probably give something like 0-200 ms, so perceivable stuttering while the average framerate is almost doubled.

    Now, if you take the exact same benchmark with one Cypress, you'll end up with 5-100 ms render time per frame, so the average framerate is almost equal to Juniper CF, but stuttering will be much less visible, although it's still there.

    A simple 2D example of this situation is a scrolling pattern: if you scroll 1 pixel every 1/60th of a second it's perfect, and if you scroll 2 pixels it's barely acceptable, but if you alternately scroll 1 and 3 pixels it'll be ugly. Unfortunately, many engines seem to render in a way that causes such a pattern, even with only one GPU (perhaps some shader data are only updated on 50% of the frames or even less), and adding one or more GPUs won't get you anywhere, as the slowest-rendering frame will always imply a stuttering effect even if scaling "seems" to be perfect.
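     The point about average framerate hiding stutter can be put numerically. This is a hedged sketch: the frame times below are invented to mimic the AFR-like alternation described above, not measurements from any card.

```python
# Two frame-time sequences with the same average frame rate can feel
# very different. The numbers are invented for the example.

def stats(frame_times_ms):
    avg = sum(frame_times_ms) / len(frame_times_ms)
    return 1000.0 / avg, max(frame_times_ms)   # (avg fps, worst frame)

smooth = [16.7] * 8            # steady pacing, ~60 fps
stutter = [1.0, 32.4] * 4      # AFR-like alternation, same mean

print(stats(smooth))    # ~59.9 fps, worst frame 16.7 ms
print(stats(stutter))   # ~59.9 fps, worst frame 32.4 ms
```

Same average fps, but the worst-case frame (which is what the eye notices) is nearly twice as long in the alternating sequence.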

    Anyway, the Cypress driver doesn't seem mature enough to conclude; I ran the old X3-Reunion benchmark and found very disappointing numbers, almost twice as slow as my old RV770 on the exact same PC.

    So, while this is quite interesting to investigate, what remains to be seen is how Fermi will handle DX11 rendering compared to Cypress, with a (very probably) slower rasterizer, about half the raw shading power and its unified L1/shared memory. Note that RAM bandwidth doesn't seem to be a bottleneck for Evergreen GPUs, btw: doubling it only marginally improves performance, be it on Juniper or Cypress.

    I'd be very happy to find a tool to deactivate SIMD blocks in Cypress and Juniper to investigate further, as that would allow testing a GPU with 10 SIMD blocks and 32 ROPs, for example. Talking about SIMDs, I'm still not sure whether NV will go for 14 clusters for the "360" and 16 for the "380"; it could very well be 12/14, with half-year refreshes having 14/16 after more tweaking.
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Well the obvious one is triangle setup rate. Making a GPU setup two triangles per clock is a significant architectural change.

    Also, adding a second GPU (CrossFire) doesn't necessarily help frame-rate minima, so it's not much of a win and mostly invalidates such comparisons.

    Jawed
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It occurred to me that it might be because of the sheer number of scoreboard contexts that must be maintained per thread, and that it is possible for an implementation to have scoreboard hardware that is underspecified for the max number of scoreboarded threads with a max number of non-contiguous register hazards.
    Maybe having the maximum number of threads, each with instructions writing to every other register, can force a scoreboard stall.
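     The capacity argument can be sketched with a toy model. Everything here is invented for illustration (entry count, thread count, register stride); the point is only that a fixed pool of shared scoreboard entries runs out when enough threads each track pending writes to non-contiguous registers.

```python
# Toy model of an underspecified scoreboard: a fixed number of entries
# shared by all threads. When enough threads each track writes to
# every other register, later instructions stall for a free entry.
# All sizes are invented for illustration.

class Scoreboard:
    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = set()          # (thread, reg) pairs being tracked

    def try_track(self, thread, reg):
        # True if the pending write got an entry, False means a stall
        if len(self.in_flight) >= self.capacity:
            return False
        self.in_flight.add((thread, reg))
        return True

sb = Scoreboard(capacity=6)
stalls = 0
for thread in range(4):                 # four threads...
    for reg in range(0, 4, 2):          # ...each writing every other register
        if not sb.try_track(thread, reg):
            stalls += 1
print(stalls)  # 2: the last two writes find no free entry
```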

    Yes, it is somewhat ambiguous.
    The latter part of claim 77 would indicate that, at least for some embodiments, there is a more explicit attempt to make sure intra-thread ordering is respected.
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I think that's to reduce scoreboard consumption: if intra-thread re-ordering is done, then minimising that re-ordering will constrain the number of scoreboard entries consumed by the thread.

    Jawed
     
  6. XMAN26

    Banned

    Joined:
    Feb 17, 2003
    Messages:
    702
    Likes Received:
    1
    How about the "Damn, that card is freakin fast" factor? Nvidia was pretty quiet about G80 beforehand, save for leaks once they started sending out cards to AIBs 2-4 weeks before launch. We could very well be looking at the same thing all over again. And to date, they have been trumpeting the GPGPU side of Fermi, not the gaming side. Two entirely different uses.
     
  7. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    17,884
    Likes Received:
    5,334
    I agree. If I were NV and Fermi were fast as f***, the best thing would be to shut up about it, let the world think it's gonna suck, then wham, shock and awe the world into submission (it would create a huge buzz).
     
  8. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Nvidia wasn't being killed by the competition when they released G80...
     
  9. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    I disagree. Every day they don't have a product on the market is money lost. Why not try to prevent potential customers from purchasing a competing product by telling them how awesome your product is? Marketing 101.
     
  10. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83
    Nvidia have already been running spoilers against AMD's DX11 cards. If they could crow about better Fermi performance right now, then that is what they would be doing. For a card they've been claiming will be out in the next month, they must know what they've got. Even if Fermi was three months away, Nvidia should know what they've got - if they are stockpiling for launch. If Nvidia are not stockpiling and don't know performance, then they will be late and/or in severe shortage during their supposed launch period.

    I suspect that, given its problems, Fermi's marketing will concentrate on GPGPU, PhysX, CUDA, etc. if it doesn't meet performance expectations. The problem is that if Fermi doesn't manage to hit its performance targets, Nvidia will be forced to sell it more cheaply than they would like against a competitor that already has a die-size advantage.

    If AMD drops their price, then Fermi can't just match 5870/5890 performance and still expect to get a higher price - Nvidia will have to drop prices or be substantially faster/better to get a higher price.
     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    They already kinda stated it when they said Fermi is going to have the fastest chip in every segment :grin:. If you think that doesn't mean performance, well, I don't know what else to say! For concrete numbers, wait till CES; it's just about a week away.

    And marketing 101: it's always better to show it than to just crow about something. Remember the fake Fermi board... that kinda backfired. Showing leaves no doubts.
     
  12. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    I'm really curious: from your perspective, what is the ideal mixture of CPU and GPU hardware to extract the best performance per $ from AMD hardware? I know this is slightly off topic, but it's also relevant in a grander sense.

    So say you've got an HD 5850; then what's the AMD CPU to go with that? Is it the Phenom 945?
     
  13. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    114
    Location:
    New Zealand
    On the general point of the performance improvement being unnecessary, I don't believe that can be considered the case. The ideal frame rate on a mouse-driven interface is roughly 60 frames per second to get a truly responsive game, and I don't believe we quite have that across the board at the 24" 1920x1080 or 1200 resolutions we use. Some people may like to turn the eye candy up, which kills frame rate, but I suspect most here would like their cake too, and having 60 FPS is the best of both worlds.
     
  14. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83
    Depends how you define "fastest" and "segment". Fermi could be 30 percent faster than the 5890 in the high end, but if it comes in at $5000, it isn't going to sell. It could be 1 percent faster, in which case a price cut from AMD still makes it a hard choice. It might be "fastest" at DP GPGPU calculations or ECC support, something that does not benefit the mainstream. I'm sure Nvidia PR have their weasel words ready.

    But marketing 101 tells us it's better to show; when you've got nothing to show, you crow about what you are going to show once you've got it. You don't just let the competition eat your lunch when they've got a newer, better product selling hand over fist while you're scrambling around trying to get your late one out the door.

    Unless of course, you can't show, and you can't say anything good about your upcoming product - in which case you are right, it's better to keep quiet on the specifics and hope you can BS your way through the launch of an under-performing product with the use of DP, HPC, PhysX, etc.
     
  15. XMAN26

    Banned

    Joined:
    Feb 17, 2003
    Messages:
    702
    Likes Received:
    1
    I don't see them as being killed now. Sure, they don't have the fastest thing out, but the performance lead isn't that great, single GPU vs single GPU.
     
  16. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    That wasn't PR that stated that, it was a product manager :roll:. And it's a very fast card; you wouldn't see them stating that if it wasn't, they would go the route of value for the money or something else. That's marketing and PR for ya. There is a fine line between a lie and spin. If that statement is false, they are just lying.

    Yes, so you can't wait a week and a half?

    Oh they were talking in the context of gaming :wink:
     
  17. spigzone

    Banned

    Joined:
    Dec 26, 2009
    Messages:
    45
    Likes Received:
    0
    Location:
    North Dakota
    Added to that, AMD is soon releasing a dozen Cedar and Redwood boards, which will rapidly take over and own the value segment of the market for the foreseeable future. The top end, small as it is, is the only segment Nvidia will even have a card to compete in, and without a compelling reason (read: a substantial performance advantage) to do otherwise, by March another several hundred thousand of those potential (and highest-margin) customers will have spurned Fermi and turned to AMD for their GPU fix.
     
  18. Vincent

    Newcomer

    Joined:
    May 28, 2007
    Messages:
    235
    Likes Received:
    0
    Location:
    London
    Has anyone heard of GF104 (a Fermi derivative)?

    256 SPs + 256-bit GDDR5

    Btw, Happy New Year.
     
  19. Kowan

    Newcomer

    Joined:
    Sep 6, 2007
    Messages:
    136
    Likes Received:
    0
    Location:
    California
    That part cracked me up. :lol:
    I'm curious how much information will be shown at CES, and whether any hands-on displays will be there.
     
  20. spigzone

    Banned

    Joined:
    Dec 26, 2009
    Messages:
    45
    Likes Received:
    0
    Location:
    North Dakota
    Except they kinda DIDN'T say Fermi IS going to be the fastest chip in every segment, they said they EXPECT it to be the fastest chip in every segment.

    There's a world of difference in that word choice.
     