GF100 evaluation thread

Whatddya think?

  • Yay! for both: 13 votes (6.5%)
  • 480 roxxx, 470 is ok-ok: 10 votes (5.0%)
  • Meh for both: 98 votes (49.2%)
  • 480's ok, 470 suxx: 20 votes (10.1%)
  • WTF for both: 58 votes (29.1%)

  • Total voters: 199
  • Poll closed.
The problem is that the initial rumor/info was less than 10k cards at launch, worldwide.
Why would they not be getting a slow but steady shipment of cards out the door?

The 10k cards total before a respin is just a misinterpretation of the ~9-10k wafers, i.e. risk/hotlots.

I think the problem is and was that it was a rumor. After two weeks of selling, the cards are not out of stock everywhere. This is not a Cypress and Hemlock disaster.
 
Meh to both - too hot and too power hungry.

NVidia blew it this round - I can't see buying one of these and then worrying about potentially needing a new power supply (mine is 700 watt, but I have an OC'd processor and 5 HDs in my box) AND having more heat and noise.
 
Meh to both - too hot and too power hungry.

NVidia blew it this round - I can't see buying one of these and then worrying about potentially needing a new power supply (mine is 700 watt, but I have an OC'd processor and 5 HDs in my box) AND having more heat and noise.
Well, I'm hoping that improvements to the memory controller, combined with the requirements of mid-range and below GPUs, will help to get better performance/watt for later products.
 
This was an interesting post by jasonelmore on another forum:
I just got my GTX 480 today, supposedly from the 2nd or 3rd batch that etailers are getting (bought it April 23rd). It's an EVGA model, and the serial number ends in 000000314.

Kind of makes you wonder just how many cards they had at launch.

You can explain that in many ways, but the most obvious one supports the low-supply theory.
 
I don't see the rumor of 10,000 cards, period, until the respin being true. Maybe the initial availability was 10K, but the total? We'd be hearing about more shortages by now.
 
I don't see the rumor of 10,000 cards, period, until the respin being true. Maybe the initial availability was 10K, but the total? We'd be hearing about more shortages by now.

Again... it was less than 10k cards at launch...
9-10k A3 hotlot/risk wafers (190-320k GPUs total if you believe the 20-30% yields) over the next 8-12 months, depending on the launch of the respin (Bx) or successor (Fermi II).
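As a rough sanity check of that arithmetic, here's a minimal sketch; the ~104 die candidates per 300mm wafer is my assumption for a ~529mm² die, not a figure from the thread:

[code]
# Back-of-the-envelope check of the wafer numbers above.
# Assumption (mine): ~104 die candidates per 300mm wafer for a ~529mm^2 die.
CANDIDATES_PER_WAFER = 104

for wafers, yield_rate in [(9_000, 0.20), (10_000, 0.30)]:
    good_gpus = wafers * CANDIDATES_PER_WAFER * yield_rate
    print(f"{wafers} wafers at {yield_rate:.0%} yield -> ~{good_gpus / 1000:.0f}k GPUs")

# 9000 wafers at 20% yield -> ~187k GPUs
# 10000 wafers at 30% yield -> ~312k GPUs
# i.e. roughly the 190-320k range quoted above.
[/code]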
 
I wonder what the yield is when you throw the GTX 460 into the mix; it might go up greatly.

If they can keep the price between the 5830 and the 5850 (as performance shows this is where it falls) they might be able to sell a lot of chips even if they are taking a loss on each one. The loss is surely less than just having chips sitting around.
 
Am I the only one thinking that, whether the initial supply was in the thousands or tens of thousands, the real problem is that if the cards were successful they would still pretty much be sold out and really difficult to find, even in the US?
 
On the GTX460, I haven't seen anybody mention this and I feel like wasting a few minutes, so here's a point: with the GTX470, they cannot disable any GPC logic, but here they could, so they need this SKU if only for that kind of redundancy.

And before someone tells me that GPCs are just marketing and don't really exist, you really, really need to learn how to examine a die shot properly and understand the difference between copy-pasted silicon and identical silicon that is resynthesised. In this case, for interesting (but in retrospect, perfectly obvious) reasons, the GPC logic is copy-pasted three times and resynthesised once (free cookie to anyone who can figure out why).

Similarly, there are three memory partitions, each serving two 64-bit channels. Some of that silicon must surely be channel-specific (let alone internal to a block!) so they might get a bit of redundancy in the GTX470 (much more importantly, they get 64 bits of redundancy on the analogue/PHY, which is a harder part to get yielding nicely on 40nm AFAIK) but in the GTX460 they get a full partition of redundancy. Once again, no way they can get that with the GTX470.
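To make the bus-width arithmetic concrete, here's a toy sketch of the two salvage options under the partition layout described above (the bus_width helper is purely illustrative):

[code]
# Toy model of the redundancy argument above: GF100 as described here has
# 3 memory partitions x 2 channels x 64 bits = 384-bit total.
PARTITIONS, CHANNELS_PER_PARTITION, CHANNEL_WIDTH = 3, 2, 64

def bus_width(disabled_channels=0, disabled_partitions=0):
    """Bus width left after fusing off individual channels or whole partitions."""
    active = ((PARTITIONS - disabled_partitions) * CHANNELS_PER_PARTITION
              - disabled_channels)
    return active * CHANNEL_WIDTH

# GTX470: one 64-bit channel off -> PHY redundancy only; the partition
# logic itself still has to work.
print(bus_width(disabled_channels=1))    # 320
# GTX460: a whole partition off -> full partition-level redundancy.
print(bus_width(disabled_partitions=1))  # 256
[/code]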

So the existence of the 460 should not be a surprise and doesn't say as much about yields as you might expect (which might be very bad anyway, I don't know, I don't care, that's not my point). It's arguably a bit disappointing how coarse-grained redundancy is for some parts of Fermi though. Maybe derivatives will move to finer-grained MCs or there's something I'm missing there, but for GPCs that'll remain very coarse until 28nm where presumably their number will just naturally increase as transistor count goes up.
 
And before someone tells me that GPCs are just marketing and don't really exist, you really, really need to learn how to examine a die shot properly and understand the difference between copy-pasted silicon and identical silicon that is resynthesised. In this case, for interesting (but in retrospect, perfectly obvious) reasons, the GPC logic is copy-pasted three times and resynthesised once (free cookie to anyone who can figure out why).
Actually, I think you should explain your reason, since you have a habit of saying things about chip-level stuff and not explaining.

It's arguably a bit disappointing how coarse-grained redundancy is for some parts of Fermi though. Maybe derivatives will move to finer-grained MCs or there's something I'm missing there, but for GPCs that'll remain very coarse until 28nm where presumably their number will just naturally increase as transistor count goes up.
HD5830 is quite a kludge; I don't think there's anything particularly surprising about the concept of GTX460. The G92/G94 shenanigans of the last couple of years (is that a GSO or is that a GSO?) are evidence enough.

Of course, all this to make a SKU that marginally betters HD5770, whose chip is < 1/3 the size (166mm²), well...

Will NVidia increase the count of GPCs beyond four or is that the architectural limit?

Jawed
 
On the GTX460, I haven't seen anybody mention this and I feel like wasting a few minutes, so here's a point: with the GTX470, they cannot disable any GPC logic, but here they could, so they need this SKU if only for that kind of redundancy.

Well, I would think that's an advantage versus other designs with only one geometry pipeline. If that blows, the entire chip is dead; not so with Fermi.
 
trinibwoy said:
Well, I would think that's an advantage versus other designs with only one geometry pipeline. If that blows, the entire chip is dead; not so with Fermi.
Agreed, but keep in mind the size of that single geometry block is also smaller than GF100's four blocks combined, so it's probably not that big a deal in their case.
Jawed said:
Actually, I think you should explain your reason, since you have a habit of saying things about chip-level stuff and not explaining.
Oh come on, you're no fun! :)

Here's a quick picture that should help: http://dl.dropbox.com/u/232602/GPC.jpg
Purple = 3 x Copy-Pasted
Cyan = Resynthesised 4th block
[strike]Dark Blue = Only part of 4th block that was copy-pasted from the three purple blocks[/strike]
EDIT: Jawed is perfectly right, Dark Blue should be part of Cyan. That'll teach me (not much, but still a bit!) to make such images in a hurry when I hadn't even thought of this for a few months.
Yellow = Stuff that seems to also be identical but resynthesised four times, might or might not be part of the GPCs.

The reason why the fourth block is resynthesised is that in the first three blocks, it's actually part of the copy-pasted MC blocks. Because all the 'unique logic' is near that fourth block, routing and power distribution nearly certainly would need to be substantially different for that part of the chip, so copy-pasting wouldn't be possible (or at least not practical). If you look at the SRAM though, it's very clearly identical functionality-wise. Anyway, enough wasted time now...

EDIT: In case that wasn't obvious, I couldn't resist explaining this anyway because I really wanted the free cookie I promised to anyone who'd figure it out ;) Mmmm, cookie. I love biscuits and cookies!
 
I've pointed out part of the region you highlighted as evidence of a 4x replication, sort of at the ends of that rotated H, but not the middle portions.

I had thought the H could be extended just a smidgen; there's a sliver of silicon past the end of each leg that looks very similar at all four points, but with the higher res it looks different.
 
Here's a quick picture that should help: http://dl.dropbox.com/u/232602/GPC.jpg
Purple = 3 x Copy-Pasted
Cyan = Resynthesised 4th block
Dark Blue = Only part of 4th block that was copy-pasted from the three purple blocks,
Yellow = Stuff that seems to also be identical but resynthesised four times, might or might not be part of the GPCs.
Dark Blue isn't a copy-paste of anything.

The reason why the fourth block is resynthesised is that in the first three blocks, it's actually part of the copy-pasted MC blocks. Because all the 'unique logic' is near that fourth block, routing and power distribution nearly certainly would need to be substantially different for that part of the chip, so copy-pasting wouldn't be possible (or at least not practical). If you look at the SRAM though, it's very clearly identical functionality-wise. Anyway, enough wasted time now...
Architecturally you have these mappings:

control 1 <-> 4 GPC <-> 3 MC-ROP

There's a fairly substantial amount of data running across the GPC/MC-ROP boundary. The "orphaned" GPC, "un-matched by an MC-ROP" (is that a hang-over from when GF100 was supposed to be 512-bit/64-ROP?) raises a question over how the GPC/MC-ROP blocks were originally defined, and why the 4th block was resynthesised.

At the same time it's reasonable to expect NVidia to define the architecture to scale across un-matched GPCs versus MCs, i.e. that resynthesising is normal, simply because turning off MC-ROPs is normal for NVidia.
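As a toy illustration of why un-matched counts can work, here's a sketch assuming memory requests are routed to a partition by address interleaving over a crossbar; the interleave granularity and the routing function are my assumptions, not anything documented:

[code]
# Illustrative only: if partition ownership is decided by address
# interleaving, the number of GPCs never has to match the number of
# MC-ROP partitions -- any client's request lands on *some* partition.
def owning_partition(address, n_partitions, interleave_bytes=256):
    """Pick the memory partition that owns this address."""
    return (address // interleave_bytes) % n_partitions

# 3 active partitions (GTX470-style salvage) vs a hypothetical 4:
for addr in (0, 256, 512, 768, 1024):
    print(addr, owning_partition(addr, 3), owning_partition(addr, 4))
[/code]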

The other side of the coin, though, is the nature of the libraries NVidia is using. It seems to me that copy-paste is something of a misnomer in a design that's fundamentally library-based: everything is synthesised - it looks copy-pasted but that's just because there's no reason for the instances to vary.

But I don't know anything about the gory details of library-based implementations.

Jawed
 
On the GTX460, I haven't seen anybody mention this and I feel like wasting a few minutes, so here's a point: with the GTX470, they cannot disable any GPC logic, but here they could, so they need this SKU if only for that kind of redundancy.
Actually, thinking about this, it isn't surprising then that 5 SMs are disabled and not 4. If there really is a defect in the GPC area (though it isn't all that big), that would otherwise mean all SMs have to be functional, unless the defective ones happen to be in the GPC which is broken. In fact, even with 5 disabled SMs, that still leaves only one "free to choose" SM to disable (of course, if more than one SM in the same GPC were defective, NVidia could also just disable that GPC too, even if the GPC logic were ok). At least I'm assuming the SM-to-GPC assignment is fixed. Speaking of that, can any SM be disabled, so some GTX470s could end up with 2 disabled SMs in one GPC and others with 2 disabled in different GPCs? Are there any performance implications if that were the case?
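Here's a quick sketch of that counting argument, assuming the 4 GPC x 4 SM layout and free choice of which SMs to fuse off; the enumeration is illustrative, not anything from NVidia:

[code]
from itertools import combinations

# 4 GPCs x 4 SMs = 16 SMs; count the ways to fuse off 5 of them.
GPCS, SMS_PER_GPC = 4, 4
ALL_SMS = [(g, s) for g in range(GPCS) for s in range(SMS_PER_GPC)]

def disable_patterns(n_disabled, dead_gpc=None):
    """Ways to choose disabled SMs; a dead GPC forces all of its SMs off."""
    count = 0
    for combo in combinations(ALL_SMS, n_disabled):
        if dead_gpc is None or all((dead_gpc, s) in combo
                                   for s in range(SMS_PER_GPC)):
            count += 1
    return count

print(disable_patterns(5))              # 4368: any 5 of 16 SMs
print(disable_patterns(5, dead_gpc=0))  # 12: dead GPC eats 4, 1 "free to choose"
[/code]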

Similarly, there are three memory partitions, each serving two 64-bit channels. Some of that silicon must surely be channel-specific (let alone internal to a block!) so they might get a bit of redundancy in the GTX470 (much more importantly, they get 64 bits of redundancy on the analogue/PHY, which is a harder part to get yielding nicely on 40nm AFAIK) but in the GTX460 they get a full partition of redundancy. Once again, no way they can get that with the GTX470.
So you can reroute the MCs (even the PHY) to different pins? Otherwise you'd end up with a different PCB depending on which MC is broken, which doesn't sound like a good idea...
 
At least I'm assuming the SM-to-GPC assignment is fixed. Speaking of that, can any SM be disabled, so some GTX470s could end up with 2 disabled SMs in one GPC and others with 2 disabled in different GPCs? Are there any performance implications if that were the case?
The GPU is a state machine -- it will enumerate all the resources at initialization and re-balance the dynamic load across the device, so any salvage configuration should be transparent to the run-time, at least performance-wise.
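As a toy illustration of that transparency, a sketch assuming simple round-robin balancing (not NVidia's actual scheduler): the resulting load depends only on how many SMs enumerate, not on which ones were fused off.

[code]
# If work is balanced across whichever SMs enumerate at init, only the
# SM *count* matters, not which physical SMs were fused off.
def schedule(n_blocks, active_sms):
    """Round-robin n_blocks of work across the enumerated SMs."""
    load = {sm: 0 for sm in active_sms}
    for i in range(n_blocks):
        load[active_sms[i % len(active_sms)]] += 1
    return load

# Two GTX470-like salvage patterns: 2 SMs off in one GPC vs in two GPCs.
same_gpc = [s for s in range(16) if s not in (0, 1)]
diff_gpc = [s for s in range(16) if s not in (0, 4)]
print(sorted(schedule(140, same_gpc).values()) ==
      sorted(schedule(140, diff_gpc).values()))  # True: identical balance
[/code]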
 
The other side of the coin, though, is the nature of the libraries NVidia is using. It seems to me that copy-paste is something of a misnomer in a design that's fundamentally library-based: everything is synthesised - it looks copy-pasted but that's just because there's no reason for the instances to vary.

Um, no. Think of it this way: given the nature of layout, everything is copy-pasted based off the minimum-dimension rectangle. Sure, that could be considered true, but it's beside the point. When people talk about copy/paste/replication/instantiation they are talking about work saving. If 3 "boxes" (some people use other names, like fubs, macroblocks, etc.) are the same or almost the same, it makes sense to make them the same, use mirroring/rotation, and do synthesis, P&R, physical validation, etc. once. This significantly reduces the amount of work required by a design team. This is also commonly done at lower levels as well: if you have a couple of different arrays all roughly the same size, it makes sense to make one array and reuse it, even if in some cases not all the bits or entries will be used.
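A minimal sketch of that work-saving idea, with hypothetical names: a "macro" goes through the flow once, and copy-pasted instances just place it with a transform, while a resynthesised block is another full flow run.

[code]
from dataclasses import dataclass

@dataclass(frozen=True)
class Macro:
    """A block taken through synthesis/P&R/validation exactly once."""
    name: str

@dataclass(frozen=True)
class Instance:
    """A placement of an already-laid-out macro: origin plus orientation."""
    macro: Macro
    origin: tuple[int, int]
    orientation: str  # e.g. "R0", "MX" (mirrored), "R180"

gpc = Macro("gpc_block")
# Three copy-pasted instances: one flow run, three placements.
placements = [Instance(gpc, (0, 0), "R0"),
              Instance(gpc, (0, 5000), "MX"),
              Instance(gpc, (8000, 0), "R180")]
# The resynthesised fourth block is a separate Macro (another full flow
# run) even though it is functionally identical.
gpc_resynth = Macro("gpc_block_resynth")
[/code]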
 