NVIDIA Fermi: Architecture discussion

There's no indication that the C2050 and C2070 actually have different gigaflops ratings.

So nVidia is charging $1500 for 3GB of memory?

Edit: the product guide PDF does NOT say "Floating Performance Range" but separates the peak values with a dash. According to that doc, Win7 is also not supported.
 
Back on topic: Are we still expecting A3 to be the final silicon? I would hope so, and it seems like either way Nvidia will have to launch something with it, since they stated a Q1 launch on Facebook/Twitter.

Doesn't their Q1 end at the end of April?

***edit:

http://www.facebook.com/NVIDIA?ref=nf

NVIDIA: Happy Holidays GeForce fans! Fun fact: GF100 supports a brand new 32x anti-aliasing mode for ultra high-quality gaming!

8xMSAA+24xCSAA? If my speculation is on track, 8xMSAA performance needs to be quite a bit higher for it to make sense.
http://www.facebook.com/NVIDIA?ref=mf
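
If that is the mode, the arithmetic works: 8 color/Z samples plus 24 coverage samples gives the advertised 32x. A back-of-the-envelope storage comparison shows why coverage samples are the cheap way to get there; the byte counts below are my assumptions (4 bytes color, 4 bytes depth, roughly a bit per coverage sample), not NVIDIA's figures:

```python
# Rough framebuffer cost per pixel: MSAA stores full color+Z per sample,
# while CSAA adds cheap coverage-only samples on top of a smaller MSAA base.
# Byte counts are assumptions: 4B color, 4B depth, ~1 bit per coverage sample.
def msaa_bytes_per_pixel(samples, color_bytes=4, depth_bytes=4):
    return samples * (color_bytes + depth_bytes)

def csaa_bytes_per_pixel(color_samples, coverage_samples):
    return msaa_bytes_per_pixel(color_samples) + coverage_samples / 8

print(msaa_bytes_per_pixel(8))      # 64 bytes/pixel: plain 8xMSAA
print(csaa_bytes_per_pixel(8, 24))  # 67.0 bytes/pixel: 8xMSAA + 24 coverage
print(msaa_bytes_per_pixel(32))     # 256 bytes/pixel: if all 32 were full samples
```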
 
So you didn't say this?

Yes, you are correct, you didn't say anything about expecting it... you "speculated" it.
Thanks for pulling a technicality out of your...

That's right. Expect != Speculate

LordEC911 said:
Back on topic: Are we still expecting A3 to be the final silicon? I would hope so, and it seems like either way Nvidia will have to launch something with it, since they stated a Q1 launch on Facebook/Twitter.

What leads you to question whether A3 will be the final silicon?
 
Based on this document, http://www.nvidia.com/docs/IO/43395/BD-04983-001_v01.pdf , we know that the processor "core" clocks for C2050 and C2070 are 1.25GHz and 1.40GHz, respectively.

Given that, there are two possibilities: either the C2050 runs its full 448 ALUs at a lower clock than the document states, or it ships with some ALUs disabled.

It seems to me that rjc's figure of 416 ALUs for the Tesla C2050 is spot on, given the specs.

It's doubtful that NVIDIA would make that kind of mistake in a document viewed by potential customers.
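
Spelling the arithmetic out, under the usual assumption that the double-precision rating counts one FLOP per ALU per clock (a half-rate FMA counted as two FLOPs):

```python
# Sanity check of the Tesla DP ratings, assuming one DP FLOP per ALU per clock.
def implied_clock_ghz(gflops, alus):
    return gflops / alus

def implied_alus(gflops, clock_ghz):
    return gflops / clock_ghz

print(implied_clock_ghz(630, 448))  # 1.406 -> C2070: 448 ALUs at ~1.40GHz, fine
print(implied_clock_ghz(520, 448))  # 1.161 -> C2050 would need ~1.16GHz with 448
print(implied_alus(520, 1.25))      # 416.0 -> or 416 ALUs at the stated 1.25GHz
```

And 416 is 448 minus one 32-ALU multiprocessor, which is what makes the disabled-unit reading plausible.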

jimmyjames123 said:
With respect to NVIDIA's emphasis more on clocks, Rys appears to have been referring mainly to GeForce, not Tesla.

He actually said it in so many words :)

I do have doubts, though, that Rys's 1700MHz speculation for the ALUs is correct. I'm still thinking 1500MHz; 1700MHz seems more likely for a "refresh" product.
 
Yep, but if A3 yields are poor at 1400MHz, what are the odds that we see Rys' 1700MHz prediction on Geforce parts?
1400MHz yields might only be poor for the voltage/power level that NVidia is targeting for Tesla? Geforce has room to breathe in this respect, particularly if the board only has 1.5GB of memory.

Jawed
 
1400MHz yields might only be poor for the voltage/power level that NVidia is targeting for Tesla? Geforce has room to breathe in this respect, particularly if the board only has 1.5GB of memory.

Jawed

Poor for 225W? Then how much breathing room do GeForces have? Do you expect the GeForce GTX 380 (or whatever it's called) to have an even higher TDP?
 
So that 520-630 GFLOPS would be just an estimate and not a product range...

At least it was an estimate in the beginning, but now I won't put my hand in the fire on that.


Doesn't their Q1 end at the end of April?

***edit:

http://www.facebook.com/NVIDIA?ref=nf

8xMSAA+24xCSAA? If my speculation is on track, 8xMSAA performance needs to be quite a bit higher for it to make sense.
http://www.facebook.com/NVIDIA?ref=mf

I hope there's sparse grid supersampling (I wonder if DX10.1 or DX11 compliance implies it, by the way).

I'm an AA nut, yet I can't see much difference between 4x MSAA and the MSAA + CSAA modes. But I'd sure like to see supersampling, pure or mixed. Maybe that's a new, better 32xS mode ;)
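
For what it's worth, the case for sparse grids is easy to show: with four samples, a sparse pattern gives four distinct x and four distinct y offsets (better gradients on near-vertical and near-horizontal edges), while an ordered 2x2 grid only gives two of each. A quick sketch, with made-up sample positions rather than any vendor's actual pattern:

```python
# Ordered 2x2 grid vs a sparse (rotated) pattern for 4x supersampling.
# Positions are illustrative only, not NVIDIA's or AMD's real patterns.
ordered = [(x / 2 + 0.25, y / 2 + 0.25) for x in range(2) for y in range(2)]
sparse = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

for name, pattern in (("ordered", ordered), ("sparse", sparse)):
    xs = {x for x, _ in pattern}
    ys = {y for _, y in pattern}
    print(f"{name}: {len(xs)} distinct x offsets, {len(ys)} distinct y offsets")
```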
 
Would sparse supersampling reduce the blur effect of their current ordered grid approach? It looks pretty but some detail is lost.
 
Poor for 225W? Then how much breathing room do GeForces have? Do you expect the GeForce GTX 380 (or whatever it's called) to have an even higher TDP?
The TDP for a Tesla with 6GB of memory has to be pretty strictly adhered to. The wishy-washy TDP of a Geforce with only 1.5GB of memory will be much less of a constraint. 300W is an option for a consumer card, isn't it?

So I expect higher clocks for Geforce, though there may well be a benchmark-only edition with the full ALU count and top clocks. Whether they're high enough for the halo effect is a whole different question.

Jawed
 
Would sparse supersampling reduce the blur effect of their current ordered grid approach? It looks pretty but some detail is lost.

There's still blur, but with sparse sampling it's more worth it. It's most noticeable on small text; I lived with it when playing the old Counter-Strike, but you probably wouldn't want it in some RPG, RTS, simulation, etc. games.

LOD adjustment gives you sharper textures, so you get to both gain and lose detail. You'd be able to enable it selectively with nHancer (that program is the number one reason I stick with nvidia).
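
On the LOD point, the usual rule of thumb (a convention, not anything from an NVIDIA doc) is that S-sample supersampling supports a texture LOD bias of -0.5 * log2(S), which is where the regained texture sharpness comes from:

```python
# LOD bias that S-sample supersampling can support, by the common rule of thumb.
from math import log2

for s in (2, 4, 8):
    print(f"{s}x supersampling -> LOD bias {-0.5 * log2(s):.2f}")
# 2x -> -0.50, 4x -> -1.00, 8x -> -1.50
```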
 
I take it you're absolutely devastated by the bandwidth/flop ratio in Cypress? :smile: But is there a reason why you are comparing Tesla to Geforce? The GT200-based Tesla parts had 800MHz GDDR3.

No, but I am a little down about it. I do understand why it isn't a big deal; I asked at the Evergreen launch (two days before, actually), and the memory controller designers told me why it wasn't a problem. I agree with what they said.

In Cypress, memory bandwidth is the most easily reached bottleneck, but for a normal to slightly overclocked card, it shouldn't be a problem.

-Charlie
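
To put rough numbers on the bandwidth/flop point (launch specs quoted from memory, so treat them as approximate):

```python
# Bytes of memory bandwidth per single-precision FLOP, commonly cited specs.
cards = {
    "HD 5870 (Cypress)": (2720.0, 153.6),  # GFLOPS SP, GB/s
    "GTX 285 (GT200b)": (1063.0, 159.0),
}
for name, (gflops, gb_per_s) in cards.items():
    print(f"{name}: {gb_per_s / gflops:.3f} bytes/FLOP")
# Cypress has well under half of GT200b's bandwidth per FLOP.
```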
 
Will OpenCL support Fermi features (pointers, the memory hierarchy, and whatnot)?

For HPC I believe OpenCL doesn't matter much; any code will be custom, and OpenCL code would need to be rewritten anyway if run on another architecture. We don't hear of Radeons in 1U racks or with ECC memory either.

I believe it will matter more in the consumer space, especially with AMD Fusion. Maybe we'll even see code running on the Sandy Bridge IGP, even if reluctantly :), and on a future Fermi-based NVIDIA Tegra.

OpenCL will be a good starting point to flesh out an HPC program with; you can then customize it from there.

-Charlie
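
On the memory-hierarchy half of the question: OpenCL 1.0 does expose the on-chip scratchpad ("__local" memory) that maps onto Fermi's shared memory/L1, even if things like function pointers and recursion stayed CUDA-only at the time. A minimal sketch with pyopencl (my example, assuming pyopencl and an OpenCL runtime are installed):

```python
# Staging data through __local memory, the OpenCL counterpart of Fermi's
# shared memory: each 64-wide work-group reverses its tile on-chip.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

kernel_src = """
__kernel void reverse_tile(__global const float *src,
                           __global float *dst,
                           __local float *tile) {
    int lid = get_local_id(0);
    int gid = get_global_id(0);
    int n   = get_local_size(0);
    tile[lid] = src[gid];            // stage through on-chip memory
    barrier(CLK_LOCAL_MEM_FENCE);
    dst[gid] = tile[n - 1 - lid];    // write the tile back reversed
}
"""
prg = cl.Program(ctx, kernel_src).build()

x = np.arange(256, dtype=np.float32)
mf = cl.mem_flags
d_src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
d_dst = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)

group = 64
prg.reverse_tile(queue, x.shape, (group,), d_src, d_dst,
                 cl.LocalMemory(4 * group))  # 4 bytes per float

out = np.empty_like(x)
cl.enqueue_copy(queue, out, d_dst)
print(out[:4])  # [63. 62. 61. 60.]
```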
 
Yeah, it's interesting that they can get to 416 @ 1.25GHz but 448 @ 1.40GHz is pushing too hard.

Edit: Thinking about it for an hour: if the bins worked out as suggested, they probably should find ways to increase demand for the C2050 (i.e. reduce the price) and lower demand for the C2070 (i.e. raise the price); that would be much easier than going back and trying to fight the chips' natural binning. I'm thinking the Fermi-based Tesla business will be slow-growing at first anyway, with lots of hand-holding and other incentives needed, so it's better to get the chips out now to raise developers' comfort level so they actually write software and create some demand for the product. It doesn't really matter that performance wasn't quite what was promised; the programming model is still the same, and that's what has to be sold now. It's very similar to when GT200 was first introduced: they had real trouble getting enough of the top-bin (GTX 280) part to begin with.

You don't think a $1500 price difference between the two is designed to do just that? :)

-Charlie
 
Following up my own post, I was just trying to confirm the shader clock on the C2050. There is the board document here:


and the Product Brief:


For the C2070, 630 GFLOPS / 448 = 1.40GHz, which is fine.

But for the C2050, 520 GFLOPS / 448 = 1.16GHz,
or 520 GFLOPS / 1.25GHz = 416 shaders.

So does the C2050 have an extra unit disabled? Or do I need to go back to elementary school to do the divide and multiply thing again?

If they have to disable another unit for the C2050, that means one of two things: either defects are massive, or power is barely making the 225W cut and leakage is bad, bad, bad. Given they were trying to make a 1500MHz shader part (I got those numbers last spring when I was first briefed on the chip), the fact that they are having trouble binning much above 1200-1300MHz is pretty telling.

-Charlie
 
Back on topic: Are we still expecting A3 to be the final silicon? I would hope so, and it seems like either way Nvidia will have to launch something with it, since they stated a Q1 launch on Facebook/Twitter.

They also said it would launch in 2009, but hey, I never believed that, especially since they were telling people March at the same conference if they signed the right NDAs. :)

I think they have to launch with A3s, mainly because the problems they need to fix (power, clocks, and memory controller issues) are likely to require a full respin, not just a metal layer. If they need to do that, it won't be Q1, most likely won't be Q2, and then you have to wonder if they will bother at all.

I am not sure Fermi would make a dandy Windows 8 part.

-Charlie
 
Doesn't their Q1 end at the end of April?
I don't think they meant fiscal Q1, so by the end of March is how I took it.

That's right. Expect != Speculate

Correct, but they have similar connotations...

What leads you to question whether A3 will be the final silicon?

Due to the info coming out about bad bins caused by leakage, low clocks, disabled units, and high power consumption. If things were really as bad as they seemed, I was wondering if they would try an A4, though that was answered below.

They also said it would launch in 2009, but hey, I never believed that, especially since they were telling people March at the same conference if they signed the right NDAs. :)

I think they have to launch with A3s, mainly because the problems they need to fix (power, clocks, and memory controller issues) are likely to require a full respin, not just a metal layer. If they need to do that, it won't be Q1, most likely won't be Q2, and then you have to wonder if they will bother at all.

I am not sure Fermi would make a dandy Windows 8 part.

-Charlie

Good point. So another G200/G200b cycle: get GF100 parts out and work on getting GF100b (hopefully it doesn't need two respins) out as fast as possible, which best case is Q3/Q4 '10.
Would 28nm even be a possibility at TSMC in that timeframe? I know GF is expected to have limited 28nm production before the end of '10, but I haven't really heard much from TSMC on the same subject.
 