NVIDIA Fermi: Architecture discussion

air_ii · Dec 23, 2009

Silus said:
That was precisely the point I made in the post you quoted me from. CUDA will always be "under the hood" if you will. It will be transparent to any developer that wants to use any other abstraction layer, since translation will exist in drivers.

And my point was that no one cares about that CUDA, as opposed to a dev platform/tools. If that went away (and that's a big if) and people moved over to OCL or whatever other developer solution, I think CUDA would cease to exist as a marketing term.

Razor1 · Dec 23, 2009

air_ii said:
And my point was that no one cares about that CUDA, as opposed to a dev platform/tools. If that went away (and that's a big if) and people moved over to OCL or whatever other developer solution, I think CUDA would cease to exist as a marketing term.

CUDA isn't a direct competitor to Ocl, or Direct compute, there are parts that over lap yes, but CUDA is much more flexible.

ShaidarHaran · Dec 23, 2009

Razor1 said:
CUDA isn't a direct competitor to Ocl, or Direct compute, there are parts that over lap yes, but CUDA is much more flexible.

I would say the proprietary nature of CUDA makes it less flexible than open source alternatives, by definition.

Razor1 · Dec 23, 2009

ShaidarHaran said:
I would say the proprietary nature of CUDA makes it less flexible than open source alternatives, by definition.

I'm talking about programability, compute shaders access is only a part of what CUDA is about.

compres · Dec 23, 2009

Razor1 said:
I'm talking about programability, compute shaders access is only a part of what CUDA is about.

And am saying why program for glide when you can program for opengl.

mczak · Dec 23, 2009

Silus said:
No cut down version. My speculation involved a new chip, with half verything that the full Fermi chip allows. But then evolved to something like: 256 ALUs, 32 ROPs and 80 TMUs.
Pretty much a Fermi based version of the GTX 285 with more ALUs and on a 256 bit memory interface with GDDR5.

Small nitpick, despite we don't know much about Fermi I think it's pretty safe to say it will have 8 tmus per shader cluster, hence 128 tmus for the 512 alu version and 64 units for your proposed version with 256 ALUs. Unless you assume tmus are fully decoupled from the shader clusters...

Razor1 · Dec 24, 2009

compres said:
And am saying why program for glide when you can program for opengl.

because when you have flexibility of programmability in different languages, it opens up to different programmers and comfortability in different industries and increases productivity when porting.

Novum · Dec 24, 2009

compres said:
And am saying why program for glide when you can program for opengl.

Because OpenGL lacks a very comfortable IDE like Nexus

Silus · Dec 24, 2009

mczak said:
Small nitpick, despite we don't know much about Fermi I think it's pretty safe to say it will have 8 tmus per shader cluster, hence 128 tmus for the 512 alu version and 64 units for your proposed version with 256 ALUs. Unless you assume tmus are fully decoupled from the shader clusters...

No, you're right. When I first speculated about it a few pages back, I actually mentioned that this chip would be exactly half everything that the full Fermi chip is. But when I wanted to do a parallel with this chip and the GTX 285 (since it would basically take its place and also because someone else said that a 64 TMU, 24 ROP chip wouldn't have a chance against the HD 5850) I didn't do the math properly and just used the 285's TMU and ROP count as a base. So thanks for the correction.

And so for the actual 80 TMU chip based on Fermi, 10 TPCs (320 ALUs) would do the trick.

rjc · Dec 24, 2009

From chiphell:

Post #66 cfcnc:
顺路发个消息，TSMC的圣诞礼物- Fermi A3已经顺利出样
Conveniently to issue this news, TSMC to give Christmas present -Fermi A3 already smoothly to come out shape(?)

Post #69 tomsmith in reply:
有个不太好的消息，符合2070 频率目标的比例还不理想，做成2050 的稍微多一点
To have this not so good news, in accordance with the 2070 frequency goal proportion still not ideal, to turn into 2050 a little bit too many.

Quick analysis: A3 out in time for christmas, plenty chips of C2050 ability(~1200Ghz shaders), not enough of C2070 quality available(~1400Mhz shaders).

mapel110 · Dec 24, 2009

Okay, now lets be skeptical. Performance will be not as good as expected, when this is true. Hope, nvidia will have enough new features so fps doesn't matter that much.

rjc · Dec 24, 2009

rjc said:
Quick analysis: A3 out in time for christmas, plenty chips of C2050 ability(~1200Ghz shaders), not enough of C2070 quality available(~1400Mhz shaders).

Following up my own post, i was just trying to confirm the shader clock on the C2050. There is the Board Document here:

Number of processor cores: 448
Processor core clock: 1.25 GHz to 1.40 GHz

and the Product Brief:

Double Precision floating point performance (peak): 520GFlops - 630 GFlops

For the C2070 630Gflops / 448 = 1.40Ghz which is fine.

But for the C2050 520GFlops / 448 = 1.16Ghz
or 520GFlops / 1.25Ghz = 416 shaders

So the C2050 has an extra unit disabled? or do i need to go back to elementary school to do the divide and multiply thing again?

trinibwoy · Dec 24, 2009

Nope I was trying to figure it out earlier too. The 448 @ 1.25Ghz doesn't vibe with 520Gflops so it does look like C2050 is a 416 shader part. That's looking pretty rough. It seems Rys was mistaken (or misled)

Rys said:
Let's just say that we'd urge more focus on our clocks, at the very least for GeForce products.

rjc · Dec 24, 2009

trinibwoy said:
Nope I was trying to figure it out earlier too. The 448 @ 1.25Ghz doesn't vibe with 520Gflops so it does look like C2050 is a 416 shader part. That's looking pretty rough. It seems Rys was mistaken (or misled)

Yeah it's interesting that they can get to 416 @ 1.25Ghz but 448 @ 1.40Ghz is pushing too hard.

Edit: Thinking about it for an hour, if the bin worked out like suggested they probably should find ways to increase demand of C2050(ie reduce the price) and lower demand on the C2070(ie raise the price), would be much easier than going back and trying to fight the chips natural binning. Am thinking the Fermi based Telsa business will be slow growing at first anyway, lots of hand holding and other incentives needed, better to get the chips out now to get developers comfort level up so they actually write software and create some demand for the product. It doesn't really matter that performance wasn't quite what was promised, the programming model is still the same, got to sell that now. Is very similar to when GT200 was first introduced - had real trouble getting enough of the top bin(GTX280) part to begin with.

Originally Posted by Rys
Let's just say that we'd urge more focus on our clocks, at the very least for GeForce products.

Good quote! I think i might have had way too much christmas celebrations - i completely lost the 'l' character from clocks in the above

Razor1 · Dec 24, 2009

trinibwoy said:
Nope I was trying to figure it out earlier too. The 448 @ 1.25Ghz doesn't vibe with 520Gflops so it does look like C2050 is a 416 shader part. That's looking pretty rough. It seems Rys was mistaken (or misled)

Think you are reading too much into it

or I might be miss understanding where you are coming from

. From my point of view, they hit those flop range for Tesla

Edit Geforce is quite a bit different though.

jimmyjames123 · Dec 24, 2009

trinibwoy said:
Nope I was trying to figure it out earlier too. The 448 @ 1.25Ghz doesn't vibe with 520Gflops so it does look like C2050 is a 416 shader part.

Based on this document, http://www.nvidia.com/docs/IO/43395/BD-04983-001_v01.pdf , we know that the processor "core" clocks for C2050 and C2070 are 1.25GHz and 1.40GHz, respectively.

That given, there are two possibilities:

1) C2050 has 448 processor "cores" with 32 disabled, and C2070 has 448 processor "cores" with none disabled,

or

2) NVIDIA made a mistake on their website spec here http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html, which should read 560 GFlops - 630 GFlops, instead of 520 GFlops - 630 GFlops.

ChrisRay, care to comment on this discrepancy?

That's looking pretty rough. It seems Rys was mistaken (or misled)

With respect to NVIDIA's emphasis more on clocks, Rys appears to have been referring mainly to GeForce, not Tesla.

LordEC911 · Dec 24, 2009

Silus said:
I did ? You're putting words in my posts now ?

No, I did not say that. I did however speculate that it might be the case, given how late Fermi is and the fact that the mid-range market is where the big profits are.

So you didn't say this?

Silus said:
And if you've read the last few pages of this thread, you would see that I've speculated myself, that NVIDIA may very well release a GeForce 340/350 with the release of the GeForce 380, instead of the usual GeForce 380/360, to have something in the most profitable section of the consumer graphics market: the mid-range.

Yes, you are correct, you didn't say anything about expecting it... you "speculated" it.
Thanks for pulling a technicality out of your...

Edit- Since english may not be your first language;
expect-
1. to look forward to; regard as likely to happen; anticipate the occurrence or the coming of: I expect to read it. I expect him later. She expects that they will come.
2. to look for with reason or justification: We expect obedience.
3. Informal. to suppose or surmise; guess: I expect that you are tired from the trip.

speculate-
1. To meditate on a subject; reflect.
2. To engage in a course of reasoning often based on inconclusive evidence. See Synonyms at think.
3. To engage in the buying or selling of a commodity with an element of risk on the chance of profit.

-To assume to be true without conclusive evidence

Back on topic- Are we still expecting A3 to be final silicon? I would hope so and it seems like either way Nvidia will have to launch something with it since they stated a Q1 launch on facebook/twitter.

Blazkowicz · Dec 24, 2009

there's no indication that C2050 and C2070 have actual different gigaflops ratings.

Scrat · Dec 24, 2009

Blazkowicz said:
there's no indication that C2050 and C2070 have actual different gigaflops ratings.

What's the point in two products with different names and identical specs? :???:

Blazkowicz · Dec 24, 2009

from nvidia's marketing back then : "big data sets" is what you use the 2070 for.

NVIDIA Fermi: Architecture discussion

air_ii

Razor1

ShaidarHaran

hardware monkey

Razor1

compres

mczak

Razor1

Novum

Silus

rjc

mapel110

rjc

trinibwoy

Meh

rjc

Razor1

jimmyjames123

LordEC911

Blazkowicz

Scrat

Blazkowicz

Similar threads